Query goal: | Find base tables that implement recording multivalued attribute values with the help of repeating group of columns. Find base tables that have more than one columns with the same type and field size and the difference between the columns are the numbers in the column names (column1, column2, etc.). |
Notes about the query: | Query assumes that repeating columns have names that differ from each other only by numbers. In case of the string_agg function, the line break (br) tag is used as a part of the separator for the better readability in case the query result is displayed in a web browser. The query excludes columns that have year or month number at the end of the name. |
Query type: | Problem detection (Each row in the result could represent a flaw in the design) |
Query reliability: | Medium (Medium number of false-positive results) |
Query license: | MIT License |
Fixing suggestion: | Instead of repeating groups one should have multiple tables or should use a column with an array type. Create a dependent table or use a column with an array type. For instance, instead of table Photo(photo_id, name, tag1, tag2, tag3) create tables: Photo(photo_id, name) Photo_tag(photo_id, tag, seq) |
Data source: | INFORMATION_SCHEMA only |
SQL query: | Click on query to copy it
WITH columns AS (SELECT c.table_schema, c.table_name, c.column_name, c.ordinal_position, translate(c.column_name,'0123456789','') AS column_name_stripped, CASE WHEN data_type='numeric' THEN data_type || '(' || numeric_precision || ',' || numeric_scale || ')' WHEN character_maximum_length IS NOT NULL THEN data_type || '(' || character_maximum_length || ')' WHEN datetime_precision IS NOT NULL AND data_type<>'date' THEN data_type || '(' || datetime_precision || ')' ELSE data_type END AS data_type FROM INFORMATION_SCHEMA.columns AS c INNER JOIN INFORMATION_SCHEMA.tables AS t USING (table_schema, table_name) WHERE t.table_type='BASE TABLE' AND c.table_schema NOT IN (SELECT schema_name FROM INFORMATION_SCHEMA.schemata WHERE schema_name<>'public' AND schema_owner='postgres' AND schema_name IS NOT NULL) AND column_name=regexp_replace(column_name, '(_[[:digit:]]{2}$|_[[:digit:]]{4}$)', '', 'g')), cloned as (SELECT table_schema, table_name, data_type, string_agg(column_name, ',' ORDER BY ordinal_position) AS columns, Count(*) AS number_of_columns FROM columns GROUP BY table_schema, table_name, column_name_stripped, data_type HAVING Count(*)>1) SELECT table_schema, table_name, Count(*) AS number_of_groups, Sum(number_of_columns) AS number_of_columns, string_agg(columns || ' ' || data_type, ';<br>') AS all_columns FROM cloned GROUP BY table_schema, table_name ORDER BY Sum(number_of_columns) DESC, Count(*) DESC, table_schema, table_name; |
Collection name | Collection description |
---|---|
Find problems about base tables | A selection of queries that return information about the data types, field sizes, default values as well as general structure of base tables. Contains all the types of queries - problem detection, software measure, and general overview |
Find problems automatically | Queries, that results point to problems in the database. Each query in the collection produces an initial assessment. However, a human reviewer has the final say as to whether there is a problem or not . |
Category name | Category description |
---|---|
Comfortability of database evolution | Queries of this category provide information about the means that influence database evolution. |
Database design antipatterns | Queries of this category provide information about possible occurrences of SQL database design antipatterns. |
Result quality depends on names | Queries of this category use names (for instance, column names) to try to guess the meaning of a database object. Thus, the goodness of names determines the number of false positive and false negative results. |
Structure of base tables | Queries of this category provide information about the structuring of base tables at the database conceptual level |
Reference |
---|
This is one of the antipatterns from the Bill Karwin's book of SQL antipatterns. See Chapter 8: Multicolumn Attributes. |
Smell "Multicolumn attribute": Sharma, T., Fragkoulis, M., Rizou, S., Bruntink, M. and Spinellis, D.: Smelly relations: measuring and understanding database schema quality. In: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, pp. 55-64. ACM, (2018). |