Query goal: | "Split a base table column into multiple columns based on the values in some other column. Each such newly created column has the name, a part of which is a data value from the original tables."(Bill Karwin) Find base tables that have more than one columns with the same type and field size and the difference between the columns are the year or month number at the end of the column name (two or four numbers, preceded by an underscore). |
Notes about the query: | The query assumes that the names of cloned columns differ from each other only by the numbers that are used in the names. In case of the string_agg function, the line break (br) tag is used as a part of the separator for the better readability in case the query result is displayed in a web browser. |
Query type: | Problem detection (Each row in the result could represent a flaw in the design) |
Query reliability: | Medium (Medium number of false-positive results) |
Query license: | MIT License |
Fixing suggestion: | Create a separate table based on the columns. |
Data source: | INFORMATION_SCHEMA only |
SQL query: | Click on query to copy it
WITH columns AS (SELECT c.table_schema, c.table_name, c.column_name, c.ordinal_position, regexp_replace(column_name, '(_[[:digit:]]{2}$|_[[:digit:]]{4}$)', '', 'g') AS column_name_stripped, CASE WHEN data_type='numeric' THEN data_type || '(' || numeric_precision || ',' || numeric_scale || ')' WHEN character_maximum_length IS NOT NULL THEN data_type || '(' || character_maximum_length || ')' WHEN datetime_precision IS NOT NULL AND data_type<>'date' THEN data_type || '(' || datetime_precision || ')' ELSE data_type END AS data_type FROM INFORMATION_SCHEMA.columns AS c INNER JOIN INFORMATION_SCHEMA.tables AS t USING (table_schema, table_name) WHERE t.table_type='BASE TABLE' AND c.table_schema NOT IN (SELECT schema_name FROM INFORMATION_SCHEMA.schemata WHERE schema_name<>'public' AND schema_owner='postgres' AND schema_name IS NOT NULL)), cloned as (SELECT table_schema, table_name, data_type, string_agg(column_name, ',' ORDER BY ordinal_position) AS columns, Count(*) AS number_of_columns FROM columns GROUP BY table_schema, table_name, column_name_stripped, data_type HAVING Count(*)>1) SELECT table_schema, table_name, Count(*) AS number_of_groups, Sum(number_of_columns) AS number_of_clones, string_agg(columns || ' ' || data_type, ';<br>') AS all_cloned_columns FROM cloned GROUP BY table_schema, table_name ORDER BY Sum(number_of_columns) DESC, Count(*) DESC, table_schema, table_name; |
Collection name | Collection description |
---|---|
Find problems automatically | Queries, that results point to problems in the database. Each query in the collection produces an initial assessment. However, a human reviewer has the final say as to whether there is a problem or not . |
Category name | Category description |
---|---|
Database design antipatterns | Queries of this category provide information about possible occurrences of SQL database design antipatterns. |
Result quality depends on names | Queries of this category use names (for instance, column names) to try to guess the meaning of a database object. Thus, the goodness of names determines the number of false positive and false negative results. |
Structure of base tables | Queries of this category provide information about the structuring of base tables at the database conceptual level |
Reference |
---|
This is one of the antipatterns from the Bill Karwin's book of SQL antipatterns. See Chapter 8: Metadata Tribbles. |
Sharma, T., Fragkoulis, M., Rizou, S., Bruntink, M. and Spinellis, D.: Smelly relations: measuring and understanding database schema quality. In: 40th International Conference on Software Engineering: Software Engineering in Practice, pp. 55–64. ACM, (2018). https://doi.org/10.1145/3183519.3183529 (Multicolumn attribute) |
Mistake (5): https://www.red-gate.com/simple-talk/sql/database-administration/five-simple--database-design-errors-you-should-avoid/ |
Dintyala, P., Narechania, A., Arulraj, J.: SQLCheck: automated detection and diagnosis of SQL anti-patterns. In: 2020 ACM SIGMOD International Conference on Management of Data, pp. 2331–2345. (2020). https://doi.org/10.1145/3318464.3389754 (Multi-Valued Attribute) |