Find, based on the data that users have recoreded in a database, cases where a multi-valued attribute in a conceptual data model is implemented as a textual column of a base table. Expected values in the column are strings that contain attribute values, separated by commas or other separation characters.
Notes
The test creates a user-defined function that should be dropped in the end (see the garbage collection).
Type
Problem detection (Each row in the result could represent a flaw in the design)
Create a separate table with a foreign key constraint referencing to the present table.
Data Source
INFORMATION_SCHEMA+system catalog
SQL Query
CREATE OR REPLACE FUNCTION f_check_format_comma_separated_list()
RETURNS TABLE (table_schema VARCHAR(128), table_name VARCHAR(128), column_name
VARCHAR(128)) AS $$
DECLARE
sql_stmt TEXT;
cnt BIGINT;
varchar_columns RECORD;
BEGIN
RAISE NOTICE 'Detecting possible occurrences of the antipattern "Format
Comma-Separated Lists"';
FOR varchar_columns IN SELECT c.table_schema, c.table_name, c.column_name
FROM INFORMATION_SCHEMA.columns AS c INNER JOIN INFORMATION_SCHEMA.tables AS t USING
(table_schema, table_name) WHERE c.data_type IN ('character varying', 'text') AND
t.table_type='BASE TABLE' AND c.table_schema NOT IN (SELECT schema_name
FROM INFORMATION_SCHEMA.schemata
WHERE schema_name<>'public' AND
schema_owner='postgres' AND schema_name IS NOT NULL) AND
c.column_name NOT ILIKE '%aadress%' AND
c.column_name NOT ILIKE '%address%' AND
c.column_name NOT ILIKE '%elukoht%' AND
c.column_name NOT ILIKE '%kirjeldus%' AND
c.column_name NOT ILIKE '%tekst%' AND
c.column_name NOT ILIKE '%description%' AND
c.column_name NOT ILIKE '%kommentaar%' AND
c.column_name NOT ILIKE '%comment%' AND
c.column_name NOT ILIKE '%diagnoos%' AND
c.column_name NOT ILIKE '%diagnosis%' AND
c.column_name NOT ILIKE '%ringkon%' AND
c.column_name NOT ILIKE '%constituency%'
ORDER BY c.table_schema, c.table_name LOOP
table_schema:= varchar_columns.table_schema;
table_name:= varchar_columns.table_name;
column_name:= varchar_columns.column_name;
sql_stmt:='SELECT Count(' || quote_ident(column_name) || ') AS c FROM
' || quote_ident(table_schema) ||'.' || quote_ident(table_name) || ' WHERE '||
quote_ident(column_name) || '~''(.+)([,;]{1}.+)+''';
EXECUTE sql_stmt INTO cnt;
IF cnt>0 THEN
RETURN NEXT;
END IF;
END LOOP;
RAISE NOTICE 'Detection completed';
RETURN;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER
SET search_path = information_schema, pg_temp;
SELECT * FROM f_check_format_comma_separated_list();
DROP FUNCTION IF EXISTS f_check_format_comma_separated_list();
Categories
This query is classified under the following categories:
Name
Description
Data at the database logical level
Queries of this category provide information about data in base tables.
Database design antipatterns
Queries of this category provide information about possible occurrences of SQL database design antipatterns.
Structure of base tables
Queries of this category provide information about the structuring of base tables at the database conceptual level
Further reading and related materials:
Reference
This is one of the antipatterns from the Bill Karwin's book of SQL antipatterns. See Chapter 2: Jaywalking.
Smell "Compund attribute": Sharma, T., Fragkoulis, M., Rizou, S., Bruntink, M. and Spinellis, D.: Smelly relations: measuring and understanding database schema quality. In: Proceedings of the 40th International Conference on Software Engineering: Software Engineering in Practice, pp. 55-64. ACM, (2018).