Find the pairs of table columns that name is different only by one symbol and that have the same type and/or domain.
Notes
The query finds the pairs of columns of the same table where the Levenshtein distance between the names is one. The query uses a function from the fuzzystrmatch extension. The query removes digits from the names before the comparison, i.e., the result does not contain the pairs like col1 and col2.
Type
Problem detection (Each row in the result could represent a flaw in the design)
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
WITH columns AS (SELECT nspname AS table_schema, relname AS table_name,
CASE WHEN relkind='r' THEN 'BASE TABLE'
WHEN relkind='v' THEN 'VIEW'
WHEN relkind='m' THEN 'MATERIALIZED VIEW'
WHEN relkind='f' THEN 'FOREIGN TABLE'
WHEN relkind='p' THEN 'PARTITIONED TABLE'
END AS table_type,
translate(attname,'01234567890','') AS column_name_stripped,
pg_type.typname AS column_type,
domain_type.typname AS column_domain_type,
string_agg(attname, ', ' ORDER BY attname) AS columns
FROM pg_class INNER JOIN pg_namespace ON pg_class.relnamespace=pg_namespace.oid
INNER JOIN pg_attribute ON pg_class.oid=pg_attribute.attrelid
INNER JOIN pg_type ON pg_attribute.atttypid =pg_type.oid
LEFT JOIN pg_type AS domain_type ON domain_type.oid=pg_type.typbasetype
WHERE attnum>=1
AND relkind IN ('r','v','m','f','p')
AND nspname NOT IN (SELECT schema_name
FROM INFORMATION_SCHEMA.schemata
WHERE schema_name<>'public' AND
schema_owner='postgres' AND schema_name IS NOT NULL)
GROUP BY table_schema, table_name, table_type, column_name_stripped, column_type, column_domain_type
)
SELECT c.table_schema, c.table_type, c.table_name, c.column_name_stripped AS col1, c.column_type AS col1_type, c.column_domain_type AS col1_domain_type,
c.columns AS col1_family,
c2.column_name_stripped AS col2, c2.column_type AS col2_type, c2.column_domain_type AS col2_domain_type, c2.columns AS col2_family
FROM columns AS c INNER JOIN columns AS c2 USING (table_schema, table_name)
WHERE c.column_name_stripped>c2.column_name_stripped
AND levenshtein_less_equal(c.column_name_stripped,c2.column_name_stripped,1)=1
AND c.column_type IS NOT DISTINCT FROM c2.column_type
AND c.column_domain_type IS NOT DISTINCT FROM c2.column_domain_type
ORDER BY table_type, table_schema, table_name;
DROP EXTENSION IF EXISTS fuzzystrmatch;
Categories
This query is classified under the following categories:
Name
Description
Naming
Queries of this category provide information about the style of naming.