Goal This query performs an intra-table analysis to detect potential schema ambiguities. It identifies pairs of columns within the same table that exhibit both high textual similarity and structural equivalence. Specifically, it flags pairs where the names have a Levenshtein edit distance of exactly one and the columns share the same data type or domain. This combination suggests a high probability of typographical errors (e.g., status vs statuss), inconsistent naming (singular vs. plural), or improper denormalization (e.g., item1 vs item2), all of which undermine schema clarity.
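Notes The query uses the levenshtein_less_equal function from the fuzzystrmatch extension. Digits are removed from the column names before the comparison, so the result does not report pairs such as col1 and col2. Columns whose names differ only in digits and that share the same type are instead listed together as one family in the result.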
Type Problem detection (Each row in the result could represent a flaw in the design)
Reliability Low (Many false-positive results)
License MIT License
Data Source INFORMATION_SCHEMA+system catalog
SQL Query
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

WITH columns AS (
  SELECT nspname AS table_schema,
         relname AS table_name,
         CASE WHEN relkind='r' THEN 'BASE TABLE'
              WHEN relkind='v' THEN 'VIEW'
              WHEN relkind='m' THEN 'MATERIALIZED VIEW'
              WHEN relkind='f' THEN 'FOREIGN TABLE'
              WHEN relkind='p' THEN 'PARTITIONED TABLE'
         END AS table_type,
         translate(attname,'01234567890','') AS column_name_stripped,
         pg_type.typname AS column_type,
         domain_type.typname AS column_domain_type,
         string_agg(attname, E',\n' ORDER BY attname) AS columns
  FROM pg_class
  INNER JOIN pg_namespace ON pg_class.relnamespace=pg_namespace.oid
  INNER JOIN pg_attribute ON pg_class.oid=pg_attribute.attrelid
  INNER JOIN pg_type ON pg_attribute.atttypid=pg_type.oid
  LEFT JOIN pg_type AS domain_type ON domain_type.oid=pg_type.typbasetype
  WHERE attnum>=1
    AND relkind IN ('r','v','m','f','p')
    AND nspname NOT IN (SELECT schema_name
                        FROM INFORMATION_SCHEMA.schemata
                        WHERE schema_name<>'public'
                          AND schema_owner='postgres'
                          AND schema_name IS NOT NULL)
  GROUP BY table_schema, table_name, table_type, column_name_stripped, column_type, column_domain_type
)
SELECT c.table_schema,
       c.table_type,
       c.table_name,
       c.column_name_stripped AS col1,
       c.column_type AS col1_type,
       c.column_domain_type AS col1_domain_type,
       c.columns AS col1_family,
       c2.column_name_stripped AS col2,
       c2.column_type AS col2_type,
       c2.column_domain_type AS col2_domain_type,
       c2.columns AS col2_family
FROM columns AS c
INNER JOIN columns AS c2 USING (table_schema, table_name)
WHERE c.column_name_stripped>c2.column_name_stripped
  AND levenshtein_less_equal(c.column_name_stripped,c2.column_name_stripped,1)=1
  AND c.column_type IS NOT DISTINCT FROM c2.column_type
  AND c.column_domain_type IS NOT DISTINCT FROM c2.column_domain_type
ORDER BY table_type, table_schema, table_name;

DROP EXTENSION IF EXISTS fuzzystrmatch;
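The two building blocks of the comparison, digit stripping with translate and the distance test with levenshtein_less_equal, can be tried in isolation. The following is a minimal sketch rather than part of the catalog query; the sample names and the aliases sample, stripped, and flagged are hypothetical and chosen only for illustration.

```sql
-- levenshtein_less_equal() is provided by the fuzzystrmatch extension.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Hypothetical column-name pairs, used only for illustration.
WITH sample(name_a, name_b) AS (
  VALUES ('status', 'statuss'),  -- one extra character
         ('price',  'prices'),   -- singular vs. plural
         ('item1',  'item2'),    -- differ only in digits
         ('name',   'comment')   -- unrelated names
),
stripped AS (
  SELECT name_a,
         name_b,
         -- Remove all digits, as the main query does.
         translate(name_a, '0123456789', '') AS stripped_a,
         translate(name_b, '0123456789', '') AS stripped_b
  FROM sample
)
SELECT name_a,
       name_b,
       stripped_a,
       stripped_b,
       -- True only when the stripped names are exactly one edit apart.
       levenshtein_less_equal(stripped_a, stripped_b, 1) = 1 AS flagged
FROM stripped;
```

Under these assumptions, the first two pairs should be flagged. The pair item1 and item2 should not, because both names reduce to item and the distance is zero. The unrelated pair should not either, because its distance exceeds the limit of one. The sketch leaves out the type and domain comparison, which the main query applies in addition.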
Categories

This query is classified under the following categories:

Name | Description
Naming | Queries of this category provide information about the style of naming.