This query performs an intra-table analysis to detect potential schema ambiguities. It identifies pairs of columns within the same table that exhibit both high textual similarity and structural equivalence. Specifically, it flags pairs where the names have a Levenshtein edit distance of exactly one and the columns share the same data type or domain. This combination suggests a high probability of typographical errors (e.g., status vs statuss), inconsistent naming (singular vs. plural), or improper denormalization (e.g., item1 vs item2), all of which undermine schema clarity.
Notes
The query uses a function from the fuzzystrmatch extension. The query removes digits from the names before the comparison, i.e., the result does not contain the pairs like col1 and col2.
Type
Problem detection (Each row in the result could represent a flaw in the design)
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
WITH columns AS (SELECT nspname AS table_schema, relname AS table_name,
CASE WHEN relkind='r' THEN 'BASE TABLE'
WHEN relkind='v' THEN 'VIEW'
WHEN relkind='m' THEN 'MATERIALIZED VIEW'
WHEN relkind='f' THEN 'FOREIGN TABLE'
WHEN relkind='p' THEN 'PARTITIONED TABLE'
END AS table_type,
translate(attname,'01234567890','') AS column_name_stripped,
pg_type.typname AS column_type,
domain_type.typname AS column_domain_type,
string_agg(attname, ', ' ORDER BY attname) AS columns
FROM pg_class INNER JOIN pg_namespace ON pg_class.relnamespace=pg_namespace.oid
INNER JOIN pg_attribute ON pg_class.oid=pg_attribute.attrelid
INNER JOIN pg_type ON pg_attribute.atttypid =pg_type.oid
LEFT JOIN pg_type AS domain_type ON domain_type.oid=pg_type.typbasetype
WHERE attnum>=1
AND relkind IN ('r','v','m','f','p')
AND nspname NOT IN (SELECT schema_name
FROM INFORMATION_SCHEMA.schemata
WHERE schema_name<>'public' AND
schema_owner='postgres' AND schema_name IS NOT NULL)
GROUP BY table_schema, table_name, table_type, column_name_stripped, column_type, column_domain_type
)
SELECT c.table_schema, c.table_type, c.table_name, c.column_name_stripped AS col1, c.column_type AS col1_type, c.column_domain_type AS col1_domain_type,
c.columns AS col1_family,
c2.column_name_stripped AS col2, c2.column_type AS col2_type, c2.column_domain_type AS col2_domain_type, c2.columns AS col2_family
FROM columns AS c INNER JOIN columns AS c2 USING (table_schema, table_name)
WHERE c.column_name_stripped>c2.column_name_stripped
AND levenshtein_less_equal(c.column_name_stripped,c2.column_name_stripped,1)=1
AND c.column_type IS NOT DISTINCT FROM c2.column_type
AND c.column_domain_type IS NOT DISTINCT FROM c2.column_domain_type
ORDER BY table_type, table_schema, table_name;
DROP EXTENSION IF EXISTS fuzzystrmatch;
Categories
This query is classified under the following categories:
Name
Description
Naming
Queries of this category provide information about the style of naming.