Find the top matches comparing table columns
I have a database with up to 400 tables from different sources. I need to group those tables in an Excel file by column similarity (tables may share 0, 1, 2, or all of their column names). The challenge is as follows:
fac.table_1 has columns C1, C2, C3, C4, and C5
dim.table_2 has columns C1, C3, and C5
stg.table_3 has columns C1, C6, and C7
stg.table_4 has columns C2 and C99
...
The expected result should be:
sch_name | table_name | ncols | nmatches
dim | table_2 | 3 | 3
stg | table_3 | 3 | 1
stg | table_4 | 2 | 1
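I think the approach is to use something like the code below together with COUNT or INTERSECT, putting the name of the table I want to compare against the others in the WHERE clause: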
SELECT
    schemas.name sch_nm,
    tables.name tb_nm,
    columns.name col_nm
FROM sys.tables
LEFT JOIN sys.columns ON tables.object_id = columns.object_id
LEFT JOIN sys.schemas ON tables.schema_id = schemas.schema_id
Do you want to count the columns whose name also exists in another table?
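select sch_name, tbl_name,
       ncols      = count(*),
       -- a column counts as a match when its name occurs more than once across the listed tables
       nmatches   = sum(case when col_cnt > 1 then 1 else 0 end),
       -- integer division: the percentage is truncated to a whole number
       percentage = sum(case when col_cnt > 1 then 1 else 0 end) * 100 / count(*)
from
(
    select sch_name = s.name,
           tbl_name = t.name,
           col_name = c.name,
           -- number of columns with this name across the listed tables
           col_cnt  = count(c.name) over(partition by c.name)
    from sys.schemas s
    inner join sys.tables t on s.schema_id = t.schema_id
    inner join sys.columns c on t.object_id = c.object_id
    -- tables to compare; use the names as they exist in your database
    where t.name in ('table1', 'table2', 'table3', 'table4')
) c
-- leaves the reference table out of the output; omit this filter to list it as well
where tbl_name not in ('table1')
group by sch_name, tbl_name
order by c.tbl_name;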
Result:
sch_name | tbl_name | ncols | nmatches
fac | table_1 | 5 | 4
dim | table_2 | 3 | 3
stg | table_3 | 3 | 1
stg | table_4 | 2 | 1
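One caveat with the windowed count above: it treats a column as a match when its name appears in any of the listed tables, not specifically in the table you are comparing against. If the goal is "how many columns does each table share with one reference table", a join against that table's columns may be closer to what the question asks. The following is only a sketch under assumptions: the variable names (@ref_schema, @ref_table, @ref_id) are made up for illustration, and the reference table fac.table_1 is taken from the question's example.

```sql
-- Sketch: for every other table, count the columns whose names also
-- occur in one chosen reference table.
-- Assumption: the reference table is fac.table_1 (from the question's example).
declare @ref_schema sysname = N'fac';
declare @ref_table  sysname = N'table_1';
-- NULL if the reference table does not exist; the query then returns no rows
declare @ref_id int = object_id(quotename(@ref_schema) + N'.' + quotename(@ref_table));

select sch_name = s.name,
       tbl_name = t.name,
       ncols    = count(*),
       -- count(r.name) ignores NULLs, so only columns found in the reference table are counted
       nmatches = count(r.name)
from sys.schemas s
inner join sys.tables  t on s.schema_id = t.schema_id
inner join sys.columns c on t.object_id = c.object_id
-- column names are unique within a table, so this join adds at most one row per column
left join sys.columns r on r.object_id = @ref_id
                       and r.name = c.name
where t.object_id <> @ref_id          -- leave the reference table itself out
group by s.name, t.name
order by nmatches desc, tbl_name;     -- best matches first
```

For the four example tables in the question, this should return the three rows listed under the expected result (table_2 with 3 matches, table_3 and table_4 with 1 each). Tables with no shared column names still appear, with nmatches = 0; add `having count(r.name) > 0` if those should be dropped.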