Get columns that differ between 2 rows

Question

I have a table company with 60 columns. The goal is to create a tool to find, compare, and eliminate duplicates in this table.

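Example: I find 2 companies that are potentially the same, but I need to know which values (columns) differ between these 2 rows in order to continue.

I think it is possible to compare column by column x 60, but I am looking for a simpler and more generic solution.

Something like: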

SELECT * FROM company where co_id=22
SHOW DIFFERENCE
SELECT * FROM company where co_id=33

The result should be the names of the columns that differ.

Answer 1

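Here is a stored procedure that should get you most of the way there...

While this should work "as is", it has no error checking, which you should add.

It gets all the columns in the table and loops over them. A column counts as different when the number of distinct values across the two rows is more than one. The output is:

A count of the number of differences
A message for each column where there is a difference

It might be more useful to return a rowset of the columns with differences. Anyway, good luck!

Usage: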

SELECT showdifference('public','company','co_id',22,33)


CREATE OR REPLACE FUNCTION showdifference(p_schema text, p_tablename text,p_idcolumn text,p_firstid integer, p_secondid integer)
  RETURNS INTEGER AS
$BODY$ 
DECLARE
    l_diffcount INTEGER;
    l_column text;
    l_dupcount integer;
    column_cursor CURSOR FOR select column_name from information_schema.columns where table_name = p_tablename and table_schema = p_schema and column_name <> p_idcolumn;
BEGIN


    -- need error checking here, to ensure the table and schema exist and the columns exist

    -- Should also check that the records ids exist.

    -- Should also check that the column type of the id field is integer


    -- Set the number of differences to zero.

    l_diffcount := 0;

    -- use a cursor to iterate over the columns found in information_schema.columns
    -- open the cursor

    OPEN column_cursor;

    LOOP
        FETCH column_cursor INTO l_column;
        EXIT WHEN NOT FOUND;

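        -- build a query that counts the distinct values of this column across the two rows; raise a notice if there is more than one
        -- note: count(distinct ...) ignores NULLs, so a NULL in only one of the two rows is not reported as a difference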
        EXECUTE 'select count(distinct  ' || quote_ident(l_column) || ' ) from ' || quote_ident(p_schema) || '.' || quote_ident(p_tablename) || ' where ' || quote_ident(p_idcolumn) || ' in ('|| p_firstid || ',' || p_secondid ||')'
        INTO l_dupcount;



        IF l_dupcount > 1 THEN
        -- increment the counter
        l_diffcount := l_diffcount +1;
        RAISE NOTICE  '% has % differences', l_column, l_dupcount ; -- for "real" you might want to return a rowset and could do something here

        END IF;


    END LOOP;




    -- close the cursor
    CLOSE column_cursor;


    RETURN l_diffcount;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE STRICT
  COST 100;
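
As a follow-up to the remark that returning a rowset might be more useful, here is a minimal sketch of such a variant. The name showdifference_cols is hypothetical and not part of the answer above. It assumes the id column is an integer and that every compared column has an equality operator (a plain json column, for example, would raise an error). It compares the two rows with IS DISTINCT FROM, so a NULL in only one row is reported as a difference.

```sql
-- Hypothetical variant of showdifference(): returns the differing column names as rows.
CREATE OR REPLACE FUNCTION showdifference_cols(
    p_schema text, p_tablename text, p_idcolumn text,
    p_firstid integer, p_secondid integer)
  RETURNS SETOF text AS
$BODY$
DECLARE
    l_column  text;
    l_differs boolean;
BEGIN
    -- iterate over every column except the id column, in table order
    FOR l_column IN
        SELECT column_name::text
        FROM information_schema.columns
        WHERE table_schema = p_schema
          AND table_name = p_tablename
          AND column_name <> p_idcolumn
        ORDER BY ordinal_position
    LOOP
        -- self-join the two rows and compare this column;
        -- IS DISTINCT FROM treats NULL as an ordinary comparable value
        EXECUTE format(
            'SELECT a.%1$I IS DISTINCT FROM b.%1$I
               FROM %2$I.%3$I a, %2$I.%3$I b
              WHERE a.%4$I = $1 AND b.%4$I = $2',
            l_column, p_schema, p_tablename, p_idcolumn)
        INTO l_differs
        USING p_firstid, p_secondid;

        -- l_differs stays NULL when one of the ids does not exist
        IF l_differs THEN
            RETURN NEXT l_column;
        END IF;
    END LOOP;
    RETURN;
END;
$BODY$
  LANGUAGE plpgsql STABLE STRICT;
```

Usage:

SELECT * FROM showdifference_cols('public','company','co_id',22,33);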

Answer 2

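For this you may use an intermediate key/value representation of the rows, either with the JSON functions or with the hstore extension (now only of historical interest). JSON comes built in with every reasonably recent version of PostgreSQL, whereas hstore must be installed in the database with CREATE EXTENSION.

Demo: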

CREATE TABLE table1 (id int primary key, t1 text, t2 text, t3 text);

Let's insert two rows that differ by the primary key and one other column (t3).

INSERT INTO table1 VALUES 
 (1,'foo','bar','baz'),
 (2,'foo','bar','biz');

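Solution with json

First get a key/value representation of the rows together with their original row numbers, then pair the rows by row number and keep only the keys whose "value" column differs: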

WITH rowcols AS (
  select rn,  key, value
  from (select row_number() over () as rn,
   row_to_json(table1.*) as r from table1) AS s
  cross join lateral json_each_text(s.r)
)
select r1.key from rowcols r1 join rowcols r2
on (r1.rn=r2.rn-1 and r1.key = r2.key)
where r1.value <> r2.value;

Sample result:

key 
-----
 id
 t3
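
The query above pairs rows by row_number(), whose order is not guaranteed without an ORDER BY, and its <> comparison silently skips keys where one side is NULL. A minimal sketch of a variant that takes two explicit rows might look like the following. The helper name json_diff_keys is hypothetical (not from the answer), and named parameters in SQL functions assume PostgreSQL 9.2 or later.

```sql
-- Hypothetical helper: keys whose values differ between two JSON objects.
CREATE OR REPLACE FUNCTION json_diff_keys(a json, b json)
  RETURNS SETOF text AS
$$
    SELECT x.key
    FROM json_each_text(a) AS x
    JOIN json_each_text(b) AS y ON x.key = y.key
    -- json_each_text turns JSON null into SQL NULL;
    -- IS DISTINCT FROM still reports NULL vs. non-NULL as a difference
    WHERE x.value IS DISTINCT FROM y.value;
$$
  LANGUAGE sql IMMUTABLE;
```

Usage with the demo table, which should report id and t3 for the two rows inserted above:

SELECT json_diff_keys(row_to_json(r1), row_to_json(r2))
FROM table1 r1, table1 r2
WHERE r1.id = 1 AND r2.id = 2;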

Solution with hstore

SELECT skeys(h1-h2) from 
  (select hstore(t.*) as h1 from table1 t where id=1) h1
 CROSS JOIN
  (select hstore(t.*) as h2 from table1 t where id=2) h2;

h1-h2 computes the difference key by key, and skeys() outputs the result as a set.

Result:

 skeys 
-------
 id
 t3

The select-list could be refined with skeys((h1-h2)-'id'::text) to always remove id which, as the primary key, will obviously always differ between rows.
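
Spelled out in full, the refined query might look like the sketch below. It assumes the demo table1 from above and that the hstore extension is already installed; the column alias differing_column is only illustrative.

```sql
-- Same query as above, with the primary key removed from the difference.
-- (h1 - h2) drops the key/value pairs of h1 that also appear in h2;
-- the second "-" then removes the 'id' key from what is left.
SELECT skeys((h1 - h2) - 'id'::text) AS differing_column
FROM (SELECT hstore(t.*) AS h1 FROM table1 t WHERE id = 1) a
CROSS JOIN
     (SELECT hstore(t.*) AS h2 FROM table1 t WHERE id = 2) b;
```

With the two demo rows this should leave only t3.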


