Is there a way to count the number of unique values across multiple columns in SQL?
I want to count the number of unique values per tx_id. Here is part of the raw data:
Table: Treatment Record
+----------------+--------+----------------+-----------------+
| SN             | tx_id  | pa3            | pa4             |
+----------------+--------+----------------+-----------------+
| I2120210007014 | 149362 | V16F2021117016 | V15S2021145018  |
| I2120210007014 | 149362 | V15S2021144019 | V15S2021145018  |
| I2120210007014 | 149362 | V16F2021117017 | V15S2021145018  |
| I2120210007014 | 149362 | V16F2021117017 | V15S2021145018  |
| I2120210007014 | 149362 | V16F2021117017 | V15S2021145018  |
| I2120210007014 | 148716 | V15C2021116010 | V15C20211091016 |
+----------------+--------+----------------+-----------------+
For example, the result should look like this:
+----------------+--------+------+------+------+
| SN             | tx_id  | V16F | V15S | V15C |
+----------------+--------+------+------+------+
| I2120210007014 | 149362 | 2    | 2    | 0    |
| I2120210007014 | 148716 | 0    | 0    | 2    |
+----------------+--------+------+------+------+
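From the raw data you can see that there are two different tx_id values, which I use to identify each group. For example, all rows with tx_id = '149362' are in the same group.
Within the pa3 and pa4 columns there are further groups, which can be classified by looking at the first 4 characters, such as "V16F" and "V15S". I also have to count the number of distinct values within each group. For example, the pa3 column contains V16F2021117016, V15S2021144019 and V16F2021117017, while the pa4 column contains only V15S2021145018.
Therefore, we count 2 for the "V16F" group and 2 for the "V15S" group. You may notice that the count is not taken per column (pa3 or pa4) but over both columns together, and that values are told apart by their last 4 characters. For example, V16F2021117016 and V16F2021117017 belong to the same group, "V16F", but are different values because their last 4 characters are '7016' and '7017'.
I can't find a way to do this so far and have only come up with the SQL below. I hope someone can help.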
SELECT tx_id,
       sum(case when val like 'V16F%' then 1 else 0 end),
       sum(case when val2 like 'V15S%' then 1 else 0 end)
FROM ( select tx_id, pa3 as val, pa4 as val2 from Cool group by pa3, pa4)
GROUP BY tx_id
This is the incorrect output:
+----------------+--------+------+------+
| SN             | tx_id  | V16F | V15S |
+----------------+--------+------+------+
| I2120210007014 | 149362 | 3    | 3    |
| I2120210007014 | 148716 | 0    | 0    |
+----------------+--------+------+------+
The simplest way is to use UNION ALL to stack all pa3 and pa4 values into a single column and then aggregate:
SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V16F%' THEN pa END) AS V16F,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15S%' THEN pa END) AS V15S,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15C%' THEN pa END) AS V15C
FROM (
  -- stack pa3 and pa4 into one column named pa
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION ALL
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
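To check the UNION ALL query against the sample rows from the question, here is a minimal sketch that runs it from Python. It assumes an in-memory SQLite database as a stand-in for the real server; the function name count_distinct_codes is hypothetical, tablename is the placeholder table name used above, and the ORDER BY is added only to make the row order predictable.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("I2120210007014", 149362, "V16F2021117016", "V15S2021145018"),
    ("I2120210007014", 149362, "V15S2021144019", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 148716, "V15C2021116010", "V15C20211091016"),
]

# The UNION ALL query; ORDER BY is an addition for a stable row order.
QUERY = """
SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V16F%' THEN pa END) AS V16F,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15S%' THEN pa END) AS V15S,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15C%' THEN pa END) AS V15C
FROM (
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION ALL
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
ORDER BY tx_id DESC
"""


def count_distinct_codes():
    """Load the sample rows into SQLite (an assumption) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (SN TEXT, tx_id INTEGER, pa3 TEXT, pa4 TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in count_distinct_codes():
        print(row)
    # ('I2120210007014', 149362, 2, 2, 0)
    # ('I2120210007014', 148716, 0, 0, 2)
```

The two rows match the expected result in the question: the duplicates of V16F2021117017 and V15S2021145018 are collapsed by COUNT(DISTINCT ...).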
Alternatively, use UNION, which removes duplicate rows, so DISTINCT is no longer needed:
SELECT SN, tx_id,
       COUNT(CASE WHEN pa LIKE 'V16F%' THEN pa END) AS V16F,
       COUNT(CASE WHEN pa LIKE 'V15S%' THEN pa END) AS V15S,
       COUNT(CASE WHEN pa LIKE 'V15C%' THEN pa END) AS V15C
FROM (
  -- UNION keeps each (SN, tx_id, pa) combination only once
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
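This can be simplified further in databases such as MySQL or SQLite, where a boolean expression evaluates to 1 or 0: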
SELECT SN, tx_id,
       SUM(pa LIKE 'V16F%') AS V16F,
       SUM(pa LIKE 'V15S%') AS V15S,
       SUM(pa LIKE 'V15C%') AS V15C
FROM (
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
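The UNION variants can be compared side by side with a small sketch like the one below. It again assumes SQLite in memory as a stand-in database; run_variant and the VARIANTS dictionary are hypothetical helper names. The third variant, SUM(CASE ... THEN 1 ELSE 0 END), is not part of the answer above: it is included as an assumption about what to write on engines that do not treat a boolean expression as an integer.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("I2120210007014", 149362, "V16F2021117016", "V15S2021145018"),
    ("I2120210007014", 149362, "V15S2021144019", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 149362, "V16F2021117017", "V15S2021145018"),
    ("I2120210007014", 148716, "V15C2021116010", "V15C20211091016"),
]

# Shared query skeleton; {expr} is replaced by one counting expression
# per variant, with {p} standing for the 4-character prefix.
TEMPLATE = """
SELECT SN, tx_id, {v16f} AS V16F, {v15s} AS V15S, {v15c} AS V15C
FROM (
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
ORDER BY tx_id DESC
"""

VARIANTS = {
    "count_case": "COUNT(CASE WHEN pa LIKE '{p}%' THEN pa END)",
    "sum_boolean": "SUM(pa LIKE '{p}%')",
    # Assumed portable spelling, not taken from the answer:
    "sum_case": "SUM(CASE WHEN pa LIKE '{p}%' THEN 1 ELSE 0 END)",
}


def run_variant(name):
    """Build the query for one counting expression and run it on the sample."""
    expr = VARIANTS[name]
    query = TEMPLATE.format(
        v16f=expr.format(p="V16F"),
        v15s=expr.format(p="V15S"),
        v15c=expr.format(p="V15C"),
    )
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (SN TEXT, tx_id INTEGER, pa3 TEXT, pa4 TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name in VARIANTS:
        print(name, run_variant(name))
```

On the sample rows all three variants return the same counts, because UNION has already removed the duplicate (SN, tx_id, pa) rows before anything is counted.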
Another way is to use conditional aggregation directly on the table, with more complex logic that works for this sample data:
SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V16F%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V16F%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V16F,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15S%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15S%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V15S,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15C%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15C%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V15C
FROM tablename
GROUP BY SN, tx_id
See the demo.
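The direct conditional aggregation depends on the shape of the sample data, because SUM(pa3 = pa4) is subtracted from every column regardless of prefix. The sketch below illustrates this with two made-up rows (the values S1, V16F0001 and V15S0001 are hypothetical, not from the question), again assuming SQLite in memory; compare_queries is a hypothetical helper name.

```python
import sqlite3

# Hypothetical rows: in the second row pa3 equals pa4.
HYPOTHETICAL_ROWS = [
    ("S1", 1, "V16F0001", "V15S0001"),
    ("S1", 1, "V15S0001", "V15S0001"),
]

# Direct conditional aggregation, as in the last query of the answer.
DIRECT = """
SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V16F%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V16F%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V16F,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15S%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15S%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V15S,
       COUNT(DISTINCT CASE WHEN pa3 LIKE 'V15C%' THEN pa3 END) +
       COUNT(DISTINCT CASE WHEN pa4 LIKE 'V15C%' THEN pa4 END) -
       SUM(pa3 = pa4) AS V15C
FROM tablename
GROUP BY SN, tx_id
"""

# UNION ALL with COUNT(DISTINCT ...), as in the first query of the answer.
UNION_ALL = """
SELECT SN, tx_id,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V16F%' THEN pa END) AS V16F,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15S%' THEN pa END) AS V15S,
       COUNT(DISTINCT CASE WHEN pa LIKE 'V15C%' THEN pa END) AS V15C
FROM (
  SELECT SN, tx_id, pa3 AS pa FROM tablename
  UNION ALL
  SELECT SN, tx_id, pa4 AS pa FROM tablename
) t
GROUP BY SN, tx_id
"""


def compare_queries(rows):
    """Return (direct_result, union_all_result) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (SN TEXT, tx_id INTEGER, pa3 TEXT, pa4 TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?)", rows)
    direct = conn.execute(DIRECT).fetchall()
    union_all = conn.execute(UNION_ALL).fetchall()
    conn.close()
    return direct, union_all


if __name__ == "__main__":
    direct, union_all = compare_queries(HYPOTHETICAL_ROWS)
    print("direct:   ", direct)     # [('S1', 1, 0, 1, -1)]
    print("union all:", union_all)  # [('S1', 1, 1, 1, 0)]
```

With these rows the direct query loses the single V16F value and reports a negative V15C count, while the UNION ALL query still returns the distinct counts. For data that may contain rows where pa3 equals pa4, the UNION-based queries are the safer choice.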