Select distinct count across cases
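I have a database with a VITALS table of patient data. This table contains a patient ID (PATID) that is unique to each patient and a height variable (HT). A single patient may have more than one height recorded.
I'm trying to return the count of unique PATIDs within and across height ranges (e.g., 68-72", 72-76", etc.). Each PATID should be counted *only once*. However, I'm finding that if a patient has multiple heights recorded, they are counted once within a range, but if their heights cross ranges, they are counted twice - once in each range.
For example, if a patient has heights recorded as 68, 72, and 73, they are counted once in the 68-72 range and once in the 72-76 range. I can tell this is happening because we have 3054 unique PATIDs, but the sum of the counts returned by the query is >5000.
My code is: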
SELECT
    CASE
        when "HT" >0 and "HT" <=4 then '0-4'
        when "HT" >4 and "HT" <=8 then '4-8'
        when "HT" >8 and "HT" <=12 then '8-12'
        when "HT" >12 and "HT" <=16 then '12-16'
        when "HT" >16 and "HT" <=20 then '16-20'
        when "HT" >20 and "HT" <=24 then '20-24'
        when "HT" >24 and "HT" <=28 then '24-28'
        when "HT" >28 and "HT" <=32 then '28-32'
        when "HT" >32 and "HT" <=36 then '32-36'
        when "HT" >36 and "HT" <=40 then '36-40'
        when "HT" >40 and "HT" <=44 then '40-44'
        when "HT" >44 and "HT" <=48 then '44-48'
        when "HT" >48 and "HT" <=52 then '48-52'
        when "HT" >52 and "HT" <=56 then '52-56'
        when "HT" >56 and "HT" <=60 then '56-60'
        when "HT" >60 and "HT" <=64 then '60-64'
        when "HT" >64 and "HT" <=68 then '64-68'
        when "HT" >68 and "HT" <=72 then '68-72'
        when "HT" >72 and "HT" <=76 then '72-76'
        when "HT" >76 and "HT" <=80 then '76-80'
        when "HT" >80 and "HT" <=84 then '80-84'
        when "HT" >84 and "HT" <=88 then '84-88'
        when "HT" IS NULL then 'Null'
        else '>88'
    END AS "Height Range",
    COUNT(DISTINCT vital."PATID") AS "Count"
FROM dbo."VITAL" vital
GROUP BY 1;
If a patient has multiple records, you have to choose which record you want.
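To see which patients are responsible for the inflated total, a diagnostic along these lines might help. It is only a sketch: the `sample` CTE holds made-up rows standing in for dbo."VITAL", the column alias `ranges_hit` is hypothetical, and `ceil("HT" / 4.0)` is used as a shortcut for the 4-inch ranges in the question (it does not reproduce the open-ended '>88' bucket).

```sql
-- Hypothetical sample rows standing in for dbo."VITAL"
WITH sample("PATID", "HT") AS (
    VALUES (1, 68), (1, 72), (1, 73),   -- readings fall in three different ranges
           (2, 70), (2, 71)             -- both readings fall in the same range
)
SELECT "PATID",
       count(DISTINCT ceil("HT" / 4.0)) AS ranges_hit,  -- number of 4-inch ranges touched
       min("HT") AS min_ht,
       max("HT") AS max_ht
FROM sample
WHERE "HT" > 0                          -- skip NULL and non-positive heights
GROUP BY "PATID"
HAVING count(DISTINCT ceil("HT" / 4.0)) > 1;
```

With COUNT(DISTINCT "PATID") grouped by range, every patient listed here is counted once per range they touch instead of once overall. In the sample data only patient 1 is returned.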
One solution is to change the source so that it returns only the max height per patient, like this:
FROM (select "PATID", max("HT") "HT" from dbo."VITAL" GROUP BY "PATID") vital
Or you could take the min or the avg of the records - the appropriate solution depends on your requirements.
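For illustration, here is how the max-height subquery could fit into a complete query. This is a self-contained sketch, not the full production query: the `sample` and `per_patient` CTE names and the sample rows are made up, and the CASE is cut down to a few buckets with simplified labels.

```sql
-- Hypothetical sample rows standing in for dbo."VITAL"
WITH sample("PATID", "HT") AS (
    VALUES (1, 68), (1, 72), (1, 73),   -- one patient, several readings
           (2, 70),
           (3, NULL::int)               -- patient with no usable height
),
per_patient AS (
    -- one row per patient: keep only the greatest recorded height
    SELECT "PATID", max("HT") AS "HT"
    FROM sample
    GROUP BY "PATID"
)
SELECT CASE
           WHEN "HT" IS NULL THEN 'Null'
           WHEN "HT" <= 68 THEN '<=68'
           WHEN "HT" <= 72 THEN '68-72'
           WHEN "HT" <= 76 THEN '72-76'
           ELSE '>76'
       END AS "Height Range",
       count(*) AS "Count"              -- each patient now appears exactly once
FROM per_patient
GROUP BY 1
ORDER BY 1;
```

Because `per_patient` has exactly one row per PATID, the counts add up to the number of patients (3 in the sample data). Swapping `max` for `min` or `avg` changes which range a patient lands in, not how many times they are counted.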
You could fold duplicates in a subquery before you count:
SELECT CASE
           WHEN "HT" IS NULL THEN 'Null'
           WHEN "HT" <= 4 THEN '0-4'
           WHEN "HT" <= 8 THEN '4-8'
           WHEN "HT" <= 12 THEN '8-12'
           WHEN "HT" <= 16 THEN '12-16'
           WHEN "HT" <= 20 THEN '16-20'
           WHEN "HT" <= 24 THEN '20-24'
           WHEN "HT" <= 28 THEN '24-28'
           WHEN "HT" <= 32 THEN '28-32'
           WHEN "HT" <= 36 THEN '32-36'
           WHEN "HT" <= 40 THEN '36-40'
           WHEN "HT" <= 44 THEN '40-44'
           WHEN "HT" <= 48 THEN '44-48'
           WHEN "HT" <= 52 THEN '48-52'
           WHEN "HT" <= 56 THEN '52-56'
           WHEN "HT" <= 60 THEN '56-60'
           WHEN "HT" <= 64 THEN '60-64'
           WHEN "HT" <= 68 THEN '64-68'
           WHEN "HT" <= 72 THEN '68-72'
           WHEN "HT" <= 76 THEN '72-76'
           WHEN "HT" <= 80 THEN '76-80'
           WHEN "HT" <= 84 THEN '80-84'
           WHEN "HT" <= 88 THEN '84-88'
           ELSE '>88'
       END AS "Height Range",
       count(*) AS "Count"  -- DISTINCT not needed any more
FROM (
    SELECT DISTINCT ON ("PATID")  -- get greatest "HT" per patient
           "PATID", "HT"
    FROM dbo."VITAL"
    ORDER BY "PATID", "HT" DESC NULLS LAST
) sub
GROUP BY 1;
I also removed redundant checks from your CASE statement - assuming that negative heights are impossible (you should have a CHECK constraint for that).
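Such a constraint might look like the following. This is a sketch under assumptions: the constraint name `vital_ht_positive` is made up, and whether zero should be rejected along with negative values depends on your data.

```sql
-- Reject zero and negative heights; NULL still passes, because a CHECK
-- constraint is satisfied when its expression evaluates to NULL.
ALTER TABLE dbo."VITAL"
    ADD CONSTRAINT vital_ht_positive   -- hypothetical constraint name
    CHECK ("HT" > 0);
```

The ALTER TABLE fails if existing rows violate the condition, so any bad rows would have to be cleaned up first.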
Detailed explanation for DISTINCT ON:
- Select first row in each GROUP BY group?
Or use an aggregate in a subquery, as in the answer above.