SQL Duplicate Rows Multiple Joins
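I'm pretty much a rookie when it comes to SQL, so any help would be appreciated. I have a large dataset that I'm filtering for a hospital. I'm pulling data from 6 different tables, and one of my tables has duplicate rows for each visit. I only want to pull in one row per visit (it doesn't matter which one). I know I need to use a DISTINCT or GROUP BY clause, but my syntax must be wrong.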
SELECT
ADV.[VisitID] AS VisitID
,ADV.[Name] AS Name
,ADV.[UnitNumber] AS UnitNumber
,CONVERT(DATE,ADV.[BirthDateTime]) AS BirthDate
,ADV.[ReasonForVisit] AS ReasonForVisit
,ADV.[AccountNumber] AS AccountNumber
,DATEDIFF(day, ADV.ServiceDateTime, DIS.DischargeDateTime) AS LOS
,ADV.[HomePhone] AS PhoneNumber
,ADV.[ServiceDateTime] AS ServiceDateTime
,ADV.[Status] AS 'Status'
,PRV.[PrimaryCareID] AS PCP
,LAB.[TestMnemonic] AS Test
,LAB.[ResultRW] AS Result
,LAB.[AbnormalFlag] AS AbnormalFlag
,LAB.[ResultDateTime] AS ResultDateTime
,DIS.[Diagnosis] AS DischargeDiagnosis
,DIS.[ErDiagnosis] AS ERDiagnosis
,DCP.[TextLine] AS ProblemList
FROM Visits ADV
LEFT JOIN Tests LAB ON ( LAB.VisitID = ADV.VisitID AND
LAB.SourceID = ADV.SourceID )
LEFT JOIN Discharge DIS ON ( DIS.VisitID = LAB.VisitID AND
DIS.SourceID = LAB.SourceID )
LEFT JOIN Providers PRV ON ( PRV.VisitID = DIS.VisitID AND
PRV.SourceID = DIS.SourceID )
LEFT JOIN ProblemListVisits EPS ON ( EPS.VisitID = PRV.VisitID AND
EPS.SourceID = PRV.SourceID )
LEFT JOIN ProblemList DCP ON ( DCP.PatientID = EPS.PatientID AND
DCP.SourceID = EPS.SourceID )
WHERE ( DCP.[TextLine] LIKE '%Diabetes%' OR
DCP.[TextLine] LIKE '%Diabetic%' OR
DCP.[TextLine] LIKE '%DM2%' OR
DCP.[TextLine] LIKE '%DKA%' OR
DCP.[TextLine] LIKE '%Hyperglycemia%' OR
DCP.[TextLine] LIKE '%Hypoglycemia%' ) AND
( LAB.[TestMnemonic] = 'GLU' OR
LAB.[TestMnemonic] = '%HA1C' ) AND
ADV.[Status] != 'DIS CLI'
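So this works fine, but when a physician goes into a patient's problem list and makes a change, it refiles the entire list, which populates the problem list table all over again. So for 1 visit I may get 4 duplicate entries because of the problem list, and I only want one. It doesn't matter which.
I've tried referencing other questions and nesting another SELECT statement in there, but I keep getting syntax errors.
Here's what the duplicate values look like: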
1111111111 SMITH,JOHN 1111 1/1/1901 CHEST PAIN 1111 2 111-111-1111 1/1/1901 12:15 DIS IN DOEJO GLU 120 H 1/2/1901 6:35 NULL CHEST PAIN Diabetes type 2, controlled
1111111111 SMITH,JOHN 1111 1/1/1901 CHEST PAIN 1111 2 111-111-1111 1/1/1901 12:15 DIS IN DOEJO GLU 120 H 1/2/1901 6:35 NULL CHEST PAIN Diabetes type 2, controlled
1111111111 SMITH,JOHN 1111 1/1/1901 CHEST PAIN 1111 2 111-111-1111 1/1/1901 12:15 DIS IN DOEJO GLU 120 H 1/2/1901 6:35 NULL CHEST PAIN Diabetes type 2, controlled
1111111111 SMITH,JOHN 1111 1/1/1901 CHEST PAIN 1111 2 111-111-1111 1/1/1901 12:15 DIS IN DOEJO GLU 120 H 1/2/1901 6:35 NULL CHEST PAIN Diabetes type 2, controlled
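In the end, 'Diabetes type 2, controlled' is what causes the duplicates. If I remove the ProblemListVisits and ProblemList tables from the query, I only get one row of data.
The most important thing is to get all of the unique test results, but not all of the duplicate entries from the problem list (I just want to know which type of diabetes they have, once).
Thanks!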
The DISTINCT clause should do the trick, but if it doesn't, you can change
LEFT JOIN ProblemList DCP ON ( DCP.PatientID = EPS.PatientID AND
DCP.SourceID = EPS.SourceID )
to
OUTER APPLY (Select top 1 DCP.[TextLine] FROM ProblemList DCP WHERE
DCP.PatientID = EPS.PatientID
AND DCP.SourceID = EPS.SourceID) DCP
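One caveat with TOP 1 as written: with no filter and no ORDER BY inside the APPLY, the row it returns is arbitrary and may not be a diabetes entry, in which case the outer WHERE on DCP.[TextLine] would drop the visit entirely. Below is a minimal sketch of one way around that, assuming it is acceptable to move the diabetes filter inside the APPLY. The table variables and sample rows are made up for illustration; only the column names come from the question, and the LIKE list is shortened.

```sql
-- Hypothetical sample data; only the column names come from the question.
DECLARE @ProblemListVisits TABLE (VisitID int, SourceID int, PatientID int);
DECLARE @ProblemList TABLE (PatientID int, SourceID int, TextLine varchar(100));

INSERT INTO @ProblemListVisits (VisitID, SourceID, PatientID)
VALUES (1, 1, 10);

-- The list was refiled, so the diabetes entry appears more than once.
INSERT INTO @ProblemList (PatientID, SourceID, TextLine)
VALUES (10, 1, 'Chest pain'),
       (10, 1, 'Diabetes type 2, controlled'),
       (10, 1, 'Diabetes type 2, controlled');

SELECT EPS.VisitID, DCP.TextLine AS ProblemList
FROM @ProblemListVisits EPS
OUTER APPLY (
    SELECT TOP 1 PL.TextLine
    FROM @ProblemList PL
    WHERE PL.PatientID = EPS.PatientID
      AND PL.SourceID  = EPS.SourceID
      AND ( PL.TextLine LIKE '%Diabetes%' OR
            PL.TextLine LIKE '%Diabetic%' )   -- filter before picking a row
    ORDER BY PL.TextLine                      -- makes the chosen row deterministic
) DCP
WHERE DCP.TextLine IS NOT NULL;               -- keep only visits with a matching entry
```

With this sample data the query returns a single row for visit 1 carrying 'Diabetes type 2, controlled'. Since the final WHERE discards non-matching visits anyway, CROSS APPLY without that WHERE would behave the same here.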
Try adding DISTINCT after SELECT, like this:
SELECT DISTINCT
ADV.[VisitID] AS VisitID
,ADV.[Name] AS Name
...
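It may help to see what DISTINCT does and does not collapse: it removes rows that are identical in every selected column, so distinct test results survive, but so does any problem-list row whose text differs even slightly. A small sketch with made-up rows standing in for the joined output (the table variable and its values are hypothetical):

```sql
-- Hypothetical rows standing in for the joined result set.
DECLARE @Joined TABLE (VisitID int, Test varchar(10), Result int, ProblemList varchar(100));

INSERT INTO @Joined (VisitID, Test, Result, ProblemList)
VALUES (1, 'GLU', 120, 'Diabetes type 2, controlled'),
       (1, 'GLU', 120, 'Diabetes type 2, controlled'),    -- exact duplicate: collapsed
       (1, 'GLU', 135, 'Diabetes type 2, controlled'),    -- different result: kept
       (1, 'GLU', 120, 'Diabetes type 2, uncontrolled');  -- different text: kept

-- Four input rows become three output rows.
SELECT DISTINCT VisitID, Test, Result, ProblemList
FROM @Joined;
```

If the refiled problem list only ever repeats the same text, as in the sample output in the question, DISTINCT is enough; if the text changes between refilings, each variant still produces its own row.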
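Instead of DISTINCT, which I think is the quickest way to get there, you can also move each table that produces multiple rows into a subquery, where you GROUP BY the values you need for your JOINs and SELECTs.
There are two advantages here:
You have better control over the output of these more granular tables, and
you reduce the overhead of the JOINs, which lowers your I/O and CPU usage, when you use WHERE clauses inside the subqueries to limit what they let through.
Code: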
SELECT
ADV.[VisitID] AS VisitID
,ADV.[Name] AS Name
,ADV.[UnitNumber] AS UnitNumber
,CONVERT(DATE,ADV.[BirthDateTime]) AS BirthDate
,ADV.[ReasonForVisit] AS ReasonForVisit
,ADV.[AccountNumber] AS AccountNumber
,DATEDIFF(day, ADV.ServiceDateTime, DIS.DischargeDateTime) AS LOS
,ADV.[HomePhone] AS PhoneNumber
,ADV.[ServiceDateTime] AS ServiceDateTime
,ADV.[Status] AS 'Status'
,PRV.[PrimaryCareID] AS PCP
,LAB.[TestMnemonic] AS Test
,LAB.[ResultRW] AS Result
,LAB.[AbnormalFlag] AS AbnormalFlag
,LAB.[ResultDateTime] AS ResultDateTime
,DIS.[Diagnosis] AS DischargeDiagnosis
,DIS.[ErDiagnosis] AS ERDiagnosis
,DCP.[TextLine] AS ProblemList
FROM Visits ADV
LEFT JOIN Tests LAB ON ( LAB.VisitID = ADV.VisitID AND
LAB.SourceID = ADV.SourceID )
LEFT JOIN Discharge DIS ON ( DIS.VisitID = LAB.VisitID AND
DIS.SourceID = LAB.SourceID )
LEFT JOIN Providers PRV ON ( PRV.VisitID = DIS.VisitID AND
PRV.SourceID = DIS.SourceID )
LEFT JOIN
(
SELECT
VisitID,
SourceID,
PatientID
FROM ProblemListVisits
GROUP BY
VisitID,
SourceID,
PatientID
) EPS ON ( EPS.VisitID = PRV.VisitID AND
EPS.SourceID = PRV.SourceID )
LEFT JOIN
(
SELECT
PatientID,
SourceID,
TextLine
FROM ProblemList
WHERE
[TextLine] LIKE '%Diabetes%' OR
[TextLine] LIKE '%Diabetic%' OR
[TextLine] LIKE '%DM2%' OR
[TextLine] LIKE '%DKA%' OR
[TextLine] LIKE '%Hyperglycemia%' OR
[TextLine] LIKE '%Hypoglycemia%'
GROUP BY
PatientID,
SourceID,
TextLine
) DCP ON ( DCP.PatientID = EPS.PatientID AND
DCP.SourceID = EPS.SourceID )
WHERE ( LAB.[TestMnemonic] = 'GLU' OR
LAB.[TestMnemonic] = '%HA1C' ) AND
ADV.[Status] != 'DIS CLI'
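If you still get multiples, that indicates [TextLine] has more than one value for each VisitID/PatientID combination in your problem list table. At that point, you can remove that field from the GROUP BY clause and use some kind of aggregate on it instead, such as MAX([TextLine]) in the subquery. That said, I suspect you won't have duplicates after either using DISTINCT or this subquery approach.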
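The MAX([TextLine]) variation could look roughly like the sketch below. It is only an illustration: the table variable and sample rows are invented, the LIKE list is shortened, and in the real query this SELECT would take the place of the DCP subquery.

```sql
-- Hypothetical sample data; only the column names come from the question.
DECLARE @ProblemList TABLE (PatientID int, SourceID int, TextLine varchar(100));

INSERT INTO @ProblemList (PatientID, SourceID, TextLine)
VALUES (10, 1, 'Diabetes type 2, controlled'),
       (10, 1, 'Diabetes type 2, controlled'),   -- refiled duplicate
       (10, 1, 'Diabetic neuropathy'),           -- second distinct diabetes entry
       (20, 1, 'Chest pain');                    -- no diabetes entry

-- TextLine is aggregated instead of grouped, so each
-- PatientID/SourceID pair yields at most one row.
SELECT
    PatientID,
    SourceID,
    MAX(TextLine) AS TextLine
FROM @ProblemList
WHERE TextLine LIKE '%Diabetes%' OR
      TextLine LIKE '%Diabetic%'
GROUP BY
    PatientID,
    SourceID;
```

Patient 10 comes back as exactly one row and patient 20 is absent. Note that MAX simply picks the value that sorts last under the column's collation, which fits the "it doesn't matter which one" requirement but is not a clinically meaningful choice.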