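Cannot exclude previous non-duplicate rows
In a nutshell, here's the situation:
- I have about 1,000 employees, who have multiple annual retraining requirements
- I need to be able to sort employees by county, agency, employee, and training type (and also allow a sorted list at each level)
- I only want to show the most recent date on which each employee attended a training
What I've tried so far:
I've gotten it working for a single employee's record: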
DECLARE @Skill int
SET @Skill = 81
SELECT TOP 1
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
FROM portfolio PO
INNER JOIN person P ON PO.person_id=P.person_id
INNER JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
GROUP BY
PO.person_id,
PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
HAVING SD.language_id=26
AND PO.person_id=123456
AND SV.skill_id= @Skill
ORDER BY Employee, PO.course_startdate DESC
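Note: the excessive JOINs are due to the lack of FK relationships in the host database. Our vendor designed it to rely mostly on code built into their front end, so I'm working with what's there.
The code above returns the following result: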
Most Recent Record for Employee #123456
When I try to pull the most recent record for a list of employees:
DECLARE @Skill int
SET @Skill = 81
SELECT
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
FROM portfolio PO
INNER JOIN person P ON PO.person_id=P.person_id
INNER JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
GROUP BY
PO.person_id,
PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
HAVING SD.language_id=26
AND PO.person_id IN (SELECT DISTINCT person_id FROM portfolio)
AND SV.skill_id= @Skill
ORDER BY Employee, PO.course_startdate DESC
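I get multiple entries for the same employee (e.g. the different times that employee took training for the same skill_id).
What I'd like to do is something like this: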
IF count(SV.skill_id)>1
THEN SELECT TOP 1 component_id --for each individual
FROM portfolio
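I just can't figure out where to put the condition so that it gives me one record per person. I've tried assigning local variables, moving SELECT subqueries into various columns, adding and removing constraints, and so on. Nothing has worked so far.
I'm using the following software:
- SQL Server Management Studio 2014 and 2017 (the live database is on 2014; I have a static database on 2017 for development purposes)
- Report Builder 3.0 (my company hasn't upgraded to the latest and greatest yet)
P.S. If there's a way to sort the records in the report itself using regular expressions, please let me know!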
If you group by the course name and select max(course_date), you get it, e.g.:
DECLARE @Skill int
SET @Skill = 81
SELECT TOP 1
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
max(PO.course_startdate) most_recent_course_startdate,
max(DATEADD(DD,SV.schedule_days,PO.course_startdate)) as ExpireDate
FROM portfolio PO
INNER JOIN person P ON PO.person_id=P.person_id
INNER JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
where SD.language_id=26
AND PO.person_id=123456
AND SV.skill_id= @Skill
GROUP BY
PO.person_id,
--PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
ORDER BY Employee, most_recent_course_startdate DESC
Also, HAVING is for conditions on aggregates; otherwise stick with WHERE.
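To make the WHERE/HAVING split concrete, here is a minimal sketch. It runs against a made-up table variable, so @training and its sample rows are assumptions for illustration only, not the real portfolio schema from the question:

```sql
-- Hypothetical sample data standing in for the real tables.
DECLARE @training TABLE (person_id int, skill_id int, course_startdate date);
INSERT INTO @training (person_id, skill_id, course_startdate) VALUES
    (1, 81, '2017-03-01'),
    (1, 81, '2018-03-05'),
    (2, 81, '2018-01-15'),
    (2, 90, '2016-11-20'),
    (2, 90, '2017-06-30');

SELECT person_id,
       COUNT(*)              AS times_trained,
       MAX(course_startdate) AS most_recent
FROM @training
WHERE skill_id = 81       -- row-level condition: filter before grouping
GROUP BY person_id
HAVING COUNT(*) > 1;      -- aggregate condition: filter after grouping
-- Returns one row: person_id 1, times_trained 2, most_recent 2018-03-05
```

Person 2 drops out because only one of their rows survives the WHERE filter, so the HAVING count is 1.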
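You've pretty much found the problem with your "what I want to do" snippet yourself: it breaks down as soon as you have more than one user (i.e. you want more than one row back).
ROW_NUMBER() is a good way to handle this. It assigns a number to each row based on the conditions you give it.
For example, ROW_NUMBER() OVER (PARTITION BY PO.person_id ORDER BY PO.course_startdate DESC) as RN assigns a 1 to the row with the most recent PO.course_startdate for each PO.person_id. If you do this in a derived table or CTE, you only need to filter on RN = 1 in the final/outer select to find the latest row for each user.
CTE example: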
DECLARE @Skill int
SET @Skill = 81
;WITH yourCTE as (
SELECT
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate,
ROW_NUMBER() OVER (PARTITION BY PO.person_id ORDER BY PO.course_startdate DESC) as RN
FROM portfolio PO
JOIN person P ON PO.person_id=P.person_id
JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
WHERE SD.language_id=26
AND SV.skill_id= @Skill
)
SELECT Employee, external_id, job_title, name,
ExpireInterval, course_startdate, ExpireDate
FROM yourCTE
WHERE RN = 1
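I also moved your HAVING conditions to WHERE (and dropped a redundant one), shortened INNER JOIN to JOIN (just for consistency), and removed your GROUP BY and ORDER BY. I didn't see the point of the grouping, but if you still need the sorting you can add an ORDER BY to the final select.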
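For completeness, the derived-table variant mentioned above might look roughly like this. It is only a sketch on a made-up table variable (@training and its rows are hypothetical), cut down to the two columns that matter for the technique:

```sql
-- Hypothetical sample data; the real query would use the joins from the question.
DECLARE @training TABLE (person_id int, course_startdate date);
INSERT INTO @training (person_id, course_startdate) VALUES
    (1, '2017-03-01'),
    (1, '2018-03-05'),
    (2, '2016-11-20'),
    (2, '2018-01-15'),
    (2, '2017-06-30');

SELECT D.person_id, D.course_startdate
FROM (
    -- Number each person's rows, newest first
    SELECT T.person_id,
           T.course_startdate,
           ROW_NUMBER() OVER (PARTITION BY T.person_id
                              ORDER BY T.course_startdate DESC) AS RN
    FROM @training T
) AS D
WHERE D.RN = 1            -- keep only the newest row per person
ORDER BY D.person_id;
-- Returns (1, 2018-03-05) and (2, 2018-01-15)
```

The window function can't be filtered on in the same SELECT's WHERE clause, which is why it has to sit in a derived table or CTE with the RN = 1 filter outside.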
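A few observations, then an answer.
In SQL Server, INNER JOIN and JOIN mean the same thing.
As @DaleBurrell pointed out, use a WHERE clause rather than a HAVING clause unless you're filtering on an aggregated value. WHERE is applied earlier in query processing, and you may see a slight performance improvement from putting the filter there. Plus, it's more "standard," if you will.
Finally, I removed the filtering subquery on person_id, since it's a self-join on portfolio and I can't see a good reason for it. If there are other criteria in there that make it useful, go ahead and put it back.
All that said, your second attempt was very close. If you RANK your results using your existing ORDER BY clause, then apply TOP (1) WITH TIES, it will return the top-ranked result for each employee, ranked by date.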
DECLARE @Skill int
SET @Skill = 81
SELECT TOP (1) WITH TIES
P.lastname+', '+P.firstname AS Employee,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days as ExpireInterval,
PO.course_startdate,
DATEADD(DD,SV.schedule_days,PO.course_startdate) as ExpireDate
FROM portfolio PO
JOIN person P ON PO.person_id=P.person_id
JOIN e_component EC ON PO.component_id=EC.component_id
JOIN skill_value SV ON EC.component_id=SV.object_id
JOIN skill_description SD ON SV.skill_id=SD.skill_id
JOIN person_custom PC ON P.person_id=PC.person_id
WHERE SD.language_id=26
AND SV.skill_id= @Skill
GROUP BY
PO.person_id,
PO.course_startdate,
SV.skill_id,
P.lastname,
P.firstname,
P.external_id,
PC.job_title,
SD.name,
SV.schedule_days,
SD.language_id
ORDER BY RANK() OVER (PARTITION BY PO.person_id ORDER BY PO.course_startdate DESC)
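If it helps to see the TOP (1) WITH TIES trick in isolation, here is a stripped-down sketch. It uses a made-up table variable (@training and its rows are assumptions for illustration), not the real schema:

```sql
-- Hypothetical sample data standing in for the joined result set.
DECLARE @training TABLE (person_id int, course_startdate date);
INSERT INTO @training (person_id, course_startdate) VALUES
    (1, '2017-03-01'),
    (1, '2018-03-05'),
    (2, '2016-11-20'),
    (2, '2018-01-15'),
    (2, '2017-06-30');

-- Every person's newest row gets rank 1. TOP (1) takes the first row in
-- rank order, and WITH TIES keeps every other row that shares that rank.
SELECT TOP (1) WITH TIES
       person_id,
       course_startdate
FROM @training
ORDER BY RANK() OVER (PARTITION BY person_id
                      ORDER BY course_startdate DESC);
-- Returns (1, 2018-03-05) and (2, 2018-01-15); the order of the rows is not guaranteed
```

One caveat: if a person has two rows with the same latest date, RANK() gives both of them a 1 and both come back. Swapping in ROW_NUMBER() would keep exactly one of them, picked arbitrarily unless you add a tiebreaker column to the inner ORDER BY.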