MySQL: joining one table's data with other tables for each row
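I am trying to generate a report for a given date range from the following table.
table_columns => employee_id | date | status
where status 1 = not_visited, 2 = visited, 3 = canceled, 4 = pending (awaiting approval)
The report should look like this: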
+-------------+------------+-------+-------------+---------+----------+---------+
| employee_id | date | total | not_visited | visited | canceled | pending |
+-------------+------------+-------+-------------+---------+----------+---------+
| 3 | 2021-06-01 | 10 | 10 | 0 | 0 | 0 |
| 3 | 2021-06-02 | 22 | 10 | 2 | 10 | 0 |
| 3 | 2021-06-03 | 10 | 10 | 0 | 0 | 0 |
| 3 | 2021-06-05 | 11 | 10 | 1 | 0 | 0 |
| 4 | 2021-06-01 | 11 | 8 | 3 | 0 | 0 |
| 5 | 2021-06-01 | 10 | 1 | 9 | 0 | 0 |
+-------------+------------+-------+-------------+---------+----------+---------+
The query for this report is:
select va.employee_id, va.date,
count(*) as total,
sum(case when status = 1 then 1 else 0 end) as not_visited,
sum(case when status = 2 then 1 else 0 end) as visited,
sum(case when status = 3 then 1 else 0 end) as canceled,
sum(case when status = 4 then 1 else 0 end) as pending
from visiting_addresses va
where va.date >= '2021-06-01'
and va.date <= '2021-06-30'
group by va.employee_id, va.date;
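If you look at the result, employee_id = 3 has no entry for 2021-06-04, and there is no data from 2021-06-06 to 2021-06-30 either. I need to include these dates in the result. So I tried to create another query that generates the dates in a given range. The following query does that: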
SELECT gen_date
FROM
(SELECT v.gen_date
FROM
(SELECT ADDDATE('1970-01-01',t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0) gen_date
FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION
SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION
SELECT 8 UNION SELECT 9) t4
) v
WHERE v.gen_date BETWEEN '2021-06-01' AND '2021-06-30'
) calendar;
This query generates dates like this:
+------------+
| gen_date |
+------------+
| 2021-06-01 |
| 2021-06-02 |
| 2021-06-03 |
| .......... |
| ...........|
| 2021-06-27 |
| 2021-06-28 |
| 2021-06-29 |
| 2021-06-30 |
+------------+
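Now the question is: how can I join the two queries above so that every date appears in the result for each employee_id? Is that even possible?
(The actual table contains 5 million rows. The employee_id column has a cardinality of 3k+, and the date and employee_id columns are indexed.)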
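You tagged both MySQL and MariaDB. The two DBMSs are related, but they are still different DBMSs. In MariaDB you can easily generate a series with the built-in seq tables (the Sequence storage engine):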
select date '2021-06-01' + interval seq day as date from seq_0_to_29
This is not available in MySQL. There you can use a recursive query instead (MySQL 8.0 or later):
with recursive dates (date) as
(
select date '2021-06-01'
union all
select date + interval 1 day
from dates
where date < date '2021-06-30'
)
select date from dates;
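In a recursive query you can of course generate the dates dynamically, for example for the last month found in the table or, say, for the current and previous month.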
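As a rough sketch of the "current and previous month" case, the start and end of the range can be derived from curdate() instead of being hard-coded. This assumes MySQL 8.0 or later and the default cte_max_recursion_depth of 1000, which is far more than the roughly 60 iterations needed here. The choice of range is only an illustration.

```sql
-- Sketch: one row per day, from the first day of the previous month
-- through the last day of the current month.
with recursive dates (date) as
(
  -- anchor row: first day of the previous month, cast so the column type is DATE
  select cast(date_format(curdate() - interval 1 month, '%Y-%m-01') as date)
  union all
  -- recursive step: add one day until the end of the current month is reached
  select date + interval 1 day
  from dates
  where date < last_day(curdate())
)
select date from dates;
```

The cast in the anchor row matters because the column type of a recursive CTE is taken from its non-recursive part.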
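In any SQL dialect you can join queries. In your case you want all dates (generated as shown) combined either with all employees (by selecting from the employees table) or only with the employees that appear in the visiting_addresses table. If you only want employees that have data in the table, use: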
select distinct employee_id from visiting_addresses
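To get all combinations, cross join the two data sets. Then outer join the data from your table, so that employee/date pairs without visits are kept as well.
The query has this form: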
select
employees.employee_id,
dates.date,
visits.total,
visits.not_visited,
...
from ( <date sequence query here> ) dates
cross join ( <employee table query here> ) employees
left outer join ( <visits table query here> ) visits
on visits.date = dates.date
and visits.employee_id = employees.employee_id
order by employees.employee_id, dates.date;
(If you want this for all employees, simply replace ( <employee table query here> ) employees with the table name employees.)
For readability you may prefer a WITH clause:
with recursive dates (date) as ( <date sequence query here> )
, employees as ( <employee table query here> )
, visits as ( <visits table query here> )
select
employees.employee_id,
dates.date,
visits.total,
visits.not_visited,
...
from dates
cross join employees
left outer join visits
on visits.date = dates.date
and visits.employee_id = employees.employee_id
order by employees.employee_id, dates.date;
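For illustration, here is the template above with the three placeholders filled in from the queries in this thread. It is a sketch under a few assumptions: MySQL 8.0 or later, a visiting_addresses.date column of type DATE (not DATETIME), and employees taken from visiting_addresses itself. The coalesce calls are an addition that is not in the template. They turn the NULLs produced by the outer join into the zeros shown in the desired report.

```sql
with recursive dates (date) as
(
  -- one row per day in the reporting range
  select date '2021-06-01'
  union all
  select date + interval 1 day
  from dates
  where date < date '2021-06-30'
)
, employees as
(
  -- only employees that have at least one row in the table
  select distinct employee_id from visiting_addresses
)
, visits as
(
  -- the original report query, unchanged apart from qualifying status
  select va.employee_id, va.date,
    count(*) as total,
    sum(case when va.status = 1 then 1 else 0 end) as not_visited,
    sum(case when va.status = 2 then 1 else 0 end) as visited,
    sum(case when va.status = 3 then 1 else 0 end) as canceled,
    sum(case when va.status = 4 then 1 else 0 end) as pending
  from visiting_addresses va
  where va.date >= date '2021-06-01'
    and va.date <= date '2021-06-30'
  group by va.employee_id, va.date
)
select
  employees.employee_id,
  dates.date,
  -- employee/date pairs without visits have no match in visits, so default to 0
  coalesce(visits.total, 0) as total,
  coalesce(visits.not_visited, 0) as not_visited,
  coalesce(visits.visited, 0) as visited,
  coalesce(visits.canceled, 0) as canceled,
  coalesce(visits.pending, 0) as pending
from dates
cross join employees
left outer join visits
  on visits.date = dates.date
  and visits.employee_id = employees.employee_id
order by employees.employee_id, dates.date;
```

With about 3k employees and 30 dates, the cross join yields roughly 90k rows, one per employee and day, regardless of how many visits exist.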
You mention that your table is large. I suggest the following index for this query:
create index idx on visiting_addresses (date, employee_id, status);