Query previous rows on a condition
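I have a table of data about users' flight booking patterns on a website. Assume the following data is all the historical data I have about my users. session_date is the date the user visited the website and searched for a particular route, while flight_date is the departure date of the flight. I have ordered the table by session_date. The outcome is recorded in booked.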
+---------+--------------+----------------+--------------+-------------+--------+
| user_id | session_date | departure_code | arrival_code | flight_date | booked |
+---------+--------------+----------------+--------------+-------------+--------+
| user1   | 7 Jan        | CA             | MY           | 8 Mar       | 1      |
| user1   | 8 Jan        | US             | MY           | 18 May      | 0      |
| user1   | 8 Jan        | US             | MY           | 18 May      | 1      |
| user1   | 8 Jan        | CA             | MY           | 19 Mar      | 0      |
| user1   | 9 Jan        | US             | MY           | 18 May      | 1      |
+---------+--------------+----------------+--------------+-------------+--------+
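I want to output a new column in my table called previous_flight_date. For each search, the new column gives the flight_date that was previously booked for that specific route. If the user has searched the same route several times but never booked it, the value in this column stays null.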
+---------+--------------+----------------+--------------+-------------+--------+----------------------+
| user_id | session_date | departure_code | arrival_code | flight_date | booked | previous_flight_date |
+---------+--------------+----------------+--------------+-------------+--------+----------------------+
| user1   | 7 Jan        | CA             | SG           | 8 Mar       | 1      | null                 |
| user1   | 8 Jan        | US             | MY           | 18 May      | 0      | null                 |
| user1   | 8 Jan        | US             | MY           | 18 May      | 1      | null                 |
| user1   | 8 Jan        | CA             | SG           | 19 Mar      | 0      | 8 Mar                |
| user1   | 2 Feb        | US             | MY           | 2 Jul       | 1      | 18 May               |
+---------+--------------+----------------+--------------+-------------+--------+----------------------+
So, for example, the column is null until the 4th row, which shows 8 Mar, because the user had already booked a CA-->SG flight departing on that date.
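I've tried using LAST_VALUE without success. I also don't know how to use LAG() when I have many different routes and want to look up previous rows based on a condition. Any suggested solution would be great! Thanks.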
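To pin down the intended rule before writing SQL, here is a minimal sketch in plain Python. It assumes the rows are already sorted by session_date and that "previous" means an earlier row in that order, so a booking only affects later searches by the same user on the same route, never the row that made the booking. The Search tuple and add_previous_flight_date function are hypothetical names; the sample rows reuse the dates from the BigQuery answer's test data, which assumes the year 2020.

```python
from collections import namedtuple

# Hypothetical row type mirroring the question's columns.
Search = namedtuple(
    "Search",
    "user_id session_date departure_code arrival_code flight_date booked",
)


def add_previous_flight_date(rows):
    """Return (row, previous_flight_date) pairs.

    Assumes `rows` is already sorted by session_date and that a booking
    only counts for later rows, never for the row that made it.
    """
    last_booked = {}  # (user_id, departure_code, arrival_code) -> flight_date
    result = []
    for row in rows:
        route = (row.user_id, row.departure_code, row.arrival_code)
        # Read before updating, so the current row is excluded.
        result.append((row, last_booked.get(route)))
        if row.booked == 1:
            last_booked[route] = row.flight_date
    return result


sample = [
    Search("user1", "2020-01-07", "CA", "SG", "2020-03-08", 1),
    Search("user1", "2020-01-08", "US", "MY", "2020-05-18", 0),
    Search("user1", "2020-01-08", "US", "MY", "2020-05-18", 1),
    Search("user1", "2020-01-08", "CA", "SG", "2020-03-19", 0),
    Search("user1", "2020-02-09", "US", "MY", "2020-07-02", 1),
]

for row, prev in add_previous_flight_date(sample):
    print(row.departure_code, row.arrival_code, row.flight_date, row.booked, prev)
```

This keeps the most recently booked flight in session order; the SQL answers instead take the latest booked flight_date that is earlier than the current row's flight_date, which gives the same result on this sample.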
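I started out using LAG as you suggested, but then found the query rather difficult to phrase. For an approach that does not use analytic functions, we can try just a correlated subquery that finds the most recent booked flight date on the same route.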
SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
    -- latest booked flight by the same user on the same route, before this row's flight
    (SELECT t2.flight_date FROM yourTable t2
     WHERE t2.user_id = t1.user_id AND
           t2.departure_code = t1.departure_code AND
           t2.arrival_code = t1.arrival_code AND
           t2.booked = 1 AND
           t2.flight_date < t1.flight_date
     ORDER BY t2.flight_date DESC LIMIT 1) AS previous_flight_date
FROM yourTable t1
ORDER BY flight_date;
The demo was shown on MariaDB, but the same query should run on BigQuery without any issues.
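One way to try the correlated-subquery approach locally is with Python's built-in sqlite3 module and an in-memory copy of the sample data. SQLite is only a stand-in here (the demo used MariaDB and the target is BigQuery). Dates are stored as ISO-8601 text, which compares correctly with < and ORDER BY. The subquery also matches on user_id, an assumption that matters once the table holds more than one user.

```python
import sqlite3

# In-memory SQLite stand-in for the question's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE yourTable (
           user_id TEXT, session_date TEXT, departure_code TEXT,
           arrival_code TEXT, flight_date TEXT, booked INTEGER)"""
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("user1", "2020-01-07", "CA", "SG", "2020-03-08", 1),
        ("user1", "2020-01-08", "US", "MY", "2020-05-18", 0),
        ("user1", "2020-01-08", "US", "MY", "2020-05-18", 1),
        ("user1", "2020-01-08", "CA", "SG", "2020-03-19", 0),
        ("user1", "2020-02-09", "US", "MY", "2020-07-02", 1),
    ],
)

# For each row, pick the latest booked flight on the same route that
# departs strictly before this row's flight (NULL if there is none).
query = """
SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
       (SELECT t2.flight_date FROM yourTable t2
         WHERE t2.user_id = t1.user_id
           AND t2.departure_code = t1.departure_code
           AND t2.arrival_code = t1.arrival_code
           AND t2.booked = 1
           AND t2.flight_date < t1.flight_date
         ORDER BY t2.flight_date DESC LIMIT 1) AS previous_flight_date
FROM yourTable t1
ORDER BY flight_date, booked
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The extra booked sort key only makes the printed order of the two 2020-05-18 rows deterministic; it does not change any previous_flight_date value.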
The following is for BigQuery Standard SQL:
#standardSQL
SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
MAX(IF(booked = 1, flight_date, NULL)) OVER(previous_flights) AS previous_flight_date
FROM `project.dataset.table`
WINDOW previous_flights AS (
PARTITION BY user_id, departure_code, arrival_code
ORDER BY flight_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
Applied to the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'user1' AS user_id, DATE '2020-01-07' AS session_date, 'CA' AS departure_code, 'SG' AS arrival_code, DATE '2020-03-08' AS flight_date, 1 AS booked UNION ALL
SELECT 'user1', '2020-01-08', 'US', 'MY', '2020-05-18', 0 UNION ALL
SELECT 'user1', '2020-01-08', 'US', 'MY', '2020-05-18', 1 UNION ALL
SELECT 'user1', '2020-01-08', 'CA', 'SG', '2020-03-19', 0 UNION ALL
SELECT 'user1', '2020-02-09', 'US', 'MY', '2020-07-02', 1
)
SELECT user_id, session_date, departure_code, arrival_code, flight_date, booked,
MAX(IF(booked = 1, flight_date, NULL)) OVER(previous_flights) AS previous_flight_date
FROM `project.dataset.table`
WINDOW previous_flights AS (
PARTITION BY user_id, departure_code, arrival_code
ORDER BY flight_date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
)
-- ORDER BY flight_date
the output is:
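+-----+---------+--------------+----------------+--------------+-------------+--------+----------------------+
| Row | user_id | session_date | departure_code | arrival_code | flight_date | booked | previous_flight_date |
+-----+---------+--------------+----------------+--------------+-------------+--------+----------------------+
| 1   | user1   | 2020-01-07   | CA             | SG           | 2020-03-08  | 1      | null                 |
| 2   | user1   | 2020-01-08   | CA             | SG           | 2020-03-19  | 0      | 2020-03-08           |
| 3   | user1   | 2020-01-08   | US             | MY           | 2020-05-18  | 0      | null                 |
| 4   | user1   | 2020-01-08   | US             | MY           | 2020-05-18  | 1      | null                 |
| 5   | user1   | 2020-02-09   | US             | MY           | 2020-07-02  | 1      | 2020-05-18           |
+-----+---------+--------------+----------------+--------------+-------------+--------+----------------------+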
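The same MAX-over-a-window logic can be checked locally. This sketch runs it in SQLite (window frames need SQLite 3.25 or later) through Python's sqlite3 module, with CASE WHEN standing in for BigQuery's IF(). The id column is a hypothetical addition: two US-MY rows share the flight date 2020-05-18, and ordering by flight_date alone leaves their relative position in a ROWS frame undefined, so the booked = 0 row could otherwise see its twin's booking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flights (
           id INTEGER, user_id TEXT, session_date TEXT, departure_code TEXT,
           arrival_code TEXT, flight_date TEXT, booked INTEGER)"""
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "user1", "2020-01-07", "CA", "SG", "2020-03-08", 1),
        (2, "user1", "2020-01-08", "US", "MY", "2020-05-18", 0),
        (3, "user1", "2020-01-08", "US", "MY", "2020-05-18", 1),
        (4, "user1", "2020-01-08", "CA", "SG", "2020-03-19", 0),
        (5, "user1", "2020-02-09", "US", "MY", "2020-07-02", 1),
    ],
)

# MAX ignores NULLs, so only booked rows contribute; the frame stops at
# 1 PRECEDING, which excludes the current row.
query = """
SELECT id, flight_date, booked,
       MAX(CASE WHEN booked = 1 THEN flight_date END) OVER (
           PARTITION BY user_id, departure_code, arrival_code
           ORDER BY flight_date, id  -- id breaks ties between equal flight dates
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS previous_flight_date
FROM flights
ORDER BY id
"""
prev = {row[0]: row[3] for row in conn.execute(query)}
print(prev)
# {1: None, 2: None, 3: None, 4: '2020-03-08', 5: '2020-05-18'}
```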
Below is a SQL Server solution using window functions. A BigQuery solution should look similar, since window functions are standard.
SELECT
    *
    -- latest booked flight on the same route, strictly before the current row
    , Previous_Flight_Date = MAX(CASE WHEN booked = 1 THEN flight_date ELSE NULL END)
        OVER (
            PARTITION BY user_id, departure_code, arrival_code
            ORDER BY flight_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        )
FROM historicTable t
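I think you can do this with first_value(). The trick is to put a condition inside the window function, turn on the ignore nulls option, and then use a window frame specification that looks back over the earlier rows on the same route, excluding the current row: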
select
  t.*,
  first_value(case when booked = 1 then flight_date end ignore nulls) over(
    partition by user_id, departure_code, arrival_code
    order by flight_date desc
    -- in descending order, the following rows are the earlier flights
    rows between 1 following and unbounded following
  ) previous_flight_date
from mytable t
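Not every engine accepts IGNORE NULLS, so here is a plain-Python sketch that emulates this FIRST_VALUE(... IGNORE NULLS) window. It shows why a descending order pairs with a frame running from 1 FOLLOWING to UNBOUNDED FOLLOWING: in descending order the following rows are the earlier flights, and the first booked one among them is the most recent earlier booking. The function name and the (id, route, flight_date, booked) tuple layout are hypothetical; id only breaks ties between equal flight dates.

```python
def previous_booked_desc(rows):
    """Emulate, per route partition:

        FIRST_VALUE(CASE WHEN booked = 1 THEN flight_date END IGNORE NULLS)
        OVER (ORDER BY flight_date DESC
              ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)

    rows: (id, route, flight_date, booked) tuples, where route is a
    (user_id, departure_code, arrival_code) tuple.
    Returns {id: previous_flight_date or None}.
    """
    result = {}
    for route in {r[1] for r in rows}:
        # PARTITION BY route, ORDER BY flight_date DESC (id DESC breaks ties)
        part = sorted(
            (r for r in rows if r[1] == route),
            key=lambda r: (r[2], r[0]),
            reverse=True,
        )
        for i, (row_id, _, _, _) in enumerate(part):
            # Frame: every row after this one, i.e. the earlier flights.
            frame = part[i + 1:]
            # IGNORE NULLS: skip rows where the CASE yields NULL (booked = 0).
            result[row_id] = next((fd for _, _, fd, b in frame if b == 1), None)
    return result


sample = [
    (1, ("user1", "CA", "SG"), "2020-03-08", 1),
    (2, ("user1", "US", "MY"), "2020-05-18", 0),
    (3, ("user1", "US", "MY"), "2020-05-18", 1),
    (4, ("user1", "CA", "SG"), "2020-03-19", 0),
    (5, ("user1", "US", "MY"), "2020-07-02", 1),
]

prev = previous_booked_desc(sample)
print(dict(sorted(prev.items())))
# {1: None, 2: None, 3: None, 4: '2020-03-08', 5: '2020-05-18'}
```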
Actually, a window max() also works (and then ignore nulls is not needed):
select
  t.*,
  max(case when booked = 1 then flight_date end) over(
    partition by user_id, departure_code, arrival_code
    -- ascending order, so the preceding rows are the earlier flights
    order by flight_date
    rows between unbounded preceding and 1 preceding
  ) previous_flight_date
from mytable t