Finding the total distance of grouped trips in SQL
I have data showing individual and group trips for a taxi service:
|----------------------------------------------------------------------------------------------------------------------------------------------|
| Trip_id | Trip_Created_Time | start_lat | start_lon | end_lat | end_lon | trip_updated_time | Is_Group |
|----------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | 2021-07-01 17:29:51 | 81.91892 | -42.19823 | 81.90281 | -42.38918 | 2021-07-01 17:35:21 | YES |
| 2 | 2021-07-01 17:31:52 | 81.91892 | -42.46920 | 81.97392 | -42.37819 | 2021-07-01 17:52:51 | YES |
| 3 | 2021-07-02 21:50:51 | 81.91892 | -42.01936 | 81.18937 | -42.01967 | 2021-07-02 22:09:09 | NO |
| 4 | 2021-07-02 23:31:41 | 81.91892 | -42.47821 | 81.01792 | -42.17839 | 2021-07-02 23:41:51 | NO |
| 5 | 2021-09-21 20:12:54 | 81.91892 | -42.47821 | 81.63829 | -42.67292 | 2021-09-21 20:42:54 | YES |
| 6 | 2021-09-21 20:15:21 | 81.91892 | -42.47821 | 81.62819 | -42.01927 | 2021-09-21 20:59:21 | YES |
| 7 | 2021-09-21 20:17:23 | 81.91892 | -42.47821 | 81.03926 | -42.36284 | 2021-09-21 21:02:21 | YES |
| 8 | 2021-11-01 02:41:41 | 81.91892 | -42.47821 | 81.36292 | -42.47682 | 2021-07-02 23:41:51 | NO |
| 9 | 2021-12-21 19:19:41 | 81.91892 | -42.47821 | 81.23671 | -42.93628 | 2021-07-02 23:41:51 | NO |
|----------------------------------------------------------------------------------------------------------------------------------------------|
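Group = two or more users who start from the same location but have different destinations.
I am trying to calculate the distance from the start lat/lon to the end lat/lon for both grouped and non-grouped trips.
Here is my attempt: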
select *,
case when is_group = 'NO'
then haversine(start_lat, start_lon, end_lat, end_lon)
when is_group = 'YES'
then NULL
end as trip_distance
from my_table
The current output is as follows:
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Trip_id | Trip_Created_Time | start_lat | start_lon | end_lat | end_lon | trip_updated_time | Is_Group | trip_distance |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | 2021-07-01 17:29:51 | 81.91892 | -42.19823 | 81.90281 | -42.38918 | 2021-07-01 17:35:21 | YES | NULL |
| 2 | 2021-07-01 17:31:52 | 81.91892 | -42.46920 | 81.97392 | -42.37819 | 2021-07-01 17:52:51 | YES | NULL |
| 3 | 2021-07-02 21:50:51 | 81.91892 | -42.01936 | 81.18937 | -42.01967 | 2021-07-02 22:09:09 | NO | 1.289247 |
| 4 | 2021-07-02 23:31:41 | 81.91892 | -42.47821 | 81.01792 | -42.17839 | 2021-07-02 23:41:51 | NO | 0.387922 |
| 5 | 2021-09-21 20:12:54 | 81.91892 | -42.47821 | 81.63829 | -42.67292 | 2021-09-21 20:42:54 | YES | NULL |
| 6 | 2021-09-21 20:15:21 | 81.91892 | -42.47821 | 81.62819 | -42.01927 | 2021-09-21 20:59:21 | YES | NULL |
| 7 | 2021-09-21 20:17:23 | 81.91892 | -42.47821 | 81.03926 | -42.36284 | 2021-09-21 21:02:21 | YES | NULL |
| 8 | 2021-11-01 02:41:41 | 81.91892 | -42.47821 | 81.36292 | -42.47682 | 2021-07-02 23:41:51 | NO | 3.29181 |
| 9 | 2021-12-21 19:19:41 | 81.91892 | -42.47821 | 81.23671 | -42.93628 | 2021-07-02 23:41:51 | NO | 0.29822 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
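How can I calculate trip_distance when the trip is part of a group, i.e. is_group = 'YES'?
Edit: the final output distance for a group should be the sum of the distances of all trips in that group, i.e. A->B->C = A+B+C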
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Trip_id | Trip_Created_Time | start_lat | start_lon | end_lat | end_lon | trip_updated_time | Is_Group | trip_distance |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | 2021-07-01 17:29:51 | 81.91892 | -42.19823 | 81.90281 | -42.38918 | 2021-07-01 17:35:21 | YES | 1.28463 | <---
| 2 | 2021-07-01 17:31:52 | 81.91892 | -42.46920 | 81.97392 | -42.37819 | 2021-07-01 17:52:51 | YES | 1.28463 | <--- These two total distances are same since grouped
| 3 | 2021-07-02 21:50:51 | 81.91892 | -42.01936 | 81.18937 | -42.01967 | 2021-07-02 22:09:09 | NO | 1.289247 |
| 4 | 2021-07-02 23:31:41 | 81.91892 | -42.47821 | 81.01792 | -42.17839 | 2021-07-02 23:41:51 | NO | 0.387922 |
| 5 | 2021-09-21 20:12:54 | 81.91892 | -42.47821 | 81.63829 | -42.67292 | 2021-09-21 20:42:54 | YES | 4.38921 | <---
| 6 | 2021-09-21 20:15:21 | 81.91892 | -42.47821 | 81.62819 | -42.01927 | 2021-09-21 20:59:21 | YES | 4.38921 | <---
| 7 | 2021-09-21 20:17:23 | 81.91892 | -42.47821 | 81.03926 | -42.36284 | 2021-09-21 21:02:21 | YES | 4.38921 | <--- These three total distances are same since grouped
| 8 | 2021-11-01 02:41:41 | 81.91892 | -42.47821 | 81.36292 | -42.47682 | 2021-07-02 23:41:51 | NO | 3.29181 |
| 9 | 2021-12-21 19:19:41 | 81.91892 | -42.47821 | 81.23671 | -42.93628 | 2021-07-02 23:41:51 | NO | 0.29822 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
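If I understand correctly, you need the end point of the previous stop to calculate the distance to the next stop. I would try something like this: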
select Trip_id,
Trip_Created_Time,
trip_updated_time,
start_lat,
start_lon,
lag(end_lat, 1, 0) over (partition by start_lat, date_trunc(hour, Trip_Created_Time) order by Trip_Created_Time, trip_updated_time) as mid_point_lat,
lag(end_lon, 1, 0) over (partition by start_lat, date_trunc(hour, Trip_Created_Time) order by Trip_Created_Time, trip_updated_time) as mid_point_lon,
end_lat,
end_lon,
Is_Group,
case when is_group = 'NO' or (Is_Group = 'YES' and mid_point_lat = 0)
then haversine(start_lat, start_lon, end_lat, end_lon)
when is_group = 'YES' and mid_point_lat != 0
then haversine(mid_point_lat, mid_point_lon, end_lat, end_lon)
end as trip_distance
from my_table
order by Trip_id ;
The query above gives you the actual distance of each trip leg. The last step is to group these and sum the distances.
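To make the chaining idea concrete outside SQL, here is a minimal Python sketch of what the `lag` logic is aiming for: the first leg runs from the shared start to the first drop-off, and each later leg runs from the previous drop-off to the next one. The helper names `haversine_km` and `chained_legs` are hypothetical, and the 6371 km Earth radius is an assumption about what the SQL `haversine` function uses.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def chained_legs(start, ends):
    """Leg distances for the route start -> ends[0] -> ends[1] -> ..."""
    legs = []
    previous = start
    for end in ends:
        legs.append(haversine_km(*previous, *end))
        previous = end  # the next leg starts where this one ended
    return legs


# Trips 1 and 2 from the question, ordered by Trip_Created_Time.
# The start point of trip 1 is used as the start of the whole route.
start = (81.91892, -42.19823)
ends = [(81.90281, -42.38918), (81.97392, -42.37819)]

legs = chained_legs(start, ends)
total = sum(legs)
print(legs, total)
```

Summing `legs` per group is the "last step" mentioned above. Whether a group should be chained like this or treated as independent start-to-end trips depends on how the taxi service actually routes group trips.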
So your example data/SQL does not produce your example output:
WITH fake_data AS (
SELECT * FROM VALUES
( 1, '2021-07-01 17:29:51', 81.91892, -42.19823, 81.90281, -42.38918, '2021-07-01 17:35:21', 'YES', 1),
( 2, '2021-07-01 17:31:52', 81.91892, -42.46920, 81.97392, -42.37819, '2021-07-01 17:52:51', 'YES', 1),
( 3, '2021-07-02 21:50:51', 81.91892, -42.01936, 81.18937, -42.01967, '2021-07-02 22:09:09', 'NO', null),
( 4, '2021-07-02 23:31:41', 81.91892, -42.47821, 81.01792, -42.17839, '2021-07-02 23:41:51', 'NO', null),
( 5, '2021-09-21 20:12:54', 81.91892, -42.47821, 81.63829, -42.67292, '2021-09-21 20:42:54', 'YES', 2),
( 6, '2021-09-21 20:15:21', 81.91892, -42.47821, 81.62819, -42.01927, '2021-09-21 20:59:21', 'YES', 2),
( 7, '2021-09-21 20:17:23', 81.91892, -42.47821, 81.03926, -42.36284, '2021-09-21 21:02:21', 'YES', 2),
( 8, '2021-11-01 02:41:41', 81.91892, -42.47821, 81.36292, -42.47682, '2021-07-02 23:41:51', 'NO', null),
( 9, '2021-12-21 19:19:41', 81.91892, -42.47821, 81.23671, -42.93628, '2021-07-02 23:41:51', 'NO', null)
t(trip_id, trip_created_time, start_lat, start_lon, end_lat, end_lon, trip_updated_time, is_group, group_id)
)
select *,
case when is_group = 'NO'
then haversine(start_lat, start_lon, end_lat, end_lon)
when is_group = 'YES'
then NULL
end as trip_distance
from fake_data;
which produces:
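|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| TRIP_ID | TRIP_CREATED_TIME | START_LAT | START_LON | END_LAT | END_LON | TRIP_UPDATED_TIME | IS_GROUP | GROUP_ID | TRIP_DISTANCE |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | 2021-07-01 17:29:51 | 81.91892 | -42.19823 | 81.90281 | -42.38918 | 2021-07-01 17:35:21 | YES | 1 | NULL |
| 2 | 2021-07-01 17:31:52 | 81.91892 | -42.4692 | 81.97392 | -42.37819 | 2021-07-01 17:52:51 | YES | 1 | NULL |
| 3 | 2021-07-02 21:50:51 | 81.91892 | -42.01936 | 81.18937 | -42.01967 | 2021-07-02 22:09:09 | NO | NULL | 81.122258891 |
| 4 | 2021-07-02 23:31:41 | 81.91892 | -42.47821 | 81.01792 | -42.17839 | 2021-07-02 23:41:51 | NO | NULL | 100.308299209 |
| 5 | 2021-09-21 20:12:54 | 81.91892 | -42.47821 | 81.63829 | -42.67292 | 2021-09-21 20:42:54 | YES | 2 | NULL |
| 6 | 2021-09-21 20:15:21 | 81.91892 | -42.47821 | 81.62819 | -42.01927 | 2021-09-21 20:59:21 | YES | 2 | NULL |
| 7 | 2021-09-21 20:17:23 | 81.91892 | -42.47821 | 81.03926 | -42.36284 | 2021-09-21 21:02:21 | YES | 2 | NULL |
| 8 | 2021-11-01 02:41:41 | 81.91892 | -42.47821 | 81.36292 | -42.47682 | 2021-07-02 23:41:51 | NO | NULL | 61.824383293 |
| 9 | 2021-12-21 19:19:41 | 81.91892 | -42.47821 | 81.23671 | -42.93628 | 2021-07-02 23:41:51 | NO | NULL | 76.223649989 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|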
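As a cross-check on those numbers, the standard haversine formula can be evaluated by hand. The sketch below is a plain Python version, assuming the SQL `haversine` function returns kilometres on a sphere of radius 6371 km (an assumption, although it is consistent with the output above); `haversine_km` is a hypothetical helper name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Trips 3 and 8 from the sample data (both non-grouped).
trip3_km = haversine_km(81.91892, -42.01936, 81.18937, -42.01967)
trip8_km = haversine_km(81.91892, -42.47821, 81.36292, -42.47682)
print(round(trip3_km, 3), round(trip8_km, 3))
```

Trip 3 comes out at roughly 81.12 km and trip 8 at roughly 61.82 km, in line with the query output and not with the 1.289247 and 3.29181 shown in the question.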
But if we assume those values are valid:
What you describe wanting looks like this in deconstructed form (if a group ID is provided):
SELECT
trip_id
,trip_created_time
,start_lat
,start_lon
,end_lat
,end_lon
,trip_updated_time
,is_group
,round(haversine(start_lat, start_lon, end_lat, end_lon),3) as dist_km
,sum(dist_km) over (partition by group_id) as group_sum_km
,iff(is_group='YES', group_sum_km, dist_km) as result
FROM fake_data
ORDER BY 1
;
gives:
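|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| TRIP_ID | TRIP_CREATED_TIME | START_LAT | START_LON | END_LAT | END_LON | TRIP_UPDATED_TIME | IS_GROUP | DIST_KM | GROUP_SUM_KM | RESULT |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1 | 2021-07-01 17:29:51 | 81.91892 | -42.19823 | 81.90281 | -42.38918 | 2021-07-01 17:35:21 | YES | 3.484 | 9.762 | 9.762 |
| 2 | 2021-07-01 17:31:52 | 81.91892 | -42.4692 | 81.97392 | -42.37819 | 2021-07-01 17:52:51 | YES | 6.278 | 9.762 | 9.762 |
| 3 | 2021-07-02 21:50:51 | 81.91892 | -42.01936 | 81.18937 | -42.01967 | 2021-07-02 22:09:09 | NO | 81.122 | 319.478 | 81.122 |
| 4 | 2021-07-02 23:31:41 | 81.91892 | -42.47821 | 81.01792 | -42.17839 | 2021-07-02 23:41:51 | NO | 100.308 | 319.478 | 100.308 |
| 5 | 2021-09-21 20:12:54 | 81.91892 | -42.47821 | 81.63829 | -42.67292 | 2021-09-21 20:42:54 | YES | 31.358 | 162.332 | 162.332 |
| 6 | 2021-09-21 20:15:21 | 81.91892 | -42.47821 | 81.62819 | -42.01927 | 2021-09-21 20:59:21 | YES | 33.142 | 162.332 | 162.332 |
| 7 | 2021-09-21 20:17:23 | 81.91892 | -42.47821 | 81.03926 | -42.36284 | 2021-09-21 21:02:21 | YES | 97.832 | 162.332 | 162.332 |
| 8 | 2021-11-01 02:41:41 | 81.91892 | -42.47821 | 81.36292 | -42.47682 | 2021-07-02 23:41:51 | NO | 61.824 | 319.478 | 61.824 |
| 9 | 2021-12-21 19:19:41 | 81.91892 | -42.47821 | 81.23671 | -42.93628 | 2021-07-02 23:41:51 | NO | 76.224 | 319.478 | 76.224 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|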
which can be condensed to:
SELECT
trip_id
,trip_created_time
,start_lat
,start_lon
,end_lat
,end_lon
,trip_updated_time
,is_group
, iff(is_group='YES'
,sum(haversine(start_lat, start_lon, end_lat, end_lon)) over (partition by group_id)
,haversine(start_lat, start_lon, end_lat, end_lon)
) as result
FROM fake_data
ORDER BY 1
;
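For readers who want to sanity-check the grouped result outside the database, the same "sum per group, individual distance otherwise" rule can be sketched in Python. This is only an illustration under assumptions: `haversine_km` and `trip_results` are hypothetical names, the 6371 km radius is assumed to match the SQL `haversine` function, and the `group_id` values are the ones added by hand in `fake_data`.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# (trip_id, start_lat, start_lon, end_lat, end_lon, is_group, group_id)
rows = [
    (1, 81.91892, -42.19823, 81.90281, -42.38918, "YES", 1),
    (2, 81.91892, -42.46920, 81.97392, -42.37819, "YES", 1),
    (3, 81.91892, -42.01936, 81.18937, -42.01967, "NO", None),
    (4, 81.91892, -42.47821, 81.01792, -42.17839, "NO", None),
    (5, 81.91892, -42.47821, 81.63829, -42.67292, "YES", 2),
    (6, 81.91892, -42.47821, 81.62819, -42.01927, "YES", 2),
    (7, 81.91892, -42.47821, 81.03926, -42.36284, "YES", 2),
    (8, 81.91892, -42.47821, 81.36292, -42.47682, "NO", None),
    (9, 81.91892, -42.47821, 81.23671, -42.93628, "NO", None),
]


def trip_results(rows):
    """Map trip_id to the group total (grouped trips) or its own distance."""
    dist = {r[0]: haversine_km(r[1], r[2], r[3], r[4]) for r in rows}
    group_total = defaultdict(float)
    for trip_id, *_, is_group, group_id in rows:
        if group_id is not None:
            group_total[group_id] += dist[trip_id]
    return {
        trip_id: group_total[group_id] if is_group == "YES" else dist[trip_id]
        for trip_id, *_, is_group, group_id in rows
    }


result = trip_results(rows)
for trip_id in sorted(result):
    print(trip_id, round(result[trip_id], 3))
```

Unlike the window function, this sketch skips rows with no `group_id` when accumulating totals; the final result is the same because those rows always take their own distance.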