Filtering on columns partitioning on unique id without duplicates
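I'm trying to write a rule that filters events based on a partition, and I'm looking for a better way to do it than my current approach.
I have the following events table: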
EVENT_ID,USER_PROPERTIES_KEY,USER_PROPERTIES_VALUE
1,country,us
1,country_id,fr
2,country,uk
3,country_id,it
4,platform,Android
4,country,cn
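The objective is to add a country_code column according to the following rule:
If the same EVENT_ID has both a row with USER_PROPERTIES_KEY = country and a row with USER_PROPERTIES_KEY = country_id, country_code takes the USER_PROPERTIES_VALUE of the country_id row ('fr' for EVENT_ID = 1), and only that row is kept. Otherwise, it takes the USER_PROPERTIES_VALUE of the event's country or country_id row. If a row is neither country nor country_id, country_code is null.
The expected final result is: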
EVENT_ID,USER_PROPERTIES_KEY,USER_PROPERTIES_VALUE,COUNTRY_CODE
1,country_id,fr,fr
2,country,uk,uk
3,country_id,it,it
4,platform,Android,null
4,country,cn,cn
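To make the rule concrete, here is a minimal sketch of it in plain Python rather than SQL. It assumes exact matches on the keys `country_id` and `country` (instead of the `LIKE 'country%'` pattern used in the queries), and the names `EVENTS`, `PRIORITY` and `add_country_code` are hypothetical. On the sample rows it reproduces the expected result above.

```python
# Sample rows from the question: (EVENT_ID, USER_PROPERTIES_KEY, USER_PROPERTIES_VALUE)
EVENTS = [
    (1, "country", "us"),
    (1, "country_id", "fr"),
    (2, "country", "uk"),
    (3, "country_id", "it"),
    (4, "platform", "Android"),
    (4, "country", "cn"),
]

# Assumed priority: lower number wins; keys not listed here are "non-country" rows.
PRIORITY = {"country_id": 1, "country": 2}


def add_country_code(rows):
    """Keep the best country row per event plus every non-country row."""
    # First pass: find the highest-priority country key present in each event.
    best_key = {}
    for event_id, key, _ in rows:
        if key in PRIORITY:
            current = best_key.get(event_id)
            if current is None or PRIORITY[key] < PRIORITY[current]:
                best_key[event_id] = key

    # Second pass: emit rows with the country_code column appended.
    result = []
    for event_id, key, value in rows:
        if key not in PRIORITY:
            # Non-country rows are kept, with a null (None) country_code.
            result.append((event_id, key, value, None))
        elif key == best_key[event_id]:
            # Only the winning country row survives; its value is the country_code.
            result.append((event_id, key, value, value))
    return result


for row in add_country_code(EVENTS):
    print(row)
```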
My current query is below. I would then have to partition its result by EVENT_ID and filter out the 'NA' values, but I can't get that to work:
select
    EVENT_ID,
    USER_PROPERTIES_KEY,
    USER_PROPERTIES_VALUE,
    CASE
        WHEN USER_PROPERTIES_KEY like 'country%' THEN
            -- COUNT(<predicate>) counts every non-NULL result, FALSE included,
            -- so this is the number of rows with a non-NULL key in the event
            CASE WHEN count(user_properties_key like 'country%') over(partition by EVENT_ID) > 1
                THEN
                    CASE
                        WHEN user_properties_key = 'country_id'
                        THEN UPPER(user_properties_value) ELSE 'NA'
                    END
                WHEN count(user_properties_key like 'country%') over(partition by EVENT_ID) = 1
                THEN UPPER(user_properties_value) ELSE 'NA'
            END
    END
    AS country_code
from events;
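One likely reason for the unexpected 'NA' on the `4,country,cn` row: `COUNT(expr)` counts every non-NULL value of `expr`, so `count(user_properties_key like 'country%')` also counts rows where the predicate is false, and event 4 is counted as having two country rows. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for the original (unnamed) database; window functions need SQLite 3.25 or later. It first compares that count with a `SUM(CASE ...)` that counts only matching rows, then applies the wrap-and-filter idea from the question with the corrected count. Table and column names follow the question; the variable names `counts` and `filtered` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (EVENT_ID INTEGER, USER_PROPERTIES_KEY TEXT, USER_PROPERTIES_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "country", "us"),
        (1, "country_id", "fr"),
        (2, "country", "uk"),
        (3, "country_id", "it"),
        (4, "platform", "Android"),
        (4, "country", "cn"),
    ],
)

# COUNT(predicate) counts every non-NULL result (0 and 1 in SQLite);
# SUM(CASE ...) counts only the rows whose key really matches 'country%'.
counts = conn.execute(
    """
    SELECT EVENT_ID,
           COUNT(USER_PROPERTIES_KEY LIKE 'country%') AS count_of_predicate,
           SUM(CASE WHEN USER_PROPERTIES_KEY LIKE 'country%' THEN 1 ELSE 0 END) AS country_rows
    FROM events
    GROUP BY EVENT_ID
    ORDER BY EVENT_ID
    """
).fetchall()
print(counts)

# The question's approach with the corrected count, wrapped in a subquery so the
# 'NA' rows can be filtered out. NULL <> 'NA' is NULL, hence the IS NULL branch.
filtered = conn.execute(
    """
    SELECT EVENT_ID, USER_PROPERTIES_KEY, USER_PROPERTIES_VALUE, country_code
    FROM (
        SELECT EVENT_ID, USER_PROPERTIES_KEY, USER_PROPERTIES_VALUE,
               CASE
                 WHEN USER_PROPERTIES_KEY LIKE 'country%' THEN
                   CASE
                     WHEN SUM(CASE WHEN USER_PROPERTIES_KEY LIKE 'country%' THEN 1 ELSE 0 END)
                          OVER (PARTITION BY EVENT_ID) > 1
                       THEN CASE WHEN USER_PROPERTIES_KEY = 'country_id'
                                 THEN UPPER(USER_PROPERTIES_VALUE) ELSE 'NA' END
                     ELSE UPPER(USER_PROPERTIES_VALUE)
                   END
               END AS country_code
        FROM events
    ) t
    WHERE country_code IS NULL OR country_code <> 'NA'
    ORDER BY EVENT_ID, USER_PROPERTIES_KEY
    """
).fetchall()
for row in filtered:
    print(row)
```

For event 4, `count_of_predicate` is 2 while `country_rows` is 1, which is what sends its country row into the 'NA' branch of the original query.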
You can try first_value, ordered by the requested priority. You can also use case and row_number to flag the rows you want to keep:
select
    EVENT_ID,
    USER_PROPERTIES_KEY,
    USER_PROPERTIES_VALUE,
    country_code
from (
    select
        EVENT_ID,
        USER_PROPERTIES_KEY,
        USER_PROPERTIES_VALUE,
        -- rank country rows by priority (country_id first); other rows always keep flag = 1
        case when USER_PROPERTIES_KEY like 'country%' then
            row_number() over (partition by EVENT_ID
                               order by case USER_PROPERTIES_KEY when 'country_id' then 1 when 'country' then 2 else 3 end)
        else 1 end as flag,
        -- highest-priority country value of the event; NULL on non-country rows, as in the expected output
        case when USER_PROPERTIES_KEY like 'country%' then
            first_value(case when USER_PROPERTIES_KEY like 'country%' then UPPER(USER_PROPERTIES_VALUE) end)
                over (partition by EVENT_ID
                      order by case USER_PROPERTIES_KEY when 'country_id' then 1 when 'country' then 2 else 3 end)
        end as country_code
    from events
) t
where flag = 1;
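To check the approach end to end, here is a sketch that runs a variant of the query above against the sample rows, again assuming SQLite (3.25 or later) through Python's `sqlite3` module since the original database is not named. In this variant the `first_value` is wrapped in a `CASE` so that non-country rows such as `4,platform,Android` get NULL, as in the expected output; without that wrapper they would inherit their event's country code. Values come out upper-cased because the query applies `UPPER`, and the name `result` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (EVENT_ID INTEGER, USER_PROPERTIES_KEY TEXT, USER_PROPERTIES_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "country", "us"),
        (1, "country_id", "fr"),
        (2, "country", "uk"),
        (3, "country_id", "it"),
        (4, "platform", "Android"),
        (4, "country", "cn"),
    ],
)

QUERY = """
SELECT EVENT_ID, USER_PROPERTIES_KEY, USER_PROPERTIES_VALUE, country_code
FROM (
    SELECT
        EVENT_ID,
        USER_PROPERTIES_KEY,
        USER_PROPERTIES_VALUE,
        -- Rank country rows by priority; non-country rows always get flag 1.
        CASE WHEN USER_PROPERTIES_KEY LIKE 'country%' THEN
            ROW_NUMBER() OVER (
                PARTITION BY EVENT_ID
                ORDER BY CASE USER_PROPERTIES_KEY
                    WHEN 'country_id' THEN 1 WHEN 'country' THEN 2 ELSE 3 END)
        ELSE 1 END AS flag,
        -- Best country value of the event, exposed only on country rows.
        CASE WHEN USER_PROPERTIES_KEY LIKE 'country%' THEN
            FIRST_VALUE(CASE WHEN USER_PROPERTIES_KEY LIKE 'country%'
                             THEN UPPER(USER_PROPERTIES_VALUE) END)
            OVER (
                PARTITION BY EVENT_ID
                ORDER BY CASE USER_PROPERTIES_KEY
                    WHEN 'country_id' THEN 1 WHEN 'country' THEN 2 ELSE 3 END)
        END AS country_code
    FROM events
) t
WHERE flag = 1
ORDER BY EVENT_ID, USER_PROPERTIES_KEY
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Both window functions share the same priority `ORDER BY`, which is what makes country_id win over country; supporting another country key would only mean extending that `CASE`.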