Issue identifying pattern with Vertica MATCH clause
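I'm having some difficulty understanding how to use Vertica's MATCH clause to identify sessions in which a user searched for something on our site (event_category = 'Search') and then saw a product carousel item (product_list = 'banner' and event_action = 'impression').
Different events are captured before, after, and during the pattern I want to identify, because the number of products shown on the page and the user's engagement with our site can vary from session to session and from user to user.
Sample raw data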
| hit_number | product_list | Event_Category | Event_Action | Event_Label |
|------------|----------------------|----------------|--------------|---------------|
| 105 | (null) | Search | Submit | chocolate |
| 106 | (null) | eec | impression | search-result |
| 107 | search-result | eec | impression | sendData |
| 107 | search-result | eec | impression | sendData |
| 107 | search-result | eec | impression | sendData |
| 107 | search-result | eec | impression | sendData |
| 108 | (null) | (null) | (null) | (null) |
| 109 | (null) | eec | impression | banner |
| 110 | banner-105-chocolate | eec | impression | sendData |
| 110 | banner-105-chocolate | eec | impression | sendData |
| 110 | banner-105-chocolate | eec | impression | sendData |
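For the pattern to be valid there must be at least 1 search event and 1 banner impression. I have set the pattern to (Search+ Banner+) to reflect this, but I get no results back when I run the SQL query shown below.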
SELECT
    page_title
  , event_label
  , event_name()
  , match_id()
  , pattern_id()
FROM
    (SELECT
         unique_visit_id
       , hit_number
       , page_title
       , event_category
       , event_label
       , event_action
       , product_list
     FROM
         atomic.ga_sessions_hits_product_expanded
     WHERE
         1=1
         AND ga_sessions_date >= CURRENT_DATE - 3
         AND unique_visit_id = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
     ORDER BY
         hit_number ASC) base
MATCH
    (PARTITION BY unique_visit_id ORDER BY hit_number
     DEFINE
         Search AS event_category = 'Search' AND event_action = 'Submit',
         Banner AS product_list ILIKE 'banner-%' AND event_action = 'impression'
     PATTERN
         P AS (Search+ Banner+)
     ROWS MATCH FIRST EVENT)
Let me know if anything needs clarifying. Any insight or help would be much appreciated!
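First, the column you partition by is not in your sample input. I added it and gave it the value 42 for every row of the input data.
Your problem is that nowhere in that data snippet does the event you named banner immediately follow the event you named search.
I added another event at the end of the DEFINE clause. If the other two do not evaluate to true, the last one, defined simply as other AS true, is picked (that is the behaviour of ROWS MATCH FIRST EVENT).
The pattern then becomes (search+ other* banner+), and that one is found.
See here: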
WITH
ga_sessions_hits_product_expanded(
  unique_visit_id,hit_number,product_list,Event_Category,Event_Action,Event_Label
) AS (
            SELECT 42,105,NULL,'Search','Submit','chocolate'
  UNION ALL SELECT 42,106,NULL,'eec','impression','search-result'
  UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
  UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
  UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
  UNION ALL SELECT 42,107,'search-result','eec','impression','sendData'
  UNION ALL SELECT 42,108,NULL,NULL,NULL,NULL
  UNION ALL SELECT 42,109,NULL,'eec','impression','banner'
  UNION ALL SELECT 42,110,'banner-105-chocolate','eec','impression','sendData'
  UNION ALL SELECT 42,110,'banner-105-chocolate','eec','impression','sendData'
  UNION ALL SELECT 42,110,'banner-105-chocolate','eec','impression','sendData'
)
SELECT
  *
, event_name()
, pattern_id()
, match_id()
FROM ga_sessions_hits_product_expanded
MATCH(
  PARTITION BY unique_visit_id ORDER BY hit_number
  DEFINE
    search AS event_category='Search' AND event_action='Submit'
  , banner AS product_list ILIKE 'banner-%' AND event_action='impression'
  , other  AS true
  PATTERN p AS (search+ other* banner+)
  ROWS MATCH FIRST EVENT
);
-- out Null display is "NULL".
-- out  unique_visit_id | hit_number | product_list         | Event_Category | Event_Action | Event_Label   | event_name | pattern_id | match_id
-- out -----------------+------------+----------------------+----------------+--------------+---------------+------------+------------+----------
-- out               42 |        105 | NULL                 | Search         | Submit       | chocolate     | search     |          1 |        1
-- out               42 |        106 | NULL                 | eec            | impression   | search-result | other      |          1 |        2
-- out               42 |        107 | search-result        | eec            | impression   | sendData      | other      |          1 |        3
-- out               42 |        107 | search-result        | eec            | impression   | sendData      | other      |          1 |        4
-- out               42 |        107 | search-result        | eec            | impression   | sendData      | other      |          1 |        5
-- out               42 |        107 | search-result        | eec            | impression   | sendData      | other      |          1 |        6
-- out               42 |        108 | NULL                 | NULL           | NULL         | NULL          | other      |          1 |        7
-- out               42 |        109 | NULL                 | eec            | impression   | banner        | other      |          1 |        8
-- out               42 |        110 | banner-105-chocolate | eec            | impression   | sendData      | banner     |          1 |        9
-- out               42 |        110 | banner-105-chocolate | eec            | impression   | sendData      | banner     |          1 |       10
-- out               42 |        110 | banner-105-chocolate | eec            | impression   | sendData      | banner     |          1 |       11
-- out (11 rows)
-- out
-- out Time: First fetch (11 rows): 50.632 ms. All rows formatted: 50.721 ms
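If the goal is a list of sessions rather than the matched rows themselves, one possible follow-up is to wrap the MATCH query in a derived table and aggregate outside it, since as far as I know GROUP BY and DISTINCT cannot sit in the same query block as MATCH. The following is only a sketch that I have not run against a live cluster. It assumes the table name and date filter from the question, and the names matched, pid and search_to_banner_matches are made up for illustration.

```sql
-- Sketch only: count search -> banner pattern instances per session.
-- Aliases "matched", "pid" and "search_to_banner_matches" are hypothetical.
SELECT
    unique_visit_id
  , COUNT(DISTINCT pid) AS search_to_banner_matches
FROM (
    SELECT
        unique_visit_id
      , pattern_id() AS pid   -- identifies one pattern instance within a partition
    FROM (
        -- Filter first, as in the question; the three-day window is the asker's.
        SELECT
            unique_visit_id
          , hit_number
          , event_category
          , event_action
          , product_list
        FROM atomic.ga_sessions_hits_product_expanded
        WHERE ga_sessions_date >= CURRENT_DATE - 3
    ) base
    MATCH (
        PARTITION BY unique_visit_id ORDER BY hit_number
        DEFINE
            search AS event_category = 'Search' AND event_action = 'Submit'
          , banner AS product_list ILIKE 'banner-%' AND event_action = 'impression'
          , other  AS true   -- catch-all; only chosen when neither event above is true
        PATTERN p AS (search+ other* banner+)
        ROWS MATCH FIRST EVENT
    )
) matched
GROUP BY unique_visit_id;
```

Because MATCH only returns rows that belong to a matched pattern, sessions without a search followed by a banner impression should simply be absent from the result.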