SQL: finding IDs that match criteria across multiple rows and creating a percentage formula

I have the following set of problems:

The table 'visits' is a list of all visits with four columns: Visit_ID, Visitor_ID, Timestamp, and Page_Name.

The table 'first_visitors' is a list of visitors and their first visit with three columns: Visitor_ID, First_Visit_Date, and Channel.

A1. Average number of visits and visitors per page over the last seven days

A2. Visitor_id and Channel for visitors that have visited all three of ‘home page’, ‘product page’, and ‘confirmation page’ at least once in the last seven days

A3. Percent of SEM visitors that visit the ‘confirmation page’ within thirty days of their first visit.

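I have some concerns about converting the timestamp to a date and using date and date-7 to get one week. Is this the correct approach? (A1)

Getting the visitor_ids that visited all 3 pages was also difficult. I tried using a HAVING clause, but I'm not sure it is correct. (A2)

Finally, I'm having trouble dividing one aggregated column by another to get a percentage, and I'm not sure this is the right approach. (A3)

My code is below. Any advice is much appreciated.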

--A1.   Average number of visits, and visitors per page over the last seven days
select
page_name,
count(visit_ID) as average_visits, 
count(DISTINCT visitor_ID) as average_visitors
from visits 
where cast(timestamp as date) between date and date-7 
group by page_name;

--A2.   Visitor_id and Channel for visitors that have visited all three of ‘home page’, ‘product page’, and ‘confirmation page’ at least once in the last seven days
select
a.visitor_id,
b.channel
from visits a
join first_visitors b on a.visitor_id = b.visitor_id
where cast(a.timestamp as date) between date and date-7 
and a.page_name in ('home page','product page','confirmation page') 
group by a.visitor_id
having count(distinct a.page_name) >= 3;

--A3.   Percent of SEM visitors that visit the ‘confirmation page’ within thirty days of their first visit.
select 
count(*) as visited_confirmation_page from
(select distinct a.visitor_id
    from first_visitors a
    join visits b on b.visitor_id = a.visitor_id
    where channel = 'SEM'
    and b.page_name in 'confirmation_page'
    and cast(b.timestamp as date) between cast(a.first_visit_timestamp as date) and cast(a.first_visit_timestamp as date)+30) 
count(*) as all_SEM_visits
(select distinct a.visitor_id
    from first_visitors a
    where channel = 'SEM')
((visited_confirmation_page / all_SEM_visits) * 100.00) as %_of_SEM_confirmations;

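A1 is fine apart from two small changes: instead of DATE, it is better to use Standard SQL's CURRENT_DATE, and BETWEEN expects the lower bound first, so the earlier date has to come before the later one:

where cast(timestamp as date) between CURRENT_DATE-7 and CURRENT_DATE 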
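
One caveat on A1: the query counts totals per page over the whole window rather than an average. If "average" is meant as a per-day average (an assumption; the task does not say), a rough sketch is to count per page and day first and then average those daily counts. The alias `d` and the column names `visit_date`, `daily_visits`, and `daily_visitors` are made up for illustration, and the date arithmetic assumes a dialect where `CURRENT_DATE - 7` is valid (for example Teradata or PostgreSQL).

```sql
-- Sketch only: assumes "average" means the average of the daily counts per page.
select
  page_name,
  avg(daily_visits)   as average_visits,
  avg(daily_visitors) as average_visitors
from
 (
   -- one row per page and calendar day within the window
   select
     page_name,
     cast(timestamp as date)    as visit_date,
     count(visit_ID)            as daily_visits,
     count(distinct visitor_ID) as daily_visitors
   from visits
   where cast(timestamp as date) between current_date - 7 and current_date -- lower bound first
   group by page_name, cast(timestamp as date)
 ) as d
group by page_name;
```

Days on which a page had no visits at all produce no row in the derived table, so they are left out of the average rather than counted as zero. Also note that `current_date - 7` through `current_date` covers eight calendar dates; adjust the lower bound if exactly seven are wanted.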

The logic for A2 is correct, but you need to add b.channel to the GROUP BY to avoid a syntax error. You might also try joining after the aggregation:

select
a.visitor_id,
b.channel
from first_visitors b
join
 (
   select visitor_id
   from visits 
   where cast(timestamp as date) between current_date-7 and current_date -- lower bound first
   and page_name in ('home page','product page','confirmation page') 
   group by visitor_id
   having count(distinct page_name) >= 3 -- no alias: "a" only exists outside this derived table
 ) as a
on a.visitor_id = b.visitor_id
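
For comparison, the first option mentioned for A2 (keeping the original join and simply adding b.channel to the GROUP BY) might look like the sketch below. It assumes first_visitors holds exactly one row per visitor_id, so grouping by channel as well does not split a visitor into several groups; the date arithmetic again assumes a dialect that accepts `current_date - 7`.

```sql
-- Sketch only: same logic as the original A2 query, with channel added to GROUP BY.
select
  a.visitor_id,
  b.channel
from visits a
join first_visitors b on a.visitor_id = b.visitor_id
where cast(a.timestamp as date) between current_date - 7 and current_date -- lower bound first
and a.page_name in ('home page','product page','confirmation page')
group by a.visitor_id, b.channel
-- the IN list has three names, so three distinct pages means all of them were visited
having count(distinct a.page_name) = 3;
```

Both variants should return the same rows under that assumption; joining after the aggregation just keeps the join input smaller.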

Your A3 syntax will fail, but you're close:

select
 100.00 * -- multiply first, then divide
 (select count(distinct a.visitor_id) as visited_confirmation_page 
    from first_visitors a
    join visits b on b.visitor_id = a.visitor_id
    where channel = 'SEM'
    and b.page_name = 'confirmation page' -- name as written in the question; adjust if the data uses an underscore
    and cast(b.timestamp as date) between cast(a.first_visit_timestamp as date) 
    and cast(a.first_visit_timestamp as date)+30
 ) / 
 (select count(distinct a.visitor_id) as all_SEM_visits -- DISTINCT probably not needed
    from first_visitors a
    where channel = 'SEM'
 ) as "%_of_SEM_confirmations"
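
As an alternative to the two scalar subqueries, the same percentage can probably be computed in a single pass with an outer join. This is only a sketch: it takes the column name first_visit_timestamp from the question's query (the table description calls it First_Visit_Date, so use whichever actually exists), assumes the page is stored as 'confirmation page', and adds a NULLIF guard that is not part of the original.

```sql
-- Sketch only: one pass over first_visitors with an outer join to visits.
select
  100.00 * count(distinct b.visitor_id)           -- SEM visitors with a qualifying visit
         / nullif(count(distinct a.visitor_id), 0) -- all SEM visitors; NULL instead of divide-by-zero
    as "%_of_SEM_confirmations"
from first_visitors a
left join visits b
  on  b.visitor_id = a.visitor_id
  and b.page_name = 'confirmation page'
  and cast(b.timestamp as date)
      between cast(a.first_visit_timestamp as date)
          and cast(a.first_visit_timestamp as date) + 30
where a.channel = 'SEM';
```

The conditions on visits have to stay in the ON clause. Moving them to WHERE would turn the outer join into an inner join and drop the SEM visitors who never reached the confirmation page, which would push the result to 100.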