Create New Columns in a New Table Based on Values in Another Table, with Event Attribution Modeling in Mind
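I'm working on creating a table to help my company with attribution modeling. We have several datasets, including invoice, company, person, and event data.
Our data is complex because we deal with B2B (business-to-business) customers. So it isn't as simple as looking at one person's events and attributing the invoice total to the events (or behaviors) they performed.
Instead, our invoices reference a company ID, our people reference a company ID, and our people in turn reference their events. So I'm currently joining my tables based on this relationship, and I end up with one huge table that contains all the information.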
It looks like this:
INVOICE_ID | INVOICE_DATE | INVOICE_TOTAL | PERSON_COMPANY_ID | PERSON_EMAIL | EVENT_NAME | EVENT_DATE | DAYS_BETWEEN_EVENT_AND_INVOICE |
---|---|---|---|---|---|---|---|
111 | 3/7/2022 | 4.80 | ABC | john@coolcompany.com | Spoke to Sales Rep | 2/10/2022 | 25 |
111 | 3/7/2022 | 4.80 | ABC | jenny@coolcompany.com | Form Submitted | 6/8/2021 | 272 |
111 | 3/7/2022 | 4.80 | ABC | jenny@coolcompany.com | Spoke to Sales Rep | 2/10/2022 | 25 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Clicked Email | 3/21/2022 | -14 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Chat on Website | 3/2/2022 | 5 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Opened Email | 3/7/2022 | 0 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Spoke to Sales Rep | 2/10/2022 | 25 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Google Ad | 2/28/2022 | 7 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Google Ad | 3/1/2022 | 6 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Google Ad | 3/2/2022 | 5 |
111 | 3/7/2022 | 4.80 | ABC | jim@coolcompany.com | Google Ad | 3/14/2022 | -7 |
111 | 3/7/2022 | 4.80 | ABC | mark@coolcompany.com | Spoke to Sales Rep | 2/10/2022 | 25 |
111 | 3/7/2022 | 4.80 | ABC | mark@coolcompany.com | Form Submitted | 12/2/2021 | 95 |
222 | 3/7/2022 | 4.80 | XYZ | tom@coolcompany.com | Spoke to Sales Rep | 2/10/2022 | 25 |
222 | 3/7/2022 | 0.25 | XYZ | andy@testcompany.com | Spoke to Sales Rep | 6/3/2021 | 277 |
222 | 3/7/2022 | 0.25 | XYZ | andy@testcompany.com | Spoke to Sales Rep | 4/8/2021 | 333 |
222 | 3/7/2022 | 0.25 | XYZ | andy@testcompany.com | Spoke to Sales Rep | 6/4/2021 | 276 |
222 | 3/7/2022 | 0.25 | XYZ | andy@testcompany.com | Spoke to Sales Rep | 2/23/2022 | 12 |
222 | 3/7/2022 | 0.25 | XYZ | phil@testcompany.com | Spoke to Sales Rep | 2/23/2022 | 12 |
222 | 3/7/2022 | 0.25 | XYZ | jordan@testcompany.com | Spoke to Sales Rep | 4/8/2021 | 333 |
222 | 3/7/2022 | 0.25 | XYZ | jordan@testcompany.com | Spoke to Sales Rep | 6/4/2021 | 276 |
222 | 3/7/2022 | 0.25 | XYZ | jordan@testcompany.com | Spoke to Sales Rep | 2/23/2022 | 12 |
222 | 3/7/2022 | 0.25 | XYZ | matt@testcompany.com | Spoke to Sales Rep | 2/23/2022 | 12 |
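I want to create a table with positional event columns for the last five events that occurred before each invoice, limited to events within the 90 days before the invoice date. The new table might look something like this: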
INVOICE_ID | INVOICE_DATE | INVOICE_TOTAL | PERSON_COMPANY_ID | EVENT_5 | EVENT_5_EMAIL | EVENT_5_DATE | EVENT_4 | EVENT_4_EMAIL | EVENT_4_DATE | EVENT_3 | EVENT_3_EMAIL | EVENT_3_DATE | EVENT_2 | EVENT_2_EMAIL | EVENT_2_DATE | EVENT_1 | EVENT_1_EMAIL | EVENT_1_DATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
111 | 3/7/2022 | 4.80 | ABC | Google Ad | jim@coolcompany.com | 2/28/2022 | Google Ad | jim@coolcompany.com | 3/1/2022 | Google Ad | jim@coolcompany.com | 3/2/2022 | Chat on Website | jim@coolcompany.com | 3/2/2022 | Opened Email | jim@coolcompany.com | 3/7/2022 |
222 | 3/7/2022 | 0.25 | XYZ | Spoke to Sales Rep | nick@testcompany.com | 2/23/2022 | Spoke to Sales Rep | matt@testcompany.com | 2/23/2022 | Spoke to Sales Rep | jordan@testcompany.com | 2/23/2022 | Spoke to Sales Rep | phil@testcompany.com | 2/23/2022 | Spoke to Sales Rep | andy@testcompany.com | 2/23/2022 |
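To try to build this, I added the DAYS_BETWEEN_EVENT_AND_INVOICE column you can see in the first table. I thought filtering out negative values with it would get me closer, but I'm not sure this is the best way to approach attribution. I'm also not sure how to loop through my table and fill in the second table based on these conditions: the last 5 events before the invoice, within 90 days only.
I'm using SQL, the Snowflake data warehouse, and ultimately Power BI to visualize this data.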
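A minimal Python sketch of the selection rule, assuming "within 90 days" means the event happened on or before the invoice date and fewer than 90 days earlier (the same bounds the Power Query answer uses). The function name `last_events` and the trimmed row layout are illustrative only. Instead of looping row by row, it groups the rows by invoice, filters on the day difference, sorts newest first and keeps five:

```python
from datetime import date

# Rows for invoice 111 from the sample table, trimmed to
# (INVOICE_ID, INVOICE_DATE, PERSON_EMAIL, EVENT_NAME, EVENT_DATE).
INV = date(2022, 3, 7)
ROWS = [
    (111, INV, "john@coolcompany.com", "Spoke to Sales Rep", date(2022, 2, 10)),
    (111, INV, "jenny@coolcompany.com", "Form Submitted", date(2021, 6, 8)),
    (111, INV, "jenny@coolcompany.com", "Spoke to Sales Rep", date(2022, 2, 10)),
    (111, INV, "jim@coolcompany.com", "Clicked Email", date(2022, 3, 21)),
    (111, INV, "jim@coolcompany.com", "Chat on Website", date(2022, 3, 2)),
    (111, INV, "jim@coolcompany.com", "Opened Email", date(2022, 3, 7)),
    (111, INV, "jim@coolcompany.com", "Spoke to Sales Rep", date(2022, 2, 10)),
    (111, INV, "jim@coolcompany.com", "Google Ad", date(2022, 2, 28)),
    (111, INV, "jim@coolcompany.com", "Google Ad", date(2022, 3, 1)),
    (111, INV, "jim@coolcompany.com", "Google Ad", date(2022, 3, 2)),
    (111, INV, "jim@coolcompany.com", "Google Ad", date(2022, 3, 14)),
    (111, INV, "mark@coolcompany.com", "Spoke to Sales Rep", date(2022, 2, 10)),
    (111, INV, "mark@coolcompany.com", "Form Submitted", date(2021, 12, 2)),
]


def last_events(rows, window_days=90, n=5):
    """Map each invoice to its n most recent qualifying events, newest first."""
    by_invoice = {}
    for inv_id, inv_date, email, name, ev_date in rows:
        days = (inv_date - ev_date).days  # same as DAYS_BETWEEN_EVENT_AND_INVOICE
        # negative days = event after the invoice; too large = outside the window
        if 0 <= days < window_days:
            by_invoice.setdefault(inv_id, []).append((name, email, ev_date))
    # sorted() is stable, so same-day events keep their original row order
    return {
        inv_id: sorted(events, key=lambda e: e[2], reverse=True)[:n]
        for inv_id, events in by_invoice.items()
    }


for position, (name, email, ev_date) in enumerate(last_events(ROWS)[111], start=1):
    print(f"EVENT_{position}: {name} | {email} | {ev_date}")
```

For invoice 111 this yields Opened Email, Chat on Website and three Google Ad events, which matches the expected row above. The two events dated 3/2/2022 keep their original row order because the sort is stable.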
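You can do this in Power Query (Transform data).
In the output of this query, the invoice total for invoice 222 may look wrong. This is probably caused by a typo in the sample data: the first row for that invoice (tom@coolcompany.com), which the query uses for the invoice-level columns, carries the same invoice total as invoice 111 (4.80).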
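The invoice-level columns come from the first row of each invoice group (`t{0}` in the query), so a single row with a mismatched total changes the result without any warning. The following Python sketch flags such invoices before any grouping. It uses a hypothetical `conflicting_totals` helper and a handful of rows copied from the sample table:

```python
from collections import defaultdict
from decimal import Decimal

# (INVOICE_ID, INVOICE_TOTAL, PERSON_EMAIL) for a few rows of the sample table
ROWS = [
    (111, Decimal("4.80"), "john@coolcompany.com"),
    (111, Decimal("4.80"), "jim@coolcompany.com"),
    (222, Decimal("4.80"), "tom@coolcompany.com"),
    (222, Decimal("0.25"), "andy@testcompany.com"),
    (222, Decimal("0.25"), "phil@testcompany.com"),
]


def conflicting_totals(rows):
    """Return {invoice_id: sorted distinct totals} for invoices whose rows disagree."""
    totals = defaultdict(set)
    for invoice_id, total, _email in rows:
        totals[invoice_id].add(total)
    return {inv: sorted(vals) for inv, vals in totals.items() if len(vals) > 1}


print(conflicting_totals(ROWS))
```

`Decimal` is used so that "4.8" and "4.80" compare as the same amount.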
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("tdRPb4IwFADwr9KQHR1tXxHYTWe2HZfMw/4YD1WaWaEtge7gt1/VgNnEBAUupXmkv/foK10sPEqpN/IYjjAQADe9G5PAj4mbTR9nbtyajZ6sjcnWRuVc73z3dOF5blKBrEFznokSvYncBQFTUjkw9pajNr7QeteQ4NkUCs1/VkpaKxIXCHG8N/YcRNAN7696qRr4WSbXqUjQk+IyOwBAK+GeBp3oDbfIaPQuVqW04ohXRKeiX3Oh/9Rcrydd2IG3+sWY70ygaXJQIa5WR32hDNetC/sz+2nZvzqD+oy1/HrFi3TIll3wz35tCscN2XMPRxsOYKP98fnlRmu6nLZzngLx3cuK5zrZTawobQs/xOx0M0X9+8Hp5mOMDVF/cKo/7N93vWWVQ6Gdn29kNqS/NUXC9ZAduC7DLT24LsMtu6S4tbf6y18=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [INVOICE_ID = _t, INVOICE_DATE = _t, INVOICE_TOTAL = _t, PERSON_COMPANY_ID = _t, PERSON_EMAIL = _t, EVENT_NAME = _t, EVENT_DATE = _t, DAYS_BETWEEN_EVENT_AND_INVOICE = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"INVOICE_ID", Int64.Type}, {"INVOICE_DATE", type date}, {"INVOICE_TOTAL", Currency.Type}, {"PERSON_COMPANY_ID", type text}, {"PERSON_EMAIL", type text}, {"EVENT_NAME", type text}, {"EVENT_DATE", type date}, {"DAYS_BETWEEN_EVENT_AND_INVOICE", Int64.Type}}),
//removed this column since we won't need it
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"DAYS_BETWEEN_EVENT_AND_INVOICE"}),
//Group by Invoice
#"Grouped Rows" = Table.Group(#"Removed Columns", {"INVOICE_ID"}, {
{"within90", (t)=> let
//Filter the table by duration between invoice date and event date
//then sort descending by event date and split off the first five rows
// note that split will be populated by fewer rows if there are not five dates in the range
x = Table.Split(
Table.Sort(
Table.SelectRows(t,
each Duration.Days([INVOICE_DATE]-[EVENT_DATE]) < 90 and
Duration.Days([INVOICE_DATE]-[EVENT_DATE]) >= 0),
{"EVENT_DATE", Order.Descending}),
5){0},
//generate a list of records, along with their field names, for those events
events = List.Generate(()=>
[evEM=x{0}[PERSON_EMAIL] , evN=x{0}[EVENT_NAME], evD=x{0}[EVENT_DATE] , idx=0],
each [idx] < Table.RowCount(x),
each [evEM=x{[idx]+1}[PERSON_EMAIL] , evN=x{[idx]+1}[EVENT_NAME], evD=x{[idx]+1}[EVENT_DATE] , idx=[idx]+1],
each Record.FromList(
{[evN],[evEM],[evD]},
{"EVENT_" & Text.From([idx]+1),
"EVENT_" & Text.From([idx]+1) & " EMAIL",
"EVENT_" & Text.From([idx]+1) & " DATE"})),
//combine the generated records with the first row of each subTable to create new table rows
newTable = Record.Combine({t{0}} & List.Reverse(events))
in
newTable}
}),
//expand the records to new columns and set the data types
#"Expanded within90" = Table.ExpandRecordColumn(#"Grouped Rows", "within90", {"INVOICE_DATE", "INVOICE_TOTAL", "PERSON_COMPANY_ID", "PERSON_EMAIL", "EVENT_NAME", "EVENT_DATE", "EVENT_5", "EVENT_5 EMAIL", "EVENT_5 DATE", "EVENT_4", "EVENT_4 EMAIL", "EVENT_4 DATE", "EVENT_3", "EVENT_3 EMAIL", "EVENT_3 DATE", "EVENT_2", "EVENT_2 EMAIL", "EVENT_2 DATE", "EVENT_1", "EVENT_1 EMAIL", "EVENT_1 DATE"}, {"INVOICE_DATE", "INVOICE_TOTAL", "PERSON_COMPANY_ID", "PERSON_EMAIL", "EVENT_NAME", "EVENT_DATE", "EVENT_5", "EVENT_5 EMAIL", "EVENT_5 DATE", "EVENT_4", "EVENT_4 EMAIL", "EVENT_4 DATE", "EVENT_3", "EVENT_3 EMAIL", "EVENT_3 DATE", "EVENT_2", "EVENT_2 EMAIL", "EVENT_2 DATE", "EVENT_1", "EVENT_1 EMAIL", "EVENT_1 DATE"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Expanded within90",{{"INVOICE_DATE", type date}, {"INVOICE_TOTAL", type number}, {"PERSON_COMPANY_ID", type text}, {"PERSON_EMAIL", type text}, {"EVENT_NAME", type text}, {"EVENT_DATE", type date}, {"EVENT_5", type text}, {"EVENT_5 EMAIL", type text}, {"EVENT_5 DATE", type date}, {"EVENT_4", type text}, {"EVENT_4 EMAIL", type text}, {"EVENT_4 DATE", type date}, {"EVENT_3", type text}, {"EVENT_3 EMAIL", type text}, {"EVENT_3 DATE", type date}, {"EVENT_2", type text}, {"EVENT_2 EMAIL", type text}, {"EVENT_2 DATE", type date}, {"EVENT_1", type text}, {"EVENT_1 EMAIL", type text}, {"EVENT_1 DATE", type date}})
in
#"Changed Type1"
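The `List.Generate` / `List.Reverse` / `Record.Combine` steps turn up to five newest-first events into one wide record. Its fields run from `EVENT_5` down to `EVENT_1`, with `EVENT_1` being the most recent event, and positions with no event become null after `Table.ExpandRecordColumn`. The same reshaping can be sketched in Python to check the column layout. The function `flatten` is hypothetical and not part of Power Query. The sketch also leaves out the `PERSON_EMAIL`/`EVENT_NAME`/`EVENT_DATE` fields that the M record inherits from `t{0}`:

```python
from datetime import date


def flatten(invoice, events, n=5):
    """Build one wide row: invoice columns, then EVENT_n ... EVENT_1.

    `events` is newest first, so events[0] becomes EVENT_1; positions with
    no event are filled with None, like the nulls Power Query produces.
    """
    row = dict(invoice)
    for pos in range(n, 0, -1):  # EVENT_5 first, mirroring List.Reverse
        name, email, ev_date = events[pos - 1] if pos <= len(events) else (None, None, None)
        row[f"EVENT_{pos}"] = name
        row[f"EVENT_{pos} EMAIL"] = email
        row[f"EVENT_{pos} DATE"] = ev_date
    return row


invoice = {"INVOICE_ID": 111, "INVOICE_DATE": date(2022, 3, 7),
           "INVOICE_TOTAL": 4.80, "PERSON_COMPANY_ID": "ABC"}
events = [("Opened Email", "jim@coolcompany.com", date(2022, 3, 7)),
          ("Chat on Website", "jim@coolcompany.com", date(2022, 3, 2))]
wide = flatten(invoice, events)
print(list(wide)[4:7], wide["EVENT_1"], wide["EVENT_3"])
```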
Try a solution using CTEs and PIVOT.
with cte1 as (
select * from
(
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,
event_name, 'event_'||rn event1
from (
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,
event_name,dd,rn
from (
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,
event_name,datediff(day,event_date,invoice_date) dd,
-- rn = 1 is the event closest to the invoice; the tie-breakers make the
-- numbering deterministic, so all three CTEs number the rows identically
row_number() over (partition by invoice_id order by event_date desc, person_email, event_name) as rn
-- drop events after the invoice (negative dd) and events older than 90 days
from invoice1 where dd between 0 and 90
)
where rn<=5
) x
)
pivot (max(event_name)
for
event1 in ('event_1','event_2','event_3','event_4','event_5')) as pvt
),
cte2 as (
select * from
(
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,PERSON_EMAIL,
'event_'||rn||'_email' event1
from (
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,PERSON_EMAIL,
dd,rn
from (
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,PERSON_EMAIL,
datediff(day,event_date,invoice_date) dd,
row_number() over (partition by invoice_id order by event_date desc, person_email, event_name) as rn
from invoice1 where dd between 0 and 90
)
where rn<=5
) x
)
pivot (max(PERSON_EMAIL)
for
event1 in ('event_1_email','event_2_email','event_3_email','event_4_email','event_5_email')) as pvt
),
cte3 as (
select * from
(
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,EVENT_DATE,
'event_'||rn||'_date' event1
from (
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,EVENT_DATE,
dd,rn
from (
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,EVENT_DATE,
datediff(day,event_date,invoice_date) dd,
row_number() over (partition by invoice_id order by event_date desc, person_email, event_name) as rn
from invoice1 where dd between 0 and 90
)
where rn<=5
) x
)
pivot (max(EVENT_DATE)
for
event1 in ('event_1_date','event_2_date','event_3_date','event_4_date','event_5_date')) as pvt
)
select
cte1.invoice_id,cte1.invoice_date,cte1.invoice_total,cte1.person_company_id,
cte1."'event_1'",cte2."'event_1_email'",cte3."'event_1_date'",
cte1."'event_2'",cte2."'event_2_email'",cte3."'event_2_date'",
cte1."'event_3'",cte2."'event_3_email'",cte3."'event_3_date'",
cte1."'event_4'",cte2."'event_4_email'",cte3."'event_4_date'",
cte1."'event_5'",cte2."'event_5_email'",cte3."'event_5_date'"
from cte1,cte2,cte3
where cte1.invoice_id=cte2.invoice_id
and cte2.invoice_id=cte3.invoice_id ;
The core query inside each CTE is:
select INVOICE_ID,INVOICE_DATE,INVOICE_TOTAL,PERSON_COMPANY_ID,EVENT_DATE,datediff(day,event_date,invoice_date) dd,
row_number() over (partition by invoice_id order by event_date desc, person_email, event_name) as rn from invoice1 where dd between 0 and 90
It assumes the following table definition:
create table invoice1
(
INVOICE_ID number,
INVOICE_DATE date,
INVOICE_TOTAL number(10,2),
PERSON_COMPANY_ID varchar(100),
PERSON_EMAIL varchar(100),
EVENT_NAME varchar(100),
EVENT_DATE date
)
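Another option avoids three separate pivots. You number the rows once, then pick each position with conditional aggregation (`MAX(CASE WHEN rn = k THEN ... END)`). This way `EVENT_k`, its email and its date always come from the same row. The sketch below tries that idea against SQLite through Python's built-in `sqlite3` module, using the invoice 111 sample rows. It needs SQLite 3.25+ for window functions. `julianday()` stands in for Snowflake's `datediff(day, event_date, invoice_date)`. Treat it as an illustration of the technique, not a drop-in replacement for the Snowflake query:

```python
import sqlite3

# Invoice 111 rows from the sample table; dates stored as ISO-8601 text
ROWS = [(111, "2022-03-07", 4.80, "ABC", email, name, ev) for email, name, ev in [
    ("john@coolcompany.com", "Spoke to Sales Rep", "2022-02-10"),
    ("jenny@coolcompany.com", "Form Submitted", "2021-06-08"),
    ("jenny@coolcompany.com", "Spoke to Sales Rep", "2022-02-10"),
    ("jim@coolcompany.com", "Clicked Email", "2022-03-21"),
    ("jim@coolcompany.com", "Chat on Website", "2022-03-02"),
    ("jim@coolcompany.com", "Opened Email", "2022-03-07"),
    ("jim@coolcompany.com", "Spoke to Sales Rep", "2022-02-10"),
    ("jim@coolcompany.com", "Google Ad", "2022-02-28"),
    ("jim@coolcompany.com", "Google Ad", "2022-03-01"),
    ("jim@coolcompany.com", "Google Ad", "2022-03-02"),
    ("jim@coolcompany.com", "Google Ad", "2022-03-14"),
    ("mark@coolcompany.com", "Spoke to Sales Rep", "2022-02-10"),
    ("mark@coolcompany.com", "Form Submitted", "2021-12-02"),
]]

QUERY = """
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY INVOICE_ID
               ORDER BY EVENT_DATE DESC, PERSON_EMAIL, EVENT_NAME  -- newest first, fixed tie order
           ) AS rn
    FROM invoice1
    -- on or before the invoice date and fewer than 90 days earlier
    WHERE julianday(INVOICE_DATE) - julianday(EVENT_DATE) BETWEEN 0 AND 89
)
SELECT INVOICE_ID, INVOICE_DATE, INVOICE_TOTAL, PERSON_COMPANY_ID,
       MAX(CASE WHEN rn = 5 THEN EVENT_NAME END)   AS EVENT_5,
       MAX(CASE WHEN rn = 5 THEN PERSON_EMAIL END) AS EVENT_5_EMAIL,
       MAX(CASE WHEN rn = 5 THEN EVENT_DATE END)   AS EVENT_5_DATE,
       MAX(CASE WHEN rn = 4 THEN EVENT_NAME END)   AS EVENT_4,
       MAX(CASE WHEN rn = 4 THEN PERSON_EMAIL END) AS EVENT_4_EMAIL,
       MAX(CASE WHEN rn = 4 THEN EVENT_DATE END)   AS EVENT_4_DATE,
       MAX(CASE WHEN rn = 3 THEN EVENT_NAME END)   AS EVENT_3,
       MAX(CASE WHEN rn = 3 THEN PERSON_EMAIL END) AS EVENT_3_EMAIL,
       MAX(CASE WHEN rn = 3 THEN EVENT_DATE END)   AS EVENT_3_DATE,
       MAX(CASE WHEN rn = 2 THEN EVENT_NAME END)   AS EVENT_2,
       MAX(CASE WHEN rn = 2 THEN PERSON_EMAIL END) AS EVENT_2_EMAIL,
       MAX(CASE WHEN rn = 2 THEN EVENT_DATE END)   AS EVENT_2_DATE,
       MAX(CASE WHEN rn = 1 THEN EVENT_NAME END)   AS EVENT_1,
       MAX(CASE WHEN rn = 1 THEN PERSON_EMAIL END) AS EVENT_1_EMAIL,
       MAX(CASE WHEN rn = 1 THEN EVENT_DATE END)   AS EVENT_1_DATE
FROM ranked
WHERE rn <= 5
GROUP BY INVOICE_ID, INVOICE_DATE, INVOICE_TOTAL, PERSON_COMPANY_ID
"""

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.execute(
    "CREATE TABLE invoice1 (INVOICE_ID INTEGER, INVOICE_DATE TEXT, INVOICE_TOTAL REAL, "
    "PERSON_COMPANY_ID TEXT, PERSON_EMAIL TEXT, EVENT_NAME TEXT, EVENT_DATE TEXT)"
)
conn.executemany("INSERT INTO invoice1 VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
for row in result:
    print(dict(row))
```

In Snowflake the WHERE clause would use `datediff(day, event_date, invoice_date)` instead of `julianday()`. The ORDER BY tie-breakers are what keep each position's name, email and date on the same source row.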