Reshape a data table into a time series, pivot-table style, adding the factor levels to the variable names
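Using R, is there a way to reshape a data set, similar to building a pivot table in Excel? My data has 5 variables in total: three identifying variables (Date, Channel and Category) and two metric variables (Views and Spend). Is there a way to generate time series data with the dates in rows, and with new variables created automatically for each metric (Views and Spend) for every combination of Channel and Category? This question differs from others because I want the level names to become part of the variable names.
The starting data looks like this: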
Date=c("01/01/2020","01/01/2020","01/01/2020","01/01/2020","01/01/2020","08/01/2020","08/01/2020","15/01/2020","15/01/2020","15/01/2020","15/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","29/01/2020","29/01/2020","05/02/2020","05/02/2020","05/02/2020")
Channel=c("TV","TV","TV","Internet","TV","Internet","TV","Internet","TV","TV","Internet","TV","Internet","TV","TV","Internet","TV","TV","Internet","TV","Internet","TV","Internet")
Category=c("CatA","CatA","CatA","CatA","CatB","CatB","CatB","CatB","CatA","CatB","CatB","CatA","CatB","CatB","CatB","CatB","CatB","CatB","CatB","CatA","CatA","CatA","CatA")
Views=c(190,320,260,300,240,190,200,190,230,30,370,260,350,240,330,190,290,220,230,180,230,310,270)
Spend=c(34,63,46,53,21,23,17,24,20,5,50,42,46,39,44,31,72,54,58,22,29,41,36)
df <- data.frame(Date,Channel,Category,Views,Spend)
df
> df
         Date  Channel Category Views Spend
1  01/01/2020       TV     CatA   190    34
2  01/01/2020       TV     CatA   320    63
3  01/01/2020       TV     CatA   260    46
4  01/01/2020 Internet     CatA   300    53
5  01/01/2020       TV     CatB   240    21
6  08/01/2020 Internet     CatB   190    23
7  08/01/2020       TV     CatB   200    17
8  15/01/2020 Internet     CatB   190    24
9  15/01/2020       TV     CatA   230    20
10 15/01/2020       TV     CatB    30     5
11 15/01/2020 Internet     CatB   370    50
12 22/01/2020       TV     CatA   260    42
13 22/01/2020 Internet     CatB   350    46
14 22/01/2020       TV     CatB   240    39
15 22/01/2020       TV     CatB   330    44
16 22/01/2020 Internet     CatB   190    31
17 22/01/2020       TV     CatB   290    72
18 22/01/2020       TV     CatB   220    54
19 29/01/2020 Internet     CatB   230    58
20 29/01/2020       TV     CatA   180    22
21 05/02/2020 Internet     CatA   230    29
22 05/02/2020       TV     CatA   310    41
23 05/02/2020 Internet     CatA   270    36
I would like the reshaped data frame to look like this:
Date=c("01/01/2020","08/01/2020","15/01/2020","22/01/2020","29/01/2020","05/02/2020")
TV.CatA.Views=c(770,0,230,260,180,310)
TV.CatB.Views=c(240,200,30,1080,0,0)
Internet.CatA.Views=c(300,0,0,0,0,500)
Internet.CatB.Views=c(0,190,560,540,230,0)
TV.CatA.Spend=c(143,0,20,42,22,41)
TV.CatB.Spend=c(21,17,5,209,0,0)
Internet.CatA.Spend=c(53,0,0,0,0,65)
Internet.CatB.Spend=c(0,23,74,77,58,0)
df_result <- data.frame(Date,TV.CatA.Views,TV.CatB.Views,Internet.CatA.Views,Internet.CatB.Views,TV.CatA.Spend,TV.CatB.Spend,Internet.CatA.Spend,Internet.CatB.Spend)
df_result
> df_result
        Date TV.CatA.Views TV.CatB.Views Internet.CatA.Views Internet.CatB.Views TV.CatA.Spend
1 01/01/2020           770           240                 300                   0           143
2 08/01/2020             0           200                   0                 190             0
3 15/01/2020           230            30                   0                 560            20
4 22/01/2020           260          1080                   0                 540            42
5 29/01/2020           180             0                   0                 230            22
6 05/02/2020           310             0                 500                   0            41
  TV.CatB.Spend Internet.CatA.Spend Internet.CatB.Spend
1            21                  53                   0
2            17                   0                  23
3             5                   0                  74
4           209                   0                  77
5             0                   0                  58
6             0                  65                   0
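The variables do not need to be named exactly as I specified above, as long as the levels can be identified from the variable names. So far I have been doing this in Excel, but after completing more than 50 in a row I need to find a more efficient way.
Thank you for taking the time to look at my question; any help is much appreciated.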
Using the df you posted, this code produces something close to what you want:
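library(tidyverse)
# Sum Views and Spend for each Date/Channel/Category combination,
# then spread the Channel/Category levels into the column names
mdf <- df %>%
  group_by(Date, Channel, Category) %>%
  summarise(across(c(Views, Spend), sum), .groups = "drop") %>%
  pivot_wider(names_from = c(Channel, Category), values_from = c(Views, Spend))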
Output:
        Date Views_Internet_CatA Views_TV_CatA Views_TV_CatB Views_Internet_CatB Spend_Internet_CatA
1 01/01/2020                 300           770           240                  NA                  53
2 05/02/2020                 500           310            NA                  NA                  65
3 08/01/2020                  NA            NA           200                 190                  NA
4 15/01/2020                  NA           230            30                 560                  NA
5 22/01/2020                  NA           260          1080                 540                  NA
6 29/01/2020                  NA           180            NA                 230                  NA
  Spend_TV_CatA Spend_TV_CatB Spend_Internet_CatB
1           143            21                  NA
2            41            NA                  NA
3            NA            17                  23
4            20             5                  74
5            42           209                  77
6            22            NA                  58
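Note that the output above shows NA where a Channel/Category combination has no rows for a date, and that Date is sorted as text, so 05/02/2020 lands before 08/01/2020. If you want zeros, chronological order, and names in the Channel.Category.Metric pattern from the question, a variant along these lines may work. It is only a sketch and rests on a few assumptions: the dates are day/month/year, tidyr is at least version 1.1.0 (for names_glue and a scalar values_fill), and dplyr is at least 1.0.0 (for across). The name wide is just an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Date = c("01/01/2020","01/01/2020","01/01/2020","01/01/2020","01/01/2020","08/01/2020","08/01/2020","15/01/2020","15/01/2020","15/01/2020","15/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","22/01/2020","29/01/2020","29/01/2020","05/02/2020","05/02/2020","05/02/2020"),
  Channel = c("TV","TV","TV","Internet","TV","Internet","TV","Internet","TV","TV","Internet","TV","Internet","TV","TV","Internet","TV","TV","Internet","TV","Internet","TV","Internet"),
  Category = c("CatA","CatA","CatA","CatA","CatB","CatB","CatB","CatB","CatA","CatB","CatB","CatA","CatB","CatB","CatB","CatB","CatB","CatB","CatB","CatA","CatA","CatA","CatA"),
  Views = c(190,320,260,300,240,190,200,190,230,30,370,260,350,240,330,190,290,220,230,180,230,310,270),
  Spend = c(34,63,46,53,21,23,17,24,20,5,50,42,46,39,44,31,72,54,58,22,29,41,36)
)

wide <- df %>%
  # Assumption: dates are day/month/year; parsing them makes the rows sort chronologically
  mutate(Date = as.Date(Date, format = "%d/%m/%Y")) %>%
  # One row per Date/Channel/Category, with both metrics summed
  group_by(Date, Channel, Category) %>%
  summarise(across(c(Views, Spend), sum), .groups = "drop") %>%
  # Spread to wide form: missing combinations become 0 instead of NA,
  # and columns are named Channel.Category.Metric via names_glue
  pivot_wider(
    names_from  = c(Channel, Category),
    values_from = c(Views, Spend),
    values_fill = 0,
    names_glue  = "{Channel}.{Category}.{.value}"
  ) %>%
  arrange(Date)

wide
```

The column order may differ from the layout in the question; if it matters, something like select(wide, Date, starts_with("TV"), starts_with("Internet")) can reorder the columns afterwards.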