Transpose DataFrame in R: having one row as columns and another column aggregated as rows
To better explain my problem, I have prepared the following sample data, which is similar to my original dataset:
library(zoo)
sample_data <- data.frame(
  User = c("customer1", "customer2", "customer3", "customer4", "customer5", "customer1", "customer2", "customer3", "customer4", "customer1", "customer3", "customer5"),
  Cohort = as.yearmon(c("2020-03-01", "2020-02-17", "2020-04-10", "2020-02-01", "2020-04-10", "2020-03-01", "2020-02-17", "2020-04-10", "2020-02-01", "2020-03-01", "2020-04-10", "2020-04-30"), "%Y-%m-%d"),
  Purchase_month = as.yearmon(c("2020-03-01", "2020-02-17", "2020-04-10", "2020-02-01", "2020-04-10", "2020-07-05", "2020-03-05", "2020-06-11", "2020-03-07", "2020-11-01", "2020-11-04", "2020-06-30"), "%Y-%m-%d"),
  Revenue = c(25, 34, 20, 50, 75, 80, 100, 76, 39, 20, 10, 90)
)
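As you can see, I have a data frame of purchases with one column identifying the customer, one showing the cohort they belong to (the month in which they placed their first order), one with the month of each purchase, and one with the amount spent on each purchase.
What I would like to do is reshape the table so that each row is a cohort and each column holds that cohort's revenue for one month.
The result should basically look like this: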
Cohort | Feb 2020 | Mar 2020 | Apr 2020 | May 2020 | Jun 2020 | Jul 2020 | Aug 2020 | Sep 2020 | Oct 2020 | Nov 2020 | Dec 2020
Feb 2020 | 84 | 139 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Mar 2020 | 0 | 25 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 20 | 0
Apr 2020 | 0 | 0 | 95 | 0 | 166 | 0 | 0 | 0 | 0 | 10 | 0
Thanks!
P.S. The title may not be a good fit, but I don't know what this kind of table transformation is called.
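We can drop the User column and take the sum of the Revenue values for each combination of Cohort and Purchase_month while pivoting to wide format.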
library(dplyr)
library(tidyr)
sample_data %>%
  select(-User) %>%
  pivot_wider(names_from = Purchase_month, values_from = Revenue, values_fill = 0, values_fn = sum)
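Note that pivot_wider() only creates columns for months that actually occur in Purchase_month, and by default it orders them by first appearance. If the months without purchases (May, Aug, and so on) should show up as zero columns in chronological order, one possible approach is to turn Purchase_month into a factor whose levels cover the whole reporting window and let pivot_wider() expand them. This is only a sketch: it assumes tidyr 1.2.0 or later (for the names_expand argument) and a window of Feb 2020 to Dec 2020 taken from the desired output; all_months, month_levels and wide are names introduced here for illustration.

```r
library(zoo)
library(dplyr)
library(tidyr)

# Same sample data as in the question.
sample_data <- data.frame(
  User = c("customer1", "customer2", "customer3", "customer4", "customer5", "customer1",
           "customer2", "customer3", "customer4", "customer1", "customer3", "customer5"),
  Cohort = as.yearmon(c("2020-03-01", "2020-02-17", "2020-04-10", "2020-02-01",
                        "2020-04-10", "2020-03-01", "2020-02-17", "2020-04-10",
                        "2020-02-01", "2020-03-01", "2020-04-10", "2020-04-30"), "%Y-%m-%d"),
  Purchase_month = as.yearmon(c("2020-03-01", "2020-02-17", "2020-04-10", "2020-02-01",
                                "2020-04-10", "2020-07-05", "2020-03-05", "2020-06-11",
                                "2020-03-07", "2020-11-01", "2020-11-04", "2020-06-30"), "%Y-%m-%d"),
  Revenue = c(25, 34, 20, 50, 75, 80, 100, 76, 39, 20, 10, 90)
)

# Assumed reporting window: every month from Feb 2020 to Dec 2020.
all_months <- as.yearmon(seq(as.Date("2020-02-01"), as.Date("2020-12-01"), by = "month"))
month_levels <- as.character(all_months)  # labels in chronological order

wide <- sample_data %>%
  # A factor keeps the unused months as levels, in chronological order.
  mutate(Purchase_month = factor(as.character(Purchase_month), levels = month_levels)) %>%
  arrange(Cohort) %>%  # rows in cohort order
  select(-User) %>%
  pivot_wider(names_from = Purchase_month, values_from = Revenue,
              values_fn = sum, values_fill = 0,
              names_expand = TRUE)  # one column per factor level, used or not

wide
```

The factor is what carries the ordering here: with names_expand = TRUE the columns follow the factor levels rather than the order in which months appear in the data. The month labels come from as.character() on the yearmon values, so they depend on the session locale.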