Add/Merge/Melt only specific columns and give out one unique row
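I am trying to transform a dataset that has several product sales on the same date. In the end, I want to keep only one unique row per day, containing the sum of the product sales for that day.
My MRE: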
df <- data.frame(created = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03"), "%Y-%m-%d", tz = "GMT"),
soldUnits = c(1, 1, 1, 1, 1, 1),
Weekday = c("Mo","Mo","Tu","Tu","Th","Th"),
Sunshinehours = c(7.8,7.8,6.0,6.0,8.0,8.0))
Which looks like this:
created    soldUnits Weekday Sunshinehours
2020-01-01 1         Mo      7.8
2020-01-01 1         Mo      7.8
2020-01-02 1         Tu      6.0
2020-01-02 1         Tu      6.0
2020-01-03 1         Th      8.0
2020-01-03 1         Th      8.0
After the transformation it should look like this:
created    soldUnits Weekday Sunshinehours
2020-01-01 2         Mo      7.8
2020-01-02 2         Tu      6.0
2020-01-03 2         Th      8.0
I tried aggregate() and group_by, but without success, because my data got dropped.
Does anyone know how to transform and clean my dataset according to the specification above?
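Using base R and dplyr:
library(dplyr)

df1 <- aggregate(df["Sunshinehours"], by = df["created"], mean)
df2 <- aggregate(df["soldUnits"], by = df["created"], sum)
df3 <- inner_join(df1, df2, by = "created")
# Attach each date's Weekday by joining on `created`; unique() keeps one row per date.
# (Assigning levels(as.factor(df$Weekday)) would pair the alphabetically sorted
# labels Mo, Th, Tu with the dates and mislabel the last two rows.)
df3 <- inner_join(df3, unique(df[c("created", "Weekday")]), by = "created")
     created Sunshinehours soldUnits Weekday
1 2020-01-01           7.8         2      Mo
2 2020-01-02           6.0         2      Tu
3 2020-01-03           8.0         2      Th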
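If Weekday and Sunshinehours never vary within a date (an assumption that holds for the MRE but may not hold for real data), the same result can probably be obtained with a single base R call by putting those columns on the grouping side of the formula. A minimal sketch, with `res` as an illustrative name:

```r
# Same data as in the question's MRE.
df <- data.frame(
  created = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                      "2020-01-02", "2020-01-03", "2020-01-03")),
  soldUnits = c(1, 1, 1, 1, 1, 1),
  Weekday = c("Mo", "Mo", "Tu", "Tu", "Th", "Th"),
  Sunshinehours = c(7.8, 7.8, 6.0, 6.0, 8.0, 8.0)
)

# Sum soldUnits within each distinct (created, Weekday, Sunshinehours) combination.
# Assumption: Weekday and Sunshinehours are constant within a date, so there is
# exactly one combination (and therefore one output row) per date.
res <- aggregate(soldUnits ~ created + Weekday + Sunshinehours, data = df, FUN = sum)

# aggregate() orders its rows by the grouping values, so re-sort by date.
res <- res[order(res$created), ]
res
```

Because the constant columns are grouping variables here, nothing is dropped and no join is needed afterwards.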
You can use collap to apply different functions to different columns (or groups of columns):
library(collapse)
collap(df, ~ created + Weekday,
custom = list(fmean = "Sunshinehours", fsum = "soldUnits"))
created soldUnits Weekday Sunshinehours
1 2020-01-01 2 Mo 7.8
2 2020-01-02 2 Tu 6.0
3 2020-01-03 2 Th 8.0
This works:
library(tidyverse)
df %>%
group_by(created) %>%
count(Weekday, Sunshinehours, wt = soldUnits,name = "soldUnits")
#> # A tibble: 3 × 4
#> # Groups: created [3]
#> created Weekday Sunshinehours soldUnits
#> <date> <chr> <dbl> <dbl>
#> 1 2020-01-01 Mo 7.8 2
#> 2 2020-01-02 Tu 6 2
#> 3 2020-01-03 Th 8 2
Created on 2021-12-04 by the reprex package (v2.0.1)
Another dplyr approach:
df %>%
group_by(created, Weekday, Sunshinehours) %>%
summarise(soldUnits = sum(soldUnits))
created Weekday Sunshinehours soldUnits
<date> <chr> <dbl> <dbl>
1 2020-01-01 Mo 7.8 2
2 2020-01-02 Tu 6 2
3 2020-01-03 Th 8 2
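Grouping by Weekday and Sunshinehours as well only gives one row per day when those columns are constant within a date. If they could differ (an assumption about real data, not something the MRE shows), a sketch that groups by `created` alone and picks a summary function per column might look like this. `res` is an illustrative name, and `.groups = "drop"` assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Same data as in the question's MRE.
df <- data.frame(
  created = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                      "2020-01-02", "2020-01-03", "2020-01-03")),
  soldUnits = c(1, 1, 1, 1, 1, 1),
  Weekday = c("Mo", "Mo", "Tu", "Tu", "Th", "Th"),
  Sunshinehours = c(7.8, 7.8, 6.0, 6.0, 8.0, 8.0)
)

res <- df %>%
  group_by(created) %>%
  summarise(
    soldUnits = sum(soldUnits),          # total units sold that day
    Weekday = first(Weekday),            # weekday label of the first row for the date
    Sunshinehours = mean(Sunshinehours), # average, in case readings differ
    .groups = "drop"                     # return an ungrouped tibble
  )
res
```

This always returns exactly one row per date; the choice of first() and mean() for the non-summed columns is a judgment call and should be adapted to what the data means.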