R: reshape with nulls when combination doesn't exist
I have some data which I melt and dcast using the reshape2 package, as shown below.
library(reshape2)  # provides melt() and dcast()
dat <- data.frame(Name = c("Alice", "Alice", "Alice", "Alice", "Bob", "Bob", "Bob"),
                  Month = c(1, 1, 1, 2, 1, 2, 2),
                  Product = c("Car", "Bike", "Car", "Car", "Car", "Bike", "Bike"),
                  Price = c(1000, 150, 300, 500, 2000, 200, 100))
# Name Month Product Price
# 1 Alice 1 Car 1000
# 2 Alice 1 Bike 150
# 3 Alice 1 Car 300
# 4 Alice 2 Car 500
# 5 Bob 1 Car 2000
# 6 Bob 2 Bike 200
# 7 Bob 2 Bike 100
dat_melt <- melt(dat, id=c("Name", "Month", "Product"))
# Name Month Product variable value
# 1 Alice 1 Car Price 1000
# 2 Alice 1 Bike Price 150
# 3 Alice 1 Car Price 300
# 4 Alice 2 Car Price 500
# 5 Bob 1 Car Price 2000
# 6 Bob 2 Bike Price 200
# 7 Bob 2 Bike Price 100
dat_spread <- dcast(dat_melt, Name + Month ~ Product + variable, value.var="value", fun=sum)
# Name Month Bike_Price Car_Price
# 1 Alice 1 150 1300
# 2 Alice 2 0 500
# 3 Bob 1 0 2000
# 4 Bob 2 300 0
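How can I get this output so that cases where a Name-Month-Product combination doesn't exist (e.g. Alice, 2, Bike) return NULL or NA instead of 0? Note that the solution must still work when Price is genuinely 0, so something like dat_spread$BikePrice[BikePrice == 0] <- NA is not acceptable.
I have tried using anonymous functions in dcast, to no avail, e.g.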
library(dplyr)
dcast(dat_melt, Name + Month ~ Product + variable, value.var="value",
fun.aggregate = function(x) if_else(is.na(x), NULL, sum(x)))
# Error: `false` must be type NULL, not double
dcast(dat_melt, Name + Month ~ Product + variable, value.var="value",
fun.aggregate = function(x) if_else(is.na(x), 3.14, sum(x))) # then update after
# Error in vapply(indices, fun, .default) : values must be length 0,
# but FUN(X[[1]]) result is length 1
Note that reshape2 is not a requirement, so a solution that doesn't use it (e.g. one using tidyverse functions) is fine too.
You can use the fill argument to specify the value dcast uses for missing combinations:
dcast(dat_melt, Name + Month ~ Product + variable,
value.var = "value", fun = sum, fill = NA_real_)
#> Name Month Bike_Price Car_Price
#> 1 Alice 1 150 1300
#> 2 Alice 2 NA 500
#> 3 Bob 1 NA 2000
#> 4 Bob 2 300 NA
Created on 2018-03-07 by the reprex package (v0.2.0).
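(Note that behind the scenes, dcast calls vapply, which is picky about types. Specifying just fill = NA is therefore not enough, because typeof(NA) == "logical" while your values are numeric: you have to explicitly use the "double" NA, NA_real_.)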
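The type issue described above can be reproduced with base R alone. The following is a minimal sketch, not reshape2's actual internals: it calls vapply directly, with the fill value standing in as the FUN.VALUE template, and the names ok and err are illustrative only.

```r
# A bare NA is logical; NA_real_ is the double-typed missing value.
typeof(NA)        # "logical"
typeof(NA_real_)  # "double"

# With a double template, numeric results from sum() are accepted.
ok <- vapply(list(1, 2), sum, FUN.VALUE = NA_real_)

# With a logical template, the double result of sum() cannot be
# demoted to logical, so vapply() raises an error. tryCatch()
# captures the message instead of stopping the script.
err <- tryCatch(
  vapply(list(1, 2), sum, FUN.VALUE = NA),
  error = function(e) conditionMessage(e)
)
print(ok)
print(err)
```

vapply only promotes result types upward (logical to integer to double), never downward, which is why the template has to be at least as "wide" as what the aggregation function returns.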
As an alternative, you could also do all of the reshaping with dplyr + tidyr:
library(dplyr)
library(tidyr)
dat %>%
    group_by(Name, Month, Product) %>%
    summarise(Price = sum(Price)) %>%
    spread(Product, Price)
## A tibble: 4 x 4
## Groups: Name, Month [4]
# Name Month Bike Car
# <fct> <dbl> <dbl> <dbl>
#1 Alice 1. 150. 1300.
#2 Alice 2. NA 500.
#3 Bob 1. NA 2000.
#4 Bob 2. 300. NA
Like dcast, spread has a fill argument; it defaults to fill = NA.
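To check the requirement from the question that a genuine price of 0 stays distinct from a missing combination, here is a small sketch using the same dplyr + tidyr approach. The data frame dat0 and the result name wide are hypothetical and not part of the original data; fill = NA is written out only to make the default explicit.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Bob's Bike in month 1 has a real price of 0,
# while Alice has no Bike row at all for month 1.
dat0 <- data.frame(
  Name    = c("Alice", "Bob", "Bob"),
  Month   = c(1, 1, 1),
  Product = c("Car", "Car", "Bike"),
  Price   = c(1000, 2000, 0)
)

wide <- dat0 %>%
  group_by(Name, Month, Product) %>%
  summarise(Price = sum(Price)) %>%
  ungroup() %>%
  spread(Product, Price, fill = NA)  # missing combinations become NA

print(wide)
```

In this sketch Alice's Bike column should come out as NA, while Bob's remains 0, because the sum is only computed for groups that actually exist in the data.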