Reshaping a data frame in R
A simple reshape. I have the following data:
df <- data.frame(Product = c("A", "A", "A", "B", "B", "C"),
                 Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"))
df
  Product Ingredients
1       A   Chocolate
2       A     Vanilla
3       A       Berry
4       B   Chocolate
5       B      Berry2
6       C     Vanilla
I'd like one row per product, with that product's ingredients spread across numbered columns, for example:
df2
Product Ingredient_1 Ingredient_2 Ingredient_3
A       Chocolate    Vanilla      Berry
B       Chocolate    Berry2       NULL
C       Vanilla      NULL         NULL
It seems trivial, but when I try to reshape I keep getting counts (not the actual values of "Ingredients"). Any ideas?
Here is a possible solution using the data.table package:
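library(data.table)
# Number the ingredients within each product (adds the column by reference)
setDT(df)[, Ingredient := paste0("Ingredient_", seq_len(.N)), by = Product]
# Cast to wide: one row per Product, one column per numbered ingredient
dcast(df, Product ~ Ingredient, value.var = "Ingredients")
#    Product Ingredient_1 Ingredient_2 Ingredient_3
# 1:       A    Chocolate      Vanilla        Berry
# 2:       B    Chocolate       Berry2           NA
# 3:       C      Vanilla           NA           NA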
Alternatively, we could do this with the dplyr/tidyr combination:
library(dplyr)
library(tidyr)
df %>%
  group_by(Product) %>%
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  spread(Ingredient, Ingredients)
# Source: local data frame [3 x 4]
#
#   Product Ingredient_1 Ingredient_2 Ingredient_3
# 1       A    Chocolate      Vanilla        Berry
# 2       B    Chocolate       Berry2           NA
# 3       C      Vanilla           NA           NA
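For readers on a recent tidyr (1.0.0 or later), spread() has been superseded by pivot_wider(). A minimal sketch of the same idea with the newer function follows. It assumes the original two-column df; the name wide is just an illustrative choice, and the ungroup() call is optional tidying rather than something the approach depends on.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Product) %>%
  # Label each ingredient by its position within the product
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  ungroup() %>%
  # One column per label; products with fewer ingredients get NA
  pivot_wider(names_from = Ingredient, values_from = Ingredients)

print(wide)
```

The key step is unchanged: without the per-product position label there is nothing to use as column names, which is why the mutate() comes before the pivot.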
With base R reshape:
df$Count <- ave(rep(1, nrow(df)), df$Product, FUN = cumsum)
reshape(df, idvar = "Product", timevar = "Count", direction = "wide", sep = "_")
#   Product Ingredients_1 Ingredients_2 Ingredients_3
# 1       A     Chocolate       Vanilla         Berry
# 4       B     Chocolate        Berry2          <NA>
# 6       C       Vanilla          <NA>          <NA>
In the spirit of sharing alternatives, here are two more:
Option 1: split the column and use stri_list2matrix to create the wide form.
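library(stringi)
x <- with(df, split(Ingredients, Product))
# byrow = TRUE puts each product's ingredients in one row;
# the default (byrow = FALSE) would fill them column-wise instead
data.frame(Product = names(x), stri_list2matrix(x, byrow = TRUE))
#   Product        X1      X2    X3
# 1       A Chocolate Vanilla Berry
# 2       B Chocolate  Berry2  <NA>
# 3       C   Vanilla    <NA>  <NA>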
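If you would rather not depend on stringi, the same split-and-pad idea can be sketched in base R. This is only an illustration of the technique, not what stri_list2matrix does internally; the names n, padded, m and wide are made up for the example.

```r
# Same sample data as in the question
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

# One character vector of ingredients per product
x <- split(df$Ingredients, df$Product)

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(x))
padded <- lapply(x, function(v) {
  length(v) <- n
  v
})

# Stack the padded vectors as rows, one row per product
m <- do.call(rbind, padded)
wide <- data.frame(Product = names(x), m, row.names = NULL,
                   stringsAsFactors = FALSE)

print(wide)
```

Stacking with rbind is what keeps each product on its own row; binding the vectors as columns instead would give the transposed layout.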
Option 2: Use getanID from my "splitstackshape" package to generate an ".id" column, then dcast it. The "data.table" package is loaded along with "splitstackshape", so you can call dcast.data.table directly to reshape.
library(splitstackshape)
dcast.data.table(getanID(df, "Product"),
                 Product ~ .id, value.var = "Ingredients")
#    Product         1       2     3
# 1:       A Chocolate Vanilla Berry
# 2:       B Chocolate  Berry2    NA
# 3:       C   Vanilla      NA    NA
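The within-group ID can also be generated inside the dcast formula itself with data.table's rowid(), which avoids creating a helper column first. A rough sketch, assuming a reasonably recent data.table and the original two-column df; the "Ingredient_" prefix and the names dt and wide are only illustrative choices.

```r
library(data.table)

# Same sample data as in the question
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)
dt <- as.data.table(df)

# rowid() numbers the rows within each Product; prefix turns the
# numbers into column names for the wide result
wide <- dcast(dt, Product ~ rowid(Product, prefix = "Ingredient_"),
              value.var = "Ingredients")

print(wide)
```

Using as.data.table() rather than setDT() leaves the original df untouched, which is handy if you want to try several of the approaches on the same object.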