Dcast multiple observations into a single cell in R
I have an R data frame:
Customer Month BaseVolume IncrementalVolume TradeSpend
10 Jan 11 1 110
10 Feb 12 2 120
20 Jan 21 7 210
20 Feb 22 8 220
I want to transform it like this:
Customer  Jan                  Feb
10        BaseVolume 11        BaseVolume 12
          IncrementalVolume 1  IncrementalVolume 2
          TradeSpend 110       TradeSpend 120
20        BaseVolume 21        BaseVolume 22
          IncrementalVolume 7  IncrementalVolume 8
          TradeSpend 210       TradeSpend 220
I tried dcast (reshape), but I could not get this result. Please help.
You can try the following (in your case, assuming your data is df1, you need to run setDT(df1) before any of the operations mentioned here):
library(data.table)
dt1 <- structure(list(Customer = c(10L, 10L, 20L, 20L), Month = c("Jan",
"Feb", "Jan", "Feb"), BaseVolume = c(11L, 12L, 21L, 22L), IncrementalVolume = c(1L,
2L, 7L, 8L), TradeSpend = c(110L, 120L, 210L, 220L)), .Names = c("Customer",
"Month", "BaseVolume", "IncrementalVolume", "TradeSpend"), row.names = c(NA,
-4L), class = c("data.table", "data.frame"))
# melt to long format, then cast the months back into columns
res <- dcast(melt(dt1, id.vars = c("Customer", "Month")), Customer + variable ~ Month, value.var = "value")
> res
Customer variable Feb Jan
1: 10 BaseVolume 12 11
2: 10 IncrementalVolume 2 1
3: 10 TradeSpend 120 110
4: 20 BaseVolume 22 21
5: 20 IncrementalVolume 8 7
6: 20 TradeSpend 220 210
If you want the name and the value together in one cell, you can do the following:
update_cols <- which(!names(res) %in% c("Customer", "variable"))
res[, (update_cols):= lapply(.SD, function(x) paste(variable, x)), .SDcols = update_cols][, variable:= NULL]
which gives:
> res
Customer Feb Jan
1: 10 BaseVolume 12 BaseVolume 11
2: 10 IncrementalVolume 2 IncrementalVolume 1
3: 10 TradeSpend 120 TradeSpend 110
4: 20 BaseVolume 22 BaseVolume 21
5: 20 IncrementalVolume 8 IncrementalVolume 7
6: 20 TradeSpend 220 TradeSpend 210
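The setDT() remark above can be illustrated with a minimal sketch. It assumes the data start out as a plain data.frame; the name df1 comes from the answer, while long and res1 are names chosen here for illustration only.

```r
library(data.table)

# Assumed starting point: the question's data as an ordinary data.frame
df1 <- data.frame(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L),
  stringsAsFactors = FALSE
)

# Convert to a data.table by reference (no copy is made)
setDT(df1)

# Same reshape as above: wide -> long -> months as columns
long <- melt(df1, id.vars = c("Customer", "Month"))
res1 <- dcast(long, Customer + variable ~ Month, value.var = "value")
res1
```

Because Month is still a character column here, the month columns come out in alphabetical order (Feb, Jan), just as in the output shown above.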
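Although there is already an answer, I feel that it can be improved in some respects to come closer to the expected output:
- The OP specified that the months appear in the order Jan, Feb
- The output is hard to read
- The columns should be modified before dcast()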
We start by reshaping the input data from wide to long format, while making sure that Month will appear in the correct order:
molten <- melt(dt1, id.vars = c("Customer", "Month"))
# turn Month into factor with levels in the given order
molten[, Month := forcats::fct_inorder(Month)]
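If adding forcats as a dependency is not desired, the same level ordering can probably be obtained with base R alone. The following is a sketch under that assumption, not part of the original answer; it rebuilds the sample data with data.table() so that it runs on its own.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))

# Base R counterpart of forcats::fct_inorder():
# the levels follow the order of first appearance in the data
molten[, Month := factor(Month, levels = unique(Month))]
levels(molten$Month)
```

For data where the rows are not already in calendar order, passing levels = month.abb (the built-in vector of abbreviated English month names) instead would give calendar order regardless of row order.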
Now, a new text column is created in long format before calling dcast():
molten[, text := paste(variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
# Customer Jan Feb
#1: 10 BaseVolume 11 BaseVolume 12
#2: 10 IncrementalVolume 1 IncrementalVolume 2
#3: 10 TradeSpend 110 TradeSpend 120
#4: 20 BaseVolume 21 BaseVolume 22
#5: 20 IncrementalVolume 7 IncrementalVolume 8
#6: 20 TradeSpend 210 TradeSpend 220
The result is similar to the other answer, but the months appear in the expected order.
N.B. Unfortunately, the approach of collapsing the rows for each Customer does not work, because line breaks are not respected when printing:
dcast(molten, Customer ~ Month, value.var = "text", paste0, collapse = "\n")
# Customer Jan Feb
#1: 10 BaseVolume 11\nIncrementalVolume 1\nTradeSpend 110 BaseVolume 12\nIncrementalVolume 2\nTradeSpend 120
#2: 20 BaseVolume 21\nIncrementalVolume 7\nTradeSpend 210 BaseVolume 22\nIncrementalVolume 8\nTradeSpend 220
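The line breaks are still stored in the collapsed cells; it is only the table printing that shows them as \n. A small sketch of this, with wide as a name chosen here for illustration and the level ordering done in base R to keep the block self-contained:

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := factor(Month, levels = unique(Month))]
molten[, text := paste(variable, value)]

# One row per Customer; the three texts are joined with a newline
wide <- dcast(molten, Customer ~ Month, value.var = "text",
              fun.aggregate = paste0, collapse = "\n")

# cat() writes the string as is, so the newlines become visible
cat(wide$Jan[1], "\n")
```

This helps when a single cell is written to the console or a file, but it does not change how the table as a whole is printed.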
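The text column can be left-aligned by padding with white space on the right (the minimum length is determined by the character length of the longest string):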
molten[, text := paste(variable, value)]
molten[, text := stringr::str_pad(text, max(nchar(text)), "right")]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                 Jan                 Feb
#1:       10 BaseVolume 11       BaseVolume 12
#2:       10 IncrementalVolume 1 IncrementalVolume 2
#3:       10 TradeSpend 110      TradeSpend 120
#4:       20 BaseVolume 21       BaseVolume 22
#5:       20 IncrementalVolume 7 IncrementalVolume 8
#6:       20 TradeSpend 210      TradeSpend 220
Alternatively, the name and the value can be aligned separately within the text column:
fmt <- stringr::str_interp("%-${n}s %3i", list(n = molten[, max(nchar(levels(variable)))]))
molten[, text := sprintf(fmt, variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                   Jan                   Feb
#1:       10 BaseVolume         11 BaseVolume         12
#2:       10 IncrementalVolume   1 IncrementalVolume   2
#3:       10 TradeSpend        110 TradeSpend        120
#4:       20 BaseVolume         21 BaseVolume         22
#5:       20 IncrementalVolume   7 IncrementalVolume   8
#6:       20 TradeSpend        210 TradeSpend        220
Here, the format to be used in sprintf() is also created dynamically using string interpolation:
fmt
#[1] "%-17s %3i"
Note that the character length of the longest level of variable is used here, because melt() converts variable to a factor by default.
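The same format string can presumably be built without stringr by letting sprintf() write its own format. This is a sketch of that alternative, not the answer's method; n is a helper name introduced here, and as.character() is applied to variable explicitly so that the factor labels are used.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)
molten <- melt(dt1, id.vars = c("Customer", "Month"))

# Width of the longest variable name (variable is a factor after melt())
n <- max(nchar(levels(molten$variable)))

# "%%" yields a literal "%", so this produces "%-17s %3i"
fmt <- sprintf("%%-%ds %%3i", n)
fmt

# Left-justify the name, right-justify the value in three characters
molten[, text := sprintf(fmt, as.character(variable), value)]
molten$text[1:3]
```

The fixed width of 3 for the value is taken over from the answer; values with more than three digits would need a wider field.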
The answer could be even simpler, because recent versions of data.table allow several columns to be reshaped simultaneously:
molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := forcats::fct_inorder(Month)]
dcast(molten, Customer + variable ~ Month, value.var = c("variable", "value"))
# Customer variable variable.1_Jan variable.1_Feb value_Jan value_Feb
#1: 10 BaseVolume BaseVolume BaseVolume 11 12
#2: 10 IncrementalVolume IncrementalVolume IncrementalVolume 1 2
#3: 10 TradeSpend TradeSpend TradeSpend 110 120
#4: 20 BaseVolume BaseVolume BaseVolume 21 22
#5: 20 IncrementalVolume IncrementalVolume IncrementalVolume 7 8
#6: 20 TradeSpend TradeSpend TradeSpend 210 220
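Unfortunately, it lacks an option to easily reorder the columns in alternating order, i.e., all columns belonging to Jan first, followed by those belonging to Feb, and so on.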
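One possible workaround is to build the desired column order by hand and apply it with setcolorder(). The sketch below is an assumption-laden variant of the call above: it first copies variable into a separate character column called name (a name chosen here) so that the resulting column names are predictable, and wide and new_order are likewise illustrative names.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)
molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := factor(Month, levels = unique(Month))]

# Separate copy of the variable names, to be cast alongside the values
molten[, name := as.character(variable)]

wide <- dcast(molten, Customer + variable ~ Month,
              value.var = c("name", "value"))

# For each month in level order: first the name column, then the value column
months <- levels(molten$Month)
new_order <- unlist(lapply(months, function(m) paste(c("name", "value"), m, sep = "_")))

# Reorder the columns by reference
setcolorder(wide, c("Customer", "variable", new_order))
names(wide)
```

This relies on dcast() naming the new columns as the value.var name, an underscore, and the month, which matches the value_Jan and value_Feb columns in the output above.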