Pass user input to 'by' in data.table and the formula in reshape - r
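Below is an example of what I'd like to do. eval(substitute(*)) works well, as shown here, but it makes the code harder to read. I'm wondering whether there is something better that I'm not aware of.
I'd like to be able to choose the row and column variables of the (final) table.
So, if I have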
input.col <- 'Gender'
input.row <- 'Region'
I'd like to be able to pass these arguments to the data.table call instead of hard-coding Region and Gender as below.
library(data.table)
library(reshape)
set.seed(5)
DT <- data.table(Region = sample(x = c('Asia', 'Americas', 'Africa', 'Europe', 'Oceania'), size = 200, replace = T), Weight = runif(n = 200, min = 1, max = 5), Age = round(x = 10*rexp(n = 200, rate = 1), digits = 0), Gender = sample(x = c('Male', 'Female', 'Gender diverse'), size = 200, replace = T, prob = c(0.49, 0.49, 0.02)))
cast(data = DT[, sum(Weight), .(Region, Gender)], formula = Region~Gender, fun.aggregate = sum, value = 'V1')
I'd like to end up with the following table:
Region Female Gender diverse Male
1 Africa 32.95019 3.222125 77.50863
2 Americas 49.12787 0.000000 84.97214
3 Asia 41.04879 0.000000 55.43294
4 Europe 45.39469 4.296767 47.76714
5 Oceania 65.89198 1.439075 72.27496
Thanks!
You can use get and rename the grouping variables, so that the new names can then be used in the formula:
input.col <- 'Gender'
input.row <- 'Region'
dt <- cast(data = DT[, sum(Weight), .(row = get(input.row), col = get(input.col))],
# ^^^ ^^^ ^^^ ^^^
formula = row ~ col, fun.aggregate = sum, value = 'V1')
dt
# row Female Gender diverse Male
#1 Africa 32.95019 3.222125 77.50863
#2 Americas 49.12787 0.000000 84.97214
#3 Asia 41.04879 0.000000 55.43294
#4 Europe 45.39469 4.296767 47.76714
#5 Oceania 65.89198 1.439075 72.27496
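Here are some possibilities. Except for (3), they use no packages other than data.table. All of the approaches perform the aggregation and the reshaping in a single operation, so there is no need to use by first. If you really do want to use by for some reason, then this will work: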
cast(data = DT[, sum(Weight), by = c(input.row, input.col)],
formula = paste(input.row, "~", input.col), fun.aggregate = sum, value = 'V1')
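To look at the by step on its own: by accepts a character vector of column names, so the user input can be passed straight in. A minimal sketch, assuming a small made-up table (toy and the output column name total are illustrative, not part of the question):

```r
library(data.table)

# Toy data (hypothetical values, not the DT from the question).
toy <- data.table(
  Region = c("Asia", "Asia", "Europe", "Europe"),
  Gender = c("Male", "Female", "Male", "Male"),
  Weight = c(1, 2, 3, 4)
)

input.row <- "Region"
input.col <- "Gender"

# by takes the column names as strings; naming the result avoids the default V1.
agg <- toy[, .(total = sum(Weight)), by = c(input.row, input.col)]
print(agg)
```

The result is still in long form (one row per Region/Gender combination that occurs), so it would need to be reshaped afterwards, which is what the approaches below do in one step.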
1) data.table::dcast
dcast(DT, paste(input.row, "~", input.col), sum, value.var = "Weight")
giving:
Region Female Gender diverse Male
1: Africa 32.95019 3.222125 77.50863
2: Americas 49.12787 0.000000 84.97214
3: Asia 41.04879 0.000000 55.43294
4: Europe 45.39469 4.296767 47.76714
5: Oceania 65.89198 1.439075 72.27496
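If the row and column names come from user input, it may be convenient to wrap the dcast call in a small function. This is only a sketch: sum_table and its arguments are hypothetical names, and the toy table is made up for illustration.

```r
library(data.table)

# Hypothetical helper: sum `value` for every row/col combination,
# with the row and column variables given as strings.
sum_table <- function(dt, row, col, value = "Weight") {
  # Same string formula as above, e.g. "Region ~ Gender".
  dcast(dt, paste(row, "~", col), fun.aggregate = sum, value.var = value)
}

# Toy data (hypothetical values, not the DT from the question).
toy <- data.table(
  Region = c("Asia", "Asia", "Europe", "Europe"),
  Gender = c("Male", "Female", "Male", "Male"),
  Weight = c(1, 2, 3, 4)
)

res <- sum_table(toy, row = "Region", col = "Gender")
print(res)
```

Note that pasting names into a formula string assumes they are syntactically valid R names; a column name containing spaces would need backticks.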
2) xtabs
xtabs is part of base R:
fo <- sprintf("Weight ~ %s + %s", input.row, input.col)
xtabs(fo, DT)
giving:
Gender
Region Female Gender diverse Male
Africa 32.950187 3.222125 77.508626
Americas 49.127873 0.000000 84.972137
Asia 41.048787 0.000000 55.432941
Europe 45.394693 4.296767 47.767138
Oceania 65.891983 1.439075 72.274955
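3) reshape::cast
We use the reshape package here because the question does, but note that it has been superseded by the reshape2 package, in which one would use dcast; however, dcast is also implemented in data.table, as in (1).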
cast(DT, paste(input.row, "~", input.col), sum, value = "Weight")
giving:
Region Female Gender diverse Male
1 Africa 32.95019 3.222125 77.50863
2 Americas 49.12787 0.000000 84.97214
3 Asia 41.04879 0.000000 55.43294
4 Europe 45.39469 4.296767 47.76714
5 Oceania 65.89198 1.439075 72.27496
4) tapply
tapply(DT$Weight, as.list(DT)[c(input.row, input.col)], sum, default = 0)
giving:
Gender
Region Female Gender diverse Male
Africa 32.95019 3.222125 77.50863
Americas 49.12787 0.000000 84.97214
Asia 41.04879 0.000000 55.43294
Europe 45.39469 4.296767 47.76714
Oceania 65.89198 1.439075 72.27496
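As a quick sanity check that the base R approaches agree, (2) and (4) can be run on the same small input and their cell values compared. A sketch on made-up data (toy and same_values are illustrative names only):

```r
library(data.table)

# Toy data (hypothetical values, not the DT from the question).
toy <- data.table(
  Region = c("Asia", "Asia", "Europe", "Europe"),
  Gender = c("Male", "Female", "Male", "Male"),
  Weight = c(1, 2, 3, 4)
)

input.row <- "Region"
input.col <- "Gender"

# (2) xtabs with a formula built from the strings.
xt <- xtabs(sprintf("Weight ~ %s + %s", input.row, input.col), toy)

# (4) tapply with the grouping columns picked by name;
# default = 0 fills combinations that do not occur.
tp <- tapply(toy$Weight, as.list(toy)[c(input.row, input.col)], sum, default = 0)

# Compare the cell values (both are laid out with rows = Region, columns = Gender).
same_values <- isTRUE(all.equal(as.vector(xt), as.vector(tp)))
print(same_values)
```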