Pass user input to 'by' in data.table and the formula in reshape - r

Question

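Below is an example of what I am trying to do. eval(substitute(*)) works well, as shown here, but it makes the code harder to read. I am wondering whether there is a better approach that I am not aware of.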

I want to be able to choose the row and column variables of the table (shown at the end). So, if I have

input.col <- 'Gender'
input.row <- 'Region'

I would like to pass these variables to data.table instead of hard-coding Region and Gender as below.

library(data.table)
library(reshape)
set.seed(5)
DT <- data.table(
  Region = sample(x = c('Asia', 'Americas', 'Africa', 'Europe', 'Oceania'), size = 200, replace = TRUE),
  Weight = runif(n = 200, min = 1, max = 5),
  Age = round(x = 10*rexp(n = 200, rate = 1), digits = 0),
  Gender = sample(x = c('Male', 'Female', 'Gender diverse'), size = 200, replace = TRUE, prob = c(0.49, 0.49, 0.02))
)
cast(data = DT[, sum(Weight), .(Region, Gender)], formula = Region~Gender, fun.aggregate = sum, value = 'V1')

I want to end up with the following table

Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

Thanks!

Answer 1

You can use get and rename the resulting variables, which can then be used in the formula:

input.col <- 'Gender'
input.row <- 'Region'

dt <- cast(data = DT[, sum(Weight), .(row = get(input.row), col = get(input.col))], 
#                                     ^^^   ^^^             ^^^   ^^^  
           formula = row ~ col, fun.aggregate = sum, value = 'V1')

dt
#       row   Female Gender diverse     Male
#1   Africa 32.95019       3.222125 77.50863
#2 Americas 49.12787       0.000000 84.97214
#3     Asia 41.04879       0.000000 55.43294
#4   Europe 45.39469       4.296767 47.76714
#5  Oceania 65.89198       1.439075 72.27496

Answer 2

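Here are a few possibilities. Except for (3), they need no packages beyond data.table. All of them perform the aggregation and the reshaping in a single operation, so there is no need to use by first. If for some reason you really do want to use by, then this works: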

cast(data = DT[, sum(Weight), by = c(input.row, input.col)], 
     formula = paste(input.row, "~", input.col), fun.aggregate = sum, value = 'V1')

1) data.table::dcast

dcast(DT, paste(input.row, "~", input.col), sum, value.var = "Weight")

giving:

     Region   Female Gender diverse     Male
1:   Africa 32.95019       3.222125 77.50863
2: Americas 49.12787       0.000000 84.97214
3:     Asia 41.04879       0.000000 55.43294
4:   Europe 45.39469       4.296767 47.76714
5:  Oceania 65.89198       1.439075 72.27496
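
If the row and column choices come from user input, it may help to wrap the call in a small function. The following is only a sketch and not part of the original answer: the helper name weight_table, its arguments, and the toy data are made up for illustration, and reformulate() (from the stats package) is used here as one way to build the formula from strings.

```r
library(data.table)

# Hypothetical helper: cross-tabulate a summed value column, with the row
# and column variables supplied as character strings.
weight_table <- function(dt, row, col, value = "Weight") {
  # Stop early if any supplied name is not a column of dt.
  stopifnot(all(c(row, col, value) %in% names(dt)))
  # Build the formula `row ~ col` from the two strings.
  fo <- reformulate(col, response = row)
  dcast(dt, fo, fun.aggregate = sum, value.var = value)
}

# Toy data, invented for this example.
toy <- data.table(
  Region = c("Asia", "Asia", "Europe", "Europe"),
  Gender = c("Male", "Female", "Male", "Male"),
  Weight = c(1, 2, 3, 4)
)

res <- weight_table(toy, "Region", "Gender")
print(res)
```

Checking the names up front means a mistyped input fails with a clear error instead of an obscure one from inside dcast.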

2) xtabs xtabs is part of base R:

fo <- sprintf("Weight ~ %s + %s", input.row, input.col)
xtabs(fo, DT)

giving:

          Gender
Region        Female Gender diverse      Male
  Africa   32.950187       3.222125 77.508626
  Americas 49.127873       0.000000 84.972137
  Asia     41.048787       0.000000 55.432941
  Europe   45.394693       4.296767 47.767138
  Oceania  65.891983       1.439075 72.274955
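
xtabs returns a table object rather than a data frame. If a plain data frame is more convenient downstream, one possible follow-up (not part of the original answer) is as.data.frame.matrix, which keeps the wide layout and moves the row variable into the row names. The toy data below is invented for illustration.

```r
# Toy data, invented for this example.
toy <- data.frame(
  Region = c("Asia", "Asia", "Europe"),
  Gender = c("Male", "Female", "Male"),
  Weight = c(1, 2, 3)
)

input.row <- "Region"
input.col <- "Gender"

# Build the formula from the user-supplied names, as in the answer.
fo <- as.formula(sprintf("Weight ~ %s + %s", input.row, input.col))
tab <- xtabs(fo, toy)

# Convert the two-way table to a wide data frame; the row variable
# ends up in the row names rather than in a column.
wide <- as.data.frame.matrix(tab)
print(wide)
```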

3) reshape::cast We use the reshape package here because the question does, but note that it has been superseded by the reshape2 package, where one would use dcast; dcast is also implemented in data.table, as shown in (1).

cast(DT, paste(input.row, "~", input.col), sum, value = "Weight")

giving:

    Region   Female Gender diverse     Male
1   Africa 32.95019       3.222125 77.50863
2 Americas 49.12787       0.000000 84.97214
3     Asia 41.04879       0.000000 55.43294
4   Europe 45.39469       4.296767 47.76714
5  Oceania 65.89198       1.439075 72.27496

4) tapply

tapply(DT$Weight, as.list(DT)[c(input.row, input.col)], sum, default = 0)

giving:

          Gender
Region       Female Gender diverse     Male
  Africa   32.95019       3.222125 77.50863
  Americas 49.12787       0.000000 84.97214
  Asia     41.04879       0.000000 55.43294
  Europe   45.39469       4.296767 47.76714
  Oceania  65.89198       1.439075 72.27496
