R: pivoting & subtotals in data.table?

Pivoting and subtotals are common auxiliary steps in spreadsheets and SQL.

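Suppose a data.table has the fields date, myCategory and revenue, and you want each day's revenue as a proportion of all revenue, as well as the proportion of the day's revenue within different subgroups, something like: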

 b[,{
    #First auxiliary variable of all revenue
    totalRev = sum(revenue)                     #SUBGROUP OF ALL REV

    #Second auxiliary variable of revenue by date, syntax wrong! How to do this?
    {totalRev_date=sum(revenue), by=list(date)} #DIFFERENT SUBGROUP, by DATE's rev

    #Within the subgroup by date and myCategory, we will use 1st&2nd auxiliary vars
    .SD[,.(Revenue_prop_of_TOT=revenue/totalRev,
          ,Revenue_prop_of_DAY=revenue/totalRev_date)    ,by=list(myCategory,date)]
    },]

We need to compute the auxiliary sums: all revenue for a particular day and all revenue over the whole history.

The end result should look like this:

date            myCategory       Revenue_prop_of_TOT         Revenue_prop_of_DAY
2019-01-01      Cat1             0.002                       0.2
...

There you can see that the auxiliary variables are only helpers; they do not appear in the result.

How can you pivot and calculate subtotals in R data.table?

Hopefully I understood your intent correctly; if you need a different output, let me know in the comments.

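library(data.table)

# 100 consecutive days x 2 categories = 200 rows of simulated revenue
b = data.table(date = rep(seq.Date(Sys.Date()-99, Sys.Date(), by="days"), each=2),
               myCategory = rep(c("a", "b"), times=100),
               revenue = rnorm(200, mean=200))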


# global total, just create a constant
totalRev = b[, sum(revenue)]

# Total revenue at myCategory and date level / total Revenue
b[, Revenue_prop_of_TOT:=sum(revenue)/totalRev, by=.(myCategory, date)]

# you can calculate totalRev_date independently
b[, totalRev_date:=sum(revenue), by=date]

# If these are all the columns you have you don't need the sum(revenue) and by calls
b[, Revenue_prop_of_DAY:=sum(revenue)/totalRev_date, by=.(myCategory, date)]

Finally, I wrapped it in a function.

revenue_total <- function(b){ 
  totalRev = b[, sum(revenue)]
  b[, Revenue_prop_of_TOT:=sum(revenue)/totalRev, by=.(myCategory, date)]
  b[, totalRev_date:=sum(revenue), by=date]
  b[, Revenue_prop_of_DAY:=sum(revenue)/totalRev_date, by=.(myCategory, date)]
  b
}

b = revenue_total(b)
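
For comparison, here is a minimal sketch of the same idea that aggregates first and then derives both proportions from the aggregated table, so no helper column is left behind. The table name `sales` and its revenue values are made up for illustration; only the column names come from the question.

```r
library(data.table)

# Hypothetical sample data: 3 days x 2 categories, with made-up revenue values.
sales <- data.table(
  date       = rep(as.Date("2019-01-01") + 0:2, each = 2),
  myCategory = rep(c("a", "b"), times = 3),
  revenue    = c(10, 30, 20, 20, 5, 15)
)

# One row per (date, myCategory) with its summed revenue.
props <- sales[, .(revenue = sum(revenue)), by = .(date, myCategory)]

# No `by`: sum(revenue) here is the grand total over all rows.
props[, Revenue_prop_of_TOT := revenue / sum(revenue)]

# by = date: sum(revenue) here is that day's subtotal.
props[, Revenue_prop_of_DAY := revenue / sum(revenue), by = date]

print(props)
```

Because the second and third steps only differ in their `by`, the "auxiliary variables" from the question never need to be stored: the grouping decides which subtotal `sum(revenue)` refers to.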

Another option, using data.table::cube:

cb <- cube(DT, sum(value), by=c("date","category"), id=TRUE)

cb[grouping==0L, .(date, category,

    PropByDate = V1 / cb[grouping==1L][.SD, on="date", x.V1],

    PropByCategory = V1 / cb[grouping==2L][.SD, on="category", x.V1],

    PropByTotal = V1 / cb[grouping==3L, V1]
)]

Output:

   date category PropByDate PropByCategory PropByTotal
1:    1        1  0.3333333      0.2500000         0.1
2:    1        2  0.6666667      0.3333333         0.2
3:    2        1  0.4285714      0.7500000         0.3
4:    2        2  0.5714286      0.6666667         0.4

Data:

DT <- data.table(date=c(1, 1, 2, 2), category=c(1, 2, 1, 2), value=1:4)

#   date category value
#1:    1        1     1
#2:    1        2     2
#3:    2        1     3
#4:    2        2     4
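
The filters on `grouping` in the cube answer are easier to follow once the subtotal rows are inspected directly. The sketch below rebuilds the same `DT` and names the aggregate `total` (a name chosen here for readability instead of the default `V1`); with `id=TRUE` the `grouping` column is a bit mask of which `by` columns were rolled up.

```r
library(data.table)

DT <- data.table(date = c(1, 1, 2, 2), category = c(1, 2, 1, 2), value = 1:4)

# cube() returns the detail rows plus every subtotal combination of the by columns.
cb <- cube(DT, .(total = sum(value)), by = c("date", "category"), id = TRUE)

cb[grouping == 0L]  # detail rows: one per (date, category)
cb[grouping == 1L]  # subtotals per date (category rolled up, shown as NA)
cb[grouping == 2L]  # subtotals per category (date rolled up, shown as NA)
cb[grouping == 3L]  # grand total (both columns rolled up)
```

Rolled-up columns are filled with `NA`, which is why the answer joins the subtotal rows back to the detail rows on the one column that is still populated.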

Options for pivoting and subtotals in R

  1. The cube answer

  2. marbel's answer here

  3. Grouping sets, from the comments
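
The grouping-sets option is not spelled out above, so here is a rough sketch of how it might look with `data.table::groupingsets`, reusing the `DT` from the cube answer. The choice of sets and the names `detail`, `by_date`, `grand` and `res` are assumptions made for illustration, not something taken from the original comments.

```r
library(data.table)

DT <- data.table(date = c(1, 1, 2, 2), category = c(1, 2, 1, 2), value = 1:4)

# Request only the aggregation levels that are needed:
# the detail level, the per-date subtotal, and the grand total.
gs <- groupingsets(
  DT,
  j    = .(total = sum(value)),
  by   = c("date", "category"),
  sets = list(c("date", "category"), "date", character()),
  id   = TRUE
)

detail  <- gs[grouping == 0L, .(date, category, total)]
by_date <- gs[grouping == 1L, .(date, date_total = total)]
grand   <- gs[grouping == 3L, total]

# Attach each day's subtotal to its detail rows, then compute the proportions.
res <- by_date[detail, on = "date"]
res[, `:=`(PropByDate = total / date_total, PropByTotal = total / grand)]

print(res)
```

Compared with `cube`, this computes only the listed sets, which can matter when there are many `by` columns and most subtotal combinations are not needed.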