R relative frequencies organized by Group

Question

I have the following data frame:

> LikelyRenew_ReasonB %>%
    +   mutate(cum_Sum = ave(freq,Name,FUN = cumsum))
          Name x freq cum_Sum
 1       costC 1   10      10
 2       costC 2   11      21
 3       costC 3   17      38
 4       costC 4  149     187
 5   productsC 1   31      31
 6   productsC 2   40      71
 7   productsC 3   30     101
 8   productsC 4   86     187
 9     communC 1   51      51
 10    communC 2   50     101
 11    communC 3   34     135
 12    communC 4   52     187
 13 reimburseC 1   42      42
 14 reimburseC 2   26      68
 15 reimburseC 3   25      93
 16 reimburseC 4   94     187
 17    policyC 1   31      31
 18    policyC 2   25      56
 19    policyC 3   28      84
 20    policyC 4  103     187
 21  discountC 1    2       2
 22  discountC 2    2       4
 23  discountC 3    3       7
 24  discountC 4  180     187

The variables look like this:

> dput(head(LikelyRenew_ReasonB))
structure(list(Name = c("costC", "costC", "costC", "costC", "productsC", 
"productsC"), x = c(1, 2, 3, 4, 1, 2), freq = c(10L, 11L, 17L, 
149L, 31L, 40L)), .Names = c("Name", "x", "freq"), row.names = c(NA, 
6L), class = "data.frame")

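For each group, I'm trying to get the relative frequency of each frequency score, and then the sum of those relative frequencies for the group. Below is a sample of what I'm looking for: the first three rows are each row's freq divided by cum_Sum[x==4], expressed as a percentage. The last row should be the sum of those 3 rows.

Is this possible? I'm completely stumped.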

          Name x freq cum_Sum    IdealOutput   *how I calculated IdealOutput
 1       costC 1   10      10         5.35       (10/187)
 2       costC 2   11      21         5.88       (11/187)
 3       costC 3   17      38         9.09       (17/187) 
 4       costC 4  149     187         20.32      (sum of above 3 values)

Answer 1

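You can try using dplyr::lag on cum_Sum to calculate IdealOutput for the last row of each group.

The last row of a group can be found with the condition row_number() == n(). Note that the result below is a proportion (0.0535) rather than a percentage (5.35).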
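To see what those two helpers do in isolation, here is a minimal sketch on a toy data frame. The names `toy`, `g`, `v` and `toy_out` are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical toy data: a single group with three rows
toy <- data.frame(g = c("a", "a", "a"), v = c(1L, 2L, 3L))

toy_out <- toy %>%
  group_by(g) %>%
  mutate(
    running      = cumsum(v),            # running total within the group
    prev_running = lag(running),         # previous row's running total (NA on the first row)
    is_last      = row_number() == n()   # TRUE only on the last row of the group
  ) %>%
  ungroup()

print(toy_out)
```

On the last row, `prev_running` holds the sum of every earlier row in the group, which is the numerator the question asks for.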

library(dplyr)

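LikelyRenew_ReasonB %>% group_by(Name) %>%
  # sort so that the row with the highest x is the last row of each group
  arrange(Name, x) %>%
  # running total of freq within each group
  mutate(cum_Sum = cumsum(freq)) %>%
  # last row: sum of all earlier rows / group total; other rows: freq / group total
  mutate(IdealOutput = ifelse(row_number() == n(), 
                       lag(cum_Sum)/sum(freq), freq/sum(freq))) 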

#   # A tibble: 6 x 5
#   # Groups: Name [2]
#   Name          x  freq cum_Sum IdealOutput
# <chr>     <dbl> <int>   <int>       <dbl>
# 1 costC      1.00    10      10      0.0535
# 2 costC      2.00    11      21      0.0588
# 3 costC      3.00    17      38      0.0909
# 4 costC      4.00   149     187      0.203 
# 5 productsC  1.00    31      31      0.437 
# 6 productsC  2.00    40      71      0.437
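The sample in the question shows percentages rounded to two decimals, so a small variation may be closer to what was asked. This is only a sketch under that assumption: it multiplies by 100 and rounds, and it swaps in `if_else()` for `ifelse()`. The name `pct` is introduced here for illustration.

```r
library(dplyr)

# Same six rows as the dput() output in the question
LikelyRenew_ReasonB <- data.frame(
  Name = c("costC", "costC", "costC", "costC", "productsC", "productsC"),
  x    = c(1, 2, 3, 4, 1, 2),
  freq = c(10L, 11L, 17L, 149L, 31L, 40L),
  stringsAsFactors = FALSE
)

pct <- LikelyRenew_ReasonB %>%
  arrange(Name, x) %>%
  group_by(Name) %>%
  mutate(
    cum_Sum = cumsum(freq),
    # proportion: earlier rows summed for the last row, own freq otherwise
    IdealOutput = if_else(row_number() == n(),
                          lag(cum_Sum) / sum(freq),
                          freq / sum(freq)),
    # assumption: the asker wants percentages with two decimals
    IdealOutput = round(100 * IdealOutput, 2)
  ) %>%
  ungroup()

print(pct)
```

One caveat: a group with a single row has no previous row, so `lag(cum_Sum)` is NA there and IdealOutput comes out as NA.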

Data:

LikelyRenew_ReasonB  <- structure(list(Name = c("costC", "costC", "costC", "costC", "productsC", 
"productsC"), x = c(1, 2, 3, 4, 1, 2), freq = c(10L, 11L, 17L, 
149L, 31L, 40L)), .Names = c("Name", "x", "freq"), row.names = c(NA, 
6L), class = "data.frame")
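Since the question already uses ave(), the same result can probably be reached without dplyr. The following is a base R sketch, not the answer's method: it relies on the fact that the sum of all rows except the last equals the group total minus the last freq. The names `df`, `total`, `last_freq` and `is_last` are introduced here for illustration.

```r
# Same six rows as above, rebuilt so the block runs on its own
df <- data.frame(
  Name = c("costC", "costC", "costC", "costC", "productsC", "productsC"),
  x    = c(1, 2, 3, 4, 1, 2),
  freq = c(10L, 11L, 17L, 149L, 31L, 40L),
  stringsAsFactors = FALSE
)

# Sort so the highest x is the last row of each group
df <- df[order(df$Name, df$x), ]

# Group total and the last freq of each group, repeated on every row of the group
total     <- ave(df$freq, df$Name, FUN = sum)
last_freq <- ave(df$freq, df$Name, FUN = function(v) v[length(v)])

# TRUE on the last row of each group
is_last <- !duplicated(df$Name, fromLast = TRUE)

# Last row: (total - last freq) / total; other rows: freq / total
df$IdealOutput <- ifelse(is_last, (total - last_freq) / total, df$freq / total)

print(df)
```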
