New variable within groups: the sum for each observation

Question

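I have trade data with year, month, commodity and quantity. I want to create the total quantity, X_total, for each commodity, month and year, and display it as a new variable in which every observation in the group has the same value.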

For example:

I have:

Year Month Commodity X_Quantity
2010   1     apples    10
2010   1     bananas    5 
2010   2     apples     9 
2010   2     bananas    4

What I want to see is:

Year Month Commodity X_Quantity X_total
2010   1     apples     10        15
2010   1     bananas     5        15
2010   2     apples      9        13
2010   2     bananas     4        13

My code so far looks like this:

library(dplyr)

totals <- original.data[c("Year", "Month", "Commodity", "X_Quantity")] %>%
  group_by(Year, Month, Commodity) %>%
  summarise(X_total = sum(X_Quantity)) %>%
  arrange(Year, Month, desc(X_total)) %>%
  ungroup()
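summarise() returns one row per group, so X_Quantity and the original rows are not kept in `totals`. Also, the desired X_total sums over commodities, so the grouping has to be Year and Month only. If you want to keep a summarise-style step, one option is to compute the totals per Year and Month and then join them back onto the original rows. The sketch below illustrates that idea in base R (aggregate() standing in for summarise(), merge() doing the join); the data frame name `trade` is hypothetical and holds the example values from the tables above.

```r
# Hypothetical name `trade`: the example data from the question
trade <- data.frame(
  Year = c(2010L, 2010L, 2010L, 2010L),
  Month = c(1L, 1L, 2L, 2L),
  Commodity = c("apples", "bananas", "apples", "bananas"),
  X_Quantity = c(10L, 5L, 9L, 4L)
)

# Collapse to one row per Year/Month (the role summarise() plays)
totals <- aggregate(X_Quantity ~ Year + Month, data = trade, FUN = sum)
names(totals)[names(totals) == "X_Quantity"] <- "X_total"

# Join the group totals back so every original row carries its group's total
with_totals <- merge(trade, totals, by = c("Year", "Month"))
with_totals$X_share <- with_totals$X_Quantity / with_totals$X_total

print(with_totals)
```

The join step is what distinguishes this from the code in the question: the collapsed totals are reattached to the full data instead of replacing it.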

I have been using mutate to create the earlier variables.

I want to keep the X_Quantity variable and eventually create an X_share variable by dividing each commodity's quantity by X_total.

I hope this makes sense. Please forgive any posting mistakes (this is my first post).

Thanks in advance.

Answer 1

Try this. You need to group by Year and Month only (not by Commodity) to get the expected output. Here is the code:

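library(dplyr)
# Group by Year and Month only; mutate() keeps every row and
# repeats the group total on each row of that group
newdf <- Totals %>%
  group_by(Year, Month) %>%
  mutate(X_total = sum(X_Quantity),
         X_share = X_Quantity / X_total)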

Output:

# A tibble: 4 x 6
# Groups:   Year, Month [2]
   Year Month Commodity X_Quantity X_total X_share
  <int> <int> <chr>          <int>   <int>   <dbl>
1  2010     1 apples            10      15   0.667
2  2010     1 bananas            5      15   0.333
3  2010     2 apples             9      13   0.692
4  2010     2 bananas            4      13   0.308
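For comparison, base R's ave() can produce the same X_total and X_share columns without dplyr. It applies FUN within each Year/Month combination and returns a vector as long as its input, so every row receives its group's value, much like mutate() after group_by(). This is a swapped-in base R technique, not part of the answer above; `trade` is a hypothetical name for the example data.

```r
# Hypothetical name `trade`: the same four example rows
trade <- data.frame(
  Year = c(2010L, 2010L, 2010L, 2010L),
  Month = c(1L, 1L, 2L, 2L),
  Commodity = c("apples", "bananas", "apples", "bananas"),
  X_Quantity = c(10L, 5L, 9L, 4L)
)

# ave() sums X_Quantity within each Year/Month group and recycles the
# result to every row of that group (FUN must be passed by name)
trade$X_total <- ave(trade$X_Quantity, trade$Year, trade$Month, FUN = sum)
trade$X_share <- trade$X_Quantity / trade$X_total

print(trade)
```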

Data used:

#Data
Totals <- structure(list(Year = c(2010L, 2010L, 2010L, 2010L), Month = c(1L, 
1L, 2L, 2L), Commodity = c("apples", "bananas", "apples", "bananas"
), X_Quantity = c(10L, 5L, 9L, 4L)), class = "data.frame", row.names = c(NA, 
-4L))
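The result above is still grouped by Year and Month (see the `# Groups:` line in the output), which can change how later verbs such as summarise() behave. Adding ungroup() at the end of the pipe avoids that. Alternatively, assuming dplyr 1.1.0 or newer, the `.by` argument of mutate() groups for that single call only and returns an ungrouped result. A minimal sketch with the same data:

```r
library(dplyr)

# Same example data as above
Totals <- data.frame(
  Year = c(2010L, 2010L, 2010L, 2010L),
  Month = c(1L, 1L, 2L, 2L),
  Commodity = c("apples", "bananas", "apples", "bananas"),
  X_Quantity = c(10L, 5L, 9L, 4L)
)

# .by (dplyr >= 1.1.0) groups only for this mutate() call;
# the returned data frame carries no grouping and keeps the row order
newdf <- Totals %>%
  mutate(X_total = sum(X_Quantity),
         X_share = X_Quantity / X_total,
         .by = c(Year, Month))

print(newdf)
```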


Tags: group-by, r, dplyr, summarize