How do I summarise by group and then multiply the result by another column?
This is my data frame. Country1 represents people living in Germany, and Country2 represents the country they lived in for 5 years before moving to Country1.
Country1 | Country2 | Weight | obs |
---|---|---|---|
Germany | Germany | 4 | 1 |
Germany | Germany | 119 | 2 |
France | Germany | 3 | 3 |
France | Germany | 2 | 4 |
Italy | France | 1 | 5 |
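Basically, I want to sum the Weight column for each combination and multiply that sum by the observation value (given by the obs column). For example, the first row is the Germany-to-Germany combination, so I want to sum the weights in the Weight column for that combination (119 + 4 = 123) and then multiply this sum by the corresponding obs value of that row (123 * 1 = 123). For the second row, the Germany sum is again 119 + 4 = 123, and this time it must be multiplied by that row's obs value (123 * 2 = 246). For the third row, the weight sum is 3 + 2 = 5, which is then multiplied by that row's obs value (5 * 3 = 15), and so on.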
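To pin down the rule described above, here is a deliberately literal base R sketch (not taken from any of the answers): for each row it sums Weight over all rows sharing that row's Country1/Country2 combination, then multiplies by that row's obs. It compares every row with every other row, so it is only meant to make the arithmetic explicit; the grouped answers below are the practical way to do it.

```r
# Sample data from the question
df <- data.frame(
  Country1 = c("Germany", "Germany", "France", "France", "Italy"),
  Country2 = c("Germany", "Germany", "Germany", "Germany", "France"),
  Weight   = c(4L, 119L, 3L, 2L, 1L),
  obs      = 1:5
)

# For each row i: sum Weight over all rows with the same Country1/Country2
# combination, then multiply that sum by row i's obs value
x <- sapply(seq_len(nrow(df)), function(i) {
  same_combo <- df$Country1 == df$Country1[i] & df$Country2 == df$Country2[i]
  sum(df$Weight[same_combo]) * df$obs[i]
})
x
# [1] 123 246  15  20   5
```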
The output I want is shown in column x and should look like this:
Country1 | Country2 | Weight | obs | x |
---|---|---|---|---|
Germany | Germany | 4 | 1 | 123 |
Germany | Germany | 119 | 2 | 246 |
France | Germany | 3 | 3 | 15 |
France | Germany | 2 | 4 | 20 |
Italy | France | 1 | 5 | 5 |
This is also the formula I tried to apply.
Try this:
library(dplyr)
# Sum Weight within each Country1 group, then multiply by each row's obs
new <- df %>%
  group_by(Country1) %>%
  mutate(x = sum(Weight) * obs)
Output:
# A tibble: 5 x 5
# Groups: Country1 [3]
Country1 Country2 Weight obs x
<chr> <chr> <int> <int> <int>
1 Germany Germany 4 1 123
2 Germany Germany 119 2 246
3 France Germany 3 3 15
4 France Germany 2 4 20
5 Italy France 1 5 5
Using this data:
#Data
df <- structure(list(
  Country1 = c("Germany", "Germany", "France", "France", "Italy"),
  Country2 = c("Germany", "Germany", "Germany", "Germany", "France"),
  Weight = c(4L, 119L, 3L, 2L, 1L),
  obs = 1:5
), class = "data.frame", row.names = c(NA, -5L))
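The title describes two steps: build a summary, then multiply. As an alternative to the grouped mutate() above, here is a base R sketch that follows those steps literally: aggregate() builds the per-combination totals, merge() attaches them back to each row, and the product is taken afterwards. The helper columns row_id and total_weight are illustrative names, not part of the original data. Grouping by both Country1 and Country2 follows the "each combination" wording; on this sample it gives the same x as grouping by Country1 alone, because each Country1 value appears with only one Country2.

```r
# Sample data from the question
df <- data.frame(
  Country1 = c("Germany", "Germany", "France", "France", "Italy"),
  Country2 = c("Germany", "Germany", "Germany", "Germany", "France"),
  Weight   = c(4L, 119L, 3L, 2L, 1L),
  obs      = 1:5
)

# Remember the original row order, because merge() sorts by the key columns
df$row_id <- seq_len(nrow(df))

# 1) Summary step: total Weight per Country1/Country2 combination
totals <- aggregate(Weight ~ Country1 + Country2, data = df, FUN = sum)
names(totals)[names(totals) == "Weight"] <- "total_weight"

# 2) Join the totals back onto every row of the original data
out <- merge(df, totals, by = c("Country1", "Country2"))
out <- out[order(out$row_id), ]
rownames(out) <- NULL

# 3) Multiply each row's group total by its own obs value
out$x <- out$total_weight * out$obs
out[, c("Country1", "Country2", "Weight", "obs", "x")]
```

The explicit row_id is what keeps the result in the original row order; without it, the rows come back sorted by the merge keys.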
We can use the data.table approach:
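library(data.table)
# Convert df1 to a data.table in place, add x by reference within each Country1 group,
# and use the trailing [] to print the result
setDT(df1)[, x := sum(Weight) * obs, by = Country1][]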
Output:
# Country1 Country2 Weight obs x
#1: Germany Germany 4 1 123
#2: Germany Germany 119 2 246
#3: France Germany 3 3 15
#4: France Germany 2 4 20
#5: Italy France 1 5 5
Or, using base R with ave:
df1$x <- with(df1, ave(Weight, Country1, FUN = sum) * obs)
Data:
df1 <- structure(list(
  Country1 = c("Germany", "Germany", "France", "France", "Italy"),
  Country2 = c("Germany", "Germany", "Germany", "Germany", "France"),
  Weight = c(4L, 119L, 3L, 2L, 1L),
  obs = 1:5
), class = "data.frame", row.names = c(NA, -5L))
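To show why the ave() one-liner works, this sketch prints its intermediate result: ave() returns a vector as long as Weight in which every element is replaced by FUN applied to that element's group, so the group total is already repeated on every row before the multiplication. ave() also accepts several grouping variables, which is one way to group by the full Country1/Country2 combination; the variable names group_totals and combo_totals are illustrative.

```r
df1 <- data.frame(
  Country1 = c("Germany", "Germany", "France", "France", "Italy"),
  Country2 = c("Germany", "Germany", "Germany", "Germany", "France"),
  Weight   = c(4L, 119L, 3L, 2L, 1L),
  obs      = 1:5
)

# One group total per row, aligned with the rows of df1
group_totals <- with(df1, ave(Weight, Country1, FUN = sum))
group_totals
# [1] 123 123   5   5   1

# Passing both columns makes each Country1/Country2 pair its own group
# (same totals on this data, since each Country1 has a single Country2)
combo_totals <- with(df1, ave(Weight, Country1, Country2, FUN = sum))

df1$x <- combo_totals * df1$obs
df1$x
# [1] 123 246  15  20   5
```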
You can also solve it like this:
df1$x <- tapply(df1$Weight, df1$Country1, sum)[df1$Country1] * df1$obs
Country1 Country2 Weight obs x
1 Germany Germany 4 1 123
2 Germany Germany 119 2 246
3 France Germany 3 3 15
4 France Germany 2 4 20
5 Italy France 1 5 5
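The tapply() one-liner packs two steps together; this sketch splits them up. tapply() returns one total per group, named by the group label, and indexing that result with the Country1 column looks each row's total up by name. This relies on Country1 being character (the data.frame() default since R 4.0, made explicit here with stringsAsFactors = FALSE): if it were a factor, `[` would index by the factor's integer codes instead of by name, and wrapping it in as.character() is the safe choice. The names totals and per_row are illustrative.

```r
df1 <- data.frame(
  Country1 = c("Germany", "Germany", "France", "France", "Italy"),
  Country2 = c("Germany", "Germany", "Germany", "Germany", "France"),
  Weight   = c(4L, 119L, 3L, 2L, 1L),
  obs      = 1:5,
  stringsAsFactors = FALSE
)

# Step 1: one total per Country1 value, named by that value
totals <- tapply(df1$Weight, df1$Country1, sum)
totals

# Step 2: look up each row's total by name so it lines up with the rows of df1
per_row <- totals[df1$Country1]

# as.vector() drops the names/array attributes before multiplying by obs
df1$x <- as.vector(per_row) * df1$obs
df1$x
# [1] 123 246  15  20   5
```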