Insert a new observation per group that is a sum (or weighted sum) in R
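I'm still new to R, and many things are still hard for me to do. The community here has been very helpful! I have another question.
1. Create a new observation for each group that is the sum (or weighted sum) of some variables
2. Create a weighted sum for a variable that sometimes contains NA
My dataset: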
df = structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
    ID_name = c("AA", "AA", "BB", "BB", "CC", "CC", "DD", "DD", "DD"),
    Volume = c(10L, 20L, 30L, 50L, 50L, 40L, 20L, 30L, 10L),
    Score = c(0.1, 0.3, 0.5, NA, 0.6, NA, 0.6, 0.2, 0.6)),
    .Names = c("ID", "ID_name", "Volume", "Score"), class = "data.frame", row.names = c(NA, -9L))
I want to:
1. Create a new observation for each unique ID, i.e. ID 1, ID 2, ID 3 and ID 4
2. These new observations should look like this:
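ID  ID_name  Volume          Score (weighted average)
1   AA       30 (10+20)      (10*0.1+20*0.3)/(10+20) = 0.23
2   BB       80 (30+50)      (30*0.5)/30 = 0.5 (the NA row is ignored in the score calculation)
3   CC       90 (50+40)      (50*0.6)/50 = 0.6 (the NA row is ignored in the score calculation)
4   DD       60 (20+30+10)   (20*0.6+30*0.2+10*0.6)/60 = 0.4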
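The arithmetic for a single group can be checked by hand in base R. This is only a minimal sketch for ID 2, assuming that a row with a missing Score should drop out of both the numerator and the denominator while still counting toward the Volume total. The variable names are illustrative and not part of the dataset.

```r
# Volumes and scores for ID 2 ("BB"); the second row has no score
volume <- c(30, 50)
score  <- c(0.5, NA)

# Flag the rows where a score is present
has_score <- !is.na(score)

# Weighted average over scored rows only: (30 * 0.5) / 30
wa <- sum(volume[has_score] * score[has_score]) / sum(volume[has_score])

# stats::weighted.mean() on the same filtered rows should agree
wa_check <- weighted.mean(score[has_score], w = volume[has_score])

# Every row counts toward the volume total: 30 + 50
total_volume <- sum(volume)
```

Here `wa` and `wa_check` both come out as 0.5 and `total_volume` as 80, matching the ID 2 row in the table.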
I tried the mutate function, but it doesn't seem to work. Any pointers would be much appreciated.
Thanks
library(dplyr)
df = data.frame(ID = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
                ID_name = c("AA", "AA", "BB", "BB", "CC", "CC", "DD", "DD", "DD"),
                Volume = c(10L, 20L, 30L, 50L, 50L, 40L, 20L, 30L, 10L),
                Score = c(0.1, 0.3, 0.5, NA, 0.6, NA, 0.6, 0.2, 0.6))
df %>%
  # HasScore is 1 where Score is present and 0 where it is NA
  mutate(HasScore = ifelse(is.na(Score), 0, 1)) %>%
  group_by(ID, ID_name) %>%
  # Numerator drops NA products; denominator counts Volume only for scored rows
  summarise(WA = sum(Volume * Score, na.rm = TRUE) / sum(Volume * HasScore),
            Volume = sum(Volume)) %>%
  ungroup()
# # A tibble: 4 x 4
# ID ID_name WA Volume
# <int> <fctr> <dbl> <int>
# 1 1 AA 0.2333333 30
# 2 2 BB 0.5000000 80
# 3 3 CC 0.6000000 90
# 4 4 DD 0.4000000 60
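The summary above returns one row per ID but does not add those rows back to the original data, which is what the question title asks for. A possible way to do that is sketched below in base R, so it runs without dplyr. It assumes the summary rows should reuse the Score column for the weighted average, and it relies on `weighted.mean(..., na.rm = TRUE)` dropping the weights that belong to NA scores. The names `summary_rows` and `combined` are made up for this illustration.

```r
df <- data.frame(
  ID      = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  ID_name = c("AA", "AA", "BB", "BB", "CC", "CC", "DD", "DD", "DD"),
  Volume  = c(10L, 20L, 30L, 50L, 50L, 40L, 20L, 30L, 10L),
  Score   = c(0.1, 0.3, 0.5, NA, 0.6, NA, 0.6, 0.2, 0.6),
  stringsAsFactors = FALSE
)

# One summary row per ID: total Volume and Volume-weighted Score over non-NA rows
summary_rows <- do.call(rbind, lapply(split(df, df$ID), function(g) {
  data.frame(
    ID      = g$ID[1],
    ID_name = g$ID_name[1],
    Volume  = sum(g$Volume),
    Score   = weighted.mean(g$Score, g$Volume, na.rm = TRUE),
    stringsAsFactors = FALSE
  )
}))

# Append the summary rows to the original observations, then sort by ID
# so each summary row follows the rows it was computed from
combined <- rbind(df, summary_rows)
combined <- combined[order(combined$ID), ]
rownames(combined) <- NULL
```

With this data `combined` has 13 rows: the 9 original observations plus 4 summary rows. A group whose scores are all NA would get NaN as its weighted average, so that case may need separate handling.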