cumsum and product based on unique ID

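I am working with a large dataset in R and need to compute a single value from it. I believe cumsum and a cumulative product would work, but I don't know how to go about it.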

county_id <- c(1,1,1,1,2,2,2,3,3)
res <- c(2,3,2,4,2,4,3,3,2)

I need a function that simply gives me a single value, as follows. For each county_id, I need a total. For example, for county_id = 1, the total for res is calculated by hand as

2(3+2+4)+3(2+4)+2(4)

For county_id = 2, the total for res is calculated by hand as

2(4+3)+4(3)

For county_id = 3, the total for res is calculated by hand as

3(2)

All of these are then added up into a single variable:

44+26+6=76

Note that my county_id runs from 1:47, and each county_id can have up to 200 res values.

Thanks

Here is one way to do this using tidyverse functions.

For each county_id, we multiply the current res value by the sum of the res values that come after it.

library(dplyr)
library(purrr)

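# For each row, multiply res by the sum of the res values after it.
# For the last row of a group, (.x + 1):n() runs past the end, so the
# product is NA; na.rm = TRUE drops it from the sum.
df1 <- df %>%
         group_by(county_id) %>%
         summarise(result = sum(map_dbl(row_number(),
                           ~res[.x] * sum(res[(.x + 1):n()])), na.rm = TRUE))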

df1
#  county_id result
#      <dbl>  <dbl>
#1         1     44
#2         2     26
#3         3      6

To get the overall total, you can do:

sum(df1$result)
#[1] 76

Data

county_id <- c(1,1,1,1,2,2,2,3,3)
res <- c(2,3,2,4,2,4,3,3,2)
df <- data.frame(county_id, res)

You can use aggregate together with cumsum, for example:

x <- aggregate(res, list(county_id)
 , function(x) sum(rev(cumsum(rev(x[-1])))*x[-length(x)]))
#Group.1  x
#1       1 44
#2       2 26
#3       3  6
sum(x[,2])
#[1] 76
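
To see why the `rev(cumsum(rev(...)))` expression works, it can help to walk through it for a single group. The sketch below uses the `res` values for `county_id` 1 from the question; the names `suffix_sum` and `total` are made up for illustration.

```r
# res values for county_id == 1, taken from the question
x <- c(2, 3, 2, 4)

# Sum of all values that come after each position (except the last):
# x[-1] is 3 2 4; reversing, taking cumsum, and reversing back gives 9 6 4
suffix_sum <- rev(cumsum(rev(x[-1])))

# Multiply each value (except the last) by the sum of the values after it
total <- sum(x[-length(x)] * suffix_sum)
total
#[1] 44
```

This matches the hand calculation 2(3+2+4)+3(2+4)+2(4) from the question.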

You can sum the products of all pairwise combinations:

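library(dplyr)

# dat is not defined in this answer; assumed to be built from the question's vectors
dat <- data.frame(county_id, res)

dat %>%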
  group_by(county_id) %>%
  summarise(x = sum(combn(res, 2, FUN = prod)))

# A tibble: 3 x 2
  county_id     x
      <dbl> <dbl>
1         1    44
2         2    26
3         3     6

Base R:

aggregate(res ~ county_id, dat, FUN = function(x) sum(combn(x, 2, FUN = prod)))
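
The sum of all pairwise products can also be written in closed form, because sum(x)^2 equals sum(x^2) plus twice the sum of x[i] * x[j] over all pairs i < j. A minimal sketch using that identity follows; `pair_sum` is a hypothetical helper name, and `dat` is assumed to be the data frame built from the question's vectors.

```r
# Data from the question (dat is assumed to be built this way)
county_id <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
res <- c(2, 3, 2, 4, 2, 4, 3, 3, 2)
dat <- data.frame(county_id, res)

# Hypothetical helper: sum of x[i] * x[j] over all pairs i < j,
# using (sum(x)^2 - sum(x^2)) / 2
pair_sum <- function(x) (sum(x)^2 - sum(x^2)) / 2

by_county <- aggregate(res ~ county_id, dat, FUN = pair_sum)
by_county$res
#[1] 44 26  6
sum(by_county$res)
#[1] 76
```

This version avoids enumerating every pair, which may matter with up to 200 values per county, and it returns 0 for a county with a single value, a case where `combn(x, 2)` may not behave as one might expect.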

Another option is to use SPSS syntax:

* You need to count the number of variables with valid responses.
count x1=var1 to var4(1 thr hi).
execute.

* First, declare a variable that will hold your cumulative sum.
* Declare your variables as a vector.
* You then loop twice: the first loop runs from the first variable to the number of variables with data (x1).
* The second loop runs from the first variable to the variable at (first loop index - 1), for all variables with data.
* Lastly, accumulate the sum according to your formula.
* This syntax can be replicated in other software.

compute index1=0.
vector x=var1 to var4.
loop #i=1 to x1.
loop #j=1 to #i-1 if not missing(x(#i)).
compute index1=index1+(x(#j)*sum(x(#i))).
end loop.
end loop.
execute.
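
As the comments note, the same double loop can be reproduced in other software. A rough R equivalent for one set of values might look like the following; `pairwise_total` is a hypothetical name, and the sketch assumes the values sit in a plain numeric vector rather than in SPSS variables `var1` to `var4`.

```r
# Hypothetical helper mirroring the nested SPSS loops:
# for each position i, add x[j] * x[i] for every earlier position j
pairwise_total <- function(x) {
  x <- x[!is.na(x)]          # keep only valid responses
  index1 <- 0                # running cumulative sum
  for (i in seq_along(x)) {
    for (j in seq_len(i - 1)) {
      index1 <- index1 + x[j] * x[i]
    }
  }
  index1
}

pairwise_total(c(2, 3, 2, 4))
#[1] 44
```

Applied per county (for example with `tapply(res, county_id, pairwise_total)`), this should give the same per-county totals as the other answers, though the explicit loops will be slower than the vectorized versions.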