cumsum and product based on Unique ID
I'm working with a large dataset to compute a single value in R. I believe cumsum and a cumulative product should work, but I don't know how to do it.
county_id <- c(1,1,1,1,2,2,2,3,3)
res <- c(2,3,2,4,2,4,3,3,2)
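I need a function that simply gives me one value for each county_id, as shown below, and then the overall total.
For example, for county_id = 1 the total for res is calculated by hand as
2(3+2+4) + 3(2+4) + 2(4)
For county_id = 2 it is
2(4+3) + 4(3)
For county_id = 3 it is
3(2)
These are then summed into a single value:
44 + 26 + 6 = 76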
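Written out, the hand calculation multiplies each value by the sum of the values after it within the same county. With $x_1, \dots, x_n$ denoting the res values of one county and $S = \sum_k x_k$ (notation introduced here for illustration), the inner sum is the group total minus a running cumsum, and the whole expression equals the sum of all pairwise products. This is why a cumsum-based approach and a pairwise-combination approach give the same result:

```latex
T = \sum_{i=1}^{n} x_i \sum_{j=i+1}^{n} x_j
  = \sum_{i=1}^{n} x_i \Bigl(S - \sum_{k=1}^{i} x_k\Bigr)
  = \sum_{1 \le i < j \le n} x_i x_j
  = \frac{S^2 - \sum_{i=1}^{n} x_i^2}{2}
```

For county_id = 1: S = 11 and the sum of squares is 4 + 9 + 4 + 16 = 33, so T = (121 - 33) / 2 = 44.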
Note that county_id runs from 1 to 47, and each county_id can have up to 200 res values.
Thanks
Here is one way to do this with tidyverse functions. For each county_id, we multiply the current res value by the sum of the res values that follow it.
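library(dplyr)
library(purrr)
# For each row .x, multiply res[.x] by the sum of the res values after it.
# For the last row, (.x + 1):n() indexes past the end and yields NA,
# which na.rm = TRUE then drops.
df1 <- df %>%
  group_by(county_id) %>%
  summarise(result = sum(map_dbl(row_number(),
                                 ~res[.x] * sum(res[(.x + 1):n()])), na.rm = TRUE))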
df1
# county_id result
# <dbl> <dbl>
#1 1 44
#2 2 26
#3 3 6
To get the overall total, you can do:
sum(df1$result)
#[1] 76
Data
county_id <- c(1,1,1,1,2,2,2,3,3)
res <- c(2,3,2,4,2,4,3,3,2)
df <- data.frame(county_id, res)
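The per-row `map_dbl()` loop can also be vectorized with `cumsum`: for row i, the sum of the values after it is the group total minus the running total up to and including i, so the last row is multiplied by 0 and no `na.rm` is needed. A minimal base R sketch, assuming the `county_id`/`res` vectors from the question; the helper name `after_sum_total` is hypothetical:

```r
# Data from the question
county_id <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
res <- c(2, 3, 2, 4, 2, 4, 3, 3, 2)

# Hypothetical helper: each value times the sum of the values after it.
# sum(x) - cumsum(x) is the suffix sum excluding the current element,
# so the last element is multiplied by 0.
after_sum_total <- function(x) sum(x * (sum(x) - cumsum(x)))

# One value per county_id, then the overall total
per_county <- tapply(res, county_id, after_sum_total)
per_county
total <- sum(per_county)
total
```

`tapply()` returns one named value per county_id, so with 47 counties this gives 47 values in a single call.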
You can use aggregate together with cumsum, for example:
x <- aggregate(res, list(county_id),
               function(x) sum(rev(cumsum(rev(x[-1]))) * x[-length(x)]))
#Group.1 x
#1 1 44
#2 2 26
#3 3 6
sum(x[,2])
#[1] 76
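In the `aggregate()` version, `rev(cumsum(rev(x[-1])))` produces suffix sums: position k holds the sum of the k-th element of `x[-1]` and everything after it, which is exactly the sum of the values following the k-th element of `x[-length(x)]`. A small check on county 1's values (the standalone vector `x` is just for illustration):

```r
x <- c(2, 3, 2, 4)  # res values for county_id = 1

# Suffix sums of x[-1]: 3+2+4, 2+4, 4
suffix <- rev(cumsum(rev(x[-1])))
suffix            # 9 6 4

# Pair each value except the last with the sum of the values after it
x[-length(x)]     # 2 3 2
sum(suffix * x[-length(x)])  # 2*9 + 3*6 + 2*4 = 44
```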
You can sum the products of all pairwise combinations:
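library(dplyr)
# combn() needs at least two values (a single number is treated as seq_len()),
# so counties with one res value are given 0
df %>%
  group_by(county_id) %>%
  summarise(x = if (n() < 2) 0 else sum(combn(res, 2, FUN = prod)))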
# # A tibble: 3 x 2
#   county_id     x
#       <dbl> <dbl>
# 1         1    44
# 2         2    26
# 3         3     6
Base R:
aggregate(res ~ county_id, df,
          FUN = function(x) if (length(x) < 2) 0 else sum(combn(x, 2, FUN = prod)))
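With up to 200 values per county, `combn()` builds up to choose(200, 2) = 19,900 pairs per group. That is manageable, but the same total follows from the identity sum of pairwise products = ((sum x)^2 - sum(x^2)) / 2, which needs a single pass and also returns 0 for a county with one value. A sketch under those assumptions; the helper name `pair_product_sum` is hypothetical:

```r
# Hypothetical helper: sum of x[i] * x[j] over all pairs i < j.
# sum(x)^2 counts every pair twice plus each squared term once,
# so subtracting sum(x^2) and halving leaves the pairwise sum.
pair_product_sum <- function(x) (sum(x)^2 - sum(x^2)) / 2

df <- data.frame(county_id = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                 res       = c(2, 3, 2, 4, 2, 4, 3, 3, 2))

out <- aggregate(res ~ county_id, df, FUN = pair_product_sum)
out
sum(out$res)
```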
Another option is to use SPSS syntax. This assumes a wide layout, with one case per county and the res values stored in var1 to var4:
* Count the number of variables with valid responses.
count x1=var1 to var4(1 thr hi).
execute.
* Declare a variable that will hold the cumulative sum.
* Declare the variables as a vector.
* Loop twice: the outer loop runs from the first variable to x1, the number of variables with data.
* The inner loop runs from the first variable to the one before the outer loop's current variable.
* Add each product to the cumulative sum.
* The same logic can be replicated in other software.
compute index1=0.
vector x=var1 to var4.
loop #i=1 to x1.
loop #j=1 to #i-1 if not missing(x(#i)).
compute index1=index1+(x(#j)*x(#i)).
end loop.
end loop.
execute.
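As the SPSS comments note, the same double loop can be written in other software. A sketch in R, assuming the same wide layout: one row per county, columns `var1` to `var4` (names taken from the SPSS code), and NA padding where a county has fewer values. The rows below are the question's data rearranged into that layout, and the helper name `nested_loop_total` is hypothetical:

```r
# Wide layout: one row per county, unused slots are NA
wide <- data.frame(var1 = c(2, 2, 3),
                   var2 = c(3, 4, 2),
                   var3 = c(2, 3, NA),
                   var4 = c(4, NA, NA))

# Hypothetical helper mirroring the SPSS loops: the outer loop runs over
# the valid values, the inner loop over the values before the current one
nested_loop_total <- function(row) {
  x <- unname(row[!is.na(row)])  # drop NA padding and column names
  index1 <- 0
  for (i in seq_along(x)) {
    for (j in seq_len(i - 1)) {  # empty when i = 1
      index1 <- index1 + x[j] * x[i]
    }
  }
  index1
}

index1 <- apply(as.matrix(wide), 1, nested_loop_total)
index1       # 44 26 6
sum(index1)  # 76
```

The nested loop does O(n^2) work per county, which is fine for 200 values; the cumsum forms above do the same job in one pass.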