Stack-based column sum in a data frame using R
My existing data frame looks like this:
NAV_Date NAV Year Day Units Amount Balance_Units
2013-06-01 282.5 2013 Saturday 3.540 1000 3.540
2013-06-08 279.3 2013 Saturday 3.581 1000 3.581
2013-06-15 276.0 2013 Saturday 3.623 1000 3.623
2013-06-22 261.6 2013 Saturday 3.822 1000 3.822
2013-06-29 273.3 2013 Saturday 3.659 1000 3.659
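Using R, I want the Balance_Units column of my new data frame to contain the entries shown below.
That is, each Balance_Units value should be the previous balance plus the current value (a running total).
This needs to be done for a list of data frames.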
NAV_Date NAV Year Day Units Amount Balance_Units
2013-06-01 282.5 2013 Saturday 3.540 1000 3.540
2013-06-08 279.3 2013 Saturday 3.581 1000 7.121
2013-06-15 276.0 2013 Saturday 3.623 1000 10.744
2013-06-22 261.6 2013 Saturday 3.822 1000 14.566
2013-06-29 273.3 2013 Saturday 3.659 1000 18.225
I tried this, but it didn't work:
for( i in 1:length(W)) {
W[[i]]$Units = 1000/W[[i]]$NAV
W[[i]]$Amount = 1000
W[[i]]$Balance_Units = 0
W[[i]]$Balance_Units = W[[i]]$Units + W[[i]]$Balance_Units
}
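The loop above never accumulates anything. It sets `Balance_Units` to 0 and then adds `Units` to that 0, so every row ends up equal to `Units`. Below is a minimal corrected sketch that keeps the same loop structure. It assumes `W` is a list of data frames that each have a `NAV` column. The two-element `W` is made-up illustration data. The fix replaces the last two assignments with `cumsum()`:

```r
# Hypothetical example list: two small data frames with a NAV column
W <- list(
  data.frame(NAV = c(282.5, 279.3, 276.0)),
  data.frame(NAV = c(261.6, 273.3))
)

for (i in seq_along(W)) {
  # Units bought with a fixed amount of 1000 at each NAV
  W[[i]]$Units <- 1000 / W[[i]]$NAV
  W[[i]]$Amount <- 1000
  # Running total: each row is the previous balance plus the current units
  W[[i]]$Balance_Units <- cumsum(W[[i]]$Units)
}

print(W[[1]])
```

`seq_along(W)` is used instead of `1:length(W)` so that the loop body is skipped when `W` is an empty list.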
This can be done with `cumsum`, a handy function in base R:
df1$Balance_Units <- cumsum(df1$Balance_Units)
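`cumsum()` implements exactly the rule in the question: element *k* of the result is element *k* of the input plus element *k − 1* of the result. The sketch below writes that recurrence out as an explicit loop, using the `Balance_Units` values from the question. It then checks the loop against `cumsum()` and against base R's `Reduce()` with `accumulate = TRUE`:

```r
x <- c(3.540, 3.581, 3.623, 3.822, 3.659)

# Explicit recurrence: running[k] = running[k - 1] + x[k], with 0 before the first row
running <- numeric(length(x))
for (k in seq_along(x)) {
  prev <- if (k == 1) 0 else running[k - 1]
  running[k] <- prev + x[k]
}

print(running)
# Both comparisons should print TRUE
print(all.equal(running, cumsum(x)))
print(all.equal(running, Reduce(`+`, x, accumulate = TRUE)))
```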
With `dplyr`, the column can be created inside `mutate`:
library(dplyr)
df1 %>%
mutate(Balance_Units = cumsum(Balance_Units))
If 'W' is a `list` of `data.frame`s, we can use `lapply`:
W <- lapply(W, transform, Balance_Units = cumsum(Balance_Units))
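Below is a self-contained check of the `lapply()` + `transform()` pattern. It uses a hypothetical list that splits the question's five rows into two data frames. `transform()` evaluates `cumsum(Balance_Units)` inside each data frame separately. As a result, the running total restarts for every element of the list rather than continuing from one data frame into the next:

```r
# Hypothetical list: the question's rows split across two data frames
W <- list(
  data.frame(NAV = c(282.5, 279.3), Balance_Units = c(3.540, 3.581)),
  data.frame(NAV = c(276.0, 261.6, 273.3), Balance_Units = c(3.623, 3.822, 3.659))
)

# Each data frame gets its own running total, starting from its first row
W <- lapply(W, transform, Balance_Units = cumsum(Balance_Units))

print(W[[1]]$Balance_Units)
print(W[[2]]$Balance_Units)
```

If one continuous total across the whole list is wanted instead, the data frames would need to be combined first, for example with `do.call(rbind, W)`, before calling `cumsum()`.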
Data:
df1 <- structure(list(NAV_Date = c("2013-06-01", "2013-06-08", "2013-06-15",
"2013-06-22", "2013-06-29"), NAV = c(282.5, 279.3, 276, 261.6,
273.3), Year = c(2013L, 2013L, 2013L, 2013L, 2013L), Day = c("Saturday",
"Saturday", "Saturday", "Saturday", "Saturday"), Units = c(3.54,
3.581, 3.623, 3.822, 3.659), Amount = c(1000L, 1000L, 1000L,
1000L, 1000L), Balance_Units = c(3.54, 3.581, 3.623, 3.822, 3.659
)), class = "data.frame", row.names = c(NA, -5L))
Here is a `data.table` solution. I also added a sequential sum (each value plus the one immediately before it):
library(data.table)
dat <- as.data.table(df1)  # assumed: dat holds the df1 data shown above
> dat
NAV_Date NAV Year Day Units Amount Balance_Units
1: 2013-06-01 282.5 2013 Saturday 3.540 1000 3.540
2: 2013-06-08 279.3 2013 Saturday 3.581 1000 3.581
3: 2013-06-15 276.0 2013 Saturday 3.623 1000 3.623
4: 2013-06-22 261.6 2013 Saturday 3.822 1000 3.822
5: 2013-06-29 273.3 2013 Saturday 3.659 1000 3.659
# Cumulative sum
> dat[, cumulative_sum := cumsum(Balance_Units)]
> dat
NAV_Date NAV Year Day Units Amount Balance_Units cumulative_sum
1: 2013-06-01 282.5 2013 Saturday 3.540 1000 3.540 3.540
2: 2013-06-08 279.3 2013 Saturday 3.581 1000 3.581 7.121
3: 2013-06-15 276.0 2013 Saturday 3.623 1000 3.623 10.744
4: 2013-06-22 261.6 2013 Saturday 3.822 1000 3.822 14.566
5: 2013-06-29 273.3 2013 Saturday 3.659 1000 3.659 18.225
# Sequential sum
> dat[, sequential_sum := Balance_Units + shift(x = Balance_Units, fill = 0)]
> dat
NAV_Date NAV Year Day Units Amount Balance_Units cumulative_sum sequential_sum
1: 2013-06-01 282.5 2013 Saturday 3.540 1000 3.540 3.540 3.540
2: 2013-06-08 279.3 2013 Saturday 3.581 1000 3.581 7.121 7.121
3: 2013-06-15 276.0 2013 Saturday 3.623 1000 3.623 10.744 7.204
4: 2013-06-22 261.6 2013 Saturday 3.822 1000 3.822 14.566 7.445
5: 2013-06-29 273.3 2013 Saturday 3.659 1000 3.659 18.225 7.481
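Without `data.table`, the same two columns can be sketched in base R. The sequential sum adds each value to the one immediately before it, with 0 before the first row. It is computed by lagging the vector by one position manually, which mirrors `shift(x = Balance_Units, fill = 0)` with its default lag of 1:

```r
x <- c(3.540, 3.581, 3.623, 3.822, 3.659)

# Lag by one position, padding the first slot with 0 (like shift(fill = 0))
lagged <- c(0, head(x, -1))

sequential_sum <- x + lagged  # pairwise: current + previous
cumulative_sum <- cumsum(x)   # running total over all rows so far

print(data.frame(x, sequential_sum, cumulative_sum))
```

Only `cumulative_sum` matches the expected Balance_Units in the question; `sequential_sum` only looks back one row.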