data.table operation with .SD: calculating percentage change concisely
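I am trying to calculate some percentage changes concisely with data.table, but I am having trouble fully understanding how .SD operations work.
Say I have the following table: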
dt = structure(list(type = c("A", "A", "A", "B", "B", "B"), Year = c(2000L,
2005L, 2010L, 2000L, 2005L, 2010L), alpha = c(0.0364325563237498,
0.0401968159729988, 0.0357395587861466, 0.0317236054181487, 0.0328213742235379,
0.0294694430578336), beta = c(0.0364325563237498, 0.0401968159729988,
0.0357395587861466, 0.0317236054181487, 0.0328213742235379, 0.0294694430578336
)), .Names = c("type", "Year", "alpha", "beta"), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))
> dt
## type Year alpha beta
## 1: A 2000 0.03643256 0.03643256
## 2: A 2005 0.04019682 0.04019682
## 3: A 2010 0.03573956 0.03573956
## 4: B 2000 0.03172361 0.03172361
## 5: B 2005 0.03282137 0.03282137
## 6: B 2010 0.02946944 0.02946944
To calculate the percentage change of alpha by type, I came up with the following code:
dt[,change:=list(lapply(3:2,function(x)(.SD[x,alpha]/.SD[
(x-1),alpha]))),by=list(type)][][Year==2000,change:=NA]
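But something tells me there may be a more concise way. In particular, if I want to compute the percentage change for two columns, the following does not work: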
dt[,c("changeAlpha","changeBeta"):=list(lapply(3:2,
function(x)(.SD[x]/.SD[(x-1)]))),by=list(type)][Year==2000,change:=NA][]
So I resorted to:
dt[,c("changeAlpha","changeBeta"):=list(
lapply(3:2,function(x)(.SD[x,alpha]/.SD[(x-1),alpha])),
lapply(3:2,function(x)(.SD[x,beta]/.SD[(x-1),beta]))),by=list(type)][
Year==2000,c("changeAlpha","changeBeta"):=list(NA,NA)][]
## type Year alpha beta changeAlpha changeBeta
## 1: A 2000 0.03643256 0.03643256 NA NA
## 2: A 2005 0.04019682 0.04019682 1.10332131557826 1.10332131557826
## 3: A 2010 0.03573956 0.03573956 0.889114172877617 0.889114172877617
## 4: B 2000 0.03172361 0.03172361 NA NA
## 5: B 2005 0.03282137 0.03282137 1.03460416276522 1.03460416276522
## 6: B 2010 0.02946944 0.02946944 0.897873527693412 0.897873527693412
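The result appears to be correct, but I get a lot of warnings, which is what brought me here.
- Is my approach completely wrong, or is this the right way to do it?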
You can use the shift function from data.table v1.9.6+.
Define your function:
myFunc <- function(x) x/shift(x)
Select the columns you want to calculate the percentage change for:
cols <- c("alpha", "beta")
Or, if you want to run it on all columns except the first two:
cols <- names(dt)[-(1:2)]
Run the function over the columns:
dt[, paste0("change", cols) := lapply(.SD, myFunc), by = type, .SDcols = cols][]
# type Year alpha beta changealpha changebeta
# 1: A 2000 0.03643256 0.03643256 NA NA
# 2: A 2005 0.04019682 0.04019682 1.1033213 1.1033213
# 3: A 2010 0.03573956 0.03573956 0.8891142 0.8891142
# 4: B 2000 0.03172361 0.03172361 NA NA
# 5: B 2005 0.03282137 0.03282137 1.0346042 1.0346042
# 6: B 2010 0.02946944 0.02946944 0.8978735 0.8978735
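As a rough sketch of why this works: shift(x) returns x lagged by one position with NA in the first slot, so x/shift(x) is the ratio of each row to the previous one, and the first row of every group comes out as NA without a separate step. Note that the result above is a ratio (1.10 means a 10% increase). If you want an actual percentage, something like the following might do. The helper name pctChange and the toy data are made up for illustration and are not part of the answer above.

```r
library(data.table)

# Toy data (assumed for illustration): two groups, three periods each
toy <- data.table(
  type  = rep(c("A", "B"), each = 3),
  Year  = rep(c(2000L, 2005L, 2010L), times = 2),
  alpha = c(100, 110, 99, 50, 55, 44)
)

# shift() lags a vector by one position and pads the front with NA
lagged <- shift(c(100, 110, 99))

# Hypothetical helper: percentage change relative to the previous row
pctChange <- function(x) (x / shift(x) - 1) * 100

# Applied per group, so the first row of each type has no previous value
toy[, pctAlpha := pctChange(alpha), by = type]
print(toy)
```

Because the calculation runs inside by = type, the lag never crosses from one group into the next, which is what the manual Year==2000 step in the question was compensating for.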