dcast() - adding a column that doesn't exist in R
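I've run into a problem that I'm sure has a simple solution, but I can't find it. I'm basically summarising my table to get the sum of the values for each level of a factor variable: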
NOdependants <- unique(claimsMonthly[policyID == policy, .(exposure = sum(exposure)),
by = c("productID", "Year", "product", "QualityCheck", "dependant")][order(Year)])
productID Year product QualityCheck dependant exposure
1: 1 2016 ELI18 0 EMPLOYEE 17.041096
2: 1 2016 ELI18 0 SPOUSE 40.484932
3: 1 2016 ELI18 0 CHILD 5.164384
Then I do the following:
NOdependants <- dcast(NOdependants, productID + Year ~ dependant, value.var = "exposure", fill = 0, drop = FALSE, fun.aggregate = sum)
setnames(NOdependants, c("CHILD", "EMPLOYEE", "SPOUSE"), c("childno", "employeeno", "spouseno"), skip_absent=TRUE)
> NOdependants
productRank startYear childno employeeno spouseno
1: 1 2016 5.164384 17.041096 41.484932
So far so good. The problem arises when a product has no data for one of the dependant levels. Say there are no children:
NOdependants <- unique(claimsMonthly[policyID == policy, .(exposure = sum(exposure)),
by = c("productID", "Year", "product", "QualityCheck", "dependant")][order(Year)])
productID Year product QualityCheck dependant exposure
1: 1 2016 ELI18 0 EMPLOYEE 17.041096
2: 1 2016 ELI18 0 SPOUSE 40.484932
Then my dcast produces the following:
> NOdependants
productRank startYear employeeno spouseno
1: 1 2016 17.041096 41.484932
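This is a problem for me, because I need all three columns. So I need to artificially create the extra column whenever a factor level has no data (such as CHILD here), so that I get something like this: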
> NOdependants
productRank startYear childno employeeno spouseno
1: 1 2016 0 17.041096 41.484932
For now I have a workaround: I first create an empty data.table and then use rbindlist with fill=0 to merge them, but there must be a simpler solution.
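Any ideas?
Note: I am working with a large amount of data, and this operation is part of a loop that repeats about 80 times, so ideally the solution should be efficient.
Simplified example with data: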
#
> claimsMonthly <- data.table(productID = c(rep(1,6), rep(2,3), rep(3,2)),
+ Year = c(rep(2015,9), 2016, 2016),
+ product = c(rep("ELI18",6), rep("JCI22",3), rep("ZDP01",2)),
+ dependant = c(rep(c("EMPLOYEE", "SPOUSE", "CHILD"), 3),"EMPLOYEE", "SPOUSE"),
+ QualityCheck = c(rep(0,11)),
+ exposure = c(abs(rnorm(11))))
>
> productIDs <- unique(claimsMonthly$productID)
> for(prod in productIDs){
+
+ NOdependants <- unique(claimsMonthly[ productID == prod, .(exposure = sum(exposure)),
+ by = c("productID", "Year", "product", "QualityCheck", "dependant")][order(Year)])
+
+ NOdependants <- dcast(NOdependants, productID + Year ~ dependant, value.var = "exposure", fill = 0, drop = FALSE, fun.aggregate = sum)
+ setnames(NOdependants, c("CHILD", "EMPLOYEE", "SPOUSE"), c("childno", "employeeno", "spouseno"), skip_absent=TRUE)
+
+ NOdependants[order(childno)]
+
+ }
Error in .checkTypos(e, names_x) :
Object 'childno' not found amongst productID, Year, employeeno, spouseno
Your use of 'unique' outside the data.table brackets may be confusing data.table. See: https://www.rdocumentation.org/packages/data.table/versions/1.12.8/topics/duplicated
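I wonder whether your code could be simpler and still achieve your result. Part of the strength of data.table is that it can eliminate the need for loops and control structures. Using your example data for 'claimsMonthly':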
claimsMonthly[, .(exposure = sum(exposure)),
.(productID,Year,product,QualityCheck,dependant)][
,dcast(.SD, productID + Year ~ dependant,
value.var = "exposure", drop = FALSE, fun.aggregate = sum)][
CHILD == 0 &
EMPLOYEE == 0 &
SPOUSE == 0,.(productID,Year,CHILD,EMPLOYEE,SPOUSE)]
productID Year CHILD EMPLOYEE SPOUSE
1: 1 2016 0 0 0
2: 2 2016 0 0 0
3: 3 2015 0 0 0
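If the goal is simply to always get all three columns, one possible approach is to turn dependant into a factor with every expected level before casting. This is a minimal sketch, assuming the full set of levels (CHILD, EMPLOYEE, SPOUSE) is known in advance; the names agg, all_levels and wide are made up for illustration. With drop = c(TRUE, FALSE), dcast should keep only the observed productID/Year rows while still creating a column for every factor level, including unused ones:

```r
library(data.table)

# Toy data: this product has EMPLOYEE and SPOUSE rows but no CHILD rows.
agg <- data.table(
  productID = c(3, 3),
  Year      = c(2016, 2016),
  dependant = c("EMPLOYEE", "SPOUSE"),
  exposure  = c(17.041096, 40.484932)
)

# Assumption: the complete set of dependant levels is known up front.
all_levels <- c("CHILD", "EMPLOYEE", "SPOUSE")
agg[, dependant := factor(dependant, levels = all_levels)]

# drop = c(TRUE, FALSE): do not expand the left-hand side (productID + Year),
# but do expand the right-hand side to every factor level, used or not.
wide <- dcast(agg, productID + Year ~ dependant,
              value.var = "exposure", fun.aggregate = sum,
              fill = 0, drop = c(TRUE, FALSE))

# All three columns now exist, so skip_absent is no longer needed.
setnames(wide, all_levels, c("childno", "employeeno", "spouseno"))
print(wide)
```

Because the factor conversion can be done once on the full table, it should also work without the per-product loop.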