How do I create a new column in R with a variable lookup?
I have a data.table that looks like this:
   Cause of Death                  Ethnicity                Count
1: ACCIDENTS EXCEPT DRUG POISONING ASIAN & PACIFIC ISLANDER  1368
2: ACCIDENTS EXCEPT DRUG POISONING HISPANIC                  3387
3: ACCIDENTS EXCEPT DRUG POISONING NON-HISPANIC BLACK        3240
4: ACCIDENTS EXCEPT DRUG POISONING NON-HISPANIC WHITE        6825
5: ALZHEIMERS DISEASE              ASIAN & PACIFIC ISLANDER   285
---
I want to create a new column that shows, for each cause of death, the share of those deaths that falls in each ethnicity. Like this:
   Cause of Death                  Ethnicity                Count PercentofDeath
1: ACCIDENTS EXCEPT DRUG POISONING ASIAN & PACIFIC ISLANDER  1368 0.09230769
2: ACCIDENTS EXCEPT DRUG POISONING HISPANIC                  3387 0.22854251
3: ACCIDENTS EXCEPT DRUG POISONING NON-HISPANIC BLACK        3240 0.21862348
4: ACCIDENTS EXCEPT DRUG POISONING NON-HISPANIC WHITE        6825 0.46052632
5: ALZHEIMERS DISEASE              ASIAN & PACIFIC ISLANDER   285 0.04049446
---
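For reference, each PercentofDeath value is the row's Count divided by the total Count for its cause of death. Here is a quick sketch of that arithmetic in base R, assuming the four ACCIDENTS rows shown are all the rows for that cause (the names `counts` and `shares` are just for illustration):

```r
# Counts for the four ACCIDENTS EXCEPT DRUG POISONING rows shown above
counts <- c(1368, 3387, 3240, 6825)

# Share of each ethnicity within this one cause of death
shares <- counts / sum(counts)

# Rounded to 8 decimals, these agree with the PercentofDeath column above
print(round(shares, 8))
```

The Alzheimer's row cannot be checked the same way, because only one of its rows appears in the excerpt.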
Here is my code, which is really ugly:
library(data.table)
# load library, change to data table
COD.dt <- as.data.table(COD)
# function for adding the percent column
lala <- function(x){
  # see if I have initialized the data.table I'm going to append to
  if(exists("started")){
    p <- COD.dt[x == `Cause of Death`]
    blah <- COD.dt[x == `Cause of Death`]$Count/sum(COD.dt[x == `Cause of Death`]$Count)
    p$PercentofDeath <- blah
    started <<- rbind(started, p)
  }
  # initialize data table
  else{
    l <- COD.dt[x == `Cause of Death`]
    blah <- COD.dt[x == `Cause of Death`]$Count/sum(COD.dt[x == `Cause of Death`]$Count)
    l$PercentofDeath <- (blah)
    started <<- l
  }
  # if finished, return
  if(x == unique(COD.dt$`Cause of Death`)[length(unique(COD.dt$`Cause of Death`))]){
    return(started)
  }
}
# run function
h <- sapply(unique(COD.dt$`Cause of Death`), lala)
# remove from environment
rm(started)
# h actually ends up being a list; the last element happens to be the one I want, so I take that one
finalTable <- h$`VIRAL HEPATITIS`
So, as you can see, this code is pretty ugly and not adaptable. I'm hoping for some guidance on how to make it better. Perhaps using dplyr or some other function?
Best,
A pure data.table solution would be easy too, but here's dplyr:
library(dplyr)
COD.dt %>% group_by(`Cause of Death`) %>%
  mutate(PercentofDeath = Count / sum(Count))
You could turn this into a function, but it's such a small, basic operation that most people wouldn't bother.
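If you did want a reusable version, a minimal sketch might look like the following. The function name `add_group_share` and its arguments are hypothetical, the toy data is just the five rows from the question, and the `{{ }}` syntax assumes a reasonably recent dplyr (1.0 or later):

```r
library(dplyr)

# Hypothetical helper: adds a PercentofDeath column holding each row's share
# of count_col within its group_col group. Not part of dplyr itself.
add_group_share <- function(df, group_col, count_col) {
  df %>%
    group_by({{ group_col }}) %>%
    mutate(PercentofDeath = {{ count_col }} / sum({{ count_col }})) %>%
    ungroup()  # drop the grouping so later steps are not grouped by accident
}

# Toy data: only the rows shown in the question (the real COD table has more)
COD <- tibble(
  `Cause of Death` = c(rep("ACCIDENTS EXCEPT DRUG POISONING", 4), "ALZHEIMERS DISEASE"),
  Ethnicity = c("ASIAN & PACIFIC ISLANDER", "HISPANIC", "NON-HISPANIC BLACK",
                "NON-HISPANIC WHITE", "ASIAN & PACIFIC ISLANDER"),
  Count = c(1368, 3387, 3240, 6825, 285)
)

result <- add_group_share(COD, `Cause of Death`, Count)
print(result)
```

With only one Alzheimer's row in the toy data, that row's share comes out as 1 rather than the 0.04049446 in the question. Note that without the `ungroup()` call, `mutate()` would return a tibble that is still grouped.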
I just found a better way:
library(data.table)
# load library, change to data table
COD.dt <- as.data.table(COD)
# make a column of total counts per disease
COD.dt[, disease := sum(Count), by = list(`Cause of Death`)]
# use that column to make percents (row-wise division, so no grouping is needed here)
COD.dt[, percent := Count / disease]
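The two steps can also be collapsed into a single `:=` call, which avoids the helper column. This is a sketch on toy data built from the five rows in the question, so the Alzheimer's share will be 1 here. The `check` table is only an illustrative sanity test, not part of the original answer:

```r
library(data.table)

# Toy data: only the rows shown in the question (the real COD table has more)
COD.dt <- data.table(
  `Cause of Death` = c(rep("ACCIDENTS EXCEPT DRUG POISONING", 4), "ALZHEIMERS DISEASE"),
  Ethnicity = c("ASIAN & PACIFIC ISLANDER", "HISPANIC", "NON-HISPANIC BLACK",
                "NON-HISPANIC WHITE", "ASIAN & PACIFIC ISLANDER"),
  Count = c(1368, 3387, 3240, 6825, 285)
)

# Compute each row's share within its cause of death, added by reference
COD.dt[, PercentofDeath := Count / sum(Count), by = list(`Cause of Death`)]

# Sanity check: the shares should add up to 1 within every cause of death
check <- COD.dt[, .(share_sum = sum(PercentofDeath)), by = list(`Cause of Death`)]
stopifnot(all(abs(check$share_sum - 1) < 1e-12))

print(COD.dt)
```

Because `:=` modifies `COD.dt` in place, there is no need to assign the result back to a new variable.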