How to replace NAs of a variable with values from another dataframe
I hope this isn't a silly question.
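I have two data frames with the variables ID and gender/sex. In df1 there are NAs. In df2 the variable is complete. I want to fill in the column in df1 with the values from df2.
(In df1 the variable is called "gender". In df2 it is called "sex".)
Here is what I have tried so far: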
#example-data
ID<-seq(1,30,by=1)
df1<-as.data.frame(ID)
df2<-df1
df1$gender<-c(NA,"2","1",NA,"2","2","2","2","2","2",NA,"2","1","1",NA,"2","2","2","2","2","1","2","2",NA,"2","2","2","2","2",NA)
df2$sex<-c("2","2","1","2","2","2","2","2","2","2","2","2","1","1","2","2","2","2","2","2","1","2","2","2","2","2","2","2","2","2")
#Approach 1:
NAs.a <- is.na(df1$gender)
df1$gender[NAs.a] <- df2[match(df1$ID[NAs.a], df2$ID),]$sex
#Approach 2 (I like dplyr a lot, perhaps there's a way to use it):
library("dplyr")
temp<-df2 %>% select(ID,sex) # the column in df2 is called "sex", not "gender"
#EDIT:
#df<-left_join(df1$gender,df2$gender, by="ID")
df<-left_join(df1,df2, by="ID")
Thanks a lot.
This is probably simplest in base R.
idx <- is.na(df1$gender)
df1$gender[idx] = df2$sex[idx]
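The index-based replacement above relies on df1 and df2 having the same rows in the same order, as they do in the example data. If that isn't guaranteed, a minimal sketch along these lines looks up each missing value by ID with match() instead. The small data frames and their values are made up for illustration and are not the asker's data.

```r
# Hypothetical toy data: df2 is in a different row order than df1
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  gender = c(NA, "2", NA, "1"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(4, 3, 2, 1),
                  sex = c("1", "2", "2", "1"),
                  stringsAsFactors = FALSE)

# Rows of df1 that still need a value
idx <- is.na(df1$gender)

# For each of those rows, find the position of its ID in df2
pos <- match(df1$ID[idx], df2$ID)

# Fill the gaps; an ID that is absent from df2 gives an NA position and stays NA
df1$gender[idx] <- df2$sex[pos]

df1$gender
```

Values that were already present in df1 are left alone, since only the rows selected by idx are assigned.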
You could do:
df1 %>% select(ID) %>% left_join(df2, by = "ID")
# ID sex
#1 1 2
#2 2 2
#3 3 1
#4 4 2
#5 5 2
#6 6 2
#.. ..
This assumes, as in the example, that all IDs from df1 are also present in df2 and that df2 has sex/gender information for them.
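One way to check that assumption before joining is to look for IDs in df1 that have no counterpart in df2, and for missing values in df2 itself. The sketch below uses made-up data in which one ID is deliberately absent from df2.

```r
# Hypothetical toy data: ID 3 is absent from df2
df1 <- data.frame(ID = 1:4, gender = c(NA, "2", NA, "1"))
df2 <- data.frame(ID = c(1, 2, 4), sex = c("1", "2", "1"))

# IDs in df1 without a match in df2; a left join would leave these as NA
missing_ids <- setdiff(df1$ID, df2$ID)
missing_ids

# TRUE if df2 itself has gaps in the column used for filling
has_gaps <- anyNA(df2$sex)
has_gaps
```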
If you have other columns in your data, you could also try this:
df1 %>% select(-gender) %>% left_join(df2[c("ID", "sex")], by = "ID")
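Both joins above take sex from df2 wholesale, so the result has a sex column and the original gender values are not used. If the goal is to keep the existing gender values and fill only the NAs, one possible sketch combines left_join() with dplyr's coalesce(). The toy data below is invented for illustration; ID 2 disagrees between the two data frames on purpose, to show that non-missing values in df1 win.

```r
library(dplyr)

# Hypothetical toy data
df1 <- data.frame(ID = 1:4, gender = c(NA, "2", NA, "1"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = 1:4, sex = c("1", "1", "2", "1"),
                  stringsAsFactors = FALSE)

filled <- df1 %>%
  left_join(df2[c("ID", "sex")], by = "ID") %>%  # bring in sex next to gender
  mutate(gender = coalesce(gender, sex)) %>%     # take sex only where gender is NA
  select(-sex)                                   # drop the helper column

filled$gender
```

The column keeps the name gender, and any other columns in df1 pass through the join unchanged.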
Here is a quick solution using data.table's binary join, which joins only gender with sex and leaves all the remaining columns untouched:
library(data.table)
setkey(setDT(df1), ID)
df1[df2, gender := i.sex][]
# ID gender
# 1: 1 2
# 2: 2 2
# 3: 3 1
# 4: 4 2
# 5: 5 2
# 6: 6 2
# 7: 7 2
# 8: 8 2
# 9: 9 2
# 10: 10 2
# 11: 11 2
# 12: 12 2
# 13: 13 1
# 14: 14 1
# 15: 15 2
# 16: 16 2
# 17: 17 2
# 18: 18 2
# 19: 19 2
# 20: 20 2
# 21: 21 1
# 22: 22 2
# 23: 23 2
# 24: 24 2
# 25: 25 2
# 26: 26 2
# 27: 27 2
# 28: 28 2
# 29: 29 2
# 30: 30 2
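The update join above overwrites gender for every matching ID, which is harmless here because df2 agrees with df1 wherever df1 already has a value. If existing values should be preserved, a variant that fills only the NAs might look like the sketch below. It assumes a reasonably recent data.table that provides fcoalesce(), uses an `on =` join instead of a key, and runs on invented toy data (dt1 and dt2 are hypothetical names).

```r
library(data.table)

# Hypothetical toy data; ID 2 disagrees on purpose
dt1 <- data.table(ID = 1:4, gender = c(NA, "2", NA, "1"))
dt2 <- data.table(ID = 1:4, sex = c("1", "1", "2", "1"))

# Update join by reference: for each matching ID, keep gender if present,
# otherwise take sex from dt2 (i.sex refers to the column of the i table)
dt1[dt2, on = "ID", gender := fcoalesce(gender, i.sex)]

dt1$gender
```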