Set a variable to NA if other variables are duplicates in R
I have the following data frame containing drug codes by route of administration:
code <- data.frame(inn = c("ibuprofen", "ibuprofen", "ibuprofen", "fusidic acid", "fusidic acid"),
                   route = c("unknown", "unknown", "unknown", "oral", "topical"),
                   atc = c("R02AX02", "G02CC01", "M01AE01", "J01XC01", "D06AX01"))
           inn   route     atc
1    ibuprofen unknown R02AX02
2    ibuprofen unknown G02CC01
3    ibuprofen unknown M01AE01
4 fusidic acid    oral J01XC01
5 fusidic acid topical D06AX01
And another one containing patient treatments and events:
event <- data.frame(id = c(1, 1, 2),
                    inn = c("ibuprofen", "fusidic acid", "fusidic acid"),
                    route = c("unknown", "oral", "topical"),
                    event = c(TRUE, FALSE, TRUE))
  id          inn   route event
1  1    ibuprofen unknown  TRUE
2  1 fusidic acid    oral FALSE
3  2 fusidic acid topical  TRUE
I need to merge these data frames to obtain the following result:
           inn   route id event     atc
1 fusidic acid    oral  1 FALSE J01XC01
2 fusidic acid topical  2  TRUE D06AX01
3    ibuprofen unknown  1  TRUE      NA
I do not get this result with a simple merge:
merge(x = event,
      y = code)
           inn   route id event     atc
1 fusidic acid    oral  1 FALSE J01XC01
2 fusidic acid topical  2  TRUE D06AX01
3    ibuprofen unknown  1  TRUE R02AX02
4    ibuprofen unknown  1  TRUE G02CC01
5    ibuprofen unknown  1  TRUE M01AE01
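To see where the extra rows come from, here is a minimal sketch that re-creates the example data. The helper names by_cols and key_counts are mine, not from the question. By default merge joins on the columns the two data frames share, and it returns one row per matching pair, so a key that appears three times in code yields three rows.

```r
# Example data re-created from the question (character columns made explicit)
code <- data.frame(inn = c("ibuprofen", "ibuprofen", "ibuprofen", "fusidic acid", "fusidic acid"),
                   route = c("unknown", "unknown", "unknown", "oral", "topical"),
                   atc = c("R02AX02", "G02CC01", "M01AE01", "J01XC01", "D06AX01"),
                   stringsAsFactors = FALSE)
event <- data.frame(id = c(1, 1, 2),
                    inn = c("ibuprofen", "fusidic acid", "fusidic acid"),
                    route = c("unknown", "oral", "topical"),
                    event = c(TRUE, FALSE, TRUE),
                    stringsAsFactors = FALSE)

# Columns merge() uses as the join key when `by` is not given
by_cols <- intersect(names(event), names(code))
by_cols                                  # "inn" "route"

# Number of rows in `code` for each (inn, route) key
key_counts <- table(paste(code$inn, code$route, sep = "_"))
key_counts

# One ibuprofen event matches three code rows, hence 5 rows in total
nrow(merge(x = event, y = code))         # 5
```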
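I thought of two solutions, but did not manage to implement either:
- modify the code data frame before the merge, setting atc to NA if there are different atc values for a group of inn and route (this seems more appropriate)
- modify the result of the merge, setting atc to NA if there are different atc values for a group of inn, route and id
How can I do this in base R? Is there a better way? I work in a restricted environment where I only have access to base R.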
Code for case 2:
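# Build a single key per (inn, route) group
code$inn_route <- paste0(code$inn, "_", code$route)
# Look up, for each row, how many rows share its key
code$count <- as.vector(table(code$inn_route)[code$inn_route])
# Blank out atc wherever the key is not unique
code$atc[code$count > 1] <- NA
# Drop the helper columns and collapse the now-identical rows
code$inn_route <- NULL
code$count <- NULL
code <- unique(code)
merge(event, code)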
           inn   route id event     atc
1 fusidic acid    oral  1 FALSE J01XC01
2 fusidic acid topical  2  TRUE D06AX01
3    ibuprofen unknown  1  TRUE    <NA>
This led me to the following solution:
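code$atc <- as.character(x = code$atc)
# Count rows per (inn, route) group; a numeric x keeps the counts numeric
# (with a character x, ave() returns strings and "> 1" compares text)
code$atc <- ifelse(test = ave(x = seq_along(code$atc),
                              code$inn,
                              code$route,
                              FUN = length) > 1,
                   yes = NA,
                   no = code$atc)
code <- unique(x = code)
merge(x = event,
      y = code)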
           inn   route id event     atc
1 fusidic acid    oral  1 FALSE J01XC01
2 fusidic acid topical  2  TRUE D06AX01
3    ibuprofen unknown  1  TRUE    <NA>
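As a side note on how ave works, here is a small sketch with made-up vectors (x, g and n_per_row are illustrative names, not part of the data above). ave applies FUN within each group and returns a vector of the same length as x, so every row receives the count of its own group.

```r
# Toy values and a grouping vector: three rows in group "a", one each in "b" and "c"
x <- c(10, 20, 30, 40, 50)
g <- c("a", "a", "a", "b", "c")

# Group size, repeated for every member of the group
n_per_row <- ave(x, g, FUN = length)
n_per_row        # 3 3 3 1 1

# Rows that belong to a group with more than one member
n_per_row > 1    # TRUE TRUE TRUE FALSE FALSE
```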
However, since ave is rather slow on my real data, I wonder whether there is a faster base R approach.
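Here is a straightforward way to accomplish option 2. Start from the result of the simple merge:
mrg <- merge(x = event,
             y = code)
           inn   route id event     atc
1 fusidic acid    oral  1 FALSE J01XC01
2 fusidic acid topical  2  TRUE D06AX01
3    ibuprofen unknown  1  TRUE R02AX02
4    ibuprofen unknown  1  TRUE G02CC01
5    ibuprofen unknown  1  TRUE M01AE01
Then we check which rows are duplicated once the atc variable is dropped. We need to call duplicated twice, because it flags only the repeated occurrences of a row, not every row that has a duplicate. On its own it would catch rows 4 and 5 but not row 3; to catch row 3 as well, we run duplicated again from the opposite direction. Read more here: Finding ALL duplicate rows, including "elements with smaller subscripts":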
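# Key columns only (everything except atc)
key <- mrg[, names(mrg) != "atc"]
# Flag every row whose key occurs more than once, scanning in both directions
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
# Assigning by index also works when atc is a factor
mrg$atc[dup] <- NA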
mrg
           inn   route id event     atc
1 fusidic acid    oral  1 FALSE J01XC01
2 fusidic acid topical  2  TRUE D06AX01
3    ibuprofen unknown  1  TRUE    <NA>
4    ibuprofen unknown  1  TRUE    <NA>
5    ibuprofen unknown  1  TRUE    <NA>
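The effect of scanning in both directions is easiest to see on a plain vector. This is a toy sketch; x, first_pass, last_pass and all_dups are illustrative names, not part of the answer above.

```r
# "a" occurs three times, "b" once
x <- c("a", "b", "a", "a")

# Forward scan: the first "a" is not flagged, only its later repeats
first_pass <- duplicated(x)                   # FALSE FALSE  TRUE  TRUE

# Backward scan: the last "a" is not flagged, only the earlier ones
last_pass <- duplicated(x, fromLast = TRUE)   #  TRUE FALSE  TRUE FALSE

# Union of both scans: every element that has a duplicate anywhere
all_dups <- first_pass | last_pass            #  TRUE FALSE  TRUE  TRUE
```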
If you want to drop the duplicated rows 4 and 5, just run duplicated once more:
mrg[!duplicated(mrg),]
           inn   route id event     atc
1 fusidic acid    oral  1 FALSE J01XC01
2 fusidic acid topical  2  TRUE D06AX01
3    ibuprofen unknown  1  TRUE    <NA>
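The same two-direction duplicated idea can probably be applied to option 1 as well, that is, to the code data frame before the merge, which avoids ave altogether. The following is only a sketch under the assumption that the example data look like the ones in the question; key, ambiguous and res are names I made up. Calling unique first means that a repeated (inn, route) key really does indicate different atc values.

```r
# Example data re-created from the question (character columns made explicit)
code <- data.frame(inn = c("ibuprofen", "ibuprofen", "ibuprofen", "fusidic acid", "fusidic acid"),
                   route = c("unknown", "unknown", "unknown", "oral", "topical"),
                   atc = c("R02AX02", "G02CC01", "M01AE01", "J01XC01", "D06AX01"),
                   stringsAsFactors = FALSE)
event <- data.frame(id = c(1, 1, 2),
                    inn = c("ibuprofen", "fusidic acid", "fusidic acid"),
                    route = c("unknown", "oral", "topical"),
                    event = c(TRUE, FALSE, TRUE),
                    stringsAsFactors = FALSE)

# Remove exact repeats so that a repeated key implies differing atc values
code <- unique(code)

# Flag every (inn, route) key that occurs more than once
key <- code[, c("inn", "route")]
ambiguous <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Blank out the ambiguous codes, then collapse the now-identical rows
code$atc[ambiguous] <- NA
code <- unique(code)

# One row per event, with atc set to NA where the code was ambiguous
res <- merge(x = event, y = code)
res
```

Whether this is actually faster than ave on real data would have to be measured, for example with system.time.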