How to match three columns that differ in their pattern?
I have data similar to the following (but much larger):
example <- rbind(data.frame(species = "A", trait1 = "yes", trait2 = NA),
                 data.frame(species = "A", trait1 = NA, trait2 = "yes"),
                 data.frame(species = "B", trait1 = NA, trait2 = "no"),
                 data.frame(species = "B", trait1 = "yes", trait2 = NA),
                 data.frame(species = "B", trait1 = "no", trait2 = NA),
                 data.frame(species = "B", trait1 = "no", trait2 = NA),
                 data.frame(species = "C", trait1 = NA, trait2 = "no"),
                 data.frame(species = "C", trait1 = "no", trait2 = NA),
                 data.frame(species = "D", trait1 = "yes", trait2 = NA),
                 data.frame(species = "D", trait1 = NA, trait2 = "yes"),
                 data.frame(species = "E", trait1 = NA, trait2 = "no"),
                 data.frame(species = "E", trait1 = "no", trait2 = NA),
                 data.frame(species = "E", trait1 = "no", trait2 = NA))
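Here, trait2 is fixed (one value per species), whereas trait1 can vary within a species. For each species, the trait1 and trait2 values are stored in different rows. After reshaping the data I want to keep the variability present in trait1, which seems to complicate the process.
Ultimately, I want to convert this data frame in R into the following one: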
ex.res <- rbind(data.frame(species = "A", trait1 = "yes", trait2 = "yes"),
                data.frame(species = "B", trait1 = "yes", trait2 = "no"),
                data.frame(species = "B", trait1 = "no", trait2 = "no"),
                data.frame(species = "B", trait1 = "no", trait2 = "no"),
                data.frame(species = "C", trait1 = "no", trait2 = "no"),
                data.frame(species = "D", trait1 = "yes", trait2 = "yes"),
                data.frame(species = "E", trait1 = "no", trait2 = "no"),
                data.frame(species = "E", trait1 = "no", trait2 = "no"))
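I have tried many things, including some of R's basic data management tools and the duplicated, unique, and match_df functions, but none of them fully worked.
Perhaps some combination of these functions would work, but I could not find it. Is there a simple way to do this?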
One approach using base R:
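# Rows that carry a trait1 value, and rows that carry a trait2 value
first_part <- example[!is.na(example$trait1), ]
second_part <- example[!is.na(example$trait2), ]
# Drop the all-NA column from each part, then join the two parts on species
merge(first_part[, -3], second_part[, -2], by = "species")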
  species trait1 trait2
1       A    yes    yes
2       B    yes     no
3       B     no     no
4       B     no     no
5       C     no     no
6       D    yes    yes
7       E     no     no
8       E     no     no
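
The merge keeps one output row per trait1 row only because each species has exactly one trait2 row. If a species had two trait2 rows, every trait1 row of that species would be duplicated. The sketch below wraps the same steps in a helper that checks this assumption first. The function name `combine_traits` and the `toy` data are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: joins each species' trait1 rows to its single trait2 value.
combine_traits <- function(df) {
  # Select columns by name rather than by position
  first_part  <- df[!is.na(df$trait1), c("species", "trait1")]
  second_part <- df[!is.na(df$trait2), c("species", "trait2")]
  # Assumption: trait2 occurs once per species; stop early if it does not,
  # because duplicate keys here would multiply the trait1 rows in the join
  stopifnot(!anyDuplicated(second_part$species))
  merge(first_part, second_part, by = "species")
}

# Made-up input with the same layout as `example`
toy <- data.frame(
  species = c("A", "A", "B", "B", "B"),
  trait1  = c("yes", NA, NA, "yes", "no"),
  trait2  = c(NA, "yes", "no", NA, NA),
  stringsAsFactors = FALSE
)
res <- combine_traits(toy)
```

Selecting columns by name instead of `-3` and `-2` keeps the helper working if the column order changes.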