How to extract a unique element according to a set of preferred conditions
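Given a data frame df, I would like to extract a unique value for each Field according to the following order of preference:
1- If C1 is present, extract the corresponding value and ignore the others
2- If C2 is present, extract the corresponding value and ignore the others
... and so on until C5
Data: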
df <- data.frame (Field=rep(c("F1","F2","F3","F4","F5"),each=3),
Cond=rep(c("C1","C2","C3","C4","C5"),3),
Value=c(1:15))
Desired output:
output <- data.frame (F= c("F1","F2","F3","F4","F5"),
C= c("C1","C1","C2","C1","C3"),
Value= c(1,6,7,11,13))
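(Note 1: Value is only set this way as an example; the real values are not sorted)
(Note 2: the real Cond column is not in alphabetical order at all. The idea is: if A is present, pick the value for A; otherwise move on to the next condition, "if B is present ...", and so on)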
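If you can sort the data.frame before processing, this is fairly easy. Note that this works for this particular case only: if your Cond values change, alphabetical sorting may no longer match the preference order.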
library(dplyr)
df <- data.frame (Field=rep(c("F1","F2","F3","F4","F5"),each=3),
Cond=rep(c("C1","C2","C3","C4","C5"),3),
Value=c(1:15))
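# Sort by Field, then alphabetically by Cond (C1 before C2, and so on)
df <- df[with(df, order(Field, Cond)), ]
# Keep the first row of each Field, i.e. its most preferred condition
res <- df %>%
  group_by(Field) %>%
  filter(row_number() == 1)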
Source: local data frame [5 x 3]
Groups: Field [5]
Field Cond Value
<fctr> <fctr> <int>
1 F1 C1 1
2 F2 C1 6
3 F3 C2 7
4 F4 C1 11
5 F5 C3 13
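If alphabetical order cannot be relied on, the same sort-then-take-first idea still works when the sort key is an explicit rank rather than the Cond string itself. The following is a minimal base R sketch of that idea, not part of the original answer; the vector name `pref` is made up for illustration and should list your conditions from most to least preferred.

```r
# Assumed preference order, most preferred first (hypothetical name `pref`)
pref <- c("C1", "C2", "C3", "C4", "C5")

df <- data.frame(Field = rep(c("F1","F2","F3","F4","F5"), each = 3),
                 Cond  = rep(c("C1","C2","C3","C4","C5"), 3),
                 Value = 1:15)

# Rank of each row = position of its Cond in the preference vector
rank <- match(df$Cond, pref)

# Sort by Field, then by rank, and keep the first row seen for each Field
sorted <- df[order(df$Field, rank), ]
res <- sorted[!duplicated(sorted$Field), ]
rownames(res) <- NULL
res
```

Because the ordering comes from `pref`, renaming the conditions only requires updating that one vector.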
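Here is another, more general way of doing this. The sort order is defined in so (see this question). Note how I mangled the values of Cond to show that it is not sorted alphabetically.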
df <- data.frame (Field=rep(c("F1","F2","F3","F4","F5"),each=3),
Cond=rep(c("rg1","kl2","xy3","rq4","ab5"),3),
Value=c(1:15))
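# Preference order, most preferred first
so <- c("rg1","kl2","xy3","rq4","ab5")
df %>%
  group_by(Field) %>%
  # match(so, Cond): row position within the group of each entry of so
  slice(match(so, Cond)) %>%
  # keep the first of the reordered rows
  filter(row_number() == 1)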
Field Cond Value
<fctr> <fctr> <int>
1 F1 rg1 1
2 F2 rg1 6
3 F3 kl2 7
4 F4 rg1 11
5 F5 xy3 13
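To see why `match(so, Cond)` produces the preferred row first, it may help to trace it for a single group. The sketch below uses base R only and reuses `df` and `so` from the answer; `cond_f2` and `pick_first` are hypothetical names introduced for illustration. It explicitly drops the `NA` entries that `match` returns for conditions absent from the group, instead of relying on how `slice` treats them.

```r
# Data and preference order from the answer above
df <- data.frame(Field = rep(c("F1","F2","F3","F4","F5"), each = 3),
                 Cond  = rep(c("rg1","kl2","xy3","rq4","ab5"), 3),
                 Value = 1:15)
so <- c("rg1","kl2","xy3","rq4","ab5")

# Worked example for group F2, whose Cond values are rq4, ab5, rg1
cond_f2 <- as.character(df$Cond[df$Field == "F2"])
idx <- match(so, cond_f2)        # 3 NA NA 1 2: position of each so entry in the group
first <- idx[!is.na(idx)][1]     # 3: first preferred condition that is present
cond_f2[first]                   # "rg1"

# pick_first is a hypothetical helper: apply the same lookup to one group
pick_first <- function(g) {
  idx <- match(so, as.character(g$Cond))
  g[idx[!is.na(idx)][1], ]
}
res <- do.call(rbind, lapply(split(df, df$Field), pick_first))
rownames(res) <- NULL
res
```

In this sketch, a group containing none of the conditions in `so` would yield a row of `NA` values, so filter such groups out beforehand if that can happen in your data.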
Another option is to use data.table (shown here with the original df, where Cond runs from C1 to C5):
library(data.table)
setDT(df)[order(Field, Cond), head(.SD, 1), by = Field]
# Field Cond Value
#1: F1 C1 1
#2: F2 C1 6
#3: F3 C2 7
#4: F4 C1 11
#5: F5 C3 13