How to extract a unique element according to a set of preferred conditions

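Given a data frame df, I want to extract a unique value for each Field according to the following preferred conditions:

1-If C1 exists, extract the corresponding value and ignore the others

2-If C2 exists, extract the corresponding value and ignore the others

...and so on up to C5

Data: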

df <- data.frame (Field=rep(c("F1","F2","F3","F4","F5"),each=3),
              Cond=rep(c("C1","C2","C3","C4","C5"),3),
              Value=c(1:15))

Desired output:

output <-  data.frame (F= c("F1","F2","F3","F4","F5"),
                   C= c("C1","C1","C2","C1","C3"),
                   Value= c(1,6,7,11,13))

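(Note 1: This is only an example setup; the values in the real data are not sorted.)

(Note 2: The real condition column is not in alphabetical order at all. My idea is: if A exists, then select the "A value"; otherwise pass to the next condition, "if B exists ...", and so on.)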

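If you can sort the data.frame before processing, this is fairly easy. Note that this works only for this particular case, where the alphabetical order of Cond happens to match the preferred order. If your Cond values change, alphabetical sorting may go out the window.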

library(dplyr)
df <- data.frame (Field=rep(c("F1","F2","F3","F4","F5"),each=3),
                  Cond=rep(c("C1","C2","C3","C4","C5"),3),
                  Value=c(1:15))

df <- df[with(df, order(Field, Cond)), ]
res <- df %>%
  group_by(Field) %>%
  filter(row_number() == 1)

Source: local data frame [5 x 3]
Groups: Field [5]

   Field   Cond Value
  <fctr> <fctr> <int>
1     F1     C1     1
2     F2     C1     6
3     F3     C2     7
4     F4     C1    11
5     F5     C3    13
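
If the conditions are not alphabetical, one way to keep the sort-then-take-first idea is to make the priority explicit through factor levels. The following is a minimal base R sketch, not part of the original answer; the vector name `priority` and the non-alphabetical labels are assumptions chosen for illustration.

```r
# Example data with condition labels that do NOT sort alphabetically
# in the preferred order (labels are illustrative).
df <- data.frame(Field = rep(c("F1", "F2", "F3", "F4", "F5"), each = 3),
                 Cond  = rep(c("rg1", "kl2", "xy3", "rq4", "ab5"), 3),
                 Value = 1:15)

# Hypothetical priority vector: first element is the most preferred condition.
priority <- c("rg1", "kl2", "xy3", "rq4", "ab5")

# Encode the priority as factor levels so that order() sorts by preference.
df$Cond <- factor(df$Cond, levels = priority)

# Sort by Field, then by preference, and keep the first row of each Field.
sorted <- df[order(df$Field, df$Cond), ]
res <- sorted[!duplicated(sorted$Field), ]
res
```

Any Cond value missing from `priority` becomes NA under this encoding and is sorted last by `order()`, so it is only picked when a Field has no listed condition at all.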

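Here is another, more general way of doing this. The sort order is defined in so (see this question). Note how I have mangled the values of Cond to show that they are not sorted alphabetically.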

df <- data.frame (Field=rep(c("F1","F2","F3","F4","F5"),each=3),
                  Cond=rep(c("rg1","kl2","xy3","rq4","ab5"),3),
                  Value=c(1:15))

so <- c("rg1","kl2","xy3","rq4","ab5")

df %>%
  group_by(Field) %>%
  slice(match(so, Cond)) %>%
  filter(row_number() == 1)

   Field   Cond Value
  <fctr> <fctr> <int>
1     F1    rg1     1
2     F2    rg1     6
3     F3    kl2     7
4     F4    rg1    11
5     F5    xy3    13
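
A variant of the same idea, sketched here as an assumption rather than as the answerer's code, looks up each row's position in the priority vector and keeps the row with the smallest position. This avoids passing NA indices to `slice()` when a group lacks some of the conditions, and it needs no second filtering step.

```r
library(dplyr)

df <- data.frame(Field = rep(c("F1", "F2", "F3", "F4", "F5"), each = 3),
                 Cond  = rep(c("rg1", "kl2", "xy3", "rq4", "ab5"), 3),
                 Value = 1:15)

# Priority vector: earlier entries are preferred.
so <- c("rg1", "kl2", "xy3", "rq4", "ab5")

res <- df %>%
  group_by(Field) %>%
  # match(Cond, so) gives each row's rank in the priority list;
  # which.min() picks the best-ranked row and ignores NA ranks.
  slice(which.min(match(Cond, so))) %>%
  ungroup()

res
```

If a Field contains none of the conditions listed in `so`, `which.min()` returns an empty index and that Field is dropped from the result.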

Another option is to use data.table:

library(data.table)
setDT(df)[order(Field, Cond), head(.SD, 1), by = Field]
#    Field Cond Value
#1:    F1   C1     1
#2:    F2   C1     6
#3:    F3   C2     7
#4:    F4   C1    11
#5:    F5   C3    13
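
The call above relies on Cond sorting alphabetically in the preferred order. For arbitrary condition labels, a possible data.table sketch uses an explicit priority vector instead; the name `so` and the labels below are illustrative assumptions.

```r
library(data.table)

dt <- data.table(Field = rep(c("F1", "F2", "F3", "F4", "F5"), each = 3),
                 Cond  = rep(c("rg1", "kl2", "xy3", "rq4", "ab5"), 3),
                 Value = 1:15)

# Priority vector: earlier entries are preferred.
so <- c("rg1", "kl2", "xy3", "rq4", "ab5")

# Within each Field, keep the row whose Cond ranks best in `so`.
res <- dt[, .SD[which.min(match(Cond, so))], by = Field]
res
```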