R: combine rows where a certain value overrules other values

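I have a data frame like the one below. I want to combine rows based on duplicates in the person column. However, for specified columns (in this case Beer, Cola, Wodka), is it possible to have a certain value (in this case 1) overrule the other values (in this case 0)?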

Current data frame:

person <- c("John", "John", "Alex", "Nicole", "Nicole")
Sex <- c("M","M","W", "W", "W")
Beer <- c(1,1,1,1,0)
Cola <- c(0,1,0,0,0)
Wodka <- c(0,1,0,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)

The result should be:

person <- c("John", "Alex", "Nicole")
Sex <- c("M", "W", "W")
Beer <- c(1,1,1)
Cola <- c(1,0,0)
Wodka <- c(1,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)

Thanks.

With dplyr, you can summarise() down to one row per person, taking the maximum of each specified column:

library(tidyverse)

person <- c("John", "John", "Alex", "Nicole", "Nicole")
Sex <- c("M", "M", "W", "W", "W")
Beer <- c(1, 1, 1, 1, 0)
Cola <- c(0, 1, 0, 0, 0)
Wodka <- c(0, 1, 0, 0, 1)

df <- data.frame(person, Sex, Beer, Cola, Wodka)

df %>% 
  group_by(person, Sex) %>% 
  summarise(across(c(Beer, Cola, Wodka), max))
#> `summarise()` regrouping output by 'person' (override with `.groups` argument)
#> # A tibble: 3 x 5
#> # Groups:   person [3]
#>   person Sex    Beer  Cola Wodka
#>   <chr>  <chr> <dbl> <dbl> <dbl>
#> 1 Alex   W         1     0     0
#> 2 John   M         1     1     1
#> 3 Nicole W         1     0     1
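
Taking the max works here because the overruling value (1) happens to be the larger one. If the overruling value were not the largest in a column, a minimal sketch along these lines might help. The helper name `overrule` and its `winner` argument are made up for illustration, and `.groups = "drop"` is only there to silence the regrouping message and return an ungrouped tibble.

```r
library(dplyr)

df <- data.frame(
  person = c("John", "John", "Alex", "Nicole", "Nicole"),
  Sex    = c("M", "M", "W", "W", "W"),
  Beer   = c(1, 1, 1, 1, 0),
  Cola   = c(0, 1, 0, 0, 0),
  Wodka  = c(0, 1, 0, 0, 1)
)

# Hypothetical helper: return `winner` if it occurs anywhere in x,
# otherwise fall back to the first value of x.
overrule <- function(x, winner = 1) {
  if (any(x == winner, na.rm = TRUE)) winner else x[1]
}

result <- df %>%
  group_by(person, Sex) %>%
  summarise(across(c(Beer, Cola, Wodka), overrule), .groups = "drop")

result
```

For 0/1 columns this should give the same values as `max`; the difference only shows up when the overruling value is not the maximum.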

I'd suggest using the dplyr library from the tidyverse. This should then achieve what you are after:

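library(dplyr)

# Sex is constant within a person, so take its first value
# (max() would fail if Sex were a factor).
df %>%
    group_by(person) %>%
    summarize(Beer = max(Beer), Cola = max(Cola), Wodka = max(Wodka), Sex = first(Sex))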

For reference, the desired result from the question:

person <- c("John", "Alex", "Nicole")
Sex <- c("M", "W", "W")
Beer <- c(1,1,1)
Cola <- c(1,0,0)
Wodka <- c(1,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)

A simple base R solution could be:

# Split by person.
# Each element of the list personSplit is a data frame containing all available
# information about one person.
personSplit <- split(df, df$person)

# From this information, pick the value that overrules the others.
# Here, overruling only applies to numeric columns, where you can simply take the max.
# For non-numeric columns, simply use the first value.
valuesToTake <- lapply(personSplit, function(personalInfoDf) {
  vals <- lapply(personalInfoDf, function(column) {
    if (is.numeric(column)) {
      max(column, na.rm = TRUE)
    } else {
      column[1]
    }
  })
  data.frame(vals)
})

result <- do.call("rbind",valuesToTake)
rownames(result) <- NULL
result
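
If every non-numeric column is constant within a person (as Sex is here), a shorter base R variant might use `aggregate()` with the formula interface. This is only a sketch under that assumption; note that the formula method drops rows containing NA by default.

```r
df <- data.frame(
  person = c("John", "John", "Alex", "Nicole", "Nicole"),
  Sex    = c("M", "M", "W", "W", "W"),
  Beer   = c(1, 1, 1, 1, 0),
  Cola   = c(0, 1, 0, 0, 0),
  Wodka  = c(0, 1, 0, 0, 1)
)

# Group by person and Sex, then take the max of each drink column per group.
result <- aggregate(cbind(Beer, Cola, Wodka) ~ person + Sex, data = df, FUN = max)
result
```

The row order of the result may differ from the input, since `aggregate()` sorts by the grouping variables.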