R: combine rows where a certain value overrules other values
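I have a data frame like the one below. I want to combine rows based on duplicates in the column person. However, for specified columns (in this case Beer, Cola, Wodka), is it possible for a certain value (in this case 1) to overrule other values (in this case 0)?
Current data frame: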
person <- c("John", "John", "Alex", "Nicole", "Nicole")
Sex <- c("M","M","W", "W", "W")
Beer <- c(1,1,1,1,0)
Cola <- c(0,1,0,0,0)
Wodka <- c(0,1,0,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)
The result should be:
person <- c("John", "Alex", "Nicole")
Sex <- c("M", "W", "W")
Beer <- c(1,1,1)
Cola <- c(1,0,0)
Wodka <- c(1,0,1)
df <- data.frame(person,Sex,Beer,Cola,Wodka)
Thanks.
With dplyr, you can summarise() down to one row per person and take the maximum of the specified columns:
library(tidyverse)
person <- c("John", "John", "Alex", "Nicole", "Nicole")
Sex <- c("M", "M", "W", "W", "W")
Beer <- c(1, 1, 1, 1, 0)
Cola <- c(0, 1, 0, 0, 0)
Wodka <- c(0, 1, 0, 0, 1)
df <- data.frame(person, Sex, Beer, Cola, Wodka)
df %>%
  group_by(person, Sex) %>%
  summarise(across(c(Beer, Cola, Wodka), max))
#> `summarise()` regrouping output by 'person' (override with `.groups` argument)
#> # A tibble: 3 x 5
#> # Groups:   person [3]
#>   person Sex    Beer  Cola Wodka
#>   <chr>  <chr> <dbl> <dbl> <dbl>
#> 1 Alex   W         1     0     0
#> 2 John   M         1     1     1
#> 3 Nicole W         1     0     1
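The regrouping message in the output above can be avoided by telling summarise() what to do with the groups. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where across() and the .groups argument are available); the name result is only used here for illustration.

```r
library(dplyr)

df <- data.frame(
  person = c("John", "John", "Alex", "Nicole", "Nicole"),
  Sex    = c("M", "M", "W", "W", "W"),
  Beer   = c(1, 1, 1, 1, 0),
  Cola   = c(0, 1, 0, 0, 0),
  Wodka  = c(0, 1, 0, 0, 1)
)

# One row per person/Sex pair; a 1 anywhere in the group wins because max(0, 1) is 1.
# .groups = "drop" returns an ungrouped tibble and silences the regrouping message.
result <- df %>%
  group_by(person, Sex) %>%
  summarise(across(c(Beer, Cola, Wodka), max), .groups = "drop")

result
```

Dropping the groups is usually the safer default if the result is passed on to further dplyr verbs, since leftover grouping can change how later steps behave.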
I suggest using the dplyr library from the tidyverse. This should then achieve what you want:
df %>%
  group_by(person) %>%
  summarize(Beer = max(Beer), Cola = max(Cola), Wodka = max(Wodka), Sex = max(Sex))
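Note that max(Sex) on a character column returns the alphabetically last value, which only gives the intended answer when Sex is constant within each person. A possible variant, sketched here under the assumption that dplyr 1.0.0 or later is installed, takes the first Sex value per person instead; the name result is hypothetical.

```r
library(dplyr)

df <- data.frame(
  person = c("John", "John", "Alex", "Nicole", "Nicole"),
  Sex    = c("M", "M", "W", "W", "W"),
  Beer   = c(1, 1, 1, 1, 0),
  Cola   = c(0, 1, 0, 0, 0),
  Wodka  = c(0, 1, 0, 0, 1)
)

# Group by person only; keep the first Sex seen for each person
# and let 1 overrule 0 in the drink columns via max().
result <- df %>%
  group_by(person) %>%
  summarise(
    Sex = first(Sex),
    across(c(Beer, Cola, Wodka), max),
    .groups = "drop"
  )

result
```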
A simple base R solution could be:
# Split by person.
# Each element of the list personSplit is a data frame holding all available
# information about one person.
personSplit <- split(df, df$person)
# From this information, pick the value that overrules the others.
# Here, overruling only applies to numeric columns, where you can simply take the max.
# For non-numeric columns, I simply use the first value.
valuesToTake <- lapply(personSplit, function(personalInfoDf) {
  vals <- lapply(personalInfoDf, function(column) {
    if (is.numeric(column)) {
      max(column, na.rm = TRUE)
    } else {
      column[1]
    }
  })
  data.frame(vals)
})
result <- do.call("rbind", valuesToTake)
rownames(result) <- NULL
result
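For comparison, the same split-and-take-the-max idea can be written more compactly with base R's aggregate(). This is only a sketch of an alternative, not part of the answer above; it assumes Sex is constant within each person, so that grouping by person and Sex together still yields one row per person.

```r
df <- data.frame(
  person = c("John", "John", "Alex", "Nicole", "Nicole"),
  Sex    = c("M", "M", "W", "W", "W"),
  Beer   = c(1, 1, 1, 1, 0),
  Cola   = c(0, 1, 0, 0, 0),
  Wodka  = c(0, 1, 0, 0, 1)
)

# Left of ~ : the columns to summarise; right of ~ : the grouping columns.
# FUN = max lets a 1 overrule any 0 within the same group.
result <- aggregate(cbind(Beer, Cola, Wodka) ~ person + Sex, data = df, FUN = max)

result
```

One caveat: the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), which differs from the max(column, na.rm = TRUE) handling above.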