Group by one column if it contains a string and get the maximum value of another column in R

Given the following data frame:

df <- structure(list(city = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 1L, 
1L), .Label = c("bj", "sh"), class = "factor"), type = structure(c(3L, 
1L, 3L, 1L, 4L, 2L, 4L, 2L), .Label = c("buy_area", "buy_price", 
"sale_area", "sale_price"), class = "factor"), value = c(1200L, 
800L, 1900L, 1500L, 15L, 10L, 17L, 9L)), class = "data.frame", row.names = c(NA, 
-8L))

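Output:

  city       type value
1   bj  sale_area  1200
2   bj   buy_area   800
3   sh  sale_area  1900
4   sh   buy_area  1500
5   bj sale_price    15
6   bj  buy_price    10
7   bj sale_price    17
8   bj  buy_price     9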

How can I get the maximum of the value column for two kinds of type: those containing area and those containing price?

The expected result is two values: 1900 for area and 17 for price.

To group by type and get the maximum of value, we can use:

library(plyr)
ddply(df, .(type), summarise, max.value = max(value))

Update: output of @det's solution:

Create a column that classifies type as area or price, then group by that column:

library(dplyr)
library(stringr)

df %>%
  mutate(
    type2 = case_when(
      str_detect(type, "_area$") ~ "area",
      str_detect(type, "_price$") ~ "price",
      TRUE ~ NA_character_
    )
  ) %>%
  group_by(type2) %>%
  summarise(max_value = max(value))

Output:

  type2 max_value
  <chr>     <int>
1 area       1900
2 price        17

Update: this is more concise (a small modification of Ronak Shah's answer):

df %>% 
    separate(type, c("sale_buy", "area_price")) %>% 
    group_by(area_price) %>% 
    summarise(max = max(value))

Output:

  area_price   max
  <chr>      <int>
1 area        1900
2 price         17

First answer: one way could be:

library(dplyr)
df %>% 
    group_by(type) %>% 
    summarise(max = max(value)) %>% 
    filter(grepl("sale", type))

Output:

  type         max
  <fct>      <int>
1 sale_area   1900
2 sale_price    17

Split the type column into two columns and find the maximum value by group.

library(dplyr)
library(tidyr)

df %>%
  separate(type, c('type', 'col'), sep = '_') %>%
  group_by(col) %>%
  summarise(value = max(value, na.rm = TRUE))

#  col   value
#  <chr> <int>
#1 area   1900
#2 price    17

You can also extract 'area' or 'price' from type and use it as the grouping column.

df %>%
  group_by(type = stringr::str_extract(type, 'area|price')) %>%
  summarise(value = max(value, na.rm = TRUE))
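
For comparison, the same idea can be sketched in base R without dplyr or stringr. This is only an illustrative alternative, not part of the answer above: the names group_key and max_by_key are made up for this example, and df is rebuilt here with plain character columns (an assumption, the original uses factors) so that the snippet runs on its own.

```r
# Rebuild the question's data (simplified: character columns instead of factors)
df <- data.frame(
  city  = c("bj", "bj", "sh", "sh", "bj", "bj", "bj", "bj"),
  type  = c("sale_area", "buy_area", "sale_area", "buy_area",
            "sale_price", "buy_price", "sale_price", "buy_price"),
  value = c(1200L, 800L, 1900L, 1500L, 15L, 10L, 17L, 9L)
)

# group_key (hypothetical name): keep only the text after the underscore,
# so "sale_area" and "buy_area" both become "area"
group_key <- sub("^.*_", "", df$type)

# Maximum of value within each key; the result is a named array
max_by_key <- tapply(df$value, group_key, max)
max_by_key
```

The result should hold 1900 for area and 17 for price, matching the expected values in the question. Unlike the dplyr versions, it is a named array rather than a data frame.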

Try this (unlike summarise, filter keeps the whole row that holds the maximum value in each group):

df %>% separate(type, c("type", "area")) %>% group_by(area) %>% filter(value == max(value, na.rm = TRUE))