How to get the highest value in a column depending on three other columns?

Question

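The goal is to get the highest education level between the two partners in a household, ignoring the education level of the children. The first column, hhid, is the household ID, and the second, id, is the individual ID. The third column, relation, gives each member's relationship within the household: 1 is the head of household, 2 is the partner, and 3 is a child. The fourth column, education, is each person's education level.

The fifth column, highest_education, is the one I would like to compute with code. It should hold the highest education level in the household, but only among the parents. I usually use pmax to get the maximum across two columns and group_by to aggregate individuals within a group (such as a household), but these two commands do not seem to work in this case. Can anyone help?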

  hhid    id   relation   education   highest_education
     1     1          1           3                   3
     1     2          2           2                   3
     1     3          3           5                   3
     2     4          1           4                   4
     2     5          2           2                   4
     3     6          1           1                   2
     3     7          2           2                   2
     4     8          1           1                   3
     4     9          2           3                   3
     4    10          3           4                   3

Here is the data:

df <- structure(list(hhid = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4), id = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10), relation = c(1, 2, 3, 1, 2, 1, 2, 
1, 2, 3), education = c(3, 2, 5, 4, 2, 1, 2, 1, 3, 4)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

You could do it like this:

library(dplyr)

df %>% 
  group_by(hhid) %>% 
  mutate(highest_education = max(education[relation %in% c(1, 2)])) %>% 
  ungroup()
#> # A tibble: 10 × 5
#>     hhid    id relation education highest_education
#>    <dbl> <dbl>    <dbl>     <dbl>             <dbl>
#>  1     1     1        1         3                 3
#>  2     1     2        2         2                 3
#>  3     1     3        3         5                 3
#>  4     2     4        1         4                 4
#>  5     2     5        2         2                 4
#>  6     3     6        1         1                 2
#>  7     3     7        2         2                 2
#>  8     4     8        1         1                 3
#>  9     4     9        2         3                 3
#> 10     4    10        3         4                 3
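
One edge case may be worth guarding against: if a household has no rows with relation 1 or 2, the subset passed to max() is empty, and max() then returns -Inf with a warning. The sample data has no such household, so the sketch below uses a small hypothetical tibble (df2 is made up for illustration, not taken from the question) and returns NA for that household instead.

```r
library(dplyr)

# Hypothetical data: household 2 contains only a child (relation == 3).
df2 <- tibble(
  hhid      = c(1, 1, 1, 2),
  id        = c(1, 2, 3, 4),
  relation  = c(1, 2, 3, 3),
  education = c(3, 2, 5, 4)
)

res <- df2 %>%
  group_by(hhid) %>%
  mutate(
    # Only take the maximum when the household has at least one parent;
    # otherwise fall back to NA rather than -Inf.
    highest_education = if (any(relation %in% c(1, 2))) {
      max(education[relation %in% c(1, 2)])
    } else {
      NA_real_
    }
  ) %>%
  ungroup()

res
```

If education itself can contain missing values, adding na.rm = TRUE to the max() call is one way to ignore them, though whether that is appropriate depends on how missing education should be treated.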

r

dplyr

tidyr