Summarise a column and thereby remove unwanted NAs in others

I'm a bit stuck once again and am looking for help. I hope to be able to return the favour some day...

Anyway, I have a tibble that looks like this:

# A tibble: 20 x 6
# Groups:   tipologia [6]
   tipologia                                   date_info pct_day pct_month pct_year pct_no_date
   <chr>                                       <chr>       <dbl>     <dbl>    <dbl>       <dbl>
 1 Aree soggette a crolli/ribaltamenti diffusi day        0.0508  NA         NA         NA     
 2 Aree soggette a crolli/ribaltamenti diffusi month     NA        0.0217    NA         NA     
 3 Aree soggette a crolli/ribaltamenti diffusi no date   NA       NA         NA          0.227 
 4 Aree soggette a crolli/ribaltamenti diffusi year      NA       NA          0.701     NA     
 5 Aree soggette a frane superficiali diffuse  day        0.0721  NA         NA         NA     
 6 Aree soggette a frane superficiali diffuse  month     NA        0.0218    NA         NA     
 7 Aree soggette a frane superficiali diffuse  no date   NA       NA         NA          0.570 
 8 Aree soggette a frane superficiali diffuse  year      NA       NA          0.336     NA     
 9 Aree soggette a sprofondamenti diffusi      day        0.143   NA         NA         NA     
10 Aree soggette a sprofondamenti diffusi      no date   NA       NA         NA          0.286 
11 Aree soggette a sprofondamenti diffusi      year      NA       NA          0.571     NA     
12 Colamento lento                             day        0.119   NA         NA         NA     
13 Colamento lento                             month     NA        0.0475    NA         NA     
14 Colamento lento                             no date   NA       NA         NA          0.122 
15 Colamento lento                             year      NA       NA          0.712     NA     
16 Colamento rapido                            day        0.478   NA         NA         NA     
17 Colamento rapido                            month     NA        0.00838   NA         NA     
18 Colamento rapido                            no date   NA       NA         NA          0.0642
19 Colamento rapido                            year      NA       NA          0.450     NA     
20 Complesso                                   day        0.262   NA         NA         NA     

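There are up to four rows per "tipologia" because there are four possible kinds of date information (day, month, year, or no information at all). What I want is exactly one row per tipologia, with these unnecessary NAs removed. The NA cells can never hold a value, so they are just clutter.

I have tried a lot of regrouping and summarising but haven't managed to get what I want. So any ideas would be very helpful :)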

You can use na.omit to drop the NA values.

library(dplyr)
df %>%
  group_by(tipologia) %>%
  summarise(across(starts_with('pct'), na.omit))

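na.omit should work for the data above, but a safer option, which takes the first non-NA value in each column and returns NA when a group has none, is: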

df %>%
  group_by(tipologia) %>%
  summarise(across(starts_with('pct'), ~.x[!is.na(.x)][1]))
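
To see why the first-non-NA version is the safer one, here is a minimal sketch on toy data. The data frame `toy` and the helper name `first_non_na` are made up for illustration and are not part of the question; group "b" deliberately has no `pct_month` value at all.

```r
library(dplyr)

# Toy data (hypothetical): group "b" has no pct_month value at all
toy <- data.frame(
  tipologia = c("a", "a", "b"),
  pct_day   = c(0.1, NA, 0.3),
  pct_month = c(NA, 0.2, NA)
)

# Hypothetical helper: first non-NA element, or NA if every element is NA
first_non_na <- function(x) x[!is.na(x)][1]

res <- toy %>%
  group_by(tipologia) %>%
  summarise(across(starts_with("pct"), first_non_na))

print(res)
```

Every group ends up with exactly one row, and a column with no value in a group simply stays NA instead of producing a zero-length result.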

You can also use aggregate, looping over the columns with lapply and then merging the results:

Reduce(function(...) merge(..., all = TRUE), lapply(names(dat)[3:6], function(x) 
  aggregate(as.formula(paste(x, "~ tipologia")), dat, I)))
#                                     tipologia pct_day pct_month pct_year pct_no_date
# 1 Aree soggette a crolli/ribaltamenti diffusi  0.0508   0.02170    0.701      0.2270
# 2  Aree soggette a frane superficiali diffuse  0.0721   0.02180    0.336      0.5700
# 3      Aree soggette a sprofondamenti diffusi  0.1430        NA    0.571      0.2860
# 4                             Colamento lento  0.1190   0.04750    0.712      0.1220
# 5                            Colamento rapido  0.4780   0.00838    0.450      0.0642
# 6                                   Complesso  0.2620        NA       NA          NA
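
As a possible variation (not the method shown above), a single aggregate call may be enough if the function itself handles the NAs. This is only a sketch on toy data: `toy2` and `first_non_na` are hypothetical names, and it uses the data-frame interface of aggregate rather than the formula interface, so no rows are dropped before the function is applied.

```r
# Toy data (hypothetical): group "b" has no pct_month value at all
toy2 <- data.frame(
  tipologia = c("a", "a", "b", "b"),
  pct_day   = c(0.5, NA, NA, 0.7),
  pct_month = c(NA, 0.25, NA, NA)
)

# Hypothetical helper: first non-NA element, or NA if every element is NA
first_non_na <- function(x) x[!is.na(x)][1]

# Apply the helper to every pct column within each tipologia group
out <- aggregate(
  toy2[c("pct_day", "pct_month")],
  by  = list(tipologia = toy2$tipologia),
  FUN = first_non_na
)

print(out)
```

With the real data this would presumably be called on `dat[3:6]` with `by = list(tipologia = dat$tipologia)`, which avoids the merge step.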

Data:

dat <- structure(list(tipologia = c("Aree soggette a crolli/ribaltamenti diffusi", 
"Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a crolli/ribaltamenti diffusi", 
"Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a frane superficiali diffuse", 
"Aree soggette a frane superficiali diffuse", "Aree soggette a frane superficiali diffuse", 
"Aree soggette a frane superficiali diffuse", "Aree soggette a sprofondamenti diffusi", 
"Aree soggette a sprofondamenti diffusi", "Aree soggette a sprofondamenti diffusi", 
"Colamento lento", "Colamento lento", "Colamento lento", "Colamento lento", 
"Colamento rapido", "Colamento rapido", "Colamento rapido", "Colamento rapido", 
"Complesso"), date_info = c("day", "month", "no date", "year", 
"day", "month", "no date", "year", "day", "no date", "year", 
"day", "month", "no date", "year", "day", "month", "no date", 
"year", "day"), pct_day = c(0.0508, NA, NA, NA, 0.0721, NA, NA, 
NA, 0.143, NA, NA, 0.119, NA, NA, NA, 0.478, NA, NA, NA, 0.262
), pct_month = c(NA, 0.0217, NA, NA, NA, 0.0218, NA, NA, NA, 
NA, NA, NA, 0.0475, NA, NA, NA, 0.00838, NA, NA, NA), pct_year = c(NA, 
NA, NA, 0.701, NA, NA, NA, 0.336, NA, NA, 0.571, NA, NA, NA, 
0.712, NA, NA, NA, 0.45, NA), pct_no_date = c(NA, NA, 0.227, 
NA, NA, NA, 0.57, NA, NA, 0.286, NA, NA, NA, 0.122, NA, NA, NA, 
0.0642, NA, NA)), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
"15", "16", "17", "18", "19", "20"))