Summarise a column and thereby remove unwanted NAs in others
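I'm a bit stuck once again and would like to ask for help. I hope to be able to return the favour some day...
Anyway, I have a tibble that looks like this: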
# A tibble: 20 x 6
# Groups: tipologia [6]
tipologia date_info pct_day pct_month pct_year pct_no_date
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Aree soggette a crolli/ribaltamenti diffusi day 0.0508 NA NA NA
2 Aree soggette a crolli/ribaltamenti diffusi month NA 0.0217 NA NA
3 Aree soggette a crolli/ribaltamenti diffusi no date NA NA NA 0.227
4 Aree soggette a crolli/ribaltamenti diffusi year NA NA 0.701 NA
5 Aree soggette a frane superficiali diffuse day 0.0721 NA NA NA
6 Aree soggette a frane superficiali diffuse month NA 0.0218 NA NA
7 Aree soggette a frane superficiali diffuse no date NA NA NA 0.570
8 Aree soggette a frane superficiali diffuse year NA NA 0.336 NA
9 Aree soggette a sprofondamenti diffusi day 0.143 NA NA NA
10 Aree soggette a sprofondamenti diffusi no date NA NA NA 0.286
11 Aree soggette a sprofondamenti diffusi year NA NA 0.571 NA
12 Colamento lento day 0.119 NA NA NA
13 Colamento lento month NA 0.0475 NA NA
14 Colamento lento no date NA NA NA 0.122
15 Colamento lento year NA NA 0.712 NA
16 Colamento rapido day 0.478 NA NA NA
17 Colamento rapido month NA 0.00838 NA NA
18 Colamento rapido no date NA NA NA 0.0642
19 Colamento rapido year NA NA 0.450 NA
20 Complesso day 0.262 NA NA NA
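There are up to four rows per "tipologia", because there are four possible kinds of date information (day, month, year, or no information at all). What I want is a single row per tipologia, with these unnecessary NAs removed. The NA cells can never hold a value, so they are just in the way.
I have tried a lot of regrouping and summarising, but did not get where I wanted to. So any idea would be very helpful :)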
You can use na.omit to drop the NA values.
library(dplyr)
df %>%
group_by(tipologia) %>%
summarise(across(starts_with('pct'), na.omit))
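na.omit should work when every group has exactly one non-NA value in each pct column, but a safer option, which keeps the first non-NA value and falls back to NA when a group has none, is: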
df %>%
group_by(tipologia) %>%
summarise(across(starts_with('pct'), ~.x[!is.na(.x)][1]))
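To see why the second form is the safer one, here is a minimal base R sketch of how the two expressions behave on a single column of a single group. The helper name first_non_na and the example vectors are made up for illustration; they are not part of the answer above.

```r
# Hypothetical helper: first non-NA element, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

one_value   <- c(NA, 0.0217, NA)       # a group with one real value
all_missing <- c(NA_real_, NA_real_)   # a group with no value for this column

length(na.omit(one_value))     # 1: exactly one value survives
length(na.omit(all_missing))   # 0: nothing left to put in the summary row

first_non_na(one_value)        # 0.0217
first_non_na(all_missing)      # NA: still a length-one result
```

Because first_non_na always returns exactly one value, every group contributes exactly one row, even a group such as "Complesso" that only has a pct_day entry.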
You can also use aggregate, looping over the columns with lapply and then combining the results with merge.
Reduce(function(...) merge(..., all = TRUE),
       lapply(names(dat)[3:6], function(x)
         aggregate(as.formula(paste(x, "~ tipologia")), dat, I)))
# tipologia pct_day pct_month pct_year pct_no_date
# 1 Aree soggette a crolli/ribaltamenti diffusi 0.0508 0.02170 0.701 0.2270
# 2 Aree soggette a frane superficiali diffuse 0.0721 0.02180 0.336 0.5700
# 3 Aree soggette a sprofondamenti diffusi 0.1430 NA 0.571 0.2860
# 4 Colamento lento 0.1190 0.04750 0.712 0.1220
# 5 Colamento rapido 0.4780 0.00838 0.450 0.0642
# 6 Complesso 0.2620 NA NA NA
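As a possible variation on the same base R idea, the data frame method of aggregate can collapse all value columns in one call, provided the summary function handles the NAs itself. The sketch below uses a small made-up data frame (toy) rather than dat, and the helper first_non_na is hypothetical.

```r
# Hypothetical toy data with the same shape as the question's data
toy <- data.frame(
  tipologia = c("A", "A", "B"),
  pct_day   = c(0.1, NA, 0.3),
  pct_month = c(NA, 0.2, NA)
)

# First non-NA element, or NA if the group has none
first_non_na <- function(x) x[!is.na(x)][1]

# One row per tipologia; each value column is reduced independently
res <- aggregate(toy[c("pct_day", "pct_month")],
                 by  = list(tipologia = toy$tipologia),
                 FUN = first_non_na)
res
```

Here group "B" has no pct_month entry, so that cell stays NA in the result instead of the row being dropped.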
Data:
dat <- structure(list(tipologia = c("Aree soggette a crolli/ribaltamenti diffusi",
"Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a crolli/ribaltamenti diffusi",
"Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a frane superficiali diffuse",
"Aree soggette a frane superficiali diffuse", "Aree soggette a frane superficiali diffuse",
"Aree soggette a frane superficiali diffuse", "Aree soggette a sprofondamenti diffusi",
"Aree soggette a sprofondamenti diffusi", "Aree soggette a sprofondamenti diffusi",
"Colamento lento", "Colamento lento", "Colamento lento", "Colamento lento",
"Colamento rapido", "Colamento rapido", "Colamento rapido", "Colamento rapido",
"Complesso"), date_info = c("day", "month", "no date", "year",
"day", "month", "no date", "year", "day", "no date", "year",
"day", "month", "no date", "year", "day", "month", "no date",
"year", "day"), pct_day = c(0.0508, NA, NA, NA, 0.0721, NA, NA,
NA, 0.143, NA, NA, 0.119, NA, NA, NA, 0.478, NA, NA, NA, 0.262
), pct_month = c(NA, 0.0217, NA, NA, NA, 0.0218, NA, NA, NA,
NA, NA, NA, 0.0475, NA, NA, NA, 0.00838, NA, NA, NA), pct_year = c(NA,
NA, NA, 0.701, NA, NA, NA, 0.336, NA, NA, 0.571, NA, NA, NA,
0.712, NA, NA, NA, 0.45, NA), pct_no_date = c(NA, NA, 0.227,
NA, NA, NA, 0.57, NA, NA, 0.286, NA, NA, NA, 0.122, NA, NA, NA,
0.0642, NA, NA)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20"))