group_by and fill specific rows based on capitalised row observations
I have some data which looks like this:
# A tibble: 10 × 4
RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3`
<chr> <dbl> <dbl> <dbl>
1 ANDALUCÍA NA NA NA
2 Almería NA NA NA
3 Abla 58 61 54
4 Abrucena 6 2 1
5 Adra 146 211 101
6 ALBÁNCHEZ 12 3 3
7 Alboloduy 2 2 2
8 Albox 33 66 35
9 ALCOLEA 0 1 1
10 Alcóntar 1 1 2
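What I want to do is group_by and compute the sum for each capitalised RegionName, i.e. mutate(across(where(is.numeric)..., and then add the values next to each capitalised region.
Using this post, I can extract the capitalised words and store them in a new column using: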
data %>%
group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
mutate(REGIONNAME = first(RegionName)) %>%
relocate(REGIONNAME, .before = RegionName)
So the data looks like:
# A tibble: 10 × 6
# Groups: grp [3]
REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3` grp
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 ANDALUCÍA ANDALUCÍA NA NA NA 1
2 ANDALUCÍA Almería NA NA NA 1
3 ANDALUCÍA Abla 58 61 54 1
4 ANDALUCÍA Abrucena 6 2 1 1
5 ANDALUCÍA Adra 146 211 101 1
6 ALBÁNCHEZ ALBÁNCHEZ 12 3 3 2
7 ALBÁNCHEZ Alboloduy 2 2 2 2
8 ALBÁNCHEZ Albox 33 66 35 2
9 ALCOLEA ALCOLEA 0 1 1 3
10 ALCOLEA Alcóntar 1 1 2 3
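To make the grouping step concrete, here is a minimal base-R sketch of what cumsum(RegionName == toupper(RegionName)) does. It uses a shortened subset of the names above; the variable names region, is_header and grp are only illustrative.

```r
# Shortened subset of the RegionName values from the question
region <- c("ANDALUCÍA", "Almería", "Abla", "ALBÁNCHEZ", "Albox",
            "ALCOLEA", "Alcóntar")

# TRUE where the name is already all upper case, i.e. a header row
is_header <- region == toupper(region)

# Each TRUE adds 1 to the running total, so every header row
# starts a new group id that its following rows inherit
grp <- cumsum(is_header)

print(data.frame(region, is_header, grp))
```

The approach relies on every capitalised row coming before the rows that belong to it, which is the case in the data shown here.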
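Ignoring the grp column, I want to group_by(REGIONNAME) and mutate(across... the Año... columns to give me a sum for each REGIONNAME. Then I want to put that sum in the capitalised row of each column, filling in the NA values.
Expected output (changes marked with ***x***):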
REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3` grp
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 ANDALUCÍA ANDALUCÍA ***210*** ***274*** ***156*** 1
2 ANDALUCÍA Almería NA NA NA 1
3 ANDALUCÍA Abla 58 61 54 1
4 ANDALUCÍA Abrucena 6 2 1 1
5 ANDALUCÍA Adra 146 211 101 1
6 ALBÁNCHEZ ALBÁNCHEZ ***35*** ***68*** ***37*** 2
7 ALBÁNCHEZ Alboloduy 2 2 2 2
8 ALBÁNCHEZ Albox 33 66 35 2
9 ALCOLEA ALCOLEA ***1*** ***1*** ***2*** 3
10 ALCOLEA Alcóntar 1 1 2 3
Data:
data <- structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla",
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA",
"Alcóntar"), `Año 2004_1` = c(NA, NA, 58, 6, 146, 12, 2, 33,
0, 1), `Año 2004_2` = c(NA, NA, 61, 2, 211, 3, 2, 66, 1, 1),
`Año 2004_3` = c(NA, NA, 54, 1, 101, 3, 2, 35, 1, 2)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
You can replace each capitalised row with the sum of the non-capitalised rows of its group:
library(dplyr)

data %>%
  # A capitalised name starts a new group
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  mutate(REGIONNAME = first(RegionName)) %>%
  relocate(REGIONNAME, .before = RegionName) %>%
  # In the capitalised row, write the sum of the group's other rows
  mutate(across(starts_with("Año"),
                ~ ifelse(REGIONNAME == RegionName,
                         sum(.x[REGIONNAME != RegionName], na.rm = TRUE),
                         .x)))
# A tibble: 10 x 6
# Groups: grp [3]
REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3` grp
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 ANDALUCÍA ANDALUCÍA 210 274 156 1
2 ANDALUCÍA Almería NA NA NA 1
3 ANDALUCÍA Abla 58 61 54 1
4 ANDALUCÍA Abrucena 6 2 1 1
5 ANDALUCÍA Adra 146 211 101 1
6 ALBÁNCHEZ ALBÁNCHEZ 35 68 37 2
7 ALBÁNCHEZ Alboloduy 2 2 2 2
8 ALBÁNCHEZ Albox 33 66 35 2
9 ALCOLEA ALCOLEA 1 1 2 3
10 ALCOLEA Alcóntar 1 1 2 3
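As a cross-check, the same replacement can be sketched in base R for a single column. This is only an illustration under the same assumption that each capitalised row precedes its members; it uses the values of RegionName and `Año 2004_1` from the question, and the names x, group_sum and filled are made up for the example.

```r
# Values copied from the question's RegionName and `Año 2004_1` columns
region <- c("ANDALUCÍA", "Almería", "Abla", "Abrucena", "Adra",
            "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA", "Alcóntar")
x <- c(NA, NA, 58, 6, 146, 12, 2, 33, 0, 1)

# Header rows are the all-upper-case names; each one opens a new group
is_header <- region == toupper(region)
grp <- cumsum(is_header)

# Blank out the header rows, then sum what is left in each group, ignoring NA
group_sum <- tapply(ifelse(is_header, NA, x), grp, sum, na.rm = TRUE)

# Write each group's sum into its header row; other rows keep their value
filled <- ifelse(is_header, group_sum[as.character(grp)], x)

print(data.frame(region, grp, filled))
```

The header rows come out as 210, 35 and 1, matching the first Año column in the output above. Note that the existing header values (12 for ALBÁNCHEZ, 0 for ALCOLEA) are overwritten rather than added to the sum.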