Extract uppercase rows and fill down until the next uppercase row
I have some data that looks like:
RegionName
<chr>
1 ANDALUCÍA
2 Almería
3 Abla
4 Abrucena
5 Adra
6 ALBÁNCHEZ
7 Alboloduy
8 Albox
9 ALCOLEA
10 Alcóntar
Some of the rows are all uppercase. I want to extract the uppercase values into a new column and fill(down) until the next uppercase row.
Expected output:
RegionName REGIONNAME
<chr> <chr>
1 ANDALUCÍA ANDALUCÍA -first result
2 Almería ANDALUCÍA
3 Abla ANDALUCÍA
4 Abrucena ANDALUCÍA
5 Adra ANDALUCÍA
6 ALBÁNCHEZ ALBÁNCHEZ - change here
7 Alboloduy ALBÁNCHEZ
8 Albox ALBÁNCHEZ
9 ALCOLEA ALCOLEA - change here
10 Alcóntar ALCOLEA
Data:
data = structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla",
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA",
"Alcóntar")), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
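You can group the regions by flagging the rows where the region name == its all-uppercase form, and taking the running sum of that flag. Then set every name within a group to the group's first RegionName, which is the all-uppercase one.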
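library(tidyverse)
df %>%
  # cumsum() starts a new group at every all-uppercase row
  group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
  # the first row of each group is its uppercase header
  mutate(REGIONNAME = first(RegionName))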
Output
RegionName grp REGIONNAME
<chr> <int> <chr>
1 ANDALUCÍA 1 ANDALUCÍA
2 Almería 1 ANDALUCÍA
3 Abla 1 ANDALUCÍA
4 Abrucena 1 ANDALUCÍA
5 Adra 1 ANDALUCÍA
6 ALBÁNCHEZ 2 ALBÁNCHEZ
7 Alboloduy 2 ALBÁNCHEZ
8 Albox 2 ALBÁNCHEZ
9 ALCOLEA 3 ALCOLEA
10 Alcóntar 3 ALCOLEA
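For readers without the tidyverse, the same grouping idea can be sketched in base R. This is only an illustration, not part of the answer above: the names is_upper, grp and labels are made up here, and the leading NA is an added guard for the case where the first row is not uppercase.

```r
# Sample data from the question
df <- data.frame(RegionName = c("ANDALUCÍA", "Almería", "Abla", "Abrucena",
                                "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox",
                                "ALCOLEA", "Alcóntar"))

# TRUE where the name is identical to its uppercase form
is_upper <- df$RegionName == toupper(df$RegionName)

# Running count of uppercase rows: a group id that increases at each header
grp <- cumsum(is_upper)

# Headers in order of appearance; the leading NA covers group id 0,
# i.e. rows that come before any uppercase header
labels <- c(NA, df$RegionName[is_upper])

# Look up the header for every row by its group id
df$REGIONNAME <- labels[grp + 1]
```

Indexing by group id avoids an explicit group_by(), at the cost of being a little less readable than the dplyr version.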
Data
df <- structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla",
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA",
"Alcóntar")), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"))
One idea is to use grepl() to identify the values made up entirely of [[:upper:]] characters, convert the others to NA, and then fill(), i.e.
library(dplyr)
library(tidyr)
data %>%
mutate(new = replace(RegionName, !grepl("^[[:upper:]]+$", RegionName), NA)) %>%
fill(new)
# A tibble: 10 x 2
RegionName new
<chr> <chr>
1 ANDALUCÍA ANDALUCÍA
2 Almería ANDALUCÍA
3 Abla ANDALUCÍA
4 Abrucena ANDALUCÍA
5 Adra ANDALUCÍA
6 ALBÁNCHEZ ALBÁNCHEZ
7 Alboloduy ALBÁNCHEZ
8 Albox ALBÁNCHEZ
9 ALCOLEA ALCOLEA
10 Alcóntar ALCOLEA
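One thing to watch with the anchored pattern: it only accepts values made up entirely of uppercase letters, so a header containing a space would not be picked up. The sketch below uses a hypothetical region name, "CASTILLA Y LEÓN", which is not in the question's data, and swaps the regex for a toupper() comparison as one possible workaround.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: a multi-word header that is not in the original question
regions <- tibble(RegionName = c("CASTILLA Y LEÓN", "Ávila", "Burgos"))

# The space is not an uppercase letter, so the anchored pattern rejects the header
matches_pattern <- grepl("^[[:upper:]]+$", regions$RegionName[1])

# Comparing with toupper() ignores spaces and punctuation
filled <- regions %>%
  mutate(new = replace(RegionName, RegionName != toupper(RegionName), NA)) %>%
  fill(new)
```

If the real data only ever has single-word headers, the original pattern is fine as it stands.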
An alternative using ifelse and fill:
library(tidyverse)
df %>%
mutate(REGIONNAME = ifelse(RegionName == toupper(RegionName), RegionName, NA)) %>%
fill(REGIONNAME)
RegionName REGIONNAME
1 ANDALUCÍA ANDALUCÍA
2 Almería ANDALUCÍA
3 Abla ANDALUCÍA
4 Abrucena ANDALUCÍA
5 Adra ANDALUCÍA
6 ALBÁNCHEZ ALBÁNCHEZ
7 Alboloduy ALBÁNCHEZ
8 Albox ALBÁNCHEZ
9 ALCOLEA ALCOLEA
10 Alcóntar ALCOLEA
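A slightly stricter variant of the same idea, offered as a sketch rather than a correction: dplyr's if_else() checks that both branches have the same type, so the missing value is written as NA_character_, and the fill direction is spelled out even though "down" is the default. The result name is an assumption made for this example.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(RegionName = c("ANDALUCÍA", "Almería", "Abla", "Abrucena",
                                "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox",
                                "ALCOLEA", "Alcóntar"))

result <- df %>%
  # keep the value only on all-uppercase rows, typed NA everywhere else
  mutate(REGIONNAME = if_else(RegionName == toupper(RegionName),
                              RegionName, NA_character_)) %>%
  # carry the last non-missing header downwards
  fill(REGIONNAME, .direction = "down")
```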