Renaming multiple columns using regexp
Question:
I want to rename a large number of columns by replacing certain recurring strings.
Reprex:
library(dplyr)
library(stringr)
code <- c(round(runif(26, 0, 100),0))
names <- letters
AIYN <- stringi::stri_rand_strings(26, 2)
SIYN <- stringi::stri_rand_strings(26, 2)
df <- bind_cols(code, names, AIYN, SIYN)
colnames(df) <- c("code (2021)", "names (2021)", "all the info you need (AIYN) from A to Z",
"some info you need (SIYN) from A to Z")
View(df)
Attempted solution
colnames(df) <- str_replace_all(colnames(df), "[(2021)]", "")
colnames(df) <- str_replace_all(colnames(df), "all the info you need (AIYN) from A to Z", "AIYN")
colnames(df) <- str_replace_all(colnames(df), "some info you need (SIYN) from A to Z", "SIYN")
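Goal
I would like to remove the parenthesized numbers (e.g. "(2019)") and, where the parentheses contain only letters (e.g. "(AIYN)", "(SIYN)"), keep just those letters. My solution is tedious because my data frame has more than a hundred columns.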
To remove the parentheses that contain digits, you need one of
stringr::str_replace_all(colnames(df), "\\s*\\(\\d+\\)", "")
stringr::str_remove_all(colnames(df), "\\s*\\(\\d+\\)")
gsub("\\s*\\(\\d+\\)", "", colnames(df))
If the number inside the parentheses must consist of exactly 4 digits, replace \\d+ with \\d{4}.
Wrap the code above in trimws(...) to remove leading/trailing whitespace.
See the regex demo.
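To make the difference from the character-class pattern tried in the question concrete, here is a minimal sketch in base R. The vector `x` is made up for illustration (it is not the asker's data frame), and `gsub` is used in place of `str_replace_all` only to keep the example dependency-free.

```r
# Hypothetical column names, loosely mirroring the ones in the question
x <- c("code (2021)", "names (2021)", "rate 2010 (12)")

# "[(2021)]" is a character class: it deletes every "(", ")", "0", "1" and "2"
# anywhere in the string, and it leaves the surrounding spaces behind
bad <- gsub("[(2021)]", "", x)

# Escaped parentheses match literally; \\d{4} requires exactly four digits,
# so "(12)" is left alone
four_digits <- gsub("\\s*\\(\\d{4}\\)", "", x)

# \\d+ accepts any number of digits; trimws() removes leftover outer whitespace
any_digits <- trimws(gsub("\\s*\\(\\d+\\)", "", x))
```

Note that `bad` also loses the year in "rate 2010", which is the main reason the character class is unsuitable here.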
To keep the first letters-only value inside parentheses, you need
stringr::str_extract(colnames(df), '(?<=\\()[A-Za-z]+(?=\\))') # ASCII only
stringr::str_extract(colnames(df), '(?<=\\()\\p{L}+(?=\\))') # Any Unicode
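The practical difference between the two patterns only shows up with non-ASCII letters. A small sketch, assuming made-up names (none of them come from the question):

```r
library(stringr)

# Hypothetical names: an ASCII tag, a tag with a non-ASCII letter, a numeric tag
x <- c("all the info (AIYN) here", "gr\u00f6\u00dfe (GR\u00d6SSE)", "code (2021)")

# [A-Za-z]+ must fill the whole parenthesis, so a tag containing "\u00d6" fails
ascii_only <- str_extract(x, "(?<=\\()[A-Za-z]+(?=\\))")

# \\p{L} matches any Unicode letter, so the second tag is extracted as well
any_letter <- str_extract(x, "(?<=\\()\\p{L}+(?=\\))")
```

Both calls return NA where no letters-only parenthesis exists, which is what makes the coalesce() fallback possible.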
Combining the two:
colnames(df) <- coalesce(str_extract(colnames(df), '(?<=\\()[A-Za-z]+(?=\\))'), str_replace_all(colnames(df), "\\s*\\(\\d+\\)", ""))
R test:
library(dplyr)
library(stringr)
x <- c("code (2021)", "names (2021)", "all the info you need (AIYN) from A to Z",
"some info you need (SIYN) from A to Z")
z <- str_replace_all(x, "\\s*\\(\\d+\\)", "")
# => [1] "code" "names" "all the info you need (AIYN) from A to Z"
# => [4] "some info you need (SIYN) from A to Z"
y <- str_extract(z, '(?<=\\()[A-Za-z]+(?=\\))')
# => [1] NA NA "AIYN" "SIYN"
coalesce(y, z)
# => "code" "names" "AIYN" "SIYN"
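With a hundred or more columns it may be convenient to wrap the two steps in one function and pass it to `rename_with()`. The following is only a sketch: `clean_names` is a hypothetical helper name, and the two-column tibble is invented for the example.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not part of the answer above): prefer the letters-only
# tag in parentheses, otherwise fall back to the name with "(digits)" removed
clean_names <- function(nm) {
  stripped <- str_remove_all(nm, "\\s*\\(\\d+\\)")
  tag <- str_extract(nm, "(?<=\\()\\p{L}+(?=\\))")
  coalesce(tag, stripped)
}

# Made-up data frame with one column of each kind
df <- tibble(`code (2021)` = 1:2,
             `some info you need (SIYN) from A to Z` = c("a", "b"))

df <- rename_with(df, clean_names)
names(df)
```

One caveat: if two columns carry the same tag, the cleaned names collide and `rename_with()` will complain about duplicated names, so it is worth checking for duplicates first.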
You can try:
library(magrittr)
names(df) <- sub('\\s\\(\\d+\\)', '', names(df)) %>%
  sub('.*\\(([A-Z]+)\\).*', '\\1', .)
names(df)
#[1] "code" "names" "AIYN" "SIYN"
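The first sub removes the parenthesized number together with the whitespace character before it.
The second sub extracts the run of [A-Z] characters found inside parentheses and replaces the whole name with it.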
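The two steps can be followed one at a time on a couple of sample names. This is a sketch on a made-up vector `x`, using only base R:

```r
# Hypothetical names, one of each kind
x <- c("code (2021)", "all the info you need (AIYN) from A to Z", "misc (abc)")

# Step 1: drop one whitespace character plus the parenthesized digits
step1 <- sub("\\s\\(\\d+\\)", "", x)

# Step 2: when the name contains "(UPPERCASE)", the pattern matches the whole
# string and \\1 replaces it with the captured letters; other names pass
# through unchanged, including lowercase tags such as "(abc)"
step2 <- sub(".*\\(([A-Z]+)\\).*", "\\1", step1)
```

Because the class is `[A-Z]`, lowercase or mixed-case tags are not extracted; widening it to `[A-Za-z]` would be one way to cover them, if that is what you need.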
To use this with dplyr and the pipe:
library(dplyr)
df %>%
  rename_with(~sub('\\s\\(\\d+\\)', '', .) %>%
                sub('.*\\(([A-Z]+)\\).*', '\\1', .))
# code names AIYN SIYN
# <dbl> <chr> <chr> <chr>
# 1 1 a 1A NR
# 2 96 b Dq hi
# 3 46 c 28 AQ
# 4 78 d Y8 xH
# 5 76 e ps ES
# 6 56 f m5 gQ
# 7 51 g vV 8u
# 8 72 h Hw JV
# 9 24 i 0T 7A
#10 76 j mq Qy
# … with 16 more rows