Rename columns of an R data frame with tidyselect and regular expressions
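I have a data frame whose column names are a combination of numbering and some complicated text:
1. A1. Good day
2. A1a. Have a nice day
......
99. Z7d. Some other titles
Now I want to keep only "A1.", "A1a.", "Z7d.", removing both the preceding number and the trailing text. Any idea how to do this with tidyselect and regex?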
You can use this regular expression (note that backslashes must be doubled inside R string literals):
names(df) <- sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', names(df))
names(df)
#[1] "A1" "A1a" "Z7d"
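To see what each part of the pattern does, here is a minimal base R sketch on a plain character vector. The vector `x` and the names `pattern`, `codes`, and `codes_dot` are made up for illustration. The last line is an assumed variation that keeps the trailing dot, which is what the question literally asks for.

```r
# Hypothetical vector with names shaped like the question's column names
x <- c("1. A1. Good day", "2. A1a. Have a nice day", "99. Z7d. Some other titles")

# \\d+            the leading number
# \\.             a literal dot
# \\s+            the whitespace after it
# ([A-Za-z0-9]+)  the code to keep (capture group 1)
# .*              everything that follows, which is discarded
pattern <- "\\d+\\.\\s+([A-Za-z0-9]+).*"

# Replace the whole match with capture group 1
codes <- sub(pattern, "\\1", x)
# codes now holds "A1", "A1a", "Z7d"

# Variation: move the dot inside the group to get "A1.", "A1a.", "Z7d."
codes_dot <- sub("\\d+\\.\\s+([A-Za-z0-9]+\\.).*", "\\1", x)
```

Because the pattern ends in `.*`, the whole name is consumed and only the captured code remains. A name that does not match the pattern is returned unchanged by `sub`.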
If you want a tidyverse answer, the same regex can be used in rename_with.
library(dplyr)
df %>% rename_with(~sub('\\d+\\.\\s+([A-Za-z0-9]+).*', '\\1', .))
#          A1        A1a        Z7d
#1  0.5755992  0.4147519 -0.1474461
#2  0.1347792 -0.6277678  0.3263348
#3  1.6884930  1.3931306  0.8809109
#4 -0.4269351 -1.2922231 -0.3362182
#5 -2.0032113  0.2619571  0.4496466
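Since the question mentions tidyselect: rename_with also takes a `.cols` argument that accepts tidyselect helpers, so the rename can be limited to the numbered columns. The sketch below assumes a hypothetical data frame `df2` with an extra `id` column that should be left alone. It is one possible way to do this, not part of the original answer.

```r
library(dplyr)

# Hypothetical data: an `id` column plus two numbered columns
df2 <- data.frame(id = 1:2, x = c(0.1, 0.2), y = c(0.3, 0.4))
names(df2) <- c("id", "1. A1. Good day", "2. A1a. Have a nice day")

renamed <- df2 %>%
  rename_with(
    ~ sub("\\d+\\.\\s+([A-Za-z0-9]+).*", "\\1", .x),
    # tidyselect helper: only columns whose name starts with "<number>. "
    .cols = matches("^\\d+\\.\\s")
  )

names(renamed)
```

Here `id` never reaches the renaming function, so the result does not depend on how `sub` treats names that fail to match.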
Data
df <- structure(list(`1. A1. Good day` = c(0.575599213383783, 0.134779160673435,
1.68849296209512, -0.426935114884432, -2.00321125417319), `2. A1a. Have a nice day` = c(0.414751904860513,
-0.627767775889949, 1.39313055331098, -1.29222310608057, 0.261957078465535
), `99. Z7d. Some other titles` = c(-0.147446140558093, 0.326334824433201,
0.880910933597998, -0.336218174873965, 0.449646567320979)),
class = "data.frame", row.names = c(NA, -5L))
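We can use str_extract. The lookbehind (?<=\\.\\s) finds the position right after the first ". ", and [^.]+ then takes everything up to the next dot.
library(stringr)
names(df) <- str_extract(names(df), "(?<=\\.\\s)[^.]+")
names(df)
#[1] "A1" "A1a" "Z7d"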
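The same extraction can probably be written as a pipeline step by passing str_extract to rename_with. This is a sketch on a hypothetical two-column data frame `df3`, assuming every column name contains a ". " followed by the code.

```r
library(dplyr)
library(stringr)

# Hypothetical data frame with names shaped like the question's
df3 <- data.frame(a = 1:2, b = 3:4)
names(df3) <- c("1. A1. Good day", "99. Z7d. Some other titles")

renamed <- df3 %>%
  # (?<=\\.\\s) looks behind for ". "; [^.]+ takes the characters up to the next dot
  rename_with(~ str_extract(.x, "(?<=\\.\\s)[^.]+"))

names(renamed)
```

Unlike `sub`, str_extract returns NA for a name that does not match, so for mixed column sets it is safer to restrict the step with `.cols`.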