How to extract capital letters from a string before a number in R
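I have a data frame with strings in its columns. How can I extract only the uppercase substring that precedes the number and add it to another column? One example is DE, but there are more country abbreviations, and they always appear right before the number.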
TD <- data.frame(a = c("WHATEVERDE 11111", "", "Whatever DE 11111", "DE 11111", ""),
                 b = c("", "What DE EverDE 1111", "", "", ""),
                 c = c("Whatever", "", "", "", "WhateverDE 11111"))
I would like to create another column as follows:
> TD
                  a                   b                c result
1  WHATEVERDE 11111                             Whatever     DE
2                   What DE EverDE 1111                      DE
3 Whatever DE 11111                                          DE
4          DE 11111                                          DE
5                                       WhateverDE 11111     DE
I tried to apply this solution:
sub("^([[:alpha:]]*).*", "\\1", "DE 11111")
but it is not universal.
A vector with the abbreviations:
names<-c('AT','BE','DE','BG','CZ','DK','FR','GR','ES','NL','HU','GB','IT')
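We loop across the columns, extract the 2-letter uppercase country code that is followed by zero or more spaces and one or more digits, and coalesce the output so that it returns the first non-NA extracted element for each row.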
library(dplyr)
library(stringr)
library(purrr)
library(countrycode)
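# Build "(AD|AE|...)(?=\\s*\\d+)": any ISO 2-letter code followed by optional spaces and digits
pat <- countrycode::codelist %>%
   pull(iso2c) %>%
   na.omit %>%
   str_c(collapse = "|") %>%
   sprintf(fmt = "(%s)(?=\\s*\\d+)")
# Extract from every column, then keep the first non-NA match in each row
TD %>%
   mutate(result = invoke(coalesce,
      across(everything(), ~ str_extract(., pat))))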
-output
                  a                   b                c result
1  WHATEVERDE 11111                             Whatever     DE
2                   What DE EverDE 1111                      DE
3 Whatever DE 11111                                          DE
4          DE 11111                                          DE
5                                       WhateverDE 11111     DE
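If only the abbreviations listed in the question can occur, the same idea can be sketched without countrycode. This is a minimal variant under that assumption, not a replacement for the code above: `codes` is a hypothetical name for the question's vector (renamed so that base `names()` is not masked), and `do.call()` stands in for `invoke()`, which recent purrr releases mark as deprecated.

```r
library(stringr)  # str_extract(); dplyr::coalesce() is called via its namespace

TD <- data.frame(
  a = c("WHATEVERDE 11111", "", "Whatever DE 11111", "DE 11111", ""),
  b = c("", "What DE EverDE 1111", "", "", ""),
  c = c("Whatever", "", "", "", "WhateverDE 11111"),
  stringsAsFactors = FALSE
)

# Abbreviations from the question (assumed to be the complete list)
codes <- c('AT','BE','DE','BG','CZ','DK','FR','GR','ES','NL','HU','GB','IT')

# One of the codes, followed by optional spaces and digits;
# the lookahead (?=...) checks for the digits without including them in the match
pat <- sprintf("(%s)(?=\\s*\\d+)", paste(codes, collapse = "|"))

# Extract per column (NA where nothing matches), then take the first non-NA per row
hits <- lapply(TD, str_extract, pattern = pat)
TD$result <- do.call(dplyr::coalesce, unname(hits))
TD
```

Because the lookahead requires digits right after the code, the "AT" inside "WHATEVERDE" and the first "DE" in "What DE EverDE 1111" are skipped, and only the code directly before the number is returned.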