R: extract house / street numbers from an address string
Suppose I have the following address data, i.e. street names. My goal is to separate the street name from the house number.
library(tidyverse) # tribble(), %>%, mutate(), str_squish()

mydf <- tribble(
  ~street,
  "Some Way 10",
  "Shiny Street 12b",
  "Dark Street from Netflix Movie 17c - 17d",
  "Seasame Street",
  "Dark Alley 15c",
)
mydf <- mydf %>% mutate(street = str_squish(street)) # trim and collapse repeated whitespace
I tried the following:
sub <- tidyr::extract(mydf, "street", c("street_name_only", "house_number"), "(\\D+)(\\d.*)") %>%
  print(n = 5)
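This works fine as long as the string contains both a street name and a house number. If the string in "street" has no house number, NA appears in both new variables, "street_name_only" and "house_number", as happens for "Seasame Street". (I would like "Seasame Street" in "street_name_only" and ideally "" (an empty string) in house_number, although I could handle NAs in the house_number column afterwards.)
Can anyone tell me what I am doing wrong and how to fix it?
Thank you very much.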
Does this work:
mydf %>%
  transmute(street_name_only = str_remove(street, '\\d.*'),
            house_number = str_extract(street, '\\d.*'))
# A tibble: 5 x 2
street_name_only house_number
<chr> <chr>
1 "Some Way " 10
2 "Shiny Street " 12b
3 "Dark Street from Netflix Movie " 17c - 17d
4 "Seasame Street" NA
5 "Dark Alley " 15c
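The output above still has a trailing space in street_name_only and NA rather than "" for "Seasame Street". A minimal sketch of one way to tidy both, assuming `str_trim()` from stringr and `coalesce()` from dplyr are acceptable here (the name `result` is only for illustration):

```r
library(dplyr)
library(stringr)
library(tibble)

mydf <- tribble(
  ~street,
  "Some Way 10",
  "Shiny Street 12b",
  "Dark Street from Netflix Movie 17c - 17d",
  "Seasame Street",
  "Dark Alley 15c"
)

result <- mydf %>%
  transmute(
    # drop everything from the first digit onward, then trim the leftover space
    street_name_only = str_trim(str_remove(street, "\\d.*")),
    # str_extract() returns NA when there is no digit; replace that NA with ""
    house_number = coalesce(str_extract(street, "\\d.*"), "")
  )

print(result)
```

If NA is preferred for missing house numbers, the `coalesce()` call can simply be dropped.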
Using tidyr::separate:
tidyr::separate(mydf, street, c("street_name_only", "house_number"),
                '(?=\\d)', extra = 'merge', fill = 'right')
# street_name_only house_number
# <chr> <chr>
#1 "Some Way " 10
#2 "Shiny Street " 12b
#3 "Dark Street from Netflix Movie " 17c - 17d
#4 "Seasame Street" NA
#5 "Dark Alley " 15c
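As for why the original attempt produced NA in both columns: the pattern "(\\D+)(\\d.*)" requires at least one digit, so "Seasame Street" does not match at all, and a non-matching row yields NA for every capture group. A sketch of a single-regex alternative that makes the number part optional, using `str_match()` from stringr (the pattern and the names `parts` and `result` are illustrative assumptions, not taken from the answers above):

```r
library(stringr)
library(tibble)

street <- c(
  "Some Way 10",
  "Shiny Street 12b",
  "Dark Street from Netflix Movie 17c - 17d",
  "Seasame Street",
  "Dark Alley 15c"
)

# Group 1: lazily matched non-digits (the street name).
# \\s* swallows the separating space so the name has no trailing blank.
# Group 2: optional, from the first digit to the end of the string.
parts <- str_match(street, "^(\\D*?)\\s*(\\d.*)?$")

result <- tibble(
  street_name_only = parts[, 2],
  house_number = parts[, 3] # NA where the optional group did not match
)

print(result)
```

Because the second group is optional, every row matches, so the street name is kept even when no house number is present.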