How can I get regular expressions to select appropriate street names using stringr in R?
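I have just started using regular expressions (with the stringr package), and some of the code I wrote does not do quite what I want. I am working with a dataset that contains some very messy string data, and I am trying to clean it up for use with the Google Maps API.
I have attached a sample of the data below.
Basically, I want to select every row where loc_01 is a simple street name. By this I mean I want it to take one of the following forms: numbered streets, e.g. 10TH AVE; named streets, e.g. MAIN ST; and any directional modification of such street names (e.g. 10TH AVE NW, W MAIN ST, or W 10TH AVE).
I tried the following expression: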
df %>% filter(str_detect(loc_01, "^(\\w+)?(\\s)?.*(\\s)AVE|ST|BLVD(\\w+)?$"))
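But this gives me output such as 10 AVE 1300 BLK E, which is not an observation I want to select. I read the regular expression as:
- "select rows that start with an optional word (to handle strings that begin with N, NW, etc.), followed by a space (if necessary), then a 'word' of any length made of any characters (for 10TH ST or MLK AVE), followed by a space and one of the suffixes AVE, ST, or BLVD, and finally an optional word (to handle 10TH ST W and the like)."
Clearly my interpretation is wrong, since I get things like 10 AVE 1300 BLK E. What is the correct regular expression to get the result I want in this case?
Many thanks for your help!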
df <- structure(list(ID = c("387", "404", "422", "425", "432", "443",
"526", "536", "580", "658", "665", "666", "735", "880", "910",
"911", "912", "913", "916", "917", "972", "1098", "1194", "1231",
"1298", "1309", "1310", "1311", "1312", "1316", "1328", "1354",
"1371", "1373", "1374", "1376", "1381", "1388", "1389", "1390",
"1391", "1392", "1393", "1406", "1407", "1408", "1409", "1410",
"1411", "1412", "1413", "1414", "1418", "1420", "1422", "1429",
"1430", "1433", "1434", "1437", "1441", "1442", "1443", "1444",
"1445", "1448", "1451", "1452", "1453", "1454", "1455", "1457",
"1461", "1462", "1463", "1464", "1466", "1468", "1470", "1471",
"1473", "1479", "1480", "1481", "1486", "1489", "1490", "1493",
"1495", "1496", "1498", "1502", "1503", "1509", "1511", "1512",
"1513", "1517", "1", "2"), city = c("DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER",
"DENVER", "DENVER", "DENVER", "DENVER", "DENVER", "DENVER"),
loc_01 = c("#50 S KALAMATH ST", "00 BLKS BRYANT CANOSA",
"000 BLK ALLEY", "000 BLK BROADWAY", "000 BLK E 11TH AV",
"000 BLK E 17TH", "000 BLK S BROADWAY", "000 BLK S IRVING JULIAN",
"000 BLK W ALAMEDA AV", "10 AVE 1300 BLK E", "100 BLK ALLEY N BROADWAY/N ACOMA",
"100 BLK ALLEY S", "100 BLK N WASHINGTON ST", "1000 ALLEY LINCOLN/BROADWAY",
"1000 BLK ALLEY CHEROKEE/DELAWARE", "1000 BLK ALLEY GRANT",
"1000 BLK ALLEY MARTIN/LAFAYETT", "1000 BLK ALLEY MONROE/GARFIELD",
"1000 BLK ALLEY OGDEN", "1000 BLK ALLEY S GAYLORD ST", "1000 BLK E GAY",
"1000 BLK S VINE/GAYLORD ALLEY", "1010 CURTIS ST", "1050 ODELL ST",
"109TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE",
"10TH AVE", "10TH AVE", "10TH AVE", "10TH AVE", "E 10TH AVE",
"MAIN ST NW"), link = c("", "", "", "00125FN", "00025FW",
"AT", "", "00050FS", "00005FW", "00100FE", "00043FN", "",
"", "", "", "AT", "", "00120FS", "", "00070FN", "", "00200FS",
"", "", "00020FS", "09999FN", "AT", "AT", "AT", "AT", "AT",
"AT", "AT", "00080FW", "00175FW", "AT", "00101FW", "AT",
"AT", "AT", "AT", "AT", "AT", "00060FE", "00120FS", "AT",
"AT", "AT", "AT", "00015FW", "00035FW", "00075FW", "00022FE",
"00144FW", "00250FE", "AT", "AT", "00037FW", "00100FE", "00200FW",
"AT", "AT", "00084FW", "00100FW", "AT", "00100FN", "AT",
"AT", "AT", "AT", "AT", "00100FW", "00068FE", "00136FE",
"00200FE", "00150FW", "AT", "00020FE", "00020FW", "00030FE",
"00045FW", "AT", "AT", "AT", "AT", "AT", "AT", "AT", "00163FE",
"AT", "AT", "AT", "AT", "AT", "AT", "AT", "00100FE", "00020FW",
"", ""), loc_02 = c("N SIDE OF BLDG", "ALLEY", "KNOX CT/KING ST",
"IRVINGTON PL", "N LINDA ST", "POLE 94 79", "PKG METER BS-46",
"ALLEY", "POLE 844/005", "MARION ST N", "W 1ST AV", "MEADE/NEWTON",
"E 1ST AVE", "10 AV", "W 11TH AVE", "LOGAN ST", "E 11TH AV",
"E 11TH AVE", "CORONA ST", "E MISSISSIPPI AVE", "CORONA ST",
"E TENNESSEE AVE", "31ST STREET", "E 11TH AVE", "QUEENSBURG ST",
"10TH AVE 1407 E", "10TH AVE 2900 BLK W", "10TH AVE 300 BLK E",
"10TH AVE 3200 BLK W", "1295 W", "2900 W", "500 E", "ACOMA / BANNOCK ALLEY",
"ACOMA ST", "ACOMA ST", "ADAMS ST", "BANNOCK ST", "BANNOCK ST",
"BANNOCK ST", "BANNOCK ST", "BANNOCK ST", "BANNOCK ST", "BANNOCK ST N",
"BROADWAY ST", "BROADWAY ST", "BROADWAY ST", "BROADWAY ST",
"BROADWAY ST", "BROADWAY ST", "BROADWAY ST N", "BRYANT ST",
"BRYANT ST", "CLARKSON ST", "CLARKSON ST", "CLARKSON ST",
"CLARKSON ST", "CLARKSON ST", "CORONA ST", "CORONA ST", "CORONA ST",
"CORONA ST", "CORONA ST", "CORONA ST N", "DECATUR ST", "DECATUR ST",
"DOWNING ST", "DOWNING ST", "DOWNING ST", "DOWNING ST", "DOWNING ST",
"DOWNING ST", "DOWNING ST N", "FEDERAL BLVD", "FEDERAL BLVD",
"FEDERAL BLVD", "GALAPAGO ST", "GARFIELD ST", "GRANT ST",
"GRANT ST", "GRANT ST", "GRANT ST", "GRANT ST", "GRANT ST",
"GRANT ST", "GROVE ST", "GROVE ST", "GROVE ST", "HOOKER ST",
"HUMBOLDT ST", "HUMBOLDT ST", "INCA ST", "KALAMATH ST", "KALAMATH ST",
"KNOX CT", "KNOX CT", "KNOX CT", "LAFAYETTE ST", "LINCOLN ST",
"MAIN ST", "100TH AVE")), row.names = c(NA, -100L), class = "data.frame")
One approach is to add an extra filter statement (though I am sure there is a better way).
library(tidyverse)
df %>%
  filter(str_detect(loc_01, "^(\\w+)?(\\s)?.*(\\s)AVE|ST|BLVD(\\w+)?$")) %>%
  # drop block-range entries such as "10 AVE 1300 BLK E"
  filter(!str_detect(loc_01, "BLK"))
Output
ID city loc_01 link loc_02
1 387 DENVER #50 S KALAMATH ST N SIDE OF BLDG
2 1194 DENVER 1010 CURTIS ST 31ST STREET
3 1231 DENVER 1050 ODELL ST E 11TH AVE
4 1298 DENVER 109TH AVE 00020FS QUEENSBURG ST
5 1309 DENVER 10TH AVE 09999FN 10TH AVE 1407 E
6 1310 DENVER 10TH AVE AT 10TH AVE 2900 BLK W
7 1311 DENVER 10TH AVE AT 10TH AVE 300 BLK E
8 1312 DENVER 10TH AVE AT 10TH AVE 3200 BLK W
9 1316 DENVER 10TH AVE AT 1295 W
10 1328 DENVER 10TH AVE AT 2900 W
11 1354 DENVER 10TH AVE AT 500 E
12 1371 DENVER 10TH AVE AT ACOMA / BANNOCK ALLEY
13 1373 DENVER 10TH AVE 00080FW ACOMA ST
14 1374 DENVER 10TH AVE 00175FW ACOMA ST
15 1376 DENVER 10TH AVE AT ADAMS ST
16 1381 DENVER 10TH AVE 00101FW BANNOCK ST
17 1388 DENVER 10TH AVE AT BANNOCK ST
18 1389 DENVER 10TH AVE AT BANNOCK ST
19 1390 DENVER 10TH AVE AT BANNOCK ST
20 1391 DENVER 10TH AVE AT BANNOCK ST
21 1392 DENVER 10TH AVE AT BANNOCK ST
22 1393 DENVER 10TH AVE AT BANNOCK ST N
23 1406 DENVER 10TH AVE 00060FE BROADWAY ST
24 1407 DENVER 10TH AVE 00120FS BROADWAY ST
25 1408 DENVER 10TH AVE AT BROADWAY ST
26 1409 DENVER 10TH AVE AT BROADWAY ST
27 1410 DENVER 10TH AVE AT BROADWAY ST
28 1411 DENVER 10TH AVE AT BROADWAY ST
29 1412 DENVER 10TH AVE 00015FW BROADWAY ST N
30 1413 DENVER 10TH AVE 00035FW BRYANT ST
31 1414 DENVER 10TH AVE 00075FW BRYANT ST
32 1418 DENVER 10TH AVE 00022FE CLARKSON ST
33 1420 DENVER 10TH AVE 00144FW CLARKSON ST
34 1422 DENVER 10TH AVE 00250FE CLARKSON ST
35 1429 DENVER 10TH AVE AT CLARKSON ST
36 1430 DENVER 10TH AVE AT CLARKSON ST
37 1433 DENVER 10TH AVE 00037FW CORONA ST
38 1434 DENVER 10TH AVE 00100FE CORONA ST
39 1437 DENVER 10TH AVE 00200FW CORONA ST
40 1441 DENVER 10TH AVE AT CORONA ST
41 1442 DENVER 10TH AVE AT CORONA ST
42 1443 DENVER 10TH AVE 00084FW CORONA ST N
43 1444 DENVER 10TH AVE 00100FW DECATUR ST
44 1445 DENVER 10TH AVE AT DECATUR ST
45 1448 DENVER 10TH AVE 00100FN DOWNING ST
46 1451 DENVER 10TH AVE AT DOWNING ST
47 1452 DENVER 10TH AVE AT DOWNING ST
48 1453 DENVER 10TH AVE AT DOWNING ST
49 1454 DENVER 10TH AVE AT DOWNING ST
50 1455 DENVER 10TH AVE AT DOWNING ST
51 1457 DENVER 10TH AVE 00100FW DOWNING ST N
52 1461 DENVER 10TH AVE 00068FE FEDERAL BLVD
53 1462 DENVER 10TH AVE 00136FE FEDERAL BLVD
54 1463 DENVER 10TH AVE 00200FE FEDERAL BLVD
55 1464 DENVER 10TH AVE 00150FW GALAPAGO ST
56 1466 DENVER 10TH AVE AT GARFIELD ST
57 1468 DENVER 10TH AVE 00020FE GRANT ST
58 1470 DENVER 10TH AVE 00020FW GRANT ST
59 1471 DENVER 10TH AVE 00030FE GRANT ST
60 1473 DENVER 10TH AVE 00045FW GRANT ST
61 1479 DENVER 10TH AVE AT GRANT ST
62 1480 DENVER 10TH AVE AT GRANT ST
63 1481 DENVER 10TH AVE AT GRANT ST
64 1486 DENVER 10TH AVE AT GROVE ST
65 1489 DENVER 10TH AVE AT GROVE ST
66 1490 DENVER 10TH AVE AT GROVE ST
67 1493 DENVER 10TH AVE AT HOOKER ST
68 1495 DENVER 10TH AVE 00163FE HUMBOLDT ST
69 1496 DENVER 10TH AVE AT HUMBOLDT ST
70 1498 DENVER 10TH AVE AT INCA ST
71 1502 DENVER 10TH AVE AT KALAMATH ST
72 1503 DENVER 10TH AVE AT KALAMATH ST
73 1509 DENVER 10TH AVE AT KNOX CT
74 1511 DENVER 10TH AVE AT KNOX CT
75 1512 DENVER 10TH AVE AT KNOX CT
76 1513 DENVER 10TH AVE 00100FE LAFAYETTE ST
77 1517 DENVER 10TH AVE 00020FW LINCOLN ST
78 1 DENVER E 10TH AVE MAIN ST
79 2 DENVER MAIN ST NW 100TH AVE
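For context on why the original pattern lets 10 AVE 1300 BLK E through: in a regular expression, | binds more loosely than everything else, so without parentheses the pattern splits into three independent alternatives, and only the first and last carry an anchor. The following is a minimal sketch of that reading. The sample values come from loc_01, except "EAST ALLEY", which is a made-up string added purely for illustration.

```r
library(stringr)

# Values from loc_01, plus one hypothetical string ("EAST ALLEY").
x <- c("10 AVE 1300 BLK E", "000 BLK ALLEY", "#50 S KALAMATH ST",
       "10TH AVE", "EAST ALLEY")

# The pattern from the question.
original <- "^(\\w+)?(\\s)?.*(\\s)AVE|ST|BLVD(\\w+)?$"

# The same pattern, split at each top-level "|".
alt1 <- "^(\\w+)?(\\s)?.*(\\s)AVE"  # anchored at the start only
alt2 <- "ST"                         # no anchors: "ST" anywhere in the string
alt3 <- "BLVD(\\w+)?$"               # anchored at the end only

by_parts <- str_detect(x, alt1) | str_detect(x, alt2) | str_detect(x, alt3)
identical(str_detect(x, original), by_parts)
# TRUE

str_detect(x, original)
# TRUE FALSE TRUE TRUE TRUE

# Grouping the suffixes keeps both anchors attached to the whole pattern.
grouped <- "^(\\w+)?(\\s)?.*(\\s)(AVE|ST|BLVD)(\\w+)?$"
str_detect(c("10 AVE 1300 BLK E", "100 BLK N WASHINGTON ST"), grouped)
# FALSE TRUE
```

Grouping alone rejects 10 AVE 1300 BLK E, but the .* in the middle still admits entries such as 100 BLK N WASHINGTON ST, which is why the extra BLK filter remains useful with this pattern.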
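If several strings are causing problems, you can collect them into a list and use it in the second filter statement. For example, to also remove the row containing #50: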
# build one alternation ("#|BLK") from the substrings to exclude
remove.list <- paste(c("#", "BLK"), collapse = "|")
df %>%
  filter(str_detect(loc_01, "^(\\w+)?(\\s)?.*(\\s)AVE|ST|BLVD(\\w+)?$")) %>%
  filter(!str_detect(loc_01, remove.list))
Output (first 10 rows)
ID city loc_01 link loc_02
1 1194 DENVER 1010 CURTIS ST 31ST STREET
2 1231 DENVER 1050 ODELL ST E 11TH AVE
3 1298 DENVER 109TH AVE 00020FS QUEENSBURG ST
4 1309 DENVER 10TH AVE 09999FN 10TH AVE 1407 E
5 1310 DENVER 10TH AVE AT 10TH AVE 2900 BLK W
6 1311 DENVER 10TH AVE AT 10TH AVE 300 BLK E
7 1312 DENVER 10TH AVE AT 10TH AVE 3200 BLK W
8 1316 DENVER 10TH AVE AT 1295 W
9 1328 DENVER 10TH AVE AT 2900 W
10 1354 DENVER 10TH AVE AT 500 E
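As a small illustration of what remove.list holds and how the negated str_detect() behaves, here is a sketch on three loc_01 values from the sample data. Note that each entry is interpreted as a regular expression, so an entry containing a metacharacter such as "." or "(" would need escaping first; "#" and "BLK" happen to be safe under stringr's default settings.

```r
library(stringr)

# Substrings that should disqualify a row, joined into a single alternation.
remove.list <- paste(c("#", "BLK"), collapse = "|")
remove.list
# "#|BLK"

# Three values taken from the loc_01 column.
x <- c("#50 S KALAMATH ST", "10 AVE 1300 BLK E", "10TH AVE")

# TRUE marks the values that would survive the second filter().
keep <- !str_detect(x, remove.list)
keep
# FALSE FALSE TRUE
```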
To filter on loc_02, we can add another filter statement that keeps only the rows that start with digits and end with a direction.
df %>%
  filter(str_detect(loc_01, "^(\\w+)?(\\s)?.*(\\s)AVE|ST|BLVD(\\w+)?$")) %>%
  filter(!str_detect(loc_01, "BLK")) %>%
  # keep loc_02 values made of digits followed by a compass direction
  filter(str_detect(loc_02, "^[[:digit:]]+( N| S| E| W| NE| NW| SE| SW)$"))
# Or you could write it like this (direction_abbrev is not defined in this
# answer; presumably a string such as "N|S|E|W|NE|NW|SE|SW"):
# df %>%
#   filter(str_detect(loc_01, "^(\\w+)?(\\s)?.*(\\s)AVE|ST|BLVD(\\w+)?$")) %>%
#   filter(!str_detect(loc_01, "BLK")) %>%
#   filter(str_detect(loc_02, paste("^\\d+(\\s)", "(", direction_abbrev, ")", "$", sep = "")))
Output
ID city loc_01 link loc_02
1 1316 DENVER 10TH AVE AT 1295 W
2 1328 DENVER 10TH AVE AT 2900 W
3 1354 DENVER 10TH AVE AT 500 E
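The commented-out alternative above uses direction_abbrev without defining it. The sketch below assumes it is simply the compass abbreviations joined with "|" (an assumption, not something stated in the answer), and shows that the same helper could also feed a single anchored pattern for loc_01. The name street_pattern and the restriction to one-word street names are illustrative choices, so multi-word names or suffix variants such as AV or STREET would not match as written.

```r
library(stringr)

# Assumption: direction_abbrev is the alternation of compass abbreviations.
direction_abbrev <- paste(c("N", "S", "E", "W", "NE", "NW", "SE", "SW"),
                          collapse = "|")

# loc_02: digits, one whitespace character, a direction, and nothing else.
loc02_pattern <- paste0("^\\d+\\s(", direction_abbrev, ")$")

# loc_01 (hypothetical single-pattern variant): optional leading direction,
# exactly one word, a suffix, and an optional trailing direction.
street_pattern <- paste0(
  "^((", direction_abbrev, ")\\s)?",
  "\\w+\\s(AVE|ST|BLVD)",
  "(\\s(", direction_abbrev, "))?$"
)

loc_02 <- c("1295 W", "10TH AVE 2900 BLK W", "ACOMA ST")
str_detect(loc_02, loc02_pattern)
# TRUE FALSE FALSE

loc_01 <- c("10TH AVE", "E 10TH AVE", "MAIN ST NW",
            "10 AVE 1300 BLK E", "#50 S KALAMATH ST", "1010 CURTIS ST")
str_detect(loc_01, street_pattern)
# TRUE TRUE TRUE FALSE FALSE FALSE
```

Because both ends are anchored and every alternation sits inside parentheses, this variant needs no follow-up filter for BLK or #. Whether rejecting addresses like 1010 CURTIS ST is desirable depends on what counts as a "simple street name" for the geocoding step.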