Reordering the elements of an address

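I have an old database of client addresses (.csv). The biggest problem is that the rows are inconsistent: when I split them into columns, the municipality ends up in the area column, or in the city column, and so on. Example: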

(header) Country, Municipality, City, Detailed address(street name, number, floor, ap.)

**(proper) Count.xxxxxx, Mun.xxxxx, City.xxxx**

(case 1) Count.xxxxxx, City.xxxx, Mun.xxxxx

(case 2) Count.xxxxxx, City.xxxx, -Mun.xxxxx

(case 3) City.xxxx, Count.xxxxxx, Mun.xxxxx

(case 4) Mun.xxxxx, City.xxxx, Count.xxxxxx

(case 5) Mun.xxxxx, Count.xxxxxx, City.xxxx 

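"xxxx" = various names, which can also contain digits, spaces, and ".

I am trying to reorder all of them into the following format: Count., Mun., City. But everything I have found and tried is really about sorting and filtering.

I need help reordering the fields so that the database is consistent and every value is in its proper column.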


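A more complex example:

Country,Area,Municipality,City,Detailed address: street/boulevard, number, entrance, floor, ap. number (a detailed address looks like Boul. Bulgaria 100 entr.A fl.4 ap.256)

As you can imagine, not all fields are filled in, and sometimes the fields are not separated by "," (but that is a problem I will have to live with... I can't go through 65k rows by hand...)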

Count.xxxxx, Area.xx xxx, Munic.xxxxx, Cit.xxxxx, Addr.xxxxx

Area.xxxxx, Munic.xxxxx, Cit.xxxxx, Addr.xxxxx

Munic.xxxxx, Cit.xxxxx, Addr.xx xxx, Count.xxxxx

Count.xxxxx, Munic.xxxxx, Cit.xxxxx, Addr.xxxxx

Munic.xxxxx, Vill.xxxxx

Area.xxxxx, Addr.xxxxx

Munic.xxxxx, Cit.xxxxx

Cit.xxxxx, Munic.xx xxx, Addr.xxx xx

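Another thing is that the settlement can be either a city or a village (ct./vill.).

It sounds like you just need to grab the county, city, and municipality from each row. You can do this by using grep to pick out the right element of each row:

# For every row, keep the element matching each prefix.
# The dot is escaped as "\\." so that it matches a literal ".".
data.frame(County = apply(dat, 1, grep, pattern="Count\\.", value=TRUE),
           City = apply(dat, 1, grep, pattern="City\\.", value=TRUE),
           Mun = apply(dat, 1, grep, pattern="Mun\\.", value=TRUE))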
#     County   City   Mun
# 1  Count.1 City.1 Mun.4
# 2  Count.3 City.2 Mun.7
# 3  Count.2 City.5 Mun.8
# 4  Count.2 City.2 Mun.1
# 5 Count.10 City.2 Mun.6
# 6  Count.1 City.1 Mun.4

Data:

(dat = data.frame(A=c("Count.1", "Count.3", "City.5", "City.2", "Mun.6", "Mun.4"),
                  B=c("City.1", "Mun.7", "Count.2", "Mun.1", "Count.10", "City.1"),
                  C=c("Mun.4", "City.2", "Mun.8", "Count.2", "City.2", "Count.1"),
                  stringsAsFactors=FALSE))
#         A        B       C
# 1 Count.1   City.1   Mun.4
# 2 Count.3    Mun.7  City.2
# 3  City.5  Count.2   Mun.8
# 4  City.2    Mun.1 Count.2
# 5   Mun.6 Count.10  City.2
# 6   Mun.4   City.1 Count.1
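
The grep approach above relies on every row containing exactly one match per prefix. For the more complex rows in the question, where fields can be missing and the settlement can be a city or a village, a minimal sketch could look like the following. The sample lines, the `pick_field` helper, the `Settlement` column name, and the prefix patterns are all assumptions made for illustration, not part of the original data. Rows whose fields are not separated by "," would still need separate handling.

```r
# Hypothetical raw lines in the style of the question's "more complex" example.
lines <- c("Count.A, Area.B 1, Munic.C, Cit.D, Addr.E 5",
           "Munic.F, Vill.G",
           "Cit.H, Munic.I J, Addr.K 12")

# Split each line on commas and trim surrounding whitespace.
parts <- lapply(strsplit(lines, ",", fixed = TRUE), trimws)

# pick_field() is a hypothetical helper: it returns the first element of a
# row that matches the given pattern, or NA if the row has no such field.
pick_field <- function(row, pattern) {
  hit <- grep(pattern, row, value = TRUE)
  if (length(hit) == 0) NA_character_ else hit[1]
}

# One anchored regular expression per target column (assumed prefixes).
patterns <- c(Country      = "^Count\\.",
              Area         = "^Area\\.",
              Municipality = "^Munic\\.",
              Settlement   = "^(Cit|Vill)\\.",
              Address      = "^Addr\\.")

# Build one column per pattern; missing fields become NA.
cols <- lapply(patterns, function(p) {
  vapply(parts, pick_field, character(1), pattern = p)
})
result <- as.data.frame(cols, stringsAsFactors = FALSE)
result
```

Anchoring each pattern with `^` keeps a prefix from matching in the middle of a name, and returning `NA` for a missing field keeps all columns the same length so they can be combined into one data frame.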