Unstructured Text/Number merge

Question

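I'm trying to match fields across 2 separate datasets. Both are address fields. One dataset might contain something like "532 Sheffield Dr", while the other might contain only "Sheffield Dr". Other examples are "US21 Ramp and Hays RD" vs. "US 21", and "N 25th St and Danville RD" vs. "25th St". So basically, all of the text/numbers in the column of the second dataset should match the column in the first dataset, even though the data in the first dataset may contain some extra text/numbers. I've been trying to use RegEx but haven't been able to come up with the right code. How can I do this?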

Answer 1

Based on your examples and my understanding of the problem, the simplest approach would be something like this:

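# s1 holds the longer addresses; s2 holds the shorter strings to look for.
s1 = ["532 Sheffield Dr", "US21 Ramp and Hays RD", "N 25th St and Danville RD"]
s2 = ["Sheffield Dr", "US 21", "25th St"]

for item2 in s2:
    for item1 in s1:
        # Match as-is, or with spaces removed so that "US 21" also finds "US21".
        if item2 in item1 or item2.replace(' ', '') in item1:
            print('%s in %s' % (item2, item1))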
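Since the question mentions RegEx, here is a rough sketch of the same idea with the `re` module. It is only one possible approach, not a definitive solution: each short address is split on whitespace, the pieces are joined with optional whitespace, and the whole thing is wrapped in word boundaries so that "1 Main" does not match inside "21 Main". Case is ignored. The helper names `build_pattern` and `find_matches` are made up for this example, and the sketch assumes every token starts and ends with a letter or digit (otherwise `\b` will not behave as intended).

```python
import re

def build_pattern(short_address):
    """Compile a regex for one short address (hypothetical helper).

    Tokens are escaped, joined with optional whitespace, and wrapped in
    word boundaries, so "US 21" matches both "US 21" and "US21".
    """
    tokens = short_address.split()
    body = r"\s*".join(re.escape(token) for token in tokens)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def find_matches(short_list, long_list):
    """Return (short, long) pairs where the short address occurs in the long one."""
    matches = []
    for short in short_list:
        if not short.strip():
            continue  # skip blank entries, which would otherwise match everything
        pattern = build_pattern(short)
        for long in long_list:
            if pattern.search(long):
                matches.append((short, long))
    return matches

s1 = ["532 Sheffield Dr", "US21 Ramp and Hays RD", "N 25th St and Danville RD"]
s2 = ["Sheffield Dr", "US 21", "25th St"]

for short, long in find_matches(s2, s1):
    print("%s in %s" % (short, long))
```

Like the nested loop above, this compares every short address against every long one, so it may be slow on large datasets. Abbreviation differences such as "Dr" vs. "Drive" are not handled.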

python

regex

textmatching