PostgreSQL join with similar address
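I am trying to merge data from different sources. The only common field to join on is the address. In table 1, the address has extra data between the street and the state (representing the neighborhood). Is there a way to join these tables using the most similar address? I have 85,000 addresses, so searching manually with LIKE and wildcards is not feasible.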
Table 1:
“239 Dudley St Dudley Square Roxbury MA 02119”
“539 Dudley St Dudley Square Roxbury MA 02119”
Table 2:
“239 Dudley St Roxbury MA 02119”
“539 Dudley St Roxbury MA 02119”
I have two suggestions:
1) "All words in the table 2 address are present in the table 1 address":
select *
from t1 join
t2 on (string_to_array(t2.address,' ') <@ string_to_array(t1.address,' '));
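2) "For each table 1 address, find the most similar address from table 2" (similarity() comes from the pg_trgm extension):
-- run once per database: create extension if not exists pg_trgm;
select distinct on(t1.address) *
from t1 cross join t2
order by t1.address, similarity(t1.address, t2.address) desc;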
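The cross join in suggestion 2 compares every t1 row with every t2 row; if both tables hold around 85,000 addresses, that is roughly 7.2 billion similarity() calls. A possible alternative, sketched below, is to let a pg_trgm GiST index answer a nearest-neighbour lookup per t1 row through a lateral join. This is only a sketch under assumptions: the temp tables and sample rows mirror the question, the index name t2_address_trgm_idx and the aliases m and sim are made up for illustration, and creating the extension may need elevated privileges on your server.

```sql
-- pg_trgm provides similarity(), the <-> distance operator and gist_trgm_ops.
create extension if not exists pg_trgm;

-- Hypothetical sample tables mirroring the data in the question.
create temp table t1 (address text);
create temp table t2 (address text);

insert into t1 values
  ('239 Dudley St Dudley Square Roxbury MA 02119'),
  ('539 Dudley St Dudley Square Roxbury MA 02119');

insert into t2 values
  ('239 Dudley St Roxbury MA 02119'),
  ('539 Dudley St Roxbury MA 02119');

-- Trigram GiST index on the table being searched (index name is illustrative).
create index t2_address_trgm_idx on t2 using gist (address gist_trgm_ops);

-- For each t1 address, fetch the single closest t2 address.
-- <-> is the trigram distance (1 - similarity), so ascending order
-- returns the most similar row first.
select t1.address as t1_address,
       m.address  as t2_address,
       m.sim
from t1
cross join lateral (
  select t2.address,
         similarity(t1.address, t2.address) as sim
  from t2
  order by t2.address <-> t1.address
  limit 1
) as m;
```

Whether the planner actually uses the index depends on table size and statistics, so it is worth checking with explain on the real data. The sim column can also be filtered (for example, keeping only rows above a threshold you choose) to discard t1 addresses that have no reasonable counterpart in t2.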