Extract the value of a column based on a column from another table, with no matching columns

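Using Snowflake, I am trying to extract the street type from one table based on a column in a street-types table. That table's only column, STREET_TYPES, contains both full and abbreviated types: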

STREET_TYPES
------------
ABBAYE
ABE
ACH
AGGLOMERATION
AGL
AIR
AIRE
AIRES
ALL
ALLEE
ALLEES
ANCIEN CHEMIN
...
BD
...
BOULEVARD
...

My second table looks like this:

STREET_LINE_1
-------------
AVENUE ANDRE MORIZET
AVENUE DUQUESNE
AV TREBOIS
RUE HENRI BARBUSSE
AVENUE MARX DORMOY
RUE ANDRE BONNENFANT
AVENUE DU GENERAL RAOUL SALAN
RESIDENCE DU PORT
GRAND BOULEVARD DES RESIDENCES DU PORT
BOULEVARD PIERRE ET MARIE CURIE
CHEMIN DES REGENTS
CHEMIN COMMUNAL CASTEX 

...

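The tables have no columns in common. STREET_LINE_1 will always start with the street type. I need to extract at most two (2) words. For example, we may have types such as GRAND BOULEVARD, GRANDE RUE and GRANDE AVENUE. I also need to distinguish types such as CHEMIN and CHEMIN COMMUNAL (there are other instances like this).

Ultimately, I would like the data represented as follows: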

STREET_TYPE     | STREET_LINE_1
----------------|---------------
AVENUE          | ANDRE MORIZET
AVENUE          | DUQUESNE
AV              | TREBOIS
RUE             | HENRI BARBUSSE
AVENUE          | MARX DORMOY
RUE             | ANDRE BONNENFANT
AVENUE          | DU GENERAL RAOUL SALAN
RESIDENCE       | DU PORT
GRAND BOULEVARD | DES RESIDENCES DU PORT
BOULEVARD       | PIERRE ET MARIE CURIE
CHEMIN          | DES REGENTS
CHEMIN COMMUNAL | CASTEX 
...

So, assuming this is a plain "starts with" string match, STARTSWITH can be used to join the two tables. The output can then be trimmed with SUBSTRING and LENGTH, with some handling of NULL for rows that have no match. To rank multiple matches we can use QUALIFY and ROW_NUMBER, so that, roughly, the longest street type wins:

WITH street_types AS (
    -- sample street types; VALUES names its single column column1
    SELECT * FROM VALUES
        ('ABBAYE'),
        ('ABE'),
        ('ACH'),
        ('AGGLOMERATION'),
        ('AGL'),
        ('AVENUE'),
        ('AIRE'),
        ('AIRES'),
        ('AV'),
        ('ALLEE'),
        ('ALLEES'),
        ('ANCIEN CHEMIN'),
        ('BD'),
        ('BOULEVARD'),
        ('CHEMIN'),
        ('CHEMIN COMMUNAL'),
        ('RESIDENCE'),
        ('GRAND BOULEVARD'),
        ('RUE')
), street_line_1 AS (
    SELECT * FROM VALUES
        ('AVENUE ANDRE MORIZET'),
        ('AVENUE DUQUESNE'),
        ('AV TREBOIS'),
        ('RUE HENRI BARBUSSE'),
        ('AVENUE MARX DORMOY'),
        ('RUE ANDRE BONNENFANT'),
        ('AVENUE DU GENERAL RAOUL SALAN'),
        ('RESIDENCE DU PORT'),
        ('GRAND BOULEVARD DES RESIDENCES DU PORT'),
        ('BOULEVARD PIERRE ET MARIE CURIE'),
        ('CHEMIN DES REGENTS'),
        ('CHEMIN COMMUNAL CASTEX')
)
SELECT
    st.column1 AS street_type,
    -- drop the matched prefix (nothing is dropped when there is no match),
    -- then ltrim the space that separated the type from the rest
    ltrim(substring(sl.column1, zeroifnull(length(st.column1)) + 1)) AS street_line_1
FROM street_line_1 AS sl
LEFT JOIN street_types AS st
    ON startswith(sl.column1, st.column1)
-- keep only the longest matching street type for each address line
QUALIFY row_number() OVER (PARTITION BY sl.column1 ORDER BY length(st.column1) DESC) = 1

gives:

STREET_TYPE     | STREET_LINE_1
----------------|---------------
AV              | TREBOIS
AVENUE          | ANDRE MORIZET
AVENUE          | DU GENERAL RAOUL SALAN
AVENUE          | DUQUESNE
AVENUE          | MARX DORMOY
BOULEVARD       | PIERRE ET MARIE CURIE
CHEMIN COMMUNAL | CASTEX
CHEMIN          | DES REGENTS
GRAND BOULEVARD | DES RESIDENCES DU PORT
RESIDENCE       | DU PORT
RUE             | ANDRE BONNENFANT
RUE             | HENRI BARBUSSE
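
One caveat with a raw STARTSWITH join is that a short type can match the first letters of a longer first word; for instance, a type such as ALL would match any line that merely begins with those three letters. A possible guard, sketched below, is to append a space to both sides so that a type only matches whole leading words. This is a variation on the query above and not something the question asked for. The rows marked "hypothetical" are invented for illustration and are not from the original data, and the LTRIM call is there only to drop the space left between the type and the rest of the line.

```sql
WITH street_types AS (
    SELECT * FROM VALUES
        ('ALL'),
        ('ALLEE'),
        ('CHEMIN'),
        ('CHEMIN COMMUNAL')
), street_line_1 AS (
    SELECT * FROM VALUES
        ('ALLEE DES ROSES'),         -- hypothetical row
        ('ALLIER PLAGE'),            -- hypothetical row: starts with the letters ALL,
                                     -- but ALL is not its first word
        ('CHEMIN COMMUNAL CASTEX')
)
SELECT
    st.column1 AS street_type,
    ltrim(substring(sl.column1, zeroifnull(length(st.column1)) + 1)) AS street_line_1
FROM street_line_1 AS sl
LEFT JOIN street_types AS st
    -- the trailing space on both sides forces the match to end on a word boundary
    ON startswith(sl.column1 || ' ', st.column1 || ' ')
-- the longest matching type still wins, e.g. CHEMIN COMMUNAL over CHEMIN
QUALIFY row_number() OVER (PARTITION BY sl.column1 ORDER BY length(st.column1) DESC) = 1
```

With this condition, a line whose first word only shares leading letters with a type finds no match, so the LEFT JOIN leaves its STREET_TYPE as NULL and the line is returned unchanged.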