Extract the value of a column based on a column from another table with no matching columns
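Using Snowflake, I am trying to extract the street type from one table based on a column in a street-type table, whose only column is STREET_TYPES.
It contains both full and abbreviated types: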
STREET_TYPES
------------
ABBAYE
ABE
ACH
AGGLOMERATION
AGL
AIR
AIRE
AIRES
ALL
ALLEE
ALLEES
ANCIEN CHEMIN
...
BD
...
BOULEVARD
...
My second table looks as follows:
STREET_LINE_1
-------------
AVENUE ANDRE MORIZET
AVENUE DUQUESNE
AV TREBOIS
RUE HENRI BARBUSSE
AVENUE MARX DORMOY
RUE ANDRE BONNENFANT
AVENUE DU GENERAL RAOUL SALAN
RESIDENCE DU PORT
GRAND BOULEVARD DES RESIDENCES DU PORT
BOULEVARD PIERRE ET MARIE CURIE
CHEMIN DES REGENTS
CHEMIN COMMUNAL CASTEX
...
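The tables do not have any columns in common. STREET_LINE_1 will always begin with the street type. I need to extract up to two (2) words; for example, we may have types such as GRAND BOULEVARD, GRANDE RUE, or GRANDE AVENUE. I also need to handle types such as CHEMIN and CHEMIN COMMUNAL, where one type is a prefix of another (there are other instances).
Ultimately, I would like the data represented as follows: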
STREET_TYPE | STREET_LINE_1
----------------|---------------
AVENUE | ANDRE MORIZET
AVENUE | DUQUESNE
AV | TREBOIS
RUE | HENRI BARBUSSE
AVENUE | MARX DORMOY
RUE | ANDRE BONNENFANT
AVENUE | DU GENERAL RAOUL SALAN
RESIDENCE | DU PORT
GRAND BOULEVARD | DES RESIDENCES DU PORT
BOULEVARD | PIERRE ET MARIE CURIE
CHEMIN | DES REGENTS
CHEMIN COMMUNAL | CASTEX
...
So, assuming this is plain string matching, STARTSWITH can be used to match the two tables. The output can then be trimmed with SUBSTRING and LENGTH, with some handling of NULL for lines that match no type. To rank the matches we can use QUALIFY and ROW_NUMBER on the prefix matches, with a crude rule: the longest street type wins.
With street_types as (
SELECT * FROM VALUES
('ABBAYE'),
('ABE'),
('ACH'),
('AGGLOMERATION'),
('AGL'),
('AVENUE'),
('AIRE'),
('AIRES'),
('AV'),
('ALLEE'),
('ALLEES'),
('ANCIEN CHEMIN'),
('BD'),
('BOULEVARD'),
('CHEMIN'),
('CHEMIN COMMUNAL'),
('RESIDENCE'),
('GRAND BOULEVARD'),
('RUE')
), street_line_1 as (
SELECT * FROM VALUES
('AVENUE ANDRE MORIZET'),
('AVENUE DUQUESNE'),
('AV TREBOIS'),
('RUE HENRI BARBUSSE'),
('AVENUE MARX DORMOY'),
('RUE ANDRE BONNENFANT'),
('AVENUE DU GENERAL RAOUL SALAN'),
('RESIDENCE DU PORT'),
('GRAND BOULEVARD DES RESIDENCES DU PORT'),
('BOULEVARD PIERRE ET MARIE CURIE'),
('CHEMIN DES REGENTS'),
('CHEMIN COMMUNAL CASTEX')
)
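SELECT
    st.column1 AS street_type,
    -- drop the matched type from the front; ZEROIFNULL keeps the whole line when nothing matched
    ltrim(substring(sl.column1, zeroifnull(length(st.column1)) + 1)) AS street_line_1
FROM street_line_1 AS sl
LEFT JOIN street_types AS st
    -- a type matches when the address line starts with it
    ON startswith(sl.column1, st.column1)
-- keep only the longest matching type for each address line
QUALIFY row_number() OVER (PARTITION BY sl.column1 ORDER BY length(st.column1) DESC) = 1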
gives:
STREET_TYPE | STREET_LINE_1
----------------|---------------
AV | TREBOIS
AVENUE | ANDRE MORIZET
AVENUE | DU GENERAL RAOUL SALAN
AVENUE | DUQUESNE
AVENUE | MARX DORMOY
BOULEVARD | PIERRE ET MARIE CURIE
CHEMIN COMMUNAL | CASTEX
CHEMIN | DES REGENTS
GRAND BOULEVARD | DES RESIDENCES DU PORT
RESIDENCE | DU PORT
RUE | ANDRE BONNENFANT
RUE | HENRI BARBUSSE
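One caveat with the plain prefix join: a short type such as AV or ALL also matches any line that merely begins with those letters. Below is a minimal sketch of one way to tighten this, assuming the type and the rest of the line are always separated by a single space. The CTE column names (`street_type`, `line`) and the `AVRIL TREBOIS` row are made up for illustration and are not from the original data.

```sql
-- Sketch: same longest-match join, but a type only matches when it is the
-- whole line or is followed by a space (a word boundary).
WITH street_types AS (
    SELECT column1 AS street_type FROM VALUES
        ('AV'), ('AVENUE'), ('CHEMIN'), ('CHEMIN COMMUNAL')
), street_line_1 AS (
    SELECT column1 AS line FROM VALUES
        ('AVENUE DUQUESNE'),
        ('CHEMIN COMMUNAL CASTEX'),
        ('AVRIL TREBOIS')   -- hypothetical row: starts with 'AV' but is not an AV
)
SELECT
    st.street_type,
    -- when nothing matched, street_type is NULL and the whole line is kept
    LTRIM(SUBSTRING(sl.line, ZEROIFNULL(LENGTH(st.street_type)) + 1)) AS street_line_1
FROM street_line_1 AS sl
LEFT JOIN street_types AS st
    ON sl.line = st.street_type
    OR STARTSWITH(sl.line, st.street_type || ' ')
-- longest matching type wins, as before
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY sl.line
    ORDER BY LENGTH(st.street_type) DESC
) = 1;
```

With this condition the hypothetical `AVRIL TREBOIS` row should come back with a NULL street type and the full line intact, while `CHEMIN COMMUNAL CASTEX` still resolves to the longer `CHEMIN COMMUNAL`. If the real data can contain multiple spaces or other separators, the boundary check would need adjusting.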