Finding part of string and extracting data between delimiter using BigQuery SQL
I have a column like this:
String_to_Extract |
---|
A~S1_B~S2_C~S11 |
A~S1_B~S3_C~S12 |
C~S13_A~S11_B~S4 |
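The part before `~` should become the column name, and the part after `~` should become the row value. The key~value pairs are separated by `_`. So the result should look like this:
String_to_Extract | A | B | C |
---|---|---|---|
A~S1_B~S2_C~S11 | S1 | S2 | S11 |
A~S1_B~S3_C~S12 | S1 | S3 | S12 |
C~S13_A~S11_B~S4 | S11 | S4 | S13 |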
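This is my approach:
```sql
SELECT
  String_to_Extract,
  -- "?" is the missing piece: the length of the value up to the next "_"
  SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "A~") + 2, ?) AS A,
  SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "B~") + 2, ?) AS B,
  SUBSTRING(String_to_Extract, INSTR(String_to_Extract, "C~") + 2, ?) AS C
FROM Table
```
How do I get the part between `~` and the next `_` for each column?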
Happy to help!
One approach uses `REGEXP_EXTRACT`:
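```sql
SELECT
  String_to_Extract,
  -- (?:^|_) anchors the key at the start of a pair;
  -- ([^_]+) captures the value up to the next "_" (or the end of the string)
  REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)A~([^_]+)") AS A,
  REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)B~([^_]+)") AS B,
  REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)C~([^_]+)") AS C
FROM yourTable;
```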
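To try the `REGEXP_EXTRACT` approach without creating a table, the question's sample rows can be inlined in a `WITH` clause. This is a sketch: `your_table` is only a placeholder CTE name, and all three patterns use `[^_]+` so that each captured value stops at the next `_` rather than running into the following pair.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
WITH your_table AS (
  SELECT 'A~S1_B~S2_C~S11' AS String_to_Extract UNION ALL
  SELECT 'A~S1_B~S3_C~S12' UNION ALL
  SELECT 'C~S13_A~S11_B~S4'
)
SELECT
  String_to_Extract,
  -- The capturing group is what REGEXP_EXTRACT returns; NULL if the key is absent.
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)A~([^_]+)') AS A,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)B~([^_]+)') AS B,
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)C~([^_]+)') AS C
FROM your_table;
```

Because each key is matched by name, the order of the pairs does not matter: the third row yields `S11`, `S4`, `S13` for `A`, `B`, `C` even though it starts with `C~`.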
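Consider the following approach (BigQuery):
```sql
select * from (
  -- one row per key~value pair: col = text before "~", val = text after it
  select String_to_Extract, col_val[offset(0)] as col, col_val[offset(1)] as val
  from your_table, unnest(split(String_to_Extract, '_')) kv,
  unnest([struct(split(kv, '~') as col_val)])
)
-- turn the keys A, B, C into columns, one row per String_to_Extract
pivot (any_value(val) for col in ('A', 'B', 'C'))
```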
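Applied to the sample data in your question, the output is one row per `String_to_Extract` with columns `A`, `B`, and `C`, the same as the desired result in the question.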
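The `PIVOT` step is easier to follow once you see its input. The sketch below runs only the inner subquery against the question's sample rows, inlined in a `WITH` clause (`your_table` is a placeholder CTE name); the `ORDER BY` is added only for readability.

```sql
-- Sample rows from the question, inlined as a CTE.
WITH your_table AS (
  SELECT 'A~S1_B~S2_C~S11' AS String_to_Extract UNION ALL
  SELECT 'A~S1_B~S3_C~S12' UNION ALL
  SELECT 'C~S13_A~S11_B~S4'
)
-- The answer's subquery on its own: one row per key~value pair.
SELECT
  String_to_Extract,
  col_val[OFFSET(0)] AS col,  -- text before '~', later becomes a column name
  col_val[OFFSET(1)] AS val   -- text after '~', later becomes the cell value
FROM your_table,
  UNNEST(SPLIT(String_to_Extract, '_')) AS kv,
  UNNEST([STRUCT(SPLIT(kv, '~') AS col_val)])
ORDER BY String_to_Extract, col;
```

This returns nine rows, three per input string, for example `C~S13_A~S11_B~S4 | A | S11`. `PIVOT` then groups by the remaining column, `String_to_Extract`, and turns each listed `col` value into its own column.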
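You can also sort the split items first and then pick the values by position:
```sql
select
  String_to_Extract,
  -- after sorting, position 0 holds "A~...", 1 holds "B~...", 2 holds "C~..."
  -- (assumes every row contains exactly the keys A, B and C)
  split(ordered[safe_offset(0)], '~')[safe_offset(1)] as A,
  split(ordered[safe_offset(1)], '~')[safe_offset(1)] as B,
  split(ordered[safe_offset(2)], '~')[safe_offset(1)] as C
from (
  select
    String_to_Extract,
    array(select item from unnest(split(String_to_Extract, '_')) as item order by item) as ordered
  from dataset.table
)
```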
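The positional approach depends on every row containing exactly the keys `A`, `B`, and `C`. The sketch below illustrates what happens otherwise by comparing a positional pick with a key-based `REGEXP_EXTRACT` pick; the row `A~S5_C~S6` is a hypothetical example (not from the question) that has no `B` pair, and `your_table` is a placeholder CTE name.

```sql
-- One row from the question plus a hypothetical row with no B pair.
WITH your_table AS (
  SELECT 'C~S13_A~S11_B~S4' AS String_to_Extract UNION ALL
  SELECT 'A~S5_C~S6'
)
SELECT
  String_to_Extract,
  -- Positional pick after sorting, as in the answer above.
  SPLIT(ordered[SAFE_OFFSET(1)], '~')[SAFE_OFFSET(1)] AS B_positional,
  -- Key-based pick: looks for 'B~' by name, NULL when it is absent.
  REGEXP_EXTRACT(String_to_Extract, r'(?:^|_)B~([^_]+)') AS B_by_key
FROM (
  SELECT
    String_to_Extract,
    ARRAY(SELECT item FROM UNNEST(SPLIT(String_to_Extract, '_')) AS item ORDER BY item) AS ordered
  FROM your_table
);
```

For `C~S13_A~S11_B~S4` both columns return `S4`. For the hypothetical row, the positional pick returns `S6` (the `C` value, shifted into slot 1), while the key-based pick returns `NULL`.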