1 step regexp_replace instead of two steps part II, replace spaces either side of multiple words between comma delimiters

Question

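This is a follow-up to "1 step regexp_replace instead of two steps", because I did not provide enough sample data there. @hatless gave me a solution for removing the spaces around the commas, and @lemon suggested that I provide more data.

Spaces should be removed from either side of the words between the delimiters: " new york " should become "new york". There may be spaces on either side of a word, and those should be removed, but not the spaces between words. The comma-delimited string can contain any number of delimiters, up to 8 commas.

I can do multiple nested replaces for " ," and ", ", which works for most cases, except when there are multiple spaces before or after the comma. Can this be done with one regexp_replace, or does it need several?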

"RESULT BEFORE"
"university of washington, seattle, washington"
"university of washington, seattle , washington"
"university of washington, , washington"
"university of washington, seattle, washington"
"university of new york,ny , usa"
"university of new york,new york , usa"
"university of new york, new york , usa"

with t1 as 
(
select           1 id,"university of washington,           seattle, washington" location
union all select 2 id,"university of washington, seattle  , washington"
union all select 3 id,"university of washington,      , washington"
union all select 4 id,"university of washington, seattle            , washington"
union all select 5 id,"university of new york,ny  , usa"
union all select 6 id,"university of new york,new york  , usa"
union all select 7 id,"university of new york, new york  , usa"
)
select id,REGEXP_REPLACE(lower(location),r'([^,]+,)[, ]+', r'\1') location
from t1
order by 1;

"DESIRED RESULT"
"university of washington,seattle,washington"
"university of washington,seattle,washington"
"university of washington,washington"
"university of washington,seattle,washington"
"university of new york,ny,usa"
"university of new york,new york,usa"
"university of new york,new york,usa"

Actual result

id	location
1	university of washington,seattle,washington
2	university of washington,seattle ,washington
3	university of washington,washington
4	university of washington,seattle ,washington
5	university of new york,ny ,usa
6	university of new york,new york ,usa
7	university of new york,new york ,usa

Answer 1

Try this version:

SELECT id, REGEXP_REPLACE(LOWER(location),
                          r'([^,]+)\s*,\s*([^,]*?)\s*,\s*(.*?)\s*',
                          r'\1,\2,\3') AS location
FROM t1
ORDER BY 1;

Here is a working demo showing that the regex replacement logic works.
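
A self-contained way to try the pattern is sketched below, with three of the question's rows inlined so the query runs without the original table. It assumes the replacement is meant to be the three captured fields joined by bare commas (`\1,\2,\3`). That replacement string is an assumption, not something confirmed above.

```sql
-- Sketch only: three of the question's sample rows, inlined.
WITH t1 AS (
  SELECT 2 AS id, "university of washington, seattle  , washington" AS location
  UNION ALL SELECT 3, "university of washington,      , washington"
  UNION ALL SELECT 5, "university of new york,ny  , usa"
)
SELECT id,
       -- \1, \2, \3 refer to the three capture groups (assumed replacement);
       -- the \s* around each comma consumes the surrounding whitespace.
       REGEXP_REPLACE(LOWER(location),
                      r'([^,]+)\s*,\s*([^,]*?)\s*,\s*(.*?)\s*',
                      r'\1,\2,\3') AS location
FROM t1
ORDER BY id;
-- id 2: university of washington,seattle,washington
-- id 3: university of washington,,washington
-- id 5: university of new york,ny,usa
```

Note that row 3 keeps its empty middle field, so it comes out with two adjacent commas rather than the single comma shown in the desired result.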

Answer 2

Consider the following:

select id,
  regexp_replace(trim(location), r'\s*,\s*', ',') as location,
from t1
order by 1

If applied to the sample data in your question, the output is:
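
To run the query above without the original table, the sketch below inlines the question's sample rows. The `trimmed` column uses the pattern from this answer unchanged. The `trimmed_no_empty` column is a hypothetical variant that is not part of the answer: it treats a run of commas separated only by whitespace as a single delimiter, which is what row 3 of the desired result appears to ask for.

```sql
-- Sketch only: sample rows copied from the question.
WITH t1 AS (
  SELECT 1 AS id, "university of washington,           seattle, washington" AS location
  UNION ALL SELECT 2, "university of washington, seattle  , washington"
  UNION ALL SELECT 3, "university of washington,      , washington"
  UNION ALL SELECT 4, "university of washington, seattle            , washington"
  UNION ALL SELECT 5, "university of new york,ny  , usa"
  UNION ALL SELECT 6, "university of new york,new york  , usa"
  UNION ALL SELECT 7, "university of new york, new york  , usa"
)
SELECT
  id,
  -- Pattern from the answer: drop whitespace on both sides of every comma.
  REGEXP_REPLACE(TRIM(location), r'\s*,\s*', ',') AS trimmed,
  -- Hypothetical variant: one or more commas, each followed by optional
  -- whitespace, collapse into a single comma (empty fields disappear).
  REGEXP_REPLACE(TRIM(location), r'\s*(?:,\s*)+', ',') AS trimmed_no_empty
FROM t1
ORDER BY id;
```

With `\s*,\s*`, row 3 keeps its empty field and comes out as `university of washington,,washington`; the variant yields `university of washington,washington`. For the other six rows, both columns match the desired result in the question.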


regex

sql

google-bigquery