How do I move a substring that starts with a specified pattern into another column in Spark SQL?
Given the following DataFrame:
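+---+------------------------------------------------------------------------+-------------+
|Row|Address                                                                 |Contact      |
+---+------------------------------------------------------------------------+-------------+
|1  |J. Borja cor. Guillermo St., Cagayan de Oro City 08822 722-922 / 726-667|null         |
|2  |Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City 08822 725-301     |null         |
|3  |R.N. Abejuela St., Cagayan de Oro City                                  |08822 727-864|
+---+------------------------------------------------------------------------+-------------+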
How can I transform it into this:
+---+-----------------------------------------------------+-----------------------+
|Row|Address                                              |Contact                |
+---+-----------------------------------------------------+-----------------------+
|1  |J. Borja cor. Guillermo St., Cagayan de Oro City     |08822 722-922 / 726-667|
|2  |Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City|08822 725-301          |
|3  |R.N. Abejuela St., Cagayan de Oro City               |08822 727-864          |
+---+-----------------------------------------------------+-----------------------+
First use a regular expression to extract Contact_tmp from Address, then remove that text from Address, and finally merge Contact and Contact_tmp with the coalesce function.
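from pyspark.sql import functions as F
# Matches a digit run, whitespace, "ddd-ddd", and an optional "/ ddd-ddd" tail.
pattern = r'\d+\s+\d+-\d+\s*/*\s*\d*-*\d*'
df = df.withColumn('Contact_tmp', F.regexp_extract('Address', pattern, 0)).select(
    'Row',
    # Remove the extracted text from Address (no match yields '', so Address is unchanged).
    F.expr('replace(Address, Contact_tmp, "")').alias('Address'),
    # Keep an existing Contact; otherwise fall back to the extracted value.
    F.coalesce('Contact', 'Contact_tmp').alias('Contact')
)
df.show(truncate=False)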
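To see what the regular expression captures without starting a Spark session, the same extract, remove, and coalesce steps can be sketched in plain Python with the standard `re` module. This is only an illustration: the helper name `split_contact` is hypothetical, and it assumes Python's `re` behaves like Spark's Java regex engine for this particular pattern.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r'\d+\s+\d+-\d+\s*/*\s*\d*-*\d*')

def split_contact(address, contact):
    """Hypothetical helper mirroring the three Spark steps on a single row."""
    # Step 1: extract. regexp_extract returns '' when nothing matches, so mimic that.
    match = PATTERN.search(address)
    contact_tmp = match.group(0) if match else ''
    # Step 2: remove the extracted text from the address (skip when nothing was found).
    new_address = address.replace(contact_tmp, '') if contact_tmp else address
    # Step 3: coalesce. Keep an existing contact, else use the extracted one.
    new_contact = contact if contact is not None else contact_tmp
    return new_address, new_contact

rows = [
    ('J. Borja cor. Guillermo St., Cagayan de Oro City 08822 722-922 / 726-667', None),
    ('Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City 08822 725-301', None),
    ('R.N. Abejuela St., Cagayan de Oro City', '08822 727-864'),
]
for address, contact in rows:
    print(split_contact(address, contact))
```

Note that removing the match leaves the space that preceded it at the end of the address. If that matters, trimming the result (for example with `rtrim` on the Spark side) is one possible follow-up.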