How do I move a string that starts with a specified character into another column in Spark SQL?

Given the following DataFrame:

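+---+------------------------------------------------------------------------+-------------+
|Row|Address                                                                 |Contact      |
+---+------------------------------------------------------------------------+-------------+
|1  |J. Borja cor. Guillermo St., Cagayan de Oro City 08822 722-922 / 726-667|null         |
|2  |Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City 08822 725-301     |null         |
|3  |R.N. Abejuela St., Cagayan de Oro City                                  |08822 727-864|
+---+------------------------------------------------------------------------+-------------+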

How can I convert it into this:

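+---+------------------------------------------------------------+-------------------------+
|Row|Address                                                      |Contact                  |
+---+------------------------------------------------------------+-------------------------+
|1  |J. Borja cor. Guillermo St., Cagayan de Oro City            | 08822 722-922 / 726-667 |
|2  |Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City       |           08822 725-301 |
|3  |R.N. Abejuela St., Cagayan de Oro City                      |           08822 727-864 |
+---+------------------------------------------------------------+-------------------------+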

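First, use a regular expression to extract the phone number from Address into a temporary column, Contact_tmp. Then remove that text from Address, and finally use the coalesce function to merge Contact and Contact_tmp, so that an existing Contact value is kept and Contact_tmp is used only where Contact is null.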

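from pyspark.sql import functions as F

# Pull the phone number out of Address; regexp_extract returns '' when nothing matches.
df = df.withColumn('Contact_tmp', F.regexp_extract('Address', r'\d+\s+\d+-\d+\s*/*\s*\d*-*\d*', 0)).select(
    'Row',
    # Remove the extracted number from Address and trim the space left behind.
    F.expr("trim(replace(Address, Contact_tmp, ''))").alias('Address'),
    # Keep an existing Contact; otherwise fall back to the extracted number.
    F.coalesce('Contact', 'Contact_tmp').alias('Contact')
)
df.show(truncate=False)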
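
To check what the pattern actually captures without starting a Spark session, the same three steps (extract, remove, coalesce) can be sketched in plain Python on the sample rows from the question. This is only an illustration: `split_contact` is a hypothetical helper, not part of PySpark, and it relies on the assumption that Python's `re` module and the Java regex engine used by Spark treat this particular pattern the same way on this ASCII data. The helper also strips the trailing space left behind once the number is removed.

```python
import re

# Same pattern as in the Spark snippet; it only uses \d, \s, + and *.
PATTERN = re.compile(r'\d+\s+\d+-\d+\s*/*\s*\d*-*\d*')

def split_contact(address, contact):
    """Hypothetical helper: extract, remove, coalesce for a single row."""
    match = PATTERN.search(address)
    # Like regexp_extract, fall back to an empty string when nothing matches.
    contact_tmp = match.group(0) if match else ''
    # Remove the extracted number and strip the space it leaves behind.
    cleaned = address.replace(contact_tmp, '').strip() if contact_tmp else address
    # Like coalesce: keep the existing contact unless it is missing.
    merged = contact if contact is not None else contact_tmp
    return cleaned, merged

# Sample rows copied from the question: (Address, Contact).
rows = [
    ('J. Borja cor. Guillermo St., Cagayan de Oro City 08822 722-922 / 726-667', None),
    ('Cruz Taal cor. Apolinar Velez St.,Cagayan de Oro City 08822 725-301', None),
    ('R.N. Abejuela St., Cagayan de Oro City', '08822 727-864'),
]

results = [split_contact(address, contact) for address, contact in rows]
for address, contact in results:
    print(address, '|', contact)
```

Note that the pattern expects a digit group, whitespace, and then a hyphenated number, so an address containing something like a house number followed by a hyphenated range could also match; it is worth testing against real data before relying on it.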