How to subtract 2 string columns in a PySpark dataframe
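The scenario is as follows:
Consider a PySpark dataframe with 2 columns, as shown below:
{
fullname: facebook,
lastname: book
}
I want a new column, firstname, obtained by removing lastname from fullname, as shown below: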
{
firstname:face,
lastname:book
}
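from pyspark.sql import functions as F

# assumes an existing SparkSession named `spark`
df = spark.createDataFrame(
    [
        ('facebook', 'book')
    ], ['fullname', 'lastname'])
# regexp_replace treats lastname as a regex pattern and removes every match from fullname
df.withColumn('firstname', F.expr("regexp_replace(fullname, lastname, '')")).show()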
+--------+--------+---------+
|fullname|lastname|firstname|
+--------+--------+---------+
|facebook|    book|     face|
+--------+--------+---------+
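Because regexp_replace interprets lastname as a regular expression and deletes every match, it can give surprising results when lastname contains characters such as `.` or appears more than once in fullname. The following is a minimal sketch of an alternative, assuming the intent is to strip lastname only when it is a suffix of fullname. The helper names `strip_suffix` and `with_firstname` are hypothetical and not part of the original answer. `strip_suffix` is a plain-Python reference for the rule, and `with_firstname` expresses the same rule with PySpark column functions.

```python
def strip_suffix(fullname: str, lastname: str) -> str:
    """Plain-Python reference: drop lastname only when it ends fullname."""
    if lastname and fullname.endswith(lastname):
        return fullname[: len(fullname) - len(lastname)]
    return fullname


def with_firstname(df):
    """Apply the same suffix rule to a dataframe with fullname/lastname columns."""
    # Imported lazily so strip_suffix can be used without Spark installed.
    from pyspark.sql import functions as F

    # Keep the leading characters of fullname that precede the lastname suffix.
    prefix = F.expr("substring(fullname, 1, length(fullname) - length(lastname))")
    return df.withColumn(
        "firstname",
        F.when(F.col("fullname").endswith(F.col("lastname")), prefix)
         .otherwise(F.col("fullname")),  # no suffix match (or null lastname): keep fullname
    )
```

With the dataframe from the answer, `with_firstname(df).show()` should give the same `face` result for `facebook` / `book`, while a row such as `bookbook` / `book` would keep the leading `book` instead of becoming an empty string.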