How to compare dataframe column to another dataframe column inplace in pyspark?

# DataframeA and DataframeB match:
DataframeA:
col: Name "Ali", "Bilal", "Ahsan"

DataframeB:
col: Name "Ali", "Bilal", "Ahsan"

# DataframeC and DataframeD DO NOT match:  
DataframeC:
col: Name "Ali", "Ahsan", "Bilal"

DataframeD:
col: Name "Ali", "Bilal", "Ahsan"

I want to match the column values in place; any help would be appreciated.

Use the following Scala code as a reference and translate it into Python. Update the `val check` line with your own dataframe names.

    scala> import org.apache.spark.sql.expressions.Window
    scala> import org.apache.spark.sql.functions.{col, lit, row_number, when}

    scala> val w = Window.orderBy(lit(1))

    scala> val check = dfA.withColumn("rn", row_number.over(w)).alias("A")
         |   .join(dfB.withColumn("rn", row_number.over(w)).alias("B"), List("rn"), "left")
         |   .withColumn("check", when(col("A.name") === col("B.name"), lit("match")).otherwise(lit("not match")))
         |   .select("check").distinct.count

    scala> if (check == 1) println("matched") else println("not matched")

In Python, use a `set` for the comparison.

DataframeC.columns
-> ["Ali", "Ahsan", "Bilal"]
DataframeD.columns
-> ["Ali", "Bilal", "Ahsan"]

DataframeC.columns == DataframeD.columns
-> False

set(DataframeC.columns) == set(DataframeD.columns)
-> True
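One caveat: `df.columns` returns the column *names*, not the values, and a `set` comparison ignores order, so it would report DataframeC and DataframeD as equal even though the question requires order to matter. A plain-Python sketch of the difference (the lists here stand in for values collected from a dataframe, e.g. via `[r["Name"] for r in df.select("Name").collect()]`):

```python
# Hypothetical value lists from DataframeC and DataframeD
names_c = ["Ali", "Ahsan", "Bilal"]
names_d = ["Ali", "Bilal", "Ahsan"]

# List equality is order-sensitive: this is what the question asks for
print(names_c == names_d)            # False: same elements, different order

# Set equality ignores order and duplicates
print(set(names_c) == set(names_d))  # True: same elements
```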