Two conditions in the "if" part of an if/else statement using PySpark

Question

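I need to stop the program and raise the exception below if both conditions are met; otherwise the program should continue. This works fine with only the first condition, but produces an error when both conditions are used. The code below should raise the exception if DF is non-empty and the value of DF.col1 is not 'string'. Any tips on how to get this working?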

if (DF.count() > 0) & (DF.col1 != 'string'): 
  raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")
else: 
  pass

This raises the error:

" Py4JError: An error occurred while calling o678.and. Trace: 
py4j.Py4JException: Method and([class java.lang.Integer]) does not exist "

Some sample data:

from pyspark.sql.types import StructType,StructField, StringType, IntegerType

data2 = [("not_string","test")]

schema = StructType([ \
    StructField("col1",StringType(),True), \
    StructField("col2",StringType(),True) \
  ])
 
DF = spark.createDataFrame(data=data2,schema=schema)
DF.printSchema()
DF.show(truncate=False)

Answer 1

In Python, the & operator is a bitwise operator: it works on the individual bits of its operands. For "and" logic in a condition, you must use and:

if (DF.count() > 0) and (DF.col1 != 'string'): 
  raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")
else: 
  pass
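
To make the difference between & and and concrete, here is a minimal sketch in plain Python with no Spark required. ColumnLike is a hypothetical stand-in invented for this illustration, not a PySpark class; it only imitates the fact that a PySpark Column refuses to be converted to a Python bool.

```python
# Plain Python values: `&` works bit by bit, `and` works on truthiness.
bitwise = 6 & 3      # 0b110 & 0b011 gives 0b010
logical = 6 and 3    # both operands are truthy, so `and` returns the last one

# `&` binds tighter than comparison operators, which is why each
# comparison in the question has to be wrapped in parentheses.
without_parens = 1 > 0 & 2 > 1     # parsed as 1 > (0 & 2) > 1
with_parens = (1 > 0) & (2 > 1)


class ColumnLike:
    """Hypothetical stand-in for a lazy column expression (not a Spark class)."""

    def __bool__(self):
        # Assumption: imitates a column expression that has no single truth value.
        raise ValueError("cannot convert a column expression into bool")


# `and` does not combine its operands; when the left side is truthy it
# simply returns the right operand, which `if` then tries to convert to bool.
combined = True and ColumnLike()

truthiness_error = None
try:
    if combined:
        pass
except ValueError as err:
    truthiness_error = str(err)
```

One caveat worth checking against your own setup: DF.col1 != 'string' builds a Column expression, not a Python bool, so switching to and removes the Py4JError but the if may still fail when it asks that Column for a truth value. Reducing the column condition to a row count first, as in Answer 2, sidesteps this.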

Answer 2

IIUC, you want to raise an exception if any row in your DataFrame has a col1 value that is not equal to 'string'.

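You can do this with a filter and a count. If any row has a col1 value other than 'string', the count is greater than 0, which evaluates to True, and the exception is raised.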

data2 = [("not_string","test")]

schema = StructType([ \
    StructField("col1",StringType(),True), \
    StructField("col2",StringType(),True) \
  ])
 
DF = spark.createDataFrame(data=data2,schema=schema)

if DF.filter(DF.col1 != 'string').count():
    raise Exception("!!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!")

Exception: !!!COUNT IS NON-ZERO, SO ADJUSTMENT IS NEEDED!!!

python

if-statement

apache-spark-sql

pyspark