PySpark: How to convert a SQL query with multiple case statements to PySpark/PySpark SQL?
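I have two sets of queries with multiple case statements, and I need to implement the same logic in PySpark. I tried, but I am having difficulty with the multiple when conditions. Any help would be appreciated.
First query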
case
when appointment_date is null
then 0
when resolution_desc in (
'CSTXCL - OK BY PHONE'
)
or resolution_des ilike '%NO VAN ROLL%'
then 0
when status in ('PENDING','CANCELLED')
then 0
when ticket_type = 'install'
and appointment_required is true
end as truck_roll
Second query
case when status = 'COMPLETED' and resolution not in ('CANCELLING ORDER','CANCEL ORDER')
then 1 else 0 end as completed,
case when status = 'CANCELLED' or ( status in ('COMPLETED','PENDING' ) and resolution_desc in ('CANCELLING ORDER','CANCEL ORDER') ) then 1 else 0 end as cancelled.
I tried the code below for the second query, but it did not work:
sparkdf.withColumn('completed', f.when((sparkdf.ticket_status =='COMPLETED') & (~sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))\
.withColumn('cancelled', f.when((sparkdf.ticket_status == 'CANCELLED') | (sparkdf.ticket_status.isin('COMPLETED','PENDING')) & (sparkdf.resolution_description.isin('CANCELLING ORDER','CANCEL ORDER','CLOSE SRO')),1).otherwise(0))
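You can use the "expr" function to evaluate a SQL expression as a column (triple quotes are used here because the expression spans multiple lines):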
from pyspark.sql.functions import expr
sparkdf.withColumn(
'completed',
expr('''
CASE WHEN status = 'COMPLETED'
AND resolution NOT IN ('CANCELLING ORDER',
'CANCEL ORDER') THEN 1
ELSE 0
END
'''
)
)
Of course, you can do the same for the "cancelled" column.
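As a rough sketch of what that could look like, the block below applies the same "expr" approach to the "cancelled" column and to the first query. It rests on several assumptions that are not confirmed by the question: the last branch of the first query has no THEN in the original, so THEN 1 is assumed; resolution_des is assumed to be a typo for resolution_desc; and ILIKE is swapped for upper(...) LIKE so the expression does not depend on ILIKE being available in your Spark version. The helper name add_flag_columns is hypothetical.

```python
# SQL expressions adapted from the question. Column names are used as written
# there and must match the columns that actually exist in your DataFrame.
CANCELLED_SQL = """
    CASE WHEN status = 'CANCELLED'
           OR (status IN ('COMPLETED', 'PENDING')
               AND resolution_desc IN ('CANCELLING ORDER', 'CANCEL ORDER'))
         THEN 1
         ELSE 0
    END
"""

# Assumptions (not in the question): the last branch returns 1, the column
# written as resolution_des is resolution_desc, and ILIKE is replaced by
# upper(...) LIKE. Inside a WHEN condition, "= true" matches the same rows
# as "IS TRUE".
TRUCK_ROLL_SQL = """
    CASE WHEN appointment_date IS NULL THEN 0
         WHEN resolution_desc IN ('CSTXCL - OK BY PHONE')
           OR upper(resolution_desc) LIKE '%NO VAN ROLL%' THEN 0
         WHEN status IN ('PENDING', 'CANCELLED') THEN 0
         WHEN ticket_type = 'install' AND appointment_required = true THEN 1
    END
"""


def add_flag_columns(df):
    """Return df with 'cancelled' and 'truck_roll' columns added."""
    # Imported here so the SQL strings above can be inspected without Spark.
    from pyspark.sql.functions import expr

    return (
        df.withColumn("cancelled", expr(CANCELLED_SQL))
          .withColumn("truck_roll", expr(TRUCK_ROLL_SQL))
    )
```

Usage would be along the lines of sparkdf = add_flag_columns(sparkdf). Because the first query has no ELSE branch, rows that match none of the conditions get NULL in truck_roll. If your DataFrame uses ticket_status and resolution_description, as in the attempted code, substitute those names in the SQL strings.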