PySpark: when a condition is true, insert text built from column values

I have a CSV like this:

s_table | s_name  | t_cast | t_d  |
aaaaaa  |  juuoo  |  TRUE  |float |
aaaaaa  |  juueo  |  TRUE  |float |
aaaaaa  |  ju4oo  |        |      |
aaaaaa  |  juuoo  |        |      |
aaaaaa  |  thyoo  |        |      |
aaaaaa  |  juioo  |        |      |
aaaaaa  |  rtyoo  |        |      |

I'm trying to use a PySpark when condition to check t_cast and s_table: if t_cast is TRUE, a statement should be returned in a new column.

What I tried:

filters = filters.withColumn("p3", f.when((f.col("s_table") == "aaaaaa") & (f.col("t_cast").isNull()),f.col("s_name")).
                                     when((f.col("s_table") == "aaaaaa") & (f.col("t_cast") == True),
                                          f"CAST({f.col('s_table')} AS {f.col('t_d')}) AS {f.col('s_table')}"))

What I want is for p3 to return this column:

s_table | s_name  | t_cast | t_d  |   p3                                   |
aaaaaa  |  juuoo  |  TRUE  |float | cast ('juuoo' as float) as 'juuoo'     |
aaaaaa  |  juueo  |  TRUE  |float | cast ('juueo' as float) as 'juueo'     |
aaaaaa  |  ju4oo  |        |      |                             ju4oo      |
aaaaaa  |  juuoo  |        |      |                             juuoo      |
aaaaaa  |  thyoo  |        |      |                             thyoo      |
aaaaaa  |  juioo  |        |      |                             juioo      |
aaaaaa  |  rtyoo  |        |      |                             rtyoo      |

But the result I get is:

CAST(Column<'s_field'> AS Column<'t_data_type'>) AS Column<'s_field'>, 
CAST(Column<'s_field'> AS Column<'t_data_type'>) AS Column<'s_field'>,

I feel like I'm close, but I can't quite figure it out.

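You need to build the string with Spark's concat function instead of Python string formatting. An f-string is evaluated once, in Python, before Spark looks at any rows, so each f.col(...) is turned into its text representation (Column<'...'>) and that fixed text becomes the value for every matching row. concat, by contrast, runs inside Spark and reads s_name, t_d and s_table from each row. Something like: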

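import pyspark.sql.functions as F

filters = filters.withColumn(
    "p3",
    # t_cast is null: return s_name unchanged
    (F.when((F.col("s_table") == "aaaaaa") & (F.col("t_cast").isNull()), F.col("s_name"))
     # t_cast is true: build the text per row with Spark's concat (\' is an escaped quote inside a SQL string literal)
     .when((F.col("s_table") == "aaaaaa") & F.col("t_cast"),
           F.expr(r"concat('CAST(\'', s_name, '\' AS ', t_d, ') AS \'', s_table, '\'')")
           )
     )
)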

filters.show(truncate=False)

#+-------+------+------+-----+----------------------------------+
#|s_table|s_name|t_cast|t_d  |p3                                |
#+-------+------+------+-----+----------------------------------+
#|aaaaaa |juuoo |true  |float|CAST('juuoo' AS float) AS 'aaaaaa'|
#|aaaaaa |juueo |true  |float|CAST('juueo' AS float) AS 'aaaaaa'|
#|aaaaaa |ju4oo |null  |null |ju4oo                             |
#|aaaaaa |juuoo |null  |null |juuoo                             |
#|aaaaaa |thyoo |null  |null |thyoo                             |
#|aaaaaa |juioo |null  |null |juioo                             |
#|aaaaaa |rtyoo |null  |null |rtyoo                             |
#+-------+------+------+-----+----------------------------------+