pyspark when condition is true, insert some words with column variable
I have a CSV that looks like this:
s_table | s_name | t_cast | t_d |
aaaaaa | juuoo | TRUE |float |
aaaaaa | juueo | TRUE |float |
aaaaaa | ju4oo | | |
aaaaaa | juuoo | | |
aaaaaa | thyoo | | |
aaaaaa | juioo | | |
aaaaaa | rtyoo | | |
I'm trying to use a pyspark when condition to check t_cast and s_table: if t_cast is TRUE, return a statement in a new column.
What I have tried:
filters = filters.withColumn(
    "p3",
    f.when((f.col("s_table") == "aaaaaa") & (f.col("t_cast").isNull()), f.col("s_name"))
     .when((f.col("s_table") == "aaaaaa") & (f.col("t_cast") == True),
           f"CAST({f.col('s_table')} AS {f.col('t_d')}) AS {f.col('s_table')}"))
What I want is for p3 to return the following column:
s_table | s_name | t_cast | t_d | p_3 |
aaaaaa | juuoo | TRUE |float | cast ('juuoo' as float) as 'juuoo' |
aaaaaa | juueo | TRUE |float | cast ('juueo' as float) as 'juuoo' |
aaaaaa | ju4oo | | | ju4oo |
aaaaaa | juuoo | | | juuoo |
aaaaaa | thyoo | | | thyoo |
aaaaaa | juioo | | | juioo |
aaaaaa | rtyoo | | | rtyoo |
But the result I get is:
CAST(Column<'s_field'> AS Column<'t_data_type'>) AS Column<'s_field'>,
CAST(Column<'s_field'> AS Column<'t_data_type'>) AS Column<'s_field'>,
I feel like I'm almost there, but I can't quite figure it out.
You need to use the Spark concat function instead of Python string formatting to build the expected string. Something like:
import pyspark.sql.functions as F
filters = filters.withColumn(
    "p3",
    (F.when((F.col("s_table") == "aaaaaa") & (F.col("t_cast").isNull()), F.col("s_name"))
      .when((F.col("s_table") == "aaaaaa") & F.col("t_cast"),
            F.expr(r"concat('CAST(\'', s_name, '\' AS ', t_d, ') AS \'', s_table, '\'')")
      )
    )
)
filters.show(truncate=False)
#+-------+------+------+-----+----------------------------------+
#|s_table|s_name|t_cast|t_d |p3 |
#+-------+------+------+-----+----------------------------------+
#|aaaaaa |juuoo |true |float|CAST('juuoo' AS float) AS 'aaaaaa'|
#|aaaaaa |juueo |true |float|CAST('juueo' AS float) AS 'aaaaaa'|
#|aaaaaa |ju4oo |null |null |ju4oo |
#|aaaaaa |juuoo |null |null |juuoo |
#|aaaaaa |thyoo |null |null |thyoo |
#|aaaaaa |juioo |null |null |juioo |
#|aaaaaa |rtyoo |null |null |rtyoo |
#+-------+------+------+-----+----------------------------------+