Need to convert list to dataframe in pyspark
I have the following code in Python, but I need to convert it to pyspark:
qm1['c1'] = [x[0] in x[1] for x in zip(qm1['id'], qm1['question'])]
qm1['c1'] = qm1['c1'].astype(str)
qm1a = qm1[(qm1.c1 == 'True')]
The output of this Python code is:
question | key | id | c1 |
---|---|---|---|
Women | 0 | omen | True |
machine | 0 | mac | True |
Can someone help me with this, since I am a beginner in Python?
Here is my test data (since your question does not include any):
df.show()
+--------+---+----+
|question|key| id|
+--------+---+----+
| Women| 0|omen|
| machine| 2| mac|
| foo| 1| bar|
+--------+---+----+
And my code to create the expected output:
from pyspark.sql import functions as F
df = df.withColumn("c1", F.col("question").contains(F.col("id")))
df.show()
+--------+---+----+-----+
|question|key| id| c1|
+--------+---+----+-----+
| Women| 0|omen| true|
| machine| 2| mac| true|
| foo| 1| bar|false|
+--------+---+----+-----+
Then you can simply filter on c1:
df.where("c1").show()
+--------+---+----+----+
|question|key| id| c1|
+--------+---+----+----+
| Women| 0|omen|true|
| machine| 2| mac|true|
+--------+---+----+----+
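For comparison, the original pandas version can keep `c1` as a real boolean and filter on it directly, rather than casting to `str` and comparing with `'True'` (a sketch using the same sample data):

```python
import pandas as pd

qm1 = pd.DataFrame({
    "question": ["Women", "machine", "foo"],
    "key": [0, 2, 1],
    "id": ["omen", "mac", "bar"],
})

# True where `id` is a substring of `question`
qm1["c1"] = [i in q for i, q in zip(qm1["id"], qm1["question"])]

# Filter on the boolean column directly -- no str cast needed
qm1a = qm1[qm1["c1"]]
print(qm1a)
```

This is the same logic the PySpark `contains` + `where` answer expresses, just in plain pandas.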