Need to convert list to dataframe in pyspark
I have the following code in Python, but I need to convert it to pyspark:
qm1['c1'] = [x[0] in x[1] for x in zip(qm1['id'], qm1['question'])]
qm1['c1'] = qm1['c1'].astype(str)
qm1a = qm1[(qm1.c1 == 'True')]
The output of this Python code is:
question | key | id | c1 |
---|---|---|---|
Women | 0 | omen | True |
machine | 0 | mac | True |
Can someone help me with this, since I am a beginner in Python?
Here is my test data (since your question does not include any):
df.show()
+--------+---+----+
|question|key| id|
+--------+---+----+
| Women| 0|omen|
| machine| 2| mac|
| foo| 1| bar|
+--------+---+----+
And my code to create the expected output:
from pyspark.sql import functions as F
df = df.withColumn("c1", F.col("question").contains(F.col("id")))
df.show()
+--------+---+----+-----+
|question|key| id| c1|
+--------+---+----+-----+
| Women| 0|omen| true|
| machine| 2| mac| true|
| foo| 1| bar|false|
+--------+---+----+-----+
Then you can simply filter on c1:
df.where("c1").show()
+--------+---+----+----+
|question|key| id| c1|
+--------+---+----+----+
| Women| 0|omen|true|
| machine| 2| mac|true|
+--------+---+----+----+
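For comparison, the original pandas version can keep `c1` as a real boolean and filter on it directly, rather than casting to `str` and comparing with `'True'` (a sketch using the same sample data):

```python
import pandas as pd

qm1 = pd.DataFrame({
    "question": ["Women", "machine", "foo"],
    "key": [0, 2, 1],
    "id": ["omen", "mac", "bar"],
})

# True where `id` is a substring of `question`
qm1["c1"] = [i in q for i, q in zip(qm1["id"], qm1["question"])]

# Filter on the boolean column directly -- no str cast needed
qm1a = qm1[qm1["c1"]]
print(qm1a)
```

This is the same logic the PySpark `contains` + `where` answer expresses, just in plain pandas.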