How to create a PySpark DataFrame from a Python loop
I am looping over several web services, and that part works fine:
customers = json.loads(GetCustomers())
for o in customers["result"]:
    if o["customerId"] is not None:
        custRoles = GetCustomersRoles(o["customerId"])
        custRolesObj = json.loads(custRoles)
        if custRolesObj["result"] is not None:
            for l in custRolesObj["result"]:
                print(str(l["custId"]) + ", " + str(o["salesAmount"]))
This works, and the printed output is correct. Now, however, I need to create a DataFrame from it. I have read that we cannot "create a DataFrame with two columns and add row by row while looping".
So how can I solve this?
Update
I hope this is the right way to build the list?
customers = json.loads(GetCustomers())
result = []
for o in customers["result"]:
    if o["customerId"] is not None:
        custRoles = GetCustomersRoles(o["customerId"])
        custRolesObj = json.loads(custRoles)
        if custRolesObj["result"] is not None:
            for l in custRolesObj["result"]:
                result.append(make_opportunity(str(l["customerId"]), str(o["salesAmount"])))
If this is correct, how do I create a DataFrame from it?
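The question does not show `make_opportunity`, so the following is only a guess at its shape: a minimal sketch assuming it returns a namedtuple with one field per column. The `Opportunity` type and the sample values are hypothetical. A list of such records can be passed to `spark.createDataFrame`, which takes the column names from the field names; that call is left as a comment because it needs an active SparkSession.

```python
from collections import namedtuple

# Hypothetical record type; the real make_opportunity is not shown in the question.
Opportunity = namedtuple("Opportunity", ["customerId", "salesAmount"])


def make_opportunity(customer_id, sales_amount):
    """Build one row-like record from a role's customer id and the sales amount."""
    return Opportunity(customerId=customer_id, salesAmount=sales_amount)


# Stand-in data in place of the web-service responses.
result = [
    make_opportunity("c-1", "100.0"),
    make_opportunity("c-2", "250.5"),
]

# With an active SparkSession named `spark`, the list converts in one call:
# df = spark.createDataFrame(result)
```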
I solved my problem with the following code:
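import json

from pyspark.sql import SparkSession

# Reuse the active session, or create one when running outside a notebook or shell.
spark = SparkSession.builder.getOrCreate()

customers = json.loads(GetCustomers())
result = []  # one [customerId, salesAmount] entry per customer role
for o in customers["result"]:
    if o["customerId"] is not None:
        custRoles = GetCustomersRoles(o["customerId"])
        custRolesObj = json.loads(custRoles)
        if custRolesObj["result"] is not None:
            for l in custRolesObj["result"]:
                result.append([str(l["customerId"]), str(o["salesAmount"])])

# Build the DataFrame in one call once the loop has collected every row.
df = spark.createDataFrame(result, ['customerId', 'salesAmount'])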
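The code above stores both columns as strings. If a numeric `salesAmount` column is wanted, one option is to collect typed tuples and pass an explicit schema. The sketch below assumes `salesAmount` can always be parsed as a float; `collect_rows`, `to_dataframe`, `get_roles`, and the sample data are hypothetical names introduced for illustration, with `get_roles` standing in for `GetCustomersRoles` plus `json.loads`.

```python
def collect_rows(customers, get_roles):
    """Flatten customers and their roles into (customerId, salesAmount) tuples.

    `customers` is the parsed GetCustomers() response. `get_roles` is a
    hypothetical callable returning the parsed roles response for one customer id.
    """
    rows = []
    for customer in customers["result"]:
        if customer["customerId"] is None:
            continue
        roles = get_roles(customer["customerId"])
        # A missing role list is treated as empty.
        for role in roles["result"] or []:
            rows.append((str(role["customerId"]), float(customer["salesAmount"])))
    return rows


def to_dataframe(spark, rows):
    """Build a DataFrame with explicit column types (requires pyspark)."""
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    schema = StructType([
        StructField("customerId", StringType(), True),
        StructField("salesAmount", DoubleType(), True),
    ])
    return spark.createDataFrame(rows, schema)


# Stand-in data for illustration only.
sample_customers = {"result": [
    {"customerId": 1, "salesAmount": "100.5"},
    {"customerId": None, "salesAmount": "5"},
    {"customerId": 2, "salesAmount": 20},
]}
sample_roles = {
    1: {"result": [{"customerId": "a"}, {"customerId": "b"}]},
    2: {"result": None},
}
rows = collect_rows(sample_customers, lambda cid: sample_roles[cid])
```

With a running session, `df = to_dataframe(spark, rows)` would then produce the two-column DataFrame. Keeping the row collection separate from the Spark call also makes the loop easy to check without a cluster.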