Convert list of lists to pyspark dataframe?

I'm unable to convert the following list of lists to a PySpark dataframe.

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]

cols = ['col1', 'col2', 'col3']

Desired output:

    +----------+----------+----------+ 
    | col1     | col2     | col3     |
    +----------+----------+----------+ 
    | 1        | A        | aa       |
    +----------+----------+----------+ 
    | 2        | B        | bb       |
    +----------+----------+----------+ 
    | 3        | C        | cc       |
    +----------+----------+----------+ 

What I'm essentially looking for is the pandas equivalent of:

df = pd.DataFrame(data=lst, columns=cols)

If you have the pandas package installed, you can build a pandas DataFrame first and then convert it with spark.createDataFrame:

#Build the DataFrame in pandas, then bring it into PySpark
import pandas as pd
from pyspark.sql import SparkSession


lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']

df = pd.DataFrame(data=lst, columns=cols)

#Create PySpark SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

#Create PySpark DataFrame from Pandas
sparkDF = spark.createDataFrame(df)
sparkDF.printSchema()
sparkDF.show()
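
If you take the pandas route with larger data, Arrow-based transfer can speed up createDataFrame noticeably. A minimal sketch, assuming Spark 3.x (where the config key is spark.sql.execution.arrow.pyspark.enabled) and that pyarrow is installed:

#Enable Arrow-based columnar transfer for pandas <-> Spark conversions,
#then convert as before
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

sparkDF = spark.createDataFrame(df)
sparkDF.show()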

Alternatively, you can do it without pandas:

from pyspark.sql import SparkSession

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']

#Create PySpark SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

df = spark.createDataFrame(lst).toDF(*cols)
df.printSchema()
df.show()
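
If you'd rather skip the extra .toDF step, createDataFrame also accepts the column names (or a full StructType, if you want to pin the types instead of relying on inference) as its second argument. A minimal sketch of both variants, reusing the lst, cols, and spark objects from above:

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

#Pass the column names directly; the types are still inferred from the data
df = spark.createDataFrame(lst, cols)
df.show()

#Or pin the types explicitly with a StructType schema
schema = StructType([
    StructField("col1", IntegerType(), True),
    StructField("col2", StringType(), True),
    StructField("col3", StringType(), True),
])
df = spark.createDataFrame(lst, schema=schema)
df.printSchema()
df.show()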