Python + Pandas + Spark - How to import a dataframe into Pandas dataframe and convert it into a dictionary?
How can I import a Spark DataFrame into a pandas DataFrame and convert it into a dictionary?
I created this DataFrame from Spark:
import os
import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="PythonSQL")
sqlContext = SQLContext(sc)
path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")
# Create the DataFrame
df = sqlContext.read.json(path)
# Register this DataFrame as a table.
df.registerTempTable("people")
# SQL statements can be run by using the sql methods provided by sqlContext
teenagers = sqlContext.sql("SELECT name FROM people")
sc.stop()
When I try to import it into pandas,
teenagers = pd.DataFrame(teenagers, columns=['name'])
I get this error:
[client 127.0.0.1:50885] PandasError: DataFrame constructor not properly called!
In the end, I just want to convert the DataFrame into a dictionary:
# Avoid naming the variable "dict", which would shadow the built-in type
name_dict = teenagers.set_index('name').to_dict()
print(name_dict)
Any ideas?
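You can use the toPandas method to convert a Spark DataFrame to a pandas DataFrame. pd.DataFrame cannot build a frame from a Spark DataFrame directly, which is why the constructor raises the error above. Call toPandas before sc.stop(), because the conversion needs an active SparkContext.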
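As a rough sketch of how the two steps could fit together, the helper below calls toPandas and then builds a dictionary keyed by one column. The function name spark_df_to_dict and the key_column parameter are made up for illustration and are not part of Spark or pandas. The choice of orient="index" is also an assumption about the dictionary shape you want.

```python
import pandas as pd


def spark_df_to_dict(spark_df, key_column):
    """Convert a Spark DataFrame to a dict keyed by one column.

    Hypothetical helper: spark_df is expected to be a pyspark.sql.DataFrame,
    but only its toPandas() method is used here. Call this while the
    SparkContext is still active, i.e. before sc.stop().
    """
    # toPandas() collects every row to the driver as a pandas DataFrame,
    # so it only suits results that fit in driver memory.
    pandas_df = spark_df.toPandas()
    # orient="index" yields {key_value: {other_column: value, ...}}.
    # pandas raises ValueError here if the key column has duplicate values.
    return pandas_df.set_index(key_column).to_dict(orient="index")
```

With the code from the question, this would be called as spark_df_to_dict(teenagers, "name") before sc.stop(). Note that the query selects only name, so after set_index there are no columns left to use as values. Selecting a second column (for example "SELECT name, age FROM people") gives each key something to map to.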