ValueError: DataFrame constructor not properly called (Databricks/Python)

Question

I am trying to set up a Pandas DataFrame to work with my data in Databricks. The data is imported from a file on my local machine, as shown below (image: snip of the data):

# Import packages
import pandas as pd
import numpy as np

ownr = spark.read.format("csv").load("dbfs:/FileStore/shared_uploads/directory/carsownr.csv")

# View the shape and data types
#print(ownr.shape)
print(ownr.dtypes);

#Setup Dataframes
df1 = pd.DataFrame(data=ownr);

When I run this, I get the following error message:

ValueError: DataFrame constructor not properly called!
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-3703442649830271> in <module>
      1 #Setup Dataframes
----> 2 df1 = pd.DataFrame(data=ownr);
      3 df2 = pd.DataFrame(data=curr);

/databricks/python/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    507                 )
    508             else:
--> 509                 raise ValueError("DataFrame constructor not properly called!")
    510 
    511         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!

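I thought it might be a problem with my data, so I checked it, and all of the fields are strings. Do I need to convert each column/field to the appropriate data type before it can be loaded into the DataFrame?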

I searched the forums for similar error messages but could not find anything relevant. Any help would be greatly appreciated!

Answer 1

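spark.read returns a Spark DataFrame, which the pd.DataFrame constructor does not accept. If you want to get a Pandas DataFrame from a Spark DataFrame, you need to use the toPandas function instead: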

df1 = ownr.toPandas()

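Also note that the Spark DataFrame must be small enough to fit into the driver's memory, because toPandas collects all of its rows onto the driver.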
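
To illustrate the memory point, a small guard could check the row count before collecting. This is only a sketch: spark_to_pandas and the max_rows default are hypothetical names and values chosen for illustration, not part of Spark. Only limit(), count(), and toPandas() are actual Spark DataFrame methods, and row count is just a rough proxy for memory use.

```python
def spark_to_pandas(sdf, max_rows=100_000):
    """Convert a Spark DataFrame to pandas, refusing inputs that look too large.

    `sdf` is assumed to be a pyspark.sql.DataFrame; `max_rows` is an
    arbitrary illustrative threshold, not a Spark setting.
    """
    # Count at most max_rows + 1 rows so an oversized input is detected
    # without counting the whole dataset.
    seen = sdf.limit(max_rows + 1).count()
    if seen > max_rows:
        raise ValueError(
            f"Spark DataFrame has more than {max_rows} rows; "
            "filter or aggregate it before calling toPandas()."
        )
    # toPandas() collects every row onto the driver.
    return sdf.toPandas()


# Hypothetical usage with the DataFrame from the question:
# df1 = spark_to_pandas(ownr)
```

If the guard trips, one option is to reduce the data in Spark first (for example with select, filter, or an aggregation) and convert only the result.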

Also, if you are looking for a Pandas API for Spark, take a look at the Koalas library (Spark < 3.2, DBR < 10.0), which is also included in the upcoming Spark 3.2 under the name pandas API on Spark (DBR 10+). In that case you get the benefits of both Spark and Pandas.
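
A minimal sketch of what using that API might look like, assuming a runtime where either pyspark.pandas (Spark 3.2+) or databricks.koalas is importable. The helper name read_csv_with_pandas_api is made up for illustration; the path in the usage comment is the one from the question.

```python
def read_csv_with_pandas_api(path):
    """Read a CSV into a distributed, pandas-like DataFrame.

    Hypothetical helper: it prefers the pandas API on Spark and falls back
    to Koalas on older runtimes.
    """
    try:
        # Spark 3.2+ / DBR 10+: the pandas API ships with PySpark.
        import pyspark.pandas as ps
    except ImportError:
        # Older runtimes: use the separate Koalas package instead.
        import databricks.koalas as ps
    # read_csv mirrors pandas.read_csv; by default the first line of the
    # file is treated as the header row.
    return ps.read_csv(path)


# Hypothetical usage in a notebook:
# psdf = read_csv_with_pandas_api("dbfs:/FileStore/shared_uploads/directory/carsownr.csv")
# psdf.head()            # pandas-style calls, executed by Spark
# pdf = psdf.to_pandas() # collect to a plain pandas DataFrame if it is small enough
```

The resulting object stays distributed, so the driver-memory caveat applies only if you later call to_pandas() on it.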