Create a PySpark dataframe
My code is:
import pandas as pd

# One row per year, with a constant placeholder sea-level value.
pdf = pd.DataFrame(
    {
        "Year": list(range(2013, 2051)),
        "CSIRO Adjusted Sea Level": 0.0,
    }
)
pdf.head()

# Convert the pandas frame to a Spark dataframe and display it.
df_pyspark = spark.createDataFrame(pdf)
df_pyspark.show()
The above results in this error:
An error occurred while calling o406.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11) (192.168.1.66 executor driver):
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\spark-3.2.1-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\worker.py", line 601, in main
  File "C:\spark-3.2.1-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\worker.py", line 71, in read_command
  File "C:\spark-3.2.1-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\serializers.py", line 160, in _read_with_length
    return self.loads(obj)
  File "C:\spark-3.2.1-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\serializers.py", line 430, in loads
    return pickle.loads(obj, encoding=encoding)
AttributeError: Can't get attribute '_fill_function' on <module 'pyspark.cloudpickle' from 'C:\spark-3.2.1-bin-hadoop3.2\python\lib\pyspark.zip\pyspark\cloudpickle\__init__.py'>
There is more text after that. What am I doing wrong?
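For what it's worth, this particular AttributeError usually points at a version mismatch: the pyspark package the driver uses pickles objects with one copy of cloudpickle, while the worker inside spark-3.2.1-bin-hadoop3.2 unpickles them with another that no longer defines _fill_function. A minimal sanity check, assuming pyspark was installed with pip:

import pyspark

# This should report the same version as the Spark distribution in the
# traceback (3.2.1 for spark-3.2.1-bin-hadoop3.2); if it differs, driver
# and worker carry incompatible cloudpickle code and unpickling fails.
print(pyspark.__version__)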
I also tried:
lista = [(i, 0.0) for i in range(2013, 2051)]
df = spark.createDataFrame(
    [
        lista
    ],
    "Year, Sea Level",
)
...and received this error:
ValueError: Length of object (38) does not match with length of fields (2)
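The ValueError comes from the extra pair of brackets: wrapping lista in another list makes Spark treat the entire 38-element list as a single row, which it then checks against the two-field schema. A sketch of the corrected call, passing the tuples directly and using a plain list of column names:

lista = [(i, 0.0) for i in range(2013, 2051)]
# One tuple per row, two fields per tuple, matching the two column names.
df = spark.createDataFrame(lista, ["Year", "Sea Level"])
df.show(3)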
If you just need a Spark dataframe, it's easy:
>>> from pyspark.sql.functions import *
>>> df = spark.range(2013,2051).withColumnRenamed("id","Year").withColumn("Sea Level",lit(0.0))
>>> df.show(10)
+----+---------+
|Year|Sea Level|
+----+---------+
|2013| 0.0|
|2014| 0.0|
|2015| 0.0|
|2016| 0.0|
|2017| 0.0|
|2018| 0.0|
|2019| 0.0|
|2020| 0.0|
|2021| 0.0|
|2022| 0.0|
+----+---------+
only showing top 10 rows
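If you want the column name from the pandas version back, the same approach works with an alias; a sketch building on the wildcard import above:

>>> df = spark.range(2013, 2051).select(
...     col("id").alias("Year"),
...     lit(0.0).alias("CSIRO Adjusted Sea Level"),
... )
>>> df.show(2)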