Issues changing the datatype of an array from double to int
I have a set of data, and I'm trying to write a Python program that changes the datatypes at the schema level when loading the file in Databricks. I keep getting an error when changing the array's datatype from DOUBLE to INT.
Schema
root
|-- _id: string (nullable = true)
|-- city: string (nullable = true)
|-- loc: array (nullable = true)
| |-- element: double (containsNull = true)
|-- pop: long (nullable = true)
|-- state: string (nullable = true)
My code
from pyspark.sql.types import *

s = StructType([
    StructField("_id", IntegerType(), True),
    StructField("city", StringType(), True),
    StructField("loc", ArrayType(), True),   # raises: ArrayType requires an elementType
    StructField("element", DoubleType(), True),
    StructField("pop", LongType(), True),
    StructField("state", StringType(), True)
])
filepath= "/FileStore/tables/zips.json"
df2= spark.read.format("json").load(filepath, schema=s)
df2.show()
Error
TypeError: __init__() missing 1 required positional argument: 'elementType'
Sample data
Your schema definition is incorrect:
from pyspark.sql import functions as F
from pyspark.sql.types import *

s = StructType([
    StructField("_id", IntegerType(), True),
    StructField("city", StringType(), True),
    StructField("loc", ArrayType(DoubleType()), True),   # elementType is required
    StructField("pop", LongType(), True),
    StructField("state", StringType(), True)
])
# note: the `element` line in printSchema describes the array's element type;
# it is not a separate top-level field, so it needs no StructField of its own
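A quick sanity check of the fixed field (a minimal sketch; `s` is the schema defined above):
# the loc field now carries a concrete element type
print(s["loc"].dataType.simpleString())   # array<double>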
# "flatten" `lat` and `lon` from `loc` array
filepath= "/FileStore/tables/zips.json"
df2= (spark
.read.format("json").load(filepath, schema=s)
.withColumn('loc', F.array(
F.col('loc')[0].cast('int'),
F.col('loc')[1].cast('int')
))
)
df2.show()
# +---+----+------------+-----+-----+---+---+
# |_id|city| loc| pop|state|lat|lon|
# +---+----+------------+-----+-----+---+---+
# | 1| CC|[77.3, 77.2]|12345| SS| 77| 77|
# +---+----+------------+-----+-----+---+---+
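If you want the array itself to hold integers instead of adding flat columns, a minimal sketch (assuming a Spark version that supports element-wise casts between array types, such as double to int) is to cast the whole column after loading:
# cast every element of `loc` from double to int in one step
df2 = (spark
    .read.format("json").load(filepath, schema=s)
    .withColumn('loc', F.col('loc').cast('array<int>'))
)
df2.show()   # loc becomes [77, 77] for the sample row above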
You missed passing an argument in ArrayType(elementType).
The error says that elementType should be a DataType:
from pyspark.sql.types import *
ArrayType(IntegerType())
Click here to learn more: Documentation
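As a quick check of the constructor (a minimal sketch; containsNull is optional and defaults to True):
from pyspark.sql.types import ArrayType, IntegerType

# elementType is the required first positional argument
arr = ArrayType(IntegerType(), containsNull=True)
print(arr.simpleString())   # array<int>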