Can not infer schema for type: <type 'unicode'> when converting RDD to DataFrame
When I try to convert an RDD to a DataFrame in Spark, I get the following exception: "Can not infer schema for type: <type 'unicode'>"
Example:
>> rangeRDD.take(1).foreach(println)
(301,301,10)
>> sqlContext.inferSchema(rangeRDD)
Can not infer schema for type: <type 'unicode'>
Any pointers on how to fix it? I even tried supplying the schema myself through sqlContext.createDataFrame(rdd, schema):
from pyspark.sql.types import StructType, StructField, IntegerType

schema = StructType([
    StructField("x", IntegerType(), True),
    StructField("y", IntegerType(), True),
    StructField("z", IntegerType(), True)])
df = sqlContext.createDataFrame(rangeRDD, schema)
print df.first()
But that ends in the runtime error 'ValueError: Unexpected tuple u'(301,301,10)' with StructType'
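Each element of rangeRDD is a unicode string that only looks like a tuple, so Spark has no fields to match against the schema. Try parsing the data first: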
>>> rangeRDD = sc.parallelize([ u'(301,301,10)'])
>>> tupleRangeRDD = rangeRDD.map(lambda x: x[1:-1]) \
... .map(lambda x: x.split(",")) \
... .map(lambda x: [int(y) for y in x])
>>> df = sqlContext.createDataFrame(tupleRangeRDD, schema)
>>> df.first()
Row(x=301, y=301, z=10)
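If the strings are not always this clean (stray whitespace, a different number of fields), the parsing step can be pulled out into a small function and checked without a Spark cluster. This is only a sketch: the name parse_triple is made up for illustration, and it assumes every record is the text of a tuple of three integers.

```python
import ast

def parse_triple(text):
    """Turn a string such as u'(301,301,10)' into a tuple of three ints.

    parse_triple is a hypothetical helper, not part of Spark.
    """
    # literal_eval only evaluates Python literals, so it is safe on untrusted text
    value = ast.literal_eval(text.strip())
    if not isinstance(value, tuple) or len(value) != 3:
        raise ValueError("expected a 3-tuple, got %r" % (text,))
    return tuple(int(v) for v in value)

print(parse_triple(u"(301,301,10)"))
```

With that in place, rangeRDD.map(parse_triple) could stand in for the three chained map calls above before the result is passed to sqlContext.createDataFrame(..., schema).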