Type conversion error from LabeledPoint in pyspark.mllib, for using linear regression model in pyspark.ml
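I have the following code for linear regression using the pyspark.ml package. However, I get this error message on the last line, when the model is being fitted: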
IllegalArgumentException: u'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.
Does anyone know what is missing?
Is there a replacement in pyspark.ml for LabeledPoint from pyspark.mllib?
from pyspark import SparkContext
from pyspark.ml.regression import LinearRegression
from pyspark.mllib.regression import LabeledPoint
import numpy as np
from pandas import *
data = sc.textFile("/FileStore/tables/w7baik1x1487076820914/randomTableSmall.csv")
def parsePoint(line):
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[1], [values[0]])
points_df = data.map(parsePoint).toDF()
lr = LinearRegression()
model = lr.fit(points_df, {lr.regParam:0.0})
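The problem is that newer versions of Spark (2.0 and later) have their own Vector class in the pyspark.ml.linalg module, so you don't need to get it from mllib.linalg. In addition, pyspark.ml estimators in these versions do not accept columns of type spark.mllib.linalg.VectorUDT. Here is code that works for you: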
from pyspark import SparkContext
from pyspark.ml.regression import LinearRegression
from pyspark.ml.linalg import Vectors
import numpy as np
data = sc.textFile("/FileStore/tables/w7baik1x1487076820914/randomTableSmall.csv")
def parsePoint(line):
    values = [float(x) for x in line.split(',')]
    return (values[1], Vectors.dense([values[0]]))
points_df = data.map(parsePoint).toDF(['label','features'])
lr = LinearRegression()
model = lr.fit(points_df)
Newer versions of Spark do not accept spark.mllib.linalg.VectorUDT in pyspark.ml (and you don't need to get vectors from mllib.linalg).
Try replacing
from pyspark.mllib.regression import LabeledPoint
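with:
from pyspark.ml.linalg import Vectors
and then build (label, Vectors.dense(...)) tuples in parsePoint instead of LabeledPoint objects, as in the code above.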