Is there in PySpark a parameter equivalent to scikit-learn's sample_weight?
I am currently using the SGDClassifier provided by the scikit-learn library. When I call its fit method, I can set the sample_weight parameter:

Weights applied to individual samples. If not provided, uniform weights are assumed. These weights will be multiplied with class_weight (passed through the constructor) if class_weight is specified.
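For reference, here is a minimal sketch of what I currently do (the data below is made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 0, 1, 0])
# One weight per training sample; heavier samples count more in the loss.
weights = np.array([1.0, 2.0, 0.5, 1.0])

clf = SGDClassifier(max_iter=1000)
clf.fit(X, y, sample_weight=weights)
```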
I want to switch to PySpark and use the LogisticRegression class, but I cannot find any parameter analogous to sample_weight. There is a weightCol parameter, but I think it does something different.

Do you have any suggestions?
There is a weightCol parameter but I think it does something different.
On the contrary, weightCol in Spark ML does exactly that; from the docs (emphasis added):

weightCol = Param(parent='undefined', name='weightCol', doc='weight column name. If this is not set or empty, we treat all instance weights as 1.0.')
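In other words, a column of per-row instance weights passed via weightCol plays the same role as scikit-learn's sample_weight. A minimal sketch of the idea (the DataFrame and column names here are just illustrative assumptions):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# Toy data: each row carries its own instance weight in the "weight" column.
df = spark.createDataFrame(
    [
        (Vectors.dense([0.0, 1.0]), 1.0, 1.0),
        (Vectors.dense([1.0, 0.0]), 0.0, 2.0),
        (Vectors.dense([1.0, 1.0]), 1.0, 0.5),
        (Vectors.dense([0.0, 0.0]), 0.0, 1.0),
    ],
    ["features", "label", "weight"],
)

# weightCol is the analogue of sample_weight: rows with larger weights
# contribute proportionally more to the training objective.
lr = LogisticRegression(featuresCol="features", labelCol="label", weightCol="weight")
model = lr.fit(df)
```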