Is there in PySpark a parameter equivalent to scikit-learn's sample_weight?
I am currently using the SGDClassifier provided by the scikit-learn library. When I call its fit method, I can set the sample_weight parameter:

Weights applied to individual samples. If not provided, uniform weights are assumed. These weights will be multiplied with class_weight (passed through the constructor) if class_weight is specified.
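For reference, here is a minimal sketch of what I currently do (the data below is made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1, 0, 1, 0])
# One weight per training sample; heavier samples count more in the loss.
weights = np.array([1.0, 2.0, 0.5, 1.0])

clf = SGDClassifier(max_iter=1000)
clf.fit(X, y, sample_weight=weights)
```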
I want to switch to PySpark and use the LogisticRegression class, but I cannot find any parameter analogous to sample_weight. There is a weightCol parameter, but I think it does something different.

Do you have any suggestions?
There is a weightCol parameter but I think it does something different.
On the contrary, weightCol in Spark ML does exactly that; from the docs (emphasis added):

weightCol = Param(parent='undefined', name='weightCol', doc='weight column name. If this is not set or empty, we treat all instance weights as 1.0.')
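In other words, a column of per-row instance weights passed via weightCol plays the same role as scikit-learn's sample_weight. A minimal sketch of the idea (the DataFrame and column names here are just illustrative assumptions):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# Toy data: each row carries its own instance weight in the "weight" column.
df = spark.createDataFrame(
    [
        (Vectors.dense([0.0, 1.0]), 1.0, 1.0),
        (Vectors.dense([1.0, 0.0]), 0.0, 2.0),
        (Vectors.dense([1.0, 1.0]), 1.0, 0.5),
        (Vectors.dense([0.0, 0.0]), 0.0, 1.0),
    ],
    ["features", "label", "weight"],
)

# weightCol is the analogue of sample_weight: rows with larger weights
# contribute proportionally more to the training objective.
lr = LogisticRegression(featuresCol="features", labelCol="label", weightCol="weight")
model = lr.fit(df)
```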