Proper way to customize a Spark ML estimator (e.g. GaussianMixture) by modifying its private method?

My code uses org.apache.spark.ml.clustering.GaussianMixture, but its initialization method, private def initRandom(...), does not work well for my data, so I want to supply my own init method.

At first I wanted to extend class GaussianMixture, but initRandom is a private method, so a subclass cannot override it.

Then I tried another approach, setting an initial GMM, but unfortunately the source code only says: TODO: SPARK-15785 Support users supplied initial GMM.

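I also tried copying the code of class GaussianMixture into my own custom class, but too much comes along with it. GaussianMixture.scala contains several classes and traits, and some of them are only accessible inside the ML package.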

I solved it myself. Here is my solution.

I created class CustomGaussianMixture, which extends GaussianMixture from the official package org.apache.spark.ml.clustering.

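In my own project I created a new package that is also named org.apache.spark.ml.clustering, and put my custom class in it. This avoids the scoping problems with the classes, traits and objects in org.apache.spark.ml.clustering that are only visible inside that package.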
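To illustrate why the same-package trick works, here is a minimal pure-Scala sketch that needs no Spark. Every name in it (demo.ml.clustering, Mixture, CustomMixture, normalize) is a hypothetical stand-in: Mixture plays the role of the library class, and normalize plays the role of a package-private helper. It only demonstrates the scoping rule and does not reproduce Spark's code.

```scala
// "Library" side: stand-in for org.apache.spark.ml.clustering (hypothetical names).
package demo.ml.clustering {
  class Mixture {
    // Plain private: invisible to subclasses, so it cannot be overridden.
    private def initRandom(k: Int): Array[Double] = Array.fill(k)(1.0)

    // Qualified private: callable only from code located inside package demo.ml.
    private[ml] def normalize(w: Array[Double]): Array[Double] = {
      val total = w.sum
      w.map(_ / total)
    }

    // Public entry point that wires the private initializer in.
    def fit(k: Int): Array[Double] = normalize(initRandom(k))
  }
}

// "User project" side: declaring the same package name puts this class
// inside demo.ml, so the private[ml] helper is accessible here.
package demo.ml.clustering {
  class CustomMixture extends Mixture {
    // Custom initializer used instead of the inaccessible initRandom.
    private def initCustom(k: Int): Array[Double] =
      Array.tabulate(k)(i => i + 1.0)

    // fit is public, so it can be overridden to call the custom initializer.
    override def fit(k: Int): Array[Double] = normalize(initCustom(k))
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val weights = new demo.ml.clustering.CustomMixture().fit(3)
    println(weights.mkString(", "))
  }
}
```

If CustomMixture were declared in any package outside demo.ml, the call to normalize would be rejected by the compiler, which is the same kind of error you hit when copying GaussianMixture.scala into an unrelated package.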

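The next step is to override fit, the method that calls initRandom. fit is not private, so I can override it. Concretely: write the new initialization method in class CustomGaussianMixture, copy the body of fit from the official source code in GaussianMixture.scala into class CustomGaussianMixture, and remember to change the code in CustomGaussianMixture.fit() so that it calls the custom init method instead of initRandom.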
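The layout described above might look roughly like the skeleton below. It is a sketch under assumptions, not a drop-in implementation. The name initCustom is hypothetical, and its signature and return shape (weights plus one (mean, packed upper-triangular covariance) pair per cluster) are assumed to mirror the private initRandom of Spark 2.2 to 2.4; check your own Spark version, since in later releases the signature differs. The first-k-rows, identity-covariance initialization is only an illustrative placeholder. The body of fit is deliberately not reproduced here because it must be copied from the GaussianMixture.scala of the Spark version you build against.

```scala
// Put this file in your own project, e.g. under
// src/main/scala/org/apache/spark/ml/clustering/CustomGaussianMixture.scala
package org.apache.spark.ml.clustering

import org.apache.spark.ml.linalg.{DenseVector, Vector}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset

class CustomGaussianMixture(override val uid: String) extends GaussianMixture(uid) {

  // Keep the (uid: String) constructor as well: Spark's default param copy
  // instantiates the estimator reflectively through it.
  def this() = this(Identifiable.randomUID("CustomGaussianMixture"))

  // Hypothetical replacement for initRandom.
  // ASSUMPTION: same parameters and return shape as initRandom in Spark 2.2-2.4.
  protected def initCustom(
      instances: RDD[Vector],
      numClusters: Int,
      numFeatures: Int): (Array[Double], Array[(DenseVector, DenseVector)]) = {
    // Equal mixing weights for every cluster.
    val weights = Array.fill(numClusters)(1.0 / numClusters)
    // Illustrative choice only: the first k rows become the initial means.
    // Assumes the data set has at least numClusters rows.
    val gaussians = instances.take(numClusters).map { mean =>
      // Identity covariance, stored as a packed upper triangle (column major):
      // the diagonal entry of column j sits at index j * (j + 1) / 2 + j.
      val cov = new Array[Double](numFeatures * (numFeatures + 1) / 2)
      for (j <- 0 until numFeatures) cov(j * (j + 1) / 2 + j) = 1.0
      (mean.toDense, new DenseVector(cov))
    }
    (weights, gaussians)
  }

  override def fit(dataset: Dataset[_]): GaussianMixtureModel = {
    // Paste the body of GaussianMixture.fit from your Spark version here and
    // replace the single call to initRandom(...) with initCustom(...).
    throw new UnsupportedOperationException(
      "Copy the body of GaussianMixture.fit here and call initCustom instead of initRandom")
  }
}
```

Because the class lives in org.apache.spark.ml.clustering, the pasted fit body can keep using the package-private helpers it relies on without further changes. The price is that the copy is tied to one Spark version and has to be refreshed on upgrade.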

Finally, use CustomGaussianMixture in place of GaussianMixture wherever it is needed.

extends

scala

apache-spark

apache-spark-ml