Does the computeSVD function in PySpark use MapReduce?

Question

Does computeSVD() use map and reduce internally, given that it is a predefined function?

I have not seen the function's source code.

from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

# Assumes an existing SparkContext named sc (for example, in the pyspark shell).
rows = sc.parallelize([
    Vectors.sparse(5, {1: 1.0, 3: 7.0}),
    Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
    Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
])

mat = RowMatrix(rows)

# Compute the top 5 singular values and corresponding singular vectors.
svd = mat.computeSVD(5, computeU=True)   # <-- the function in question
U = svd.U       # The U factor is a RowMatrix.
s = svd.s       # The singular values are stored in a local dense vector.
V = svd.V       # The V factor is a local dense matrix.

Answer 1

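Yes, it does. computeSVD runs as distributed map- and aggregate-style operations over the RDD of rows, executed by Spark itself rather than by Hadoop MapReduce. From the Spark documentation: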

This page documents sections of the MLlib guide for the RDD-based API (the spark.mllib package). Please see the MLlib Main Guide for the DataFrame-based API (the spark.ml package), which is now the primary API for MLlib.

If you want to look at the source code, it is here: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L328
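
To make the map/reduce idea concrete, here is a rough sketch in plain Python, not Spark's actual code. It assumes the Gramian-based strategy, where the small matrix A^T A is built by summing one outer product per row; the SVD of A can then be derived locally from that matrix. The helper names outer and add are hypothetical, and functools.reduce stands in for the distributed aggregation that Spark would perform across partitions.

```python
from functools import reduce

# The three rows from the question, written out as plain dense lists.
rows = [
    [0.0, 1.0, 0.0, 7.0, 0.0],
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
]

def outer(row):
    """Map step (hypothetical helper): turn one row r into the n x n matrix r^T r."""
    return [[a * b for b in row] for a in row]

def add(m1, m2):
    """Reduce step (hypothetical helper): element-wise sum of two n x n matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

# Locally this is map followed by reduce; on an RDD the same shape would be
# something like rows_rdd.map(outer).reduce(add).
gramian = reduce(add, map(outer, rows))

# The trace of A^T A equals the sum of the squared entries of A,
# which is also the sum of the squared singular values.
trace = sum(gramian[i][i] for i in range(len(gramian)))
squared_entries = sum(x * x for row in rows for x in row)
```

The point of this shape is that each row is processed independently and the partial results are combined with an associative sum, so the work can be spread over partitions regardless of how many rows there are.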

