What are the mapper and reducer used in the computeSVD() function?

Question

I am new to MapReduce and want to do some research on computing the SVD with MapReduce.

Code side: I found computeSVD, a PySpark function that uses MapReduce.
Theory side: what are the mapper and reducer used in the computeSVD() function?

My code

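import findspark
import numpy as np

# Raw string so the backslashes in the Windows path are not read as escapes.
# findspark.init() must run before the pyspark imports below.
findspark.init(r'C:\spark\spark-3.0.3-bin-hadoop2.7')

from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import RowMatrix

conf = SparkConf()
conf.setMaster("local[*]")
conf.setAppName('firstapp')

sc = SparkContext(conf=conf)
spark = SparkSession(sc)
rows = np.loadtxt('data.txt', dtype=float)  # data.txt is a (m rows x n cols) matrix, m > n
rows = sc.parallelize(rows)  # distribute the rows as an RDD
mat = RowMatrix(rows)
svd = mat.computeSVD(5, computeU=True)  # top 5 singular values/vectors, also compute U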

Any help would be greatly appreciated.

Answer 1

In this function you can see mapPartitions and reduceByKey being called on RDD objects. They perform MapReduce-like operations, but they are not the same library as Hadoop MapReduce.
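
To make "mapper" and "reducer" more concrete, here is a rough sketch of one common map/reduce formulation for the SVD of a tall matrix A (m > n): map each row a_i to its n x n outer product a_i a_i^T, then reduce by element-wise addition to get the Gram matrix A^T A. As far as I know, Spark's RowMatrix also builds a Gramian in a distributed way for some of its SVD modes, but the code below is plain Python and not Spark's actual implementation. The names mapper, reducer, and gramian are made up for illustration.

```python
from functools import reduce


def mapper(row):
    """Map one matrix row a_i to its n x n outer product a_i * a_i^T."""
    return [[x * y for y in row] for x in row]


def reducer(left, right):
    """Combine two n x n partial sums by element-wise addition."""
    return [
        [l + r for l, r in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]


def gramian(rows):
    """Compute A^T A as a reduce (sum) over a map (per-row outer product)."""
    return reduce(reducer, map(mapper, rows))


# Small 3 x 2 example matrix (illustrative data, not from the question).
A = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

G = gramian(A)
print(G)  # [[35.0, 44.0], [44.0, 56.0]]
```

Since A = U Σ V^T implies A^T A = V Σ^2 V^T, the eigenvalues of this small n x n matrix are the squared singular values of A and its eigenvectors are the columns of V. That decomposition can be done locally on the driver, and U can then be recovered as A V Σ^-1 with one more map over the rows. On an RDD the same shape would be rows.map(mapper).reduce(reducer).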

svd

apache-spark