Start zipWithIndex index at a specific value in PySpark

Question

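I want the index values in my DataFrame to start at a specific value instead of the default of zero. Is there a parameter we can pass to zipWithIndex() in PySpark to do this?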

Answer 1

The following solution makes the zipWithIndex index start at a specific value by adding an offset to each generated index.

df = df_child.rdd.zipWithIndex().map(lambda x: (x[0], x[1] + index)).toDF()

Here, index is the number you want the zipWithIndex numbering to start from.
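As far as I know, zipWithIndex() takes no start argument, so shifting the result is the usual workaround. One thing to watch: calling toDF() directly on the (row, index) pairs keeps the original row as a single nested struct column. Below is a minimal sketch that flattens the row and names the new column. The helper name add_index_from and the parameters start and col_name are my own, purely for illustration, and it assumes toDF() can infer the column types from the data.

```python
def add_index_from(df, start, col_name="index"):
    """Append a consecutive index column that begins at `start`.

    `add_index_from`, `start` and `col_name` are hypothetical names
    used for illustration; they are not part of the PySpark API.
    """
    # Keep the original column names and add one for the index.
    names = df.columns + [col_name]
    return (
        df.rdd.zipWithIndex()  # pairs of (Row, index), index starts at 0
        # Flatten the Row into a plain tuple and shift the index by `start`.
        .map(lambda pair: tuple(pair[0]) + (pair[1] + start,))
        .toDF(names)
    )


# Example usage (assumes an active SparkSession named `spark`):
# df_child = spark.createDataFrame([("a",), ("b",)], ["name"])
# add_index_from(df_child, 100).show()
```

Since the column types are inferred again from the tuples, this is probably fine for simple columns; for anything more complex, building the DataFrame with an explicit schema would likely be safer.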
