How to add columns one after another to a Dataset in Spark within a for loop (where the for loop iterates over the column names)

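Here I am trying to add columns one after another to a Dataset<Row>. The problem is that only the last column is visible; the columns added earlier do not persist.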

private static void populate(Dataset<Row> res, String[] args)
{
    String[] propArr = args[0].split(",");   // Eg: [abc, def, ghi]

    // Dataset<Row> addColToMergedData = null;

    /** Here each element is the name of the column to be inserted */
    for (int i = 0; i < propArr.length; i++) {

        // addColToMergedData = res.withColumn(propArr[i], lit(null));
    }
}

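The logic inside the for loop is flawed, which is what causes the problem: every iteration calls withColumn on the original res and overwrites addColToMergedData, so only the column added last survives. You can modify the program as follows: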

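private static void populate(Dataset<Row> res, String[] args)
{
    String[] propArr = args[0].split(",");   // Eg: [abc, def, ghi]

    /** Here each element is the name of the column to be inserted */
    for (int i = 0; i < propArr.length; i++)
    {
        // Reassign res so each new column is added on top of the previous result
        res = res.withColumn(propArr[i], lit(null));
    }

    // res now holds every added column; use or return it from here,
    // because reassigning the parameter does not change the caller's variable
    Dataset<Row> addColToMergedData = res;
}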

So the line:

// addColToMergedData = res.withColumn(colMap.get(propArr[i]), lit(null));

should be written as: res = res.withColumn(colMap.get(propArr[i]), lit(null));
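
The reassignment matters because withColumn does not modify the Dataset it is called on; it returns a new one. Below is a minimal plain-Java sketch of that pattern, not Spark code. Table is a hypothetical immutable stand-in for Dataset<Row>, and addColumnsBuggy and addColumnsFixed are illustrative names that do not appear in the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch; Table is a hypothetical stand-in for Dataset<Row>, not a Spark class.
class WithColumnDemo {

    /** Immutable table: withColumn returns a new Table and leaves this one unchanged. */
    static final class Table {
        private final List<String> columns;

        Table(List<String> columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        }

        Table withColumn(String name) {
            List<String> copy = new ArrayList<>(columns);
            copy.add(name);
            return new Table(copy);
        }

        List<String> columns() {
            return columns;
        }
    }

    /** Mirrors the question: every call starts from the original table again. */
    static Table addColumnsBuggy(Table res, String[] names) {
        Table result = res;
        for (String name : names) {
            result = res.withColumn(name); // always derived from res, so earlier columns are lost
        }
        return result;
    }

    /** Mirrors the answer: each call builds on the result of the previous one. */
    static Table addColumnsFixed(Table res, String[] names) {
        for (String name : names) {
            res = res.withColumn(name);
        }
        return res;
    }

    public static void main(String[] args) {
        Table base = new Table(Arrays.asList("id"));
        String[] names = "abc,def,ghi".split(",");

        System.out.println(addColumnsBuggy(base, names).columns()); // [id, ghi]
        System.out.println(addColumnsFixed(base, names).columns()); // [id, abc, def, ghi]
        System.out.println(base.columns());                         // [id]
    }
}
```

The buggy variant keeps only the last column because each call starts again from the untouched original, which matches the symptom described in the question. Having the method return the accumulated table, as this sketch does, is also one way to hand the result back to the caller.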