How to add columns to a Dataset one after another using Spark within a for loop (where the for loop contains the column names)
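Here I am trying to add columns to a Dataset<Row> one after another. The problem is that only the last column is visible; the columns added earlier do not persist.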
private static void populate(Dataset<Row> res, String[] args)
{
    String[] propArr = args[0].split(","); // Eg: [abc, def, ghi]
    // Dataset<Row> addColToMergedData = null;
    /** Here each element is the name of the column to be inserted */
    for(int i = 0; i < propArr.length; i++){
        // addColToMergedData = res.withColumn(propArr[i], lit(null));
    }
}
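The logic in the for loop is flawed, which is why the problem occurs. A Dataset is immutable, so withColumn returns a new Dataset instead of modifying res. Calling it on the original res in every iteration discards the column added in the previous iteration, and only the last one survives.
You can modify the program as follows: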
private static void populate(Dataset<Row> res, String[] args)
{
    String[] propArr = args[0].split(","); // Eg: [abc, def, ghi]
    Dataset<Row> addColToMergedData = null;
    /** Here each element is the name of the column to be inserted */
    for(int i = 0; i < propArr.length; i++)
    {
        // Reassign res so each new column is added on top of the previous result
        res = res.withColumn(propArr[i], lit(null));
    }
    addColToMergedData = res; // now holds all of the added columns
}
Solution:
// addColToMergedData = res.withColumn(colMap.get(propArr[i]), lit(null));
should be written as:
res = res.withColumn(colMap.get(propArr[i]), lit(null));
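To see why the reassignment matters without needing a Spark session, here is a minimal plain-Java sketch. `FakeDataset` is a hypothetical stand-in written for this illustration, not a Spark class; it only mimics the one property that matters here, namely that `withColumn` returns a new object and leaves the receiver unchanged. The method names `addColumnsBuggy` and `addColumnsFixed` are likewise made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class WithColumnLoopDemo {

    /** Hypothetical stand-in for an immutable Dataset<Row>; NOT a Spark class. */
    static final class FakeDataset {
        private final List<String> columns;

        FakeDataset(List<String> columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        }

        /** Mimics withColumn: returns a NEW object and leaves this one unchanged. */
        FakeDataset withColumn(String name) {
            List<String> next = new ArrayList<>(columns);
            next.add(name);
            return new FakeDataset(next);
        }

        List<String> columns() {
            return columns;
        }
    }

    /** Buggy pattern: every iteration starts again from the original input. */
    static FakeDataset addColumnsBuggy(FakeDataset res, String[] names) {
        FakeDataset out = res;
        for (String name : names) {
            out = res.withColumn(name); // the previous result is discarded
        }
        return out;
    }

    /** Fixed pattern: each iteration builds on the previous result. */
    static FakeDataset addColumnsFixed(FakeDataset res, String[] names) {
        for (String name : names) {
            res = res.withColumn(name);
        }
        return res;
    }

    public static void main(String[] args) {
        FakeDataset base = new FakeDataset(Arrays.asList("id"));
        String[] propArr = "abc,def,ghi".split(",");
        System.out.println(addColumnsBuggy(base, propArr).columns()); // [id, ghi]
        System.out.println(addColumnsFixed(base, propArr).columns()); // [id, abc, def, ghi]
    }
}
```

The fixed variant also returns the result. In Java, reassigning the parameter `res` inside a method is not visible to the caller, so with a real Dataset<Row> the accumulated Dataset would likewise need to be returned or otherwise used inside the method.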