Converting .txt Spark Output to .csv

Currently, I am getting the output of a Spark job in a .txt file. I am trying to convert it to .csv.

.txt output (Dataset<String>)

John MIT Bachelor ComputerScience Mike UB Master ComputerScience

.csv output

NAME, UNIV, DEGREE, COURSE
   John,MIT,Bachelor,ComputerScience
   Mike,UB,Master,ComputerScience

I tried collecting it into a list, but I am not sure how to convert it to .csv and add the header.

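Here is a simple way to convert the txt output data into a data structure that can easily be written to a csv file.

The basic idea is to use a data structure together with the number of headers / columns in order to parse the set of entries out of the one-line txt output: every group of as many values as there are columns becomes one csv line.

Have a look at the code comments. Each "TODO 4 U" marks something you have to do yourself, mainly because I really cannot guess what you need at those places in the code (such as how you obtain the headers).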

This is just a main method that does its work in a straightforward way. You may want to understand what it does and apply changes that make the code meet your requirements. Input and output are just Strings that you have to create, receive or process yourself.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TxtToCsv {

    public static void main(String[] args) {

        // TODO 4 U: get the values for the header somehow
        String headerLine = "NAME, UNIV, DEGREE, COURSE";

        // TODO 4 U: read the txt output
        String txtOutput = "John MIT Bachelor ComputerScience Mike UB Master ComputerScience";

        /*
         * then split the header line
         * (or do anything similar, I don't know where your header comes from)
         */
        String[] headers = headerLine.split(", ");

        // store the amount of headers, which is the amount of columns
        int amountOfColumns = headers.length;

        // split txt output data by space
        String[] data = txtOutput.split(" ");

        /*
         * declare a data structure that stores lists of Strings,
         * each one is representing a line of the csv file;
         * a TreeMap keeps the lines sorted by their line number
         */
        Map<Integer, List<String>> linesForCsv = new TreeMap<>();

        // create a list of Strings containing the headers and put it into the data structure
        List<String> columnHeaders = Arrays.asList(headers);
        linesForCsv.put(0, columnHeaders);

        // declare a line counter for the csv file (line 0 is the header row)
        int currentLine = 0;
        // go through the txt output data in order to get the lines for the csv file
        for (int i = 0; i < data.length; i++) {
            // check if there is a new line to be created
            if (i % amountOfColumns == 0) {
                /*
                 * every time the amount of headers is reached,
                 * create a new list for a new line in the csv file
                 */
                currentLine++; // increment even at i == 0 because the header row is at 0
                linesForCsv.put(currentLine, new ArrayList<>()); // create a new line-list
            }
            // store the data in the current line-list
            linesForCsv.get(currentLine).add(data[i]);
        }

        // print the lines stored in the map
        // TODO 4 U: write this to a csv file instead of just printing it to the console
        linesForCsv.forEach((lineNumber, line) -> {
            System.out.println("Line " + lineNumber + ": " + String.join(",", line));
        });
    }
}
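
For the last "TODO 4 U", one possible way to write the lines to a file instead of printing them is sketched below. It is only a sketch under a few assumptions: the class name TxtToCsvFile, the helper toCsvLines and the file names spark-output.txt and spark-output.csv are made up for illustration, the header is still hard-coded, and the values are assumed to contain neither spaces nor commas (no csv quoting or escaping is done).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of Spark or any library.
final class TxtToCsvFile {

    /**
     * Builds the csv lines: the header first, then one line per group of
     * as many values as there are columns.
     * Assumption: no value contains whitespace or a comma.
     */
    static List<String> toCsvLines(String headerLine, String txtOutput) {
        // split the header on commas, tolerating optional spaces after them
        String[] headers = headerLine.split(",\\s*");
        int amountOfColumns = headers.length;

        // split the one-line txt output on any run of whitespace
        String trimmed = txtOutput.trim();
        String[] data = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");

        List<String> csvLines = new ArrayList<>();
        csvLines.add(String.join(",", headers));

        // take amountOfColumns values at a time; a shorter last group is kept as-is
        for (int i = 0; i < data.length; i += amountOfColumns) {
            int end = Math.min(i + amountOfColumns, data.length);
            csvLines.add(String.join(",", Arrays.copyOfRange(data, i, end)));
        }
        return csvLines;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical file names, replace them with your own paths
        Path input = Paths.get("spark-output.txt");
        Path output = Paths.get("spark-output.csv");

        // read the whole txt output as one String
        String txtOutput = new String(Files.readAllBytes(input), StandardCharsets.UTF_8);

        // Files.write writes each element as one line and creates or overwrites the file
        Files.write(output,
                toCsvLines("NAME, UNIV, DEGREE, COURSE", txtOutput),
                StandardCharsets.UTF_8);
    }
}
```

If your values can contain spaces, commas or quotes, splitting on whitespace will no longer be enough and a proper csv library is probably the safer choice.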