Skip double quotes when reading a CSV file using Apache Commons CSV

Reader in = new FileReader(dataFile);
Iterable<CSVRecord> records = CSVFormat.RFC4180.withFirstRecordAsHeader().withIgnoreEmptyLines(true).withTrim().parse(in);

// Read each data row of the CSV file until the last row is reached
for (CSVRecord record : records) {
    String column1 = record.get("column1");
}

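Here the column1 value in the CSV file looks like "1234557, so when I read that column the value comes back with the quote at the beginning. Is there any way in Apache Commons CSV to skip these quotes?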

Sample data from the CSV file: """0996108562","""204979956"

I cannot reproduce this with the following MCVE (Minimal, Complete, and Verifiable Example) using commons-csv-1.4.jar:

String input = "column1,column2\r\n" +
               "1,Foo\r\n" +
               "\"2\",\"Bar\"\r\n";
CSVFormat csvFormat = CSVFormat.RFC4180.withFirstRecordAsHeader()
                                       .withIgnoreEmptyLines(true)
                                       .withTrim();
try (CSVParser records = csvFormat.parse(new StringReader(input))) {
    for (CSVRecord record : records) {
        String column1 = record.get("column1");
        String column2 = record.get("column2");
        System.out.println(column1 + ": "+ column2);
    }
}

Output:

1: Foo
2: Bar

The quotes around "2" and "Bar" have been removed.

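The MCVE above only covers fields that are simply wrapped in quotes. The sample row in the question is different: under RFC 4180 a doubled quote inside a quoted field is an escaped literal quote, so """0996108562" would decode to "0996108562, and the leading quote is part of the data rather than a parser leftover. If that is what is happening, one option is to clean the value after reading it. A minimal sketch using only the JDK; QuoteCleanup and stripQuotes are hypothetical names, not part of Commons CSV:

```java
final class QuoteCleanup {
    private QuoteCleanup() {}

    // Hypothetical helper: removes every literal double quote from a parsed field value.
    static String stripQuotes(String value) {
        return value == null ? null : value.replace("\"", "");
    }

    public static void main(String[] args) {
        // The value as the question describes it after parsing: a stray leading quote.
        String parsed = "\"0996108562";
        System.out.println(stripQuotes(parsed)); // prints 0996108562
    }
}
```

In the question's loop this would be applied as stripQuotes(record.get("column1")). It removes every double quote, so it only suits columns, such as the digit strings in the sample, where a quote is never legitimate data.
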
If I understand your requirement correctly, you need unescapeCsv from Apache's StringEscapeUtils. As the documentation says:

If the value is enclosed in double quotes, and contains a comma, newline or double quote, then quotes are removed.

Any double quote escaped characters (a pair of double quotes) are unescaped to just one double quote.

If the value is not enclosed in double quotes, or is and does not contain a comma, newline or double quote, then the String value is returned unchanged.
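
To make the three quoted rules concrete, here is a JDK-only sketch that mimics them. It is not the library's code: CsvUnescapeSketch and unescapeCsvSketch are hypothetical names, and treating both CR and LF as "newline" is an assumption.

```java
final class CsvUnescapeSketch {
    private CsvUnescapeSketch() {}

    // Hypothetical helper: applies the three documented rules to one raw CSV field.
    static String unescapeCsvSketch(String raw) {
        if (raw == null || raw.length() < 2) {
            return raw;
        }
        boolean enclosed = raw.charAt(0) == '"' && raw.charAt(raw.length() - 1) == '"';
        if (!enclosed) {
            return raw; // rule 3: not enclosed in double quotes
        }
        String inner = raw.substring(1, raw.length() - 1);
        // Assumption: CR and LF both count as "newline".
        boolean neededQuoting = inner.indexOf(',') >= 0
                || inner.indexOf('\n') >= 0
                || inner.indexOf('\r') >= 0
                || inner.indexOf('"') >= 0;
        if (!neededQuoting) {
            return raw; // rule 3: enclosed, but nothing inside required quoting
        }
        // rules 1 and 2: drop the enclosing quotes and collapse "" into "
        return inner.replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unescapeCsvSketch("\"\"\"0996108562\"")); // raw field from the question
        System.out.println(unescapeCsvSketch("\"0996108562"));       // value already parsed by Commons CSV
        System.out.println(unescapeCsvSketch("\"a,b\""));            // enclosed field containing a comma
    }
}
```

Under these rules a value that starts with a quote but does not end with one, such as the already parsed "0996108562, is not "enclosed in double quotes" and comes back unchanged. So unescaping would have to be applied to the raw field text, and even then the escaped leading quote survives as a single literal quote.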