Unable to Read CSV File with Apache Commons CSV - IllegalArgumentException

Question

I am trying to use Apache Commons CSV to access data from a CSV file (which I downloaded from eBay's MIP server), but I am getting the following error:

java.lang.IllegalArgumentException: Index for header 'Selected Category ID' is 4 but CSVRecord only has 1 values!

I am not sure why, because the file clearly contains this index. My CSV file looks like this:

I am using the following code to access the file:

    CSVParser csvParser = null;

    String selectedCategoryIDFieldName = "Selected Category ID";

    try {
        Reader reader = Files.newBufferedReader(Paths.get(CSVFile));
        csvParser = new CSVParser(reader, CSVFormat.DEFAULT
                .withHeader("SKU", "Locale", "Title", "Channel", selectedCategoryIDFieldName)
                .withIgnoreHeaderCase()
                .withTrim()
                .withSkipHeaderRecord(true));
    } catch (Exception e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }

    if (csvParser != null) {
        List<CSVRecord> csvRecords = csvParser.getRecords();
        for (CSVRecord csvRecord : csvRecords) {
            // Accessing values by the names assigned to each column
            try {
                long currentRecordNumber = csvRecord.getRecordNumber();
                String SKU = csvRecord.get("SKU");
                String categoryID = csvRecord.get(selectedCategoryIDFieldName);
                // ^^ this line throws `IllegalArgumentException`

                System.out.println("Current record number: " + currentRecordNumber);
                System.out.println("SKU -> " + SKU);
                System.out.println("categoryID -> " + categoryID);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

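I searched SO and found a similar question, but it does not apply to my problem, because my file's format is exactly the same before and after I save it (in other words, I do not see any of the formatting problems described in the other user's question).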

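Update: I just found that the error occurs on the second iteration of the for (CSVRecord csvRecord : csvRecords) loop (the file contains only one record). However, I still do not understand why the loop iterates twice if the CSV file contains only one record. And why does the error appear only for the Category ID column and not for the SKU column?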

Answer 1

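There must be some whitespace in record 2. Open the file in Notepad or Notepad++ to check.

I am not familiar with Apache Commons CSV, so this may not be the best solution: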

for (CSVRecord csvRecord : csvRecords) {
    if (csvRecord.size() < csvParser.getHeaderMap().size()) continue;  // <--- add this if condition to skip short records
    // ... existing loop body ...
}
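
The whitespace theory above can also be checked programmatically instead of in an editor. Below is a minimal sketch using only the JDK; the class name `BlankLineFinder` and the method `findWhitespaceOnlyLines` are hypothetical names chosen for illustration. It reports the 1-based numbers of lines that are not empty but contain only whitespace. As far as I can tell, `CSVFormat.DEFAULT` skips truly empty lines, while a line holding only spaces is parsed as a record with a single value, which would match the "only has 1 values" message.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BlankLineFinder {

    /** Returns the 1-based numbers of lines that are non-empty but contain only whitespace. */
    public static List<Integer> findWhitespaceOnlyLines(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // A truly empty line is ignored here; a line of spaces or tabs is reported.
            if (!line.isEmpty() && line.trim().isEmpty()) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java BlankLineFinder <path-to-csv>");
            return;
        }
        // Read the whole file; fine for small feeds, not meant for very large files.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Whitespace-only lines: " + findWhitespaceOnlyLines(lines));
    }
}
```

If this reports a line number just past the last real record, removing that line from the file should make the extra iteration disappear.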

Answer 2

Maybe give univocity-parsers a go, as it handles broken CSV pretty well (including dealing with unexpected spaces here and there) and it is also 3 times faster than commons-csv. It should also make your code cleaner, as you do not have to put try/catch blocks everywhere.

CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically();
settings.setHeaders("SKU", "Locale", "Title", "Channel", selectedCategoryIDFieldName);
// settings.setHeaderExtractionEnabled(true); //use this if the headers are in the input

CsvParser parser = new CsvParser(settings);
List<Record> records = parser.parseAllRecords(new File("/path/to/your.csv"));

Hope this helps.

Disclaimer: I am the author of this library. It is open source and free (Apache 2.0 license).

Answer 3

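If the trailing values of a row are empty, they may be omitted together with their delimiters, which leaves the row with fewer values than the header has columns. This is still a perfectly valid(?) CSV file. To adapt your parser, use the isSet() method: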

row.isSet(column) ? row.get(column) : EMPTY
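
The snippet above relies on `CSVRecord.isSet(String)`, which, as far as I understand it, returns true only when the column name is mapped in the header and its index falls within the values actually present in the record. The library-free sketch below models that check to show why `get("SKU")` (index 0) succeeds on a one-value record while `get("Selected Category ID")` (index 4) fails. `RecordModel` and its methods are hypothetical stand-ins for illustration, not the Commons CSV classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordModel {
    private final Map<String, Integer> headerIndex = new LinkedHashMap<>();
    private final String[] values;

    public RecordModel(String[] header, String[] values) {
        // Map each header name to its column position, as a header map would.
        for (int i = 0; i < header.length; i++) {
            headerIndex.put(header[i], i);
        }
        this.values = values;
    }

    /** True if the column is known and the record is long enough to hold it. */
    public boolean isSet(String column) {
        Integer index = headerIndex.get(column);
        return index != null && index < values.length;
    }

    /** Returns the value for the column, or the fallback if the record is too short. */
    public String getOrDefault(String column, String fallback) {
        return isSet(column) ? values[headerIndex.get(column)] : fallback;
    }

    public static void main(String[] args) {
        String[] header = {"SKU", "Locale", "Title", "Channel", "Selected Category ID"};
        // A record with a single (empty) value, like a whitespace-only line after trimming.
        RecordModel shortRow = new RecordModel(header, new String[] {""});
        System.out.println(shortRow.isSet("SKU"));                                  // true
        System.out.println(shortRow.isSet("Selected Category ID"));                 // false
        System.out.println(shortRow.getOrDefault("Selected Category ID", "EMPTY")); // EMPTY
    }
}
```

With the real library, the same guard is the one-liner shown above, applied to each column that may be missing.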

java

csv

apache

ebay-api