opencsv: how to parse data with double quotes inside cells?

I'm trying to parse some public data using opencsv (version 3.10). Below is a snippet that fetches the CSV and maps the records to a list of POJOs:

URL permitsURL = new URL("http://assessor.boco.solutions/ASR_PublicDataFiles/Permits.csv");
InputStream permitInputStream = permitsURL.openStream();
Reader permitStreamReader = new InputStreamReader(permitInputStream);

CsvToBean<PermitRecord> csvToBean = new CsvToBean<PermitRecord>();

Map<String, String> columnMapping = new HashMap<String, String>();
columnMapping.put("strap", "strap");
columnMapping.put("issued_by", "issuedBy");
columnMapping.put("permit_num", "permitNum");
columnMapping.put("permit_category", "permitCategory");
columnMapping.put("issue_dt", "issueDt");
columnMapping.put("estimated_value", "estimatedValue");
columnMapping.put("description", "description");

HeaderColumnNameTranslateMappingStrategy<PermitRecord> strategy = new HeaderColumnNameTranslateMappingStrategy<PermitRecord>();
strategy.setType(PermitRecord.class);
strategy.setColumnMapping(columnMapping);

List<PermitRecord> permitRecordList = null;

CSVReader csvReader = new CSVReader(permitStreamReader);
permitRecordList = csvToBean.parse(strategy, csvReader);

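The parsed list contains fewer records than the CSV file. Looking at the data, I noticed that cell values sometimes contain double quotes. Here's an example: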

"R0601364                 ","LAFAYETTE","14-0486","DECK","4/29/2014 12:00:00 AM","3834","deck under 36\"""
"R0601365                 ","LAFAYETTE","13-0570","NEW CONSTRUCTION","5/22/2013 12:00:00 AM","121899","SIN FAMILY HOME PLN CUSTOM FIN BASEMENT"

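The value deck under 36\" causes the following record to be swallowed into the description field. This is even more obvious when inspecting the parsed records in the IDE.

Can you see what I'm doing wrong? I suspect there's a simple fix, since Excel parses the file correctly and opencsv seems to be the de facto standard for CSV parsing in Java.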
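A likely cause, illustrated with a hypothetical `QuoteDemo` class (a simplified plain-Java splitter, not opencsv's actual code): opencsv's `CSVParser` uses backslash as its default escape character. Read that way, `\"` in the last field becomes a literal quote, the following `""` becomes another literal quote, and the quoted field never closes, so the reader keeps consuming the next line. Under RFC 4180 rules, which have no backslash escape and only treat doubled quotes as literal quotes, the same text is the complete value `deck under 36\"`. The sketch below contrasts the two readings; it returns `null` when a quoted field is still open at the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny single-line CSV splitter, not opencsv or univocity code.
public class QuoteDemo {

    /**
     * Splits one CSV line. Inside quotes, "" always means a literal quote.
     * If backslashEscapes is true, \" and \\ also mean a literal quote/backslash,
     * roughly mimicking a parser whose escape character is '\'.
     * Returns null if a quoted field is still open when the line ends
     * (a real multi-line-aware reader would then append the next line).
     */
    public static List<String> split(String line, boolean backslashEscapes) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (inQuotes) {
                boolean hasNext = i + 1 < line.length();
                if (backslashEscapes && c == '\\' && hasNext
                        && (line.charAt(i + 1) == '"' || line.charAt(i + 1) == '\\')) {
                    cur.append(line.charAt(i + 1)); // escaped char taken literally
                    i += 2;
                    continue;
                }
                if (c == '"') {
                    if (hasNext && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote -> one literal quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString()); // field separator
                cur.setLength(0);
            } else {
                cur.append(c);
            }
            i++;
        }
        if (inQuotes) {
            return null; // unterminated quoted field
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"R0601364                 \",\"LAFAYETTE\",\"14-0486\",\"DECK\","
                + "\"4/29/2014 12:00:00 AM\",\"3834\",\"deck under 36\\\"\"\"";
        System.out.println(split(line, false)); // RFC 4180 reading: 7 fields
        System.out.println(split(line, true));  // backslash-escape reading: field left open
    }
}
```

If you stay with opencsv 3.x, a commonly used workaround is to give the reader an escape character that never occurs in the data, for example `new CSVReader(permitStreamReader, ',', '"', CSVParser.NULL_CHARACTER)`; this has not been checked against the full Permits.csv file.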

The Univocity CSV parsers work really well, and mapping CSV columns to POJO properties is a breeze.

I added the following dependency to pom.xml:

<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.5.4</version>
</dependency>

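CSV columns are mapped to properties using annotations. Note these handy annotations:

  • @Parsed(field = "abc"): maps a CSV column to a field
  • @Trim: removes leading/trailing whitespace
  • @Format(formats = {"MM/dd/yyyy"}): lets us specify the date format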
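To make the two conversions concrete, here is a plain-JDK sketch of roughly what @Trim and a date format do to the sample values. The `ConversionDemo` class and the full pattern `M/d/yyyy h:mm:ss a` are assumptions based on the sample rows (`"R0601364                 "`, `"4/29/2014 12:00:00 AM"`); this is not univocity's conversion code, and whether univocity's `@Format` accepts the trailing time with `MM/dd/yyyy` is not verified here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical plain-JDK illustration; not univocity code.
public class ConversionDemo {

    // Assumed pattern covering the whole issue_dt value, e.g. "4/29/2014 12:00:00 AM".
    // Locale.US so that "AM"/"PM" are recognized.
    private static final DateTimeFormatter ISSUE_DT =
            DateTimeFormatter.ofPattern("M/d/yyyy h:mm:ss a", Locale.US);

    // Roughly what @Trim does: strip the padding around values such as the strap column.
    public static String trimStrap(String raw) {
        return raw.trim();
    }

    // Parses an issue_dt value and keeps only the calendar date.
    public static LocalDate parseIssueDate(String raw) {
        return LocalDateTime.parse(raw.trim(), ISSUE_DT).toLocalDate();
    }

    public static void main(String[] args) {
        System.out.println(trimStrap("R0601364                 ")); // R0601364
        System.out.println(parseIssueDate("4/29/2014 12:00:00 AM")); // 2014-04-29
    }
}
```

Single-letter `M` and `d` accept both one- and two-digit months and days, which matters because the sample dates are not zero-padded.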

Here's the POJO:

package io.woolford.entity;

import com.univocity.parsers.annotations.Format;
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.annotations.Trim;
import java.util.Date;

public class PermitRecord {

    @Trim
    @Parsed(field = "strap")
    private String strap;

    @Parsed(field = "issued_by")
    private String issuedBy;

    @Parsed(field = "permit_num")
    private String permitNum;

    @Parsed(field = "permit_category")
    private String permitCategory;

    @Format(formats = {"MM/dd/yyyy"})
    @Parsed(field = "issue_dt")
    private Date issueDt;

    @Parsed(field = "estimated_value")
    private Integer estimatedValue;

    @Parsed(field = "description")
    private String description;

    // getters & setters removed for brevity
}

Then create the list of POJOs from the records in the CSV file:

URL permitsURL = new URL("http://assessor.boco.solutions/ASR_PublicDataFiles/Permits.csv");
InputStream permitInputStream = permitsURL.openStream();
List<PermitRecord> permitRecordList = new CsvRoutines().parseAll(PermitRecord.class, permitInputStream);

Credit for this elegant solution goes to @JeronimoBackes, and thanks to Univocity for its excellent CSV parsers.