opencsv: how to parse data with double quotes inside cells?
I'm trying to parse some public data with opencsv (version 3.10). Here's a snippet that fetches the CSV and maps the records to a list of POJOs:
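import com.opencsv.CSVReader;
import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.HeaderColumnNameTranslateMappingStrategy;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

URL permitsURL = new URL("http://assessor.boco.solutions/ASR_PublicDataFiles/Permits.csv");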
InputStream permitInputStream = permitsURL.openStream();
Reader permitStreamReader = new InputStreamReader(permitInputStream);
CsvToBean<PermitRecord> csvToBean = new CsvToBean<PermitRecord>();
Map<String, String> columnMapping = new HashMap<String, String>();
columnMapping.put("strap", "strap");
columnMapping.put("issued_by", "issuedBy");
columnMapping.put("permit_num", "permitNum");
columnMapping.put("permit_category", "permitCategory");
columnMapping.put("issue_dt", "issueDt");
columnMapping.put("estimated_value", "estimatedValue");
columnMapping.put("description", "description");
HeaderColumnNameTranslateMappingStrategy<PermitRecord> strategy = new HeaderColumnNameTranslateMappingStrategy<PermitRecord>();
strategy.setType(PermitRecord.class);
strategy.setColumnMapping(columnMapping);
List<PermitRecord> permitRecordList = null;
CSVReader csvReader = new CSVReader(permitStreamReader);
permitRecordList = csvToBean.parse(strategy, csvReader);
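To make the role of columnMapping concrete: a header-name translation strategy looks up each CSV header name in the map and assigns the value at that column index to the mapped bean property. The sketch below mimics that idea with plain collections. HeaderMappingDemo and mapRow are hypothetical names, not opencsv API, and the real strategy populates PermitRecord instances instead of returning a map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not opencsv API: illustrates header-name translation. */
public class HeaderMappingDemo {

    /**
     * Pairs each value in row with the bean property that columnMapping assigns
     * to the header name at the same index. Columns without a mapping are skipped.
     */
    public static Map<String, String> mapRow(List<String> header, List<String> row,
                                             Map<String, String> columnMapping) {
        Map<String, String> properties = new LinkedHashMap<>();
        int n = Math.min(header.size(), row.size());
        for (int i = 0; i < n; i++) {
            String property = columnMapping.get(header.get(i));
            if (property != null) {
                properties.put(property, row.get(i)); // e.g. "issued_by" -> "issuedBy"
            }
        }
        return properties;
    }
}
```

Because values are matched by position, such a mapping is only as reliable as the field splitting that happens before it.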
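The parsed list contains fewer records than the CSV. Looking at the data, I noticed that cell values sometimes contain double quotes. Here's an example: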
"R0601364 ","LAFAYETTE","14-0486","DECK","4/29/2014 12:00:00 AM","3834","deck under 36\"""
"R0601365 ","LAFAYETTE","13-0570","NEW CONSTRUCTION","5/22/2013 12:00:00 AM","121899","SIN FAMILY HOME PLN CUSTOM FIN BASEMENT"
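The deck under 36\" value causes the following record to end up in the description field. This is even more obvious when viewed in the IDE.
Can you see what I'm doing wrong? I suspect there's a simple fix, since Excel parses the file correctly and opencsv seems to be the de facto standard for CSV parsing in Java.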
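A likely cause of the symptom is the escape character: opencsv's default CSVParser treats a backslash as an escape character, whereas RFC 4180 (and Excel) escape a quote only by doubling it. The sketch below is a simplified, hypothetical reader (CsvQuotingDemo is not opencsv or univocity code) that parses the two sample rows under both conventions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified CSV reader for illustration only; it is not
 * opencsv's or univocity's implementation. It splits a whole CSV text into records.
 */
public class CsvQuotingDemo {

    /**
     * @param text             CSV content, records separated by '\n'
     * @param backslashEscapes if true, '\' inside a quoted field makes the next
     *                         character literal; if false, only RFC 4180 quote
     *                         doubling ("") is recognized
     */
    public static List<List<String>> parse(String text, boolean backslashEscapes) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean hasNext = i + 1 < text.length();
            if (inQuotes) {
                if (backslashEscapes && c == '\\' && hasNext) {
                    field.append(text.charAt(++i)); // escaped char is kept literally
                } else if (c == '"' && hasNext && text.charAt(i + 1) == '"') {
                    field.append('"');              // "" -> one literal quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false;               // closing quote
                } else {
                    field.append(c);                // commas and newlines stay in the field
                }
            } else if (c == '"') {
                inQuotes = true;                    // opening quote
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        // Flush the last record when the text does not end with a newline.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }
}
```

In RFC 4180 mode the description of the first row is deck under 36\" and two records come back. In backslash mode the \" is consumed as an escaped quote, the following "" becomes another literal quote, and the field is never closed, so the newline and the entire second row are absorbed into the description: the same "fewer records" symptom. Disabling the backslash escape, or using a parser whose only quote escape is the doubled quote, avoids this.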
The Univocity CSV parsers are really nice. Mapping CSV columns to POJO properties is a breeze.
I added the following dependency to pom.xml:
<dependency>
<groupId>com.univocity</groupId>
<artifactId>univocity-parsers</artifactId>
<version>2.5.4</version>
</dependency>
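The CSV columns are mapped to properties with annotations. Note these handy annotations:
@Parsed(field = "abc"): maps a CSV column to a field
@Trim: removes leading/trailing whitespace
@Format(formats = {"MM/dd/yyyy"}): lets us specify the date format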
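As a plain-JDK illustration of what @Trim and @Format(formats = {"MM/dd/yyyy"}) are meant to do to the sample values, the sketch below trims the padded strap value and parses the issue date with the same pattern through java.text.SimpleDateFormat. AnnotationEffectDemo is a hypothetical class; it assumes @Format patterns use SimpleDateFormat syntax, and whether univocity also ignores the trailing " 12:00:00 AM" the way DateFormat.parse(String) does is not verified here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Hypothetical plain-JDK illustration (not univocity code) of the effect the
 * @Trim and @Format(formats = {"MM/dd/yyyy"}) annotations are meant to have.
 */
public class AnnotationEffectDemo {

    /** Roughly what @Trim is for: strip leading and trailing whitespace. */
    public static String trimmed(String raw) {
        return raw.trim();
    }

    /**
     * Parses a date with a SimpleDateFormat pattern. When parsing, "MM" also
     * accepts a single-digit month such as "4", and DateFormat.parse(String)
     * may stop before the end of the input, so trailing text such as
     * " 12:00:00 AM" is ignored rather than rejected.
     */
    public static Date parseDate(String raw, String pattern) {
        try {
            return new SimpleDateFormat(pattern).parse(raw);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + raw, e);
        }
    }
}
```

With the sample row, trimmed("R0601364 ") yields R0601364, and parseDate("4/29/2014 12:00:00 AM", "MM/dd/yyyy") yields midnight on April 29, 2014 in the default time zone.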
Here's the POJO:
package io.woolford.entity;
import com.univocity.parsers.annotations.Format;
import com.univocity.parsers.annotations.Parsed;
import com.univocity.parsers.annotations.Trim;
import java.util.Date;
public class PermitRecord {
@Trim
@Parsed(field = "strap")
private String strap;
@Parsed(field = "issued_by")
private String issuedBy;
@Parsed(field = "permit_num")
private String permitNum;
@Parsed(field = "permit_category")
private String permitCategory;
@Format(formats = {"MM/dd/yyyy"})
@Parsed(field = "issue_dt")
private Date issueDt;
@Parsed(field = "estimated_value")
private Integer estimatedValue;
@Parsed(field = "description")
private String description;
// getters & setters removed for brevity
}
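Then, create a list of POJOs from the records in the CSV file:
import com.univocity.parsers.csv.CsvRoutines;
import io.woolford.entity.PermitRecord;
import java.io.InputStream;
import java.net.URL;
import java.util.List;

URL permitsURL = new URL("http://assessor.boco.solutions/ASR_PublicDataFiles/Permits.csv");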
InputStream permitInputStream = permitsURL.openStream();
List<PermitRecord> permitRecordList = new CsvRoutines().parseAll(PermitRecord.class, permitInputStream);
Credit for this elegant solution goes to @JeronimoBackes. And thanks to Univocity for an excellent CSV parser.