Apache commons-csv error with quote
I'm using org.apache.commons-csv 1.4, and this week I found this strange behavior in one of our JUnit tests:
CSVReader reader = null;
List<String[]> linesCsv = new ArrayList<>();
FileInputStream fileStream = null;
InputStreamReader inputStreamReader = null;
try {
    fileStream = new FileInputStream(file);
    inputStreamReader = new InputStreamReader(fileStream, "ISO-8859-1");
    reader = new CSVReader(inputStreamReader, ',', '"', 0);
    String[] record = null;
    while ((record = reader.readNext()) != null) {
        linesCsv.add(record);
    }
} catch (Exception e) {
    logger.error("Error in ", e);
} finally {
    if (inputStreamReader != null) {
        inputStreamReader.close();
    }
    if (fileStream != null) {
        fileStream.close();
    }
    if (reader != null) {
        reader.close();
    }
}
* Error case
input.csv
DAR_123451 ,"XXXXX Hello World "Hello World XXX "
DAR_123452 ,"XXXXX Hello World "Hello World XXX "
Java KO:
[0.0] DAR_123451
[0.1] XXXXX Hello World "Hello World XXX\nDAR_123456 ,XXXXX Hello World "Hello World XXX
* Correct case
input.csv
DAR_123451 ,"XXXXX Hello World "Hello World" XXX "
DAR_123452 ,"XXXXX Hello World "Hello World" XXX "
Java OK:
[0.0] DAR_123451
[0.1] XXXXX Hello World "Hello World" XXX
[1.0] DAR_123452
[1.1] XXXXX Hello World "Hello World" XXX
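I can't get the commons-csv library to handle this correctly, and it looks like a bug. How can we correctly read a value that contains a single quote character inside the quoted string?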
The CSV format typically uses two consecutive double quotes to represent one double quote inside a value that is enclosed in quotes, as in the corrected input below.
When I run your input with the latest version of commons-csv, I even get an exception (IOException: (line 1) invalid char between encapsulated token and delimiter).
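To illustrate why a parser gives up at that point, here is a simplified sketch of the quoting rule in plain Java. It is not commons-csv's actual lexer: the class `QuoteRuleSketch` and the method `splitLine` are hypothetical names made up for this example, and it only handles a single line with a comma delimiter. Inside a quoted field, `""` is read as a literal quote, a lone `"` closes the field, and whatever follows the closing quote has to be a delimiter or the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteRuleSketch {

    /** Splits one comma-separated line; hypothetical helper, not part of commons-csv. */
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                boolean closed = false;
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote means one literal quote
                        i += 2;
                    } else if (c == '"') {
                        closed = true;     // lone quote closes the field
                        i++;
                        break;
                    } else {
                        field.append(c);
                        i++;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("unterminated quoted field");
                }
                if (i < n && line.charAt(i) != ',') {
                    // This is the situation the unescaped input runs into.
                    throw new IllegalArgumentException(
                            "invalid char after closing quote at index " + i);
                }
            } else {
                // Unquoted field: read up to the next comma.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                break;
            }
            i++; // skip the comma
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("DAR_123451 ,\"XXXXX Hello World \"\"Hello World\"\" XXX \""));
        try {
            splitLine("DAR_123451 ,\"XXXXX Hello World \"Hello World XXX \"");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

With the unescaped input, the quote before the second `Hello` is taken as the end of the field, and the `H` that follows is neither a comma nor the end of the line, so the sketch rejects the line.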
So, to include double quotes correctly, you need to use the following:
DAR_123451 ,"XXXXX Hello World ""Hello World"" XXX "
DAR_123452 ,"XXXXX Hello World ""Hello World"" XXX "
Then the test case works as expected:
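import java.io.Reader;
import java.io.StringReader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

// Embedded double quotes are doubled inside the quoted field.
Reader in = new StringReader(
        "DAR_123451 ,\"XXXXX Hello World \"\"Hello World XXX\"\" \"\n" +
        "DAR_123452 ,\"XXXXX Hello World \"\"Hello World XXX\"\" \"");
Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
for (CSVRecord record : records) {
    for (int i = 0; i < record.size(); i++) {
        System.out.println("At " + i + ": " + record.get(i));
    }
}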
Output:
At 0: DAR_123451
At 1: XXXXX Hello World "Hello World XXX"
At 0: DAR_123452
At 1: XXXXX Hello World "Hello World XXX"
For details, see https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality.
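If you generate the file yourself, the values have to be escaped this way when they are written. A minimal sketch in plain Java, assuming a hypothetical helper `quoteField` in a hypothetical class `CsvQuoteSketch` (neither is part of commons-csv), doubles every embedded quote and wraps the value in quotes:

```java
public class CsvQuoteSketch {

    /** Wraps a value in double quotes and doubles any embedded double quote. */
    public static String quoteField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        String text = "XXXXX Hello World \"Hello World\" XXX ";
        // Prints: DAR_123451 ,"XXXXX Hello World ""Hello World"" XXX "
        System.out.println("DAR_123451 ," + quoteField(text));
    }
}
```

A CSV writer from a library would normally do this escaping for you, so a hand-written helper like this is mainly useful for seeing what the file has to look like.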