Key-value parser implementation with a CSV parser in Java
I am writing a program to parse key-value based logs like this:
dstcountry="United States" date=2018-12-13 time=23:47:32
I am using the univocity parser to do this. Here is my code:
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.getFormat().setDelimiter(' ');
parserSettings.getFormat().setQuote('"');
parserSettings.getFormat().setQuoteEscape('"');
parserSettings.getFormat().setCharToEscapeQuoteEscaping('"');
CsvParser keyValueParser = new CsvParser(parserSettings);
String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
String[] resp = keyValueParser.parseLine(line);
But the parser gives me this output:
dstcountry="United,
States",
date=2018-12-13,
time=23:47:32
The expected output is:
dstcountry="United States",
date=2018-12-13,
time=23:47:32
Is there a problem with my code, or is this a parser bug?
Regards,
Hari
I ended up writing my own parser. I am pasting it here for future reference in case anyone needs it. Suggestions and comments are welcome.
import java.util.ArrayList;
import java.util.List;

private static final int INSIDE_QT = 1;
private static final int OUTSIDE_QT = 0;
// Assumption: removeQuotes was not declared in the original snippet; it is treated here as a field.
// false keeps the enclosing quote characters in the parsed values.
private boolean removeQuotes = false;

public String[] parseLine(char delimiter, char quote, char quoteEscape, char charToEscapeQuoteEscaping, String logLine) {
    char[] line = logLine.toCharArray();
    List<String> strList = new ArrayList<>();
    int state = OUTSIDE_QT;
    char lastChar = '\0';
    StringBuilder currentToken = new StringBuilder();
    for (int i = 0; i < line.length; i++) {
        if (state == OUTSIDE_QT) {
            if (line[i] == delimiter) {
                // A delimiter outside quotes ends the current token.
                strList.add(currentToken.toString());
                currentToken.setLength(0);
            } else if (line[i] == quote) {
                if (lastChar == quoteEscape) {
                    // Escaped quote: replace the escape character with the quote itself.
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                } else {
                    if (!removeQuotes) {
                        currentToken.append(line[i]);
                    }
                    state = INSIDE_QT;
                }
            } else if (line[i] == quoteEscape) {
                if (lastChar == charToEscapeQuoteEscaping) {
                    // Escaped escape character: keep a single copy and leave lastChar unchanged.
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                    continue;
                } else {
                    currentToken.append(line[i]);
                }
            } else {
                currentToken.append(line[i]);
            }
        } else if (state == INSIDE_QT) {
            if (line[i] == quote) {
                if (lastChar != quoteEscape) {
                    // Unescaped quote closes the quoted section.
                    if (!removeQuotes) {
                        currentToken.append(line[i]);
                    }
                    if (currentToken.length() == 0) {
                        // Mark an empty quoted value so the token is not dropped later.
                        currentToken.append('\0');
                    }
                    state = OUTSIDE_QT;
                } else {
                    currentToken.append(line[i]);
                }
            } else if (line[i] == quoteEscape) {
                if (lastChar == charToEscapeQuoteEscaping) {
                    currentToken.deleteCharAt(currentToken.length() - 1);
                    currentToken.append(line[i]);
                    continue;
                } else {
                    currentToken.append(line[i]);
                }
            } else {
                // Delimiters inside quotes are ordinary characters.
                currentToken.append(line[i]);
            }
        }
        lastChar = line[i];
    }
    if (lastChar == delimiter) {
        // A trailing delimiter produces an empty last value.
        strList.add("");
    }
    if (currentToken.length() > 0) {
        strList.add(currentToken.toString());
    }
    return strList.toArray(new String[strList.size()]);
}
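For readers who only need the core idea, here is a minimal sketch of the same state-machine approach: a delimiter splits the line only when it is not between a pair of quote characters. The class and method names (`QuoteAwareSplitter`, `split`) are hypothetical, escape handling is deliberately left out, and quotes are always kept in the tokens, so this is a simplification of the parser above rather than a replacement for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of univocity-parsers.
final class QuoteAwareSplitter {

    // Splits on `delimiter` unless it sits between a pair of `quote` characters.
    // Quote characters are kept in the tokens; escaped quotes are not handled.
    static List<String> split(String line, char delimiter, char quote) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quote) {
                insideQuotes = !insideQuotes; // toggle state on every quote
                current.append(c);
            } else if (c == delimiter && !insideQuotes) {
                tokens.add(current.toString()); // delimiter outside quotes ends the token
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString()); // last token (empty if the line ends with a delimiter)
        return tokens;
    }

    public static void main(String[] args) {
        String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
        for (String token : split(line, ' ', '"')) {
            System.out.println(token);
        }
    }
}
```

For the sample log line this yields the three tokens listed as the expected output in the question.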
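Author of the library here. This is not a parser bug. The problem is that you are not parsing a CSV file.
When the parser sees dstcountry="United followed by a space (your delimiter), it treats that as a complete value.
The quote settings only apply to fields that start with the quote character. Since your input is not "dstcountry=""United States""", the parser cannot handle it the way you want. No CSV parser will do that for you.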
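To make the quoting rule concrete, the sketch below shows what a field has to look like before a CSV parser will treat an embedded space as part of the value: the whole field is wrapped in quotes and every inner quote is doubled. The helper name `toQuotedField` is hypothetical, and it assumes the quote escape is the quote character itself, as configured in the question.

```java
// Hypothetical helper for illustration; not part of univocity-parsers.
final class CsvQuoting {

    // Wraps a raw value in quotes and doubles any quote inside it.
    // Assumes the quote character also serves as the quote escape.
    static String toQuotedField(String raw) {
        return "\"" + raw.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // prints "dstcountry=""United States"""
        System.out.println(toQuotedField("dstcountry=\"United States\""));
    }
}
```

The log line in the question does not have this shape, because its quote appears in the middle of the field, after the = sign.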
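Again, you are not processing CSV. The only thing you can do here is use 2 parser instances: one to break down the row around the = character, and another to break down the values separated by a space in the result of the first parser. For example: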
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

CsvParserSettings parserSettings = new CsvParserSettings();
// break down the rows around the `=` character
parserSettings.getFormat().setDelimiter('=');
CsvParser keyValueParser = new CsvParser(parserSettings);
String line = "dstcountry=\"United States\" date=2018-12-13 time=23:47:32";
String[] keyPairs = keyValueParser.parseLine(line);
// break down each value around the whitespace
parserSettings.getFormat().setDelimiter(' ');
CsvParser valueParser = new CsvParser(parserSettings);
// add all values to a list
List<String> row = new ArrayList<String>();
for (String value : keyPairs) {
    // if a value has a whitespace, break it down using the other parser instance
    String[] values = valueParser.parseLine(value);
    Collections.addAll(row, values);
}
// here is your result
System.out.println(row);
This will print out:
[dstcountry, United States, date, 2018-12-13, time, 23:47:32]
You now have the keys and values. The following code prints them out as desired:
for (int i = 0; i < row.size(); i += 2) {
System.out.println(row.get(i) + " = " + row.get(i + 1));
}
Output:
dstcountry = United States
date = 2018-12-13
time = 23:47:32
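If the pairs are needed for lookups rather than printing, the flat list can be folded into a map. This is a small sketch using only the standard library; the class and method names (`KeyValuePairs`, `toMap`) are hypothetical, and it assumes the list strictly alternates key, value, key, value, which would not hold if a log line contained a key with a missing value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of univocity-parsers.
final class KeyValuePairs {

    // Pairs up a flat [key, value, key, value, ...] list, preserving the original order.
    static Map<String, String> toMap(List<String> row) {
        if (row.size() % 2 != 0) {
            throw new IllegalArgumentException("Expected an even number of tokens, got " + row.size());
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < row.size(); i += 2) {
            result.put(row.get(i), row.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList(
                "dstcountry", "United States", "date", "2018-12-13", "time", "23:47:32");
        // prints {dstcountry=United States, date=2018-12-13, time=23:47:32}
        System.out.println(toMap(row));
    }
}
```

A `LinkedHashMap` is used so the keys come back in the order they appeared in the log line; the size check fails fast instead of throwing an index error halfway through.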
Hope this helps, and thank you for using our parser!