Splitting by pipe in Java yields different results

Before anyone jumps to conclusions: yes, I know the pipe symbol has to be escaped :-)

...in my code, I do this:

String line = "C0000005|A13433185|SCUI|RB|C0036775|A7466261|SCUI||R86000559||MSHFRE|MSHFRE|||N||";
line = line.trim();
String[]     columns_array = line.trim().split("\\|");           // length = 15
List<String> columns_list  = Splitter.on("|").splitToList(line); // size   = 17

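I am parsing a huge file (~5 GB) in which every line is pipe-delimited. The line above is the first line of that file, and my code crashed on it with an index-out-of-bounds error. After debugging, I realized what was happening and added the Guava Splitter line as a sanity check. With the Splitter, I get the list I expect.

Why do Guava's Splitter and the native split give different results?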

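String.split() removes trailing empty strings from the resulting array. The string being split ends with two delimiters (...||), so the last two empty fields are dropped and you get 15 elements instead of 17.

Here is an excerpt from the documentation: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29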

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
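
To make the effect concrete, here is a minimal sketch on a shortened input. The class and method names (TrailingEmptyDemo, splitDefault) are made up for illustration and do not come from the question. It only shows that the one-argument split keeps interior empty fields but drops trailing ones.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class TrailingEmptyDemo {

    // One-argument split on a literal pipe, equivalent to a limit of 0.
    static String[] splitDefault(String line) {
        return line.split("\\|");
    }

    public static void main(String[] args) {
        // Six fields: "a", "b", "", "c", "", ""
        String line = "a|b||c||";
        String[] parts = splitDefault(line);

        // The interior empty field survives; the two trailing ones are removed.
        System.out.println(Arrays.toString(parts)); // prints [a, b, , c]
        System.out.println(parts.length);           // prints 4
    }
}
```

Applied to the line in the question, the same call drops the two empty fields after N, which is where the 15 versus 17 difference comes from.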

The API documentation for String.split() says:

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

Because of this, your array is truncated.

As a commenter has already pointed out, you can get the correct result with:

String[] columns_array = line.trim().split("\\|", -1);  // length 17

From the API documentation for the two-argument split(String regex, int limit) method, where n is the limit:

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length
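
As a rough sketch of how the limit argument changes the result, the snippet below runs the same shortened input through three limits. SplitLimitDemo and splitWithLimit are hypothetical names used only for this example.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class SplitLimitDemo {

    // Two-argument split on a literal pipe with an explicit limit.
    static String[] splitWithLimit(String line, int limit) {
        return line.split("\\|", limit);
    }

    public static void main(String[] args) {
        // Six fields: "a", "b", "", "c", "", ""
        String line = "a|b||c||";

        // limit 0: same as the one-argument form, trailing empty strings removed.
        System.out.println(splitWithLimit(line, 0).length);  // prints 4

        // Negative limit: pattern applied as often as possible, nothing removed.
        System.out.println(splitWithLimit(line, -1).length); // prints 6

        // Positive limit: at most that many elements; the rest stays unsplit.
        System.out.println(Arrays.toString(splitWithLimit(line, 2))); // prints [a, b||c||]
    }
}
```

For a file where every row must have the same number of columns, a negative limit is the variant that keeps the column count stable regardless of how many trailing fields are empty.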