Java string.split( ) on comma or end-of-line

I don't like regular expressions. I have data like this:

abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,

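What the data itself means is mostly irrelevant. What matters is that it is comma-separated: you can think of the data between commas as "columns", and some columns may be blank or empty (blank and empty columns will be ignored later). I need to split the string into an array on the comma delimiter. I tried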

new StringTokenizer(string, ",")

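but that skips the tokens where there is no data between the commas, so I tried string.split(",") instead. The problem is that it drops the last three columns of the data above. From "334" onward it behaves like StringTokenizer, skipping the columns that contain neither spaces nor data.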
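To make the two failure modes concrete, here is a small sketch (the class and method names are illustrative, not from the question) that counts the pieces each approach returns for the sample line. The line has 14 columns; StringTokenizer collapses every run of commas, while split(",") only loses the empty columns at the end.

```java
import java.util.StringTokenizer;

class TrailingColumnsDemo {
    // The sample line from the question: 14 comma-separated columns.
    static final String DATA =
        "abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,";

    // StringTokenizer treats a run of commas as a single delimiter,
    // so empty columns are never returned as tokens.
    static int tokenizerCount(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    // split(",") uses a limit of 0: empty columns in the middle survive,
    // but empty strings at the end of the result are discarded.
    static int splitCount(String line) {
        return line.split(",").length;
    }

    public static void main(String[] args) {
        System.out.println("StringTokenizer: " + tokenizerCount(DATA)); // 8
        System.out.println("split(\",\"): " + splitCount(DATA));         // 11
    }
}
```

Note that the column containing only spaces is not empty, so even StringTokenizer returns it.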

Can I make string.split() keep splitting until it reaches the end of the line, or is there a better way to do this?

You can use the overloaded String#split(String, int) method and pass a negative limit:

String text = "abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,";
String[] tokens = text.split(",", -1);
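A minimal runnable version of the two lines above, assuming the question's sample data (the wrapper class and the columns helper are hypothetical names). With a negative limit every column comes back, including the three empty ones after "334":

```java
class NegativeLimitSplit {
    // Splits one line on commas and keeps every column, including the
    // trailing empty ones, by passing a negative limit.
    static String[] columns(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String text = "abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,";
        String[] tokens = columns(text);

        System.out.println(tokens.length); // 14
        for (String token : tokens) {
            // <> marks where each value starts and ends
            System.out.print("<" + token + ">\t");
        }
        System.out.println();
    }
}
```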

The limit parameter is explained in the Javadoc for that method:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
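The three cases in that paragraph can be seen side by side on a short string. This is an illustrative sketch; the input "a,b,," and the show helper are made up for the example:

```java
import java.util.Arrays;

class LimitDemo {
    // Formats the result of split(",", limit) for printing.
    static String show(String s, int limit) {
        return Arrays.toString(s.split(",", limit));
    }

    public static void main(String[] args) {
        String s = "a,b,,";

        // n > 0: pattern applied at most n - 1 times; the last element
        // holds everything after the last matched delimiter.
        System.out.println(show(s, 2));  // [a, b,,]

        // n == 0: split as often as possible, then drop trailing empty strings.
        System.out.println(show(s, 0));  // [a, b]

        // n < 0: split as often as possible and keep trailing empty strings.
        System.out.println(show(s, -1)); // [a, b, , ]
    }
}
```

The one-argument split(",") behaves like split(",", 0), which is why the question's version lost its trailing columns.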

The simplest way to parse CSV (comma-separated values) data is to use a CSV parser. One of the simplest is OpenCSV. Here is an example:

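import com.opencsv.CSVReader; // older OpenCSV releases use au.com.bytecode.opencsv.CSVReader
import java.io.StringReader;

String data = "abc,42,4/04/1992,,,something,   ,2/05/2007,dkwit,,334,,,";

// CSVReader keeps empty columns (including trailing ones) and understands quoted fields.
// readNext() returns null at end of input; it throws IOException (newer versions also CsvValidationException).
try (CSVReader reader = new CSVReader(new StringReader(data))) {
    for (String[] tokens = reader.readNext(); tokens != null; tokens = reader.readNext()) {
        for (String token : tokens) {
            System.out.print("<" + token + ">\t");
        }
        System.out.println();
    }
}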

Output (I added <> to show where each value starts and ends):

<abc>   <42>    <4/04/1992> <>  <>  <something> <   >   <2/05/2007> <dkwit> <>  <334>   <>  <>  <>
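One reason to prefer a real CSV parser over split(",", -1) is quoting. The sketch below uses a hypothetical line, abc,"Smith, John",42, whose second field is quoted and contains a comma. A plain split cuts that field in two and returns 4 pieces, whereas a CSV parser such as OpenCSV, with its default double-quote handling, should treat it as one field and return 3 columns.

```java
class QuotedFieldPitfall {
    // Splits on every comma, with no knowledge of CSV quoting rules.
    static String[] naiveSplit(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Hypothetical CSV line whose second field is quoted and contains a comma.
        String line = "abc,\"Smith, John\",42";

        String[] tokens = naiveSplit(line);
        System.out.println(tokens.length); // 4: the quoted field was split in two
        for (String token : tokens) {
            System.out.print("<" + token + ">\t");
        }
        System.out.println();
    }
}
```

If the data is guaranteed never to contain quoted fields, split(",", -1) is enough; otherwise the parser is the safer choice.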