Using Files.lines with .map(line -> line.split("multiple delimiters"))

Question

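I have an input file in the following format:

Ontario:Brampton:43° 41' N:79° 45' W
Ontario:Toronto:43° 39' N:79° 23' W
Quebec:Montreal:45° 30' N:73° 31' W
...

I have a class for the named values of a location. Example:
Province: Ontario City: Brampton LatDegrees: 43 LatMinutes: 41 LatDirection: N LongDegrees: 79 .... etc.

I already have a method that parses this correctly, but I am trying to learn whether it can be done better with Java 8 streams and lambdas.

If I start with the following: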

Files.lines(Paths.get(inputFile))
                .map(line -> line.split("\\b+")) //this delimits everything
                //.filter(x -> x.startsWith(":"))
                .flatMap(Arrays::stream)
                .forEach(System.out::println);

Can someone help me reproduce the following?

private void parseLine(String data) {
        int counter1 = 1;                       //1-2 province or city
        int counter2 = 1;                       //1-2 LatitudeDirection,LongitudeDirection
        int counter3 = 1;                       //1-4 LatitudeDegrees,LatitudeMinutes,LongitudeDegrees,LongitudeMinutes

        City city = new City();                 //create City object
        //String read = Arrays.toString(data);    //convert array element to String
        String[] splited = data.split(":");     //set delimiter
        
        for (String part : splited) {
            //System.out.println(part);
            char firstChar = part.charAt(0);    
            if(Character.isDigit(firstChar)){           //if the first char is a digit, then this part needs to be split again 
                String[] splited2 = part.split(" ");    //split second time with space delimiter
                for (String part2: splited2){
                    firstChar = part2.charAt(0);
                    if (Character.isDigit(firstChar)){                              //if the first char is a digit, then needs trimming
                        String parseDigits = part2.substring(0, part2.length()-1);  //trim trailing degrees or radians character
                        switch(counter2++){
                            case 1:
                                city.setLatitudeDegrees(Integer.parseInt(parseDigits));
                                //System.out.println("LatitudeDegrees: " + city.getLatitudeDegrees());
                                break;
                            case 2:
                                city.setLatitudeMinutes(Integer.parseInt(parseDigits));
                                //System.out.println("LatitudeMinutes: " + city.getLatitudeMinutes());
                                break;
                            case 3:
                                city.setLongitudeDegrees(Integer.parseInt(parseDigits));
                                //System.out.println("LongitudeDegrees: " + city.getLongitudeDegrees());
                                break;
                            case 4:
                                city.setLongitudeMinutes(Integer.parseInt(parseDigits));
                                //System.out.println("LongitudeMinutes: " + city.getLongitudeMinutes());
                                counter2 = 1;                       //reset counter2
                                break;
                        }
                    }else{
                        if(counter3 == 1){
                            city.setLatitudeDirection(part2.charAt(0));
                            //System.out.println("LatitudeDirection: " + city.getLatitudeDirection());
                            counter3++;                     //increment counter3 to use longitude next
                        }else{
                            city.setLongitudeDirection(part2.charAt(0));
                            //System.out.println("LongitudeDirection: " + city.getLongitudeDirection());
                            counter3 = 1;                   //reset counter 3
                            //System.out.println("Number of cities: " + cities.size());
                            cities.add(city);
                        }    
                    }
                }
            }else{
                if(counter1 == 1){
                    city.setProvince(part);
                    //System.out.println("\nProvince: " + city.getProvince());
                    counter1++;
                }else if(counter1 == 2){
                    city.setCity(part);
                    //System.out.println("City: " + city.getCity());
                    counter1 = 1;                       //reset counter1
                }
            }
        }
    }

No doubt there are better solutions than my parseLine() method, but I would really like to condense it as outlined above. Thanks!!

Answer 1

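Let's start with some general notes.

Your sequence .map(line -> line.split("\\b+")).flatMap(Arrays::stream) is not recommended. These two steps first create an array and then a second stream wrapping that array. You can skip the array step by using splitAsStream, though this requires you to deal with Pattern explicitly rather than having it hidden inside String.split: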

.flatMap(Pattern.compile("\\b+")::splitAsStream)

But note that in your case, splitting into words does not really pay off.

If you want to keep your original parseLine method, you can simply do
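As a rough illustration of the splitAsStream point, here is a minimal sketch. It swaps in a plain colon delimiter instead of "\\b+" so the result is easy to follow, and the class name SplitAsStreamSketch and the helper tokens are made up for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SplitAsStreamSketch {
    // Compile the delimiter once and reuse it for every line.
    // Assumption: a single colon is used as the delimiter here, not "\\b+".
    private static final Pattern COLON = Pattern.compile(":");

    // Hypothetical helper: flattens every line into its colon-separated tokens
    // without creating an intermediate String[] per line.
    static List<String> tokens(Stream<String> lines) {
        return lines.flatMap(COLON::splitAsStream)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of("Ontario:Toronto", "Quebec:Montreal");
        System.out.println(tokens(lines));
    }
}
```

With the two sample lines in main, this should print [Ontario, Toronto, Quebec, Montreal].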

Files.lines(Paths.get(inputFile))
     .forEach(this::parseLine);

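and you are done.

But seriously, that is not a real solution. For pattern matching you should use a library designed for it, such as the regex package (java.util.regex). You are already using it when you split via split("\\b+"), but that is far from what it can do for you.

Let's define the pattern:

(…) forms a group that captures the matched part so that we can extract it for our result
[^:]* specifies a token consisting of arbitrary characters other than a colon ([^:]), of arbitrary length (*)
\d+ defines a number (d = digit, + = one or more)
[NS] and [WE] match a single character that is N or S, or W or E, respectively

So the entire pattern you are looking for is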

([^:]*):([^:]*):(\d+)° (\d+)' ([NS]):(\d+)° (\d+)' ([WE])
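To see what the eight groups capture, the pattern can be tried against one of the sample lines from the question. This is only a sketch: the class name CityPatternSketch and the helper groups are invented for the illustration, and the degree sign is written as \u00B0 purely to avoid source-file encoding problems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CityPatternSketch {
    // Same pattern as above; backslashes are doubled because this is a Java
    // string literal, and \u00B0 is the degree sign.
    static final Pattern CITY_PATTERN = Pattern.compile(
        "([^:]*):([^:]*):(\\d+)\u00B0 (\\d+)' ([NS]):(\\d+)\u00B0 (\\d+)' ([WE])");

    // Hypothetical helper: returns the captured groups in order,
    // or an empty list if the whole line does not match.
    static List<String> groups(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = CITY_PATTERN.matcher(line);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("Ontario:Brampton:43\u00B0 41' N:79\u00B0 45' W"));
    }
}
```

For the Brampton line this should print [Ontario, Brampton, 43, 41, N, 79, 45, W]; a line that does not follow the format yields an empty list.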

The entire parse routine would be:

// Backslashes must be doubled inside a Java string literal.
static Pattern CITY_PATTERN=Pattern.compile(
    "([^:]*):([^:]*):(\\d+)° (\\d+)' ([NS]):(\\d+)° (\\d+)' ([WE])");

static City parseCity(String line) {
    Matcher matcher = CITY_PATTERN.matcher(line);
    if(!matcher.matches())
        throw new IllegalArgumentException(line+" doesn't match "+CITY_PATTERN);
    City city=new City();
    city.setProvince(matcher.group(1));
    city.setCity(matcher.group(2));
    city.setLatitudeDegrees(Integer.parseInt(matcher.group(3)));
    city.setLatitudeMinutes(Integer.parseInt(matcher.group(4)));
    city.setLatitudeDirection(line.charAt(matcher.start(5)));
    city.setLongitudeDegrees(Integer.parseInt(matcher.group(6)));
    city.setLongitudeMinutes(Integer.parseInt(matcher.group(7)));
    city.setLongitudeDirection(line.charAt(matcher.start(8)));
    return city;
}

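And I really hope you will no longer call your hard-to-read method “condensed”...

Using the routine above, a clean Stream-based processing solution would look like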

List<City> cities = Files.lines(Paths.get(inputFile))
    .map(ContainingClass::parseCity).collect(Collectors.toList());

which collects the file into a new list of City objects.
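One detail worth considering: the stream returned by Files.lines holds an open file, so it is usually wrapped in try-with-resources. The following is a self-contained sketch of the whole pipeline under some assumptions: City is a cut-down stand-in for the class in the question (only three fields), and CityFileSketch, readCities and roundTrip are names invented for this example. It writes the sample data to a temporary file so it can run on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CityFileSketch {
    // Hypothetical, cut-down stand-in for the question's City class.
    static final class City {
        final String province;
        final String name;
        final int latitudeDegrees;

        City(String province, String name, int latitudeDegrees) {
            this.province = province;
            this.name = name;
            this.latitudeDegrees = latitudeDegrees;
        }
    }

    // \u00B0 is the degree sign.
    static final Pattern CITY_PATTERN = Pattern.compile(
        "([^:]*):([^:]*):(\\d+)\u00B0 (\\d+)' ([NS]):(\\d+)\u00B0 (\\d+)' ([WE])");

    static City parseCity(String line) {
        Matcher m = CITY_PATTERN.matcher(line);
        if (!m.matches())
            throw new IllegalArgumentException(line + " doesn't match " + CITY_PATTERN);
        return new City(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }

    // try-with-resources closes the underlying file once collection finishes.
    static List<City> readCities(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) { // reads as UTF-8
            return lines.map(CityFileSketch::parseCity).collect(Collectors.toList());
        }
    }

    // Demo helper: writes the lines to a temp file, reads them back, cleans up.
    static List<City> roundTrip(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("cities", ".txt");
            try {
                Files.write(tmp, lines, StandardCharsets.UTF_8);
                return readCities(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<City> cities = roundTrip(Arrays.asList(
            "Ontario:Toronto:43\u00B0 39' N:79\u00B0 23' W",
            "Quebec:Montreal:45\u00B0 30' N:73\u00B0 31' W"));
        for (City c : cities)
            System.out.println(c.province + " / " + c.name + " / " + c.latitudeDegrees);
    }
}
```

A malformed line makes parseCity throw, which aborts the whole collection; whether to fail fast like this or to skip bad lines is a design choice the original answer leaves open.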
