Return a substring using a regExp [Java]

Question

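I need to implement a function that, given a file name, returns a substring selected by a regular expression.

The file names are composed like this, and I need the part between the first underscore and the two trailing numeric groups (shown in bold in the original post):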

Doc20191001119049_fotocontargasx_3962122_943000.jpg

Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg

Doc201910011214020_fotoesterna_ant_396024_947112.jpg

Doc201710071149010_foto_TargaMid_4007396_95010.jpg

This is what I have implemented so far:

Pattern rexExp = Pattern.compile("_[a-zA-Z0-9]+_");

but it does not work properly.

Answer 1

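Solution 1: Matching/extracting

You can capture a \w+ pattern between underscores when it is followed by [digits][_][digits][.][extension]. Note that in a Java string literal every regex backslash must be doubled:

Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

See the regex demo.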

Details

_ - an underscore
(\w+) - capturing group 1: 1+ letters/digits/_
_ - an underscore
\d+ - 1+ digits
_\d+ - _ and 1+ digits
\. - a dot
[^.]* - 0+ chars other than .
$ - end of string.

Online Java demo:

String s = "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg";
// Backslashes are doubled because this is a Java string literal
Pattern rexExp = Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");
Matcher matcher = rexExp.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
} // => fotoAssicurazioneCartaceo
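
Since the question asks for a function, the demo above could be wrapped roughly as follows. This is only a sketch: the class name FileNamePart, the method name extract, and the choice to return null for a non-matching name are assumptions made for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNamePart {
    // Same pattern as in Solution 1, compiled once and reused.
    private static final Pattern PART =
            Pattern.compile("_(\\w+)_\\d+_\\d+\\.[^.]*$");

    // Returns capturing group 1, or null when the name does not fit the layout
    // (returning null is an assumption of this sketch).
    static String extract(String fileName) {
        Matcher m = PART.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(extract(name));
        }
    }
}
```

Because \w also matches digits and underscores, the greedy group first runs up to the dot and then backtracks until the two trailing numeric groups can match, which is why names such as fotoesterna_ant are kept whole.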

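Solution 2: Trimming out the unnecessary prefix/suffix

You can remove everything from the start up to and including the first _, as well as the trailing [digits][_][digits][.][extension]:

.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "")

See this regex demo.

Details

^[^_]*_ - start of string, 0+ chars other than _, then _
| - or
_\d+_\d+\.[^.]*$ - _, 1+ digits, _, 1+ digits, . and then 0+ chars other than . up to the end of the string.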
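
A small runnable sketch of the trimming approach, applied to the four sample names from the question. The class name TrimFileName and the method name trim are hypothetical and used only for illustration.

```java
class TrimFileName {
    // Removes the leading "Doc..._" prefix and the trailing "_digits_digits.ext" suffix.
    static String trim(String fileName) {
        return fileName.replaceAll("^[^_]*_|_\\d+_\\d+\\.[^.]*$", "");
    }

    public static void main(String[] args) {
        String[] names = {
            "Doc20191001119049_fotocontargasx_3962122_943000.jpg",
            "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg",
            "Doc201910011214020_fotoesterna_ant_396024_947112.jpg",
            "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"
        };
        for (String name : names) {
            System.out.println(trim(name));
        }
    }
}
```

One difference from the matching approach: replaceAll gives no signal when a name does not fit the expected layout. It simply returns whatever is left after removing the parts that did match, so the input may need to be validated separately.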

Answer 2

To complement Wiktor's precise answer, here is a "quick-and-dirty" approach that makes the following egregious assumptions about your input: "Required string is only non-numbers, surrounded by numbers, and the input is always a valid filepath".

public static void main(String[] args) {
  String[] strs = {"Doc20191001119049_fotocontargasx_3962122_943000.jpg", "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg", "Doc201910011214020_fotoesterna_ant_396024_947112.jpg", "Doc201710071149010_foto_TargaMid_4007396_95010.jpg"};
  // \\D is any non-digit; the backslash is doubled for the Java string literal
  var p = Pattern.compile("_([\\D_]+)_");
  for(var str : strs) {
    var m = p.matcher(str);
    if(m.find()) {
      System.out.println("found: "+m.group(1));
    }
  }
}

Output:

found: fotocontargasx
found: fotoAssicurazioneCartaceo
found: fotoesterna_ant
found: foto_TargaMid
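
To illustrate where the "only non-numbers" assumption matters, here is a sketch comparing this pattern with the one from Answer 1 on a made-up file name whose wanted part contains a digit. The file name, the class name DigitsInName, and the helper firstGroup are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitsInName {
    // Returns capturing group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical name: the part to extract ("foto2esterna") contains a digit.
        String name = "Doc201910011214020_foto2esterna_396024_947112.jpg";

        // Quick-and-dirty pattern: no run of non-digits is enclosed by underscores here.
        System.out.println(firstGroup("_([\\D_]+)_", name));                // null
        // Pattern from Answer 1: anchored on the trailing numeric groups instead.
        System.out.println(firstGroup("_(\\w+)_\\d+_\\d+\\.[^.]*$", name)); // foto2esterna
    }
}
```

If the wanted part is known never to contain digits, the shorter pattern is sufficient.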

Answer 3

Pattern: (?<=_).+(?=(_\d+){2}\.)

    final String s = "Doc20191001119049_fotocontargasx_3962122_943000.jpg\n"
        + "\n"
        + "Doc201810011052053_fotoAssicurazioneCartaceo_3962128_943000.jpg\n"
        + "\n"
        + "Doc201910011214020_fotoesterna_ant_396024_947112.jpg\n"
        + "\n"
        + "Doc201710071149010_foto_TargaMid_4007396_95010.jpg";
    // Lookbehind for "_", then everything up to the two trailing "_digits" groups and the dot
    Pattern pattern = Pattern.compile("(?<=_).+(?=(_\\d+){2}\\.)");
    Matcher matcher = pattern.matcher(s);
    List<String> allMatches = new ArrayList<>();

    while (matcher.find()) {
        allMatches.add(matcher.group());
    }
    System.out.println(allMatches);

Output: [fotocontargasx, fotoAssicurazioneCartaceo, fotoesterna_ant, foto_TargaMid]
