Extract string content from an RTF string in Java

I have the following RTF string: \af31507 \ltrch\fcs0 \insrsid6361256 Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}}{\rtlch\fcs1 \af31507 \ltrch\fcs0 \insrsid12283827, and I want to extract the content of Study Title, i.e. (Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}). Here is my code:

String[] arr = value.split("\\s+");  // split on runs of whitespace
//System.out.println(arr.length);
for (int j = 0; j < arr.length; j++) {
    if (isNumeric(arr[j])) {
        arr[j] = "\\?" + arr[j];  // prefix purely numeric tokens with \?
    }
}

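In the code above, I split the string on whitespace and loop over the array to check whether each token is numeric. However, isNumeric does not handle the 8000 that follows the last \u8805, because the token it receives is 8000}}{\rtlch\fcs1. How can I use a regular expression to find Study Title and its content?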
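The failure described above can be reproduced with a minimal sketch. The original isNumeric helper is not shown, so the version here is a hypothetical digits-only stand-in, and SplitDemo and nearMisses are names made up for illustration. The sketch lists the tokens that start with a digit but are not purely numeric:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitDemo {
    // Hypothetical stand-in for the question's isNumeric: true for digits only.
    static boolean isNumeric(String token) {
        return token.matches("\\d+");
    }

    // Tokens that begin with a digit but fail the numeric check.
    static List<String> nearMisses(String value) {
        return Arrays.stream(value.split("\\s+"))
                .filter(t -> !t.isEmpty() && Character.isDigit(t.charAt(0)) && !isNumeric(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Every RTF backslash is doubled so that Java keeps it literally.
        String value = "\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        System.out.println(nearMisses(value)); // prints [8000}}{\rtlch\fcs1]
    }
}
```

There is no whitespace between 8000 and the closing braces, so the number never becomes a token of its own.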

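The regex Study Title: {[^}]*} will match what you expect. Demo: https://regex101.com/r/FZl1WL/1. In Java the opening brace must be escaped, because an unescaped { is read as the start of a repetition quantifier: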

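    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Each RTF backslash is doubled so that Java keeps it as a literal backslash.
    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    // Match "Study Title: {" and everything up to and including the first "}".
    Pattern p = Pattern.compile("Study Title: \\{[^}]*\\}");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }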

Output:

Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}

Update, per the OP's request (return only the text inside the braces):

    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    // The lookbehind and lookahead keep "Study Title: {" and "}" out of the match.
    Pattern p = Pattern.compile("(?<=Study Title: \\{)[^}]*(?=\\})");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }

Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000
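
As an alternative to the lookaround version, the same text can be pulled out with a capturing group. The sketch below is one possible self-contained arrangement, not part of the original answer; the class name StudyTitleExtractor and the method extract are hypothetical names chosen for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudyTitleExtractor {
    // Group 1 captures everything between the braces after "Study Title: ".
    private static final Pattern STUDY_TITLE =
            Pattern.compile("Study Title: \\{([^}]*)\\}");

    // Returns the text inside the first "Study Title: {...}", if there is one.
    static Optional<String> extract(String rtf) {
        Matcher m = STUDY_TITLE.matcher(rtf);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Every RTF backslash is doubled so that Java keeps it literally.
        String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        System.out.println(extract(s).orElse("(no study title found)"));
    }
}
```

Because [^}]* stops at the first closing brace, this sketch assumes the title contains no nested RTF groups; a title with an inner {...} would be cut short at the inner brace.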