How do I keep a regexp from matching past the next label in Go?

Question

This is an example of a multiple-choice question. I want to extract the Chinese text of each option ("英国、法国", "加拿大、墨西哥", "葡萄牙、加拿大", "墨西哥、德国") from the content in the following Go code, but it does not work.

package main

import (
    "fmt"
    "regexp"
    "testing"
)

func TestRegex(t *testing.T) {
    text := `（ B ）38.目前，亚马逊美国站后台，除了有美国站点外，还有(    )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国
`

    fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.(\S+)?`).FindAllStringSubmatch(text, -1))
    fmt.Printf("%q\n", regexp.MustCompile(`[A-E]\.`).Split(text, -1))
}

text:

（ B ）38.目前，亚马逊美国站后台，除了有美国站点外，还有(    )站点。
A.英国、法国B.加拿大、墨西哥
C.葡萄牙、加拿大D.墨西哥、德国

pattern: [A-E]\.(\S+)?

Actual result: [["A.英国、法国B.加拿大、墨西哥" "英国、法国B.加拿大、墨西哥"] ["C.葡萄牙、加拿大D.墨西哥、德国" "葡萄牙、加拿大D.墨西哥、德国"]].

Expected result: [["A.英国、法国" "英国、法国"] ["B.加拿大、墨西哥" "加拿大、墨西哥"] ["C.葡萄牙、加拿大" "葡萄牙、加拿大"] ["D.墨西哥、德国" "墨西哥、德国"]]

I think it might be a greedy-matching problem, because my code reads option A and option B together as a single option.

Answer 1

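Non-greedy matching will not solve this problem. You need a positive lookahead, which RE2 (the syntax that Go's regexp package implements) does not support.

A workaround is to search only for the labels and extract the text between them manually.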
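
To illustrate both points, here is a minimal sketch. The helper names `lazyMatches` and `lookaheadCompiles` are made up for this example and are not part of the original question. With a lazy quantifier and nothing after the group to push it further, `(\S+?)` stops after a single character, and a pattern containing `(?=...)` is rejected when it is compiled.

```go
package main

import (
	"fmt"
	"regexp"
)

// lazyMatches (hypothetical helper) applies the lazy variant of the
// question's pattern and returns every match with its capture group.
func lazyMatches(line string) [][]string {
	re := regexp.MustCompile(`[A-E]\.(\S+?)`)
	return re.FindAllStringSubmatch(line, -1)
}

// lookaheadCompiles (hypothetical helper) reports whether Go's regexp
// package accepts a pattern that uses a positive lookahead.
func lookaheadCompiles() bool {
	_, err := regexp.Compile(`[A-E]\.(\S+?)(?=[A-E]\.|$)`)
	return err == nil
}

func main() {
	// Each capture group holds only one character of the option text.
	fmt.Printf("%q\n", lazyMatches("A.英国、法国B.加拿大、墨西哥"))
	// prints [["A.英" "英"] ["B.加" "加"]]

	fmt.Println(lookaheadCompiles()) // prints false
}
```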

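// Find only the option labels ("A.", "B.", ...); each match is a [start, end] byte-index pair.
re := regexp.MustCompile(`[A-E]\.`)
res := re.FindAllStringIndex(text, -1)
results := make([][]string, len(res))
for i, m := range res {
    if i < len(res)-1 {
        // The option text runs from the end of this label to the start of the next one.
        results[i] = []string{text[m[0]:m[1]], text[m[1]:res[i+1][0]]}
    } else {
        // The last option runs to the end of the input.
        results[i] = []string{text[m[0]:m[1]], text[m[1]:]}
    }
}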

fmt.Printf("%q\n", results)

This should print:

[["A." "英国、法国"] ["B." "加拿大、墨西哥\n"] ["C." "葡萄牙、加拿大"] ["D." "墨西哥、德国\n"]]
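
If the trailing newlines are unwanted, the same approach can be wrapped in a small function that trims each option. This is only a sketch: `splitOptions` and `labelRe` are names invented here, and it assumes the option text itself never contains something that looks like a label (a capital letter A to E followed by a period).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelRe matches an option label such as "A." or "D.".
var labelRe = regexp.MustCompile(`[A-E]\.`)

// splitOptions (hypothetical helper) returns {label, text} pairs. The text
// of an option runs from the end of its label to the start of the next
// label, or to the end of the input, with surrounding whitespace trimmed.
func splitOptions(text string) [][]string {
	idx := labelRe.FindAllStringIndex(text, -1)
	out := make([][]string, 0, len(idx))
	for i, m := range idx {
		end := len(text)
		if i+1 < len(idx) {
			end = idx[i+1][0]
		}
		label := text[m[0]:m[1]]
		body := strings.TrimSpace(text[m[1]:end])
		out = append(out, []string{label, body})
	}
	return out
}

func main() {
	text := "A.英国、法国B.加拿大、墨西哥\nC.葡萄牙、加拿大D.墨西哥、德国\n"
	fmt.Printf("%q\n", splitOptions(text))
	// prints [["A." "英国、法国"] ["B." "加拿大、墨西哥"] ["C." "葡萄牙、加拿大"] ["D." "墨西哥、德国"]]
}
```

Returning an empty slice when no label is found keeps the caller's loop simple, since there is no nil case to handle separately.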

go

regex-greedy