Find all occurrences of a regex pattern, but ignore occurrences that contain another pattern

Question

I am trying to parse a block of text:

「<%sM_item2><%sM_plusnum2>の|　<%sM_slot>の部分を|　<%sM_change_color>に　カラーリングするのですね？|<br>|「それでは　<%sM_item>が　１０本と|　<%nM_gold>ゴールドが必要ですが　よろしいですか？|<yesno><close>

In this block of text, I am trying to do a regex split on every occurrence of <???>, except where the tag has the form <%???>.

I have it mostly working with:

re.split(r'<((?!%).+?)>', source_text)

['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね？|', 'br', '|「それでは\u3000<%sM_item>が\u3000１０本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか？|', 'yesno', '', 'close', '']

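My problem is that although it keeps the <%???> tags intact, it somehow strips the <> characters from the matched tags (note that the 'yesno', 'close' and 'br' entries no longer have them).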

Answer 1

From the documentation for re.split:

Split string by the occurrences of pattern. If capturing parentheses are used 
in pattern, then the text of all groups in the pattern are also returned as 
part of the resulting list.
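
As a small illustration of the quoted behaviour, the sketch below runs the same lookahead pattern on a short made-up string (the string `text` and the variable names are assumptions for the example, not part of the question) with no group, with a group around the inner text only, and with a group around the whole tag:

```python
import re

# Hypothetical short input: one ordinary tag and one <%...> tag.
text = "a<br>b<%keep>c"

# No capturing group: the matched delimiters are dropped entirely.
no_group = re.split(r'<(?!%).+?>', text)
print(no_group)      # ['a', 'b<%keep>c']

# Group around the inner text only: the text is returned without the <>.
inner_group = re.split(r'<((?!%).+?)>', text)
print(inner_group)   # ['a', 'br', 'b<%keep>c']

# Group around the whole delimiter: the full tag, <> included, is returned.
outer_group = re.split(r'(<(?!%).+?>)', text)
print(outer_group)   # ['a', '<br>', 'b<%keep>c']
```

In all three cases the <%keep> tag is left alone, because the negative lookahead (?!%) rejects any "<" that is immediately followed by "%".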

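In this case, the capturing parentheses need to go around the whole match, angle brackets included, so that the <> characters are preserved.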

re.split('(<(?!%).+?>)', source_text)
['「<%sM_item2><%sM_plusnum2>の|\u3000<%sM_slot>の部分を|\u3000<%sM_change_color>に\u3000カラーリングするのですね？|', '<br>', '|「それでは\u3000<%sM_item>が\u3000１０本と|\u3000<%nM_gold>ゴールドが必要ですが\u3000よろしいですか？|', '<yesno>', '', '<close>', '']
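
The result above still contains empty strings wherever two tags are adjacent or a tag ends the text. A possible follow-up, sketched here under the assumption that those empty pieces are not wanted, is to filter them out; the names `parts`, `non_empty` and `tags` are made up for the example:

```python
import re

source_text = (
    "「<%sM_item2><%sM_plusnum2>の|　<%sM_slot>の部分を|　<%sM_change_color>に　カラーリングするのですね？|"
    "<br>|「それでは　<%sM_item>が　１０本と|　<%nM_gold>ゴールドが必要ですが　よろしいですか？|<yesno><close>"
)

# Split on <...> tags that do not start with "<%", keeping the full tag.
parts = re.split(r'(<(?!%).+?>)', source_text)

# With a single capturing group, the captured delimiters sit at odd indices.
tags = parts[1::2]
print(tags)  # ['<br>', '<yesno>', '<close>']

# Drop the empty strings produced by adjacent or trailing tags.
non_empty = [p for p in parts if p]
print(len(non_empty))  # 5
```

Because the group wraps the entire delimiter, nothing is lost in the split: joining `parts` back together reproduces `source_text` exactly.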

python

regex

python-re