String replacement: Multiline, Case-Insensitive with Special Characters

Question

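Objective: Given a string, replace every occurrence of '<?xml version="1.0" encoding="utf-8"?>' and its uppercase cousins with the empty string ''.

A str.replace() and/or re.sub() solution would be great. A solution based on the BeautifulSoup module would be considered a last resort.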

Attempt based on str.replace():

    s = '1:<?xml version="1.0" encoding="utf-8"?>\n2:<?xml version="1.0" encoding="UTF-8"?>'
    ## 1:<?xml version="1.0" encoding="utf-8"?>
    ## 2:<?xml version="1.0" encoding="UTF-8"?>
    h = '<?xml version="1.0" encoding="utf-8"?>'
    r = s.replace(h, '')
    ## 1:
    ## 2:<?xml version="1.0" encoding="UTF-8"?>

Problem: Matches in uppercase form, such as UTF-8, are not removed.

Attempt based on re.sub():

    import re
    s = '1:<?xml version="1.0" encoding="utf-8"?>\n2:<?xml version="1.0" encoding="UTF-8"?>'
    ## 1:<?xml version="1.0" encoding="utf-8"?>
    ## 2:<?xml version="1.0" encoding="UTF-8"?>
    h = '<?xml version="1.0" encoding="utf-8"?>'
    r = re.sub(h, '', s, flags=re.IGNORECASE | re.MULTILINE)
    ## 1:<?xml version="1.0" encoding="utf-8"?>
    ## 2:<?xml version="1.0" encoding="UTF-8"?>

Problem: It does not work at all. However, a simpler case works:

    import re
    s = '1:a\n2:A'
    ## 1:a
    ## 2:A
    h = 'a'
    r = re.sub(h, '', s, flags=re.IGNORECASE | re.MULTILINE)
    ## 1:
    ## 2:

I suspect the problem lies in the special characters in the string, such as <?xml, but I have not been able to find a solution.
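That suspicion can be checked directly. The sketch below compiles the header unescaped and probes what it matches; the `lookalike` string is made up for illustration and does not come from real data. Read as a regular expression, `<?` means "an optional <", `.` means "any character", and `"?` means "an optional double quote", so the pattern no longer describes the literal header.

```python
import re

h = '<?xml version="1.0" encoding="utf-8"?>'

# Compile the header as-is, i.e. with its metacharacters still active.
unescaped = re.compile(h, flags=re.IGNORECASE)

# The literal header is NOT matched: after the optional quote the pattern
# expects ">", but the text has a literal "?" at that position.
print(unescaped.search(h))  # None

# A made-up string that the unescaped pattern does accept:
# no leading "<", any character in place of ".", and no "?" before ">".
lookalike = 'xml version="1x0" encoding="utf-8">'
print(unescaped.fullmatch(lookalike) is not None)  # True
```

The simpler 'a' case works only because 'a' contains no regex metacharacters.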

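The <?xml header is introduced into my code by the xml parser via the BeautifulSoup module. I have not had much success with BeautifulSoup's own methods here, e.g. .find_all() and .replace_with(). I tried soup.decode_contents(), which works in some cases but not in others. I am not posting the examples I tried because I would rather not use the module for the specific task at hand (I have a string, I want to output a string, and I do not want BeautifulSoup to alter the string in any other way). Apologies to the BS die-hards. ;-)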

Answer 1

Yes, ? and . are regular-expression special characters. You can escape them with re.escape():

    import re
    s = '1:<?xml version="1.0" encoding="utf-8"?>\n2:<?xml version="1.0" encoding="UTF-8"?>'
    h = re.escape('<?xml version="1.0" encoding="utf-8"?>')  # <-- put re.escape() around the string
    r = re.sub(h, '', s, flags=re.IGNORECASE)                 # <-- no need for re.MULTILINE

    print(r)

Prints (the <?xml..?> strings are replaced):

    1:
    2:
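If the same removal is needed in several places, the approach above could be wrapped in a small helper. This is only a sketch: the function name `remove_literal` and its parameters are hypothetical, not part of the `re` module.

```python
import re

def remove_literal(text, literal):
    """Remove every case-insensitive occurrence of `literal` from `text`.

    Hypothetical helper: `literal` is treated as plain text, so any regex
    metacharacters in it are escaped before compiling.
    """
    pattern = re.compile(re.escape(literal), flags=re.IGNORECASE)
    return pattern.sub('', text)

s = '1:<?xml version="1.0" encoding="utf-8"?>\n2:<?xml version="1.0" encoding="UTF-8"?>'
header = '<?xml version="1.0" encoding="utf-8"?>'

print(repr(remove_literal(s, header)))  # '1:\n2:'
```

re.MULTILINE only changes how ^ and $ behave, and the escaped pattern uses neither, which is why the flag can be dropped.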

Tags: string, python-3.x, python-re