How to split a string that includes special characters

Question

string_1 = "\tVH VH VH VL N N N N N N N\n"

I am trying to split a string that contains \t and \n. When I split it with the split function, as shown below:

sep_setring = string_1.split()

Output:

['VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N']

However, I need output like this:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']

Answer 1

Use re.findall:

import re

string_1 = "\tVH VH VH VL N N N N N N N\n"
# Match runs of non-whitespace, or runs of whitespace other than a space
matches = re.findall(r'\S+|[^\S ]+', string_1)
print(matches)

This prints:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']

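Here is an explanation of the regex pattern, which matches either a run of non-whitespace characters or a run of whitespace characters other than the space: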

\S+      match one or more non whitespace characters
|        OR
[^\S ]+  match one or more whitespace characters excluding space itself
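To see how the two alternatives behave on input other than the question's string, here is a minimal sketch. The helper name tokenize_keep_whitespace and the sample string are made up for illustration; only the pattern comes from the answer above.

```python
import re

# Hypothetical helper name; the pattern is the one from the answer above.
TOKEN_PATTERN = re.compile(r'\S+|[^\S ]+')

def tokenize_keep_whitespace(text):
    """Return words plus runs of non-space whitespace, dropping plain spaces."""
    return TOKEN_PATTERN.findall(text)

sample = "\t\tVH VL\r\n"
tokens = tokenize_keep_whitespace(sample)
print(tokens)  # ['\t\t', 'VH', 'VL', '\r\n']
```

Note that adjacent tabs or newlines are grouped into a single token because of the trailing +. If each special character should be its own list item, dropping the + from the second alternative (r'\S+|[^\S ]') should do that.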

Answer 2

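You can split using lookarounds. The pattern splits after a tab, before a newline, or on a single space (note the literal space after the last |):

r"(?<=\t)|(?=\n)| "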

Example

import re
string_1 = "\tVH VH VH VL N N N N N N N\n"
sep_setring = re.split(r"(?<=\t)|(?=\n)| ", string_1)
print(sep_setring)

Output

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
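One caveat worth checking: re.split only splits on empty (zero-width) matches in Python 3.7 or later, so the lookaround approach needs at least that version. The two answers also differ when the input contains consecutive spaces. The sketch below uses a made-up input with a double space to show the difference; it is an illustration, not part of either original answer.

```python
import re

text = "\tVH  VL\n"  # note the two spaces between VH and VL

# Lookaround split: every single space is a separator, so two adjacent
# spaces leave an empty string between them.
split_result = re.split(r"(?<=\t)|(?=\n)| ", text)

# findall: spaces are never matched, so no empty items are produced.
findall_result = re.findall(r'\S+|[^\S ]+', text)

print(split_result)    # ['\t', 'VH', '', 'VL', '\n']
print(findall_result)  # ['\t', 'VH', 'VL', '\n']
```

If the input may contain repeated spaces, the findall version is probably the safer choice; alternatively, changing the last alternative of the split pattern to " +" should collapse each run of spaces into one separator.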
