How to split a string that includes special characters

string_1 = "\tVH VH VH VL N N N N N N N\n"

Here I am trying to split a string that contains \t and \n. When I split it with the split function, as shown below:

sep_setring = string_1.split()

Output:

['VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N']

However, I need output like this:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']

Use re.findall:

import re
string_1 = "\tVH VH VH VL N N N N N N N\n"
matches = re.findall(r'\S+|[^\S ]+', string_1)
print(matches)

This prints:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']

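Here is an explanation of the regex pattern, which matches either a run of non-whitespace characters or a run of whitespace characters other than the space: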

\S+      match one or more non-whitespace characters
|        OR
[^\S ]+  match one or more whitespace characters, excluding the space itself
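
To see how the two alternatives behave on other inputs, here is a minimal sketch. The helper name `tokenize_keep_whitespace` and the second sample string are made up for illustration; only the pattern itself comes from the answer above.

```python
import re

# Pattern from the answer: a run of non-whitespace characters, OR a run of
# whitespace characters other than the plain space.
TOKEN_PATTERN = re.compile(r'\S+|[^\S ]+')

def tokenize_keep_whitespace(text):
    """Hypothetical helper: return words plus any non-space whitespace runs."""
    return TOKEN_PATTERN.findall(text)

original = tokenize_keep_whitespace("\tVH VH VL N\n")
# Assumed extra input: doubled tab, doubled space, and a Windows line ending.
variant = tokenize_keep_whitespace("\t\tA  B\r\n")

print(original)  # ['\t', 'VH', 'VH', 'VL', 'N', '\n']
print(variant)   # ['\t\t', 'A', 'B', '\r\n']
```

Plain spaces never show up in the result because neither alternative can match them, so repeated spaces are skipped silently and adjacent tabs or line-ending characters are grouped into one token.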

You can split using lookarounds:

(?<=\t)|(?=\n)| 
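  • (?<=\t) asserts a tab directly to the left of the current position
  • | or
  • (?=\n) asserts a newline directly to the right of the current position
  • | or
  • ' ' (a single literal space) matches a space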

Example

import re
string_1 = "\tVH VH VH VL N N N N N N N\n"
sep_setring = re.split(r"(?<=\t)|(?=\n)| ", string_1)
print(sep_setring)

Output

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
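
As a rough comparison of the two approaches, the sketch below runs both patterns on a made-up input (not from the question) that has a doubled space and a tab in the middle of the line. It assumes Python 3.7 or later, where re.split can split on empty matches such as these lookarounds.

```python
import re

FINDALL_PATTERN = r'\S+|[^\S ]+'
SPLIT_PATTERN = r"(?<=\t)|(?=\n)| "

# Hypothetical input: doubled space between A and B, tab between B and C.
sample = "\tA  B\tC\n"

found = re.findall(FINDALL_PATTERN, sample)
pieces = re.split(SPLIT_PATTERN, sample)

print(found)   # ['\t', 'A', 'B', '\t', 'C', '\n']
print(pieces)  # ['\t', 'A', '', 'B\t', 'C', '\n']
```

On the string from the question both give the same list. On messier input the split version yields an empty string for each extra space and leaves a mid-line tab attached to the preceding word, so the findall version is probably the safer choice if the input format can vary.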