Split a string in Python by a specific line in text
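I want to split a body of text wherever there is a line that contains only "----". I'm using the re.split(..) method, but it isn't behaving as expected. What am I missing?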
import re
s = """width:5
----
This is a test sentence to test the width thing"""
print(re.split('^----$', s))
This just prints:
['width:5\n----\nThis is a test sentence to test the width thing']
You are missing the MULTILINE flag:
print(re.split(r'^----$', s, flags=re.MULTILINE))
Without it, ^ and $ apply to the whole string s rather than to each line within it:
re.MULTILINE
When specified, the pattern character '^' matches at the beginning of
the string and at the beginning of each line (immediately following
each newline); and the pattern character '$' matches at the end of the
string and at the end of each line (immediately preceding each
newline).
Demo:
>>> import re
>>>
>>> s = """width:5
... ----
... This is a test sentence to test the width thing"""
>>>
>>> print(re.split(r'^----$', s, flags=re.MULTILINE))
['width:5\n', '\nThis is a test sentence to test the width thing']
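To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without re.MULTILINE. The final clean-up step (stripping the leftover newlines) is an assumption about what the asker ultimately wants, not something the answer itself does.

```python
import re

s = """width:5
----
This is a test sentence to test the width thing"""

# Without re.MULTILINE, ^ and $ only anchor at the ends of the whole string,
# so the pattern never matches and the string comes back unsplit.
unsplit = re.split(r'^----$', s)

# With re.MULTILINE, ^ and $ also match at every line boundary.
parts = re.split(r'^----$', s, flags=re.MULTILINE)

# Optional clean-up (an assumption): drop the newlines left around the separator.
cleaned = [part.strip('\n') for part in parts]

print(unsplit)  # ['width:5\n----\nThis is a test sentence to test the width thing']
print(parts)    # ['width:5\n', '\nThis is a test sentence to test the width thing']
print(cleaned)  # ['width:5', 'This is a test sentence to test the width thing']
```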
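You can also do without ^ and $. Without the MULTILINE flag they anchor the match to the start and end of the whole string. Use positive look-arounds instead, which require a \n on each side of ---- while keeping those newlines in the results:
>>> print(re.split('(?<=\n)----(?=\n)', s))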
['width:5\n', '\nThis is a test sentence to test the width thing']
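One difference between the look-around pattern and the MULTILINE version shows up when the separator is the first line of the text. The sketch below uses a made-up input (the name leading is hypothetical) to illustrate it.

```python
import re

# Hypothetical input where the separator is the very first line.
leading = "----\nbody text"

# The look-behind requires a newline before ----, so nothing matches here.
lookaround = re.split('(?<=\n)----(?=\n)', leading)

# ^ with re.MULTILINE also matches at the very start of the string.
multiline = re.split(r'^----$', leading, flags=re.MULTILINE)

print(lookaround)  # ['----\nbody text']
print(multiline)   # ['', '\nbody text']
```

The same applies to a separator on the last line, where the look-ahead finds no trailing newline.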
Another way to split, without using a regular expression:
s.split("\n----\n")
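The plain str.split call only matches when the separator is surrounded by exactly "\n" on both sides. If that is too strict, a line-based loop is one possible alternative. The helper below, split_on_separator, is a hypothetical name introduced for illustration and is only a sketch.

```python
def split_on_separator(text, separator="----"):
    """Split text into chunks at lines that consist solely of `separator`."""
    chunks, current = [], []
    # splitlines() handles \n and \r\n alike and drops the line endings.
    for line in text.splitlines():
        if line == separator:
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    return chunks


s = """width:5
----
This is a test sentence to test the width thing"""

print(s.split("\n----\n"))     # ['width:5', 'This is a test sentence to test the width thing']
print(split_on_separator(s))   # same result as above
print(split_on_separator("a\r\n----\r\nb"))  # ['a', 'b']
```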
A shorter pattern that gives the expected result for this input:
Input:
re.split('[\n-]+', s)
Output:
['width:5', 'This is a test sentence to test the width thing']
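Note that the character class [\n-]+ matches any run of newlines and hyphens, not just a line of four hyphens. The sketch below uses a made-up input (hypothetical, not from the question) with an extra line break and a hyphenated word to show where the two approaches diverge.

```python
import re

# Hypothetical input with an extra line break and a hyphenated word.
text = "width:5\n----\nA well-known test\nsecond line"

# Splits on every newline and every hyphen.
greedy = re.split('[\n-]+', text)

# Splits only on a line that is exactly ----.
exact = re.split(r'^----$', text, flags=re.MULTILINE)

print(greedy)  # ['width:5', 'A well', 'known test', 'second line']
print(exact)   # ['width:5\n', '\nA well-known test\nsecond line']
```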
Have you tried:
result = re.split("^----$", subject_text, 0, re.MULTILINE)
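The 0 in the call above is maxsplit, the third positional parameter of re.split; flags comes fourth. Passing re.MULTILINE as the third positional argument would therefore set maxsplit instead of enabling the flag. A minimal sketch using keyword arguments, with a made-up input string, avoids that ambiguity:

```python
import re

# Hypothetical input with two separator lines.
text = "a\n----\nb\n----\nc"

# Keyword arguments make it explicit which value is which.
by_keyword = re.split(r'^----$', text, maxsplit=0, flags=re.MULTILINE)

# With maxsplit=1 only the first separator is used.
first_only = re.split(r'^----$', text, maxsplit=1, flags=re.MULTILINE)

print(by_keyword)  # ['a\n', '\nb\n', '\nc']
print(first_only)  # ['a\n', '\nb\n----\nc']
```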