How to print portion of the string from 'startswith' till 'endswith'

Question

I would like to save the portions of an original text file that can be identified between 'startswith' and 'endswith' strings to a new text file.

Example: the input text file contains the following lines:

...abc…
...starts with string...
...def...
...ends with string...
...ghi...

...jkl...
...starts with string...
...mno...
...ends with string...
...pqr...

I am interested in extracting the following lines into the output text file:

starts with string...def...ends with string
starts with string...mno...ends with string

My code below returns an empty list [ ]. Please help me correct it.

with open('file_in.txt','r') as fi:
    id = []
    for ln in fi:
        if ln.startswith("start with string"):
            if ln.endswith("ends with string"):
                id.append(ln[:])
                with open(file_out.txt, 'a', encoding='utf-8') as fo:
                    fo.write (",".join(id))
print(id)

I want file.out.txt to contain all strings that start with "start with string" and end with "ends with string".

Answer 1

startswith and endswith return True or False rather than a position you can use to slice your string. Try find or index instead. For example:

start = 'starts with string'
end = 'ends with string'
s = '...abc… ...starts with string... ...def... ...ends with string... ...ghi...'

sub = s[s.find(start):s.find(end) + len(end)]
print(sub)
# starts with string... ...def... ...ends with string

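You will need to add some checks in your loop to see whether the start and end strings are present, because find returns -1 when there is no match, and that leads to unexpected slicing.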
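
The checks for a missing marker can be sketched as follows. This is a minimal illustration rather than part of the original answer: the helper name between is hypothetical, and it assumes the whole text is available as one string. It stops as soon as either marker is missing and also handles several start/end pairs in the same string.

```python
def between(text, start, end):
    """Return every span of text that runs from start through end (markers included)."""
    spans = []
    pos = 0
    while True:
        i = text.find(start, pos)
        if i == -1:           # no further start marker
            break
        j = text.find(end, i + len(start))
        if j == -1:           # start marker without a matching end marker
            break
        spans.append(text[i:j + len(end)])
        pos = j + len(end)    # continue searching after this span
    return spans

s = '...abc... ...starts with string... ...def... ...ends with string... ...ghi...'
print(between(s, 'starts with string', 'ends with string'))
# ['starts with string... ...def... ...ends with string']
```

Because both find results are compared with -1 before slicing, a missing marker yields an empty list instead of a misleading slice.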

Answer 2

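You can use a separate variable to indicate whether the current line is part of a section of interest, and toggle this variable based on the start and stop markers. You can then also turn this function into a generator: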

def extract(fh, start, stop):
    sub = False
    for line in fh:
        sub |= start in line
        if sub:
            yield line
            sub ^= stop in line

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))

In Python 3.8 and later you can use assignment expressions:

import itertools as it

def extract(fh, start, stop):
    while any(start in (line := x) for x in fh):
        yield line
        yield from it.takewhile(lambda x: stop not in x, ((line := y) for y in fh))
        yield line

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))

Variation: excluding the start and stop markers

If you want to exclude the start and stop markers from the output, we can again use itertools.takewhile:

import itertools as it

def extract(fh, start, stop):
    while any(start in x for x in fh):
        yield from it.takewhile(lambda x: stop not in x, fh)

with open('test.txt') as fh:
    print(''.join(extract(fh, 'starts with string', 'ends with string')))
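
The question asks for one output line per block, which the generators above do not produce because they yield individual lines. A minimal sketch of one way to group them, not part of the original answer: the name extract_blocks is hypothetical, and io.StringIO stands in for the input file so the example is self-contained.

```python
import io

def extract_blocks(fh, start, stop):
    """Yield one list of lines per start..stop block (marker lines included)."""
    block = None                      # None means we are outside a block
    for line in fh:
        if block is None and start in line:
            block = []                # a start marker opens a new block
        if block is not None:
            block.append(line.rstrip('\n'))
            if stop in line:          # a stop marker closes the block
                yield block
                block = None

sample = io.StringIO(
    "...abc...\n"
    "...starts with string...\n"
    "...def...\n"
    "...ends with string...\n"
    "...ghi...\n"
)
for block in extract_blocks(sample, 'starts with string', 'ends with string'):
    print(''.join(block))             # one joined line per block
```

In this sketch a block that is opened but never closed before the end of the file is dropped; whether that is the right behaviour depends on the data.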

Answer 3

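At the end of every line there is a character that tells the computer to start a new line. I am assuming here that "start with string" and "ends with string" are on the same line. If that is not the case, add --"id.append(ln[:])"-- right below the first if statement.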

Try

ln.endswith("ends with string"+'\n' )

or

ln.endswith("ends with string"+'\r' +'\n')

with open(r'C:\Py\testing.txt', 'r') as fi:   # raw string so '\t' is not read as a tab
    id = []
    copy_line = False   # True while we are between the start and end markers
    for ln in fi:
        if "starts with string" in ln:
            copy_line = True
        if copy_line:
            id.append(ln)
        if "ends with string" in ln:
            copy_line = False

    # write all collected lines once the input has been read
    with open(r'C:\Py\testing_out.txt', 'a', encoding='utf-8') as fo:
        fo.write(",".join(id))

print(id)
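
The trailing-newline point made at the start of this answer can be checked directly. A minimal sketch, using an illustrative string in place of a line read from the file: endswith fails while the newline is still attached and succeeds once it has been stripped.

```python
ln = "starts with string...def...ends with string\n"   # a line as read from a text file

print(ln.endswith("ends with string"))                  # False: the line ends with '\n'
print(ln.rstrip("\r\n").endswith("ends with string"))   # True: newline characters removed first
```

Stripping before the comparison also avoids having to guess which newline characters the file uses.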
