Python: write lines of text between a range of numbers to a new file
Sample text file:
1. some text here
2. more text here
more text here
more text here
more text here
3. more text here
more text here
more text here
more text here
4. more text here
more text here
more text here
more text here
5. more text here
more text here
more text here
more text here
6. last text here
more text here
more text here
more text here
1. new text here
more text here
more text here
2. some more text
more text here
3. a bit more text
more text here
4. ok this is enough text.
1. nawww heres a bit more text.
more text here
more text here
2. okay this is the final text.
more text here
more text here
3. just to be sure this is last.
more text here
1. etc
This is a sample of the text I have, but much shorter.
I have this Python code as a start:
with open("text.txt") as txt_file:
    lines = txt_file.readlines()
    for line in lines:
        if line.startswith('1.'):
            print(line)
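But I don't know how to write all the lines from one "1." up to the next "1." into a separate file.
I assume I need some kind of for loop inside the last if statement, but I'm not sure how to go about it.
An example of the result I expect:
If a line starts with "1.", write the text to a new text file until the next line that starts with "1.", then start the whole process over until there is no more text.
So for the sample text above, I should end up with 4 files.
In this case, file number 1 would contain all the text from paragraphs 1-6: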
1. some text here
2. more text here
more text here
more text here
more text here
3. more text here
more text here
more text here
more text here
4. more text here
more text here
more text here
more text here
5. more text here
more text here
more text here
more text here
6. last text here
more text here
more text here
more text here
File number 2 would contain all the text of paragraphs 1-4 under the second "1." in the sample text file:
1. new text here
more text here
more text here
2. some more text
more text here
3. a bit more text
more text here
4. ok this is enough text.
File number 3 would contain all the text of paragraphs 1-3 under the third "1." in the sample text file:
1. nawww heres a bit more text.
more text here
more text here
2. okay this is the final text.
more text here
more text here
3. just to be sure this is last.
more text here
And so on...
I hope my explanation is accurate and makes sense.
A simple approach is to split the file at every line that starts with "1.":
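import re

with open("text.txt") as txt_file:
    content = txt_file.read()

chunks = []
# Split at every position where a line starts with "1.".
for match in re.split(r"(?=^1\.)", content, flags=re.MULTILINE):
    if match:  # drop the empty string produced before the first "1."
        chunks.append(match)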
Now you have a list of texts, each starting with "1.", which you can iterate over and save to individual files.
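Saving the chunks could look something like the sketch below. The helper names split_on_ones and write_chunks and the part-N.txt naming pattern are my own assumptions, not part of the answer above. Splitting on a zero-width lookahead like this relies on Python 3.7 or newer, and any text that appears before the first "1." ends up as a chunk of its own.

```python
import re
from pathlib import Path


def split_on_ones(content):
    """Split text into chunks, cutting before every line that begins with '1.'."""
    parts = re.split(r"(?=^1\.)", content, flags=re.MULTILINE)
    return [part for part in parts if part]


def write_chunks(chunks, out_dir=".", pattern="part-{}.txt"):
    """Write each chunk to its own numbered file and return the paths.

    The file name pattern is a hypothetical choice for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = out_dir / pattern.format(number)
        path.write_text(chunk)
        paths.append(path)
    return paths
```

Usage would be along the lines of write_chunks(split_on_ones(content), "output"), which numbers the files from 1 in the order the groups appear.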
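Here is another solution. You can adapt it as needed: I find the indexes of all lines that start with "1." and then write the lines between those indexes to new files.
with open('test.txt') as f:
    lines = f.readlines()

# Indexes of every line that starts a new group.
ones_index = []
for idx, line in enumerate(lines):
    if line.startswith('1.'):
        ones_index.append(idx)
# Sentinel so the last group runs to the end of the file.
ones_index.append(len(lines))

for i in range(len(ones_index) - 1):
    start = ones_index[i]
    stop = ones_index[i + 1]
    with open('newfile-{}.txt'.format(i), 'w') as g:
        # readlines() keeps the newlines, so join without adding more.
        g.write(''.join(lines[start:stop]))
Edit: I just realized this did not handle the last group at first. I added a line (the sentinel index) to fix that.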
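You can create a counter variable n = 0 and increase it every time a line starts with "1."; n then tells you which group the current line belongs to:
n = 0  # number of groups seen so far
with open("text.txt") as txt_file:
    for line in txt_file:
        if line.startswith("1."):
            n += 1  # a new group starts here
        if n:  # skip anything before the first "1."
            print(n, line, end="")
If you like, you can create a dict and store the lines of each group in a list. You could then use the pandas library to create a CSV file. If I understood you correctly, I hope this helps.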
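The dict idea could be sketched roughly as follows. Note the swap: this uses the standard-library csv module in place of pandas, so it runs without extra dependencies. The function names group_lines and write_groups_csv and the two-column layout are assumptions made for illustration.

```python
import csv


def group_lines(lines):
    """Map group number -> list of lines; a new group starts at each '1.' line."""
    groups = {}
    current = 0
    for line in lines:
        if line.startswith("1."):
            current += 1
            groups[current] = []
        if current:  # ignore anything before the first "1."
            groups[current].append(line.rstrip("\n"))
    return groups


def write_groups_csv(groups, csv_path):
    """Write one CSV row per line, tagged with its group number."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["group", "line"])  # hypothetical column names
        for number, group in groups.items():
            for line in group:
                writer.writerow([number, line])
```

Since an open file can be iterated line by line, group_lines(open_file) works directly on the result of open("text.txt").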
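If you want to avoid reading the whole file into memory, you can write a generator that collects groups from the file line by line and yields each group once it is complete. Something like: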
def splitgroups(text):
    lines = None
    for line in text:
        if line.startswith("1."):
            if lines is not None:
                yield lines
            lines = line
        elif lines is not None:
            # Lines before the first "1." belong to no group and are skipped.
            lines += line
    if lines is not None:
        yield lines

filepath = "text.txt"
with open(filepath) as text:
    # iterate over groups rather than lines
    # and do what you want with each chunk:
    for group in splitgroups(text):
        print("*********")
        print(group)
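To write each group to its own file instead of printing it, a variation along these lines might work. It is only a sketch: split_groups and write_groups are hypothetical names, the group-N.txt pattern is an assumption, and unlike the generator above this version keeps any text before the first "1." as a group of its own.

```python
def split_groups(lines):
    """Yield one string per group; a group starts at a line beginning with '1.'."""
    group = []
    for line in lines:
        if line.startswith("1.") and group:
            yield "".join(group)
            group = []
        group.append(line)
    if group:
        yield "".join(group)


def write_groups(source_path, pattern="group-{}.txt"):
    """Stream source_path and write each group to its own file; return the count."""
    count = 0
    with open(source_path) as source:
        for count, group in enumerate(split_groups(source), start=1):
            with open(pattern.format(count), "w") as target:
                target.write(group)
    return count
```

Only one group is held in memory at a time, so this should also cope with files that are too large to read in one go.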