Load a text file paragraph into a string without libraries

Question

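Sorry if this question seems a bit silly to some of you, but I am a complete beginner at programming in Python, so I'm not very good yet and still have a lot to learn. Basically, I have a long text file divided into paragraphs. Sometimes the line breaks between paragraphs are doubled or tripled, which makes the task harder, so I added a small check, and it seems to work: I have a variable called "paragraph" that tells me which paragraph I am currently in.

Now I need to scan this text file and search it for certain sequences of words, but the line breaks are the biggest enemy here. For example, if I have string = "dummy text" and I am examining this: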

"random questions about files with a dummy
 text and strings

 hey look a new paragraph here"

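As you can see, there is a line break between "dummy" and "text", so I can't simply read the file line by line. So I was thinking of loading an entire paragraph directly into a string; that way I could also remove punctuation and other things more easily, and then check directly whether the paragraph contains those word sequences. All of this has to be done without libraries. My paragraph counter already works while reading the file, so if it is possible to load a whole paragraph into a string, should I basically use something like ".join" until the paragraph counter increases by 1, since that means we are in the next paragraph? Any ideas?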

Answer 1

You can strip out the newline characters. Here is an example from another question:

book_list = []
with open('resources.txt', 'r') as data:  # `with` closes the file when done
    for line in data:
        new_line = line.rstrip('\n')  # drop the trailing newline
        book_list.append(new_line)
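Stripping the newlines still leaves a list of separate lines, so a phrase that spans two lines has to be searched for after the lines are put back together. A minimal sketch, assuming the lines are joined with a space and repeated whitespace is then collapsed; the hard-coded list stands in for the file's lines:

```python
# Sample lines as they would be read from the file (continuation line starts with a space)
lines = ["random questions about files with a dummy\n", " text and strings\n"]

# Strip the trailing newline from each line, as in the answer above
book_list = [line.rstrip('\n') for line in lines]

# Join with a space so the last word of one line and the first word of the next stay apart
joined = " ".join(book_list)

# Collapse repeated whitespace (the leading space on line 2 would otherwise give "dummy  text")
normalized = " ".join(joined.split())

print("dummy text" in normalized)  # → True
```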

Answer 2

This should do the trick. It's short and elegant:

with open('dummy text.txt') as file:
    data = file.read().replace('\n', '')
print(data)  # prints the whole file as a single line

The output is:

"random questions about files with a dummy text and strings hey look a new paragraph here"
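Replacing '\n' with an empty string only produces the output above because the continuation lines in the sample happen to start with a space. A minimal sketch, assuming a hypothetical file whose lines do not begin with a space, shows the difference; replacing each newline with a space instead keeps the words apart. Either way, the whole file becomes one string, so the paragraph boundaries are lost.

```python
# Hypothetical sample where the continuation lines do NOT start with a space
text = "random questions about files with a dummy\ntext and strings\n\n\nhey look a new paragraph here"

glued = text.replace('\n', '')    # newlines removed: neighbouring words are glued together
spaced = text.replace('\n', ' ')  # newlines become spaces: words stay separate

print("dummy text" in glued)   # → False (the text now reads "dummytext")
print("dummy text" in spaced)  # → True
```

The runs of spaces left where the blank lines were can be collapsed with " ".join(spaced.split()) if needed.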

Answer 3

I don't think you need to overthink this. This is a very common pattern for this kind of problem:

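paragraphs = []
lines = []
with open('text.txt') as f:
    for line in f:
        if not line.strip():  # empty line: the current paragraph has ended
            if lines:
                paragraphs.append(" ".join(lines))
                lines = []
        else:
            lines.append(line.strip())  # drop the newline and surrounding spaces
if lines:  # the last paragraph may not be followed by an empty line
    paragraphs.append(" ".join(lines))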

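If a line is empty after stripping, you have hit a second \n in a row (a blank line), which means the paragraph is over and you have to join its preceding lines.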

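Right after joining, you clear the collected lines (lines = []). So if you hit a third \n, lines is already empty and the if lines: check skips it. This way you never join the same paragraph twice or add an empty paragraph.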
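To tie this back to the paragraph counter and word search from the question, here is a minimal sketch, assuming paragraphs are separated by one or more blank lines and that a match should ignore case, punctuation and line breaks. The helper names split_paragraphs, normalize and find_phrase are hypothetical, a list of strings stands in for the open file, and no imports are used.

```python
def split_paragraphs(lines):
    """Group lines into paragraph strings; any run of blank lines ends a paragraph."""
    paragraphs = []
    current = []
    for line in lines:
        if not line.strip():              # blank line: close the current paragraph
            if current:
                paragraphs.append(" ".join(current))
                current = []
        else:
            current.append(line.strip())
    if current:                           # last paragraph may not end with a blank line
        paragraphs.append(" ".join(current))
    return paragraphs


def normalize(text):
    """Lowercase, turn every non-alphanumeric character (punctuation, apostrophes) into a space, collapse whitespace."""
    kept = [ch.lower() if ch.isalnum() else " " for ch in text]
    return " ".join("".join(kept).split())


def find_phrase(paragraphs, phrase):
    """Return the 1-based numbers of the paragraphs containing `phrase` as whole words."""
    target = " " + normalize(phrase) + " "    # padding avoids matching "dummy tex" inside "dummy text"
    return [number for number, paragraph in enumerate(paragraphs, start=1)
            if target in " " + normalize(paragraph) + " "]


sample = [
    "random questions about files with a dummy\n",
    " text and strings\n",
    "\n",
    "\n",
    " hey look a new paragraph here\n",
]
paragraphs = split_paragraphs(sample)
print(paragraphs)
print(find_phrase(paragraphs, "dummy text"))  # → [1]
```

With a real file, the same call works on the file object, since iterating over it yields lines: with open('text.txt') as f: paragraphs = split_paragraphs(f).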

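If you need to know when you are on the last line, try this pattern, which always reads one line ahead:

with open('text.txt') as f:
    line0 = f.readline()
    while line0:  # skipped entirely if the file is empty
        line = f.readline()
        if not line:  # no next line, so `line0` was the last line
            # do what you have to do with the last line, `line0`
            break
        # do what you have to do with a line that is not the last, `line0`
        line0 = line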
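The same look-ahead idea works with any iterable, not only readline. Below is a minimal sketch with a hypothetical helper mark_last that pairs every line with a flag saying whether it is the last one, using the built-in next(iterator, default); a list of strings stands in for the file.

```python
def mark_last(lines):
    """Yield (line, is_last) pairs by always holding one line back."""
    it = iter(lines)
    previous = next(it, None)
    if previous is None:          # empty input: nothing to yield
        return
    for line in it:
        yield previous, False     # a next line exists, so `previous` is not the last
        previous = line
    yield previous, True          # the loop ended, so `previous` was the last line


for line, is_last in mark_last(["first\n", "second\n", "third\n"]):
    print(repr(line), is_last)
```

Because a file object is iterable, mark_last(f) can be used directly inside a with open('text.txt') as f: block.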
