How can I count the paragraphs of a text file using Python?
I'm trying to write a book cipher decoder. Here is what I have so far:
# Read the key file and the book into strings
with open("code.txt", "r") as f:
    code = f.read()
with open("book.txt", "r") as f:
    book = f.read()

code_line = 0
while code_line < 6:
    # Each key line is "paragraph line word", separated by spaces
    sl = code.split('\n')[code_line]
    paragraph_num, line_num, word_num = (int(n) for n in sl.split())
    code_line = code_line + 1
The loop updates the paragraph, line, and word variables, and that part works.
What I need now is a way to select the paragraph, then the line, then the word; a for loop inside the while loop would do that perfectly.
In other words, I want to get word number "word_num" from paragraph number "paragraph_num" and line number "line_num".
This is my code file, which I'm trying to turn into words:
"paragraph number","line number","word number"
70 1 3
50 2 2
21 2 9
28 1 6
71 2 2
27 1 4
And I want my output to look like this:
word
word
word
word
word
word
My book (the file I need to get the words from) looks like this:
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word
Theory
If you want to extract the paragraphs from the text, you can split it on "\n\n":
>>> "word\n\nword\nword\n\nword".split("\n\n")
['word', 'word\nword', 'word']
You now have a list of paragraphs. Each paragraph can be split on "\n" to get a list of lines.
Each line can then be split with split() and no arguments to get a list of words (this also discards extra whitespace).
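Splitting on exactly "\n\n" assumes paragraphs are separated by one truly empty line. If blank lines in your book may contain spaces, or several blank lines can appear in a row, a regular expression is more forgiving. The following is a minimal sketch under that assumption; split_paragraphs and count_paragraphs are hypothetical helper names, not part of any library:

```python
import re


def split_paragraphs(text):
    """Return the paragraphs of text.

    Any run of blank lines (empty or containing only whitespace)
    counts as a single paragraph separator.
    """
    # A newline, optional whitespace (which may include more newlines),
    # then another newline.
    parts = re.split(r"\n\s*\n", text.strip())
    # Drop empty pieces, e.g. when the whole text is empty.
    return [p for p in parts if p.strip()]


def count_paragraphs(text):
    """Number of paragraphs found by split_paragraphs."""
    return len(split_paragraphs(text))


sample = "word word\nword\n\n  \n\nword boat\n\n\nword"
print(count_paragraphs(sample))             # 3
print(split_paragraphs(sample)[1].split())  # ['word', 'boat']
```

To count the paragraphs of a file, pass in its contents, e.g. count_paragraphs(open("book.txt").read()).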
Nested loops
text = """word word word word word word word word word
word word word word word word word
word word word word word word word word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word word word word

word word word word boat word word word word word
word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word

word word word word word word word word word word word
word word word word word word word word word word word word word word
word word word word word word word word word word word word word word word word word word word word word word"""
# i = paragraph index, j = line index in the paragraph, k = word index in the line
for i, paragraph in enumerate(text.split("\n\n")):
    for j, line in enumerate(paragraph.split("\n")):
        for k, word in enumerate(line.split()):
            print("%d, %d, %d : %s" % (i, j, k, word))
It prints (only the first lines are shown):
0, 0, 0 : word
0, 0, 1 : word
0, 0, 2 : word
0, 0, 3 : word
0, 0, 4 : word
0, 0, 5 : word
0, 0, 6 : word
0, 0, 7 : word
0, 0, 8 : word
0, 1, 0 : word
0, 1, 1 : word
0, 1, 2 : word
0, 1, 3 : word
0, 1, 4 : word
0, 1, 5 : word
0, 1, 6 : word
0, 2, 0 : word
0, 2, 1 : word
0, 2, 2 : word
0, 2, 3 : word
0, 2, 4 : word
0, 2, 5 : word
0, 2, 6 : word
0, 2, 7 : word
0, 2, 8 : word
0, 2, 9 : word
0, 2, 10 : word
0, 2, 11 : word
0, 2, 12 : word
0, 2, 13 : word
0, 2, 14 : word
0, 2, 15 : word
0, 2, 16 : word
0, 2, 17 : word
0, 2, 18 : word
0, 2, 19 : word
0, 2, 20 : word
0, 3, 0 : word
0, 3, 1 : word
0, 3, 2 : word
0, 3, 3 : word
0, 3, 4 : word
0, 3, 5 : word
0, 3, 6 : word
0, 3, 7 : word
0, 3, 8 : word
0, 3, 9 : word
0, 3, 10 : word
0, 3, 11 : word
0, 3, 12 : word
0, 3, 13 : word
0, 3, 14 : word
0, 3, 15 : word
0, 3, 16 : word
0, 3, 17 : word
1, 0, 0 : word
1, 0, 1 : word
1, 0, 2 : word
1, 0, 3 : word
1, 0, 4 : boat
1, 0, 5 : word
1, 0, 6 : word
The loops make it easy to see which indices you need.
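If you would rather keep the loops than build a nested list, a minimal sketch (build_index is a hypothetical name) stores every word in a dictionary keyed by its (paragraph, line, word) indices, all starting at 0. A missing key can then be detected with dict.get instead of raising IndexError:

```python
def build_index(text):
    """Map (paragraph, line, word) -> word, with all indices starting at 0."""
    index = {}
    for i, paragraph in enumerate(text.split("\n\n")):
        for j, line in enumerate(paragraph.split("\n")):
            for k, word in enumerate(line.split()):
                index[(i, j, k)] = word
    return index


text = "word word\nword word word\n\nword word word word boat\nword"
index = build_index(text)
print(index[(1, 0, 4)])      # boat
print(index.get((5, 0, 0)))  # None: there is no sixth paragraph
```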
Nested list comprehension
If you want fast lookups, you can use a nested list comprehension to build a "3D list":
table = [[line.split() for line in paragraph.split("\n")] for paragraph in text.split("\n\n")]
The resulting table is:
[[['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word']], [['word', 'word', 'word', 'word', 'boat', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word']], [['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word'], ['word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word', 'word']]]
You can then look up the word you want like this:
table[1][0][4]
# "boat"
If you have a list of (paragraph, line, word) tuples:
codes = [
    (1, 0, 4),
    (2, 1, 3)
]
for i, j, k in codes:
    print(table[i][j][k])
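Putting the pieces together for the key file from the question: the keys there look 1-based (line 1, word 3), while the table above is indexed from 0, so each number probably needs 1 subtracted. The sketch below assumes that, plus one "paragraph line word" key per line; decode and build_table are hypothetical names, and lines that are not three integers (such as a header) are skipped:

```python
def build_table(text):
    """Nest the text as paragraphs -> lines -> words (all 0-indexed)."""
    return [[line.split() for line in paragraph.split("\n")]
            for paragraph in text.split("\n\n")]


def decode(book_text, key_text, one_based=True):
    """Return the book word selected by each "paragraph line word" key line.

    Lines that are not exactly three non-negative integers (for example a
    header line) are skipped.
    """
    table = build_table(book_text)
    offset = 1 if one_based else 0
    words = []
    for key_line in key_text.splitlines():
        fields = key_line.split()
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue  # skip headers and blank lines
        i, j, k = (int(f) - offset for f in fields)
        if min(i, j, k) < 0:
            raise ValueError(f"key out of range: {key_line!r}")
        words.append(table[i][j][k])  # IndexError if a key is too large
    return words


book = "w1 w2 w3\nw4 boat\n\nw6 w7\nw8 w9 w10"
key = '"paragraph number","line number","word number"\n1 2 2\n2 2 3'
print(decode(book, key))  # ['boat', 'w10']
```

With real files you would call decode(open("book.txt").read(), open("code.txt").read()) and print each returned word on its own line. If your keys turn out to be 0-based, pass one_based=False.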
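In case anyone wants a somewhat different approach: since I strongly believe this is the "book cipher" discussed in arnold/book cipher with python*, I'm posting my code here. It treats the book as one flat list of words rather than paragraphs and lines. Please let me know if I've misunderstood.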
# Replace "document1.txt" with the name of your book / document.
BOOK = "document1.txt"  # The book text ("Word Word Word ..."); the real words will of course differ

def GetBookContent(path):
    """Read the book and return it as a flat list of words."""
    with open(path, "r") as f:
        return f.read().split()

boktxt = GetBookContent(BOOK)

# Encode: print the 0-based position of each input word in the book
words = input("input text: ").split()
print("\nyou entered these words:\n", words)
for word in words:
    if word in boktxt:
        print(boktxt.index(word))  # first occurrence only
    else:
        print(word, "is not in the book")

# Decode: print the book word at each position in the key sequence
klist = input("input key-sequence separated by spaces: ").split()
for key in klist:
    print(boktxt[int(key)])
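boktxt.index(word) scans the book from the beginning for every input word. For a long book, one alternative (a sketch; build_positions, encode_words and decode_positions are hypothetical names) is to build a dictionary of first positions once and look words up in it. Like list.index, it records only the first occurrence of each word, and positions start at 0:

```python
def build_positions(book_words):
    """Map each word to the position of its first occurrence in the book."""
    positions = {}
    for pos, word in enumerate(book_words):
        positions.setdefault(word, pos)  # keep only the first occurrence
    return positions


def encode_words(message_words, positions):
    """Return the book position of each message word (None if missing)."""
    return [positions.get(word) for word in message_words]


def decode_positions(keys, book_words):
    """Return the book word at each position."""
    return [book_words[k] for k in keys]


boktxt = "the cat sat on the mat".split()
positions = build_positions(boktxt)
keys = encode_words(["the", "mat", "dog"], positions)
print(keys)  # [0, 5, None]
print(decode_positions([k for k in keys if k is not None], boktxt))  # ['the', 'mat']
```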