Counting the number of capitalized words in a text file in Python
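I need help sorting through a text file.
I have tried multiple variations of for loops. I have also tried stripping out all the spaces and counting the letters in the file individually, as well as multiple variations of the strip function and different if statements.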
for character in file:
    if character.isupper():
        capital += 1
        file.readline().rstrip()
        break

print(capital)
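I want the program to read every word or letter in the document and return the total number of capitalized words it contains.
Two things:
- Make sure you are iterating over characters, not words or sentences. Put in some print statements to check.
- Remove the break statement in the if block. It exits your for loop immediately, so you only ever count 1.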
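# `file` is assumed to be an already-open text file object
capital = 0
for sentence in file:          # iterating a file yields one line at a time
    for char in sentence:      # iterating a line yields one character at a time
        if char.isupper():
            capital += 1
print(capital)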
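To see why the original loop misbehaved, here is a minimal sketch of what iterating over a file actually yields. It uses `io.StringIO` as a stand-in for an open text file, and the sample text is made up for illustration; neither comes from the question.

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo).
file = io.StringIO("Hello World\nABC\nfoo Bar\n")

# Iterating over a file object yields whole lines, not characters.
lines = list(file)
print(lines)

# str.isupper() on a whole line is True only if every cased character in it
# is uppercase, so a line-level check misses almost everything.
line_hits = sum(line.isupper() for line in lines)

# Rewind and check character by character instead.
file.seek(0)
char_hits = sum(char.isupper() for line in file for char in line)

print(line_hits, char_hits)
```

Only the line `ABC` passes the line-level check, while the nested loop sees each uppercase letter on its own.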
If the objective is to count words that begin with a capital letter, then I would use the fact that booleans are a subtype of integers:
with open('my_textfile.txt', 'r') as text:
    # row.split() breaks each line into words; iterating `row` directly would yield characters
    print(sum(word.istitle() for row in text for word in row.split()))
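The trick relies on `True` counting as 1 and `False` as 0 when summed. A small sketch of that idea, again with `io.StringIO` and invented sample text standing in for a real file:

```python
import io

# bool is a subclass of int, so True adds 1 and False adds 0 in a sum.
print(issubclass(bool, int))

# io.StringIO and the sample sentence are assumptions for this demo.
text = io.StringIO("The Quick brown Fox\njumps over the Lazy dog\n")

# One boolean per word: is the word title-cased?
flags = [word.istitle() for row in text for word in row.split()]
total = sum(flags)

print(flags)
print(total)
```

Building the list first is only for inspection; passing the generator expression straight to `sum` avoids holding every flag in memory.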
Suppose we have a sample file doc.txt with the following contents:
This is a test file for identifying Capital Words.
I created this as an Example because the question's requirements could vary.
For instance, should acronyms like SQL count as capital words?
If no: this should result in eight capital words.
If yes: this should result in nine.
If you want to count capitalized (a.k.a. title-case) words but exclude all-uppercase words such as acronyms, you could do something like this:
def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            for word in line.split():
                if word.istitle():
                    print(word)
                    count += 1
    return count

print(count_capital_words('doc.txt'))  # 8
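If all-uppercase words should be counted as well, you can modify the function to check only the first letter of each word. Note that filter(None, ...) ensures word is never an empty string, which avoids the IndexError that word[0] would otherwise raise in that case (str.split() with no arguments already drops empty strings, so this is purely a defensive guard):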
def count_capital_words(filename):
    count = 0
    with open(filename, 'r') as fp:
        for line in fp:
            for word in filter(None, line.split()):
                if word[0].isupper():
                    count += 1
    return count

print(count_capital_words('doc.txt'))  # 9
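The two checks disagree on more than acronyms, so it may help to compare them side by side before picking one. The word list below is invented for illustration:

```python
# Sample words chosen to show where the two checks differ (illustrative only).
samples = ["Capital", "SQL", "Words.", "Don't", "McDonald", "question's", "I"]

# str.istitle() requires each uppercase letter to follow an uncased character
# and each lowercase letter to follow a cased one.
title_only = [w for w in samples if w.istitle()]

# Checking only the first character accepts anything that starts uppercase.
first_upper = [w for w in samples if w[0].isupper()]

print(title_only)
print(first_upper)
```

Words with an apostrophe or an inner capital, such as "Don't" and "McDonald", fail `istitle()` even though they start with an uppercase letter, which may or may not be what you want.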
If you have more complex requirements, you can get an iterable of words like this:
from itertools import chain

def get_words(filename):
    with open(filename, 'r') as fp:
        words = chain.from_iterable(line.split() for line in fp)
        yield from words
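As one possible use of such a generator, here is a sketch that plugs in different rules for what counts as a "capital word". The helpers `count_words` and `is_acronym`, the punctuation stripping, and the temporary file are all assumptions added for this example, not part of the answer above.

```python
import os
import string
import tempfile
from itertools import chain

def get_words(filename):
    """Yield whitespace-separated words from the file, one at a time."""
    with open(filename, 'r') as fp:
        yield from chain.from_iterable(line.split() for line in fp)

def count_words(filename, predicate):
    """Hypothetical helper: count words matching predicate, ignoring edge punctuation."""
    return sum(
        bool(predicate(word.strip(string.punctuation)))
        for word in get_words(filename)
    )

def is_acronym(word):
    # Hypothetical rule: at least two characters, all cased ones uppercase.
    return len(word) > 1 and word.isupper()

SAMPLE = (
    "This is a test file for identifying Capital Words.\n"
    "I created this as an Example because the question's requirements could vary.\n"
    "For instance, should acronyms like SQL count as capital words?\n"
    "If no: this should result in eight capital words.\n"
    "If yes: this should result in nine.\n"
)

# Write the sample text to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

try:
    title_count = count_words(path, str.istitle)
    acronym_count = count_words(path, is_acronym)
finally:
    os.remove(path)

print(title_count, acronym_count)
```

Keeping the word source separate from the rule means a new definition of "capital word" only needs a new predicate, not another file-reading loop.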