How to skip "Names" when reading text from a file?

Question

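I am writing a program in which I have to read a file, skip all the names of people, and process the rest of the information.

What logic should I use to skip reading names?

I read words from a file and then build a word cloud from their frequency of occurrence. For trivial words such as articles (a, an, the), I made a list and made sure that any word found in that list of articles is not counted. (I did this with a dictionary.)

But I can't figure out how to skip the names.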

# Words to ignore, such as articles (example values; the real list is not shown)
IgList = ['a', 'an', 'the']

WordList = []

with open('file.txt', 'r') as f:
    for line in f:
        for word in line.split():
            if len(word) > 3:
                if word not in IgList:
                    WordList.append(word.lower())


# Get the unique words from the list

word_set = []


for word in WordList[::-1]:
    if word not in word_set:
        word_set.append(word)


# Create the frequency dictionary
freq = {}
# Iterate through the unique words, once per word
for word in word_set:
    freq[word] = WordList.count(word) / float(len(WordList))

size = []  # Size of each word is stored here
for i in word_set:
    size.append(100 * freq[i])

for i in range(0, len(word_set)):
    print('%s %s' % (size[i], word_set[i]))

Answer 1

with open("filename") as f:
    rd=f.readlines()
    print (rd[:x])

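x is the index just past the names, assuming you know where the names are in the file. Basically, it skips the names. For example, if your file looks like this: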

John 25 USA
Mary 26 Bangladesh
Usain 63 Republic of the Congo

you would write:

print (rd[1:])

Or, if it looks like this:

63 Republic of the Congo Usain
26 Bangladesh Mary
25 USA John

you would write:

print (rd[:1])
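
Note that `rd` is a list of lines, so slicing it drops whole lines rather than the name on each line. If the name is always the first (or last) whitespace-separated field on every line, as in the sample files above, one possible sketch is to split each line and slice its fields instead. The function name `strip_name_field` and its `name_first` parameter are hypothetical, introduced only for illustration, and the sketch assumes each name is a single word.

```python
def strip_name_field(lines, name_first=True):
    """Return each line's fields with the name field removed.

    Assumption: the name is a single whitespace-separated field that is
    either the first or the last field on every line.
    """
    result = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Drop the first field if the name leads, otherwise the last one.
        result.append(fields[1:] if name_first else fields[:-1])
    return result


sample = ["John 25 USA\n", "Mary 26 Bangladesh\n"]
print(strip_name_field(sample))
# [['25', 'USA'], ['26', 'Bangladesh']]
```

With a real file, the list returned by `f.readlines()` could be passed in place of `sample`.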

Answer 2

Assuming that sentences usually start with an article and that "Names" start with a capital letter:

IgList = ['a', 'an', 'the']  # list of articles (example values)
WordList = []

with open('file.txt', 'r') as f:
    for line in f:
        for word in line.split():
            if word not in IgList:
                if not word[0].isupper():  # Skip words whose first letter is capital
                    WordList.append(word.lower())

If a word starts with a capital letter, it is skipped. Extra code can be written to handle the first word read.
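
One possible reading of that "extra code" is sketched below: a capitalized word is treated as a name only when it does not begin a sentence, so ordinary sentence-initial words are still counted. This is a rough heuristic under stated assumptions, not a reliable name detector; it will miss names that open a sentence and will drop other capitalized words that appear mid-sentence. The names `ARTICLES` and `filter_words` are hypothetical.

```python
import string

ARTICLES = {'a', 'an', 'the'}  # example ignore list (assumed values)


def filter_words(text):
    """Collect lowercase words, skipping articles and likely names.

    Heuristic (an assumption): a capitalized word that does not start
    a sentence is treated as a name and skipped.
    """
    kept = []
    sentence_start = True
    for token in text.split():
        word = token.strip(string.punctuation)
        if word:
            looks_like_name = word[0].isupper() and not sentence_start
            if word.lower() not in ARTICLES and not looks_like_name:
                kept.append(word.lower())
        # The next word starts a sentence if this token ends one.
        sentence_start = token.endswith(('.', '!', '?'))
    return kept


print(filter_words("The cat saw John. Then Mary fed the cat."))
# ['cat', 'saw', 'then', 'fed', 'cat']
```

To apply it to a file, the whole text could be read with `f.read()` and passed to `filter_words`.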


python

file

input

python-2.7