Modify selection parameters and write the result to a txt file

Question

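I wrote some code that reads the sentences of a document and dumps to a file every sentence in which a given POS tag occurs at least N times. The example code uses a single selection parameter: it only accepts sentences with five or more verbs.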

#!/usr/bin/python
# -*- coding: utf8 -*-
import os
import nlpnet
import codecs

TAGGER = nlpnet.POSTagger('pos-pt', language='pt')

def is_worth_saving(text, pos, pos_count):
    # Collect every (word, tag) pair whose tag equals the requested POS
    pos_words = [word for sentence in TAGGER.tag(text)
                 for word in sentence
                 if word[1] == pos]
    return len(pos_words) >= pos_count

with codecs.open('File_with_phrases.txt', encoding='utf8') as original_file:
    with codecs.open('new_file.txt', 'w') as output_file:
        for text in original_file:
            # Example of a parameter: keep sentences with 5 or more verbs
            if is_worth_saving(text, 'V', 5):
                output_file.write(text.encode('utf8') + os.linesep)

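This approach counts only one grammatical class. I want to extend the code so that it accepts any of the open classes, whether noun, adjective, adverb, or verb. A sentence should be kept if it contains n occurrences of nouns, adjectives, adverbs, or verbs taken together, rather than being limited to a single grammatical class. I thought of something like the following, but I could not work it into the code: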

if (word.tag == Substantivo || word.tag == Adjetivo || word.tag == Advérbio || word.tag == Verbo)
    NumberofWordsClass++;
if (NumberofWordsClass >= 5)
    // save the sentence

Answer 1

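I think the code should look something like this.

I am assuming that TAGGER returns one sentence at a time. I don't know which tags it uses, so you will have to replace the list I used with the list of tags you actually want in the statement if is_worth_saving(text, ['V', 'ADJ', 'ADV', 'N'], 5).

There are two code changes: (1) the statement I just pointed out, and (2) the list comprehension in the function definition.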

import os
import nlpnet
import codecs

TAGGER = nlpnet.POSTagger('pos-pt', language='pt')

def is_worth_saving(text, pos, pos_count):
    # Keep the tag of every word whose tag is in the list of wanted tags
    pos_words = [word[1]
                 for sentence in TAGGER.tag(text)
                 for word in sentence
                 if word[1] in pos]
    return len(pos_words) >= pos_count

with codecs.open('File_with_phrases.txt', encoding='utf8') as original_file:
    with codecs.open('new_file.txt', 'w') as output_file:
        for text in original_file:
            if is_worth_saving(text, ['V', 'ADJ', 'ADV', 'N'], 5 ):
                output_file.write(text.encode('utf8') + os.linesep)
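The write call above is Python 2 style. Under Python 3, text.encode('utf8') + os.linesep would mix bytes and str, so the same filter loop might be sketched as follows. The helper name filter_lines and its predicate argument are hypothetical and not part of nlpnet; any function that takes a line and returns a boolean can be passed in.

```python
import io

def filter_lines(src_path, dst_path, predicate):
    """Copy the lines of src_path for which predicate(line) is true to dst_path."""
    kept = 0
    with io.open(src_path, encoding='utf8') as original_file, \
         io.open(dst_path, 'w', encoding='utf8') as output_file:
        for text in original_file:
            if predicate(text):
                # Lines keep their trailing newline when read, so none is added here.
                output_file.write(text)
                kept += 1
    return kept
```

With the tagger in place, this could be called as filter_lines('File_with_phrases.txt', 'new_file.txt', lambda t: is_worth_saving(t, ['V', 'ADJ', 'ADV', 'N'], 5)).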

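Edit: I put myself at a disadvantage here, because I don't have nlpnet installed and I don't know what went wrong. However, I did find one additional requirement: the code has to cope with the fact that nlpnet can assign more than one part of speech to a single word. I think this code does that. I have also left in some debugging print statements.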

import os
import nlpnet
import codecs

TAGGER = nlpnet.POSTagger('pos-pt', language='pt')

def is_worth_saving(text, pos, pos_count):

    pos_words = 0
    for sentence in TAGGER.tag(text):
        print(sentence)
        for word in sentence:
            print (word)
            pos_words += 1 if set(word[1].split('+')).intersection(set(pos)) else 0

    return pos_words >= pos_count

with codecs.open('File_with_phrases.txt', encoding='utf8') as original_file:
    with codecs.open('new_file.txt', 'w') as output_file:
        for text in original_file:
            if is_worth_saving(text, ['V', 'ADJ', 'ADV', 'N'], 5 ):
                output_file.write(text.encode('utf8') + os.linesep)
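To see what the split('+') and intersection test does without installing nlpnet, here is a small sketch on hand-written tags. The combined tags 'PREP+ART' and 'V+PRON' and the helper names has_wanted_pos and count_wanted are assumptions for illustration; only the '+' separator is taken from the code above.

```python
def has_wanted_pos(tag, wanted):
    """Return True if any part of a '+'-joined tag is in wanted."""
    return bool(set(tag.split('+')).intersection(wanted))

def count_wanted(tagged_sentence, wanted):
    """Count tokens that carry at least one wanted part of speech."""
    return sum(1 for _word, tag in tagged_sentence if has_wanted_pos(tag, wanted))

wanted = {'V', 'ADJ', 'ADV', 'N'}
# Hand-written (word, tag) pairs standing in for tagger output.
sentence = [('na', 'PREP+ART'), ('casa', 'N'), ('velha', 'ADJ'), ('mora', 'V')]
total = count_wanted(sentence, wanted)
```

As in the code above, a token is counted once even if more than one part of its combined tag is in the wanted set.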


python

parameters

text

word

python-2.7