How can I make this remove_stopwords function faster in Python?

I have a remove_stopwords function like the one below. How can I make it run faster?

temp.reverse()

def drop_stopwords(text):
    
    for x in temp:
        if len(x.split()) > 1:
            text_list = text.split()  
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)

    return text

Processing the text in my data takes 14 seconds. With a trick like the one below, the time drops to 3 seconds:


temp.reverse()

def drop_stopwords(text):
    
    for x in temp:
        if len(x.split()) >2:
            if x in text:
                text = text.replace(x,'')

        elif len(x.split()) > 1:
            text_list = text.split()  
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)

    return text

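But I think this may give wrong results in some cases for my language. How can I rewrite this function in Python to make it faster? (In C and C++ I could easily solve it with the function above :(( )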

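Your function does a lot of the same work over and over, in particular repeatedly calling split and join on the same text. Doing a single split, operating on the list, and then doing a single join at the end is likely to be faster, and it definitely leads to simpler code. Unfortunately I don't have any of your sample data to test performance with, but hopefully this gives you something to experiment with: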

temp = ["foo", "baz ola"]


def drop_stopwords(text):
    text_list = text.split()
    text_len = len(text_list)
    for word in temp:
        word_list = word.split()
        word_len = len(word_list)
        for i in range(text_len + 1 - word_len):
            if text_list[i:i+word_len] == word_list:
                text_list[i:i+word_len] = [None] * word_len
    return ' '.join(t for t in text_list if t)


print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
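
If the stopword list is long, the loop above still scans the whole text once per stopword. One possible variation, sketched here under assumptions, is to index the phrases by their first token so the text is scanned only once. The names `stopwords`, `phrases_by_first` and `drop_stopwords_indexed` are hypothetical, and the behaviour differs slightly from the code above: at each position it removes the longest matching phrase rather than applying the stopwords in list order.

```python
from collections import defaultdict

# Hypothetical stopword list; substitute your own data.
stopwords = ["foo", "baz ola"]

# Map each phrase's first token to the phrases starting with it,
# stored as token tuples and sorted longest first.
phrases_by_first = defaultdict(list)
for phrase in stopwords:
    tokens = tuple(phrase.split())
    if tokens:
        phrases_by_first[tokens[0]].append(tokens)
for candidates in phrases_by_first.values():
    candidates.sort(key=len, reverse=True)


def drop_stopwords_indexed(text):
    tokens = text.split()
    kept = []
    i = 0
    while i < len(tokens):
        # Only phrases that start with the current token can match here.
        for phrase in phrases_by_first.get(tokens[i], ()):
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                i += len(phrase)  # skip the whole matched phrase
                break
        else:
            # No phrase matched at this position, so keep the token.
            kept.append(tokens[i])
            i += 1
    return " ".join(kept)


print(drop_stopwords_indexed("the quick brown foo jumped over the baz ola dog"))
```

Whether this is actually faster depends on the size of the stopword list and of the texts, so it is worth timing against the simpler version on real data.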

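You could also try simply doing text.replace iteratively in all cases and see how its performance compares with the more complex split-based solution. Keep in mind that replace matches substrings, so a stopword that appears inside a longer word is removed from that word as well: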

temp = ["foo", "baz ola"]


def drop_stopwords(text):
    for word in temp:
        text = text.replace(word, '')
    return ' '.join(text.split())


print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
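
If matching inside longer words is a concern, a single precompiled regular expression is another option to try. The sketch below is not tested against real data; it assumes that stopwords should only match whole whitespace-separated tokens, and the names `stopwords` and `drop_stopwords_regex` are hypothetical. It uses the lookarounds `(?<!\S)` and `(?!\S)` instead of `\b` so that the token boundaries are the same as those produced by `split()`.

```python
import re

# Hypothetical stopword list; substitute your own data.
stopwords = ["foo", "baz ola"]

# Try longer phrases first so a multi-word phrase wins over a shorter
# stopword that happens to be its prefix.
_alternatives = [re.escape(s) for s in sorted(stopwords, key=len, reverse=True)]

# (?<!\S) and (?!\S) require whitespace or the string edge on each side,
# so "foo" does not match inside "food".
_pattern = re.compile(r"(?<!\S)(?:" + "|".join(_alternatives) + r")(?!\S)")


def drop_stopwords_regex(text):
    # Normalise whitespace first so multi-word phrases match reliably.
    text = " ".join(text.split())
    # Remove the matches, then collapse the gaps they leave behind.
    return " ".join(_pattern.sub(" ", text).split())


print(drop_stopwords_regex("the quick brown foo jumped over the baz ola dog"))
print(drop_stopwords_regex("food foo"))
```

The pattern is compiled once at import time, so each call makes a single pass over the text regardless of how many stopwords there are.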