How can I make this remove_stopwords function faster in Python?
I have a remove_stopwords function like the one below. How can I make it run faster?
# temp and vietnamese are the stopword collections, defined elsewhere
temp.reverse()
def drop_stopwords(text):
    for x in temp:
        if len(x.split()) > 1:
            text_list = text.split()
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)
    return text
Processing the text in my data takes 14 seconds. With a trick like the one below, the time drops to 3 seconds:
temp.reverse()
def drop_stopwords(text):
    for x in temp:
        if len(x.split()) > 2:
            if x in text:
                text = text.replace(x, '')
        elif len(x.split()) > 1:
            text_list = text.split()
            for y in range(len(text_list)-len(x.split())):
                if " ".join(text_list[y:y+len(x.split())]) == x:
                    del text_list[y:y+len(x.split())]
                    text = " ".join(text_list)
        else:
            text = " ".join(text for text in text.split() if text not in vietnamese)
    return text
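But I think this may give wrong results in some places for my language. How can I rewrite this function in Python to make it faster? (In C and C++ I could solve this easily with the function above :(( )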
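Your function does a lot of the same work over and over, in particular repeatedly calling split and join on the same text. Doing a single split, operating on the list, and then doing a single join at the end is likely to be faster, and will definitely lead to simpler code. Unfortunately I don't have any of your sample data to test performance with, but hopefully this gives you something to experiment with: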
temp = ["foo", "baz ola"]
def drop_stopwords(text):
    text_list = text.split()  # split once, up front
    text_len = len(text_list)
    for word in temp:
        word_list = word.split()
        word_len = len(word_list)
        for i in range(text_len + 1 - word_len):
            if text_list[i:i+word_len] == word_list:
                # mark matched tokens instead of deleting, so indices stay valid
                text_list[i:i+word_len] = [None] * word_len
    return ' '.join(t for t in text_list if t)  # join once, skipping marked tokens
print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
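If the stopword list is long, scanning the whole token list once per stopword may end up dominating the run time. One possible variation, offered only as a sketch: index the stopword phrases by their first word, then walk the tokens once and try only the candidate phrases at each position, longest first. The names build_index and drop_stopwords_indexed are made up for this illustration, and I have not timed it against real data.

```python
from collections import defaultdict

def build_index(stopwords):
    """Map each first word to its stopword phrases (as token lists), longest first."""
    index = defaultdict(list)
    for phrase in stopwords:
        tokens = phrase.split()
        if tokens:  # ignore empty entries
            index[tokens[0]].append(tokens)
    for candidates in index.values():
        candidates.sort(key=len, reverse=True)
    return index

def drop_stopwords_indexed(text, index):
    text_list = text.split()
    kept = []
    i = 0
    while i < len(text_list):
        # Only phrases that start with the current word can match here.
        for tokens in index.get(text_list[i], ()):
            if text_list[i:i + len(tokens)] == tokens:
                i += len(tokens)  # skip the whole matched phrase
                break
        else:
            kept.append(text_list[i])  # no phrase starts here, keep the word
            i += 1
    return ' '.join(kept)

index = build_index(["foo", "baz ola"])
print(drop_stopwords_indexed("the quick brown foo jumped over the baz ola dog", index))
# the quick brown jumped over the dog
```

Note that this matches greedily from left to right, so with overlapping stopword phrases it can give a different result from the per-stopword loop above. Build the index once and reuse it for every text.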
You could also try doing text.replace iteratively in all cases and see how its performance compares with the more complex split-based solution:
temp = ["foo", "baz ola"]
def drop_stopwords(text):
    for word in temp:
        text = text.replace(word, '')
    return ' '.join(text.split())
print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
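One thing to watch with plain text.replace: it works on characters rather than whole words, so a stopword such as "foo" would also be cut out of "food". If that matters for your data, a regular expression that only matches at whitespace boundaries is one option. The following is a sketch rather than a benchmarked solution; the helper name make_stopword_remover is invented here, and it assumes words are separated by whitespace only.

```python
import re

def make_stopword_remover(stopwords):
    # Normalise inner whitespace and drop empty entries.
    cleaned = [' '.join(w.split()) for w in stopwords if w.split()]
    # Longest phrases first so "baz ola" wins over a shorter stopword "baz".
    cleaned.sort(key=len, reverse=True)
    alternation = '|'.join(re.escape(w) for w in cleaned)
    # (?<!\S) and (?!\S) require whitespace or the string edge on both sides.
    pattern = re.compile(r'(?<!\S)(?:' + alternation + r')(?!\S)')

    def drop_stopwords(text):
        text = ' '.join(text.split())  # single spaces, so phrases line up
        return ' '.join(pattern.sub('', text).split())

    return drop_stopwords

drop_stopwords = make_stopword_remover(["foo", "baz ola"])
print(drop_stopwords("the quick brown foo jumped over the baz ola dog"))
# the quick brown jumped over the dog
print(drop_stopwords("food for the foo"))
# food for the
```

The pattern is compiled once and all stopwords are removed in a single pass over the text, so create the remover once and reuse it for every text.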