How to replace tokens when they are used together?
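I want to use Python to run sentiment analysis on COVID-19 topics. The problem is that entries such as "tested positive" receive a positive polarity, even though the statement is a negative one. My current code is as follows: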
import nltk
from textblob import TextBlob
from nltk.stem import WordNetLemmatizer

# Setting the test string
test_string = "He was tested positive on Covid-19"
tokens = nltk.word_tokenize(test_string)

# Lemmatize every token as a verb (e.g. "tested" -> "test")
wordnet_lemmatizer = WordNetLemmatizer()
tokens_lem_list = []
for word in tokens:
    lem_token = wordnet_lemmatizer.lemmatize(word, pos="v")
    tokens_lem_list.append(lem_token)

# List to string
tokens_lem_str = ' '.join(tokens_lem_list)

# Print the polarity of the string
print(TextBlob(tokens_lem_str).sentiment.polarity)
This produces the following output:
0.22727272727272727
Process finished with exit code 0
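I would therefore like to remove the tokens "test" and "positive" when they are used together and replace them with "ill". Should I use a loop, or would that just exhaust my computing power on a large amount of text?
Thank you very much for your help!
I have solved my problem as follows: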
# Loop that finds sentences containing "test" together with "positive" or "negative".
# sentences_list_lem is assumed to be a list of lemmatized token lists, one per sentence.
matches_positive = ["test", "positive"]
matches_negative = ["test", "negative"]
replaced_testing_term_sentence = []
for sentence_lem in sentences_list_lem:
    # Replace "positive" by "not healthy" and drop the first "test"
    if all(x in sentence_lem for x in matches_positive):
        sentence_lem = [word.replace("positive", "not healthy") for word in sentence_lem]
        sentence_lem.remove("test")
        replaced_testing_term_sentence.append(sentence_lem)
    # Replace "negative" by "not ill" and drop the first "test"
    elif all(x in sentence_lem for x in matches_negative):
        sentence_lem = [word.replace("negative", "not ill") for word in sentence_lem]
        sentence_lem.remove("test")
        replaced_testing_term_sentence.append(sentence_lem)
    # Keep sentences that do not match unchanged
    else:
        replaced_testing_term_sentence.append(sentence_lem)
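It does the job. The replacement terms were chosen deliberately. If anyone sees potential for optimization, I would appreciate the feedback.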
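One possible refinement, offered as a sketch rather than a definitive answer: the loop above matches whenever "test" and "positive" both occur anywhere in a sentence, so a sentence like "The test was a positive experience" would also be rewritten. The variant below only replaces the pair when the two tokens are directly adjacent, and it does so in a single linear pass per sentence, which should stay cheap even on large corpora. The names `PAIR_REPLACEMENTS` and `replace_adjacent_pairs` are hypothetical, and the replacement is emitted as two separate tokens ("not", "healthy") instead of one string containing a space, which is an assumption about what the downstream code expects.

```python
# Hypothetical mapping from adjacent (lemmatized) token pairs to replacement tokens.
PAIR_REPLACEMENTS = {
    ("test", "positive"): ["not", "healthy"],
    ("test", "negative"): ["not", "ill"],
}


def replace_adjacent_pairs(tokens, replacements=PAIR_REPLACEMENTS):
    """Return a new token list in which each mapped adjacent pair is replaced."""
    result = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in replacements:
            # Consume both tokens of the pair and emit the replacement.
            result.extend(replacements[pair])
            i += 2
        else:
            result.append(tokens[i])
            i += 1
    return result


# Example input: already lemmatized token lists, one per sentence (assumed shape).
sentences_list_lem = [
    ["He", "be", "test", "positive", "on", "Covid-19"],
    ["She", "be", "test", "negative"],
    ["The", "test", "be", "a", "positive", "experience"],
]
replaced = [replace_adjacent_pairs(sentence) for sentence in sentences_list_lem]
```

With this input, the first two sentences are rewritten and the third is left untouched because "test" and "positive" are not adjacent. Adding further pairs only requires a new dictionary entry instead of another `elif` branch.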