List index out of range error with TextBlob to CSV
I have a large CSV containing thousands of comments from my blog, and I'd like to run sentiment analysis on them using textblob and nltk.
I'm using the Python script from https://wafawaheedas.gitbooks.io/twitter-sentiment-analysis-visualization-tutorial/sentiment-analysis-using-textblob.html, modified for Python 3.
'''
uses TextBlob to obtain sentiment for unique tweets
'''
from importlib import reload
import csv
from textblob import TextBlob
import sys
# to force utf-8 encoding on entire program
#sys.setdefaultencoding('utf8')
alltweets = csv.reader(open("/path/to/file.csv", 'r', encoding="utf8", newline=''))
sntTweets = csv.writer(open("/path/to/outputfile.csv", "w", newline=''))
for row in alltweets:
    blob = TextBlob(row[2])
    print(blob.sentiment.polarity)
    if blob.sentiment.polarity > 0:
        sntTweets.writerow([row[0], row[1], row[2], row[3], blob.sentiment.polarity, "positive"])
    elif blob.sentiment.polarity < 0:
        sntTweets.writerow([row[0], row[1], row[2], row[3], blob.sentiment.polarity, "negative"])
    elif blob.sentiment.polarity == 0.0:
        sntTweets.writerow([row[0], row[1], row[2], row[3], blob.sentiment.polarity, "neutral"])
However, when I run this, I keep getting:
$ python3 sentiment.py
Traceback (most recent call last):
  File "sentiment.py", line 17, in <module>
    blob = TextBlob(row[2])
IndexError: list index out of range
I know what this error means, but I'm not sure what I need to do to fix it.
Any ideas on what I'm missing? Thanks!
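One likely cause, though the input file is not shown so this is an assumption, is that some lines of the CSV have fewer fields than the loop expects: `csv.reader` returns an empty list for a blank line and a short list for a truncated one, so `row[2]` fails on them. A minimal sketch of a guard follows; the helper name `usable_rows`, the constant `MIN_FIELDS`, and the sample data are all hypothetical and stand in for the real file.

```python
import csv
import io

MIN_FIELDS = 4  # the loop in the question reads row[0] through row[3]

def usable_rows(reader, min_fields=MIN_FIELDS):
    """Yield only the rows that have at least min_fields columns."""
    for row in reader:
        if len(row) < min_fields:
            # Blank lines arrive as [] and truncated lines as short lists.
            print(f"skipping line {reader.line_num}: {row!r}")
            continue
        yield row

# Hypothetical sample data: one blank line and one truncated line.
sample = io.StringIO(
    "1,alice,great post,2019-01-01\n"
    "\n"
    "2,bob\n"
    "3,carol,not convinced,2019-01-03\n"
)
kept = list(usable_rows(csv.reader(sample)))
print(kept)
```

In the original script the same check could wrap the reader, for example `for row in usable_rows(alltweets):`, so that only complete rows reach `TextBlob(row[2])`. The skipped line numbers also help locate the offending lines in the file.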
After playing around with it for a while, I came up with a more elegant solution using pandas:
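from textblob import TextBlob
import pandas as pd

# Treat only empty fields as missing; strings such as "NA" stay as text
df = pd.read_csv("pathtoinput.csv", na_values='',
                 encoding='utf8', keep_default_na=False, low_memory=False)

# Keep just the comment text; .copy() avoids a SettingWithCopyWarning below
columns = ['text']
df = df[columns].copy()

# TextBlob needs str input, so cast the column before scoring it
df['tweet'] = df['text'].astype('str')
df['polarity'] = df['tweet'].apply(lambda tweet:
                                   TextBlob(tweet).sentiment.polarity)

# Label each row by the sign of its polarity score
df.loc[df.polarity > 0, 'sentiment'] = 'positive'
df.loc[df.polarity == 0, 'sentiment'] = 'neutral'
df.loc[df.polarity < 0, 'sentiment'] = 'negative'

df.to_csv("pathtooutput.csv", encoding='utf-8', index=False)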
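The three `df.loc` assignments amount to mapping the sign of the polarity score to a label. As a small sketch of that rule in plain Python, the function below (the name `label_polarity` is hypothetical, not part of TextBlob or pandas) makes the thresholds explicit and easy to check on its own.

```python
def label_polarity(polarity):
    """Map a polarity score to a sentiment label by its sign."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

# A few hand-picked scores to show the mapping.
labels = [label_polarity(p) for p in (0.5, 0.0, -0.25)]
print(labels)
```

With pandas, a function like this could be applied to the score column, for example `df['sentiment'] = df['polarity'].apply(label_polarity)`, as an alternative to the three boolean-mask assignments.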