Why is my sentiment analysis running so slow?

Question

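I'm trying to build a GUI application where you can enter Twitter hashtags for two different things and compare them using sentiment analysis (I'm using movies as the example for now). My code isn't finished yet, since so far it only handles one hashtag. The end result should be a chart showing the polarity of the tweets (right now it only shows the polarity for one movie). When I run my code it works and a chart pops up, but most of the time it takes a very long time. Sometimes it loads as quickly as I'd expect, but the rest of the time it takes so long that I get impatient and rerun the program. Is the way the code is arranged, or the way the modules are used, causing this? Or is sentiment analysis just generally slow? This is my first sentiment analysis project, so I'm not sure. Here is my code; I've removed the Twitter keys and tokens because I wasn't sure whether I could leave them in: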

import tweepy as tw
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

# authenticate twitter
auth = tw.OAuthHandler(consumer_key,consumer_secret)
auth.set_access_token(access_token,access_token_secret)
api = tw.API(auth,wait_on_rate_limit= True)

# GET TWEETS HERE

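hashtag = "#GreenKnight"  # the search query is a plain string
# with wait_on_rate_limit=True, this loop sleeps whenever the rate limit is hit
query = tw.Cursor(api.search, q=hashtag).items(1000)
tweets = [{'Tweets': tweet.text, 'Timestamp': tweet.created_at} for tweet in query]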
# put tweets in pandas dataframe
df = pd.DataFrame.from_dict(tweets)
df.head()

# green knight movie references
green_knight_references = ["GreenKnight", "Green Knight", "green knight", "greenknight", "'The Green Knight'"]
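def identify_subject(tweet, refs):
    # return 1 if the tweet mentions any of the reference strings, else 0
    flag = 0
    for ref in refs:
        if tweet.find(ref) != -1:
            flag = 1
    return flag  # return only after every reference has been checked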

df['Green Knight'] = df['Tweets'].apply(lambda x: identify_subject(x, green_knight_references))

df.head(10)

# time for stop words, to clear out the language not needed
import re
import nltk
from nltk.corpus import stopwords
from textblob import Word, TextBlob
stop_words = set(stopwords.words("english"))  # a set makes membership tests fast
custom_stopwords = ['RT']

def preprocess_tweets(tweet, custom_stopwords):
    # strip punctuation; str.replace takes no regex and its result was discarded,
    # so the original call did nothing (assuming '[^\w\s]' was the intended pattern)
    preprocessed_tweet = re.sub(r'[^\w\s]', '', tweet)
    # drop English stop words, then the custom ones such as 'RT'
    preprocessed_tweet = " ".join(word for word in preprocessed_tweet.split() if word not in stop_words)
    preprocessed_tweet = " ".join(word for word in preprocessed_tweet.split() if word not in custom_stopwords)
    # reduce each remaining word to its lemma
    preprocessed_tweet = " ".join(Word(word).lemmatize() for word in preprocessed_tweet.split())
    return preprocessed_tweet


df['Processed Tweet'] = df['Tweets'].apply(lambda x: preprocess_tweets(x, custom_stopwords))
df.head()

#visualize

# score each tweet once and reuse the (polarity, subjectivity) result
sentiments = df['Processed Tweet'].apply(lambda x: TextBlob(x).sentiment)
df['polarity'] = sentiments.apply(lambda s: s[0])
df['subjectivity'] = sentiments.apply(lambda s: s[1])
df.head()
(df[df['Green Knight']==1][['Green Knight','polarity','subjectivity']].groupby('Green Knight').agg([np.mean, np.max, np.min, np.median]))


green_knight = df[df['Green Knight']==1][['Timestamp', 'polarity']]
green_knight = green_knight.sort_values(by='Timestamp', ascending=True)
green_knight['MA Polarity'] = green_knight.polarity.rolling(10, min_periods=3).mean()

green_knight.head()

fig, axes = plt.subplots(2, 1, figsize=(13, 10))

axes[0].plot(green_knight['Timestamp'], green_knight['MA Polarity'])
axes[0].set_title("\n".join(["Green Knight Tweets"]))


fig.suptitle("\n".join(["Movie tweet polarity"]), y=0.98)

plt.show()

Answer 1

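I've worked with tweepy before, and the slowest part was Twitter's API. You hit the rate limit very quickly, and unless you pay them it gets frustrating :(.
Sentiment analysis with TextBlob shouldn't be that slow. Still, your best bet is to profile the script with cProfile, as @osint_alex mentioned in the comments, or, for a simpler solution, just put some print statements between the main 'blocks' of your code.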
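
As a rough sketch of both approaches, something like the following could work. Note that `fetch_tweets`, `score_tweets` and the `timed` helper are hypothetical stand-ins I made up for illustration (the sleep just imitates a slow network call); in your script you would wrap the real `tw.Cursor(...)` loop and the TextBlob scoring instead.

```python
import cProfile
import io
import pstats
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record how long the enclosed block takes and print it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start
        print(f"{label}: {results[label]:.3f} s")


# Hypothetical stand-ins for the real stages of the script.
def fetch_tweets():
    time.sleep(0.2)  # imitates the network-bound tw.Cursor(...) loop
    return ["RT I loved the Green Knight"] * 1000


def score_tweets(tweets):
    return [len(t.split()) for t in tweets]  # imitates the TextBlob scoring


# Option 1: coarse timing around each main block.
timings = {}
with timed("fetch", timings):
    tweets = fetch_tweets()
with timed("score", timings):
    scores = score_tweets(tweets)

# Option 2: cProfile for a per-function breakdown.
profiler = cProfile.Profile()
profiler.enable()
score_tweets(fetch_tweets())
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

If the "fetch" stage dominates on your slow runs, the time is going into waiting on Twitter rather than into the sentiment analysis. You can also profile the whole script without changing it by running `python -m cProfile -s cumulative your_script.py` (the file name here is a placeholder).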
