What is the error in this code using a DataFrame and matplotlib?

Question

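I have a Python script that reads a CSV file, converts it to a DataFrame with pandas, and then plots a histogram with matplotlib. The first task works: it reads and writes the CSV file.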

The columns of the CSV file are: "date","user_loc","message","full_name","country","country_code","predictions","word count"

But the plotting task fails with the following error.

Error:

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-37-5bc3925ff988> in <module>
          1 #plot word count distribution for both positive and negative sentiment
    ----> 2 x= tweet_preds["word count"][tweet_preds.predictions ==1]
          3 y= tweet_preds["word count"][tweet_preds.predictions ==0]
          4 plt.figure(figsize=(12,6))
          5 plt.xlim(0,45)

    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Code:

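    import pandas as pd
    import matplotlib.pyplot as plt

    # create the dataframe
    # column names for the CSV, which has no header row
    col_names = ["date","user_loc","followers","friends","message","bbox_coords",
                 "full_name","country","country_code","place_type"]
    # read the CSV (raw string so the backslashes are not treated as escapes)
    df_twtr = pd.read_csv(r"F:\AIenv\sentiment_analysis\paul_ryan_twitter.csv",names = col_names)
    # drop rows with missing values and renumber the index
    df_twtr=df_twtr.dropna()
    df_twtr = df_twtr.reset_index(drop=True)
    # check head
    df_twtr.head()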


    # run predictions on twitter data
    tweet_preds = model_NB.predict(df_twtr['message'])

    # append predictions to dataframe
    df_tweet_preds = df_twtr.copy()
    df_tweet_preds['predictions'] = tweet_preds
    df_tweet_preds.shape

    df_tweet_preds = pd.DataFrame(df_tweet_preds,columns = ["date","user_loc","message","full_name","country","country_code","predictions","word count"])
    df_tweet_preds = df_tweet_preds.drop(["user_loc","country","country_code"],axis=1)
    df_tweet_preds_to_csv = df_tweet_preds.to_csv(r'F:\AIenv\sentiment_analysis\export_dataframe.csv', index = False, header=True)

    # plot word count distribution for both positive and negative sentiment
    x= tweet_preds["word count"][tweet_preds.predictions ==1]
    y= tweet_preds["word count"][tweet_preds.predictions ==0]
    plt.figure(figsize=(12,6))
    plt.xlim(0,45)
    plt.xlabel("word count")
    plt.ylabel("frequency")
    g = plt.hist([x,y],color=["r","b"],alpha=0.5,label=["positive","negative"])
    plt.legend(loc="upper right")

Answer 1

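`tweet_preds` is not a DataFrame; it is a NumPy array. The result of your `predict()` method is a NumPy array, and it cannot be indexed by column name the way you are trying to. Why not just use the DataFrame you appended the predictions to, `df_tweet_preds['predictions'] = tweet_preds`? Then you can do all kinds of indexing.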
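As a minimal sketch of that suggestion: the small DataFrame and the hard-coded prediction array below are hypothetical stand-ins for `df_twtr` and for the output of `model_NB.predict(...)`. The question's code does not show where "word count" is computed, so the sketch assumes it means the number of whitespace-separated tokens in `message`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_twtr; the real data comes from the CSV.
df_twtr = pd.DataFrame({
    "message": [
        "great speech today",
        "this is bad",
        "not a fan of this at all",
        "love it",
    ],
})

# Hypothetical stand-in for model_NB.predict(...), which returns a NumPy array.
tweet_preds = np.array([1, 0, 0, 1])

# Indexing the plain array with a column name fails, as in the question.
try:
    tweet_preds["word count"]
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__

# Attach the predictions to a copy of the DataFrame instead.
df_tweet_preds = df_twtr.copy()
df_tweet_preds["predictions"] = tweet_preds

# Assumption: "word count" is the number of whitespace-separated tokens.
df_tweet_preds["word count"] = df_tweet_preds["message"].str.split().str.len()

# Boolean masks work here because they are applied to the DataFrame.
x = df_tweet_preds.loc[df_tweet_preds["predictions"] == 1, "word count"]
y = df_tweet_preds.loc[df_tweet_preds["predictions"] == 0, "word count"]
```

With `x` and `y` built this way, the `plt.hist([x, y], ...)` call from the question should receive two pandas Series of word counts rather than failing on the array lookup.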

python

matplotlib

sentiment-analysis

pandas