What is the error in this code using DataFrame and matplotlib?
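I have a Python script that reads a CSV file, converts it to a DataFrame with pandas, and then plots a histogram with matplotlib. The first task works: it reads and writes the CSV file correctly.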
The CSV file columns are:
"date","user_loc","message","full_name","country","country_code","predictions","word count"
But the plotting step fails with the following error.
Error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-5bc3925ff988> in <module>
      1 #plot word count distribution for both positive and negative sentiment
----> 2 x= tweet_preds["word count"][tweet_preds.predictions ==1]
      3 y= tweet_preds["word count"][tweet_preds.predictions ==0]
      4 plt.figure(figsize=(12,6))
      5 plt.xlim(0,45)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
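For reference, the same IndexError can be reproduced with NumPy alone. This is a minimal sketch; the array values are made up and only stand in for the ndarray returned by model_NB.predict(). Because Python evaluates tweet_preds["word count"] before the mask inside the outer brackets, the string subscript fails first, so this IndexError appears before tweet_preds.predictions could raise an AttributeError.

```python
import numpy as np

# Hypothetical stand-in for the ndarray returned by model_NB.predict()
tweet_preds = np.array([1, 0, 1, 0])

try:
    # A plain ndarray has no named columns, so a string key is not a valid index
    tweet_preds["word count"]
except IndexError as exc:
    error = exc
    print(f"IndexError: {exc}")
else:
    error = None
```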
Code:
# create the dataframe:
#create column names
col_names = ["date","user_loc","followers","friends","message","bbox_coords",
"full_name","country","country_code","place_type"]
#read csv
df_twtr = pd.read_csv(r"F:\AIenv\sentiment_analysis\paul_ryan_twitter.csv",names = col_names)
#check head
df_twtr=df_twtr.dropna()
df_twtr = df_twtr.reset_index(drop=True)
df_twtr.head()
# run predictions on twitter data
tweet_preds = model_NB.predict(df_twtr['message'])
# append predictions to dataframe
df_tweet_preds = df_twtr.copy()
df_tweet_preds['predictions'] = tweet_preds
df_tweet_preds.shape
df_tweet_preds = pd.DataFrame(df_tweet_preds,columns = ["date","user_loc","message","full_name","country","country_code","predictions","word count"])
df_tweet_preds = df_tweet_preds.drop(["user_loc","country","country_code"],axis=1)
df_tweet_preds_to_csv = df_tweet_preds.to_csv(r'F:\AIenv\sentiment_analysis\export_dataframe.csv', index = False, header=True)
#plot word count distribution for both positive and negative sentiment
x= tweet_preds["word count"][tweet_preds.predictions ==1]
y= tweet_preds["word count"][tweet_preds.predictions ==0]
plt.figure(figsize=(12,6))
plt.xlim(0,45)
plt.xlabel("word count")
plt.ylabel("frequency")
g = plt.hist([x,y],color=["r","b"],alpha=0.5,label=["positive","negative"])
plt.legend(loc="upper right")
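It's not a DataFrame; it's a NumPy array. Your predict() method returns a NumPy ndarray, and an ndarray cannot be indexed by a column name such as "word count" the way you are trying to, which is why NumPy raises this IndexError. Why not just use the DataFrame you appended the predictions to ('df_tweet_preds['predictions'] = tweet_preds')? Then you can use all of pandas' indexing, e.g. df_tweet_preds["word count"][df_tweet_preds.predictions == 1].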
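A minimal sketch of that fix, using made-up stand-in data in place of the CSV file and model_NB (both are assumptions for illustration only). It also fills in "word count" from "message", assuming that column is meant to hold the number of words per tweet: in the question's code, pd.DataFrame(df_tweet_preds, columns=[..., "word count"]) only selects columns from the existing DataFrame, so a column that does not exist yet is added filled with NaN, and there would be nothing to plot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data; in the question this comes from the CSV file
df_tweet_preds = pd.DataFrame({
    "message": ["great speech today", "bad", "not a fan of this plan", "love it"],
})
# Hypothetical stand-in for model_NB.predict(): a plain ndarray of 0/1 labels
tweet_preds = np.array([1, 0, 0, 1])

# Attach the predictions to the DataFrame, as in the question
df_tweet_preds["predictions"] = tweet_preds

# Assumption: "word count" = number of whitespace-separated words in "message"
df_tweet_preds["word count"] = df_tweet_preds["message"].str.split().str.len()

# Index the DataFrame (not the ndarray) with a boolean mask plus a column label
x = df_tweet_preds.loc[df_tweet_preds["predictions"] == 1, "word count"]
y = df_tweet_preds.loc[df_tweet_preds["predictions"] == 0, "word count"]

# Same plotting calls as in the question
plt.figure(figsize=(12, 6))
plt.xlim(0, 45)
plt.xlabel("word count")
plt.ylabel("frequency")
plt.hist([x, y], color=["r", "b"], alpha=0.5, label=["positive", "negative"])
plt.legend(loc="upper right")
plt.show()  # Jupyter renders the figure inline; in a plain script this opens a window
```

The question's own chained syntax, df_tweet_preds["word count"][df_tweet_preds.predictions == 1], also works once it points at the DataFrame; .loc with a mask and a column label simply does the same selection in one step.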