How to exclude certain rows when calculating the mean with pandas for a plotly chart
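I am trying to make a bar chart with plotly and pandas that plots the mean rating for the different categories in a dataframe I have. Each category is rated 1-5, and the categories are the columns whose names end in "rating". I can get this working fine. However, within my categories (different columns in the dataframe), a value of -1 means the category was not rated. When I calculate the mean and plot the chart, how can I make sure the -1 values are left out of the mean calculation?
My code: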
import pandas as pd
import plotly.express as px

# Plot to find mean rating for different categories
# (df is my reviews dataframe, loaded earlier)
# Take columns we are interested in and stack them into column named 'Category'
# This will allow us to group by category and calculate mean rating
dfm = pd.melt(df,
              id_vars=["id", "course", "date", "overall_rating", "job_prospects_desc", "course_lecturer_desc", "facilities_desc", "student_support_desc", "local_life_desc"],
              value_vars=["job_prospects_rating", "course_lecturer_rating", "facilities_rating", "student_support_rating", "local_life_rating"],
              var_name='Category')
# Group by category and calculate mean rating
# (select 'value' so the non-numeric id columns are not averaged)
dfg = dfm.groupby(['Category'])['value'].mean().reset_index()
print(df)
fig2 = px.bar(dfg, x='Category', y='value', color='Category',
              category_orders={'Category': ['job_prospects_rating', 'course_lecturer_rating', 'facilities_rating', 'student_support_rating', 'local_life_rating']},
              color_discrete_map={
                  'job_prospects_rating': 'lightblue',
                  'course_lecturer_rating': 'blue',
                  'facilities_rating': 'pink',
                  'student_support_rating': 'purple',
                  'local_life_rating': 'violet'},
              title="Mean Rating For Different Student Categories At The University of Lincoln")
fig2.update_yaxes(title='Mean rating (1-5)')
fig2.show()
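import numpy as np
import pandas as pd

# Toy data: 'Category' is the group key, -1 in 'A' or 'B' means "not rated"
df = pd.DataFrame(data=np.array([[0, 0, 0, 1, 1, 1], [1, 2, -1, 4, 5, 6], [7, 8, 9, 10, -1, 12]]).T, columns=['Category', 'A', 'B'])
df1 = df.replace(-1, np.nan)  # swap the -1 sentinel for NaN
df1.groupby(['Category']).mean()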
The logic is very simple: replace the -1 values with NaN and forget about them, since mean() skips NaN by default.
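Applied to the melted dataframe from the question, the same idea might look like the sketch below. The `ratings` frame and its values are made up for illustration (the real data is not shown in the question), and only two of the five rating columns are used to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data; -1 means "not rated"
ratings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "facilities_rating": [4, -1, 2, 3],
    "local_life_rating": [5, 5, -1, -1],
})

# Stack the rating columns into long form, as in the question
long_form = pd.melt(
    ratings,
    id_vars=["id"],
    value_vars=["facilities_rating", "local_life_rating"],
    var_name="Category",
)

# Turn the -1 sentinel into NaN so that mean() skips it
long_form["value"] = long_form["value"].replace(-1, np.nan)

# Average only the 'value' column within each category
category_means = long_form.groupby("Category")["value"].mean().reset_index()
print(category_means)  # facilities_rating -> 3.0, local_life_rating -> 5.0
```

The resulting `category_means` frame has the same `Category` and `value` columns as `dfg` in the question, so it should be usable in the `px.bar` call in its place.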