How do you sum up similar data in a CSV using the plotly and pandas Python libraries?

I am trying to read a CSV file with pandas and draw a bar chart from it in Python.

The CSV data looks like this (these are not the real figures, just an example):

day,month,year,cases,deaths,countriesAndTerritories
11,11,2020,190,230, United_States_of_America
10,11,2020,224,132, United_States_of_America
9,11,2020,80,433, United_States_of_America
8,11,2020,126,623, United_States_of_America

I have managed to create a bar chart that visualizes the deaths per month with the following code:

import pandas as pd
import plotly.express as px

covid_data = pd.read_csv('data/data.csv')

united_states_data = covid_data[covid_data.countriesAndTerritories == 'United_States_of_America']

month_data = united_states_data[['month']]

death_data = united_states_data[['deaths']]

fig = px.bar(united_states_data, x='month', y='deaths', title='COVID-19 deaths by month')
fig.show()

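The problem is that for each month it stacks every day's data on top of each other, with white lines separating the days. I only want the data per month; I don't care about the days. How can I do this? I suppose I have to somehow create a new data set holding each month's total deaths, by adding up the data for every day in the same month?

The code below should do what you want: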

# sum the daily deaths within each month (one row per month)
df_plot = united_states_data.groupby('month', as_index=False)['deaths'].sum()
fig = px.bar(df_plot, x='month', y='deaths', title='COVID-19 deaths by month')
fig.show()
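
To see what the groupby step produces, here is a minimal sketch that runs on the sample rows from the question instead of data/data.csv. The name monthly_totals is only illustrative, and skipinitialspace=True is an assumption made because the sample rows have a blank after the last comma, which would otherwise stop the country filter from matching.

```python
import io

import pandas as pd

# Sample rows copied from the question (not real figures).
SAMPLE_CSV = """day,month,year,cases,deaths,countriesAndTerritories
11,11,2020,190,230, United_States_of_America
10,11,2020,224,132, United_States_of_America
9,11,2020,80,433, United_States_of_America
8,11,2020,126,623, United_States_of_America
"""

# skipinitialspace drops the blank after each comma so the filter below matches.
covid_data = pd.read_csv(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
united_states_data = covid_data[
    covid_data.countriesAndTerritories == 'United_States_of_America'
]

# One row per month, with the daily deaths added together.
monthly_totals = united_states_data.groupby('month', as_index=False)['deaths'].sum()
print(monthly_totals)
```

Since the result has a single row per month, passing it to px.bar with x='month' and y='deaths' should draw one solid bar per month rather than a stack of daily segments.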

Update: I found a solution that groups the days of each month together to produce the deaths per month.

# create a list of dicts that holds the data I want (set deaths to 0 for now)
monthly_deaths = [
    {"month": 1, "name": 'January', "deaths": 0},
    {"month": 2, "name": 'February', "deaths": 0},
    {"month": 3, "name": 'March', "deaths": 0},
    {"month": 4, "name": 'April', "deaths": 0},
    {"month": 5, "name": 'May', "deaths": 0},
    {"month": 6, "name": 'June', "deaths": 0},
    {"month": 7, "name": 'July', "deaths": 0},
    {"month": 8, "name": 'August', "deaths": 0},
    {"month": 9, "name": 'September', "deaths": 0},
    {"month": 10, "name": 'October', "deaths": 0},
    {"month": 11, "name": 'November', "deaths": 0},
    {"month": 12, "name": 'December', "deaths": 0},
]

# create a data frame of the data I want to use to achieve this
new_data = pd.DataFrame(united_states_data)

# iterate through the rows of the data frame and update the matching entry;
# month numbers are 1-based, list positions are 0-based
for _, row in new_data.iterrows():
    month_index = int(row['month']) - 1
    monthly_deaths[month_index]['deaths'] += int(row['deaths'])

# plot the data
fig = px.bar(monthly_deaths, x='name', y='deaths', title='COVID-19 deaths by month')
fig.show()

The result looks like this:
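
As a side note, the same table of month names and totals can probably be built without the hand-written list or the row loop. The sketch below is one way to do it, under stated assumptions: the daily frame is made-up stand-in data, not the real CSV, and the use of calendar.month_name and reindex is my choice rather than anything from the post.

```python
import calendar

import pandas as pd

# Hypothetical daily rows standing in for united_states_data (made-up numbers).
daily = pd.DataFrame({
    'month': [10, 10, 11, 11],
    'deaths': [5, 7, 230, 132],
})

# Sum deaths per month, then force all 12 months to be present;
# months with no rows get a total of 0.
monthly = (
    daily.groupby('month')['deaths'].sum()
    .reindex(range(1, 13), fill_value=0)
    .rename_axis('month')
    .reset_index()
)

# calendar.month_name[1] is 'January' ... calendar.month_name[12] is 'December'.
monthly['name'] = monthly['month'].map(lambda m: calendar.month_name[int(m)])
print(monthly)
```

The resulting frame has the same month, name and deaths fields as the monthly_deaths list, so it can be handed to px.bar with x='name' and y='deaths' in the same way.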