Python df: add rows by date so each group ends on the same date, then ffill the remaining rows
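To use animation frames in a geographic plot, I want all my groups to end on the same date. This avoids some countries being greyed out in the last frame. Currently, the latest data point by date is Timestamp('2021-05-13 00:00:00').
So as a next step, I want to add new rows for every country so that each one has rows in the df up to the latest date.
The columns 'people_vaccinated_per_hundred' and 'people_fully_vaccinated_per_hundred' can be filled with ffill.
Ideally, if Norway is 1 day short of the latest data point '2021-05-13', a new row should be added for it as shown below. The same should be done for every other country in the df.
Example: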
country iso_code date people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
12028 Norway NOR 2021-05-02 0.00 NaN
12029 Norway NOR 2021-05-03 0.00 NaN
12188 Norway NOR ... ... ...
12188 Norway NOR 2021-05-11 27.81 9.55
12189 Norway NOR 2021-05-12 28.49 10.42
Add new row
12189 Norway NOR 2021-05-13 28.49 10.42
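A straightforward approach could be to create the Cartesian product of countries and dates, then join it back to the data. The join creates nulls for every missing date and country combination.
# unique country / iso_code pairs and unique dates
countries = df.loc[:, ['country', 'iso_code']].drop_duplicates()
dates = df.loc[:, ['date']].drop_duplicates()
# Cartesian product of countries and dates (how='cross' requires pandas 1.2 or later)
all_countries_dates = countries.merge(dates, how='cross')
# the right join keeps every country/date pair; combinations missing from df get NaN
df = df.merge(all_countries_dates, how='right', on=['country', 'iso_code', 'date'])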
For a dataset like this:
country iso_code date people_vaccinated people_fully_vaccinated
Norway NOR 2021-05-09 0.00 1.00
Norway NOR 2021-05-10 0.00 3.00
Norway NOR 2021-05-11 27.81 9.55
Norway NOR 2021-05-12 28.49 10.42
Norway NOR 2021-05-13 28.49 10.42
United States USA 2021-05-09 23.00 3.00
United States USA 2021-05-10 23.00 3.00
this transformation would give you:
country iso_code date people_vaccinated people_fully_vaccinated
Norway NOR 2021-05-09 0.00 1.00
Norway NOR 2021-05-10 0.00 3.00
Norway NOR 2021-05-11 27.81 9.55
Norway NOR 2021-05-12 28.49 10.42
Norway NOR 2021-05-13 28.49 10.42
United States USA 2021-05-09 23.00 3.00
United States USA 2021-05-10 23.00 3.00
United States USA 2021-05-11 NaN NaN
United States USA 2021-05-12 NaN NaN
United States USA 2021-05-13 NaN NaN
After this, you can use fillna (or a per-country ffill) to fill the nulls in the added rows.
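The steps above can be sketched end to end on the sample rows from the table. This is a minimal illustration, not a definitive implementation: the names `filled` and `value_cols` are assumptions introduced here, and forward-filling within each country is one possible way to fill the added rows, matching what the question asks for.

```python
import pandas as pd

# Sample rows copied from the dataset shown above.
df = pd.DataFrame({
    "country": ["Norway"] * 5 + ["United States"] * 2,
    "iso_code": ["NOR"] * 5 + ["USA"] * 2,
    "date": pd.to_datetime([
        "2021-05-09", "2021-05-10", "2021-05-11", "2021-05-12", "2021-05-13",
        "2021-05-09", "2021-05-10",
    ]),
    "people_vaccinated": [0.00, 0.00, 27.81, 28.49, 28.49, 23.00, 23.00],
    "people_fully_vaccinated": [1.00, 3.00, 9.55, 10.42, 10.42, 3.00, 3.00],
})

# Every country paired with every date (how="cross" needs pandas 1.2 or later).
countries = df[["country", "iso_code"]].drop_duplicates()
dates = df[["date"]].drop_duplicates()
all_countries_dates = countries.merge(dates, how="cross")

# Right join: country/date pairs missing from df come back as NaN rows.
filled = (
    df.merge(all_countries_dates, how="right", on=["country", "iso_code", "date"])
      .sort_values(["country", "date"])
      .reset_index(drop=True)
)

# Forward-fill inside each country so values never leak across countries.
# value_cols is a hypothetical helper name, not part of the original answer.
value_cols = ["people_vaccinated", "people_fully_vaccinated"]
filled[value_cols] = filled.groupby("country")[value_cols].ffill()

print(filled)
```

Sorting by country and date before the fill matters, because ffill copies the previous row's value within each group in row order.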
Cross-join code for older pandas versions (such as 1.1.5) that do not support how='cross'
import pandas as pd

# df with all unique countries and iso_codes
countries = animation_covid_df.loc[:, ['country', 'iso_code']].drop_duplicates()
# new table with all the dates in the original dataframe
dates_df = animation_covid_df.loc[:, ['date']].drop_duplicates()
# 0-based row number, used later as the key to merge the dates table with the countries table
dates_df['row_number'] = range(len(dates_df))
number_of_dates = len(dates_df)  # number of dates (rows) in the dates table
# repeat every country once per date in dates_df
indexed_country = pd.concat([countries] * number_of_dates, ignore_index=True)
indexed_country = indexed_country.sort_values(['country', 'iso_code'], ascending=True)
# matching 0-based 'row_number' within each country, to join indexed_country with dates_df
indexed_country['row_number'] = indexed_country.groupby(['country', 'iso_code']).cumcount()
# merge all the indexed countries with all the possible dates on the row number
indexed_country_date_df = indexed_country.merge(dates_df, on='row_number', how='left', suffixes=('_1', '_2'))
# set the 'date' column in both tables to datetime so they can be merged on it
animation_covid_df['date'] = pd.to_datetime(animation_covid_df['date'])
indexed_country_date_df['date'] = pd.to_datetime(indexed_country_date_df['date'])
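On older pandas versions, another common way to build the same country-by-date grid is to merge on a constant helper key. The sketch below swaps that technique in for the row_number approach, then shows the final right merge and a per-country forward fill. The `_key` column, the `completed` name, and the toy rows (adapted from the tables above) are assumptions made for illustration.

```python
import pandas as pd

# Toy data adapted from the tables above; the United States is one day short.
animation_covid_df = pd.DataFrame({
    "country": ["Norway", "Norway", "Norway", "United States", "United States"],
    "iso_code": ["NOR", "NOR", "NOR", "USA", "USA"],
    "date": ["2021-05-11", "2021-05-12", "2021-05-13", "2021-05-11", "2021-05-12"],
    "people_vaccinated_per_hundred": [27.81, 28.49, 28.49, 23.00, 23.00],
})
animation_covid_df["date"] = pd.to_datetime(animation_covid_df["date"])

countries = animation_covid_df[["country", "iso_code"]].drop_duplicates()
dates_df = animation_covid_df[["date"]].drop_duplicates()

# Constant helper key (hypothetical name): merging on it pairs every country
# with every date, which emulates how="cross" on pandas versions before 1.2.
indexed_country_date_df = (
    countries.assign(_key=1)
    .merge(dates_df.assign(_key=1), on="_key")
    .drop(columns="_key")
)

# Right merge keeps the full grid; missing country/date pairs become NaN.
completed = (
    animation_covid_df.merge(
        indexed_country_date_df, how="right", on=["country", "iso_code", "date"]
    )
    .sort_values(["country", "date"])
    .reset_index(drop=True)
)

# Carry the last known value forward within each country.
completed["people_vaccinated_per_hundred"] = (
    completed.groupby("country")["people_vaccinated_per_hundred"].ffill()
)

print(completed)
```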