Pandas: fill some categorical data with 0
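I have a dataset with dates (which may not really count as categorical data), and I want to count occurrences by day of the week (and/or by month or year; the logic is the same).
I have something like this, with the days of the week in French: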
```python
import numpy as np
import pandas as pd

# this is a simulation; of course my real data is more complex ;-)
data = {'date': ["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"], 'col_2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame.from_dict(data)
# date conversion
df['date'] = pd.to_datetime(df['date'])
# French day names (the fr_FR locale must be installed on the system)
df['day_of_week'] = df['date'].dt.day_name(locale='fr_FR')
# group by day and count
data = df.groupby('day_of_week')['day_of_week'].agg(['count'])
data.reset_index(level=0, inplace=True)
# order by the position of the day in the week
JOUR_SEMAINE = ["Lundi", "Mardi", "Mercredi", "Jeudi", "Vendredi", "Samedi", "Dimanche"]
MAPPING_SEMAINE = {day: i for i, day in enumerate(JOUR_SEMAINE)}
key = data['day_of_week'].map(MAPPING_SEMAINE)
data = data.iloc[key.argsort()]
# display
data
```
| | day_of_week | count |
|---|---|---|
| 0 | Lundi | 1 |
| 1 | Mercredi | 1 |
| 2 | Samedi | 2 |
That works, but how can I fill the missing days with 0 to get:
| | day_of_week | count |
|---|---|---|
| 0 | Lundi | 1 |
| 1 | Mardi | 0 |
| 2 | Mercredi | 1 |
| 3 | Jeudi | 0 |
| 4 | Vendredi | 0 |
| 5 | Samedi | 2 |
| 6 | Dimanche | 0 |
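More generally, I'm looking for a more automatic way to assign, for every entry of my categorical array (JOUR_SEMAINE in this case), either the computed value (the count) or 0...
Does anyone have an idea?
Thanks in advance.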
First, I'll create a DataFrame indexed by the categories. We can get the weekday names from the `calendar` module.
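```python
import pandas as pd
import calendar as cal

# weekday names, Monday first (calendar.day_name follows the current locale, English by default)
weekdays = list(cal.day_name)
weekday_count_df = pd.DataFrame(index=weekdays)
weekday_count_df
```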
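Since the question also mentions counting by month, the same construction should work for months. One detail to watch: `calendar.month_name` starts with an empty string at index 0, so it needs slicing. Like `calendar.day_name`, it follows the current locale. A minimal sketch:

```python
import calendar as cal
import pandas as pd

# month_name[0] is '', so skip it to keep January to December only
months = list(cal.month_name)[1:]
# empty DataFrame whose index lists every month, to be joined with the counts later
month_count_df = pd.DataFrame(index=months)
print(month_count_df)
```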
Next, we put your data into a DataFrame.
```python
data = {'date': ["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"], 'col_2': ['a', 'b', 'c', 'd']}
data_df = pd.DataFrame(data)
data_df['date'] = pd.to_datetime(data_df['date'])
# English day names, matching the index of weekday_count_df
data_df['day_of_week'] = data_df['date'].dt.day_name()
data_df
```
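This answer uses English names (`day_name()` without `locale`). If you want the question's French labels without depending on the `fr_FR` locale being installed, one option is to map `dt.dayofweek` (0 for Monday up to 6 for Sunday) onto `JOUR_SEMAINE` and build the reference frame from that list instead of from `cal.day_name`. A sketch under that assumption:

```python
import pandas as pd

JOUR_SEMAINE = ["Lundi", "Mardi", "Mercredi", "Jeudi", "Vendredi", "Samedi", "Dimanche"]

data = {'date': ["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"], 'col_2': ['a', 'b', 'c', 'd']}
data_df = pd.DataFrame(data)
data_df['date'] = pd.to_datetime(data_df['date'])
# dt.dayofweek runs 0 (Monday) to 6 (Sunday), the same order as JOUR_SEMAINE
data_df['day_of_week'] = data_df['date'].dt.dayofweek.map(dict(enumerate(JOUR_SEMAINE)))

# reference frame with every French day, in week order
weekday_count_df = pd.DataFrame(index=JOUR_SEMAINE)
print(data_df)
```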
Next, we can join the two DataFrames.
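```python
# number of rows per day; days that never occur are simply absent here
data_group_df = data_df.groupby('day_of_week')['date'].count()
# the join keeps all seven days of weekday_count_df; missing days get NaN, replaced by 0
weekday_count_df = weekday_count_df.join(data_group_df)
weekday_count_df = weekday_count_df.fillna(0)
weekday_count_df
```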
Finally, we can fix the column's type and name.
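```python
# the joined column is named after the counted column ('date') and is float because of the NaN
weekday_count_df = weekday_count_df.rename(columns={'date': 'count'})
weekday_count_df['count'] = weekday_count_df['count'].astype('int32')
weekday_count_df
```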
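The `fillna(0)` and `astype` steps are needed because the join introduces `NaN` for the missing days, which turns the column into floats. A shorter variant, sketched here with the same English names, reindexes the per-day counts on `weekdays` with `fill_value=0`; missing days are then filled directly with 0, so the integer dtype should be kept and no cast is required:

```python
import calendar as cal
import pandas as pd

weekdays = list(cal.day_name)

data = {'date': ["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"], 'col_2': ['a', 'b', 'c', 'd']}
data_df = pd.DataFrame(data)
data_df['date'] = pd.to_datetime(data_df['date'])
data_df['day_of_week'] = data_df['date'].dt.day_name()

# size() counts rows per day; reindex adds the absent days with 0 instead of NaN
weekday_count_df = (
    data_df.groupby('day_of_week').size()
    .reindex(weekdays, fill_value=0)
    .to_frame('count')
)
print(weekday_count_df)
```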
The final result looks like this:
```
           count
Monday         1
Tuesday        0
Wednesday      1
Thursday       0
Friday         0
Saturday       2
Sunday         0
```
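For the more general case asked about in the question (any reference list such as `JOUR_SEMAINE`, months, or years), one option is to convert the values to a `pd.Categorical` whose categories are the full reference list. `value_counts(sort=False)` on a categorical Series reports every category, including those with zero occurrences, in category order. The helper name `count_with_zeros` is hypothetical, and the French names are again derived from `dt.dayofweek` rather than from the `fr_FR` locale:

```python
import pandas as pd

def count_with_zeros(values, categories):
    """Hypothetical helper: count each category in `values`, reporting 0 for absent ones."""
    cat = pd.Series(pd.Categorical(values, categories=categories, ordered=True))
    # sort=False keeps the category order instead of sorting by frequency
    return cat.value_counts(sort=False)

JOUR_SEMAINE = ["Lundi", "Mardi", "Mercredi", "Jeudi", "Vendredi", "Samedi", "Dimanche"]
dates = pd.to_datetime(pd.Series(["2000-01-01", "2000-05-01", "2000-11-11", "2000-11-01"]))

# days of the week, French labels, Monday first
by_day = count_with_zeros(dates.dt.dayofweek.map(dict(enumerate(JOUR_SEMAINE))), JOUR_SEMAINE)
# months as numbers 1 to 12
by_month = count_with_zeros(dates.dt.month, list(range(1, 13)))
print(by_day)
print(by_month)
```

For years, the same call should work with the list of years you want to report, e.g. `list(range(first_year, last_year + 1))`.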