Pandas group by selected dates

I have a dataframe very similar to this one:

index  date       month
0      2019-12-1  12
1      2020-03-1  3
2      2020-07-1  7
3      2021-02-1  2
4      2021-09-1  9

I want to consolidate every date onto the closest of a fixed set of months. The months need to be normalized like this:

Months            Normalized month
3, 4, 5           4
6, 7, 8, 9        8
1, 2, 10, 11, 12  12

So the output would be:

index  date       month
0      2019-12-1  12
1      2020-04-1  4
2      2020-08-1  8
3      2020-12-1  12
4      2021-08-1  8

You can try creating a dictionary that maps each month to its normalized month:

norm_month_dict = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8, 1: 12, 2: 12, 10: 12, 11: 12, 12: 12}

Then use this dictionary to map the values in the month column to their respective normalized months.

df['normalized_months'] = df['month'].map(norm_month_dict)
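
For reference, here is a self-contained sketch of the dictionary approach above, run against the sample data from the question. The output column name `normalized_months` follows the answer; it is assumed that every month present in the data has a key in the dictionary, since `Series.map` returns NaN for missing keys.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
    "month": [12, 3, 7, 2, 9],
})

# Month -> normalized month, as given in the question's table.
norm_month_dict = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8,
                   1: 12, 2: 12, 10: 12, 11: 12, 12: 12}

# Series.map looks up each month in the dictionary.
df["normalized_months"] = df["month"].map(norm_month_dict)
print(df)
```

This only adds the normalized month as a new column; the date column is left unchanged.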

You need to build the dictionary from the second dataframe (assuming the two dataframes are named df1 and df2):

d = (
    df2.assign(Months=df2['Months'].str.split(', '))
    .explode('Months').astype(int)
    .set_index('Months')['Normalized month'].to_dict()
)
# {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8, 1: 12, 2: 12, 10: 12, 11: 12, 12: 12}

Then map the values:

df1['month'] = df1['month'].map(d)

Output:

   index        date   month
0       0  2019-12-1      12
1       1  2020-03-1       4
2       2  2020-07-1       8
3       3  2021-02-1      12
4       4  2021-09-1       8
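
To try this end to end, the two dataframes can be reconstructed from the tables in the question. This is a minimal sketch: it assumes that `Months` in `df2` is stored as comma-separated strings and that `Normalized month` holds integers, which the question does not state explicitly.

```python
import pandas as pd

# df1: the data to normalize (values from the question).
df1 = pd.DataFrame({
    "date": ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
    "month": [12, 3, 7, 2, 9],
})

# df2: the lookup table; the string layout of "Months" is an assumption.
df2 = pd.DataFrame({
    "Months": ["3, 4, 5", "6, 7, 8, 9", "1, 2, 10, 11, 12"],
    "Normalized month": [4, 8, 12],
})

# Split each "Months" string into a list, give every month its own row,
# convert both columns to int, then build a month -> normalized month dict.
d = (
    df2.assign(Months=df2["Months"].str.split(", "))
       .explode("Months").astype(int)
       .set_index("Months")["Normalized month"].to_dict()
)

df1["month"] = df1["month"].map(d)
print(df1)
```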

You can iterate over the DataFrame and overwrite the month part of each date string.

import pandas as pd

df = pd.DataFrame(data={'date': ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
                        'month': [12, 3, 7, 2, 9]})
for index, row in df.iterrows():
    # Pick the normalized month for this row.
    if row['month'] in [3, 4, 5]:
        new_month = 4
    elif row['month'] in [6, 7, 8, 9]:
        new_month = 8
    else:
        new_month = 12
    # .at writes a single cell in place, which avoids chained assignment.
    df.at[index, 'month'] = new_month
    # Characters 5:7 hold the zero-padded month in a 'YYYY-MM-D' string.
    df.at[index, 'date'] = row['date'][:5] + f"{new_month:02d}" + row['date'][7:]
# Note: January and February keep their own year here, whereas the
# question's expected output moves 2021-02-1 to 2020-12-1.
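
As an alternative to looping, the date strings can be rebuilt column-wise. The sketch below also shifts January and February back to the previous year, which is an interpretation of the question's expected output (2021-02-1 becomes 2020-12-1) and not something any of the answers state. It assumes every date is a string in year-month-day order separated by hyphens.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
    "month": [12, 3, 7, 2, 9],
})

norm = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8,
        1: 12, 2: 12, 10: 12, 11: 12, 12: 12}

# Split "YYYY-MM-D" into three string columns: 0 = year, 1 = month, 2 = day.
parts = df["date"].str.split("-", expand=True)
year = parts[0].astype(int)
day = parts[2]

new_month = df["month"].map(norm)

# Assumption: months 1 and 2 belong to December of the previous year.
year = year - df["month"].isin([1, 2]).astype(int)

# Reassemble the date with a zero-padded month, keeping the day as written.
df["date"] = year.astype(str) + "-" + new_month.astype(str).str.zfill(2) + "-" + day
df["month"] = new_month
print(df)
```

With the sample data this reproduces the table the question asks for.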