Pandas group by selected dates
I have a dataframe very similar to this one:
index | date | month |
---|---|---|
0 | 2019-12-1 | 12 |
1 | 2020-03-1 | 3 |
2 | 2020-07-1 | 7 |
3 | 2021-02-1 | 2 |
4 | 2021-09-1 | 9 |
I want to move every date to the closest month in a fixed set of months. The months need to be normalized like this:
Months | Normalized month |
---|---|
3, 4, 5 | 4 |
6, 7, 8, 9 | 8 |
1, 2, 10, 11, 12 | 12 |
So the output would be:
index | date | month |
---|---|---|
0 | 2019-12-1 | 12 |
1 | 2020-04-1 | 4 |
2 | 2020-08-1 | 8 |
3 | 2020-12-1 | 12 |
4 | 2021-08-1 | 8 |
You can try creating a dictionary of months, where:
norm_month_dict = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8, 1: 12, 2: 12, 10: 12, 11: 12, 12: 12}
Then use this dictionary to map each month value to its normalized month:
df['normalized_months'] = df['month'].map(norm_month_dict)
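A minimal, self-contained check of the dictionary approach on the sample data from the question. Series.map returns NaN for any month that is not a key of the dict, so the sketch also confirms that the dict covers every month from 1 to 12.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'date': ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
                   'month': [12, 3, 7, 2, 9]})

norm_month_dict = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8,
                   1: 12, 2: 12, 10: 12, 11: 12, 12: 12}

# Every month 1..12 must be a key, otherwise map() would produce NaN for it
assert sorted(norm_month_dict) == list(range(1, 13))

# Look each month up in the dict and store the result in a new column
df['normalized_months'] = df['month'].map(norm_month_dict)
print(df)
```

This only adds the normalized month; the date strings themselves are left unchanged.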
You need to build the dictionary from the second dataframe (assuming the two frames are named df1 and df2):
d = (
    df2.assign(Months=df2['Months'].str.split(', '))
       .explode('Months').astype(int)
       .set_index('Months')['Normalized month'].to_dict()
)
# {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8, 1: 12, 2: 12, 10: 12, 11: 12, 12: 12}
Then map the values:
df1['month'] = df1['month'].map(d)
Output:
   index       date  month
0      0  2019-12-1     12
1      1  2020-03-1      4
2      2  2020-07-1      8
3      3  2021-02-1     12
4      4  2021-09-1      8
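To try the dictionary-building step end to end, df2 can be reconstructed from the normalization table in the question. This sketch assumes the Months column holds the comma-separated groups as strings, exactly as they appear in that table.

```python
import pandas as pd

# df1: sample data from the question
df1 = pd.DataFrame({'date': ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
                    'month': [12, 3, 7, 2, 9]})

# df2: the normalization table, assuming "Months" holds comma-separated strings
df2 = pd.DataFrame({'Months': ['3, 4, 5', '6, 7, 8, 9', '1, 2, 10, 11, 12'],
                    'Normalized month': [4, 8, 12]})

d = (
    df2.assign(Months=df2['Months'].str.split(', '))  # "3, 4, 5" -> ['3', '4', '5']
       .explode('Months')                              # one row per month
       .astype(int)                                    # month strings -> ints
       .set_index('Months')['Normalized month']
       .to_dict()
)

# Every month 1..12 should be covered by some group
assert sorted(d) == list(range(1, 13))

df1['month'] = df1['month'].map(d)
print(df1)
```

As in the output above, only the month column changes; the date strings are not touched.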
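You can iterate over the DataFrame and rewrite the month part of each row's own date string.
import pandas as pd
df = pd.DataFrame(data={'date': ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
                        'month': [12, 3, 7, 2, 9]})
for index, row in df.iterrows():
    # Pick the normalized month for this row
    if row['month'] in [3, 4, 5]:
        new_month = 4
    elif row['month'] in [6, 7, 8, 9]:
        new_month = 8
    else:
        new_month = 12
    # Write through df.loc: chained assignment such as df['month'][index] = ...
    # may not update df at all
    df.loc[index, 'month'] = new_month
    # Rebuild this row's date as "YYYY-" + two-digit month + "-D". str.replace is
    # unsafe here: "02" also occurs inside the year "2021". Note that January and
    # February keep their own year (2021-02-1 becomes 2021-12-1).
    df.loc[index, 'date'] = row['date'][:5] + f"{new_month:02d}" + row['date'][7:]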
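A vectorized sketch without iterrows that also reproduces the year change visible in the expected output. The assumption, inferred from row 3 of that output (2021-02-1 becomes 2020-12-1), is that January and February are closest to the previous December, so their year is decremented, while October to December keep their year.

```python
import pandas as pd

df = pd.DataFrame({'date': ["2019-12-1", "2020-03-1", "2020-07-1", "2021-02-1", "2021-09-1"],
                   'month': [12, 3, 7, 2, 9]})

norm_month_dict = {3: 4, 4: 4, 5: 4, 6: 8, 7: 8, 8: 8, 9: 8,
                   1: 12, 2: 12, 10: 12, 11: 12, 12: 12}

# Split "YYYY-MM-D" into its three parts (as strings)
parts = df['date'].str.split('-', expand=True)
year = parts[0].astype(int)
day = parts[2]

new_month = df['month'].map(norm_month_dict)
# Assumption: Jan/Feb move back to December of the previous year
new_year = year - df['month'].isin([1, 2]).astype(int)

df['month'] = new_month
df['date'] = (new_year.astype(str) + '-'
              + new_month.astype(str).str.zfill(2) + '-' + day)
print(df)
```

If the dates are later needed as real timestamps rather than strings, the rebuilt column can be parsed once at the end instead of manipulating strings row by row.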