Auto Fill calendar with missing months
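I have a table like this, with dates and values. As you can see, some months are missing from the list (May, June, August and October 2020, and January of the following year).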
Date | Any other value |
---|---|
2020-01-01 | value |
2020-02-01 | value |
2020-02-04 | value |
2020-02-04 | value |
2020-03-11 | value |
2020-04-04 | value |
2020-07-04 | value |
2020-07-04 | value |
2020-09-01 | value |
2020-11-06 | value |
2020-12-02 | value |
2021-02-04 | value |
2021-03-11 | value |
Is there a way to automatically fill in those months, so that the table becomes the following?
Date | Any other value |
---|---|
2020-01-01 | value |
2020-02-01 | value |
2020-02-04 | value |
2020-02-04 | value |
2020-03-11 | value |
2020-04-04 | value |
2020-05-01 | NaN |
2020-06-01 | NaN |
2020-07-04 | value |
2020-07-04 | value |
2020-08-01 | NaN |
2020-09-01 | value |
2020-10-01 | NaN |
2020-11-06 | value |
2020-12-02 | value |
2021-01-01 | NaN |
2021-02-04 | value |
2021-03-11 | value |
Thanks, everyone!
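This is all I could come up with:

```python
import numpy as np
import pandas as pd

# Assumes df has a 'Date' column and a value column named 'Any'.
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['Date1'] = df['Date']  # keep the full dates; the index is reduced to months
df = df.set_index('Date').to_period('M')

# Every month start between the first and the last date (positional access).
t2 = pd.date_range(df['Date1'].iloc[0], df['Date1'].iloc[-1], freq='MS')
t3 = t2.to_period('M')

# Collect the month starts whose period does not occur in the data.
add_month = []
for month_start, period in zip(t2, t3):
    if period not in df.index:
        add_month.append(month_start)

miss_month_df = pd.DataFrame(add_month, columns=['Date1'])
miss_month_df['Any'] = np.nan
df.reset_index(inplace=True, drop=True)
df_new = pd.concat([df, miss_month_df], ignore_index=True).sort_values(by='Date1').reset_index(drop=True)
```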
`df_new`:

```
      Any      Date1
0   value 2020-01-01
1   value 2020-02-01
2   value 2020-02-04
3   value 2020-02-04
4   value 2020-03-11
5   value 2020-04-04
6     NaN 2020-05-01
7     NaN 2020-06-01
8   value 2020-07-04
9   value 2020-07-04
10    NaN 2020-08-01
11  value 2020-09-01
12    NaN 2020-10-01
13  value 2020-11-06
14  value 2020-12-02
15    NaN 2021-01-01
16  value 2021-02-04
17  value 2021-03-11
```
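The loop over month starts can also be written as a set difference on monthly periods. What follows is only a sketch under a few assumptions: the column names are the ones from the question, and `fill_missing_months` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def fill_missing_months(df, date_col="Date"):
    """Append one row (first of the month, NaN elsewhere) per month absent from df."""
    dates = pd.to_datetime(df[date_col])
    # Monthly period of every existing row (duplicates are fine here).
    present = pd.DatetimeIndex(dates).to_period("M")
    # Every month between the earliest and the latest one.
    wanted = pd.period_range(present.min(), present.max(), freq="M")
    # Months in the full range that have no row at all.
    missing = wanted.difference(present)
    filler = pd.DataFrame({date_col: missing.to_timestamp()})
    out = pd.concat([df.assign(**{date_col: dates}), filler], ignore_index=True)
    return out.sort_values(date_col, kind="stable", ignore_index=True)

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-02-01", "2020-02-04", "2020-02-04", "2020-03-11",
             "2020-04-04", "2020-07-04", "2020-07-04", "2020-09-01", "2020-11-06",
             "2020-12-02", "2021-02-04", "2021-03-11"],
    "Any other value": "value",
})
result = fill_missing_months(df)
print(result.to_string(index=False))
```

Columns other than the date column are left as NaN in the added rows, because the filler frame only carries the date.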
Logically, this is an outer join against all the months you are interested in.
```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-12-31", "2020-01-31", "2020-02-03", "2020-02-03", "2020-03-10",
             "2020-04-03", "2020-07-03", "2020-07-03", "2020-08-31", "2020-11-05",
             "2020-12-01", "2021-02-03", "2021-03-10"],
    "Any other value": ["value"] * 13,
})
df["Date"] = pd.to_datetime(df["Date"])
# First day of each row's month. Subtracting MonthBegin(1) instead would push a
# date that already falls on the 1st (2020-12-01) back into the previous month.
df["month"] = df["Date"].dt.to_period("M").dt.to_timestamp()
all_months = pd.DataFrame(
    {"month": pd.date_range(df["month"].min(), df["month"].max(), freq="MS")}
)
df = df.merge(all_months, on="month", how="outer")
# Months without any row have no Date yet; use the first of the month.
df["Date"] = df["Date"].fillna(df["month"])
df = df.drop(columns="month").sort_values("Date", ignore_index=True)
print(df.to_string(index=False))
```

Output

```
      Date Any other value
2019-12-31           value
2020-01-31           value
2020-02-03           value
2020-02-03           value
2020-03-10           value
2020-04-03           value
2020-05-01             NaN
2020-06-01             NaN
2020-07-03           value
2020-07-03           value
2020-08-31           value
2020-09-01             NaN
2020-10-01             NaN
2020-11-05           value
2020-12-01           value
2021-01-01             NaN
2021-02-03           value
2021-03-10           value
```
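The same join can also be driven from the calendar side: a left join from the full list of months keeps the rows in chronological order, and `indicator=True` marks which months had no data. This is a rough sketch rather than the answer's own method; it uses the dates from the question, and `calendar` and `joined` are names picked for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-01", "2020-02-01", "2020-02-04", "2020-02-04", "2020-03-11",
        "2020-04-04", "2020-07-04", "2020-07-04", "2020-09-01", "2020-11-06",
        "2020-12-02", "2021-02-04", "2021-03-11",
    ]),
    "Any other value": "value",
})

# Join key: the monthly period each row belongs to.
df["month"] = df["Date"].dt.to_period("M")

# One row per month between the first and the last month in the data.
calendar = pd.DataFrame(
    {"month": pd.period_range(df["month"].min(), df["month"].max(), freq="M")}
)

# A left join from the calendar keeps month order; '_merge' flags unmatched months.
joined = calendar.merge(df, on="month", how="left", indicator=True)
is_filled = joined["_merge"] == "left_only"

# Give the unmatched months a date: the first day of that month.
joined["Date"] = joined["Date"].fillna(joined["month"].dt.to_timestamp())
joined = joined[["Date", "Any other value"]]
print(joined.to_string(index=False))
```

Because every month in the data also appears in the calendar, the left join returns the same rows as the outer join, without needing a separate sort.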