Grouping by consecutive dates into date ranges using Python
I have the following dataset.
ID Date
abc 2017-01-07
abc 2017-01-08
abc 2017-01-09
abc 2017-12-09
xyz 2017-01-05
xyz 2017-01-06
xyz 2017-04-15
xyz 2017-04-16
I was able to generate the following output
ID Count
abc 3
abc 1
xyz 2
xyz 2
using the following code:
import pandas as pd

d = {
    'ID': ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz', 'xyz'],
    'Date': ['2017-01-07', '2017-01-08', '2017-01-09', '2017-12-09',
             '2017-01-05', '2017-01-06', '2017-04-15', '2017-04-16']
}
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df['Date'])
# Start a new label whenever the gap to the previous date of the same ID is not 1 day
series = df.groupby('ID').Date.diff().dt.days.ne(1).cumsum()
# Count the rows in each run of consecutive dates
df.groupby(['ID', series]).size().reset_index(level=1, drop=True)
How can I get the following output?
ID Start End
abc 2017-01-07 2017-01-09
abc 2017-12-09 2017-12-09
xyz 2017-01-05 2017-01-06
xyz 2017-04-15 2017-04-16
You can reuse the same grouping key and aggregate the minimum and maximum date of each group:
series = df.groupby('ID').Date.diff().dt.days.ne(1).cumsum()
(df.groupby(['ID', series])
   .agg(Start=('Date', 'min'), End=('Date', 'max'))
   .droplevel(1)
   .reset_index()
)
Output:
    ID      Start        End
0  abc 2017-01-07 2017-01-09
1  abc 2017-12-09 2017-12-09
2  xyz 2017-01-05 2017-01-06
3  xyz 2017-04-15 2017-04-16
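To see why the diff/ne/cumsum key separates the runs, it can help to look at the intermediate steps. The following is a minimal sketch on the question's data; the names `gap`, `new_run`, and `run_id` are illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['abc'] * 4 + ['xyz'] * 4,
    'Date': pd.to_datetime(['2017-01-07', '2017-01-08', '2017-01-09', '2017-12-09',
                            '2017-01-05', '2017-01-06', '2017-04-15', '2017-04-16']),
})

# Days since the previous row of the same ID (NaN on each ID's first row)
gap = df.groupby('ID')['Date'].diff().dt.days
# True wherever a new run starts: the first row of an ID, or a gap other than 1 day
new_run = gap.ne(1)
# A running count of run starts gives every consecutive run its own label
run_id = new_run.cumsum()

# Show the intermediate columns side by side with the data
steps = df.assign(gap=gap, new_run=new_run, run_id=run_id)
print(steps)
```

Because the label keeps increasing across IDs, grouping by it together with ID never merges runs that belong to different IDs.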
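Building on @mozway's answer, you can also use agg with 'first' and 'last' (this assumes the dates are sorted within each ID):
# Label each run of consecutive dates
runs = df.groupby('ID')['Date'].diff().ne(pd.Timedelta(days=1)).cumsum()
# Group by ID as well so that the ID column is kept in the result
out = (df.groupby(['ID', runs])['Date']
         .agg(Start='first', End='last')
         .droplevel(1)
         .reset_index())
print(out)
# Output:
    ID      Start        End
0  abc 2017-01-07 2017-01-09
1  abc 2017-12-09 2017-12-09
2  xyz 2017-01-05 2017-01-06
3  xyz 2017-04-15 2017-04-16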
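The approaches above rely on the rows already being sorted by date within each ID, since diff() compares each row with the one before it. If that is not guaranteed, one option is to sort first. The helper below is a hypothetical sketch: the function name `date_ranges`, its parameters, and the extra `Count` column are assumptions made for illustration, not part of either answer.

```python
import pandas as pd

def date_ranges(df, id_col='ID', date_col='Date'):
    """Collapse runs of consecutive days per ID into Start/End/Count rows."""
    # Sort so that diff() compares each date with its true predecessor
    df = df.sort_values([id_col, date_col])
    # A new run starts whenever the gap to the previous date of the same ID is not one day
    run_id = df.groupby(id_col)[date_col].diff().ne(pd.Timedelta(days=1)).cumsum()
    return (df.groupby([id_col, run_id])[date_col]
              .agg(Start='min', End='max', Count='count')
              .droplevel(1)
              .reset_index())

# The question's data in shuffled order
shuffled = pd.DataFrame({
    'ID': ['xyz', 'abc', 'abc', 'xyz', 'abc', 'xyz', 'abc', 'xyz'],
    'Date': pd.to_datetime(['2017-04-16', '2017-01-08', '2017-12-09', '2017-01-05',
                            '2017-01-07', '2017-04-15', '2017-01-09', '2017-01-06']),
})
result = date_ranges(shuffled)
print(result)
```

Using 'min' and 'max' rather than 'first' and 'last' keeps the result independent of row order inside each group, and `Count` reproduces the sizes from the question's first output.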