Grouping by consecutive dates into date ranges using Python

Question

I have the following dataset.

ID     Date   
abc    2017-01-07  
abc    2017-01-08  
abc    2017-01-09  
abc    2017-12-09  
xyz    2017-01-05  
xyz    2017-01-06 
xyz    2017-04-15  
xyz    2017-04-16

I was able to generate the following output, which counts the rows in each run of consecutive dates per ID:

ID     Count
abc    3
abc    1
xyz    2
xyz    2

using the following code:

import pandas as pd

d = {
    'ID': ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz', 'xyz'],
    'Date': ['2017-01-07','2017-01-08', '2017-01-09', '2017-12-09', '2017-01-05', '2017-01-06', '2017-04-15', '2017-04-16']
}

df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df['Date'])

# Start a new group whenever the gap to the previous date (within an ID) is not exactly 1 day
series = df.groupby('ID').Date.diff().dt.days.ne(1).cumsum()
df.groupby(['ID', series]).size().reset_index(level=1, drop=True)

How can I get the following output instead?

ID     Start        End
abc    2017-01-07   2017-01-09
abc    2017-12-09   2017-12-09
xyz    2017-01-05   2017-01-06
xyz    2017-04-15   2017-04-16

Answer 1

You can reuse the same grouping key and aggregate the minimum and maximum date of each group:

series = df.groupby('ID').Date.diff().dt.days.ne(1).cumsum()

(df.groupby(['ID', series])
   .agg(Start=('Date', 'min'), End=('Date', 'max'))
   .droplevel(1)
   .reset_index()
)

Output:

    ID      Start        End
0  abc 2017-01-07 2017-01-09
1  abc 2017-12-09 2017-12-09
2  xyz 2017-01-05 2017-01-06
3  xyz 2017-04-15 2017-04-16
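
To see why this grouping key works, it can help to look at the intermediate steps one at a time. The sketch below rebuilds the sample data and stores each step in its own column; the column names `gap_days`, `new_run`, and `run_id` are illustrative only and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz', 'xyz'],
    'Date': pd.to_datetime([
        '2017-01-07', '2017-01-08', '2017-01-09', '2017-12-09',
        '2017-01-05', '2017-01-06', '2017-04-15', '2017-04-16',
    ]),
})

steps = df.copy()
# Days since the previous row of the same ID (NaN for the first row of each ID)
steps['gap_days'] = steps.groupby('ID')['Date'].diff().dt.days
# True where a new run starts: first row of an ID, or a gap other than 1 day
steps['new_run'] = steps['gap_days'].ne(1)
# Running total of run starts, so every row in a run shares one label
steps['run_id'] = steps['new_run'].cumsum()
print(steps)
```

Because the cumulative sum runs over the whole frame rather than per ID, the labels are unique across IDs. Grouping by both `ID` and the label is still the safer choice, since it keeps `ID` in the result.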

Answer 2

Building on @mozway's answer.

Using agg with 'first' and 'last' (this assumes the dates are already sorted within each ID):

groups = df.groupby('ID')['Date'].diff().ne(pd.Timedelta(days=1)).cumsum()
out = (df.groupby(['ID', groups])['Date']
         .agg(Start='first', End='last')
         .droplevel(1)
         .reset_index())
print(out)

# Output:
    ID      Start        End
0  abc 2017-01-07 2017-01-09
1  abc 2017-12-09 2017-12-09
2  xyz 2017-01-05 2017-01-06
3  xyz 2017-04-15 2017-04-16
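
Both answers rely on the rows being ordered by date within each ID, because `diff` compares each row with the one before it. If that ordering is not guaranteed, one option is to sort first. The following is a minimal sketch under that assumption; the helper name `date_ranges` and its parameters are hypothetical and do not come from either answer.

```python
import pandas as pd

def date_ranges(df, id_col='ID', date_col='Date'):
    """Hypothetical helper: collapse runs of consecutive days into Start/End rows."""
    # Sort so that diff() always compares neighbouring dates of the same ID
    ordered = df.sort_values([id_col, date_col]).reset_index(drop=True)
    # Label each run; a new run starts where the gap is not exactly one day
    run = (ordered.groupby(id_col)[date_col].diff()
                  .ne(pd.Timedelta(days=1))
                  .cumsum()
                  .rename('run'))
    return (ordered.groupby([id_col, run])[date_col]
                   .agg(Start='min', End='max')
                   .droplevel('run')
                   .reset_index())

# Same sample data as in the question, deliberately in reverse order
shuffled = pd.DataFrame({
    'ID': ['xyz', 'xyz', 'xyz', 'xyz', 'abc', 'abc', 'abc', 'abc'],
    'Date': pd.to_datetime([
        '2017-04-16', '2017-04-15', '2017-01-06', '2017-01-05',
        '2017-12-09', '2017-01-09', '2017-01-08', '2017-01-07',
    ]),
})
out = date_ranges(shuffled)
print(out)
```

Duplicate dates within an ID produce a gap of zero days and would therefore start a new run in this sketch; dropping duplicates beforehand is one way to avoid that.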
