Groupby and count with condition and get week start (from Sunday) and week end from date column in Pandas Dataframe?
I have a dataset like the one below. I need the count of ticket_id with status closed per group, and I need the week start (Sunday) and week end derived from the ticket_time column.
Input dataframe
id status ticket_time product country
1260057 open 2021-10-03 01:20:00 Broadband Grenada
2998178 open 2021-10-06 00:00:00 Fixed Voice Bahamas
3762949 closed 2021-10-04 00:00:00 Fixed Voice St Lucia
3766608 closed 2021-10-10 00:00:00 Broadband St Lucia
3767125 closed 2021-10-03 00:00:00 TV Antigua
etc.................
Output dataframe example:
country product week_startdate week_enddate Value
antigua TV 2021-10-03 00:00:00 2021-10-10 00:00:00 Count(id) with status=closed
St Lucia Broadband ... .... ...
St Lucia Fixedvoice ... .... ...
For each country and each product, I should get the count of ids that are in closed status.
Would a groupby on country and product work?
Input dataset
{'ticket_id': [1260057,
2998178,
3762949,
3766608,
3767125,
6050009,
6050608,
6050972,
6052253,
6053697,
6053812,
6053848,
6053970,
6054363,
6054549,
6055529,
6057303,
6057364,
6058428,
6101321,
6103451,
6103688,
6105180,
693170,
1259365,
1259466,
1260843,
3762579,
3762778,
3764140,
3768850,
6050528,
6050595,
6051099,
6053704,
6054013,
6054582,
6055323,
6056096,
6056363,
6057701,
6058251,
6058323,
6102386,
6103121,
6104844,
6104935,
692914,
1260730,
2997275],
'status': ['open',
'open',
'closed',
'closed',
'closed',
'closed',
'open',
'open',
'closed',
'open',
'open',
'closed',
'open',
'open',
'open',
'open',
'open',
'open',
'open',
'closed',
'closed',
'closed',
'open',
'open',
'closed',
'closed',
'open',
'closed',
'closed',
'closed',
'open',
'closed',
'closed',
'closed',
'open',
'open',
'open',
'open',
'open',
'open',
'open',
'open',
'open',
'cancelled',
'open',
'open',
'open',
'closed',
'closed',
'open'],
'ticket_start_time': [Timestamp('2021-10-04 01:20:00'),
Timestamp('2021-10-02 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-02 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-01 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-03 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-05 00:00:00'),
Timestamp('2021-10-02 00:00:00'),
Timestamp('2021-10-02 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-04 00:00:00'),
Timestamp('2021-10-01 00:00:00')],
'product_type': ['Broadband',
'Fixed Voice',
'Fixed Voice',
'Broadband',
'TV',
'TV',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'TV',
'TV',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Fixed Voice',
'Broadband',
'Fixed Voice',
'TV',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'TV',
'Fixed Voice',
'TV',
'TV',
'TV',
'TV',
'TV',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Broadband',
'Fixed Voice',
'Fixed Voice'],
'org_country': ['Grenada',
'Bahamas',
'St Lucia',
'St Lucia',
'Antigua',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Trinidad',
'Trinidad',
'Trinidad',
'Trinidad',
'Curacao',
'Grenada',
'Grenada',
'Grenada',
'St Lucia',
'St Lucia',
'St Vincent',
'St Lucia',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Jamaica',
'Trinidad',
'Trinidad',
'Trinidad',
'Trinidad',
'Curacao',
'Grenada',
'Bahamas']}
IIUC:
def week_start_day(s):
    # Most recent Sunday on or before each date.
    return s - pd.to_timedelta((s.dt.weekday + 1) % 7, unit='D')

def week_end_day(s):
    # Following Saturday on or after each date. (5 - weekday) % 7 also maps
    # a Saturday onto itself, which the original 7 - (weekday + 2) % 7 did not.
    return s + pd.to_timedelta((5 - s.dt.weekday) % 7, unit='D')

df['ticket_time'] = pd.to_datetime(df['ticket_time'])
df = (df[df['status'] == 'closed']
      .groupby(['country', 'product'])
      .agg(week_startdate=pd.NamedAgg(column='ticket_time', aggfunc=week_start_day),
           week_enddate=pd.NamedAgg(column='ticket_time', aggfunc=week_end_day),
           Value=pd.NamedAgg(column='id', aggfunc='size')))
df.reset_index(inplace=True)
OUTPUT
country product week_startdate week_enddate Value
0 Antigua TV 2021-10-03 2021-10-09 1
1 St Lucia Broadband 2021-10-10 2021-10-16 1
2 St Lucia Fixed Voice 2021-10-03 2021-10-09 1
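As a quick self-contained check (a sketch, not part of the original answer), the week-boundary arithmetic can be verified on a few sample dates; the week-end step here uses (5 - weekday) % 7, which also maps a Saturday onto itself:

```python
import pandas as pd

def week_start_day(s):
    # Most recent Sunday on or before each date.
    return s - pd.to_timedelta((s.dt.weekday + 1) % 7, unit='D')

def week_end_day(s):
    # Following Saturday on or after each date.
    return s + pd.to_timedelta((5 - s.dt.weekday) % 7, unit='D')

# A Sunday, a Monday and a Wednesday of the same week all collapse
# to the same Sunday-to-Saturday window.
s = pd.Series(pd.to_datetime(['2021-10-03', '2021-10-04', '2021-10-06']))
print(week_start_day(s).dt.strftime('%Y-%m-%d').tolist())
# ['2021-10-03', '2021-10-03', '2021-10-03']
print(week_end_day(s).dt.strftime('%Y-%m-%d').tolist())
# ['2021-10-09', '2021-10-09', '2021-10-09']
```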
Setup
import pandas as pd

data = {'id': [1260057, 2998178, 3762949, 3766608, 3767125],
        'status': ['open', 'open', 'closed', 'closed', 'closed'],
        'ticket_time': ['2021-10-03 01:20:00', '2021-10-06 00:00:00', '2021-10-04 00:00:00',
                        '2021-10-10 00:00:00', '2021-10-03 00:00:00'],
        'product': ['Broadband', 'Fixed Voice', 'Fixed Voice', 'Broadband', 'TV'],
        'country': ['Grenada', 'Bahamas', 'St Lucia', 'St Lucia', 'Antigua']}
df = pd.DataFrame.from_dict(data)
Modified answer
After several rounds of clarification, it appears the OP is looking for the week start and end as results of the aggregation, not as grouping keys.
So, here is the modified answer:
weekday = 6  # Sunday
wo = pd.offsets.Week(weekday=weekday, normalize=True)
t = df['ticket_start_time'] + wo
gbcols = ['org_country', 'product_type']
agg_dict = {
    'ticket_id': 'count',
    'week_startdate': 'first',
    'week_enddate': 'last',
}
out = (
    df
    .assign(week_startdate=t - wo, week_enddate=t)
    .groupby(['ticket_id', 'status'] + gbcols).agg(agg_dict)
    .query("status == 'closed'").groupby(gbcols).agg(agg_dict)
    .reset_index()
    .rename(columns={'ticket_id': 'Value'})
)
From the provided input data, we get:
>>> out
org_country product_type Value week_startdate week_enddate
0 Antigua TV 1 2021-10-03 2021-10-10
1 Curacao Broadband 1 2021-10-03 2021-10-10
2 Grenada Broadband 1 2021-09-26 2021-10-03
3 Grenada Fixed Voice 2 2021-09-26 2021-10-10
4 Jamaica Broadband 5 2021-09-26 2021-10-10
5 Jamaica TV 1 2021-09-26 2021-10-03
6 St Lucia Broadband 2 2021-09-26 2021-10-10
7 St Lucia Fixed Voice 1 2021-09-26 2021-10-03
8 St Lucia TV 1 2021-09-26 2021-10-03
9 St Vincent Broadband 1 2021-09-26 2021-10-03
10 Trinidad Broadband 2 2021-10-03 2021-10-10
11 Trinidad TV 1 2021-09-26 2021-10-03
Explanation
The trick of first adding, then subtracting the week offset wo addresses the problem of subtracting the offset when a date falls exactly on an offset boundary. For example:
t = pd.to_datetime([
    '2021-10-02', '2021-10-03', '2021-10-03 01:00',
    '2021-10-04', '2021-10-09 23:59:59', '2021-10-10',
])
>>> pd.DataFrame(dict(t=t, ws=t + wo - wo, we=t + wo))
t ws we
0 2021-10-02 00:00:00 2021-09-26 2021-10-03
1 2021-10-03 00:00:00 2021-10-03 2021-10-10
2 2021-10-03 01:00:00 2021-10-03 2021-10-10
3 2021-10-04 00:00:00 2021-10-03 2021-10-10
4 2021-10-09 23:59:59 2021-10-03 2021-10-10
5 2021-10-10 00:00:00 2021-10-10 2021-10-17
Next, we need a distinct count of ticket_id where the status is 'closed'. That is a bit tricky. We do it with two groupbys. The first counts rows for each (ticket, status) group (plus all the other grouping columns). There may be multiple instances of the same ticket being opened or closed, but we want to ignore that.
tmp = (
    df
    .assign(week_startdate=t - wo, week_enddate=t)
    .groupby(['ticket_id', 'status'] + gbcols)
    .agg(agg_dict)
)
>>> tmp.head()
ticket_id week_startdate \
ticket_id status org_country product_type
692914 closed Curacao Broadband 1 2021-10-03
693170 open Curacao Broadband 1 2021-10-03
1259365 closed Grenada Fixed Voice 1 2021-09-26
1259466 closed Grenada Broadband 1 2021-09-26
1260057 open Grenada Broadband 1 2021-10-03
week_enddate
ticket_id status org_country product_type
692914 closed Curacao Broadband 2021-10-10
693170 open Curacao Broadband 2021-10-10
1259365 closed Grenada Fixed Voice 2021-10-03
1259466 closed Grenada Broadband 2021-10-03
1260057 open Grenada Broadband 2021-10-10
Finally, we simply aggregate again, this time grouping only on the gbcols columns and restricting the count to rows where status is 'closed', which gives the full expression above.
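Equivalently, the two groupbys can be sketched as a single pass (an alternative, not the answer's original code): filter to the closed rows first, then count distinct ticket ids with nunique and take min/max of the precomputed week boundaries. Column names follow the dataset above; the tiny frame here is illustrative:

```python
import pandas as pd

weekday = 6  # Sunday
wo = pd.offsets.Week(weekday=weekday, normalize=True)

# Minimal frame: ticket 1 appears twice (duplicate rows), ticket 2 is open.
df = pd.DataFrame({
    'ticket_id': [1, 1, 2, 3],
    'status': ['closed', 'closed', 'open', 'closed'],
    'ticket_start_time': pd.to_datetime(
        ['2021-10-04', '2021-10-04', '2021-10-05', '2021-10-11']),
    'org_country': ['Antigua'] * 4,
    'product_type': ['TV'] * 4,
})

t = df['ticket_start_time'] + wo
out = (
    df.assign(week_startdate=t - wo, week_enddate=t)
      .query("status == 'closed'")
      .groupby(['org_country', 'product_type'])
      .agg(Value=('ticket_id', 'nunique'),        # distinct closed tickets
           week_startdate=('week_startdate', 'min'),
           week_enddate=('week_enddate', 'max'))
      .reset_index()
)
print(out)  # Value is 2: ticket 1 counted once, ticket 3 once
```

nunique makes the deduplication explicit, at the cost of losing the per-(ticket, status) intermediate table shown above.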
Addendum
The OP wants week_startdate and week_enddate converted to strings with full H:M:S resolution. To do so:
for k in ['week_startdate', 'week_enddate']:
    out[k] = out[k].dt.strftime('%F %T')
Now:
>>> out
org_country product_type Value week_startdate week_enddate
0 Antigua TV 1 2021-10-03 00:00:00 2021-10-10 00:00:00
1 Curacao Broadband 1 2021-10-03 00:00:00 2021-10-10 00:00:00
2 Grenada Broadband 1 2021-09-26 00:00:00 2021-10-03 00:00:00
3 Grenada Fixed Voice 2 2021-09-26 00:00:00 2021-10-10 00:00:00
4 Jamaica Broadband 5 2021-09-26 00:00:00 2021-10-10 00:00:00
5 Jamaica TV 1 2021-09-26 00:00:00 2021-10-03 00:00:00
6 St Lucia Broadband 2 2021-09-26 00:00:00 2021-10-10 00:00:00
7 St Lucia Fixed Voice 1 2021-09-26 00:00:00 2021-10-03 00:00:00
8 St Lucia TV 1 2021-09-26 00:00:00 2021-10-03 00:00:00
9 St Vincent Broadband 1 2021-09-26 00:00:00 2021-10-03 00:00:00
10 Trinidad Broadband 2 2021-10-03 00:00:00 2021-10-10 00:00:00
11 Trinidad TV 1 2021-09-26 00:00:00 2021-10-03 00:00:00