Group by date to find the average number of distinct customers
I have a DataFrame containing one month of data:
initiated_date | specialist_id
21/10/2020 05:00:01 | ab12
21/10/2020 12:20:01 | gc35
22/10/2020 04:30:01 | ad32
22/10/2020 03:40:01 | fe45
22/10/2020 01:50:01 | ad32
23/10/2020 02:10:01 | iu99
23/10/2020 11:30:01 | iu99
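I want to find the average number of distinct specialist_id values per day name (Monday, Tuesday, etc.).
I want to replicate this SQL query, which uses a subquery: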
SELECT
    initiated_day,
    CEILING(AVG(specialist_id)) AS specialist_id
FROM
    (SELECT
        DATE(initiated_date),
        DAYNAME(initiated_date) AS initiated_day,
        COUNT(DISTINCT specialist_id) specialist_id
    FROM
        nts.contacts
    GROUP BY 1, 2) x
GROUP BY 1
The output I'm looking for is:
Day | specialist_id
Mon | 42
Tue | 48
Wed | 51
Thu | 47
Fri | 38
Sat | 31
Sun | 22
This is what I have tried so far:
df.groupby([df['initiated_date'].dt.date,df['initiated_date'].dt.weekday_name])['specialist_id'].nunique().reset_index()
I'm not sure how to go further from here.
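You can add a second groupby:
import numpy as np
# Distinct specialists per calendar date first, then the ceiling of their mean per day name.
st1 = df.groupby([df['initiated_date'].dt.date, df['initiated_date'].dt.day_name()])['specialist_id'].nunique()
out = st1.groupby(level=1).apply(lambda x: np.ceil(x.mean())).reset_index()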
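To see the two-step approach end to end, here is a minimal sketch built on the sample rows from the question. It assumes initiated_date arrives as day-first strings and has to be parsed before the .dt accessor is available. The final row (28/10/2020) is a hypothetical addition, not part of the original data, included so that one day name has two dates to average over. The names daily and out are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "initiated_date": [
        "21/10/2020 05:00:01", "21/10/2020 12:20:01",
        "22/10/2020 04:30:01", "22/10/2020 03:40:01", "22/10/2020 01:50:01",
        "23/10/2020 02:10:01", "23/10/2020 11:30:01",
        "28/10/2020 09:00:00",  # hypothetical extra row, not in the question
    ],
    "specialist_id": ["ab12", "gc35", "ad32", "fe45", "ad32", "iu99", "iu99", "ab12"],
})

# Assumption: the column holds day-first strings, so parse it explicitly.
df["initiated_date"] = pd.to_datetime(df["initiated_date"], format="%d/%m/%Y %H:%M:%S")

# Inner query: distinct specialists per calendar date (day name carried along).
daily = df.groupby(
    [df["initiated_date"].dt.date.rename("date"),
     df["initiated_date"].dt.day_name().rename("Day")]
)["specialist_id"].nunique()

# Outer query: average the daily counts per day name, then round up.
out = np.ceil(daily.groupby(level="Day").mean()).astype(int).reset_index()
print(out)
```

With this data, Wednesday has daily counts of 2 and 1, which average to 1.5 and round up to 2. The rows come back sorted alphabetically by day name, so reindex the result if Monday-to-Sunday order is needed.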
IIUC:
new_df = (
    df.groupby(df['initiated_date'].dt.day_name())['specialist_id']
      .value_counts()
      .groupby(level=0).mean()  # .mean(level='initiated_date') was removed in pandas 2.0
      .rename_axis('Day').reset_index(name='specialist_id')
)
If you want the number of unique values per day name:
new_df = (
    df.groupby(df['initiated_date'].dt.day_name())['specialist_id']
      .nunique()
      .rename_axis('Day').reset_index(name='specialist_id')
)
If you need the ceiling:
import numpy as np
new_df = (
    np.ceil(
        df.groupby(df['initiated_date'].dt.day_name())['specialist_id']
          .value_counts()
          .groupby(level=0).mean()  # .mean(level='initiated_date') was removed in pandas 2.0
    )
    .rename_axis('Day').reset_index(name='specialist_id')
)
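The variants in this answer do not all measure the same quantity, so it may help to compare them on a tiny case. The sketch below uses hypothetical data (two Wednesdays, not taken from the question) and illustrative variable names. It contrasts the pooled distinct count, the value_counts mean, and the mean of per-date distinct counts, which is the one that appears to correspond to the SQL subquery in the question.

```python
import pandas as pd

# Hypothetical data: two Wednesdays (21 and 28 October 2020).
df = pd.DataFrame({
    "initiated_date": pd.to_datetime([
        "2020-10-21 05:00:01", "2020-10-21 06:00:00", "2020-10-21 12:20:01",
        "2020-10-28 09:00:00", "2020-10-28 10:00:00",
    ]),
    "specialist_id": ["ab12", "ab12", "gc35", "ab12", "ab12"],
})
day = df["initiated_date"].dt.day_name().rename("Day")

# Distinct specialists with all dates of a day name pooled together.
pooled = df.groupby(day)["specialist_id"].nunique()

# Mean number of rows per specialist within each day name.
rows_per_specialist = (
    df.groupby(day)["specialist_id"].value_counts().groupby(level=0).mean()
)

# Mean of the per-date distinct counts (COUNT(DISTINCT ...) per date, then AVG).
per_date = df.groupby(
    [df["initiated_date"].dt.date.rename("date"), day]
)["specialist_id"].nunique()
avg_daily = per_date.groupby(level="Day").mean()

print(pooled["Wednesday"], rows_per_specialist["Wednesday"], avg_daily["Wednesday"])
```

Here the three results for Wednesday are 2, 2.5 and 1.5 respectively, so it is worth checking which definition is actually wanted before picking a variant.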