How to calculate the click-through rate
For example, I have this data:
     datetime  keyword  COUNT
0  2016-01-05  a_click    100
1  2016-01-05     a_pv    200
2  2016-01-05     b_pv    150
3  2016-01-05  b_click     90
4  2016-01-05     c_pv    120
5  2016-01-05  c_click     90
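For anyone who wants to reproduce this, the frame above can be rebuilt roughly like so. This is only a sketch: it assumes `datetime` is stored as plain strings, which the question does not actually say.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: datetime is kept as a plain string column.
df = pd.DataFrame({
    "datetime": ["2016-01-05"] * 6,
    "keyword": ["a_click", "a_pv", "b_pv", "b_click", "c_pv", "c_click"],
    "COUNT": [100, 200, 150, 90, 120, 90],
})
print(df)
```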
I want to convert it into this:
     datetime keyword   ctr
0  2016-01-05       a  0.50
1  2016-01-05       b  0.60
2  2016-01-05       c  0.75
I can transform the data with messy code, but I would like to do it in an elegant way.
You can do:
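# Split "a_click" into the keyword ("a") and the action ("click").
df['action'] = df.keyword.str.split('_').str.get(-1)
df['keyword'] = df.keyword.str.split('_').str.get(0)
# Pivot the actions into columns, then divide clicks by page views.
df = df.set_index(['datetime', 'keyword', 'action']).unstack().loc[:, 'COUNT']
df['ctr'] = df.click.div(df.pv)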
action              click   pv   ctr
datetime   keyword
2016-01-05 a          100  200  0.50
           b           90  150  0.60
           c           90  120  0.75
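The result above still has `datetime` and `keyword` in the index. If you want the flat three-column layout from the question, something like the following should work. It is a self-contained sketch, not the only way to do it: the names `parts`, `wide` and `ctr` are my own, and it assumes every keyword has exactly one `_click` row and one `_pv` row per date.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "datetime": ["2016-01-05"] * 6,
    "keyword": ["a_click", "a_pv", "b_pv", "b_click", "c_pv", "c_click"],
    "COUNT": [100, 200, 150, 90, 120, 90],
})

# Split on the last underscore: "a_click" -> ("a", "click").
parts = df["keyword"].str.rsplit("_", n=1, expand=True)
df["keyword"], df["action"] = parts[0], parts[1]

# One row per (datetime, keyword), one column per action.
wide = df.set_index(["datetime", "keyword", "action"])["COUNT"].unstack("action")

# CTR = clicks / page views, then move the index back into ordinary columns.
ctr = wide["click"].div(wide["pv"]).rename("ctr").reset_index()
print(ctr)
```

Splitting once with `expand=True` avoids running the same split twice, and `rsplit` with `n=1` keeps keywords that themselves contain an underscore intact.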
An alternative approach using groupby:
import pandas as pd

df2['key_word'] = df2.apply(lambda x: x.keyword.split('_')[0], axis=1)
df2['key_action'] = df2.apply(lambda x: x.keyword.split('_')[1], axis=1)

def compute_ctr(g):
    # CTR for one keyword: its click count divided by its page-view count.
    ctr = g[g.key_action == 'click'].COUNT.values[0] / g[g.key_action == 'pv'].COUNT.values[0]
    result = {'datetime': g.iloc[0, 0], 'ctr': ctr}  # assumes datetime is the first column
    return pd.Series(result)

rslt = df2.groupby('key_word').apply(compute_ctr)
rslt.reset_index(inplace=True, drop=False)
print(rslt)
    ctr  datetime keyword
0  0.50  5/1/2016       a
1  0.60  5/1/2016       b
2  0.75  5/1/2016       c
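Note that the groupby above keys only on `key_word`, so it would mix dates if the data covered more than one day. A possible variant without `apply` is sketched below. It rests on two assumptions that the answer does not state: that `datetime` should be part of the group key, and that repeated rows for the same key should be summed. The name `totals` is mine.

```python
import pandas as pd

# Sample data from the question.
df2 = pd.DataFrame({
    "datetime": ["2016-01-05"] * 6,
    "keyword": ["a_click", "a_pv", "b_pv", "b_click", "c_pv", "c_click"],
    "COUNT": [100, 200, 150, 90, 120, 90],
})

# Same helper columns as above, built with one vectorized split.
parts = df2["keyword"].str.split("_", n=1, expand=True)
df2["key_word"], df2["key_action"] = parts[0], parts[1]

# Sum COUNT per (datetime, key_word, key_action), then spread actions into columns.
totals = (
    df2.groupby(["datetime", "key_word", "key_action"])["COUNT"]
    .sum()
    .unstack("key_action")
)

# CTR = clicks / page views, as a flat frame.
rslt = (totals["click"] / totals["pv"]).rename("ctr").reset_index()
print(rslt)
```

Because the counts are aggregated before reshaping, this version does not fail on duplicate rows the way a bare `unstack` would.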