将数组参数传递给我自己应用于 Pandas groupby 的 2D 函数
Passing array arguments to my own 2D function applied on Pandas groupby
我得到以下 pandas 数据框
df
long lat weekday hour
dttm
2015-07-03 00:00:38 1.114318 0.709553 6 0
2015-08-04 00:19:18 0.797157 0.086720 3 0
2015-08-04 00:19:46 0.797157 0.086720 3 0
2015-08-04 13:24:02 0.786688 0.059632 3 13
2015-08-04 13:24:34 0.786688 0.059632 3 13
2015-08-04 18:46:36 0.859795 0.330385 3 18
2015-08-04 18:47:02 0.859795 0.330385 3 18
2015-08-04 19:46:41 0.755008 0.041488 3 19
2015-08-04 19:47:45 0.755008 0.041488 3 19
我还有一个接收 2 个数组作为输入的函数:
import pandas as pd
import numpy as np
def time_hist(weekday, hour):
hist_2d=np.histogram2d(weekday,hour, bins = [xrange(0,8), xrange(0,25)])
return hist_2d[0].astype(int)
我希望将我的 2D 函数应用于以下 groupby 的每个组:
df.groupby(['long', 'lat'])
我尝试将 *args 传递给 .apply():
df.groupby(['long', 'lat']).apply(time_hist, [df.weekday, df.hour])
但我得到一个错误:"The dimension of bins must be equal to the dimension of the sample x."
当然尺寸不匹配。整个想法是我事先不知道要向每个组发送哪个迷你 [weekday, hour] 数组。
我该怎么做?
做:
import pandas as pd
import numpy as np
df = pd.read_csv('file.csv', index_col=0)
def time_hist(x):
hour = x.hour
weekday = x.weekday
hist_2d = np.histogram2d(weekday, hour, bins=[xrange(0, 8), xrange(0, 25)])
return hist_2d[0].astype(int)
print(df.groupby(['long', 'lat']).apply(time_hist))
输出:
long lat
0.755008 0.041488 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.786688 0.059632 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.797157 0.086720 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.859795 0.330385 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
1.114318 0.709553 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
dtype: object
我得到以下 pandas 数据框
df
long lat weekday hour
dttm
2015-07-03 00:00:38 1.114318 0.709553 6 0
2015-08-04 00:19:18 0.797157 0.086720 3 0
2015-08-04 00:19:46 0.797157 0.086720 3 0
2015-08-04 13:24:02 0.786688 0.059632 3 13
2015-08-04 13:24:34 0.786688 0.059632 3 13
2015-08-04 18:46:36 0.859795 0.330385 3 18
2015-08-04 18:47:02 0.859795 0.330385 3 18
2015-08-04 19:46:41 0.755008 0.041488 3 19
2015-08-04 19:47:45 0.755008 0.041488 3 19
我还有一个接收 2 个数组作为输入的函数:
import pandas as pd
import numpy as np
def time_hist(weekday, hour):
hist_2d=np.histogram2d(weekday,hour, bins = [xrange(0,8), xrange(0,25)])
return hist_2d[0].astype(int)
我希望将我的 2D 函数应用于以下 groupby 的每个组:
df.groupby(['long', 'lat'])
我尝试将 *args 传递给 .apply():
df.groupby(['long', 'lat']).apply(time_hist, [df.weekday, df.hour])
但我得到一个错误:"The dimension of bins must be equal to the dimension of the sample x."
当然尺寸不匹配。整个想法是我事先不知道要向每个组发送哪个迷你 [weekday, hour] 数组。
我该怎么做?
做:
import pandas as pd
import numpy as np
df = pd.read_csv('file.csv', index_col=0)
def time_hist(x):
hour = x.hour
weekday = x.weekday
hist_2d = np.histogram2d(weekday, hour, bins=[xrange(0, 8), xrange(0, 25)])
return hist_2d[0].astype(int)
print(df.groupby(['long', 'lat']).apply(time_hist))
输出:
long lat
0.755008 0.041488 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.786688 0.059632 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.797157 0.086720 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.859795 0.330385 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
1.114318 0.709553 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
dtype: object