Is there a way to use bincount with a clause in Python?
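I have hourly data on bike rental demand and weather. I would like to plot the average demand for each hour separately for good and bad weather.
When I plot the average demand for a given hour (without taking weather into account), I compute the total rental demand for that hour and then divide it by the number of observations for that hour: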
hour_count = np.bincount(hour)
for i in range(number_of_observations):
    hour_sums[hour[i]] = hour_sums[hour[i]] + rentals[i]
av_rentals = [x/y for x,y in zip(hour_sums,hour_count)]
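Now I would like to do the same, but separately for good and bad weather. The cumulative sum is easy: I just added an 'if' clause. What I don't know is how to count the hours of good and bad weather. I would rather avoid a big loop like the one for the sums... Is there any function that does the same as bincount but with a clause? Something like: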
good_weather_hour_count = np.bincount(hour, weather == 1 or weather == 2)
Any ideas?
PS. Maybe someone knows how to sum the rentals in a given hour without a loop? I tried something with a 2d histogram, but it didn't work.
label_sums = np.histogram2d(hour, rentals, bins=24)[0]
I'm not sure about NumPy, but you can do this fairly easily with the standard library:
import pprint
from collections import Counter, defaultdict
# weather code -> Counter mapping time -> total rentals
weather_counts = defaultdict(Counter)
times = [
    {'time': '1:00 AM', 'weather': 1},
    {'time': '2:00 AM', 'weather': 2},
    {'time': '5:00 PM', 'weather': 2},
    {'time': '3:00 AM', 'weather': 1},
    {'time': '1:00 AM', 'weather': 1},
]
rentals = [
    1,
    2,
    5,
    3,
    3,
]
# use a separate loop variable so the times list is not shadowed
for entry, rental_count in zip(times, rentals):
    weather_counts[entry['weather']][entry['time']] += rental_count
pprint.pprint(weather_counts)
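The snippet above accumulates rental totals per weather code and time, whereas the question asks for averages. A second Counter that tracks how many observations fell into each bucket would cover that. This is only a minimal sketch on the same toy data; the names `rental_sums`, `observation_counts` and `averages` are assumptions introduced here for illustration.

```python
from collections import Counter, defaultdict

times = [
    {'time': '1:00 AM', 'weather': 1},
    {'time': '2:00 AM', 'weather': 2},
    {'time': '5:00 PM', 'weather': 2},
    {'time': '3:00 AM', 'weather': 1},
    {'time': '1:00 AM', 'weather': 1},
]
rentals = [1, 2, 5, 3, 3]

rental_sums = defaultdict(Counter)         # weather -> time -> total rentals
observation_counts = defaultdict(Counter)  # weather -> time -> number of rows

for entry, rental_count in zip(times, rentals):
    rental_sums[entry['weather']][entry['time']] += rental_count
    observation_counts[entry['weather']][entry['time']] += 1

# Divide each total by its observation count. Only buckets that were
# actually observed appear here, so no division by zero can occur.
averages = {
    weather: {t: total / observation_counts[weather][t]
              for t, total in per_time.items()}
    for weather, per_time in rental_sums.items()
}
print(averages)
```

With this toy data, weather code 1 at '1:00 AM' has two observations (1 and 3 rentals), so its average is 2.0.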
np.bincount has a weights parameter, which you can use to count the hours weighted by the number of rentals. For example,
In [39]: np.bincount([1,2,3,1], weights=[20,10,40,10])
Out[39]: array([ 0., 30., 10., 40.])
So you could replace the for-loop:
for i in range(number_of_observations):
    hour_sums[hour[i]] = hour_sums[hour[i]] + rentals[i]
with
hour_sums = np.bincount(hour, weights=rentals, minlength=24)
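As a quick sanity check that the weighted bincount reproduces the loop, here is a small sketch on made-up data (the arrays below are illustrative assumptions, not the asker's data). It also covers the PS in the question, since the weighted bincount sums the rentals per hour without a Python loop.

```python
import numpy as np

# Illustrative data: hour of day (0-23) and rentals for each observation.
hour = np.array([0, 1, 1, 23, 0, 5])
rentals = np.array([4, 2, 6, 1, 8, 3])

# Loop version from the question, with hour_sums initialised to zeros.
loop_sums = np.zeros(24)
for i in range(len(hour)):
    loop_sums[hour[i]] += rentals[i]

# Vectorised version: each rentals[i] is added to bin hour[i].
# minlength=24 guarantees one bin per hour even if late hours are missing.
hour_sums = np.bincount(hour, weights=rentals, minlength=24)

print(np.array_equal(loop_sums, hour_sums))
```

Note that bincount returns a float array when weights are given, even if the weights are integers.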
To handle good/bad weather, you could mask the hour and rentals data to select only the applicable subset of the data:
mask = (weather == w)
masked_hour = hour[mask]
masked_rentals = rentals[mask]
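The `weather == 1 or weather == 2` expression from the question would not work on arrays: Python's `or` asks for a single truth value of the whole array, and NumPy raises a ValueError for arrays with more than one element. If several weather codes should count as "good", the comparisons can be combined element-wise with `|`, or with `np.isin`. A minimal sketch, assuming `weather` is an integer NumPy array; the particular codes and data are made up for illustration.

```python
import numpy as np

hour = np.array([0, 0, 1, 1, 2, 2])
rentals = np.array([5, 7, 1, 3, 9, 2])
weather = np.array([1, 3, 2, 4, 1, 2])   # assumed codes: 1-2 good, 3-4 bad

# Element-wise OR needs `|`, with parentheses around each comparison.
good_mask = (weather == 1) | (weather == 2)
# np.isin builds the same mask from a list of accepted codes.
assert np.array_equal(good_mask, np.isin(weather, [1, 2]))

# Count and sum only the observations selected by the mask.
good_hour_count = np.bincount(hour[good_mask], minlength=3)
good_hour_sums = np.bincount(hour[good_mask], weights=rentals[good_mask],
                             minlength=3)
print(good_hour_count, good_hour_sums)
```

The bad-weather mask is then simply `~good_mask`.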
Then perform the computation on masked_hour and masked_rentals:
import numpy as np
np.random.seed(2016)
N = 2
hour = np.tile(np.arange(24), N)
rentals = np.random.randint(10, size=(len(hour),))
# say, weather=1 means good weather, 2 means bad weather
weather = np.random.randint(1, 3, size=(len(hour),))
average_rentals = dict()
for kind, w in zip(['good', 'bad', 'all'], [1, 2, None]):
    if w is None:
        mask = slice(None)
    else:
        mask = (weather == w)
    masked_hour = hour[mask]
    masked_rentals = rentals[mask]
    total_rentals = np.bincount(masked_hour, weights=masked_rentals, minlength=24)
    total_hours = np.bincount(masked_hour, minlength=24)
    average_rentals[kind] = (total_rentals / total_hours)
for kind, result in average_rentals.items():
    print('\n{}: {}'.format(kind, result))
yields
bad: [ 4. 6. 2. 5.5 nan 4. 4. 8. nan 3. nan 2.5 4. nan 9.
nan 3. 5.5 8. nan 8. 5. 9. 4. ]
good: [ 3. nan 4. nan 8. 4. nan 7. 5.5 2. 4. nan nan 0.5 9.
0.5 nan nan 5. 7. 1. 7. 8. 0. ]
all: [ 3.5 6. 3. 5.5 8. 4. 4. 7.5 5.5 2.5 4. 2.5 4. 0.5 9.
0.5 3. 5.5 6.5 7. 4.5 6. 8.5 2. ]
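The nan entries in the output above come from hours that have no observations for that weather type, where the division is 0/0; NumPy also emits a RuntimeWarning in that case. One possible way to keep the nan markers while avoiding the warning is np.divide with its `out` and `where` arguments. A minimal sketch on made-up totals (the three-bin arrays are assumptions for illustration):

```python
import numpy as np

# Illustrative per-bin totals; the middle bin has no observations.
total_rentals = np.array([8.0, 0.0, 3.0])
total_hours = np.array([2, 0, 1])

# Start from an array of nan and divide only where the count is non-zero,
# so empty bins stay nan and no divide-by-zero warning is raised.
average = np.full(total_rentals.shape, np.nan)
np.divide(total_rentals, total_hours, out=average, where=total_hours > 0)
print(average)
```

Whether nan is the right placeholder for empty hours depends on how the result is used afterwards.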