Is there a way to use bincount with a clause in python?

Question

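I have hourly data on bike rental demand and weather. I want to plot the average demand for each hour, separately for good weather and bad weather.

When I plot the average demand for a given hour (regardless of weather), I compute the total rental demand for each hour and then divide it by the number of times that hour occurs in the data: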

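import numpy as np

hour_count = np.bincount(hour)          # number of observations for each hour
hour_sums = np.zeros(len(hour_count))   # running total of rentals for each hour
for i in range(number_of_observations):
    hour_sums[hour[i]] += rentals[i]

av_rentals = [x / y for x, y in zip(hour_sums, hour_count)]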
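Written out, the per-hour average computed above is the following (the notation is ours, not the asker's: $h_i$ is the hour of observation $i$, $r_i$ its rentals, $w_i$ its weather code, and $[\cdot]$ is 1 when the condition holds and 0 otherwise):

$$
\bar r_h = \frac{\sum_i r_i\,[h_i = h]}{\sum_i [h_i = h]},
\qquad
\bar r_h^{G} = \frac{\sum_i r_i\,[h_i = h]\,[w_i \in G]}{\sum_i [h_i = h]\,[w_i \in G]}
$$

Restricting to a set of weather codes $G$ only multiplies both sums by the extra indicator $[w_i \in G]$, which suggests that a 0/1 condition can play the role of a weight.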

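Now I want to do the same thing, but separately for good and bad weather. The sums are easy: I just added an `if` clause. What I don't know is how to count the number of good-weather and bad-weather hours. I'd rather avoid a big loop like the one for the sums... Is there a function that works like bincount but with a clause? Something like: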

good_weather_hour_count = np.bincount(hour, weather == 1 or weather == 2)

Any ideas?
P.S. Maybe someone knows how to sum the rentals for each hour without a loop? I tried something with a 2D histogram, but it didn't work.

label_sums = np.histogram2d(hour, rentals, bins=24)[0]
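The 2D histogram above bins (hour, rental value) pairs and counts how many observations fall into each cell, so it does not add up the rentals. A one-dimensional `np.histogram` with its `weights` argument does sum them. A minimal sketch on made-up data, assuming hours are integers from 0 to 23:

```python
import numpy as np

# Hypothetical sample data: hour of day (0-23) and rentals observed in that hour
hour = np.array([0, 1, 1, 23, 0])
rentals = np.array([5, 2, 4, 7, 1])

# 24 bins with edges 0, 1, ..., 24, so integer hour h falls into bin h;
# weights make each observation add its rentals instead of 1
hour_sums, edges = np.histogram(hour, bins=24, range=(0, 24), weights=rentals)
print(hour_sums)
```

`np.bincount(hour, weights=rentals, minlength=24)` gives the same totals for integer hours without having to choose bin edges.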

Answer 1

I'm not sure about NumPy, but you can do this fairly easily with the standard library:

from collections import Counter, defaultdict
import pprint

# weather code -> Counter mapping each time of day to its total rentals
weather_counts = defaultdict(Counter)

times = [
    {'time': '1:00 AM', 'weather': 1},
    {'time': '2:00 AM', 'weather': 2},
    {'time': '5:00 PM', 'weather': 2},
    {'time': '3:00 AM', 'weather': 1},
    {'time': '1:00 AM', 'weather': 1},
]

rentals = [1, 2, 5, 3, 3]

# Use a separate loop variable so the `times` list is not shadowed
for entry, rental_count in zip(times, rentals):
    weather_counts[entry['weather']][entry['time']] += rental_count

pprint.pprint(weather_counts)
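The snippet above only accumulates totals. A sketch of extending the same standard-library idea to averages, which is what the question asks for: keep a second `Counter` of how many times each (weather, time) pair occurs and divide. The sample data is the same made-up data as above.

```python
from collections import Counter, defaultdict

times = [
    {'time': '1:00 AM', 'weather': 1},
    {'time': '2:00 AM', 'weather': 2},
    {'time': '5:00 PM', 'weather': 2},
    {'time': '3:00 AM', 'weather': 1},
    {'time': '1:00 AM', 'weather': 1},
]
rentals = [1, 2, 5, 3, 3]

rental_sums = defaultdict(Counter)  # weather -> time -> total rentals
hour_counts = defaultdict(Counter)  # weather -> time -> number of observations

for entry, rental_count in zip(times, rentals):
    rental_sums[entry['weather']][entry['time']] += rental_count
    hour_counts[entry['weather']][entry['time']] += 1

# Average = total rentals / number of observations, per weather code and time
average_rentals = {
    weather: {t: rental_sums[weather][t] / n for t, n in counts.items()}
    for weather, counts in hour_counts.items()
}
print(average_rentals)
```

Only (weather, time) pairs that actually occur get an entry, so there is never a division by zero.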

Answer 2

`np.bincount` has a `weights` parameter, which you can use to count the hours weighted by rentals, i.e. to sum the rentals for each hour. For example,

In [39]: np.bincount([1,2,3,1], weights=[20,10,40,10])
Out[39]: array([ 0., 30., 10., 40.])

So you can replace the for-loop:

for i in range(number_of_observations):
    hour_sums[hour[i]] = hour_sums[hour[i]] + rentals[i]

with

hour_sums = np.bincount(hour, weights=rentals, minlength=24)
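A quick way to convince yourself that the two are equivalent is to run both on the same data. The sketch below uses randomly generated stand-in data (not the question's) and assumes NumPy 1.17 or later for `np.random.default_rng`:

```python
import numpy as np

# Hypothetical stand-in data for the question's `hour` and `rentals`
rng = np.random.default_rng(0)
hour = rng.integers(0, 24, size=100)
rentals = rng.integers(0, 10, size=100)

# Loop version from the question, with hour_sums initialised to zeros
loop_sums = np.zeros(24)
for i in range(len(hour)):
    loop_sums[hour[i]] += rentals[i]

# Vectorised version: each row adds its rentals to its hour's bin;
# minlength=24 keeps all 24 hours even if some never occur
bincount_sums = np.bincount(hour, weights=rentals, minlength=24)

print(np.allclose(loop_sums, bincount_sums))
```

`minlength` matters when the last hours of the day are missing from the data: without it the result would be shorter than 24.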

To handle good/bad weather, you can mask the `hour` and `rentals` data to select only the applicable subset:

mask = (weather == w)
masked_hour = hour[mask]
masked_rentals = rentals[mask]

Then do the computation on `masked_hour` and `masked_rentals`:

import numpy as np

np.random.seed(2016)
N = 2
hour = np.tile(np.arange(24), N)
rentals = np.random.randint(10, size=(len(hour),))
# say, weather=1 means good weather, 2 means bad weather
weather = np.random.randint(1, 3, size=(len(hour),))

average_rentals = dict()
for kind, w in zip(['good', 'bad', 'all'], [1, 2, None]):
    if w is None:
        mask = slice(None)
    else:
        mask = (weather == w)
    masked_hour = hour[mask]
    masked_rentals = rentals[mask]
    total_rentals = np.bincount(masked_hour, weights=masked_rentals, minlength=24)
    total_hours = np.bincount(masked_hour, minlength=24)
    average_rentals[kind] = (total_rentals / total_hours)

for kind, result in average_rentals.items():
    print('\n{}: {}'.format(kind, result))

yields

bad: [ 4. 6. 2. 5.5 nan 4. 4. 8. nan 3. nan 2.5 4. nan 9. nan 3. 5.5 8. nan 8. 5. 9. 4. ]

good: [ 3. nan 4. nan 8. 4. nan 7. 5.5 2. 4. nan nan 0.5 9. 0.5 nan nan 5. 7. 1. 7. 8. 0. ]

all: [ 3.5 6. 3. 5.5 8. 4. 4. 7.5 5.5 2.5 4. 2.5 4. 0.5 9. 0.5 3. 5.5 6.5 7. 4.5 6. 8.5 2. ]
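As an alternative to slicing with the mask, the condition itself can be passed as `weights`: a boolean array converted to 0/1 makes `np.bincount` count only the rows where the condition holds, which is close to the `np.bincount(hour, weather == 1 or weather == 2)` call hoped for in the question. Since `or` does not work element-wise on arrays, the condition is written with `|`. The `nan` entries above come from hours with no observations (0/0); this sketch uses `np.divide(..., where=...)` to leave those hours as `nan` without a divide-by-zero warning. Here codes 1 and 2 count as good weather, following the question; code 3 is a hypothetical bad-weather code, and the data is made up.

```python
import numpy as np

# Small hypothetical data set: hour of day, rentals, weather code
hour = np.array([0, 0, 1, 1, 2])
rentals = np.array([4, 6, 3, 5, 8])
weather = np.array([1, 3, 2, 1, 3])

# Element-wise version of `weather == 1 or weather == 2`
good = (weather == 1) | (weather == 2)

# Condition as 0/1 weights: counts only the rows where it is True
good_hour_count = np.bincount(hour, weights=good.astype(float), minlength=24)
# Rentals that happened in good weather, summed per hour
good_hour_sums = np.bincount(hour, weights=rentals * good, minlength=24)

# Divide only where at least one good-weather observation exists;
# hours with none stay nan instead of triggering 0/0 warnings
good_avg = np.full(24, np.nan)
np.divide(good_hour_sums, good_hour_count, out=good_avg, where=good_hour_count > 0)
print(good_avg[:3])
```

This avoids building a separate masked copy of `hour` for each weather class, at the cost of passing over all rows each time.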
