How can I calculate the 95th percentile over a table with numpy?
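I'm trying to calculate the 95th percentile and other percentiles from my table using numpy. However, how the function works is unclear to me, because it needs an array to work on: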
>>> a = np.array([[10, 7, 4], [3, 2, 1]])
>>> a
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> np.percentile(a, 50)
That is how you get the 50th percentile of an array.
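For reference, a short sketch of what `np.percentile` does with that array. Without `axis` it flattens the input, `axis=0` gives one result per column, and `q` can be a list. The same call also accepts a 1-D pandas column, which is how it applies to a table.

```python
import numpy as np
import pandas as pd

a = np.array([[10, 7, 4], [3, 2, 1]])

# No axis: the array is flattened, so the sorted data is 1, 2, 3, 4, 7, 10
print(np.percentile(a, 50))            # 3.5

# axis=0: one percentile per column ([10, 3], [7, 2], [4, 1])
print(np.percentile(a, 50, axis=0))    # [6.5 4.5 2.5]

# Several percentiles in one call (50th and 95th of the flattened data)
print(np.percentile(a, [50, 95]))

# A single table column (a pandas Series) works the same way;
# these are the four 0:05 readings from 9/1/2019 in the table below
column = pd.Series([440, 228, 350, 283.2])
print(np.percentile(column, 95))
```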
This is what my table looks like:
Date Hour Month Value
9/1/2019 0:00 SEPTEMBER 377.3333333
9/1/2019 0:00 SEPTEMBER 268.8
9/1/2019 0:00 SEPTEMBER 400.8
9/1/2019 0:00 SEPTEMBER 279.1304348
9/1/2019 0:05 SEPTEMBER 440
9/1/2019 0:05 SEPTEMBER 228
9/1/2019 0:05 SEPTEMBER 350
9/1/2019 0:05 SEPTEMBER 283.2
9/1/2019 0:10 SEPTEMBER 385.3333333
9/1/2019 0:10 SEPTEMBER 240
9/1/2019 0:10 SEPTEMBER 347.5
9/1/2019 0:10 SEPTEMBER 175.2
9/1/2019 0:15 SEPTEMBER 440
9/1/2019 0:15 SEPTEMBER 202.8
9/1/2019 0:15 SEPTEMBER 204
9/1/2019 0:15 SEPTEMBER 182.4
...
9/2/2019 0:00 SEPTEMBER 416
9/2/2019 0:00 SEPTEMBER 134.4
9/2/2019 0:00 SEPTEMBER 370
...
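... and so on until the end of September.
I want to calculate the 95th percentile for each 5-minute interval.
The final result should look like this: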
Time September
0:00 95th Value
0:05 95th Value
0:10 95th Value
0:15 95th Value
.....
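One way to get exactly that layout is to pivot on time of day and month and use `np.percentile` as the aggregation function. This is a sketch on a handful of rows copied from the table above. The column names `Date`, `Hour`, `Month` and `Value` are taken from the table header, and the renaming to `Time`/`September` only mimics the desired output.

```python
import numpy as np
import pandas as pd

# Sample rows from the question: the 0:00 and 0:05 readings on 9/1/2019
# plus the 0:00 readings on 9/2/2019
df = pd.DataFrame({
    'Date':  ['9/1/2019'] * 8 + ['9/2/2019'] * 3,
    'Hour':  ['0:00'] * 4 + ['0:05'] * 4 + ['0:00'] * 3,
    'Month': ['SEPTEMBER'] * 11,
    'Value': [377.3333333, 268.8, 400.8, 279.1304348,
              440, 228, 350, 283.2,
              416, 134.4, 370],
})

# Rows: time of day; columns: month. Each cell is the 95th percentile of
# every reading in that 5-minute slot, pooled across all days of the month.
result = df.pivot_table(index='Hour', columns='Month', values='Value',
                        aggfunc=lambda s: np.percentile(s, 95))

# Relabel to match the requested layout: 'Time' index, 'September' column
result = result.rename(columns=str.title)
result.index.name = 'Time'
result.columns.name = None
print(result)
```

`np.percentile` and pandas' `quantile` both default to linear interpolation, so `df.groupby('Hour')['Value'].quantile(0.95)` returns the same numbers as a plain Series.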
import re
import pandas as pd
data = '''9/1/2019 0:00 SEPTEMBER 377.3333333
9/1/2019 0:00 SEPTEMBER 268.8
9/1/2019 0:00 SEPTEMBER 400.8
9/1/2019 0:00 SEPTEMBER 279.1304348
9/1/2019 0:05 SEPTEMBER 440
9/1/2019 0:05 SEPTEMBER 228
9/1/2019 0:05 SEPTEMBER 350
9/1/2019 0:05 SEPTEMBER 283.2
9/1/2019 0:10 SEPTEMBER 385.3333333
9/1/2019 0:10 SEPTEMBER 240
9/1/2019 0:10 SEPTEMBER 347.5
9/1/2019 0:10 SEPTEMBER 175.2
9/1/2019 0:15 SEPTEMBER 440
9/1/2019 0:15 SEPTEMBER 202.8
9/1/2019 0:15 SEPTEMBER 204
9/1/2019 0:15 SEPTEMBER 182.4
9/1/2019 0:20 SEPTEMBER 416
9/1/2019 0:20 SEPTEMBER 134.4
9/1/2019 0:20 SEPTEMBER 370
9/2/2019 0:05 SEPTEMBER 145.9
9/2/2019 0:05 SEPTEMBER 360'''
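# Split each line on runs of spaces into its four fields
data = [re.split('[ ]+', x) for x in data.split('\n')]
df = pd.DataFrame(data, columns=['date','hour','month','value'])
df['value'] = df['value'].astype(float)
# Group by time of day only, so each 5-minute slot pools the readings from every day
print(df.groupby('hour').value.quantile(0.95))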
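Two follow-ups may matter on a full month of data. First, grouping on the raw `hour` strings sorts them alphabetically, so `'10:00'` would come before `'2:00'`. Second, the question also asks for other percentiles. The sketch below uses a small hypothetical sample (not from the question) with the same `hour`/`value` columns. It converts the times to `Timedelta` so groups sort by clock time, and passes a list to `quantile` to get several percentiles at once.

```python
import pandas as pd

# Hypothetical sample: times include '10:00' and '2:00' to show the sorting issue
df = pd.DataFrame({
    'hour':  ['0:05', '10:00', '2:00', '0:05', '10:00', '2:00'],
    'value': [440.0, 228.0, 350.0, 283.2, 175.2, 204.0],
})

# Append ':00' seconds and parse 'H:MM:SS' as a Timedelta,
# so groups sort chronologically instead of as strings
df['time'] = pd.to_timedelta(df['hour'] + ':00')

# One row per time slot, one column per requested quantile
table = df.groupby('time')['value'].quantile([0.5, 0.9, 0.95]).unstack()
print(table)
```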