Summarize and plot a list of ndarrays according to chosen values
I have a list of ndarrays:
list1 = [t1, t2, t3, t4, t5]
Each t contains:
t1 = np.array([[10,0.1],[30,0.05],[30,0.1],[20,0.1],[10,0.05],[10,0.05],[0,0.5],[20,0.05],[10,0.0]], np.float64)
t2 = np.array([[0,0.05],[0,0.05],[30,0],[10,0.25],[10,0.2],[10,0.25],[20,0.1],[20,0.05],[10,0.05]], np.float64)
...
Now, for every t in the list, I want the mean of the second-column values grouped by the value of the first element:
t1out = [[0,0.5],[10,(0.1+0.05+0.05+0)/4],[20,(0.1+0.05)/2],[30,0.075]]
t2out = [[0,0.05],[10,0.1875],[20,0.075],[30,0]]
....
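After generating t_1 ... t_n, I want to plot, for each t, the probability over the classes, where the first element represents the class (0, 10, 20, 30) and the second element shows the probability with which that class occurs (0.1, 0.7, 0.15, 0). Something like a probability distribution in the form of a histogram or bar chart, for example: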
plt.bar([classes],[probabilities])
plt.bar([item[0] for item in t1out],[item[1] for item in t1out])
Here is one way using itertools.groupby:
from statistics import mean
from itertools import groupby
def fun(t):
    s = sorted(t, key=lambda x: x[0])
    return [[k, mean(i[1] for i in v)] for k, v in groupby(s, key=lambda x: x[0])]
fun(t1)
[[0.0, 0.5],
[10.0, 0.05],
[20.0, 0.07500000000000001],
[30.0, 0.07500000000000001]]
And applied to all arrays:
[fun(t) for t in [t1,t2]]
[[[0.0, 0.5],
[10.0, 0.05],
[20.0, 0.07500000000000001],
[30.0, 0.07500000000000001]],
[[0.0, 0.05], [10.0, 0.1875], [20.0, 0.07500000000000001], [30.0, 0.0]]]
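The sorted call in fun is needed because itertools.groupby only merges consecutive items that share a key. A minimal sketch of the difference, using a made-up list of pairs (not one of the arrays above):

```python
from itertools import groupby

# Hypothetical sample: class 10 appears twice, but not consecutively.
pairs = [(10, 0.1), (30, 0.05), (10, 0.05)]

def first(item):
    return item[0]

# Without sorting, class 10 ends up in two separate groups.
unsorted_keys = [k for k, _ in groupby(pairs, key=first)]

# After sorting by class, each class forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(pairs, key=first), key=first)]

print(unsorted_keys)  # [10, 30, 10]
print(sorted_keys)    # [10, 30]
```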
Here is a way to compute it with NumPy:
import numpy as np
def mean_by_class(t, classes=None):
    # Classes should be passed if you want to ensure
    # that all classes are in the output even if they
    # are not in the current t vector
    if classes is None:
        classes = np.unique(t[:, 0])
    bins = np.r_[classes, classes[-1] + 1]
    h, _ = np.histogram(t[:, 0], bins)
    d = np.digitize(t[:, 0], bins, right=True)
    out = np.zeros(len(classes), t.dtype)
    np.add.at(out, d, t[:, 1])
    out /= h.clip(min=1)
    return np.c_[classes, out]
t1 = np.array([[10, 0.1 ], [30, 0.05], [30, 0.1 ],
[20, 0.1 ], [10, 0.05], [10, 0.05],
[ 0, 0.5 ], [20, 0.05], [10, 0.0 ]],
dtype=np.float64)
print(mean_by_class(t1))
# [[ 0. 0.5 ]
# [10. 0.05 ]
# [20. 0.075]
# [30. 0.075]]
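The comment about classes matters when one array lacks a class that the others have. The sketch below repeats the function so that it runs on its own and calls it on a made-up array t3 (hypothetical, not from the question) with and without an explicit class list. It assumes every value in the first column is exactly one of the given classes; other values would be assigned to the wrong slot.

```python
import numpy as np

def mean_by_class(t, classes=None):
    # Same function as above, repeated so this snippet is self-contained.
    if classes is None:
        classes = np.unique(t[:, 0])
    bins = np.r_[classes, classes[-1] + 1]
    h, _ = np.histogram(t[:, 0], bins)
    d = np.digitize(t[:, 0], bins, right=True)
    out = np.zeros(len(classes), t.dtype)
    np.add.at(out, d, t[:, 1])
    out /= h.clip(min=1)
    return np.c_[classes, out]

# Hypothetical array with no rows for classes 0 and 20.
t3 = np.array([[10, 0.2], [10, 0.4], [30, 0.1]], dtype=np.float64)
all_classes = np.array([0, 10, 20, 30], dtype=np.float64)

without = mean_by_class(t3)                       # only classes 10 and 30
with_all = mean_by_class(t3, classes=all_classes)  # all four classes

print(without.shape)   # (2, 2)
print(with_all.shape)  # (4, 2)
```

With the explicit class list, classes that have no rows get a mean of 0, so every result has the same shape and the arrays can be compared or plotted side by side.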
As a side note, storing the class values (integers) in a float array may not be the best choice. You could consider using a structured array instead, for example:
import numpy as np
def mean_by_class(t, classes=None):
    if classes is None:
        classes = np.unique(t['class'])
    bins = np.r_[classes, classes[-1] + 1]
    h, _ = np.histogram(t['class'], bins)
    d = np.digitize(t['class'], bins, right=True)
    out = np.zeros(len(classes), t.dtype)
    out['class'] = classes
    np.add.at(out['p'], d, t['p'])
    out['p'] /= h.clip(min=1)
    return out
t1 = np.array([(10, 0.1 ), (30, 0.05), (30, 0.1 ),
(20, 0.1 ), (10, 0.05), (10, 0.05),
( 0, 0.5 ), (20, 0.05), (10, 0.0 )],
dtype=[('class', np.int32), ('p', np.float64)])
print(mean_by_class(t1))
# [( 0, 0.5 ) (10, 0.05 ) (20, 0.075) (30, 0.075)]
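For the plotting part of the question, one possible sketch with matplotlib is a grouped bar chart with one bar per array for each class. The helper name plot_class_means and the layout choices are assumptions for illustration, and the input is the two-column float form of the results (the t1out and t2out values from the question), not the structured-array form. The Agg backend is selected only so that the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

def plot_class_means(means, labels):
    """Draw one group of bars per class, with one bar per array.

    Hypothetical helper: `means` is a list of (n_classes, 2) arrays
    that all share the same classes in column 0.
    """
    fig, ax = plt.subplots()
    n = len(means)
    classes = means[0][:, 0]
    x = np.arange(len(classes))  # one slot per class
    width = 0.8 / n              # split each slot between the arrays
    for i, (m, label) in enumerate(zip(means, labels)):
        ax.bar(x + i * width, m[:, 1], width=width, label=label)
    # Centre the tick under each group and label it with the class value.
    ax.set_xticks(x + width * (n - 1) / 2)
    ax.set_xticklabels([str(int(c)) for c in classes])
    ax.set_xlabel("class")
    ax.set_ylabel("mean probability")
    ax.legend()
    return fig, ax

# Per-class means for t1 and t2, as given in the question.
t1out = np.array([[0, 0.5], [10, 0.05], [20, 0.075], [30, 0.075]])
t2out = np.array([[0, 0.05], [10, 0.1875], [20, 0.075], [30, 0.0]])

fig, ax = plot_class_means([t1out, t2out], ["t1", "t2"])
```

Call fig.savefig with a file name, or plt.show() with an interactive backend, to see the chart. The helper assumes every array lists the same classes in the same order, which is what passing classes explicitly to mean_by_class guarantees.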