Avoid for-loops when getting mean of every positive-value-interval in an array
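I want to get the mean of every interval in which the values are above a threshold. Obviously I could write a loop and check whether the next value is below the threshold and so on, but I am hoping there is an easier way. Do you have any ideas along the lines of masking, but which also handle the "interval" problem?
The two pictures below show the original data and what I would like to get.
Before:
After:
My original idea was to loop through my array, but since I want to do this about 10,000 times or more, I suspect that becomes very time-consuming.
Is there a way to get rid of the for loops?
transformed is a numpy array.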
plt.figure()
plt.plot(transformed)
thresh = np.percentile(transformed, 30)
plt.hlines(thresh, 0, 700)
# Work on a copy so the original array is not overwritten.
transformed_copy = transformed.copy()
transformed_mask = transformed_copy > thresh
mean_arr = []
for k in range(len(transformed)):
    if not transformed_mask[k]:
        # End of an interval: replace its values with their mean.
        if mean_arr:
            transformed_copy[mean_arr] = np.mean(transformed_copy[mean_arr])
        mean_arr = []
    else:
        mean_arr.append(k)
plt.plot(transformed_copy)
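Output after the loop:
The trick I use here is to find where the mask changes abruptly, which means we are switching from one contiguous section to another. We then take the indices where those sections start and end, and compute the mean inside each of them.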
# Imports.
import matplotlib.pyplot as plt
import numpy as np
# Create data.
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(np.sin(x)*4)
threshold = 0.30
mask = y > threshold
# Plot the raw data, threshold, and show where the data is above the threshold.
fig, ax = plt.subplots()
ax.plot(x, y, color="blue", label="y", marker="o", zorder=0)
ax.scatter(x[mask], y[mask], color="red", label="y > threshold", zorder=1)
ax.axhline(threshold, color="red", label="threshold")
ax.legend(loc="upper left", bbox_to_anchor=(1.01, 1))
# Detect the different segments.
diff = np.diff(mask) # Where the mask starts and ends.
jumps = np.where(diff)[0] # Indices of where the mask starts and ends.
for jump in jumps:
    ax.axvline(x[jump], linestyle="--", color="black")
# Calculate the mean inside each segment.
for n1, n2 in zip(jumps[:-1:2], jumps[1::2]):
    xn = x[n1:n2]
    yn = y[n1:n2]
    mean_in_section_n = np.mean(yn)
    ax.hlines(mean_in_section_n, xn[0], xn[-1], color="red", lw=10)
fig.show()
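To see exactly which indices np.where(np.diff(mask))[0] reports, here is a small check on a made-up mask (the array m and the names starts and stops are only for illustration). A jump at index i means m[i] != m[i+1], so a run of True values begins at i + 1 and its last element sits at the next jump index. Slicing with [n1:n2] as in the loop above is therefore shifted one sample to the left; adding 1 to both bounds lines the slice up with the run, assuming the mask starts with False.

```python
import numpy as np

# Hand-made mask with two runs of True: indices 2..4 and index 6.
m = np.array([False, False, True, True, True, False, True, False])

# Index i is reported when m[i] != m[i + 1].
jumps = np.where(np.diff(m))[0]
print(jumps)  # [1 4 5 6]

# Shift by one so that m[start:stop] covers exactly one run of True.
starts = jumps[0::2] + 1
stops = jumps[1::2] + 1
for a, b in zip(starts, stops):
    print(a, b, m[a:b])
```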
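With a bit more time, we can imagine a function that wraps all of this logic and has this signature: f(data, mask) -> data1, data2, ...
It returns one element for each contiguous section.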
def data_where_mask_is_contiguous(data: np.ndarray, mask: np.ndarray) -> list:
    sections = []
    diff = np.diff(mask) # Where the mask starts and ends.
    jumps = np.where(diff)[0] # Indices of where the mask starts and ends.
    for n1, n2 in zip(jumps[:-1:2], jumps[1::2]):
        sections.append(data[n1:n2])
    return sections
With this, you can easily get the mean of each section:
print([np.mean(yn) for yn in data_where_mask_is_contiguous(y, mask)])
>>> [0.745226, 0.747790, 0.599429]
I just noticed that it does not work when the mask is all True, so a default case needs to be added, but you get the idea.
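One possible way to cover that case, and also masks that start or end with True, is to pad the mask with False on both sides before taking the difference, so every run has both a rising and a falling edge. The sketch below is not part of the code above: run_bounds and replace_runs_with_mean are hypothetical helper names, and the per-run means are taken from a cumulative sum so that no Python loop is needed. It assumes the data contains no NaN values, since a NaN would propagate through the cumulative sum.

```python
import numpy as np

def run_bounds(mask):
    """Return (starts, stops) of each contiguous True run; stops are exclusive."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so runs touching either end still yield two edges.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    stops = np.flatnonzero(edges == -1)   # True -> False transitions
    return starts, stops

def replace_runs_with_mean(data, mask):
    """Return a copy of data where every True run is replaced by its mean."""
    data = np.asarray(data, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    starts, stops = run_bounds(mask)
    out = data.copy()
    if starts.size == 0:
        return out  # Nothing above the threshold.
    # Sum of data[a:b] equals csum[b] - csum[a].
    csum = np.concatenate(([0.0], np.cumsum(data)))
    lengths = stops - starts
    means = (csum[stops] - csum[starts]) / lengths
    # True positions appear in run order, so repeat each mean over its run.
    out[mask] = np.repeat(means, lengths)
    return out

data = np.array([0.0, 1.0, 3.0, 0.0, 5.0, 5.0, 0.0, 2.0])
print(replace_runs_with_mean(data, data > 0.5))  # [0. 2. 2. 0. 5. 5. 0. 2.]
```

The list of per-section means from the earlier snippet corresponds to the means array computed inside replace_runs_with_mean; returning it instead of out would give one value per run.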