How do I discretize a continuous function while avoiding noise generation (see picture)?
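I have a continuous input function which I want to discretize into 5-10 discrete bins between 0 and 1. Right now I am using np.digitize and rescaling the output bins to 0-1. The problem is that sometimes a dataset (blue line) yields results like this:
I tried increasing the number of discretization bins, but I ended up keeping the same noise and just getting more increments. As an example where the algorithm worked with the same settings but a different dataset: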
This is the code I am using there, with NumOfDisc = number of bins:
intervals = np.linspace(0, 1, NumOfDisc)
discretized_Array = np.digitize(Continuous_Array, intervals)
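The red line in the pictures is not important. The continuous blue line is what I am trying to discretize, and the green line is the discretized result. The graphs were created with matplotlib.pyplot using the following code: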
import logging
import matplotlib.pyplot as plt

def CheckPlots(discretized_Array, Continuous_Array, Temperature, time, PlotName):
    logging.info("Plotting...")
    # Setting axis properties and titles
    fig, ax = plt.subplots(1, 1)
    ax.set_title(PlotName)
    ax.set_ylabel('Temperature [°C]')
    ax.set_ylim(40, 110)
    ax.set_xlabel('Time [s]')
    # Pass the flag positionally: the `b=` keyword is no longer accepted by newer Matplotlib
    ax.grid(True, which="both")
    # Second y-axis for the power curves
    ax2 = ax.twinx()
    ax2.set_ylabel('DC Power [%]')
    ax2.set_ylim(-1.5, 3.5)
    # Plotting stuff
    ax.plot(time, Temperature, label="Input Temperature", color='#c70e04')
    ax2.plot(time, Continuous_Array, label="Continuous Power", color='#040ec7')
    ax2.plot(time, discretized_Array, label="Discrete Power", color='#539600')
    fig.legend(loc="upper left", bbox_to_anchor=(0, 1), bbox_transform=ax.transAxes)
    logging.info("Done!")
    logging.info("---")
    return
Any ideas on how to get a reasonable discretization like in the second case?
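If what I described in the comments is the problem, there are a few options to deal with it:
- Do nothing: depending on why you are discretizing, you might want the discrete values to reflect the continuous values accurately.
- Change the bins: you could shift the bins or change the number of bins so that the relatively 'flat' parts of the blue line stay within one bin. Those sections then also give a flat green line, which is visually more pleasing, as in your second plot.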
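A minimal sketch of the second option, assuming a made-up signal that hovers around 0.5 with a small ripple (the signal, `num_bins`, and the half-bin shift are illustrative choices, not values from the question). With an edge sitting at 0.5 the bin index flips back and forth; shifting all edges by half a bin width keeps the flat section inside one bin:

```python
import numpy as np

# Hypothetical test signal: a "flat" section around 0.5 with a small ripple
rng = np.random.default_rng(0)
signal = 0.5 + rng.uniform(-0.02, 0.02, size=200)

num_bins = 4
width = 1.0 / num_bins

# Original layout: edges at 0, 0.25, 0.5, 0.75, 1; the edge at 0.5 cuts through the signal
edges = np.linspace(0.0, 1.0, num_bins + 1)
bins_original = np.digitize(signal, edges)

# Shifted layout: move every edge by half a bin width so that 0.5 sits mid-bin
shifted_edges = edges + width / 2
bins_shifted = np.digitize(signal, shifted_edges)

print("distinct bins, original edges:", np.unique(bins_original))
print("distinct bins, shifted edges: ", np.unique(bins_shifted))
```

Whether a fixed shift helps depends on where the flat sections of the real data lie, so the offset would have to be chosen per dataset.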
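The following solution gives the result you are looking for.
Basically, the algorithm finds an ideal line and tries to replicate it as well as it can with fewer data points. It starts with 2 points at the edges (a straight line), then adds one in the center, then checks which side has the greatest error and adds a point in its center, and so on, until it reaches the desired bin count. Simple :)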
import warnings

import numpy as np

# RankWarning lives in numpy.exceptions since NumPy 2.0; fall back for older versions
try:
    from numpy.exceptions import RankWarning
except ImportError:
    from numpy import RankWarning

# The high-degree polyfit below is poorly conditioned and would otherwise warn
warnings.simplefilter('ignore', RankWarning)


def line_error(x0, y0, x1, y1, ideal_line, integral_points=100):
    """Assume a straight line between (x0, y0) -> (x1, y1). Then sample the perfect line multiple times and compute the distance."""
    straight_line = np.poly1d(np.polyfit([x0, x1], [y0, y1], 1))
    xs = np.linspace(x0, x1, num=integral_points)
    ys = straight_line(xs)
    perfect_ys = ideal_line(xs)
    err = np.abs(ys - perfect_ys).sum() / integral_points * (x1 - x0)  # Remove (x1 - x0) to only look at avg errors
    return err


def discretize_bisect(xs, ys, bin_count):
    """Returns xs and ys of discrete points"""
    # For a large number of datapoints, without loss of generality you can treat xs and ys as bin edges
    # If it gives bad results, you can choose edges in other ways, e.g. with np.histogram_bin_edges
    ideal_line = np.poly1d(np.polyfit(xs, ys, 50))
    new_xs = [xs[0], xs[-1]]
    new_ys = [ys[0], ys[-1]]
    while len(new_xs) < bin_count:
        # Error of every current segment against the ideal line
        errors = []
        for i in range(len(new_xs) - 1):
            err = line_error(new_xs[i], new_ys[i], new_xs[i+1], new_ys[i+1], ideal_line)
            errors.append(err)
        # Split the worst segment at its midpoint
        max_segment_id = np.argmax(errors)
        new_x = (new_xs[max_segment_id] + new_xs[max_segment_id+1]) / 2
        new_y = ideal_line(new_x)
        new_xs.insert(max_segment_id+1, new_x)
        new_ys.insert(max_segment_id+1, new_y)
    return new_xs, new_ys


BIN_COUNT = 25
# xs, ys and N_MOCK come from the input data setup, which is not shown here; plot_graph is defined below
new_xs, new_ys = discretize_bisect(xs, ys, BIN_COUNT)
plot_graph(xs, ys, new_xs, new_ys, f"Discretized and Continuous comparison, N(cont) = {N_MOCK}, N(disc) = {BIN_COUNT}")
print("Bin count:", len(new_xs))
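To see where the loop places its points without needing the original data, here is a small self-contained sketch of the same greedy bisection idea. It is a simplified variant, not the code above: the ideal line is a known callable (a made-up steep sigmoid) instead of a degree-50 polynomial fit, and `segment_error`, `bisect_points`, and `ideal` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def segment_error(x0, x1, ideal, samples=100):
    """Approximate area between the chord over [x0, x1] and the ideal curve."""
    xs = np.linspace(x0, x1, samples)
    chord = np.interp(xs, [x0, x1], [ideal(x0), ideal(x1)])
    return np.abs(chord - ideal(xs)).mean() * (x1 - x0)

def bisect_points(ideal, x_start, x_end, point_count):
    """Greedily split the segment with the largest error until point_count points exist."""
    pts = [x_start, x_end]
    while len(pts) < point_count:
        errors = [segment_error(a, b, ideal) for a, b in zip(pts[:-1], pts[1:])]
        worst = int(np.argmax(errors))
        pts.insert(worst + 1, (pts[worst] + pts[worst + 1]) / 2)
    return np.array(pts)

# Hypothetical ideal curve: almost flat at first, then a sharp rise around x = 0.7
def ideal(x):
    return 1.0 / (1.0 + np.exp(-40.0 * (np.asarray(x) - 0.7)))

points = bisect_points(ideal, 0.0, 1.0, 9)
print(points)
```

In this example the points should cluster around the steep part of the curve, while the flat section gets very few, which is the behavior the description above aims for.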
Additionally, here is the simplified plotting function I tested with.
import matplotlib.pyplot as plt

def plot_graph(cont_time, cont_array, disc_time, disc_array, plot_name):
    """A simplified version of the provided plotting function"""
    # Setting axis properties and titles
    fig, ax = plt.subplots(figsize=(20, 4))
    ax.set_title(plot_name)
    ax.set_xlabel('Time [s]')
    ax.set_ylabel('DC Power [%]')
    # Plotting stuff
    ax.plot(cont_time, cont_array, label="Continuous Power", color='#0000ff')
    ax.plot(disc_time, disc_array, label="Discrete Power", color='#00ff00')
    fig.legend(loc="upper left", bbox_to_anchor=(0, 1), bbox_transform=ax.transAxes)
Finally, the Google Colab.