How to get the second derivative/dip from the graph, or generate the best eps value
The dataset looks like this:
,id,revenue ,profit
0,101,779183,281257
1,101,144829,838451
2,101,766465,757565
3,101,353297,261071
4,101,1615461,275760
5,101,246731,949229
6,101,951518,301016
7,101,444669,430583
The code is as follows:
import pandas as pd
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import StandardScaler
import seaborn as sns
from sklearn.neighbors import NearestNeighbors
# Load the data and standardise every column
df = pd.read_csv('1.csv', index_col=None)
df1 = StandardScaler().fit_transform(df)
# Cluster with the eps value read off the plot
dbsc = DBSCAN(eps=2.5, min_samples=20).fit(df1)
labels = dbsc.labels_
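My df has a shape of 1999 (rows).
I got the eps value at the dip using the method below; from the graph it is clear that eps = 2.5.
Below is the method for finding the best eps value: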
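ns = 5
# NOTE: the original snippet used an undefined df3; df1 (the scaled data above) is assumed here
nbrs = NearestNeighbors(n_neighbors=ns).fit(df1)
distances, indices = nbrs.kneighbors(df1)
# Distance to the farthest of the ns neighbours returned for each point
# (the point itself counts as the first), sorted in descending order
distanceDec = sorted(distances[:,ns-1], reverse=True)
plt.plot(indices[:,0], distanceDec)
#plt.plot(list(range(1,2000)), distanceDec)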
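- How can the system automatically find the dip in the graph, i.e. the expected best eps? My system must report the best eps without anyone looking at the graph.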
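If I understood correctly, you are looking for the exact y value of the inflection point that appears in your ε(x) graph (it should be around 2.0), right?
If that is correct, with ε(x) being your curve, the problem reduces to:
- Compute the second derivative of the curve: ε''(x).
- Find the zero (or zeros) of that second derivative: x0.
- Recover the optimized ε value by plugging the zero into the curve: ε(x0).
I attach my answer here, based on two other Stack Overflow answers:
(computing the derivative of an array)
(finding zeros in an array)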
import numpy as np
import matplotlib.pyplot as plt
# Generating x data range from -1 to 4 with a step of 0.01
x = np.arange(-1, 4, 0.01)
# Simulating y data with an inflection point as y(x) = x³ - 5x² + 2x
y = x**3 - 5*x**2 + 2*x
# Plotting your curve
plt.plot(x, y, label="y(x)")
# Computing y 1st derivative of your curve with a step of 0.01 and plotting it
y_1prime = np.gradient(y, 0.01)
plt.plot(x, y_1prime, label="y'(x)")
# Computing y 2nd derivative of your curve with a step of 0.01 and plotting it
y_2prime = np.gradient(y_1prime, 0.01)
plt.plot(x, y_2prime, label="y''(x)")
# Finding the indices where the 2nd derivative changes sign (its zero or zeroes)
x_zero_index = np.where(np.diff(np.sign(y_2prime)))[0]
# Finding the x value of the first zero of the 2nd derivative
x_zero_value = x[x_zero_index][0]
# Finding the y value of the curve at that x value
y_zero_value = y[x_zero_index][0]
# Reporting
print(f'The y value of your curve at its inflection point is {y_zero_value:.3f}.')
# Showing the three curves with their legend
plt.legend()
plt.show()
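The three steps can also be wrapped in a small reusable function, so that the same logic could later be applied to a sorted k-distance array instead of the simulated cubic. This is only a sketch: the function name `inflection_points` and its `window` parameter are hypothetical, and the optional moving average is an assumption on my side, added because numerical second derivatives tend to amplify noise in real data.

```python
import numpy as np

def inflection_points(x, y, window=1):
    """Return (x0, y0) pairs where the numerical 2nd derivative of y(x) changes sign.

    window > 1 applies a moving average first (hypothetical helper parameter).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if window > 1:
        kernel = np.ones(window) / window
        # mode="valid" drops the edges instead of zero-padding them;
        # averaging x with the same kernel keeps x and y aligned
        x = np.convolve(x, kernel, mode="valid")
        y = np.convolve(y, kernel, mode="valid")
    # Step 1: second derivative, using the actual x coordinates as spacing
    y_2prime = np.gradient(np.gradient(y, x), x)
    # Step 2: indices just before each sign change of the second derivative
    zero_index = np.where(np.diff(np.sign(y_2prime)) != 0)[0]
    # Step 3: read the (possibly smoothed) curve at those positions
    return [(float(x[i]), float(y[i])) for i in zero_index]

# Same simulated cubic as in the answer: y(x) = x^3 - 5x^2 + 2x
x = np.arange(-1, 4, 0.01)
y = x**3 - 5*x**2 + 2*x
points = inflection_points(x, y)
smoothed_points = inflection_points(x, y, window=11)
print(points)
print(smoothed_points)
```

For a k-distance curve, x would simply be the point rank (0, 1, 2, ...) and y the sorted distances. A suitable `window` would have to be found by experiment, and several sign changes may come back, so the caller still has to decide which one to use.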
In any case, keep in mind that the inflection point (around 2.0) does not match the "dip" that appears around 2.5.
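Since the zero of the second derivative and the visible dip are different points, one alternative is to look for the elbow of the sorted curve directly. The sketch below is a different heuristic from the second-derivative method above: it picks the point that lies farthest from the straight line joining the first and last points of the curve. The function name `elbow_value` is hypothetical, and the curve is synthetic data with its corner deliberately placed at 2.5. It is not the dataset from the question.

```python
import numpy as np

def elbow_value(sorted_desc):
    """Return (index, value) of the point farthest from the first-to-last chord."""
    y = np.asarray(sorted_desc, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Normalise both axes to [0, 1] so that neither one dominates the distance
    # (assumes the curve is not constant, otherwise this divides by zero)
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance of every point to the chord through the end points
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dy * (xn - xn[0]) - dx * (yn - yn[0])) / np.hypot(dx, dy)
    i = int(np.argmax(dist))
    return i, float(y[i])

# Synthetic descending curve with 1999 points: a steep drop from 10 to 2.5,
# followed by a long, almost flat tail from 2.5 down to 0.5
steep = np.linspace(10, 2.5, 50)
flat = np.linspace(2.5, 0.5, 1950)[1:]
curve = np.concatenate([steep, flat])

index, eps = elbow_value(curve)
print(index, eps)
```

With real data, `curve` would be the `distanceDec` list from the question. Whether this elbow is a better eps than the inflection point is something to check against the resulting clusters, not something the sketch can guarantee.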