Find where the slope changes in my data as a parameter that can be easily indexed and extracted

I have the following data:

0.8340502011561366 0.8423491600218922 0.8513456021654467 0.8458192388553084 0.8440111276014195 0.8489589671423143 0.8738088120491972 0.8845129900705279 0.8988298998926688 0.924633964692693 0.9544790734065157 0.9908034431246875 1.0236430466543138 1.061619773027915 1.1050038249835414 1.1371449802490126 1.1921182610371368 1.2752207659022576 1.344047620255176 1.4198117350668353 1.507943067143741 1.622137968203745 1.6814098429502085 1.7646810054280595 1.8485457435775694 1.919591124757554 1.9843144220593145 2.030158014640226 2.018184122476175 2.0323466012624207 2.0179200409023874 2.0316932950853723 2.013683870089898 2.03010703506514 2.0216151623726977 2.038855467786505 2.0453923522466093 2.03759031642753 2.019424996752278 2.0441806106428606 2.0607521369415136 2.059310067318373 2.0661157975162485 2.053216429539864 2.0715123971225564 2.0580473413362075 2.055814512721712 2.0808278560688964 2.0601637029377113 2.0539429365156003 2.0609648613513754 2.0585135712612646 2.087674625814453 2.062482961966647 2.066476100210777 2.0568444178944967 2.0587903943282266 2.0506399365756396

Plotted, the data looks like this:

I want to find the point where the slope changes sign (I circled it in black; it should be around index 26):

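I need to find this change point for hundreds of files. So far I have tried the suggestions from this post:

I think that because my data is a bit noisy, I don't get a smooth transition where the slope changes.

This is the code I have tried so far: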

import sys
import numpy as np

#load 1-D data file
file = str(sys.argv[1])
y = np.loadtxt(file)

#create X based on file length
x = np.linspace(1,len(y), num=len(y))

#Find first derivative

m = np.diff(y)/np.diff(x)
print(m)

#Find second derivative
b = np.diff(m)
print(b)
#find Index

index = 0
for difference in b:
    index += 1
    if difference < 0: 
        print(index, difference)

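Because my data is noisy, I get some negative values before the index I want. In this case, the index I would like it to retrieve is around 26 (which is where my data becomes constant). Does anyone have suggestions for how I can solve this? Thanks!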

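The slope changes sign exactly where the first derivative changes sign. I don't think you need the second derivative unless you want to know the rate of change of the slope. Strictly speaking, you aren't computing the second derivative either: you only take the differences of the first derivative, without dividing by the x spacing (which happens to make no numerical difference here, because your spacing is 1).

Also, you seem to be assigning arbitrary x values. If your y-values represent equally spaced points, that's fine; otherwise the derivatives will be wrong.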
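To illustrate the point about spacing, here is a minimal sketch with made-up, unevenly spaced sample positions (the `x` values below are hypothetical and not taken from the question). It compares the slope obtained when a spacing of 1 is silently assumed with the slope obtained from the actual spacing; `np.gradient(y, x)` is shown as an alternative that keeps the original length.

```python
import numpy as np

# Hypothetical, unevenly spaced sample positions (not from the question)
x = np.array([0.0, 0.5, 1.5, 3.0, 5.0])
y = x ** 2  # the true derivative is 2 * x

# Implicitly assumes the samples are one unit apart
slope_assumed_unit = np.diff(y)

# Divides each rise by the actual run between neighbouring samples
slope_actual = np.diff(y) / np.diff(x)

# np.gradient accepts the coordinates and returns one slope per sample
slope_at_samples = np.gradient(y, x)

print(slope_assumed_unit)
print(slope_actual)
print(slope_at_samples)
```

For equally spaced data with a spacing of 1, as in the question, the first two results coincide.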

Here is an example of how to get the first and second derivatives...



import numpy as np

x = np.linspace(1, 100, 1000)

y = np.cos(x)

# Find first derivative:
m = np.diff(y)/np.diff(x)

#Find second derivative
m2 = np.diff(m)/np.diff(x[:-1])

print(m)
print(m2)

# Get x-values where slope sign changes

c = len(m)

changes_index = []
for i in range(1, c):
    prev_val = m[i-1]
    val = m[i]
    if prev_val < 0 and val > 0:
        changes_index.append(i)
    elif prev_val > 0 and val < 0:
        changes_index.append(i)

for i in changes_index:
    print(x[i])


Note that I had to trim the x values for the second derivative. That's because np.diff() returns one fewer point than its input.
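The sign-change loop in the example can also be written without an explicit Python loop. This is only a sketch of one possible vectorized form, using the same `x`, `y` and `m` as the example; it flags the same positions as the loop (pairs of neighbouring slopes with strictly opposite signs).

```python
import numpy as np

x = np.linspace(1, 100, 1000)
y = np.cos(x)

# First derivative, as in the example above
m = np.diff(y) / np.diff(x)

# Sign of each slope value: -1, 0 or +1
sign = np.sign(m)

# The product of neighbouring signs is negative only when they are strictly opposite;
# "+ 1" reports the second element of each pair, matching the loop's index i
changes_index = np.where(sign[:-1] * sign[1:] < 0)[0] + 1

print(x[changes_index])
```

On noisy data such as the series in the question, this reports every small wiggle as a sign change, so it does not isolate the single change point on its own.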

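A gradient approach is not useful in this case, because you don't care about velocities or vector fields. Knowing the gradient adds no extra information for locating the maximum: the run is always positive, so it does not affect the sign of the gradient. I suggest a method based entirely on the rise.

Detect the indices at which the data decreases, take the differences between consecutive such indices, and find where that difference is largest. With a little index manipulation you can then find the position at which the data reaches its maximum.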

data = '0.8340502011561366 0.8423491600218922 0.8513456021654467 0.8458192388553084 0.8440111276014195 0.8489589671423143 0.8738088120491972 0.8845129900705279 0.8988298998926688 0.924633964692693 0.9544790734065157 0.9908034431246875 1.0236430466543138 1.061619773027915 1.1050038249835414 1.1371449802490126 1.1921182610371368 1.2752207659022576 1.344047620255176 1.4198117350668353 1.507943067143741 1.622137968203745 1.6814098429502085 1.7646810054280595 1.8485457435775694 1.919591124757554 1.9843144220593145 2.030158014640226 2.018184122476175 2.0323466012624207 2.0179200409023874 2.0316932950853723 2.013683870089898 2.03010703506514 2.0216151623726977 2.038855467786505 2.0453923522466093 2.03759031642753 2.019424996752278 2.0441806106428606 2.0607521369415136 2.059310067318373 2.0661157975162485 2.053216429539864 2.0715123971225564 2.0580473413362075 2.055814512721712 2.0808278560688964 2.0601637029377113 2.0539429365156003 2.0609648613513754 2.0585135712612646 2.087674625814453 2.062482961966647 2.066476100210777 2.0568444178944967 2.0587903943282266 2.0506399365756396'

data = data.split()
import numpy as np

a = np.array(data, dtype=float)

diff = np.diff(a)

neg_indeces = np.where(diff<0)[0]
neg_diff = np.diff(neg_indeces)

i_max_dif = np.where(neg_diff == neg_diff.max())[0][0] + 1

i_max = neg_indeces[i_max_dif] - 1 # minus one because each rise is a difference of two consecutive values

print(i_max, a[i_max])

Output

26 1.9843144220593145

Some details

print(neg_indeces) # all indices at which the data decreases (negative differences)
# [ 2  3 27 29 31 33 36 37 40 42 44 45 47 48 50 52 54 56]
print(neg_diff) # differences between consecutive such indices
# [ 1 24  2  2  2  3  1  3  2  2  1  2  1  2  2  2  2]
print(neg_diff.max()) # largest gap between two decreases
# 24
print(i_max_dif) # position in neg_indeces of the first decrease after the largest gap (neg_indeces[2] is 27)
# 2
print(i_max) # index of the change point in the original data
# 26
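Since the question mentions hundreds of files, the steps above could be wrapped in a small helper. This is a sketch only: the function name `find_change_index` and the decision to return `None` when fewer than two decreases exist are assumptions of mine, not part of the answer's code. The short `demo` series is made up for illustration.

```python
import numpy as np

def find_change_index(values):
    """Hypothetical helper: index just before the first decrease that
    follows the longest uninterrupted rise, or None if it is undefined."""
    a = np.asarray(values, dtype=float)
    neg_indices = np.where(np.diff(a) < 0)[0]  # steps where the data decreases
    if neg_indices.size < 2:
        return None  # assumption: need at least two decreases to measure a gap
    gaps = np.diff(neg_indices)  # distance between consecutive decreases
    after_gap = int(np.argmax(gaps)) + 1  # first decrease after the largest gap
    return int(neg_indices[after_gap]) - 1  # same "- 1" offset as the code above

# Made-up series: a dip, a long rise, then a noisy plateau
demo = [0.0, 1.0, 0.5, 2.0, 3.0, 4.0, 5.0, 4.9, 5.1, 5.0]
print(find_change_index(demo))
```

For a batch of files, the same function could be called on `np.loadtxt(path)` for each path, collecting the returned indices in a list or dictionary keyed by filename.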