How to find changepoints in data in Python

I have many files whose plots look like this:

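I want to find the changepoints in Python, but I can't find a suitable package that does this both quickly and correctly. I tried the ruptures and changefinder packages, but they did not solve my problem. Here is a link to the file in which I am trying to find the changepoints (comma-separated values, sampled at 1000 Hz): txt File of above image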

In other words, I want to find the indices of the array elements that correspond to the red lines in the figure below:

Try this:

def avg(listi):
    s = 0
    for i in listi:
        s += i
    return s / len(listi)

def chunk(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Read the comma-separated samples from the first line of the file.
with open('test.txt', 'r') as file:
    list_of_numbers = [float(x.strip()) for x in file.readline().split(',')]


chunk_size = 100
chunks = list(chunk(list_of_numbers, chunk_size))
diff_treshold = 0.1
last_chunk_avg = avg(chunks[0])
changepoints = []
for i, chunki in enumerate(chunks):
    this_chunk_avg = avg(chunki)
    # A jump in the mean between consecutive chunks marks a changepoint.
    if abs(this_chunk_avg - last_chunk_avg) > diff_treshold:
        # Report the midpoint of the current chunk as the changepoint index.
        changepoints.append(int((i * chunk_size) + (chunk_size / 2)))
    last_chunk_avg = this_chunk_avg

print("Count of changepoints: ", len(changepoints))
print("Changepoints: ", changepoints)

I got the following output for your file:

Count of changepoints:  20
Changepoints:  [20550, 39050, 39150, 44750, 44850, 52650, 52750, 57550, 57650, 71850, 71950, 81050, 81150, 90250, 90350, 105950, 106050, 119550, 125150, 125250]

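You can change chunk_size and diff_treshold to adjust the sensitivity to changes. A larger chunk_size smooths out noise but locates each changepoint less precisely; a smaller diff_treshold flags smaller jumps in the mean.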
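Several changepoints in the output above come in pairs exactly one chunk apart (for example 39050 and 39150). That may happen when a level change falls inside a chunk: that chunk gets an intermediate mean, so both it and the next chunk differ from their predecessor. Below is a rough sketch of the same chunk-mean idea written with NumPy, plus a helper that collapses such consecutive detections. The function names `chunk_mean_changepoints` and `merge_adjacent` are made up for this example, the signal is synthetic (not the file from the question), and unlike the loop above this version drops a trailing partial chunk.

```python
import numpy as np

def chunk_mean_changepoints(samples, chunk_size=100, diff_treshold=0.1):
    """Return midpoints of chunks whose mean differs from the previous chunk's mean."""
    samples = np.asarray(samples, dtype=float)
    # Assumption: a trailing partial chunk is ignored so the data reshapes cleanly.
    n_full = len(samples) // chunk_size
    means = samples[:n_full * chunk_size].reshape(n_full, chunk_size).mean(axis=1)
    # Entry k of `jumps` compares chunk k+1 with chunk k.
    jumps = np.abs(np.diff(means)) > diff_treshold
    chunk_idx = np.flatnonzero(jumps) + 1
    return (chunk_idx * chunk_size + chunk_size // 2).tolist()

def merge_adjacent(changepoints, chunk_size=100):
    """Collapse detections in consecutive chunks into one, keeping the earliest."""
    merged = []
    last = None
    for cp in changepoints:
        # Start a new group only when this detection is more than one chunk away.
        if last is None or cp - last > chunk_size:
            merged.append(cp)
        last = cp
    return merged

# Synthetic signal: the level changes at sample 250 and at sample 1000.
signal = np.concatenate([np.zeros(250), np.ones(750), np.full(500, 0.5)])

raw = chunk_mean_changepoints(signal)
print(raw)                  # [250, 350, 1050]
print(merge_adjacent(raw))  # [250, 1050]
```

The change at sample 250 sits in the middle of a chunk, so it is reported twice before merging. The change at sample 1000 sits on a chunk boundary and is reported once, at the midpoint of the following chunk, which shows that the location is only accurate to within about one chunk_size.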