How to find changepoints in data in Python
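I have many files whose plots look like this:
I want to find the changepoints in Python, but I could not find a suitable package that does this correctly and quickly.
I tried the ruptures and changefinder packages, but they did not solve my problem.
Here is a link to the file in which I am trying to find changepoints (comma-separated values, sampled at 1000 Hz):
txt File of above image
In other words, I want to find the indices of the array elements that correspond to the red lines in the image below: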
Try this:
def avg(listi):
    """Return the arithmetic mean of a non-empty list."""
    return sum(listi) / len(listi)


def chunk(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


# The file holds a single line of comma-separated values.
with open('test.txt', 'r') as file:
    list_of_numbers = [float(x.strip()) for x in file.readline().split(',')]

chunk_size = 100      # samples per chunk
diff_treshold = 0.1   # minimum jump between consecutive chunk means

chunks = list(chunk(list_of_numbers, chunk_size))
last_chunk_avg = avg(chunks[0])
cnt = 0
changepoints = []

for i, chunki in enumerate(chunks):
    this_chunk_avg = avg(chunki)
    # Flag the chunk when its mean differs enough from the previous chunk's mean.
    if abs(this_chunk_avg - last_chunk_avg) > diff_treshold:
        # Report the midpoint of the flagged chunk as the changepoint index.
        changepoints.append(int((i * chunk_size) + (chunk_size / 2)))
        cnt += 1
    last_chunk_avg = this_chunk_avg

print("Count of changepoints: ", cnt)
print("Changepoints: ", changepoints)
I got the following output for your file:
Count of changepoints: 20
Changepoints: [20550, 39050, 39150, 44750, 44850, 52650, 52750, 57550, 57650, 71850, 71950, 81050, 81150, 90250, 90350, 105950, 106050, 119550, 125150, 125250]
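You can change chunk_size and diff_treshold to adjust the sensitivity to changes: a larger chunk_size smooths out more noise but locates each changepoint less precisely, and a larger diff_treshold ignores smaller jumps.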
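Several changepoints in the output above come in pairs exactly one chunk apart (for example 39050 and 39150). That is what you would expect when a jump falls inside a chunk: the chunk's mean lands between the two levels, so both it and the next chunk get flagged. Below is a minimal sketch of the same chunk-mean comparison written with NumPy, plus a helper that collapses such neighbouring detections into one. The function names `find_changepoints` and `merge_adjacent` and their parameter names are made up for illustration. Unlike the loop above, this sketch drops an incomplete final chunk instead of averaging it.

```python
import numpy as np


def find_changepoints(values, chunk_size=100, diff_threshold=0.1):
    """Return midpoints of chunks whose mean jumps relative to the previous chunk."""
    values = np.asarray(values, dtype=float)
    n_chunks = len(values) // chunk_size  # an incomplete tail chunk is dropped
    means = values[:n_chunks * chunk_size].reshape(n_chunks, chunk_size).mean(axis=1)
    # jumps[k] compares chunk k + 1 with chunk k, so the flagged chunk is k + 1.
    jumps = np.abs(np.diff(means)) > diff_threshold
    flagged = np.flatnonzero(jumps) + 1
    return (flagged * chunk_size + chunk_size // 2).tolist()


def merge_adjacent(changepoints, chunk_size=100):
    """Collapse detections at most one chunk apart into their average position."""
    merged, group = [], []
    for cp in changepoints:
        if group and cp - group[-1] > chunk_size:
            merged.append(sum(group) // len(group))
            group = []
        group.append(cp)
    if group:
        merged.append(sum(group) // len(group))
    return merged


if __name__ == "__main__":
    # Synthetic step signal: the level changes from 0 to 1 at index 1050.
    signal = np.concatenate([np.zeros(1050), np.ones(950)])
    raw = find_changepoints(signal)
    print(raw)                  # two detections, one chunk apart
    print(merge_adjacent(raw))  # a single merged changepoint
```

Applied to the list printed above, `merge_adjacent` would reduce the 20 detections to 11. Whether averaging the pair is the right way to place the merged changepoint depends on your data; treat it as a starting point rather than a definitive rule.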