Basics of Normalizing Cross-Correlation with a View to Comparing Signals

I am trying to understand how to use cross-correlation to determine the similarity of two signals. This tutorial offers a very clear explanation of the basics, but I still don't understand how to use normalization effectively to prevent strong signals from dominating the cross-correlation measure when you have signals with different energy levels. The same tutor, David Dorran, discusses the issue of normalization here, and explains how to normalize the correlation using the dot product, but I still have some questions.

I wrote this Python routine to cross-correlate every pair of signals in a set of signals:

import numpy as np
import pandas as pd

def mycorrelate2d(df, normalized=False):
    # one row per signal; each cell of the matrix holds a full
    # cross-correlation sequence, so the matrix has dtype=object
    rows = df.to_numpy()
    n_signals = len(rows)
    ccm = np.empty((n_signals, n_signals), dtype=object)
    for i, outer_row in enumerate(rows):
        for j, inner_row in enumerate(rows):
            # mode='full' returns every lag, as in the output shown below
            x = np.correlate(inner_row, outer_row, mode='full')
            if normalized:
                # divide by the dot product of the two signals
                n = np.dot(inner_row, outer_row)
                x = x / n
            ccm[i][j] = x
    return ccm

Suppose I have 3 signals of increasing amplitude: [1, 2, 3], [4, 5, 6], and [7, 8, 9].

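I want to cross-correlate these three signals to see which pairs are similar, but when I pass these 3 signals into the routine I wrote, I do not seem to get a measure of similarity. The size of the cross-correlation values is just a function of the energy of the signals. Period. Even the cross-correlation of a signal with itself yields lower values than the cross-correlation of that same signal with another signal of higher energy.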

df_x3 = pd.DataFrame(
        np.array([[1, 2, 3], 
                  [4, 5, 6], 
                  [7, 8, 9]]).reshape(3, -1))
mycorrelate2d(df_x3)

This yields:

array([[array([ 3,  8, 14,  8,  3]), 
        array([12, 23, 32, 17,  6]),
        array([21, 38, 50, 26,  9])],
       [array([ 6, 17, 32, 23, 12]), 
        array([24, 50, 77, 50, 24]),
        array([ 42,  83, 122,  77,  36])],
       [array([ 9, 26, 50, 38, 21]), 
        array([ 36,  77, 122,  83,  42]),
        array([ 63, 128, 194, 128,  63])]], dtype=object)

Now I pass in the same 3 signals, but this time I indicate that I want normalized results:

mycorrelate2d(df_x3, normalized=True)

This yields:

array([[array([ 0.2142, 0.5714,  1., 0.5714, 0.2142]),
        array([ 0.375,  0.71875, 1., 0.5312, 0.1875]),
        array([ 0.42,   0.76,    1., 0.52,   0.18])],
       [array([ 0.1875, 0.5312,  1., 0.7187, 0.375]),
        array([ 0.3116, 0.6493,  1., 0.6493, 0.3116]),
        array([ 0.3442, 0.6803,  1., 0.6311, 0.2950])],
       [array([ 0.18,   0.52,    1., 0.76,   0.42]),
        array([ 0.2950, 0.6311,  1., 0.6803, 0.3442]),
        array([ 0.3247, 0.6597,  1., 0.6597, 0.3247])]],
        dtype=object)

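Now all of the maximum values are 1!! So we went from maximum values that differed for spurious reasons to no difference between the maximum values at all! I readily admit that I do not understand how cross-correlation is used to detect similarity between signals. What is the analytical workflow of someone who compares signals with cross-correlation?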
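
A minimal sketch of why every maximum comes out as 1, assuming the signals [1, 2, 3] and [4, 5, 6] from above (the variable names are illustrative only): for two equal-length signals, the middle entry of the full cross-correlation is the zero-lag term, which is exactly their dot product. Dividing the whole sequence by that dot product therefore forces the middle entry to 1 for any pair.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Full cross-correlation: one value per lag, five lags for length-3 signals
full = np.correlate(b, a, mode="full")

# For equal-length inputs the zero-lag term sits at index len(a) - 1
center = full[len(a) - 1]

# The zero-lag term equals the dot product of the two signals
print(center, np.dot(a, b))

# Dividing by the dot product pins the zero-lag entry at exactly 1
normalized = full / np.dot(a, b)
print(normalized)
```

So the 1 in the middle says nothing about how alike the two signals are; it is a by-product of the divisor chosen.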

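So the formula you are using for normalization is not quite correct. In NCC the normalization happens before we correlate, and then we divide the answer by the vector length, as shown in the Wikipedia formula https://en.wikipedia.org/wiki/Cross-correlation#Zero-normalized_cross-correlation_(ZNCC)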
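
For reference, the zero-lag form of that normalization for two equal-length 1-D signals can be written out as below. The symbols $x$, $y$, $n$, $\mu$ and $\sigma$ are my own notation for this sketch, not taken from the answer; $\sigma$ is assumed to be the population standard deviation, which is what `np.std` computes by default.

```latex
\mathrm{ZNCC}(x, y) \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \frac{(x_i - \mu_x)\,(y_i - \mu_y)}{\sigma_x \, \sigma_y}
```

Each signal has its mean subtracted and is divided by its standard deviation first; the products are then summed and averaged over the $n$ samples.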

So you need something like:

import numpy as np


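def mycorrelate2d(df, normalized=False):
    # one row per signal; ccm[i][j] holds the zero-lag correlation of rows i and j
    n_signals = len(df)
    ccm = np.zeros((n_signals, n_signals))
    for i in range(n_signals):
        outer_row = df[i]
        for j in range(n_signals):
            inner_row = df[j]
            if not normalized:
                x = np.correlate(inner_row, outer_row)
            else:
                # subtract the mean and divide by the standard deviation before
                # correlating; the extra len(inner_row) averages the products
                a = (inner_row - np.mean(inner_row)) / (np.std(inner_row) * len(inner_row))
                b = (outer_row - np.mean(outer_row)) / np.std(outer_row)
                x = np.correlate(a, b)
            # the default mode='valid' returns a one-element array here
            ccm[i][j] = x[0]
    return ccm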

df_x3 =np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]]).reshape(3, -1)
print(mycorrelate2d(df_x3,True))
df_x3 =np.array([[1, 2, 3],
                  [9, 5, 6],
                  [74, 8, 9]]).reshape(3, -1)
print(mycorrelate2d(df_x3,True))

The output is:

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[ 1.         -0.72057669 -0.85933941]
 [-0.72057669  1.          0.97381599]
 [-0.85933941  0.97381599  1.        ]]
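
As a cross-check, and assuming only the zero-lag value is of interest, the second matrix above should match the Pearson correlation matrix that `np.corrcoef` computes when each row is treated as one signal. This is a sketch for verification, not part of the original answer; the name `signals` is illustrative.

```python
import numpy as np

# Same three signals as the second example above, one signal per row
signals = np.array([[1, 2, 3],
                    [9, 5, 6],
                    [74, 8, 9]], dtype=float)

# np.corrcoef treats each row as a variable and returns the
# matrix of Pearson correlation coefficients between rows
pearson = np.corrcoef(signals)
print(pearson)
```

The first example returns all ones for the same reason: [1, 2, 3], [4, 5, 6] and [7, 8, 9] are the same ramp with different offsets, so once the mean is removed they are identical. Note that a constant signal has zero standard deviation, so this normalization is undefined for it.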