What is the correct way to format the parameters for DTW in Similarity Measures?

Question

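I am trying to use the DTW algorithm from the similaritymeasures library. However, I get an error stating that a 2-dimensional array is required. I am not sure I understand how to format the data properly, and the documentation leaves me scratching my head.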

https://github.com/cjekel/similarity_measures/blob/master/docs/similaritymeasures.html

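According to the documentation, the function takes two arguments for the datasets (exp_data and num_data), which makes sense. What does not make sense to me is: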

exp_data : array_like

Curve from your experimental data. exp_data is of (M, N) shape, where M is the number of data points, and N is the number of dimensions

This is the same for both the exp_data and num_data arguments.

So, to illustrate further, suppose I were using the fastdtw library. It would look like this:

import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

x = np.array([1, 2, 3, 3, 7])
y = np.array([1, 2, 2, 2, 2, 2, 2, 4])

distance, path = fastdtw(x, y, dist=euclidean)

print(distance)
print(path)

Or I could implement the same thing with dtaidistance:

from dtaidistance import dtw

x = [1, 2, 3, 3, 7]
y = [1, 2, 2, 2, 2, 2, 2, 4]

distance = dtw.distance(x, y)

print(distance)

However, using the same code with similaritymeasures results in an error. For example:

import similaritymeasures
import numpy as np

x = np.array([1, 2, 3, 3, 7])
y = np.array([1, 2, 2, 2, 2, 2, 2, 4])

dtw, d = similaritymeasures.dtw(x, y)

print(dtw)
print(d)

So my question is: why is a 2-dimensional array required here? What does similaritymeasures do that the other libraries do not?

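If similaritymeasures expects data of shape (M, N), where M is the number of data points and N is the number of dimensions, where does my data go? Put another way: M is the number of data points, so in the example above x has 5 data points. N is the number of dimensions, and in the example above x is 1-dimensional. So am I supposed to pass it [5, 1]? That seems incorrect for obvious reasons, but I cannot find any example code that makes this clearer.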

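The reason I want to use similaritymeasures is that it has several other functions I would like to take advantage of, such as the Fréchet distance and the Hausdorff distance. I would really like to understand how to use it.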

Any help is greatly appreciated.

Answer 1

It appears my solution is to include the index in the arrays. For example, if your data looks like this:

x = [1, 2, 3, 3, 7]
y = [1, 2, 2, 2, 2, 2, 2, 4]

It needs to look like this:

x = [[1, 1], [2, 2], [3, 3], [4, 3], [5, 7]]
y = [[1, 1], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2], [7, 2], [8, 4]]
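
The conversion above does not have to be written out by hand. The following is a minimal NumPy sketch, assuming the index starts at 1 as in the example; the helper name `with_index` is hypothetical and is not part of similaritymeasures.

```python
import numpy as np


def with_index(values, start=1):
    """Pair each value with its position, giving an (M, 2) array.

    Hypothetical helper for illustration; not part of similaritymeasures.
    """
    values = np.asarray(values, dtype=float)
    index = np.arange(start, start + len(values), dtype=float)
    # Each row is one data point: [index, value]
    return np.column_stack([index, values])


x = with_index([1, 2, 3, 3, 7])
y = with_index([1, 2, 2, 2, 2, 2, 2, 4])

print(x.shape)  # (5, 2)
print(y.shape)  # (8, 2)
```

Each row is then one point of the curve, which matches the (M, N) description quoted from the documentation with N = 2.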

In my case, x and y are two separate columns in a pandas DataFrame. My solution was as follows:

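import numpy as np
import similaritymeasures

# df is an existing pandas DataFrame containing 'column1' and 'column2'
df['index'] = df.index

# First curve: one row per point, [index, value]
x1 = df['index']
y1 = df['column1']
P = np.array([x1, y1]).T

# Second curve, built the same way from the other column
x2 = df['index']
y2 = df['column2']
Q = np.array([x2, y2]).T

# Both P and Q now have shape (M, 2)
dtw, d = similaritymeasures.dtw(P, Q)

print(dtw)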
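
To see why each curve is described as an (M, N) array, it can help to look at what DTW does with the rows. Below is a minimal NumPy sketch of the classic dynamic-programming DTW with Euclidean distance between points. It is an illustration only, not the implementation used by similaritymeasures, and the function name `dtw_sketch` is hypothetical.

```python
import numpy as np


def dtw_sketch(P, Q):
    """Illustrative DTW on two curves of shape (M, N) and (K, N).

    Hypothetical function; not the similaritymeasures implementation.
    Returns the total DTW cost and the accumulated cost matrix.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or Q.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (M, N)")

    # Euclidean distance between every point (row) of P and every point of Q
    cost = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)

    m, n = cost.shape
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Cheapest way to reach (i, j): step in P, step in Q, or both
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return acc[m, n], acc[1:, 1:]


# Treat each value as a 1-dimensional point: shapes (5, 1) and (8, 1)
x = np.array([1, 2, 3, 3, 7], dtype=float).reshape(-1, 1)
y = np.array([1, 2, 2, 2, 2, 2, 2, 4], dtype=float).reshape(-1, 1)

total, acc = dtw_sketch(x, y)
print(total)      # 5.0
print(acc.shape)  # (5, 8)
```

With `reshape(-1, 1)` every value becomes a one-dimensional point, which is the `[5, 1]` shape asked about in the question. Whether `similaritymeasures.dtw` gives the result you want for such input is an assumption that should be checked against the library itself.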

python

numpy

time-series

similarity

data-analysis