How to know from which interval of the input the features used in sktime's TimeSeriesForestClassifier are calculated

I am using the TimeSeriesForestClassifier class from the sktime library to perform multivariate time series classification.

The code is as follows:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator

X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

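I want to inspect the values of feature_importances_, but the array does not have the same length as the input; its length equals the number of features.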

clf.steps[1][1].feature_importances_

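I would like to know which part of the input each importance corresponds to. Is there a way to find out which part of the input TimeSeriesForestClassifier computes each feature from?

You can get the intervals (start and end indices) used by each tree in the ensemble from: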

clf.steps[1][1].intervals_
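To relate a single feature back to the input, a small helper along the following lines may be useful. This is only a sketch under assumptions that are not stated in this thread: that each tree computes three features per interval in the order mean, standard deviation, slope, that the end index is exclusive, and that ColumnConcatenator joins the channels end to end along the time axis (BasicMotions has 6 channels of length 100). The names describe_feature, locate_in_channels and example_intervals are hypothetical and not part of sktime. Since every tree draws its own intervals, the mapping applies per tree.

```python
# Sketch only: the feature layout below is an assumption, not a documented sktime API.
STATS = ("mean", "std", "slope")  # assumed order of the features within each interval


def describe_feature(feature_idx, tree_intervals):
    """Map a flat feature index of one tree to (statistic, start, end)."""
    interval_idx, stat_idx = divmod(feature_idx, len(STATS))
    start, end = tree_intervals[interval_idx]
    return STATS[stat_idx], int(start), int(end)


def locate_in_channels(start, end, series_length):
    """Map a [start, end) span of the concatenated series to per-channel spans."""
    spans = []
    pos = start
    while pos < end:
        channel, offset = divmod(pos, series_length)
        # Cut the span at the channel boundary if the interval crosses it.
        stop = min(end, (channel + 1) * series_length)
        spans.append((channel, offset, offset + (stop - pos)))
        pos = stop
    return spans


# Hypothetical intervals for one tree, standing in for clf.steps[1][1].intervals_[0].
example_intervals = [(12, 40), (95, 130), (410, 455)]

stat, start, end = describe_feature(4, example_intervals)
print(stat, start, end)  # std 95 130
print(locate_in_channels(start, end, series_length=100))  # [(0, 95, 100), (1, 0, 30)]
```

Because the channels are concatenated before the forest sees them, an interval can straddle two channels, as in the example above; such features mix information from both.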

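sktime now also has an implementation of the newer Canonical Interval Forest.

When we first implemented the time series forest algorithm, we ended up with two versions. The one you are using is the recommended version, but the older version provides its own functionality for plotting feature importances (see below).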

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.compose import ComposableTimeSeriesForestClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator

X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", ComposableTimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

clf.steps[-1][-1].feature_importances_.rename(columns={"_slope": "slope"}).plot(xlabel="time", ylabel="feature importance")
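One way to read such a plot is as interval-level importances spread back over the time axis. The sketch below illustrates that idea on made-up numbers; it is not sktime's exact computation, and the function name temporal_importance and the input layout (one importance value per interval, for a single statistic such as the mean) are assumptions made for the example.

```python
def temporal_importance(trees, series_length):
    """Spread each interval's importance over the time points it covers.

    trees: list of (intervals, importances) pairs, one per tree, where
    importances holds one value per interval for a single statistic.
    """
    curve = [0.0] * series_length
    for intervals, importances in trees:
        for (start, end), importance in zip(intervals, importances):
            for t in range(start, end):  # end index treated as exclusive
                curve[t] += importance
    # Average over trees so the scale does not depend on the ensemble size.
    return [value / len(trees) for value in curve]


# Made-up intervals and importances for two trees on a series of length 6.
toy_trees = [
    ([(0, 3), (2, 6)], [0.5, 0.25]),
    ([(1, 4)], [1.0]),
]
print(temporal_importance(toy_trees, series_length=6))
# [0.25, 0.75, 0.875, 0.625, 0.125, 0.125]
```

Time points covered by many intervals accumulate more importance simply because they are sampled more often, which is one reason such curves need careful interpretation.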

Note that there are some subtleties in how the feature importances are computed and interpreted. The related issue is here: