featuretools: how can I apply `time_since`, `time_since_first` primitives on an integer-type time index?

When the time index is an integer (e.g. starting from 0 for each user), running dfs shows this warning:

```
UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
  agg_primitives: ['avg_time_between', 'time_since_first', 'time_since_last', 'trend']
  groupby_trans_primitives: ['cum_count', 'time_since', 'time_since_previous']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
```

However, in many cases the time index is an integer (e.g. https://www.kaggle.com/c/riiid-test-answer-prediction/data):

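In this case, even if I set the `timestamp` variable to `ft.variable_types.TimeIndex` (`numeric_time_index`) when creating the entity set, it still shows the same warning, and none of the features from `['avg_time_between', 'time_since_first', 'time_since_last', 'trend']` are generated.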

How can I handle this?

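Thanks for the question. The `time_since` and `time_since_first` primitives currently only work with `Datetime` and `DatetimeTimeIndex` variables. To handle a numeric time index, you can create custom primitives that accept `NumericTimeIndex` variables.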

```python
from featuretools.primitives import AggregationPrimitive, TransformPrimitive
from featuretools.variable_types import NumericTimeIndex


class TimeSinceNumeric(TransformPrimitive):
    input_types = [NumericTimeIndex]
    ...


class TimeSinceFirstNumeric(AggregationPrimitive):
    input_types = [NumericTimeIndex]
    ...
```
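The class bodies above are left as `...`. Because a numeric time index has no calendar units, the core of each primitive is plain arithmetic against the cutoff time, in the same units as the index (e.g. milliseconds). Below is a minimal, featuretools-free sketch of that arithmetic for one user's time-index values, covering the primitives from the warning that have an obvious numeric analogue. The function names are hypothetical, not featuretools API, and the sketch assumes the values are already in time order. In featuretools 0.x, built-ins such as `time_since` set `uses_calc_time = True` so that the cutoff time is passed to the primitive's function as a `time` keyword argument; a custom primitive could return functions like these from `get_function()`, adapted to work on pandas Series.

```python
from typing import List, Optional, Sequence


def time_since(values: Sequence[int], time: int) -> List[int]:
    """Transform: elapsed time from each time-index value to the cutoff `time`."""
    return [time - v for v in values]


def time_since_previous(values: Sequence[int]) -> List[Optional[int]]:
    """Transform: gap to the previous value; None for the first row.

    Assumes `values` are already sorted in time order (hypothetical helper).
    """
    return [None] + [b - a for a, b in zip(values, values[1:])]


def time_since_first(values: Sequence[int], time: int) -> int:
    """Aggregation: elapsed time from the earliest value to the cutoff `time`."""
    return time - min(values)


def time_since_last(values: Sequence[int], time: int) -> int:
    """Aggregation: elapsed time from the latest value to the cutoff `time`."""
    return time - max(values)


def avg_time_between(values: Sequence[int]) -> Optional[float]:
    """Aggregation: mean gap between consecutive values; None if fewer than two."""
    if len(values) < 2:
        return None
    ordered = sorted(values)
    # The sum of consecutive gaps telescopes to (last - first).
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


if __name__ == "__main__":
    user_times = [0, 30, 90, 120]  # hypothetical time-index values for one user
    cutoff = 150                   # hypothetical cutoff time, same units
    print(time_since(user_times, cutoff))        # [150, 120, 60, 30]
    print(time_since_previous(user_times))       # [None, 30, 60, 30]
    print(time_since_first(user_times, cutoff))  # 150
    print(time_since_last(user_times, cutoff))   # 30
    print(avg_time_between(user_times))          # 40.0
```

Note that the result depends entirely on the cutoff value, so the cutoff must be expressed in the same integer units as the time index.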

You can then pass the custom primitives directly to DFS:

```python
ft.dfs(
    ...
    trans_primitives=[TimeSinceNumeric],
    agg_primitives=[TimeSinceFirstNumeric],
)
```