List of built-in aggregation and transform primitives
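First of all, I love featuretools. It makes my work much easier and more efficient. A quick question: I'm looking for a complete list of the non-custom agg and trans primitives, but I can't seem to find one. Do I just take the list of classes in the API and convert the capital letters to lowercase (with underscores in between)?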
If you run featuretools.list_primitives(), it returns a DataFrame containing the names of all the primitives. The strings in the "name" column can be passed to ft.dfs.
>>> import featuretools as ft
>>> ft.list_primitives()
    name              type         description
0   percent_true      aggregation  Determines the percent of `True` values.
1   last              aggregation  Determines the last value in a list.
2   num_true          aggregation  Counts the number of `True` values.
3   std               aggregation  Computes the dispersion relative to the mean v...
4   num_unique        aggregation  Determines the number of distinct values, igno...
5   sum               aggregation  Calculates the total addition, ignoring `NaN`.
6   skew              aggregation  Computes the extent to which a distribution di...
7   mode              aggregation  Determines the most commonly repeated value.
8   time_since_first  aggregation  Calculates the time elapsed since the first da...
9   max               aggregation  Calculates the highest value, ignoring `NaN` v...
10  median            aggregation  Determines the middlemost number in a list of ...
11  mean              aggregation  Computes the average for a list of values.
12  time_since_last   aggregation  Calculates the time elapsed since the last dat...
Additionally, you can import the primitive classes and pass them in directly. For example, these two calls are equivalent:
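>>> from featuretools.primitives import Max, TimeSincePrevious
>>> # Max is an aggregation primitive; TimeSincePrevious is a transform primitive
>>> ft.dfs(agg_primitives=[Max], trans_primitives=[TimeSincePrevious], ...)
>>> ft.dfs(agg_primitives=["max"], trans_primitives=["time_since_previous"], ...)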
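Importing the primitive object is helpful when you need to change its controllable parameters. For example, to make TimeSincePrevious return its result in hours (the default is seconds):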
>>> ft.dfs(agg_primitives=[Max], trans_primitives=[TimeSincePrevious(unit="hours")], ...)
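On the naming part of the question: for the primitives shown here, the string name does look like the snake_case form of the class name (Max -> "max", TimeSincePrevious -> "time_since_previous"). Below is a rough sketch of that conversion using only the standard library. Note that class_name_to_primitive_name is a hypothetical helper, not part of featuretools, and it assumes the simple CamelCase convention holds. That may not be true for every primitive (for example names with acronyms), so the "name" column of list_primitives() should be treated as the source of truth.

```python
import re


def class_name_to_primitive_name(class_name: str) -> str:
    """Guess the string name of a primitive from its CamelCase class name.

    Hypothetical helper for illustration only; not a featuretools API.
    """
    # Insert an underscore wherever a lowercase letter or digit
    # is immediately followed by an uppercase letter.
    with_underscores = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", class_name)
    return with_underscores.lower()


# Class names taken from the examples in this thread.
for cls in ["Max", "NumUnique", "PercentTrue", "TimeSincePrevious"]:
    print(cls, "->", class_name_to_primitive_name(cls))
```

If a guessed name does not show up in list_primitives(), pass the imported class instead of a string.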