Features are not being generated for my EntitySet setup in featuretools
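I ran into a problem while trying to set up relationships between the entities in my EntitySet (using my own data). There is no error, but no features are created for one of my entities (the "prods" entity), even though everything should be connected properly.
I can't share my data, but I built a minimal example with some mock data: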
import pandas as pd
import featuretools as ft
Create mock data
cust = pd.DataFrame([[1,50],[2,60]],
                    columns=['CUST_ID','AGE'])
orders = pd.DataFrame([[1,1,50,33.0],[2,1,60,20],[3,2,66,999.9]],
                      columns=['ORD_ID','CUST_ID','QTY','PRICE'])
order_items = pd.DataFrame([[1,1,1,2,3.0],[2,2,2,8,5.0],[3,2,1,2,3.0],[4,3,3,2,3.0]],
                           columns=['ORD_ITM_ID','ORD_ID','PROD_ID','QTY','PRICE'])
prods = pd.DataFrame([[1,3.0],[2,5.0],[3,3.0]],
                     columns=['PROD_ID','PRICE'])
Define the entity set
es = ft.EntitySet('test')
## Adding Customers Entity
es.entity_from_dataframe(dataframe=cust,
                         entity_id='cust',
                         index='CUST_ID')
## Adding Orders Entity
es.entity_from_dataframe(dataframe=orders,
                         entity_id='orders',
                         index='ORD_ID')
## Adding Order Items Entity
es.entity_from_dataframe(dataframe=order_items,
                         entity_id='order_items',
                         index='ORD_ITM_ID')
## Adding Products Entity
es.entity_from_dataframe(dataframe=prods,
                         entity_id='prods',
                         index='PROD_ID')
Define the relationships
customer_relationship = ft.Relationship(es["cust"]["CUST_ID"],
                                        es["orders"]["CUST_ID"])
orderitems_relationship = ft.Relationship(es["orders"]["ORD_ID"],
                                          es["order_items"]["ORD_ID"])
products_relationship = ft.Relationship(es["prods"]["PROD_ID"],
                                        es["order_items"]["PROD_ID"])
### Add Relationships
es = es.add_relationship(customer_relationship)
es = es.add_relationship(orderitems_relationship)
es = es.add_relationship(products_relationship)
Generate features
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      verbose=True,
                      features_only=True)
## Show features
feature_defs
Output:
Built 7 features
[<Feature: AGE>,
<Feature: COUNT(order_items)>,
<Feature: SUM(orders.QTY)>,
<Feature: SUM(orders.PRICE)>,
<Feature: SUM(order_items.QTY)>,
<Feature: COUNT(orders)>,
<Feature: SUM(order_items.PRICE)>]
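This should also show me features for the product variables, but it doesn't.
What I expected is that SUM would sum up the product prices for each customer. Instead, nothing shows up.
Ultimately, I want to create features for interesting values. But since the product variables don't show up, adding interesting values doesn't work either.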
## Get All Product IDs
interesting_products = es["prods"].df.PROD_ID.unique()
es["prods"]["PROD_ID"].interesting_values=interesting_products
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      where_primitives=["count", "sum"],
                      verbose=True,
                      features_only=True)
## Show features
feature_defs
Output:
Built 7 features
[<Feature: AGE>,
<Feature: COUNT(order_items)>,
<Feature: SUM(orders.QTY)>,
<Feature: SUM(orders.PRICE)>,
<Feature: SUM(order_items.QTY)>,
<Feature: COUNT(orders)>,
<Feature: SUM(order_items.PRICE)>]
Hope someone can help :)
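The reason the products don't show up is that any feature created from them has a depth of 3, which exceeds the default max_depth of 2. You can control the depth with the max_depth parameter of ft.dfs:
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      verbose=True,
                      max_depth=3,  # add max_depth
                      features_only=True)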
The features returned are now:
[<Feature: AGE>,
<Feature: SUM(order_items.QTY)>,
<Feature: SUM(order_items.PRICE)>,
<Feature: SUM(orders.PRICE)>,
<Feature: SUM(orders.QTY)>,
<Feature: COUNT(order_items)>,
<Feature: COUNT(orders)>,
<Feature: SUM(order_items.prods.PRICE)>]
At the end of the list you can see SUM(order_items.prods.PRICE), which uses the product price.
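To make concrete what SUM(order_items.prods.PRICE) represents, here is a minimal pandas sketch that reproduces the same number by hand on the mock data from the question. It is not how featuretools computes the feature internally; the PROD_PRICE column name and the per_customer variable are made up for this illustration.

```python
import pandas as pd

# Mock data from the question, trimmed to the columns this sketch needs.
orders = pd.DataFrame([[1, 1], [2, 1], [3, 2]], columns=["ORD_ID", "CUST_ID"])
order_items = pd.DataFrame(
    [[1, 1, 1], [2, 2, 2], [3, 2, 1], [4, 3, 3]],
    columns=["ORD_ITM_ID", "ORD_ID", "PROD_ID"],
)
prods = pd.DataFrame([[1, 3.0], [2, 5.0], [3, 3.0]], columns=["PROD_ID", "PRICE"])

# Hop 1: look up each order item's product price (order_items -> prods).
# PROD_PRICE is a hypothetical name chosen for this sketch.
items = order_items.merge(
    prods.rename(columns={"PRICE": "PROD_PRICE"}), on="PROD_ID"
)
# Hop 2: find the customer behind each order item (order_items -> orders).
items = items.merge(orders, on="ORD_ID")
# Aggregate: sum the product prices per customer.
per_customer = items.groupby("CUST_ID")["PROD_PRICE"].sum()
print(per_customer)
```

On the mock data, customer 1 ends up with 11.0 (products 1, 2 and 1) and customer 2 with 3.0 (product 3).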
To make the where clauses work, add the interesting values to the order_items entity instead:
interesting_products = es["prods"].df.PROD_ID.unique()
es["order_items"]["PROD_ID"].interesting_values = interesting_products
feature_defs = ft.dfs(entityset=es,
                      target_entity="cust",
                      agg_primitives=["count", "sum"],
                      where_primitives=["count", "sum"],
                      verbose=True,
                      max_depth=3,
                      features_only=True)
This creates 20 features, which you can see below:
[<Feature: AGE>,
<Feature: SUM(order_items.QTY)>,
<Feature: SUM(order_items.PRICE)>,
<Feature: SUM(orders.PRICE)>,
<Feature: SUM(orders.QTY)>,
<Feature: COUNT(order_items)>,
<Feature: COUNT(orders)>,
<Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 2)>,
<Feature: SUM(order_items.QTY WHERE PROD_ID = 2)>,
<Feature: SUM(order_items.QTY WHERE PROD_ID = 3)>,
<Feature: SUM(order_items.prods.PRICE)>,
<Feature: COUNT(order_items WHERE PROD_ID = 2)>,
<Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 1)>,
<Feature: SUM(order_items.PRICE WHERE PROD_ID = 3)>,
<Feature: COUNT(order_items WHERE PROD_ID = 1)>,
<Feature: COUNT(order_items WHERE PROD_ID = 3)>,
<Feature: SUM(order_items.prods.PRICE WHERE PROD_ID = 3)>,
<Feature: SUM(order_items.QTY WHERE PROD_ID = 1)>,
<Feature: SUM(order_items.PRICE WHERE PROD_ID = 2)>,
<Feature: SUM(order_items.PRICE WHERE PROD_ID = 1)>]
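As a rough illustration of what the WHERE features mean, the following pandas sketch filters order_items by each product ID before aggregating per customer. It only mimics the idea on the mock data and is not the featuretools implementation; filling customers that have no matching rows with 0 is an assumption of this sketch, and where_df is a hypothetical name.

```python
import pandas as pd

# Mock data from the question; orders is trimmed to the columns needed here.
orders = pd.DataFrame([[1, 1], [2, 1], [3, 2]], columns=["ORD_ID", "CUST_ID"])
order_items = pd.DataFrame(
    [[1, 1, 1, 2, 3.0], [2, 2, 2, 8, 5.0], [3, 2, 1, 2, 3.0], [4, 3, 3, 2, 3.0]],
    columns=["ORD_ITM_ID", "ORD_ID", "PROD_ID", "QTY", "PRICE"],
)

# Attach the customer to every order item.
items = order_items.merge(orders, on="ORD_ID")

where_features = {}
for prod_id in sorted(order_items["PROD_ID"].unique()):
    # The "WHERE PROD_ID = x" part: keep only rows for this product.
    grouped = items[items["PROD_ID"] == prod_id].groupby("CUST_ID")
    where_features[f"SUM(order_items.PRICE WHERE PROD_ID = {prod_id})"] = grouped["PRICE"].sum()
    where_features[f"COUNT(order_items WHERE PROD_ID = {prod_id})"] = grouped.size()

# One row per customer; 0 where a customer never bought the product
# (an assumption of this sketch).
where_df = pd.DataFrame(where_features).reindex(orders["CUST_ID"].unique()).fillna(0)
print(where_df.T)
```

Each interesting value of PROD_ID yields its own filtered copy of the aggregation, which is why the feature count grows from 8 to 20 once the values are set on order_items.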