Can featuretools be used on a vaex dataframe?
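I am trying out automated feature engineering. I have it working on a regular dataframe, but I am not sure whether it can work on out-of-memory dataframes such as vaex. My goal is to find a way to use automated feature engineering when the dataframe does not fit in memory.
Has anyone managed to do this? Here is what I am doing: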
#playing with vaex
#install items
# !pip install vaex
# !pip install --upgrade ipython
# !pip install numpy --upgrade
#if using colab you may have to restart your runtime
!pip install featuretools
#import items
import featuretools as ft
import vaex
import pandas as pd
vaex.multithreading.thread_count_default = 8
import vaex.ml
# Load the titanic dataset
df = vaex.ml.datasets.load_titanic()
# See the description
df.info()
# let's try to use featuretools to scale out the features on vaex
es = ft.EntitySet(id = 'titanic_data')
es = es.entity_from_dataframe(
    entity_id='df',
    dataframe=df.drop(['Survived']),
    variable_types={
        'Embarked': ft.variable_types.Categorical,
        'Sex': ft.variable_types.Boolean,
        'Title': ft.variable_types.Categorical,
        'Family_Size': ft.variable_types.Numeric,
        'LastName': ft.variable_types.Categorical
    },
    index='PassengerId')
I get this error:
KeyError Traceback (most recent call last)
<ipython-input-6-55607b93fccd> in <module>
1 # let's try to use featuretools to scale out the features on vaex
2 es = ft.EntitySet(id = 'titanic_data')
----> 3 es = es.entity_from_dataframe(entity_id = 'df', dataframe = df.drop(['Survived']),
4 variable_types =
5 {
1 frames
/usr/local/lib/python3.7/dist-packages/vaex/dataframe.py in drop(self, columns, inplace, check)
4758 df._hide_column(column)
4759 else:
-> 4760 df._real_drop(column)
4761 return df
4762
/usr/local/lib/python3.7/dist-packages/vaex/dataframe.py in _real_drop(self, item)
4733 self.column_names.remove(name)
4734 else:
-> 4735 raise KeyError('no such column or virtual_columns named %r' % name)
4736 self.signal_column_changed.emit(self, name, "delete")
4737 if hasattr(self, name):
KeyError: "no such column or virtual_columns named 'Survived'"
Is what I am trying to do possible? Is my approach wrong, or is there another way to do it?
Currently, Featuretools does not work with vaex dataframes. There is some support for Dask and Koalas dataframes.
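Separately from the question of vaex support, the traceback above shows the KeyError being raised inside vaex's own drop, before any Featuretools code runs, which means the dataframe has no column named exactly 'Survived'. One possible cause is a difference in capitalization, though that is an assumption and not something the error message confirms. A minimal sketch of resolving a column name without regard to case is shown below; resolve_column is a hypothetical helper, not part of vaex or Featuretools, and the column list is illustrative only.

```python
def resolve_column(column_names, wanted):
    """Return the actual column name matching `wanted`, ignoring case.

    Hypothetical helper, not part of vaex or featuretools.
    Raises KeyError listing the available names if nothing matches.
    """
    # Exact match first, so correctly spelled names pass through untouched.
    if wanted in column_names:
        return wanted
    # Fall back to a case-insensitive comparison.
    matches = [name for name in column_names if name.lower() == wanted.lower()]
    if len(matches) == 1:
        return matches[0]
    raise KeyError(
        "no column matching %r; available columns: %s" % (wanted, list(column_names))
    )


# Illustrative names only; with vaex you would pass df.column_names instead.
example_columns = ["pclass", "survived", "name", "sex"]
target = resolve_column(example_columns, "Survived")
print(target)
```

With a vaex dataframe, the resolved name could then be passed to df.drop. This only addresses the KeyError; it does not make Featuretools accept a vaex dataframe.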