New Feature in Scikit-Learn Pipeline - Interaction Between Two Existing Features
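My dataset has two features: height and area. I want to create a new feature by interacting area and height inside a scikit-learn pipeline.
Can anyone guide me on how to achieve this?
Thanks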
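You can achieve this with a custom transformer that implements the fit and transform methods. Optionally, you can have it inherit from sklearn's TransformerMixin to make it more bulletproof; the mixin also gives you fit_transform for free.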
from sklearn.base import BaseEstimator, TransformerMixin


# BaseEstimator is added here so the transformer also supports
# get_params/set_params (needed for cloning and grid search).
class CustomTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        """The fit method doesn't do much here, but it is still
        required if your pipeline ever needs to be fit.
        It just returns self."""
        return self

    def transform(self, X):
        """This is where the actual transformation occurs.
        Assuming you want to compute the product of your
        height and area features.
        """
        # Copy X to avoid mutating the original dataset
        X_ = X.copy()
        # Change new_feature and the right-hand side according to your needs
        X_["new_feature"] = X_["height"] * X_["area"]
        # Return the newly transformed dataset. It will be
        # passed to the next step of your pipeline
        return X_
You can test it with this code:
import pandas as pd
from sklearn.pipeline import Pipeline
# Instantiate fake DataSet, your Transformer and Pipeline
X = pd.DataFrame({"height": [10, 23, 34], "area": [345, 33, 45]})
custom = CustomTransformer()
pipeline = Pipeline([("heightxarea", custom)])
# Test it
pipeline.fit(X)
pipeline.transform(X)
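This may look like overkill for such simple processing, but putting every dataset manipulation into a transformer is good practice: it makes the steps more reproducible.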
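To illustrate the reproducibility point, here is a minimal sketch that chains an interaction transformer with a downstream model, so new data automatically goes through the same feature engineering at prediction time. The class name HeightAreaInteraction, the column name height_x_area, the choice of LinearRegression, and all data values are assumptions made up for this example; they are not part of the original question.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class HeightAreaInteraction(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: appends height * area as a new column."""

    def fit(self, X, y=None):
        # Nothing to learn; the transformation is stateless.
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        X_ = X.copy()
        X_["height_x_area"] = X_["height"] * X_["area"]
        return X_


# Made-up training data; the target values are purely illustrative.
X_train = pd.DataFrame({"height": [10, 23, 34, 12], "area": [345, 33, 45, 80]})
y_train = [3.1, 0.9, 1.4, 1.0]

model = Pipeline([
    ("interaction", HeightAreaInteraction()),
    ("regressor", LinearRegression()),
])
model.fit(X_train, y_train)

# New rows go through exactly the same feature engineering before prediction.
X_new = pd.DataFrame({"height": [15], "area": [100]})
predictions = model.predict(X_new)

# The transformer step can also be inspected on its own.
transformed = model.named_steps["interaction"].transform(X_new)
```

Because the interaction lives inside the pipeline, there is no separate preprocessing script to remember to rerun: calling model.predict on raw height and area columns is enough.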