ValueError on inverse transform using OrdinalEncoder with dictionary
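I was able to convert the target column into the desired ordered numeric values using the OrdinalEncoder from category_encoders. However, I can't perform inverse_transform, because it raises the error shown below.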
import pandas as pd
import category_encoders as ce
from sklearn.preprocessing import OrdinalEncoder
lst = [ 'BRANCHING/ELONGATION', 'EARLY', 'EARLY', 'EARLY', 'EARLY', 'MID', 'MID', 'ADVANCED/TILLERING',
'FLOWERING', 'FLOWERING', 'FLOWERING', 'SEEDLING/EMERGED']
filtered_df = pd.DataFrame(lst, columns =['growth_state'])
filtered_df['growth_state'].value_counts()
EARLY 4
FLOWERING 3
MID 2
ADVANCED/TILLERING 1
SEEDLING/EMERGED 1
BRANCHING/ELONGATION 1
Name: growth_state, dtype: int64
dictionary = [{'col': 'growth_state',
'mapping':{'SEEDLING/EMERGED':0, 'EARLY':1, 'MID':2,
'ADVANCED/TILLERING':3, 'BRANCHING/ELONGATION':4, 'FLOWERING':5 }}]
# instantiating encoder
encoder = ce.OrdinalEncoder(cols = 'growth_state', mapping= dictionary)
filtered_df['growth_state'] = encoder.fit_transform(filtered_df['growth_state'])
filtered_df
growth_state
0 4
1 1
2 1
3 1
4 1
5 2
6 2
7 3
8 5
9 5
10 5
11 0
But when I run inverse_transform:
newCol = encoder.inverse_transform(filtered_df['growth_state'])
AttributeError Traceback (most recent call last)
<ipython-input-26-b6505b4be1e1> in <module>
----> 1 newCol = encoder.inverse_transform(filtered_df['growth_state'])
d:\users\tiwariam\appdata\local\programs\python\python36\lib\site-packages\category_encoders\ordinal.py in inverse_transform(self, X_in)
266 for switch in self.mapping:
267 column_mapping = switch.get('mapping')
--> 268 inverse = pd.Series(data=column_mapping.index, index=column_mapping.values)
269 X[switch.get('col')] = X[switch.get('col')].map(inverse).astype(switch.get('data_type'))
270
AttributeError: 'dict' object has no attribute 'index'
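Note: the column above is the target column. Since this is a classification problem I could have applied a label encoder, but I used category_encoders' OrdinalEncoder with an explicit mapping instead, because the variable is ordinal in nature.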
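If the only goal is a reversible ordinal encoding of the target column, a pandas ordered Categorical is one library-independent option. The sketch below is an alternative, not what the question or the answer uses; it assumes the level order from the question's mapping and relies only on `pd.Categorical` and `pd.Categorical.from_codes`:

```python
import pandas as pd

# Assumed ordering, taken from the question's mapping (earliest stage first).
levels = ['SEEDLING/EMERGED', 'EARLY', 'MID',
          'ADVANCED/TILLERING', 'BRANCHING/ELONGATION', 'FLOWERING']

lst = ['BRANCHING/ELONGATION', 'EARLY', 'EARLY', 'EARLY', 'EARLY', 'MID', 'MID',
       'ADVANCED/TILLERING', 'FLOWERING', 'FLOWERING', 'FLOWERING', 'SEEDLING/EMERGED']

# Encode: an ordered Categorical stores each value as its position in `levels`.
cat = pd.Categorical(lst, categories=levels, ordered=True)
codes = cat.codes  # small-int numpy array: 4, 1, 1, 1, 1, 2, 2, 3, 5, 5, 5, 0

# Decode: rebuild the labels from the integer codes and the same `levels`.
decoded = pd.Categorical.from_codes(codes, categories=levels, ordered=True)

print(codes.tolist())
print(list(decoded))
```

Any label missing from `levels` would get code -1 and decode back to NaN, so the level list has to cover every value that can appear in the column.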
The error comes from this line in the inverse_transform source code:
inverse = pd.Series(data=column_mapping.index, index=column_mapping.values)
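To see why this line fails, the sketch below reproduces it in isolation with plain pandas (no category_encoders needed). A dict has no `.index` attribute, while a `pd.Series` built from the same dict exposes the labels as `.index` and the integer codes as `.values`, which is exactly what the inverse lookup needs. The variable names are illustrative only:

```python
import pandas as pd

mapping_dict = {'SEEDLING/EMERGED': 0, 'EARLY': 1, 'MID': 2,
                'ADVANCED/TILLERING': 3, 'BRANCHING/ELONGATION': 4, 'FLOWERING': 5}

# A plain dict has no .index attribute, which is what the failing line needs.
has_index = hasattr(mapping_dict, 'index')

# A Series built from the same dict has labels as .index and codes as .values,
# so the same expression used in inverse_transform now works.
mapping_series = pd.Series(mapping_dict)
inverse = pd.Series(data=mapping_series.index, index=mapping_series.values)

# Map some encoded values back to their labels via the inverse Series.
codes = pd.Series([4, 1, 5, 0])
decoded = codes.map(inverse)

print(has_index)
print(decoded.tolist())
```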
It seems that even though the category_encoders documentation says mapping should be supplied as a dictionary, the inverse_transform code actually expects a pd.Series:
import pandas as pd
from category_encoders import OrdinalEncoder
df = pd.DataFrame({
'growth_state': ['BRANCHING/ELONGATION', 'EARLY', 'EARLY', 'EARLY', 'EARLY', 'MID', 'MID', 'ADVANCED/TILLERING', 'FLOWERING', 'FLOWERING', 'FLOWERING', 'SEEDLING/EMERGED']
})
mapping = [{
'col': 'growth_state',
'mapping': pd.Series(data={'SEEDLING/EMERGED': 0, 'EARLY': 1, 'MID': 2, 'ADVANCED/TILLERING': 3, 'BRANCHING/ELONGATION': 4, 'FLOWERING': 5}),
'data_type': object
}]
enc = OrdinalEncoder(cols=['growth_state'], mapping=mapping)
df_transformed = enc.fit_transform(df)
df_transformed.head()
# growth_state
# 0 4
# 1 1
# 2 1
# 3 1
# 4 1
df_inverse = enc.inverse_transform(df_transformed)
df_inverse.head()
# growth_state
# 0 BRANCHING/ELONGATION
# 1 EARLY
# 2 EARLY
# 3 EARLY
# 4 EARLY
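If you would rather keep the mapping written as a plain dict, as in the question, one option is to convert it to a Series before handing it to the encoder. `dict_mappings_to_series` below is a hypothetical helper, not part of category_encoders; it copies each entry, wraps a dict `mapping` in `pd.Series`, and adds `'data_type': object` when it is missing, mirroring the answer's example:

```python
import pandas as pd


def dict_mappings_to_series(mappings):
    """Hypothetical helper: return copies of the mapping entries with each
    dict-valued 'mapping' wrapped in a pd.Series, and 'data_type' defaulted
    to object (as in the answer's example)."""
    converted = []
    for entry in mappings:
        new_entry = dict(entry)  # shallow copy so the caller's list is untouched
        if isinstance(new_entry['mapping'], dict):
            new_entry['mapping'] = pd.Series(new_entry['mapping'])
        new_entry.setdefault('data_type', object)
        converted.append(new_entry)
    return converted


dictionary = [{'col': 'growth_state',
               'mapping': {'SEEDLING/EMERGED': 0, 'EARLY': 1, 'MID': 2,
                           'ADVANCED/TILLERING': 3, 'BRANCHING/ELONGATION': 4,
                           'FLOWERING': 5}}]

series_mapping = dict_mappings_to_series(dictionary)
print(type(series_mapping[0]['mapping']).__name__)
```

The result could then be passed as `mapping=series_mapping` to `ce.OrdinalEncoder(cols=['growth_state'], ...)` in place of `dictionary`. This has not been checked against every category_encoders release, and the internal handling of `mapping` may differ between versions.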