Encoded target column shows only one category?
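I am working on a multi-class classification problem. My target column has 4 classes: Low, Medium, High, and Very High. When I try to encode it, value_counts() shows only 0. I am not sure why.
The value counts in the original DataFrame are: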
High 18767
Very High 15856
Medium 9212
Low 5067
Name: physician_segment, dtype: int64
I tried the following methods to encode my target column:
Using the replace() method:
target_enc = {'Low':0,'Medium':1,'High':2,'Very High':3}
df1['physician_segment'] = df1['physician_segment'].astype(object)
df1['physician_segment'] = df1['physician_segment'].replace(target_enc)
df1['physician_segment'].value_counts()
0 48902
Name: physician_segment, dtype: int64
Using the factorize() method:
from pandas.api.types import CategoricalDtype
df1['physician_segment'] = df1['physician_segment'].factorize()[0]
df1['physician_segment'].value_counts()
0 48902
Name: physician_segment, dtype: int64
Using LabelEncoder:
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
df1['physician_segment'] = labelencoder.fit_transform(df1['physician_segment'])
df1['physician_segment'].value_counts()
0 48902
Name: physician_segment, dtype: int64
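With all three techniques I get only one class, 0, and the length of the DataFrame is 48902.
Can someone point out what I am doing wrong?
I want my target column to have the values 0, 1, 2, 3.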
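First, define the mapping and convert the column to object dtype:
target_enc = {'Low':0,'Medium':1,'High':2,'Very High':3}
df1['physician_segment'] = df1['physician_segment'].astype(object)
Then define a function:
def func(val):
    # Return the integer code for a known label; unknown labels yield None.
    if val in target_enc:
        return target_enc[val]
Finally, use the apply() method:
df1['physician_segment'] = df1['physician_segment'].apply(func)
Now if you print df1['physician_segment'].value_counts(), you will get the correct output.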
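The same mapping can also be applied with Series.map. The snippet below is a minimal sketch, not a diagnosis: the small df1 is a hypothetical stand-in for the real data, and physician_segment_enc is an assumed name for a new column. It prints the raw labels before encoding, because one possible explanation for seeing only 0 (which the post does not confirm) is that the column had already been overwritten by an earlier attempt. Writing the codes to a separate column keeps the original labels intact, so the code can be re-run safely.

```python
import pandas as pd

# Hypothetical stand-in for df1; the real data is not shown in the post.
df1 = pd.DataFrame(
    {"physician_segment": ["High", "Very High", "Medium", "Low", "High", "Very High"]}
)

target_enc = {"Low": 0, "Medium": 1, "High": 2, "Very High": 3}

# Inspect the raw labels first: if this already shows only 0, the column
# was overwritten earlier and needs to be reloaded from the source data.
print(df1["physician_segment"].unique())

# Write the codes to a new (assumed) column so the original labels survive.
df1["physician_segment_enc"] = df1["physician_segment"].map(target_enc)

# Labels missing from target_enc become NaN; fail loudly instead of hiding them.
unmapped = int(df1["physician_segment_enc"].isna().sum())
assert unmapped == 0, f"{unmapped} labels not found in target_enc"

df1["physician_segment_enc"] = df1["physician_segment_enc"].astype(int)
print(df1["physician_segment_enc"].value_counts().sort_index())
```

If the first print shows values other than the four expected labels (for example extra whitespace or different capitalization), the mapping keys would need to be adjusted to match before encoding.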