pandas.DataFrame.astype is not converting dtype
I am trying to convert some columns from object to categorical columns.
# dtyp_cat = 'category'
# mapper = {'Segment':dtyp_cat,
# "Sub-Category":dtyp_cat,
# "Postal Code":dtyp_cat,
# "Region":dtyp_cat,
# }
df.astype({'Segment':'category'})
df.dtypes
But the output still shows the object dtype.
The dataset is hosted at:
url = r"https://raw.githubusercontent.com/jaegarbomb/TSF_GRIP/main/Retail_EDA/Superstore.csv"
df = pd.read_csv(url)
astype returns a new object rather than modifying df in place, so assign the result back. Do this:
df['Segment'] = df.Segment.astype('category')
after which df.info() returns:
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Ship Mode 9994 non-null object
1 Segment 9994 non-null category
2 Country 9994 non-null object
3 City 9994 non-null object
4 State 9994 non-null object
5 Postal Code 9994 non-null int64
6 Region 9994 non-null object
7 Category 9994 non-null object
8 Sub-Category 9994 non-null object
9 Sales 9994 non-null float64
10 Quantity 9994 non-null int64
11 Discount 9994 non-null float64
12 Profit 9994 non-null float64
dtypes: category(1), float64(3), int64(2), object(7)
memory usage: 946.9+ KB
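To make the difference concrete, here is a minimal sketch of why the original call appeared to do nothing. It uses a small stand-in frame with made-up rows rather than the Superstore file, so the values are purely illustrative; only the column names are borrowed from the question.

```python
import pandas as pd

# Small stand-in frame (hypothetical rows, not the real Superstore data).
df = pd.DataFrame({
    "Segment": ["Consumer", "Corporate", "Consumer"],
    "Region": ["West", "East", "West"],
    "Sales": [10.0, 20.5, 7.25],
})

# astype returns a new DataFrame; the result is discarded here,
# so df itself is left untouched.
df.astype({"Segment": "category"})
dtype_before = df["Segment"].dtype  # not categorical yet

# Assigning the result back makes the conversion stick.
# A dict lets you convert several columns in one call.
df = df.astype({"Segment": "category", "Region": "category"})
dtype_after = df["Segment"].dtype

print(df.dtypes)
```

The same dict form should work with the commented-out mapper from the question, as long as the result is assigned back to df.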
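Edit
If you want to convert multiple columns (in your case, I assume that means all the object columns), you need to drop the ones that are not, convert what remains, and then reattach the other columns.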
df2 = df.drop([ 'Postal Code', 'Sales', 'Quantity', 'Discount', 'Profit'], axis=1)
df3 = df2.apply(lambda x: x.astype('category'))
which gives
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Ship Mode 9994 non-null category
1 Segment 9994 non-null category
2 Country 9994 non-null category
3 City 9994 non-null category
4 State 9994 non-null category
5 Region 9994 non-null category
6 Category 9994 non-null category
7 Sub-Category 9994 non-null category
dtypes: category(8)
memory usage: 115.2 KB
I'll leave reattaching the other columns to you. A hint would be:
df4 = pd.concat([df3, df], axis=1, sort=False)
df_final = df4.loc[:,~df4.columns.duplicated()]
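As an alternative to the drop / convert / concat route above, the object columns can probably be picked out and converted in one pass. This is only a sketch on a made-up frame (the rows are hypothetical, and the columns are built with an explicit object dtype so the selection behaves the same across pandas versions); it is not the approach from the answer, just a shorter variant using select_dtypes and a dict passed to astype.

```python
import pandas as pd

# Hypothetical stand-in for the Superstore frame; string columns are
# created with an explicit object dtype for this illustration.
df = pd.DataFrame({
    "Segment": pd.Series(["Consumer", "Corporate", "Consumer"], dtype=object),
    "Region": pd.Series(["West", "East", "West"], dtype=object),
    "Postal Code": [42420, 90036, 33311],
    "Sales": [10.0, 20.5, 7.25],
})

# Collect the names of all object-dtype columns.
obj_cols = df.select_dtypes(include="object").columns

# Build a {column: "category"} mapping and convert in a single call.
# Numeric columns are untouched and the column order is preserved.
df_final = df.astype({col: "category" for col in obj_cols})

print(df_final.dtypes)
```

Because nothing is dropped, there is no need to concatenate and de-duplicate columns afterwards.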