Best Way to Manually Assign Categories
Suppose I have a data frame with the following values:
Food
----
Turkey
Tomato
Rice
Chicken
Lettuce
I would like to add a category column so that it looks like this:
Food Category
---- ----
Turkey Meat
Tomato Vegetable
Rice Grain
Chicken Meat
Lettuce Vegetable
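In reality, though, I have about 100 distinct values that I would like to sort into roughly 10 groups, and I want to do it manually.
I have been trying to write the assignments directly in the script rather than linking to a database or spreadsheet. My attempt so far is shown below along with the error it produces, but I would also like to know whether there is a better way to achieve this.
Current code: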
df.loc[df.Food.any(
[
'Turkey'
,'Chicken'
]
)
, 'Category'] = 'Meat'
df.loc[df.Food.any(
[
'Tomato'
,'Lettuce'
]
)
, 'Category'] = 'Vegetable'
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-49-41349bcd38a0> in <module>
41 ]
42 )
---> 43 , 'Category'] = 'Meat'
~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\generic.py in logical_func(self, axis, bool_only, skipna, level, **kwargs)
11721 skipna=skipna,
11722 numeric_only=bool_only,
> 11723 filter_type="bool",
11724 )
11725
~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
4061
4062 if axis is not None:
-> 4063 self._get_axis_number(axis)
4064
4065 if isinstance(delegate, Categorical):
~\AppData\Local\Continuum\miniconda3\lib\site-packages\pandas\core\generic.py in _get_axis_number(cls, axis)
400 @classmethod
401 def _get_axis_number(cls, axis):
--> 402 axis = cls._AXIS_ALIASES.get(axis, axis)
403 if is_integer(axis):
404 if axis in cls._AXIS_NAMES:
TypeError: unhashable type: 'list'
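On the error itself: judging from the traceback, the list passed to `Series.any` is being treated as the `axis` argument, and the lookup in `_AXIS_ALIASES` fails because a list cannot be hashed. `any` reduces a Series to a single boolean; it does not test membership. A minimal sketch of the same row-by-row assignment using `Series.isin` instead, assuming the column names from the question (the sample frame is rebuilt here only so the snippet runs on its own):

```python
import pandas as pd

# Sample frame rebuilt from the question so the snippet is self-contained
df = pd.DataFrame({"Food": ["Turkey", "Tomato", "Rice", "Chicken", "Lettuce"]})

# isin returns a boolean Series marking the rows whose Food appears in the list,
# which is the kind of mask that df.loc expects
df.loc[df["Food"].isin(["Turkey", "Chicken"]), "Category"] = "Meat"
df.loc[df["Food"].isin(["Tomato", "Lettuce"]), "Category"] = "Vegetable"
df.loc[df["Food"].isin(["Rice"]), "Category"] = "Grain"
```

This works, but it needs one statement per category, which gets repetitive with around 10 groups.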
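I would suggest storing the mapping in a dictionary, with each category as a key and the list of options belonging to that category as the value, like this: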
mapping = {'Meat': ['Turkey','Chicken'], 'Vegetable': ['Tomato','Lettuce'], 'Grain': ['Rice']}
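You can then use pd.Series.map, with a dict comprehension that inverts the mapping so that each food points to its category: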
df['Category'] = df['Food'].map({i: k for k, v in mapping.items() for i in v})
Output:
Food Category
0 Turkey Meat
1 Tomato Vegetable
2 Rice Grain
3 Chicken Meat
4 Lettuce Vegetable
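With around 100 values entered by hand, it is easy to leave one out, and `map` returns NaN for any value missing from the dictionary. The sketch below shows one possible way to catch that; the extra `Salmon` row, the `food_to_category` and `unmapped` names, and the `Uncategorized` label are assumptions added for illustration and are not part of the original question:

```python
import pandas as pd

# "Salmon" is a hypothetical extra row that is deliberately left out of the mapping
df = pd.DataFrame({"Food": ["Turkey", "Tomato", "Rice", "Chicken", "Lettuce", "Salmon"]})

mapping = {
    "Meat": ["Turkey", "Chicken"],
    "Vegetable": ["Tomato", "Lettuce"],
    "Grain": ["Rice"],
}

# Invert category -> foods into food -> category so map can look up each row
food_to_category = {food: category for category, foods in mapping.items() for food in foods}

df["Category"] = df["Food"].map(food_to_category)

# Foods with no entry in the mapping come back as NaN; collect them for review
unmapped = df.loc[df["Category"].isna(), "Food"].unique().tolist()

# Optionally give the leftovers a placeholder label (hypothetical name)
df["Category"] = df["Category"].fillna("Uncategorized")
```

Checking `unmapped` before filling makes it easy to see which foods still need a category.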