Pandas – ValueError: Cannot setitem on a Categorical with a new category, set the categories first

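I have been looking for a solution for the past few hours. The relevant pandas documentation did not help; following it gave me the same error.

I am trying to sort my dataframe using a categorical, as follows: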

from pandas.api.types import CategoricalDtype

metabolites_order = CategoricalDtype(['Header', 'Metabolite', 'Unknown'], ordered=True)
df2['Feature type'] = df2['Feature type'].astype(metabolites_order)
df2 = df2.sort_values('Feature type')

The 'Feature type' column is correctly populated with the categories. This code runs flawlessly in Jupyter Notebooks, but when I run it in PyCharm, I get the following error:

Traceback (most recent call last):
  File "/Users/wasim.sandhu/Documents/MSDIALPostProcessor/postprocessor.py", line 138, in process_alignment_file
    df2.loc[4] = list(df2.columns)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 692, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1635, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1700, in _setitem_with_indexer_split_path
    self._setitem_single_column(loc, v, pi)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/indexing.py", line 1813, in _setitem_single_column
    ser._mgr = ser._mgr.setitem(indexer=(pi,), value=value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 568, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/internals/blocks.py", line 1846, in setitem
    self.values[indexer] = value
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/_mixins.py", line 211, in __setitem__
    value = self._validate_setitem_value(value)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/arrays/categorical.py", line 1898, in _validate_setitem_value
    raise ValueError(
ValueError: Cannot setitem on a Categorical with a new category, set the categories first

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

What could be causing this? I believe I have set the categories correctly...

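I figured it out a few minutes after posting the question. The column headers in this dataset sit in the 5th row. I checked the 'Feature type' column, and 'Feature type' itself is one of its values. Since that string is not one of the declared categories, writing it into the categorical column raised this error.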
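A quick way to spot this kind of stray value is to compare the column's unique values against the intended categories before casting. This is only a minimal sketch: the `raw` series is hypothetical sample data standing in for the real 'Feature type' column.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data; the real column comes from the alignment file.
raw = pd.Series(["Header", "Feature type", "Metabolite", "Unknown"])
expected = ["Header", "Metabolite", "Unknown"]

# Values present in the column that are not declared categories.
unexpected = sorted(set(raw.dropna().unique()) - set(expected))
print(unexpected)

# Note that astype() does not complain about such values;
# it silently turns them into NaN.
cast = raw.astype(CategoricalDtype(expected, ordered=True))
n_missing = int(cast.isna().sum())
print(n_missing)
```

If `unexpected` is non-empty, those values either need to be added to the categories or cleaned out of the column first.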

Solved by adding the column header name to the categories.

metabolites_order = CategoricalDtype(['Header', 'Feature type', 'Metabolite', 'Unknown'], ordered=True)
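To illustrate the failure and the fix in isolation, here is a small sketch on a standalone Series. The sample values are made up, and `set_categories` is just one way to widen the categories on an existing column (the fix above redefines the dtype instead). Depending on the pandas version, the failed assignment raises either `ValueError` or `TypeError`, so the sketch catches both.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

order = CategoricalDtype(["Header", "Metabolite", "Unknown"], ordered=True)
# Hypothetical stand-in for the 'Feature type' column.
feature_type = pd.Series(
    ["Unknown", "Metabolite", "Header", "Metabolite"], dtype=order
)

# Assigning a value that is not a declared category is rejected.
try:
    feature_type.iloc[1] = "Feature type"
    assignment_failed = False
except (ValueError, TypeError):
    assignment_failed = True

# Declare the extra category, keeping the desired sort order.
feature_type = feature_type.cat.set_categories(
    ["Header", "Feature type", "Metabolite", "Unknown"], ordered=True
)
feature_type.iloc[1] = "Feature type"  # now accepted

sorted_values = feature_type.sort_values().tolist()
print(sorted_values)
```

Sorting follows the order in which the categories are listed, not alphabetical order.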

I would suggest simply mapping these categories to integers and then sorting on that column.

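categories = ['Header', 'Metabolite', 'Unknown']
# Map each category to its position in the desired order
feature_map = {name: i for i, name in enumerate(categories)}
df['Feature Order'] = df['Feature type'].map(feature_map)
# sort_values returns a new DataFrame, so keep the result
df = df.sort_values('Feature Order')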
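One thing worth knowing about the mapping approach is how it treats values missing from the map, such as the stray 'Feature type' header value. The sketch below uses made-up sample data and assumes the column is named 'Feature type' as in the question. Unmapped values become NaN, and `sort_values` places NaN last by default.

```python
import pandas as pd

# Hypothetical sample data including the stray header value.
df = pd.DataFrame(
    {"Feature type": ["Unknown", "Feature type", "Header", "Metabolite"]}
)

categories = ["Header", "Metabolite", "Unknown"]
feature_map = {name: i for i, name in enumerate(categories)}

# Values missing from feature_map are mapped to NaN rather than raising.
df["Feature Order"] = df["Feature type"].map(feature_map)
df_sorted = df.sort_values("Feature Order")

sorted_types = df_sorted["Feature type"].tolist()
print(sorted_types)
```

So no error is raised here, but rows with unexpected values end up at the bottom instead of in a defined position.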