Adding up columns in a pandas dataframe results in a categorical index error

I have the following dataframe:

ps_yd_1               $0^{th} - 25^{th}$  $25^{th} - 50^{th}$  \
ps_variable_1
$0^{th} - 25^{th}$             47.566800            23.441332
$25^{th} - 50^{th}$            32.764905            40.947438
$50^{th} - 75^{th}$            10.830286            21.435877
$75^{th} - 100^{th}$           14.388537            33.796734

ps_yd_1               $50^{th} - 75^{th}$  $75^{th} - 100^{th}$
ps_variable_1
$0^{th} - 25^{th}$              21.237253              7.754615
$25^{th} - 50^{th}$              8.634613             17.653044
$50^{th} - 75^{th}$             14.684188             53.049650
$75^{th} - 100^{th}$            13.072976             38.741753

I want to add up 2 columns to create a new one:

df_hmp['a'] = df_hmp['$0^{th} - 25^{th}$'] + df_hmp['$25^{th} - 50^{th}$']

However, I get this error:

*** TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

Here is what the index looks like:

CategoricalIndex(['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                  '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                 categories=['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                             '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                 ordered=True, name='ps_variable_1', dtype='category')

How can I fix this?

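Both the columns and the rows of your dataframe have a categorical index. A categorical index only accepts labels that are among its declared categories, so before you can add another column you must first add the new label to the categories of the column index.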
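To make the distinction between labels and categories concrete, here is a minimal sketch. It uses hypothetical short labels (`low`, `high`) instead of your quartile labels, and only shows that `add_categories` returns a new index whose set of allowed categories is extended, while the labels it holds stay the same:

```python
import pandas as pd

# Hypothetical labels, chosen only for illustration.
idx = pd.CategoricalIndex(['low', 'high'],
                          categories=['low', 'high'], ordered=True)

# add_categories returns a new index; the original is left unchanged.
extended = idx.add_categories('a')

print(list(idx.categories))       # allowed categories before
print(list(extended.categories))  # 'a' is appended to the allowed categories
print(list(extended))             # the labels themselves are still low, high
```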

Let us first recreate your dataframe:

import pandas as pd

# Each inner list is one column of the original frame; .T turns them into columns.
df_hmp = pd.DataFrame([[47.566800, 32.764905, 10.830286, 14.388537],
                       [23.441332, 40.947438, 21.435877, 33.796734],
                       [21.237253, 8.634613, 14.684188, 13.072976],
                       [7.754615, 17.653044, 53.049650, 38.741753]]).T

idx = pd.CategoricalIndex(['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                           '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                          categories=['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                                      '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                          ordered=True, name='ps_variable_1', dtype='category')
df_hmp.columns = idx
df_hmp.index = idx.copy()  # copy, so renaming the columns does not rename the rows
df_hmp.columns.name = 'ps_yd_1'

Now, extend the categories of the column index and add the column:

# Register 'a' as an allowed category before inserting the column.
df_hmp.columns = df_hmp.columns.add_categories('a')
df_hmp['a'] = df_hmp['$0^{th} - 25^{th}$'] + df_hmp['$25^{th} - 50^{th}$']
# Works like a charm
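If the columns do not need to stay categorical, a different option (not the approach shown above, just a sketch under that assumption) is to cast the column labels to a plain object index before adding the column. The labels `q1` and `q2` and the values below are made up for illustration:

```python
import pandas as pd

# Hypothetical small frame with categorical column labels.
cols = pd.CategoricalIndex(['q1', 'q2'], categories=['q1', 'q2'], ordered=True)
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=cols)

# Cast the labels to a plain object Index, which accepts any new label.
df.columns = df.columns.astype(object)
df['a'] = df['q1'] + df['q2']
print(df)
```

The trade-off is that the ordering information carried by the categories is lost, so `add_categories` is the better fit when that ordering matters later, for example when sorting the columns.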