Adding up columns in a pandas DataFrame results in a CategoricalIndex error
I have the following dataframe:
ps_yd_1               $0^{th} - 25^{th}$  $25^{th} - 50^{th}$  \
ps_variable_1
$0^{th} - 25^{th}$             47.566800            23.441332
$25^{th} - 50^{th}$            32.764905            40.947438
$50^{th} - 75^{th}$            10.830286            21.435877
$75^{th} - 100^{th}$           14.388537            33.796734

ps_yd_1               $50^{th} - 75^{th}$  $75^{th} - 100^{th}$
ps_variable_1
$0^{th} - 25^{th}$              21.237253              7.754615
$25^{th} - 50^{th}$              8.634613             17.653044
$50^{th} - 75^{th}$             14.684188             53.049650
$75^{th} - 100^{th}$            13.072976             38.741753
I want to add two columns together to create a new column:
df_hmp['a'] = df_hmp['$0^{th} - 25^{th}$'] + df_hmp['$25^{th} - 50^{th}$']
However, I get this error:
*** TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
This is what the index looks like:
CategoricalIndex(['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
                  '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'],
                 categories=['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$', '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$'], ordered=True, name='ps_variable_1', dtype='category')
How do I fix this?
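Both the columns and the rows of your dataframe have a CategoricalIndex. A CategoricalIndex only accepts labels that are among its categories, so before you can add a column named 'a' you have to add 'a' to the categories of the column index.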
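To make the difference between labels and categories concrete, here is a minimal sketch on a toy index. The labels 'low', 'high' and 'total' are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical two-label index, used only for illustration.
cols = pd.CategoricalIndex(['low', 'high'],
                           categories=['low', 'high'], ordered=True)

# 'total' is not among the known categories yet.
print('total' in cols.categories)

# add_categories returns a new index; cols itself is left unchanged.
extended = cols.add_categories('total')
print(list(extended.categories))  # the new category is appended last
print(list(extended))             # the labels themselves are unchanged
```

Because `add_categories` returns a new index rather than modifying the existing one, the result has to be assigned back to `df.columns`. On an ordered index the new category is appended at the end, so it sorts as the highest one.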
Let's first recreate your dataframe:
import pandas as pd

# Each inner list is one column of the printed frame, hence the transpose.
df_hmp = pd.DataFrame([[47.566800, 32.764905, 10.830286, 14.388537],
                       [23.441332, 40.947438, 21.435877, 33.796734],
                       [21.237253, 8.634613, 14.684188, 13.072976],
                       [7.754615, 17.653044, 53.049650, 38.741753]]).T
labels = ['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$',
          '$50^{th} - 75^{th}$', '$75^{th} - 100^{th}$']
idx = pd.CategoricalIndex(labels, categories=labels,
                          ordered=True, name='ps_variable_1')
df_hmp.columns = idx
df_hmp.index = idx.copy()  # separate copy so the two axes can have different names
df_hmp.columns.name = 'ps_yd_1'
Now add the new label to the categories of the column index, then create the column:
# Register 'a' as a valid category on the column index first.
df_hmp.columns = df_hmp.columns.add_categories('a')
df_hmp['a'] = df_hmp['$0^{th} - 25^{th}$'] + df_hmp['$25^{th} - 50^{th}$']
# Works like a charm
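If the ordered categories are not needed on the column axis, another option (not part of the answer above, just a sketch) is to convert the column index to a plain object index before adding the column. The two-by-two frame below is a cut-down stand-in for `df_hmp`, and the variable names are assumptions made for the example.

```python
import pandas as pd

labels = ['$0^{th} - 25^{th}$', '$25^{th} - 50^{th}$']
df = pd.DataFrame(
    [[47.566800, 23.441332],
     [32.764905, 40.947438]],
    index=pd.CategoricalIndex(labels, categories=labels,
                              ordered=True, name='ps_variable_1'),
    columns=pd.CategoricalIndex(labels, categories=labels,
                                ordered=True, name='ps_yd_1'),
)

# Swap the categorical column index for a plain index with the same labels.
# The row index stays categorical.
df.columns = df.columns.astype(object)

# A plain index accepts any new label, so no category bookkeeping is needed.
df['a'] = df[labels[0]] + df[labels[1]]
print(df)
```

The trade-off is that the column axis loses its category ordering, which matters if later code sorts or groups by it. In that case `add_categories` is the better fit.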