Changing the dtype of category codes in pandas

Question

Suppose I have a boolean column stored as a category in a pandas.DataFrame. But there is a twist: the underlying values are str, not bool. That is, the values are "True"/"False" rather than True/False.

How can I:

change the dtype of the underlying category values (e.g. from "True" to True), and
keep storing the field as a category?

For example, having booleans as strings is a problem for DataFrame.query. I have to write DataFrame.query("field == 'True'"), which is awful, haha.

FYI: I don't want to do DataFrame.astype(dict(field=bool)), because then I lose the memory efficiency of category. I want to keep the category dtype.

Answer 1

Maybe you can try:

df['field'] = df['field'].replace({'True': True, 'False': False})
print(df['field'])

# Output
0    False
1     True
2     True
3    False
Name: field, dtype: category
Categories (2, object): [False, True]  # <- bool

With query:

>>> df.query('field == True')
  field
1  True
2  True

Setup:

import pandas as pd

df = pd.DataFrame({'field': ['False', 'True', 'True', 'False']}, dtype='category')
print(df['field'])

# Output
0    False
1     True
2     True
3    False
Name: field, dtype: category
Categories (2, object): ['False', 'True']  # <- str
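
As an alternative to replace, the category labels can be renamed directly with the .cat accessor. As far as I know, newer pandas releases (2.2 and later) emit a deprecation warning when replace is used to change the categories of a categorical column and point to rename_categories instead, so the following is a minimal sketch of that route, reusing the setup above:

```python
import pandas as pd

# Same setup as in the answer above: string labels stored as a category.
df = pd.DataFrame({'field': ['False', 'True', 'True', 'False']}, dtype='category')

# Integer codes before the change, for comparison.
codes_before = df['field'].cat.codes.tolist()

# Rename the category labels themselves; only the labels change,
# the per-row integer codes stay as they were.
df['field'] = df['field'].cat.rename_categories({'True': True, 'False': False})

print(df['field'].dtype.name)                          # category
print(df['field'].tolist())                            # [False, True, True, False]
print(df['field'].cat.codes.tolist() == codes_before)  # True
```

Because only the labels are swapped, the column never passes through a plain object or bool dtype on the way.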

Answer 2

You can try this (the values can then be used as bool, while the column dtype is still reported as category):

import pandas as pd

# before: the categories are the strings 'True' and 'False'
data = ['True', 'False', 'True']
df = pd.DataFrame({'data': data}).astype("category")

print('[BEFORE] \n data type = {0} \n values : {1}'.format(df['data'].dtypes, df['data'].tolist()))

# after: compare against 'True' to get real booleans
# (bool('False') is True, because any non-empty string is truthy)
df['data'] = [value == 'True' for value in df['data']]
df = df.astype("category")

print('[AFTER] \n data type = {0} \n values : {1}'.format(df['data'].dtypes, df['data'].tolist()))

Output:

[BEFORE] 
 data type = category 
 values : ['True', 'False', 'True']
[AFTER] 
 data type = category 
 values : [True, False, True]
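
To check the memory concern raised in the question, a rough sketch like the one below could compare the same data stored as strings, as a category with boolean labels, and as a plain bool column. The sample data and variable names are made up for illustration, and the exact byte counts depend on the pandas version and platform, so none are shown here.

```python
import pandas as pd

# Hypothetical sample data: 10,000 string labels like those in the question.
labels = ['True', 'False'] * 5_000

# Plain Python strings stored in an object column.
as_object = pd.Series(labels, dtype='object')

# Category column whose labels are real booleans.
as_category = (
    as_object.astype('category')
    .cat.rename_categories({'True': True, 'False': False})
)

# Plain boolean column, for reference.
as_bool = as_object == 'True'

# deep=True also counts the Python string objects held by the object column.
for name, series in [('object', as_object), ('category', as_category), ('bool', as_bool)]:
    print(name, series.memory_usage(index=False, deep=True))
```

With only two categories the codes are small integers, so the category column should come out far smaller than the string column. The plain bool column is typically similarly compact, so the saving is mainly relative to storing strings.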
