Can't discretize data in Python because of a column name?

Question

I have this dataset with many categorical columns, and I have to discretize the data.

First, I load the data with pandas, and it gives me this:

import numpy as np
import pandas as pd
X = pd.read_excel("/content/drive/MyDrive/APR NÃO SUP_Tarefa_Trilha 4 (2) (1).ods", engine="odf")  # .ods needs the odfpy package
X.head()

Then I try to discretize the data with this code:

coluna = ["LEG", "GRANGE_REG", "SIGLA_UF", "NOME", "TIPO", "CAT_ASSOC", "NOME_MUN"]
for col in coluna:
  classes = np.unique(X[col])
  number = 0  # value that will be used to represent the class
  for i in classes:
    # replace only inside this column, not across the whole DataFrame
    X[col] = X[col].replace(i, number)
    number = number + 1
  print('Novos dados:')
  print(X[col])

And this code gives this error:

<ipython-input-72-c6cc213e95a5> in <module>()
      3 for col in coluna:
      4   print(col)
----> 5   classes = np.unique(X[col])
      6   number = 0 # valor que será usado para representar a clases
      7   for i in classes:

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 'GRANGE_REG'

P.S.: The column "LEG" works fine; the error only pops up when the loop variable col changes to "GRANGE_REG".

P.P.S.: Sorry for my bad English.

Answer 1

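The column names you think you have seem to differ from the ones in the .ods file (I'm not familiar with .ods files). Maybe there is a missing space or something similar. Can you try: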

print(X.columns)

This should show you the exact strings used as column names in the X DataFrame.
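As a sketch of the "missing space" hypothesis, assuming a small hypothetical DataFrame in place of the .ods data (the column names and values below are made up), you could strip whitespace from the column names and then list which of the requested columns still do not exist:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; note the trailing space in "NOME "
X = pd.DataFrame({"LEG": ["a", "b"], "GRANDE_REG": ["N", "S"], "NOME ": ["x", "y"]})

# Remove leading/trailing whitespace from every column name
X.columns = X.columns.str.strip()

coluna = ["LEG", "GRANGE_REG", "NOME"]

# Requested columns that are not present in the DataFrame
missing = [c for c in coluna if c not in X.columns]
print(missing)  # ['GRANGE_REG']
```

If `missing` is still non-empty after stripping, the problem is the spelling rather than whitespace.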

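Edit: Looking more closely at the image, I see that the column is "GRANDE_REG" in the DataFrame, but you are looking for "GRANGE_REG" (i.e. a "G" where the "D" should be).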
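To catch this kind of near-miss without eyeballing the image, one option is the standard library's `difflib.get_close_matches`. The helper `suggest_missing` and the sample DataFrame below are hypothetical, a sketch rather than anything pandas itself provides:

```python
import difflib

import pandas as pd

# Hypothetical stand-in whose real column is spelled "GRANDE_REG"
X = pd.DataFrame({"LEG": ["a", "b"], "GRANDE_REG": ["N", "S"], "SIGLA_UF": ["SP", "RJ"]})

coluna = ["LEG", "GRANGE_REG", "SIGLA_UF"]


def suggest_missing(df, wanted):
    """Map each requested name that is not a column to similarly spelled columns."""
    return {
        col: difflib.get_close_matches(col, list(df.columns), n=3, cutoff=0.6)
        for col in wanted
        if col not in df.columns
    }


print(suggest_missing(X, coluna))  # {'GRANGE_REG': ['GRANDE_REG']}
```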

Answer 2

It's just a typo. You wrote "GRANGE_REG" instead of "GRANDE_REG".
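With the name corrected, the encoding itself can be done per column. A minimal sketch, assuming hypothetical sample values: `pd.factorize` with `sort=True` numbers each column's sorted unique values 0, 1, 2, … (the same order `np.unique` yields) and writes the codes back to that column only:

```python
import pandas as pd

# Hypothetical sample data; the real data comes from the .ods file
X = pd.DataFrame({
    "LEG": ["L1", "L2", "L1"],
    "GRANDE_REG": ["Sul", "Norte", "Sul"],
    "SIGLA_UF": ["RS", "PA", "SC"],
})

coluna = ["LEG", "GRANDE_REG", "SIGLA_UF"]  # "GRANDE_REG" spelled correctly

for col in coluna:
    # codes: integer label per row; classes: the sorted unique values
    codes, classes = pd.factorize(X[col], sort=True)
    X[col] = codes  # only this column is modified
    print("New data:")
    print(X[col])
```

One difference from a manual replace loop: `factorize` marks missing values (NaN) with `-1`. `X[col].astype("category").cat.codes` is a similar alternative.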

Tags: python, numpy, pandas, discretization