"ValueError: A given column is not a column of the dataframe" when trying to convert categorical feature into numerical

Question

To practice, I am using a csv file from a Udemy course. I only want to use the Age and Country columns to keep things simple. Here is the code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer as ct
from sklearn.model_selection import train_test_split as tts

data = pd.read_csv("advertising.csv")

X = data[["Age","Country"]]
y = data[["Clicked on Ad"]]


from sklearn.preprocessing import OneHotEncoder
cat = X["Country"]
one_hot = OneHotEncoder()
transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")
transformed_X = transformer.fit_transform(X)

print(transformed_X)

I get this error:

runfile('C:/Users/--/.spyder-py3/untitled0.py', wdir='C:/Users/--/.spyder-py3')
Traceback (most recent call last):

  File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
    return self._engine.get_loc(casted_key)

  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc

  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Tunisia'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 447, in _get_column_indices
    col_idx = all_columns.get_loc(col)

  File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
    raise KeyError(key) from err

KeyError: 'Tunisia'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "C:\Users\--\.spyder-py3\untitled0.py", line 17, in <module>
    transformed_X = transformer.fit_transform(X)

  File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 529, in fit_transform
    self._validate_remainder(X)

  File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 327, in _validate_remainder
    cols.extend(_get_column_indices(X, columns))

  File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 454, in _get_column_indices
    raise ValueError(

ValueError: A given column is not a column of the dataframe

"Tunisia" is the first country in the "Country" column.

What could be causing this problem?

Thanks in advance.

Answer 1

The problem occurs because you are not specifying the columns to transform correctly. In this line:

transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")

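cat should hold the names or positional indices of the columns to transform. However, because you set cat = X["Country"], you are passing the column's values (a pandas Series) rather than its name. ColumnTransformer then treats each value as a column label, which is why the lookup fails with KeyError: 'Tunisia', the first value in that column.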

To fix this, use one of the following:

# option 1: select the column by name
cat = ['Country']

# option 2: select the column by position (Country is column 1 of X)
cat = [1]

Either one should work.
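
As a rough, self-contained sketch of the fix: the small DataFrame below is made up to stand in for advertising.csv (only the column names Age and Country come from the question), so treat the values as placeholders rather than real data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-in for advertising.csv; only the column names match the question.
X = pd.DataFrame({
    "Age": [35, 31, 26, 29],
    "Country": ["Tunisia", "Nauru", "Tunisia", "Italy"],
})

# Pass the column *name* in a list, not the column's values.
# [1] would select the same column by position.
transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(), ["Country"])],
    remainder="passthrough",
)
transformed_X = transformer.fit_transform(X)

# One column per distinct country, followed by the passed-through Age column.
print(transformed_X.shape)
```

With three distinct countries in this toy data, the result has three one-hot columns plus Age. Depending on how sparse the result is, fit_transform may return a SciPy sparse matrix instead of a NumPy array; passing sparse_threshold=0 to ColumnTransformer is one way to always get a dense array.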
python

machine-learning

pandas

scikit-learn

one-hot-encoding