"ValueError: A given column is not a column of the dataframe" when trying to convert categorical feature into numerical
"ValueError: A given column is not a column of the dataframe" when trying to convert categorical feature into numerical
For practice, I'm using a CSV file from a Udemy course. To keep things simple, I only want to use the Age and Country columns.
Here is the code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer as ct
from sklearn.model_selection import train_test_split as tts
data = pd.read_csv("advertising.csv")
X = data[["Age","Country"]]
y = data[["Clicked on Ad"]]
from sklearn.preprocessing import OneHotEncoder
cat = X["Country"]
one_hot = OneHotEncoder()
transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")
transformed_X = transformer.fit_transform(X)
print(transformed_X)
I get this error:
runfile('C:/Users/--/.spyder-py3/untitled0.py', wdir='C:/Users/--/.spyder-py3')
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2895, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Tunisia'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 447, in _get_column_indices
col_idx = all_columns.get_loc(col)
File "C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py", line 2897, in get_loc
raise KeyError(key) from err
KeyError: 'Tunisia'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\--\.spyder-py3\untitled0.py", line 17, in <module>
transformed_X = transformer.fit_transform(X)
File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 529, in fit_transform
self._validate_remainder(X)
File "C:\Anaconda\lib\site-packages\sklearn\compose\_column_transformer.py", line 327, in _validate_remainder
cols.extend(_get_column_indices(X, columns))
File "C:\Anaconda\lib\site-packages\sklearn\utils\__init__.py", line 454, in _get_column_indices
raise ValueError(
ValueError: A given column is not a column of the dataframe
'Tunisia' is the first country in the "Country" column.
What could be causing this?
Thanks in advance.
This happens because the columns to transform are not specified correctly. In this line:
transformer = ct([("one_hot", one_hot, cat)],remainder="passthrough")
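cat should hold the names or the integer positions of the columns to transform. Instead, you are passing the column's values (a pandas Series), because you set cat = X["Country"]. ColumnTransformer then treats each value, starting with 'Tunisia', as a column name, which is why the lookup fails with KeyError: 'Tunisia'.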
To fix this, use either of the following:
# option 1: select the column by name
cat = ['Country']
# option 2: select the column by position (Country is the second column of X)
cat = [1]
Either one should work.
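As a minimal sketch of option 1 end to end, the snippet below replaces the CSV with a small made-up DataFrame. The four rows are illustrative assumptions, not data from advertising.csv; only the column names "Age" and "Country" come from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the "Age" and "Country" columns of advertising.csv
X = pd.DataFrame({
    "Age": [35, 31, 26, 29],
    "Country": ["Tunisia", "Nauru", "Tunisia", "Italy"],
})

# Pass the column *name* in a list, not the column's values
transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(), ["Country"])],
    remainder="passthrough",  # keep "Age" unchanged, appended after the encoded columns
)
transformed_X = transformer.fit_transform(X)

# The result can be a sparse matrix depending on its density; convert if so
if hasattr(transformed_X, "toarray"):
    transformed_X = transformed_X.toarray()

# One column per distinct country (sorted alphabetically), then Age
print(transformed_X)
```

With this toy data the result has four columns: three one-hot columns for Italy, Nauru and Tunisia, followed by the untouched Age column.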