Getting "KeyError" while implementing Z-score on a dataset
I have been trying to apply z-score normalization to all the numeric values in combined_data, using the following code:
from scipy.stats import zscore
# Calculate the zscores and drop zscores into new column
combined_data['zscore'] = zscore(combined_data['zscore'])
Here, combined_data is the training and test sets combined into a single DataFrame, after one-hot encoding.
I get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/indexes/base.py:2646, in Index.get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
File pandas/_libs/index.pyx:111, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:1619, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:1627, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'zscore'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
Input In [29], in <cell line: 2>()
1 # Calculate the zscores and drop zscores into new column
----> 2 combined_data['zscore'] = zscore(combined_data['zscore'])
File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/frame.py:2800, in DataFrame.__getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
File ~/anaconda3/envs/tf-gpu/lib/python3.8/site-packages/pandas/core/indexes/base.py:2648, in Index.get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
File pandas/_libs/index.pyx:111, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:1619, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:1627, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'zscore'
The dataset combined_data contains 257673 rows and 198 columns.
Here is a sample of combined_data:
id dur spkts dpkts sbytes dbytes rate sttl dttl sload ... state_CLO state_CON state_ECO state_FIN state_INT state_PAR state_REQ state_RST state_URN state_no
60662 60663 1.193334 10 10 608 646 15.921779 254 252 3673.740967 ... 0 0 0 1 0 0 0 0 0 0
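I am new to this kind of error. What am I doing wrong?
[Update: the code tried to create a single column from zscore, which is not possible, as explained below]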
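For readers hitting the same traceback, here is a minimal sketch of why the KeyError appears. The small frame `df` and its values are hypothetical stand-ins for combined_data; only the column name 'zscore' comes from the code above. The right-hand side of the assignment is evaluated first, so pandas tries to read a column that does not exist yet.

```python
import pandas as pd

# Hypothetical two-column frame standing in for combined_data.
df = pd.DataFrame({"dur": [1.0, 2.0, 3.0], "spkts": [10, 20, 30]})

raised = False
missing_key = None
try:
    # Reading df["zscore"] happens before the assignment, and the
    # column does not exist, so pandas raises KeyError here.
    df["zscore"] = df["zscore"]
except KeyError as exc:
    raised = True
    missing_key = exc.args[0]

print(raised, missing_key)
```

Because the lookup fails before anything is assigned, the frame is left unchanged and no 'zscore' column is created.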
You should apply the zscore function to the whole DataFrame, not to a nonexistent column:
result = zscore(combined_data)
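The result is a NumPy array with the same shape as combined_data, one column of z-scores per input column. You cannot store it as a single column of the original DataFrame, but you can create another DataFrame from it: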
pd.DataFrame(result, columns=combined_data.columns, index=combined_data.index)
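If only some columns should be standardized (for example the continuous features but not the one-hot columns), the same idea can be applied to a subset. The following is a sketch under assumptions, not a definitive recipe: the tiny frame, its values, and the list `numeric_cols` are made up for illustration, and which columns to scale is a choice that depends on your data. Note that a column with zero variance yields NaN z-scores.

```python
import pandas as pd
from scipy.stats import zscore

# Hypothetical miniature of combined_data: two continuous features
# plus one one-hot column.
combined_data = pd.DataFrame(
    {
        "dur": [1.0, 2.0, 3.0, 4.0],
        "sbytes": [608, 500, 700, 592],
        "state_FIN": [1, 0, 0, 1],
    }
)

# Assumption: only these columns are to be standardized.
numeric_cols = ["dur", "sbytes"]

# zscore works column-wise with axis=0 and returns a NumPy array
# (population standard deviation, ddof=0, by default).
z_values = zscore(combined_data[numeric_cols].to_numpy(dtype=float), axis=0)

# Wrap the array so the values keep their column names and row index.
z_frame = pd.DataFrame(z_values, columns=numeric_cols, index=combined_data.index)

# Build a new frame in which the selected columns are replaced by
# their z-scores; all other columns are left untouched.
scaled = combined_data.assign(**{col: z_frame[col] for col in numeric_cols})

print(scaled)
```

Here `assign` returns a new DataFrame, so combined_data itself is not modified and the column order is preserved.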