Get keyerror accessing row by index in 16.0 Pandas dataframe in Python 3.4
Why do I keep getting a KeyError?
[EDIT] Here is the data:
GEO,LAT,LON
AALBORG DENMARK,57.0482206,9.9193939
AARHUS DENMARK,56.1496278,10.2134046
ABBOTSFORD BC CANADA,49.0519047,-122.3290473
ABEOKUTA NIGERIA,7.161,3.348
ABERDEEN SCOTLAND,57.1452452,-2.0913745
[END EDIT]
I can't find the row by its index label, even though it is clearly there:
geocache = pd.read_csv('geolog.csv',index_col=['GEO']) # index_col=['GEO']
geocache.head()
shows
LAT LON
GEO
AALBORG DENMARK 57.048221 9.919394
AARHUS DENMARK 56.149628 10.213405
ABBOTSFORD BC CANADA 49.051905 -122.329047
ABEOKUTA NIGERIA 7.161000 3.348000
ABERDEEN SCOTLAND 57.145245 -2.091374
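head() pads labels into aligned columns, so leading or trailing whitespace in an index label does not show up in that output. One way to check is to print the repr() of every label that changes when stripped. The helper labels_with_stray_whitespace below is hypothetical, and the frame is illustrative (a trailing tab is added to one label on purpose):

```python
import pandas as pd


def labels_with_stray_whitespace(index):
    """Hypothetical helper: return labels that change when leading/trailing whitespace is stripped."""
    return [label for label in index if isinstance(label, str) and label != label.strip()]


# Illustrative frame shaped like geocache; "AARHUS DENMARK\t" carries a hidden tab.
geocache = pd.DataFrame(
    {"LAT": [57.0482206, 56.1496278], "LON": [9.9193939, 10.2134046]},
    index=pd.Index(["AALBORG DENMARK", "AARHUS DENMARK\t"], name="GEO"),
)

for label in labels_with_stray_whitespace(geocache.index):
    print(repr(label))  # repr() makes the hidden whitespace visible
```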
Then I test it:
x = 'AARHUS DENMARK'
print(x)
geocache[x]
This is what I get:
AARHUS DENMARK
KeyError Traceback (most recent call last)
in <module>()
2 x = u'AARHUS DENMARK'
3 print(x)
----> 4 geocache[x]
C:\Users\g\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
1785 return self._getitem_multilevel(key)
1786 else:
-> 1787 return self._getitem_column(key)
1788
1789 def _getitem_column(self, key):
C:\Users\g\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
1792 # get column
1793 if self.columns.is_unique:
-> 1794 return self._get_item_cache(key)
1795
1796 # duplicate columns & possible reduce dimensionaility
C:\Users\g\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1077 res = cache.get(item)
1078 if res is None:
-> 1079 values = self._data.get(item)
1080 res = self._box_item_values(item, values)
1081 cache[item] = res
C:\Users\g\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
2841
2842 if not isnull(item):
-> 2843 loc = self.items.get_loc(item)
2844 else:
2845 indexer = np.arange(len(self.items))[isnull(self.items)]
C:\Users\g\Anaconda3\lib\site-packages\pandas\core\index.py in get_loc(self, key, method)
1435 """
1436 if method is None:
-> 1437 return self._engine.get_loc(_values_from_object(key))
1438
1439 indexer = self.get_indexer([key], method=method)
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12349)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12300)()
KeyError: 'AARHUS DENMARK'
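One detail in the traceback is worth noting: the call goes through _getitem_column. With a single string, geocache[x] looks x up among the column labels (LAT, LON), not in the row index; a row is selected by its index label with .loc. A minimal sketch using two rows copied from the data above:

```python
import pandas as pd

# Two rows copied from the question's data, with GEO as the row index.
geocache = pd.DataFrame(
    {"LAT": [57.0482206, 56.1496278], "LON": [9.9193939, 10.2134046]},
    index=pd.Index(["AALBORG DENMARK", "AARHUS DENMARK"], name="GEO"),
)

x = "AARHUS DENMARK"

# df[label] searches the COLUMN labels ("LAT", "LON"), so this raises KeyError.
try:
    geocache[x]
except KeyError as err:
    print("column lookup failed:", err)

# df.loc[label] searches the ROW index instead.
row = geocache.loc[x]
print(row["LAT"], row["LON"])
```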
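There are no extra spaces or invisible characters. I tried putting r and u prefixes in front of the string literal, but the behavior doesn't change.
OK, what am I missing?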
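On the r and u prefixes: in Python 3 they cannot change this particular key. Every str literal is already Unicode, so u is a no-op, and r only changes how backslashes are read, and this literal contains none. A quick check:

```python
# In Python 3, u'' is a no-op and r'' only affects backslash escapes,
# so for a literal without backslashes all three spellings are the same string.
plain = 'AARHUS DENMARK'
raw = r'AARHUS DENMARK'
unicode_prefixed = u'AARHUS DENMARK'

print(plain == raw == unicode_prefixed)  # True
```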
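Because you didn't pass the sep (separator) arg to read_csv, it uses the default separator, a bare comma. Since your CSV contains spaces/tabs after the commas, these are treated as part of the data, so your index data contains embedded whitespace.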
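To see this effect, the sketch below parses a small in-memory CSV with the default separator. The text is illustrative, not the asker's geolog.csv: a space is placed before the first comma and after the second. dtype=str is used only so the raw field text stays visible:

```python
import io

import pandas as pd

# Illustrative CSV: a space before the first comma and after the second one.
csv_text = "GEO,LAT,LON\nAARHUS DENMARK ,56.1496278, 10.2134046\n"

# With the default sep=',', read_csv splits on the comma only; any whitespace
# next to it stays inside the field.
raw = pd.read_csv(io.StringIO(csv_text), index_col="GEO", dtype=str)

print(repr(raw.index[0]))             # the label keeps its trailing space
print(repr(raw["LON"].iloc[0]))       # the value keeps its leading space
print("AARHUS DENMARK" in raw.index)  # the clean key is therefore not found
```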
So you need to pass additional arguments to read_csv:
pd.read_csv('geolog.csv', index_col=['GEO'], sep=r',\s+', engine='python')
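The sep arg is a regular expression that matches a comma followed by one or more whitespace characters (use ',\s*' instead if some commas have no whitespace after them, since \s+ requires at least one). We pass engine='python' because the C engine does not support regular-expression separators.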
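A sketch of two ways to get clean labels, using an illustrative in-memory CSV rather than geolog.csv. The first uses a regex separator like the one above, but with \s* on both sides so the whitespace is optional and any whitespace before a comma is consumed too. The second keeps the default C engine, uses read_csv's skipinitialspace=True to drop spaces after each comma, and then strips whatever whitespace remains on the index labels:

```python
import io

import pandas as pd

# Illustrative CSV with spaces on both sides of some commas.
csv_text = (
    "GEO,LAT,LON\n"
    "AALBORG DENMARK ,  57.0482206,9.9193939\n"
    "AARHUS DENMARK,  56.1496278, 10.2134046\n"
)

# Option 1: regex separator (requires engine='python'); \s* makes the
# surrounding whitespace optional, so plain commas still split correctly.
df1 = pd.read_csv(io.StringIO(csv_text), index_col="GEO",
                  sep=r"\s*,\s*", engine="python")

# Option 2: keep the C engine, skip spaces after each comma,
# then strip any trailing whitespace left on the index labels.
df2 = pd.read_csv(io.StringIO(csv_text), index_col="GEO", skipinitialspace=True)
df2.index = df2.index.str.strip()

# Rows are looked up by index label with .loc.
print(df1.loc["AARHUS DENMARK", "LAT"])
print(df2.loc["AARHUS DENMARK", "LAT"])
```

Option 2 avoids the slower Python engine; option 1 handles everything in a single read_csv call.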