KeyError when setting index in a Pandas dataframe
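I'm getting a KeyError when I try to set the index of a dataframe. I've never run into this before when setting the index the same way, and I'm wondering what is going wrong. The data has no column headers, so the DataFrame headers are 0, 1, 2, 3, 4, 5, etc. The error occurs with any column header.
I get KeyError: '0' when I try to use the first column (which I want to use as the unique index).
Context:
In the example below, I select macro-enabled Excel spreadsheets, compact the data, read them, and convert them into dataframes.
I then want to include the file name in a column, set the index, and strip whitespace so that I can use index labels to extract the data I need. Not every worksheet will have the index labels, so I try to skip the worksheets whose index does not contain them. I then want to concatenate each result into a single DataFrame and drop the unused columns.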
import itertools
import glob
from openpyxl import load_workbook
from pandas import DataFrame
import pandas as pd
import os

def get_data(ws):
    for row in ws.values:
        row_it = iter(row)
        for cell in row_it:
            if cell is not None:
                yield itertools.chain((cell,), row_it)
                break

def read_workbook(file_):
    wb = load_workbook(file_, data_only=True)
    for sheet in wb.worksheets:
        ws = sheet
    return DataFrame(get_data(ws))

path = r'dir'
allFiles = glob.glob(path + "/*.xlsm")
frame = pd.DataFrame()
list_ = []

for file_ in allFiles:
    parsed_file = read_workbook(file_)
    parsed_file['filename'] = os.path.basename(file_)
    parsed_file.set_index(['0'], inplace = True)
    parsed_file.index.str.strip()
    try:
        parsed_file.loc["Staff" : "Total"].copy()
        list_.append(parsed_file)
    except KeyError:
        pass

frame = pd.concat(list_)
print(frame.dropna(axis='columns', thresh=2, inplace = True))
Example dataframe, the desired index position, and the labels to extract:
   index
   0        1  2
0  5        2  4
1  RTJHD    5  9
2  ABCD     4  6
3  Staff    9  3   --- extract from here
4  FHDHSK   3  2
5  IRRJWK   7  1
6  FJDDCN   1  8
7  67       4  7
8  Total    5  3   --- to here
Error:
Traceback (most recent call last):
  File "<ipython-input-29-d8fd24ca84ec>", line 1, in <module>
    runfile('dir.py', wdir='C:/dir/Documents')
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "dir.py", line 36, in <module>
    parsed_file.set_index(['0'], inplace = True)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2830, in set_index
    level = frame[col]._values
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: '0'
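You get this error because your dataframe was read in without any headers, so pandas assigned integer column labels. Your headers are therefore of type Int64Index:
Int64Index([0, 1, 2, 3, ...], dtype='int64')
The string '0' does not match the integer label 0, which is why set_index(['0']) raises KeyError: '0'.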
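To make the mismatch concrete, here is a minimal sketch that reproduces it on a tiny in-memory frame. The data is a hypothetical stand-in for one parsed sheet, not the asker's actual workbook.

```python
import pandas as pd

# Hypothetical stand-in for one parsed sheet: no header row is supplied,
# so pandas assigns the integer column labels 0, 1, 2.
df = pd.DataFrame([["Staff", 9, 3], ["Total", 5, 3]])

print(list(df.columns))  # the labels are Python ints, not strings

# The integer label 0 exists; the string label "0" does not.
has_int_label = 0 in df.columns
has_str_label = "0" in df.columns

# Asking set_index for the string label fails with a KeyError.
try:
    df.set_index(["0"])
    raised = False
except KeyError:
    raised = True

print(has_int_label, has_str_label, raised)
```

Passing the integer itself, `df.set_index(0)`, would work on this frame, which is consistent with the explanation above.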
At this point, I'd recommend accessing df.columns by position wherever you are forced to deal with the columns:
parsed_file.set_index(parsed_file.columns[0], inplace = True)
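A rough sketch of how the positional approach might fit into the rest of the question's loop body, run on hypothetical in-memory rows instead of a real workbook. The file name `example.xlsm`, the sample rows, the `astype(str)` step, and the `subset` variable are assumptions made for illustration. The sketch also assigns the stripped index back and keeps the sliced result, since `Index.str.strip()` and `.loc[...]` both return new objects rather than modifying the frame in place.

```python
import pandas as pd

# Hypothetical rows shaped like the sample in the question (no header row).
rows = [
    [5, 2, 4],
    [" RTJHD", 5, 9],
    ["Staff ", 9, 3],
    ["FHDHSK", 3, 2],
    [" Total", 5, 3],
]
parsed_file = pd.DataFrame(rows)
parsed_file["filename"] = "example.xlsm"  # hypothetical file name

# Use the first column as the index by position instead of a hardcoded name.
parsed_file = parsed_file.set_index(parsed_file.columns[0])

# str.strip() returns a new Index, so the result has to be assigned back.
# astype(str) is an assumption: it turns non-string labels such as 5 into text.
parsed_file.index = parsed_file.index.astype(str).str.strip()

# Label slices with .loc include both endpoints; keep the returned frame.
subset = parsed_file.loc["Staff":"Total"].copy()
print(subset)
```

In the question's loop it would presumably be `subset`, not `parsed_file`, that gets appended to `list_`.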
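If you access columns by position, you don't need to hardcode their names. The alternative is to assign column names of your own and then refer to those names.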
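The second option, assigning your own names, could look roughly like this. The names `label`, `col_1`, and `col_2` are made up for the example; they are generated from the column count so the list always matches the frame's width.

```python
import pandas as pd

# Hypothetical stand-in for one parsed sheet without headers.
parsed_file = pd.DataFrame([["Staff", 9, 3], ["Total", 5, 3]])

# Made-up names: "label" for the first column, then col_1, col_2, ...
n_cols = parsed_file.shape[1]
parsed_file.columns = ["label"] + [f"col_{i}" for i in range(1, n_cols)]

# The index can now be set by a name that is known to exist.
parsed_file = parsed_file.set_index("label")
print(parsed_file)
```

This trades a little setup for column references that stay readable later in the script.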