KeyError when setting index in a Pandas dataframe

Question

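I get a KeyError when trying to set the index of my dataframe. I've never run into this before when setting the index the same way, and I'm wondering what's going wrong. The data has no column headers, so the DataFrame headers are 0, 1, 2, 4, 5, etc. The error occurs with any column header.

I receive KeyError: '0' when trying to use the first column (which I want to use as the unique index).

Context: In the example below, I select macro-enabled Excel spreadsheets, squash the data, read them, and convert them to dataframes.

I then want to include the filename in a column, set the index, and strip the whitespace so that I can use the index labels to extract the data I need. Not every worksheet will have the index labels, so I try to skip the worksheets whose index doesn't contain them. I then want to concatenate each result into a single DataFrame and squash the unused columns.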

import itertools
import glob
from openpyxl import load_workbook
from pandas import DataFrame
import pandas as pd
import os

def get_data(ws):
        for row in ws.values:
            row_it = iter(row)
            for cell in row_it:
                if cell is not None:
                    yield itertools.chain((cell,), row_it)
                    break

def read_workbook(file_):
        wb = load_workbook(file_, data_only=True)
        for sheet in wb.worksheets:
            ws = sheet
        return DataFrame(get_data(ws))

path =r'dir'
allFiles = glob.glob(path + "/*.xlsm")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
        parsed_file = read_workbook(file_)
        parsed_file['filename'] = os.path.basename(file_)
        parsed_file.set_index(['0'], inplace = True)
        parsed_file.index.str.strip()
        try:
            parsed_file.loc["Staff" : "Total"].copy()
            list_.append(parsed_file)
        except KeyError:
            pass

frame = pd.concat(list_)
print(frame.dropna(axis='columns', thresh=2, inplace = True))

Example dataframe, the desired index position, and the labels to extract.

     index
     0          1   2 
0    5          2   4
1    RTJHD      5   9
2    ABCD       4   6
3    Staff      9   3 --- extract from here
4    FHDHSK     3   2
5    IRRJWK     7   1
6    FJDDCN     1   8
7    67         4   7
8    Total      5   3 --- to here

Error

Traceback (most recent call last):

  File "<ipython-input-29-d8fd24ca84ec>", line 1, in <module>
    runfile('dir.py', wdir='C:/dir/Documents')

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)

  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "dir.py", line 36, in <module>
    parsed_file.set_index(['0'], inplace = True)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2830, in set_index
    level = frame[col]._values

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)

  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)

  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)

KeyError: '0'

Answer 1

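You get this error because your dataframe was read in without any headers, so pandas assigned integer column labels and the string '0' matches none of them. Your headers are of type Int64Index: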

Int64Index([0, 1, 2, 3, ...], dtype='int64')
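
To make the mismatch concrete, here is a minimal sketch using a few made-up rows (the data is hypothetical, not taken from the question's workbooks). It suggests that the string '0' is rejected while the integer 0 is accepted. Recent pandas versions report these default columns as a RangeIndex rather than an Int64Index, but the labels are integers either way.

```python
import pandas as pd

# Hypothetical rows standing in for a worksheet read without headers.
rows = [["Staff", 9, 3], ["FHDHSK", 3, 2], ["Total", 5, 3]]
df = pd.DataFrame(rows)

# The default column labels are the integers 0, 1, 2, not strings.
column_labels = list(df.columns)

# Looking up the string "0" fails because no column has that label.
try:
    df.set_index(["0"])
    string_key_works = True
except KeyError:
    string_key_works = False

# The integer label is found without trouble.
indexed = df.set_index(0)
```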

At this point, I'd recommend just accessing df.columns by position wherever you are forced to deal with them:

parsed_file.set_index(parsed_file.columns[0], inplace = True)
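
A rough sketch of how that line might fit into the rest of the question's loop body follows. It assumes every label in the first column is a string (the rows here are invented for illustration; a column mixing numbers and text would likely need converting first). It also assigns the stripped index back, since Index.str.strip() returns a new index rather than modifying the existing one.

```python
import pandas as pd

# Hypothetical stand-in for one parsed worksheet; note the padded labels.
rows = [
    ["RTJHD", 5, 9],
    [" Staff ", 9, 3],
    ["FHDHSK", 3, 2],
    ["Total ", 5, 3],
    ["ZZZ", 1, 1],
]
parsed_file = pd.DataFrame(rows)

# Use whatever label the first column has, rather than hardcoding it.
parsed_file = parsed_file.set_index(parsed_file.columns[0])

# Index.str.strip() returns a new index, so assign the result back.
parsed_file.index = parsed_file.index.str.strip()

# Label-based slicing with .loc includes both endpoints.
section = parsed_file.loc["Staff":"Total"]
```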

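Don't hardcode your column names if you are accessing them by index. An alternative is to assign column names of your own and then refer to those names.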
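
The alternative of naming the columns yourself could look something like this. The names "label", "value_a" and "value_b" are placeholders chosen for this sketch, and the rows are again made up; the list of names has to match the number of columns in the frame.

```python
import pandas as pd

# Hypothetical rows standing in for a worksheet read without headers.
rows = [["Staff", 9, 3], ["FHDHSK", 3, 2], ["Total", 5, 3]]
parsed_file = pd.DataFrame(rows)

# Placeholder names; pick ones that describe your own data.
parsed_file.columns = ["label", "value_a", "value_b"]

# Now the index can be set by a name you control.
parsed_file = parsed_file.set_index("label")
```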

