Extra characters while reading file path

Question

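I am using this code to read all files with the ".xlsx" extension in a directory and upload their data to a database. While reading the files, I get some extra characters in one of the file names.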

C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\NTS-IM-PHYSICS.xlsx
C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\NTS-QUANTATIVE.xlsx
C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\~$ECAT-CHEMISTRY.xlsx

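The last line above has "~$" in front of the file name. I tried deleting the file and recreating it, but it still happens, and always on the last file that is read. As a result, I get the error traceback shown at the end of this question.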

datafiles = os.path.join(os.getcwd(), "Data Files")
for r, d, f in os.walk(datafiles):
    for file in f:
        if file.endswith(".xlsx"):
            header, data = read(os.path.join(r, file))
            for i in range(1, len(data)):
                insertRow(mydb, mycursor, data[i], file)
                totalRows+=1
                print("FileName: {} | Row: {} | Total: {}".format(file, i, totalRows))

My read function:

def read(file_name):
    import pandas as pd
    import numpy as np
    print(file_name)
    df = pd.read_excel(file_name, index_col=None, header=None) 
    df1 = df.replace(np.nan, '', regex=True)
    data = df1.values.tolist()
    header = data[0]
    return header, data

Error traceback:

C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\~$ECAT-CHEMISTRY.xlsx
Traceback (most recent call last):
  File "c:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\main.py", line 15, in <module>
    header, data = read(os.path.join(r, file))
  File "c:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\readXLSXFile.py", line 5, in read
    df = pd.read_excel(file_name, index_col=None, header=None)
  File "C:\Python38\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "C:\Python38\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Python38\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Python38\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
    super().__init__(filepath_or_buffer)
  File "C:\Python38\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
    self.book = self.load_workbook(filepath_or_buffer)
  File "C:\Python38\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
    return open_workbook(filepath_or_buffer)
  File "C:\Python38\lib\site-packages\xlrd\__init__.py", line 148, in open_workbook
    bk = book.open_workbook_xls(
  File "C:\Python38\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Python38\lib\site-packages\xlrd\book.py", line 1278, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "C:\Python38\lib\site-packages\xlrd\book.py", line 1272, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x06Haseeb '

Answer 1

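This is called an owner file. Office creates it automatically while a document is open and normally removes it when the document is closed, so it will keep reappearing for as long as ECAT-CHEMISTRY.xlsx is open in Excel. It is a small lock file rather than a workbook; the bytes in the error message (b'\x06Haseeb ') appear to be the user name stored in it, which is why xlrd reports an unsupported format. You should probably skip these files in your code unless you have a specific reason to read them.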
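
One way to skip them is to filter on the "~$" prefix while walking the directory. The following is only a minimal sketch: the helper name iter_xlsx_files is hypothetical (it is not part of the question's code), and it assumes that every file whose name starts with "~$" is an owner file that can safely be ignored.

```python
import os


def iter_xlsx_files(root):
    """Yield paths of .xlsx files under root, skipping Office owner files.

    Hypothetical helper for illustration; assumes any file name starting
    with "~$" is an owner (lock) file and not a real workbook.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Owner files are named "~$" followed by the original file name.
            if name.startswith("~$"):
                continue
            if name.lower().endswith(".xlsx"):
                yield os.path.join(dirpath, name)
```

In the loop from the question, this could replace the os.walk block, for example `for path in iter_xlsx_files(datafiles): header, data = read(path)`. Closing the workbook in Excel before running the script should also make the "~$" file disappear.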

python

xlrd