Extra characters while reading file path
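I am using this code to read all files with the ".xlsx" extension in a directory and upload their data to a database. While reading the files, I get some extra characters in one of the file names.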
C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\NTS-IM-PHYSICS.xlsx
C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\NTS-QUANTATIVE.xlsx
C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\~$ECAT-CHEMISTRY.xlsx
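The last line above has "~$" in front of the file name. I tried deleting the file and recreating it, but it still happens, and always on the last file read. Because of this, I get the error stack shown at the end.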
datafiles = os.path.join(os.getcwd(), "Data Files")
for r, d, f in os.walk(datafiles):
    for file in f:
        if file.endswith(".xlsx"):
            header, data = read(os.path.join(r, file))
            for i in range(1, len(data)):
                insertRow(mydb, mycursor, data[i], file)
                totalRows+=1
                print("FileName: {} | Row: {} | Total: {}".format(file, i, totalRows))
My read function:
def read(file_name):
    import pandas as pd
    import numpy as np
    print(file_name)
    df = pd.read_excel(file_name, index_col=None, header=None)
    df1 = df.replace(np.nan, '', regex=True)
    data = df1.values.tolist()
    header = data[0]
    return header, data
Error trace:
C:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\Data Files\~$ECAT-CHEMISTRY.xlsx
Traceback (most recent call last):
File "c:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\main.py", line 15, in <module>
header, data = read(os.path.join(r, file))
File "c:\Users\Haseeb\Desktop\UCP DATA\Cloud Based Entry Test Praparator\Project Files\Database\Python Scripts\Upload Data to database\readXLSXFile.py", line 5, in read
df = pd.read_excel(file_name, index_col=None, header=None)
File "C:\Python38\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "C:\Python38\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Python38\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Python38\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "C:\Python38\lib\site-packages\pandas\io\excel\_base.py", line 353, in __init__
self.book = self.load_workbook(filepath_or_buffer)
File "C:\Python38\lib\site-packages\pandas\io\excel\_xlrd.py", line 37, in load_workbook
return open_workbook(filepath_or_buffer)
File "C:\Python38\lib\site-packages\xlrd\__init__.py", line 148, in open_workbook
bk = book.open_workbook_xls(
File "C:\Python38\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Python38\lib\site-packages\xlrd\book.py", line 1278, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Python38\lib\site-packages\xlrd\book.py", line 1272, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x06Haseeb '
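This is called an owner file. Office creates it automatically while a document is open, giving it the document's name prefixed with "~$". It is not a real workbook, which is why xlrd reports "Unsupported format, or corrupt file". You should probably skip such files in your code unless you have a specific reason to read them.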
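One way to skip them is to filter on the "~$" prefix while walking the directory. This is a minimal sketch, not the only approach. The helper name `iter_xlsx_files` is hypothetical and does not appear in the question, and it assumes every file whose name starts with "~$" can safely be ignored.

```python
import os


def iter_xlsx_files(root):
    """Yield paths of .xlsx files under root, skipping Office owner files.

    Hypothetical helper for illustration; assumes any name starting
    with "~$" is an owner (lock) file and not a real workbook.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Owner files keep the .xlsx extension but start with "~$".
            if name.startswith("~$"):
                continue
            if name.endswith(".xlsx"):
                yield os.path.join(dirpath, name)
```

In the loop from the question, the os.walk block could then iterate over `iter_xlsx_files(datafiles)` and pass each path to `read`. Closing the workbook in Excel also removes the owner file, but filtering keeps the script working while a file is open.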