How do I locate which file has a KeyError in Python?
I wrote a preprocessing script in Python that fills in missing values in the confidence column. Below is my script:
import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob
inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/')
for file in inp_dir.glob('*.csv'):
    df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
    df['confidence'] = df['confidence'].replace(np.nan, 0.01)
    df.to_csv(file,index=False)
Error:
Traceback (most recent call last):
File "C:\Users\jtharian\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'confidence'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<ipython-input-1-0cbf17caf540>", line 11, in <module>
df['confidence'] = df['confidence'].replace(np.nan, 0.01)
File "C:\Users\jtharian\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3024, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Users\jtharian\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 'confidence'
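I understand that I am getting this error because one of the files in my directory does not have the column 'confidence'. But how do I find that file or print its file name?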
Add a try and except clause:
import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob
inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/')
for file in inp_dir.glob('*.csv'):
    try:
        df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
        df['confidence'] = df['confidence'].replace(np.nan, 0.01)
        df.to_csv(file,index=False)
    except KeyError:
        # df['confidence'] raises KeyError when the column is missing
        print("Invalid column in file:", file)
You can also use the sys module to get the error output along with the exception.
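A minimal sketch of the sys approach, assuming the standard library's sys.exc_info() is what is meant here. The helper name find_failing_files and its column parameter are hypothetical. It only reports failures and does not write anything back to disk.

```python
import sys
from pathlib import Path

import pandas as pd


def find_failing_files(inp_dir, column="confidence"):
    """Return (file name, exception type, message) for each CSV lacking `column`.

    Hypothetical helper for illustration; it reads files but never rewrites them.
    """
    failures = []
    for file in sorted(Path(inp_dir).glob("*.csv")):
        try:
            df = pd.read_csv(file, sep=",", quotechar="|")
            df[column] = df[column].fillna(0.01)
        except KeyError:
            # sys.exc_info() returns (type, value, traceback) for the
            # exception currently being handled.
            exc_type, exc_value, _ = sys.exc_info()
            failures.append((file.name, exc_type.__name__, str(exc_value)))
    return failures
```

Calling find_failing_files(inp_dir) and printing the result lists every offending file in one pass instead of stopping at the first one.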
The simplest way is to print the file you are processing.
import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob
inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/')
for file in inp_dir.glob('*.csv'):
    print(f"Reading: {file}")
    df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
    df['confidence'] = df['confidence'].replace(np.nan, 0.01)
    df.to_csv(file,index=False)
Maybe check whether confidence is listed among the column names, and break if it is not...
import pandas as pd
import numpy as np
from pathlib import Path
import glob as glob
inp_dir = Path(r'C:/Users/jtharian/Desktop/bbc/')
for file in inp_dir.glob('*.csv'):
    df = pd.read_csv(file, sep=',', quotechar='|',error_bad_lines=False)
    if 'confidence' not in df.columns:
        print('filename: ' + str(file))
        break
    df['confidence'] = df['confidence'].replace(np.nan, 0.01)
    df.to_csv(file,index=False)
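If more than one file might be missing the column, a variation on the same check is to collect every offending name instead of breaking at the first. This is only a sketch: the helper name files_missing_column is hypothetical, and it assumes each file has a header row (an empty file would raise pandas' EmptyDataError, which is not handled here).

```python
from pathlib import Path

import pandas as pd


def files_missing_column(inp_dir, column="confidence"):
    """Return the names of CSV files whose header does not contain `column`."""
    missing = []
    for file in sorted(Path(inp_dir).glob("*.csv")):
        # nrows=0 parses only the header row, so no data rows are loaded.
        header = pd.read_csv(file, sep=",", quotechar="|", nrows=0)
        if column not in header.columns:
            missing.append(file.name)
    return missing
```

Running this before the preprocessing loop gives the full list of files to fix or skip, and nothing is rewritten on disk.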