FileNotFoundError when iterating over files in a directory

import os
import pandas as pd

FILES = os.listdir("/CADEC/original")

for file in FILES:
    if file.startswith("ARTHROTEC."):
        print(file)
ARTHROTEC.1.ann
ARTHROTEC.10.ann
ARTHROTEC.100.ann
ARTHROTEC.101.ann
ARTHROTEC.102.ann
ARTHROTEC.103.ann
ARTHROTEC.104.ann
ARTHROTEC.105.ann
ARTHROTEC.106.ann
ARTHROTEC.107.ann
ARTHROTEC.108.ann
ARTHROTEC.109.ann
ARTHROTEC.11.ann
ARTHROTEC.110.ann
ARTHROTEC.111.ann
ARTHROTEC.112.ann
ARTHROTEC.113.ann
ARTHROTEC.114.ann
ARTHROTEC.115.ann
...

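I want to extract data from all files in a directory whose names start with a particular prefix. As shown above, when I iterate over the directory and print each matching file name, I get a list of file names (strings). Meanwhile, data = pd.read_csv("/CADEC/original/ARTHROTEC.1.ann", sep='\t', header=None) works fine. However, running the code below only raises an error. Why can't the file be found, and what should I do to fix it?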

for file in FILES:
    if file.startswith("ARTHROTEC."):
        data = pd.read_csv(file, sep='\t', header=None)
FileNotFoundError: [Errno 2] File ARTHROTEC.1.ann does not exist: 'ARTHROTEC.1.ann'
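  • os.listdir returns only the names of the entries in the directory, not their paths. pandas needs a path to the file (absolute or relative), unless the file is in the current working directory.
  • It is better to learn the pathlib module, which treats paths as objects with attributes and methods rather than as strings.
  • pathlib may take some time to get used to, but the attributes for extracting specific parts of a path, such as .suffix for the file extension or .stem for the file name without its extension, make it worthwhile.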
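As a small illustration of the path attributes mentioned above, here is a sketch that takes one of the file paths from the question and pulls out its parts. The path is only parsed, not opened, so the file does not need to exist; the variable names are chosen for this example.

```python
from pathlib import Path

# One of the file paths from the question, used purely for illustration.
path = Path("/CADEC/original/ARTHROTEC.1.ann")

name = path.name      # final path component: 'ARTHROTEC.1.ann'
stem = path.stem      # name without the last extension: 'ARTHROTEC.1'
suffix = path.suffix  # last extension, including the dot: '.ann'
parent = path.parent  # the containing directory, as a Path object

print(name, stem, suffix, parent)
```

Note that .stem only strips the final extension, so the numeric part of the name is kept.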
import pandas as pd
from pathlib import Path

# create the path object and get the files with .glob
# .glob returns a generator, so store the results in a list to iterate over them more than once
files = list(Path('/CADEC/original').glob('ARTHROTEC*.ann'))

# create a list of dataframes, 1 dataframe for each file
df_list = [pd.read_csv(file, sep='\t', header=None) for file in files]

# alternatively, create a dict of dataframes with the file stem (name without the final extension) as the key
df_dict = {file.stem: pd.read_csv(file, sep='\t', header=None) for file in files}
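If you would rather keep the os.listdir loop from the question, the missing step is to join the directory onto each file name before reading it. The following is a minimal sketch of that idea; matching_paths is a hypothetical helper name introduced here, not part of os or pandas.

```python
import os


def matching_paths(directory, prefix):
    """Return the full paths of entries in `directory` whose names start with `prefix`."""
    return sorted(
        os.path.join(directory, name)   # prepend the directory to the bare name
        for name in os.listdir(directory)  # os.listdir yields names only, not paths
        if name.startswith(prefix)
    )
```

Each path returned by matching_paths("/CADEC/original", "ARTHROTEC.") can then be passed to pd.read_csv(path, sep='\t', header=None), exactly as in the single-file call that already works in the question.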

Example

Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] on win32
In [2]: import os
   ...: from pathlib import Path
   ...: os.listdir('e:/PythonProjects/stack_overflow/t-files')
Out[2]: 
['.ipynb_checkpoints',
 '03900169.txt',
 '142233.0.txt',
 '153431.2.txt',
 '17371271.txt',
 '274301.5.txt',
 '42010316.txt',
 '429237.7.txt',
 '570651.4.txt',
 '65500027.txt',
 '688599.3.txt',
 '740103.5.txt',
 '742537.6.txt',
 '87505504.txt',
 '90950222.txt',
 't1.txt',
 't2.txt',
 't3.txt']

In [3]: list(Path('e:/PythonProjects/stack_overflow/t-files').glob('*'))
Out[3]: 
[WindowsPath('e:/PythonProjects/stack_overflow/t-files/.ipynb_checkpoints'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/03900169.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/142233.0.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/153431.2.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/17371271.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/274301.5.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/42010316.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/429237.7.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/570651.4.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/65500027.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/688599.3.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/740103.5.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/742537.6.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/87505504.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/90950222.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/t1.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/t2.txt'),
 WindowsPath('e:/PythonProjects/stack_overflow/t-files/t3.txt')]