Python function to read a file from a subdirectory

Question

I am trying to write this function so that I can pass it a file or a folder and read from it with pandas.

import pandas as pd
import os

path = os.getcwd()
path = '..' #this would be root

revenue_folder = '../Data/Revenue'
random_file = '2017-08-01_Aug.csv'

def csv_reader(csv_file):
    for root, dirs, files in os.walk(path):
        for f in files:
            with open(os.path.join(root, csv_file)) as f1:
                pd.read_csv(f1, sep = ';')
                print(f1)

csv_reader(random_file)

FileNotFoundError: [Errno 2] No such file or directory: '../2017-08-01_Aug.csv'

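I have since tried a few changes, but now the problem is that it goes into a different subdirectory. What I want is to walk through all my files and folders, find the required file, and then read it. To be clear, the file I want is in revenue_folder.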

def csv_reader(csv_file):
    for root, dirs, files in os.walk(path):
        for f in files:
            base, ext = os.path.splitext(f)
            if ('csv' in ext):
                print (root)
                with open(os.path.join(root, csv_file)) as f1:
                    pd.read_excel(f1, sep = ':')
                    print(f1)

csv_reader(random_file)

FileNotFoundError: [Errno 2] No such file or directory: './Data/Backlog/2017-08-01_Aug.csv'

Answer 1

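After the edit, the whole scenario of the question changed. The code below searches recursively through files and folders to find the files that match the criteria: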

import os
import pandas as pd


def get_all_matching_files(root_path, matching_criteria):
    """
    Gets all files that match a string criteria.
    :param root_path: the root directory path from where searching needs to begin
    :param matching_criteria: a string or a tuple of strings that needs to be matched in the file name
    :return: a list of all matching files
    """
    return [os.path.join(root, name)
            for root, dirs, files in os.walk(root_path)
            for name in files
            if name.endswith(matching_criteria)]


def main(root_path):
    """
    The main method to start finding the file.
    :param root_path: The root dir where the search needs to be started.
    :return: None
    """
    if len(root_path) < 2:
        raise ValueError('The root path must be at least 2 characters long')

    all_matching_files = get_all_matching_files(root_path, '2017-08-01_Aug.csv')
    if not all_matching_files:
        print('no files were found matching that criteria.')
        return

    for matched_file in all_matching_files:
        # add sep=';' here if the file is semicolon-delimited, as in the question
        data_frame = pd.read_csv(matched_file)
        # your code here on what to do with the dataframe

    print('Completed search!')


if __name__ == '__main__':
    root_dir_path = os.getcwd()
    main(root_dir_path)

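Note that I used endswith() to match the files, so you have the flexibility to pass a file extension (.csv) and get every file with that extension. endswith() also accepts a tuple, so you can build a tuple of all the file names or extensions you want and the method will still work.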
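A small sketch of the tuple form. The file names below are made up for illustration and no directory is touched, since `str.endswith` behaves the same way on any string:

```python
# Hypothetical file names, used only to illustrate the matching behaviour.
names = ['2017-08-01_Aug.csv', 'notes.txt', 'backlog.xlsx', '2017-09-01_Sep.csv']

# A single string matches one suffix (here, one complete file name).
exact = [n for n in names if n.endswith('2017-08-01_Aug.csv')]

# A tuple matches any of several suffixes, e.g. several extensions.
tabular = [n for n in names if n.endswith(('.csv', '.xlsx'))]

print(exact)
print(tabular)
```

Keep in mind that endswith() is a suffix test, so a name such as 'old_2017-08-01_Aug.csv' would also match the first criterion. Compare with `==` if you need an exact file name.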

Other suggestions:

When reading a file with pandas, you do not need to open it yourself like this:

with open(os.path.join(root, csv_file)) as f1:
    pd.read_csv(f1, sep = ';')
    print(f1)

Instead, you need to do:

# set the file path into a variable to make code readable
filepath = os.path.join(revenue_folder, random_file)
# read the data and store it into a variable of type DataFrame
my_dataframe_from_file = pd.read_csv(filepath,sep=';')
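For context on the FileNotFoundError in the question: both versions of csv_reader join every directory visited by os.walk with csv_file, without checking that the file actually lives in that directory, so open() fails in the first directory that lacks it. A minimal sketch of a check that avoids this is shown below. The helper name `find_file` and the temporary directory layout are assumptions made for the demo, not part of the original code:

```python
import os
import tempfile


def find_file(root_path, file_name):
    """Return the path of the first file called file_name under root_path, or None."""
    for root, dirs, files in os.walk(root_path):
        # Only build the path in the directory that actually contains the file.
        if file_name in files:
            return os.path.join(root, file_name)
    return None


# Demo on a throwaway directory tree that mimics the layout in the question.
with tempfile.TemporaryDirectory() as tmp:
    revenue = os.path.join(tmp, 'Data', 'Revenue')
    os.makedirs(revenue)
    os.makedirs(os.path.join(tmp, 'Data', 'Backlog'))
    with open(os.path.join(revenue, '2017-08-01_Aug.csv'), 'w') as fh:
        fh.write('a;b\n1;2\n')

    found = find_file(tmp, '2017-08-01_Aug.csv')
    missing = find_file(tmp, 'does_not_exist.csv')
    found_relative = os.path.relpath(found, tmp)
    print(found_relative)
```

If a path is returned, it can be passed straight to pd.read_csv(path, sep=';'). If the result is None, the file does not exist anywhere under the starting directory.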
