How to import a file from a folder where the ending characters can change - python pandas?

Question

I have a folder containing several similarly named files that I am trying to read. For example, the folder contains these files:

apple_2019_08_26_23434.xls
apple_2019_08_25_55345.xls
apple_2019_08_24_99345.xls

The file name format is simple:

 apple_<date>_<5 random numbers>.xls

How can I read an Excel file into a pandas df if I don't care about the 5 random digits at the end?

For example:

df = pd.read_excel('e:\Document\apple_2019_08_26_<***wildcard***>.xls')

Thanks!

Answer 1

You can use Unix-style pathname expansion via glob.

import glob

# get .txt files in current directory
txt_files = glob.glob('./*.txt')

# get .xls files in some_dir
xls_files = glob.glob('some_dir/*.xls')

# do stuff with files
# ...

The * here basically means "anything": it matches any run of characters within a file name, including none.
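Because * accepts anything, it would also match a name whose tail is not five digits. As a minimal sketch, assuming you want to be stricter, the same pattern syntax supports character classes such as [0-9]. The example below uses fnmatch (the matcher glob relies on for file names) so it runs without touching the disk; the "backup" file name is hypothetical and only there for illustration.

```python
from fnmatch import fnmatch

# Two file names from the question, plus one hypothetical name
# that should not count as a match.
names = [
    "apple_2019_08_26_23434.xls",
    "apple_2019_08_25_55345.xls",
    "apple_2019_08_26_backup.xls",  # hypothetical, for illustration only
]

# * matches any run of characters
loose = "apple_2019_08_26_*.xls"
# [0-9] repeated five times matches exactly five digits
strict = "apple_2019_08_26_" + "[0-9]" * 5 + ".xls"

loose_hits = [n for n in names if fnmatch(n, loose)]
strict_hits = [n for n in names if fnmatch(n, strict)]

print(loose_hits)
print(strict_hits)
```

The strict pattern can be passed to glob.glob in the same way as the looser one.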

Example with pandas:

import glob

import pandas as pd

# read every file for that date, whatever the trailing five digits are
for xls_file in glob.glob('e:/Document/apple_2019_08_26_*.xls'):
    df = pd.read_excel(xls_file)

    # do stuff with df
    # ...
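If you expect exactly one file per date, it may be safer to fail loudly when the pattern matches nothing or matches several files. The helper below is a sketch, not part of glob or pandas: find_single is a hypothetical name, and the demo uses empty stand-in files in a temporary folder instead of real spreadsheets.

```python
import glob
import os
import tempfile


def find_single(pattern):
    """Return the one path matching pattern; raise if there are zero or several."""
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one match for {pattern!r}, found {len(matches)}"
        )
    return matches[0]


# Demo: create empty stand-in files in a temporary folder, then resolve one date.
with tempfile.TemporaryDirectory() as folder:
    for name in ("apple_2019_08_26_23434.xls", "apple_2019_08_25_55345.xls"):
        open(os.path.join(folder, name), "w").close()

    path = find_single(os.path.join(folder, "apple_2019_08_26_*.xls"))
    found_name = os.path.basename(path)

print(found_name)
```

With real files, the returned path can be handed straight to pd.read_excel.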

Answer 2

Use os.chdir to change the directory, then read every file whose name starts with the right prefix:

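import os

import pandas as pd

os.chdir(r'e:\Document')
# one DataFrame per file whose name starts with the prefix (here: all of August 2019)
dfs = [pd.read_excel(file) for file in os.listdir() if file.startswith('apple_2019_08')]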

Now you can access each DataFrame by index:

print(dfs[0])

print(dfs[1])

Or, if they all have the same format, concatenate them into one big DataFrame:

df_all = pd.concat(dfs, ignore_index=True)
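Note that os.chdir changes the working directory for the whole process. As a rough sketch of the same prefix filter without that side effect, the listing can be done with pathlib. The name files_with_prefix and the "pear" file are hypothetical, and the demo uses empty stand-in files in a temporary folder.

```python
import tempfile
from pathlib import Path


def files_with_prefix(folder, prefix, suffix=".xls"):
    """List files in folder whose names start with prefix and end with suffix, sorted."""
    return sorted(
        p
        for p in Path(folder).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.name.endswith(suffix)
    )


# Demo with empty stand-in files; the working directory is never changed.
with tempfile.TemporaryDirectory() as folder:
    for name in (
        "apple_2019_08_26_23434.xls",
        "apple_2019_08_25_55345.xls",
        "pear_2019_08_26_11111.xls",  # hypothetical, should be filtered out
    ):
        (Path(folder) / name).touch()

    selected = [p.name for p in files_with_prefix(folder, "apple_2019_08")]

print(selected)
```

Each returned path is a full path, so it can be passed to pd.read_excel and the results concatenated as shown above.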

Answer 3

If you want to change the 5-digit part in code, you could try something like this:

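from os import listdir
from os.path import isfile, join

import pandas as pd

mypath = '/Users/username/aPath'
# names (not full paths) of the regular files in mypath
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

# characters 17-21 of 'apple_YYYY_MM_DD_NNNNN.xls' are the five digits
fiveDigitNumber = onlyfiles[0][17:22]
# rebuild the name; assign a different fiveDigitNumber to target another file
filename = onlyfiles[0][:17] + fiveDigitNumber + onlyfiles[0][22:]

# listdir returns bare names, so join with mypath before reading
df = pd.read_excel(join(mypath, filename))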
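The fixed slice positions above only work while the prefix stays exactly 17 characters long. As an alternative sketch, assuming the date is always written as YYYY_MM_DD, a regular expression can pull the date and the five digits out of the name. NAME_RE and parse_name are hypothetical names chosen for this example.

```python
import re

# Assumed layout: apple_<YYYY_MM_DD>_<five digits>.xls
NAME_RE = re.compile(r"apple_(\d{4}_\d{2}_\d{2})_(\d{5})\.xls")


def parse_name(filename):
    """Return (date, five_digits) for a matching file name, else None."""
    m = NAME_RE.fullmatch(filename)
    if m is None:
        return None
    return m.group(1), m.group(2)


parsed = parse_name("apple_2019_08_26_23434.xls")
unmatched = parse_name("notes.txt")

print(parsed)
print(unmatched)
```

This makes it possible to select files by date, or to skip names that do not follow the pattern, without counting character positions.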


python

wildcard

pandas