Open a list of directories with different names and extract a file from each of them?

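I have several directories with different names (dates). I want to extract a file named after a continent from each of these directories and then merge those files across all dates. Can anyone tell me the most efficient way to do this in Python?

I have used the glob package to get into the directories, but I don't know how to merge the files: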

import glob

path = '/home/Data/pb/2014-*/ank.txt.gz'

for file in glob.glob(path):
    file.readlines()

To read a .gz file, you need the gzip module:

import glob
import gzip

path = '/home/Data/pb/2014-*/ank.txt.gz'

# loop for each file *name* matching the glob pattern
for fname in glob.glob(path):
    # open the file as a gzip compressed file
    with gzip.open(fname, 'rt') as f:
        # for each line of the file
        for data in f:
            # do whatever you need here
            # ...
            pass  # placeholder so the loop body is valid Python
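
One way to fill in the loop body for the merge the question asks about is to write each line to a single output file. This is only a minimal sketch: the function name `merge_gzip_files` and the output name `merged.txt` are made up for illustration, and `sorted()` is added on the assumption that you want the directories processed in a stable, lexicographic order, since `glob.glob` does not promise any particular order.

```python
import glob
import gzip


def merge_gzip_files(pattern, merged_fname):
    """Concatenate the text of every gzip file matching `pattern` into `merged_fname`.

    Hypothetical helper, not part of any library.
    """
    # sorted() gives a stable lexicographic order; glob alone guarantees none
    fnames = sorted(glob.glob(pattern))
    with open(merged_fname, 'w') as merged:
        for fname in fnames:
            # 'rt' decompresses and decodes to text in one step
            with gzip.open(fname, 'rt') as f:
                # copy line by line so a large file is never held in memory at once
                for line in f:
                    merged.write(line)
    return fnames
```

With the paths from the question, this could be called as `merge_gzip_files('/home/Data/pb/2014-*/ank.txt.gz', 'merged.txt')`. The return value is the list of files that were merged, which is handy for checking that the pattern matched what you expected.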

Assuming:

pb/2014-01-01/file_of_interest.txt
pb/2014-02-01/file_of_interest.txt
pb/2014-03-01/file_of_interest.txt
...

First, I create my test environment:

# Created 10 files in 10 directories named 
# pb/2014-$i/file_of_interest.txt. Then 
# pushed "contents_of_file_2014-$i" into each file.

jon$ for i in $(seq 1 10); do mkdir -p pb/2014-$i; echo contents_of_file_2014-$i > pb/2014-$i/file_of_interest.txt; done



# Run the merge.py (source below)
jon$ python merge.py
# See the output
jon$ cat output.txt
contents_of_file_2014-1
contents_of_file_2014-10
contents_of_file_2014-2
contents_of_file_2014-3
contents_of_file_2014-4
contents_of_file_2014-5
contents_of_file_2014-6
contents_of_file_2014-7
contents_of_file_2014-8
contents_of_file_2014-9

merge.py

$ cat merge.py
#!/usr/bin/env python

import glob
import gzip

merged_fname = "output.txt"
# glob returns matches in arbitrary order, so sort them for a reproducible merge
files = sorted(glob.glob('pb/2014-*/file_of_interest.txt'))

with open(merged_fname, 'w') as merged_file_handle:
  for fname in files:
    # For gzip, use the gzip opener instead.
    # @sylvain
    #with gzip.open(fname, 'rt') as file_handle:
    with open(fname, 'r') as file_handle:
      merged_file_handle.write(file_handle.read())
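
In the sample output above, 2014-10 comes before 2014-2, which is what plain string ordering gives. If the merged file should follow calendar order instead, one possible approach is to sort on the numbers in each directory name. The sketch below is an illustration under that assumption; `date_key` and `merge_in_date_order` are hypothetical names, and it assumes the parent directory name contains only the date digits (as in `2014-10` or `2014-01-01`).

```python
import glob
import os
import re


def date_key(fname):
    """Sort key built from the integers in the parent directory name.

    For example, 'pb/2014-10/file_of_interest.txt' gives (2014, 10).
    """
    dirname = os.path.basename(os.path.dirname(fname))
    return tuple(int(n) for n in re.findall(r'\d+', dirname))


def merge_in_date_order(pattern, merged_fname):
    """Merge all files matching `pattern`, ordered numerically by directory date."""
    files = sorted(glob.glob(pattern), key=date_key)
    with open(merged_fname, 'w') as merged:
        for fname in files:
            with open(fname, 'r') as fh:
                merged.write(fh.read())
    return files
```

For the test environment above, `merge_in_date_order('pb/2014-*/file_of_interest.txt', 'output.txt')` would write the files for 2014-1 through 2014-10 in numeric order. Zero-padded names such as 2014-01-01 already sort correctly as strings, so the custom key only matters when the numbers are not padded.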