Open a list of directories with different names and extract a file from it?
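I have several directories with different names (dates). I want to extract a file with a continent name from each of these directories and then merge that file across all dates. Can anyone tell me the most efficient way to do this in Python?
I have used the glob package to get into the directories, but I don't know how to merge the files: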
import glob
path = '/home/Data/pb/2014-*/ank.txt.gz'
for file in glob.glob(path):
    file.readlines()
To read a .gz file, you need the gzip module:
import glob
import gzip
path = '/home/Data/pb/2014-*/ank.txt.gz'
# loop over each file *name* matching the glob pattern
for fname in glob.glob(path):
    # open the file as a gzip-compressed file, in text mode
    with gzip.open(fname, 'rt') as f:
        # for each line of the file
        for data in f:
            # do whatever you need here
            # ...
            pass
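The loop above reads each file but does not yet merge anything. A minimal sketch of the merge step, assuming the goal is simply to concatenate the decompressed contents into one plain-text file, might look like the following. The function name `merge_gzip_files` and the output path are hypothetical and not part of the question.

```python
import glob
import gzip
import shutil


def merge_gzip_files(pattern, out_path):
    """Decompress every file matching `pattern` and append it to `out_path`.

    Hypothetical helper; returns the input paths in the order they were merged.
    """
    # glob.glob() gives no ordering guarantee, so sort for reproducible output.
    fnames = sorted(glob.glob(pattern))
    with open(out_path, "wb") as out:
        for fname in fnames:
            # gzip.open in binary mode yields the decompressed bytes.
            with gzip.open(fname, "rb") as src:
                # Copy in chunks rather than loading the whole file into memory.
                shutil.copyfileobj(src, out)
    return fnames
```

It could be called as `merge_gzip_files('/home/Data/pb/2014-*/ank.txt.gz', 'ank_merged.txt')`. Copying in binary mode avoids decoding and re-encoding the text, which should be enough when the files only need to be concatenated.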
Assuming:
pb/2014-01-01/file_of_interest.txt
pb/2014-02-01/file_of_interest.txt
pb/2014-03-01/file_of_interest.txt
...
First, I set up my test environment:
# Created 10 files in 10 directories named
# pb/2014-$i/file_of_interest.txt. Then
# pushed "contents_of_file_2014-$i" into each file.
jon$ for i in $(seq 1 10); do mkdir -p pb/2014-$i; echo contents_of_file_2014-$i > pb/2014-$i/file_of_interest.txt; done
# Run the merge.py (source below)
jon$ python merge.py
# See the output
jon$ cat output.txt
contents_of_file_2014-1
contents_of_file_2014-10
contents_of_file_2014-2
contents_of_file_2014-3
contents_of_file_2014-4
contents_of_file_2014-5
contents_of_file_2014-6
contents_of_file_2014-7
contents_of_file_2014-8
contents_of_file_2014-9
merge.py
$ cat merge.py
#!/usr/bin/env python
import glob
import gzip
merged_fname = "output.txt"
files = glob.glob('pb/2014-*/file_of_interest.txt')
with open(merged_fname, 'w') as merged_file_handle:
    for fname in files:
        # For gzip, use the gzip opener instead.
        # @sylvain
        #with gzip.open(fname, 'rt') as file_handle:
        with open(fname, 'r') as file_handle:
            merged_file_handle.write(file_handle.read())
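Note that in the output shown, 2014-10 lands between 2014-1 and 2014-2, and glob.glob() does not promise any particular order in the first place. If the merged file should follow the numeric order of the directory names, one possible sketch is to sort the matches on the integers in the parent directory name. The helper names `numeric_dir_key` and `merge_in_numeric_order` are made up for illustration.

```python
import glob
import os
import re


def numeric_dir_key(path):
    """Sort key: the integers found in the name of the file's parent directory.

    For example, 'pb/2014-10/file_of_interest.txt' gives [2014, 10].
    """
    dirname = os.path.basename(os.path.dirname(path))
    return [int(n) for n in re.findall(r"\d+", dirname)]


def merge_in_numeric_order(pattern, merged_fname):
    """Concatenate all files matching `pattern`, ordered by numeric_dir_key."""
    files = sorted(glob.glob(pattern), key=numeric_dir_key)
    with open(merged_fname, "w") as merged_file_handle:
        for fname in files:
            with open(fname, "r") as file_handle:
                merged_file_handle.write(file_handle.read())
    return files
```

With the test environment above, `merge_in_numeric_order('pb/2014-*/file_of_interest.txt', 'output.txt')` would write the 2014-10 contents last. Zero-padded directory names such as 2014-01-01 already sort correctly with a plain `sorted()`, so the custom key only matters for unpadded names.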