Execute multiple *.dat files from subdirectories (bash, python)

Question

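I have the following:

I have a directory containing subdirectories full of files. The structure is: /periodic_table/{Element}_lj_dat/lj_dat_sim.dat;
Each file consists of two lines (the first is a comment) and 12 columns of data.
What I want is to go through all the element folders (e.g. Al, Cu, etc.), open a newly created file (e.g. one named "mergedlj.dat" in the periodic_table directory), and store all the data from every file in that single file, adding the element name taken from the parent directory as the first (or last) column of the merged file.

Ideally, the first line of each file should be ignored and only the data from the second line saved.

I am not very experienced with bash/shell scripting, but I think that is the best way to go (Python is acceptable too!). Unfortunately, I have only ever worked with files located in the same folder as the script, so this is new to me.

Here is the code I use to find these files, but it does not actually do anything I need: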

find ../periodic_table/*_lj_dat/ -name lj_dat_sim.dat -print0 | while read -d $'\0' file; do 
    echo "Processing $file"
done

Any help would be greatly appreciated!

Answer 1

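Here is a Python solution.

You can use glob() to get a list of the matching files and then iterate over them with fileinput.input(). fileinput.filename() gives you the name of the file currently being processed, which can be used to determine the current element whenever a new file starts, as detected by fileinput.isfirstline().

The current element is added as the first column of the merged file. I assume the field separator in the input files is a single space, but you can change that by altering the ' '.join() below.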

import re
import fileinput
from glob import glob

dir_prefix = '.'
glob_pattern = '{}/periodic_table/*_lj_dat/lj_dat_sim.dat'.format(dir_prefix)
element_pattern = re.compile(r'.*periodic_table/(.+)_lj_dat/lj_dat_sim.dat')

with open('mergedlj.dat', 'w') as outfile:
    element = ''
    for line in fileinput.input(glob(glob_pattern)):
        if fileinput.isfirstline():
            # extract the element name from the file name
            element = element_pattern.match(fileinput.filename()).groups()[0]
        else:
            print(' '.join([element, line]), end='', file=outfile)

You could use os.path.join() to build the glob and element regex patterns, but I left that out above to avoid cluttering the answer.
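For reference, a variant built with os.path.join() might look roughly like the sketch below. It is only an illustration under a few assumptions: the names merge_lj_files, element_from_path and SUFFIX are made up for this example, the element name is derived from the parent directory name instead of a regex, files are processed in sorted order, and blank lines are skipped.

```python
import os
from glob import glob

# Directory suffix taken from the layout described in the question.
SUFFIX = '_lj_dat'


def element_from_path(path):
    """Return the element name encoded in the parent directory of `path`."""
    parent = os.path.basename(os.path.dirname(path))
    return parent[:-len(SUFFIX)] if parent.endswith(SUFFIX) else parent


def merge_lj_files(root, out_path):
    """Merge every lj_dat_sim.dat under `root`, prefixing each row with its element."""
    # Build the pattern with os.path.join() so the separator matches the platform.
    pattern = os.path.join(root, '*' + SUFFIX, 'lj_dat_sim.dat')
    with open(out_path, 'w') as outfile:
        # glob() returns files in arbitrary order, so sort for a stable result.
        for path in sorted(glob(pattern)):
            element = element_from_path(path)
            with open(path) as infile:
                next(infile, None)  # skip the comment line
                for line in infile:
                    if line.strip():  # assumption: blank lines carry no data
                        outfile.write('{} {}\n'.format(element, line.rstrip('\n')))
```

Calling merge_lj_files('periodic_table', os.path.join('periodic_table', 'mergedlj.dat')) from the directory that contains periodic_table should then write one row per data line, with the element name as the first column.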
