How to merge files within a sub-directory and perform this function on multiple sub-directories

Question

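I have a directory containing about 2000 subdirectories, each holding 2-10 txt files. I want to go into each subdirectory and merge (concatenate) its contents into a single file, so that I end up with 2000 directories, each containing 1 txt file. I have tried doing this with unix commands, but I can't seem to get a command to run inside one specific subdirectory, then change directory and run again.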

find . -maxdepth 1 -name "*.faa" -exec cat {}

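Is there a way to turn this into a bash script and run it across the whole directory, or should I be looking at something more like Python to accomplish this?

Thanks, and apologies if this has been asked before.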

Answer 1

This should do what you need, and it can be customized to suit:

import os

OLD_BASE = '/tmp/so/merge/old'
NEW_BASE = '/tmp/so/merge/new'
NEW_NAME = 'merged.txt'

def merge_files(infiles, outfile):
    with open(outfile, 'wb') as fo:
        for infile in infiles:
            with open(infile, 'rb') as fi:
                fo.write(fi.read())


for (dirpath, dirnames, filenames) in os.walk(OLD_BASE):
    base, tail = os.path.split(dirpath)
    if base != OLD_BASE:
        continue  # Don't operate on OLD_BASE itself, only its immediate child directories

    # Build infiles list
    infiles = sorted([os.path.join(dirpath, filename) for filename in filenames])

    # Create output directory
    new_dir = os.path.join(NEW_BASE, tail)
    os.mkdir(new_dir)  # This will raise an OSError if the directory already exists

    # Build outfile name
    outfile = os.path.join(new_dir, NEW_NAME)

    # Merge
    merge_files(infiles, outfile)

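The end result is that, for each directory in OLD_BASE, a directory with the same name is created in NEW_BASE. In each NEW_BASE subdirectory, a file named merged.txt is created, containing the concatenated contents of the files in the corresponding OLD_BASE subdirectory. NEW_BASE itself must already exist, because os.mkdir does not create parent directories.

So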

<OLD_BASE>
    DIR_1
        FILE_1
        FILE_2
    DIR_2
        FILE_3
        FILE_4
        FILE_5
    DIR_3
        FILE_6

becomes

<NEW_BASE>
    DIR_1
        <NEW_NAME> (=FILE_1 + FILE_2)
    DIR_2
        <NEW_NAME> (=FILE_3 + FILE_4 + FILE_5)
    DIR_3
        <NEW_NAME> (=FILE_6)

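To check the before/after layout above without touching real data, here is a minimal sketch that builds the same tree in a temporary directory and merges it. The names merge_tree and run_example are made up for this illustration, and it swaps in pathlib and shutil.copyfileobj (chunked copying) in place of the os.walk and read() calls used in the script above, so treat it as a variant rather than the same code.

```python
import shutil
import tempfile
from pathlib import Path


def merge_tree(old_base, new_base, new_name="merged.txt"):
    """Merge the files of each immediate child of old_base into new_base/<child>/<new_name>."""
    old_base, new_base = Path(old_base), Path(new_base)
    for subdir in sorted(p for p in old_base.iterdir() if p.is_dir()):
        out_dir = new_base / subdir.name
        # Assumption: reusing an existing output directory is acceptable here.
        out_dir.mkdir(parents=True, exist_ok=True)
        with open(out_dir / new_name, "wb") as fo:
            for infile in sorted(p for p in subdir.iterdir() if p.is_file()):
                with open(infile, "rb") as fi:
                    shutil.copyfileobj(fi, fo)  # copy in chunks instead of one big read()


def run_example():
    """Build the example layout in a temporary directory and return the merged contents."""
    layout = {
        "DIR_1": ["FILE_1", "FILE_2"],
        "DIR_2": ["FILE_3", "FILE_4", "FILE_5"],
        "DIR_3": ["FILE_6"],
    }
    with tempfile.TemporaryDirectory() as tmp:
        old_base, new_base = Path(tmp, "old"), Path(tmp, "new")
        for dirname, filenames in layout.items():
            (old_base / dirname).mkdir(parents=True)
            for filename in filenames:
                # Each file contains only its own name, so the merge order is visible.
                (old_base / dirname / filename).write_text(filename + "\n")
        merge_tree(old_base, new_base)
        return {d.name: (d / "merged.txt").read_text() for d in sorted(new_base.iterdir())}


if __name__ == "__main__":
    for dirname, content in run_example().items():
        print(dirname, "->", content.split())
```

Each value returned by run_example() is the text of one merged.txt, so DIR_2 should list FILE_3, FILE_4 and FILE_5 in that order.
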
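I know you said the order in which the files are merged doesn't matter, but this merges them in alphabetical order by filename (case-sensitive), in case future viewers are interested. If you really don't care about the order, you can remove the sorted() wrapper.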
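For anyone who does care about the order, here is a small illustration of what "alphabetical, case-sensitive" means for sorted(). The filenames are invented for the example, and the key=str.lower variant is just one possible alternative, not something the script above uses.

```python
names = ["b.txt", "A.txt", "a.txt", "B.txt"]

# Default sort compares code points, so uppercase letters come before lowercase ones.
case_sensitive = sorted(names)

# Possible alternative: compare lowercased names; ties keep their original relative order.
case_insensitive = sorted(names, key=str.lower)

# Digits are compared character by character too, so "10" sorts before "2".
numbered = sorted(["file2.txt", "file10.txt"])

print(case_sensitive)    # ['A.txt', 'B.txt', 'a.txt', 'b.txt']
print(case_insensitive)  # ['A.txt', 'a.txt', 'b.txt', 'B.txt']
print(numbered)          # ['file10.txt', 'file2.txt']
```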

Answer 2

If I've understood correctly, this should do it:

find -maxdepth 1 -type d -exec sh -c 'cd "$0" && cat *.faa > bigfile' {} \;

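It finds all subdirectories of the current directory (non-recursively), cds into each one, and concatenates all the *.faa files there into a file named bigfile (inside that subdirectory). Note that find also visits the current directory itself (.), so any *.faa files sitting directly in it are merged as well.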
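Since the question also asks about Python, here is a rough sketch of the same in-place idea. It is not an exact translation of the one-liner: the function name merge_in_place and its parameters are invented for this example, it skips subdirectories with no matching files instead of creating an empty bigfile, and it does not process the base directory itself.

```python
import shutil
from pathlib import Path


def merge_in_place(base=".", pattern="*.faa", out_name="bigfile"):
    """Concatenate files matching pattern inside each immediate subdirectory of base.

    Returns the list of output files that were written.
    """
    written = []
    for subdir in sorted(p for p in Path(base).iterdir() if p.is_dir()):
        # Collect inputs first, and never treat the output file as an input.
        infiles = sorted(
            p for p in subdir.glob(pattern) if p.is_file() and p.name != out_name
        )
        if not infiles:
            continue  # nothing to merge in this subdirectory
        outfile = subdir / out_name
        with open(outfile, "wb") as fo:
            for infile in infiles:
                with open(infile, "rb") as fi:
                    shutil.copyfileobj(fi, fo)  # stream the bytes across in chunks
        written.append(outfile)
    return written


if __name__ == "__main__":
    for path in merge_in_place("."):
        print("wrote", path)
```

Run from the parent directory, this leaves the original *.faa files in place and adds one bigfile next to them in each subdirectory that had matches.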

python

unix

bash

concatenation