Sort two text files while keeping indented lines attached to their parent lines

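I want to compare two log files, generated before and after an implementation change, to see whether the change had any effect. However, the order of the log entries differs from run to run. The log files also contain indented lines at several levels, so a plain sort reorders every line independently. I want each child line to stay with its parent. The indentation uses spaces, not tabs.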

Any help would be greatly appreciated. Either a Windows or a Linux solution is fine.

Sample file:

# This is sample code

Parent1 to be verified

    Child1 to be verified

    Child2 to be verified
        Child21 to be verified
        Child23 to be verified
        Child22 to be verified
            Child221 to be verified

    Child4 to be verified

    Child5 to be verified
        Child53 to be verified
        Child52 to be verified
            Child522 to be verified
            Child521 to be verified

    Child3 to be verified

Short answer (Linux solution):

sed ':a;N;$!ba;s/\n /@/g' test.txt | sort | sed ':a;N;$!ba;s/@/\n /g'

Test it with test.txt:

parent2
  child2-1
    child2-1-1
  child2-2
parent1
  child1-1
  child1-2
    child1-2-1
$ sed ':a;N;$!ba;s/\n /@/g' test.txt | sort | sed ':a;N;$!ba;s/@/\n /g'
parent1
  child1-1
  child1-2
    child1-2-1
parent2
  child2-1
    child2-1-1
  child2-2

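Explanation:

The idea is to replace every newline that is followed by indentation (a space) with a non-newline placeholder, so that each top-level line and its indented lines become one long line that sort moves as a unit. The placeholder must not occur anywhere else in your file (I use @ here; if @ already appears in your file, use another character or even a string), because it has to be turned back into a newline plus a space afterwards. Note that only whole blocks are reordered; the lines inside a block keep their original order.

About the sed commands:

  1. :a creates the label 'a'
  2. N appends the next line to the pattern space
  3. $!ba branches (jumps) to label 'a' unless this is the last line
  4. s substitutes; /\n / is a regex matching a newline followed by a space
  5. /@/ is the unique placeholder that replaces the newline and space
    (if it is not unique in your file, use another character or even a string)
  6. /g matches globally (as many times as possible)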
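
For readers without sed (for example on Windows), the same three steps can be sketched in Python. This is only an illustration of the join, sort, split idea above; the function name sort_blocks and the placeholder parameter are my own choices, and Python's sorted compares by code point, so the order may differ from a locale-aware sort.

```python
def sort_blocks(text: str, placeholder: str = "@") -> str:
    """Sort top-level blocks; indented lines travel with the line above them."""
    if placeholder in text:
        raise ValueError(f"placeholder {placeholder!r} already occurs in the text")
    # Step 1: glue every "newline + space" into the placeholder,
    # so each top-level line and its indented lines form one record
    joined = text.rstrip("\n").replace("\n ", placeholder)
    # Step 2: every remaining line is one whole block; sort the blocks
    blocks = sorted(joined.split("\n"))
    # Step 3: turn the placeholder back into "newline + space"
    return "\n".join(blocks).replace(placeholder, "\n ") + "\n"


if __name__ == "__main__":
    sample = (
        "parent2\n"
        "  child2-1\n"
        "    child2-1-1\n"
        "  child2-2\n"
        "parent1\n"
        "  child1-1\n"
        "  child1-2\n"
        "    child1-2-1\n"
    )
    print(sort_blocks(sample), end="")
```

As with the sed pipeline, only the top-level blocks change position; children are not sorted among themselves.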

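Here is another answer, using Python, that sorts the file hierarchically.

The idea is to attach the parents to each child, so that children under the same parent are sorted together.

See the Python script below: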

"""Attach parent to children in an indentation-structured text"""
from typing import Tuple, List
import sys

# A unique separator to separate the parent and child in each line
SEPARATOR = '@'
# The indentation
INDENT = '    '

def parse_line(line: str) -> Tuple[int, str]:
    """Parse a line into indentation level and its content
    with indentation stripped

    Args:
        line (str): One of the lines from the input file, with newline ending

    Returns:
        Tuple[int, str]: The indentation level and the content with
            indentation stripped.

    Raises:
        ValueError: If the line is incorrectly indented.
    """
    # strip the leading white spaces
    lstripped_line = line.lstrip()
    # get the indentation
    indent = line[:-len(lstripped_line)]

    # Let's check if the indentation is correct
    # meaning it should be N * INDENT
    n = len(indent) // len(INDENT)
    if INDENT * n != indent:
        raise ValueError(f"Wrong indentation of line: {line}")

    return n, lstripped_line.rstrip('\r\n')


def format_text(txtfile: str) -> List[str]:
    """Format the text file by attaching each parent to its children

    Args:
        txtfile (str): The text file

    Returns:
        List[str]: A list of formatted lines
    """
    formatted = []
    par_indent = par_line = None

    with open(txtfile) as ftxt:
        for line in ftxt:
            # skip blank lines so they are not mistaken for level-0 parents
            if not line.strip():
                continue
            # get the indentation level and line without indentation
            indent, line_noindent = parse_line(line)

            # level 1 parents
            if indent == 0:
                par_indent = indent
                par_line = line_noindent
                formatted.append(line_noindent)

            # children
            elif indent > par_indent:
                formatted.append(par_line +
                                 SEPARATOR * (indent - par_indent) +
                                 line_noindent)

                par_indent = indent
                par_line = par_line + SEPARATOR + line_noindent

            # siblings or dedentation
            else:
                # We just need first `indent` parts of parent line as our prefix
                prefix = SEPARATOR.join(par_line.split(SEPARATOR)[:indent])
                formatted.append(prefix + SEPARATOR + line_noindent)
                par_indent = indent
                par_line = prefix + SEPARATOR + line_noindent

    return formatted

def sort_and_revert(lines: List[str]):
    """Sort the formatted lines and revert the leading parents
    into indentations

    Args:
        lines (List[str]): list of formatted lines

    Prints:
        The sorted and reverted lines
    """
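    # Compare path components rather than raw strings, so that a name that is
    # a prefix of a sibling (e.g. "parent1" vs "parent10") keeps its children
    sorted_lines = sorted(lines, key=lambda line: line.split(SEPARATOR))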
    for line in sorted_lines:
        if SEPARATOR not in line:
            print(line)
        else:
            leading, _, orig_line = line.rpartition(SEPARATOR)
            print(INDENT * (leading.count(SEPARATOR) + 1) + orig_line)

def main():
    """Main entry"""
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <file>")
        sys.exit(1)

    formatted = format_text(sys.argv[1])
    sort_and_revert(formatted)

if __name__ == "__main__":
    main()

Save this as format.py. Suppose we have a test file, test.txt:

parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-2
        child1-2-2
        child1-2-1
    child1-1

Let's test it:

$ python format.py test.txt
parent1
    child1-1
    child1-2
        child1-2-1
        child1-2-2
parent2
    child2-1
        child2-1-1
    child2-2
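
Coming back to the original goal of comparing two logs: once both files have been sorted this way, the results can be compared directly. Below is a minimal sketch using Python's standard difflib module. The function name diff_sorted and the labels "before" and "after" are placeholders of mine, not part of the script above, and the inputs are assumed to be already-sorted lines without trailing newlines.

```python
import difflib
from typing import List


def diff_sorted(before: List[str], after: List[str]) -> List[str]:
    """Return unified-diff lines for two already-sorted lists of lines."""
    return list(difflib.unified_diff(
        before, after, fromfile="before", tofile="after", lineterm=""))


if __name__ == "__main__":
    old = ["parent1", "    child1-1", "    child1-2"]
    new = ["parent1", "    child1-1", "    child1-3"]
    # Lines starting with "-" exist only in the old log, "+" only in the new one
    print("\n".join(diff_sorted(old, new)))
```

An empty result means the two sorted logs are identical.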

If you are wondering how the format_text function formats the text, here are the intermediate results, which also explain why the file can be sorted the way we want:

parent2
parent2@child2-1
parent2@child2-1@child2-1-1
parent2@child2-2
parent1
parent1@child1-2
parent1@child1-2@child1-2-2
parent1@child1-2@child1-2-1
parent1@child1-1

As you can see, each child has its parents attached, all the way up to the root, so children under the same parent are sorted together.
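
The same idea can be shown more compactly by keeping the chain of ancestors as a tuple instead of a joined string, which also removes the need for a unique separator. This is a sketch of a variation, not the script above: the names path_keys and render are hypothetical, and it assumes four-space indentation and skips blank lines.

```python
from typing import List, Tuple

INDENT = "    "  # assumption: four spaces per level


def path_keys(lines: List[str]) -> List[Tuple[str, ...]]:
    """For every non-blank line, return the names from the root down to it."""
    stack: List[str] = []  # current chain of ancestors
    keys: List[Tuple[str, ...]] = []
    for line in lines:
        if not line.strip():
            continue
        content = line.lstrip(" ")
        level = (len(line) - len(content)) // len(INDENT)
        # Drop everything at this depth or deeper, then add the current line
        del stack[level:]
        stack.append(content.rstrip("\r\n"))
        keys.append(tuple(stack))
    return keys


def render(keys: List[Tuple[str, ...]]) -> List[str]:
    """Sort the paths and turn each one back into an indented line."""
    return [INDENT * (len(key) - 1) + key[-1] for key in sorted(keys)]


if __name__ == "__main__":
    text = """parent2
    child2-1
        child2-1-1
    child2-2
parent1
    child1-2
        child1-2-2
        child1-2-1
    child1-1
"""
    print("\n".join(render(path_keys(text.splitlines()))))
```

Tuples compare element by element, so a parent always sorts immediately before its own children, and siblings are ordered by name at every level.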