Sort Nested Markdown List?

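I'm looking for a script, method, or tool for sorting nested Markdown lists. I use Sublime Text, which has a built-in sort lines feature, but that feature destroys the order of any nested list. For example, if I want to sort: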

* Zoo Animals
    * Herbivores
        * Zebra
        * Gazelle
    * Carnivores
        * Tiger
        * Lion
    * Omnivores
        * Gorilla
        * Baboon
        * Chimpanzee
* Domestic Animals
    * Canines
        * German Shepherd
        * Cocker Spaniel

Using Sublime's sort lines function, I get:

        * Baboon
        * Chimpanzee
        * Cocker Spaniel
        * Gazelle
        * German Shepherd
        * Gorilla
        * Lion
        * Tiger
        * Zebra
    * Canines
    * Carnivores
    * Herbivores
    * Omnivores
* Domestic Animals
* Zoo Animals

Obviously, this is not what I want. What I want is a "scoped sort" that sorts relative to each bullet level without destroying the nesting relationships, like so:

* Domestic Animals
    * Canines
        * Cocker Spaniel
        * German Shepherd
* Zoo Animals
    * Carnivores
        * Lion
        * Tiger
    * Herbivores
        * Gazelle
        * Zebra
    * Omnivores
        * Baboon
        * Chimpanzee
        * Gorilla

Here are some things I've looked into and my thoughts on each:

How would you go about sorting a large nested Markdown list?

Update #1:

@J4G created a great Atom package that solves the original sorting problem; see his answer for the link.

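The previous list is a simple list without code blocks or numbered lists. However, a real-life Markdown list also contains code blocks, numbered lists, and lines starting with special characters, all nested within the list like so: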

* Commands
    * Migrations
        * `rake db:migrate` - push all migrations to the database
        * 'STEP=3' - revert the last 3 migrations
    * `Rails`
        * `c` - start rails console, run code from your app!
    * `Rake`
        * Rake Task
        ```ruby
        desc 'process csv'
        task process_csv: :environment do
            Node.process_csv
        end
        ```
* Package Upgrade Status:
    1. Install Package
    2. Attach Plugin
    3. Review Installation
    ~~~
    |Install|Status|
    |Yes|Pending|
    ~~~

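After sorting, I think the Markdown list above should be returned unchanged, since the backticks and quotes have no sorting significance and the code blocks and numbered lists were already created in the proper order.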

Here is one way you could do this with Ruby. Assume the string is held by the variable str.

Code

def sort_indented(str)
  arr = str.lines.map { |s| [indentation(s), s.chomp] }
  indent_offset = arr.map(&:first).uniq.sort.each_with_index.
    with_object({}) { |(indent, i),h| h[indent] = i }
  dim_size = indent_offset.size 
  prev = []
  arr.map do |indent, s|
    a = ['']*dim_size
    offset = indent_offset[indent]
    a[offset] = s
    a[0,offset] = prev.first(offset)
    prev = a
    a
  end.sort.map { |a| a[a.rindex { |s| s != '' }] }.join("\n") 
end

def indentation(s)
  s[/^\s*/].size
end

Example

str =<<THE_END 
* Zoo Animals
    * Herbivores
        * Zebra
        * Gazelle
    * Carnivores
        * Tiger
        * Lion
    * Omnivores
        * Gorilla
        * Baboon
        * Chimpanzee
* Domestic Animals
    * Canines
        * German Shepherd
        * Cocker Spaniel
THE_END

In Ruby, this construct for defining a string literal is called a "here document", or "here doc".

puts sort_indented(str)

* Domestic Animals
    * Canines
        * Cocker Spaniel
        * German Shepherd
* Zoo Animals
    * Carnivores
        * Lion
        * Tiger
    * Herbivores
        * Gazelle
        * Zebra
    * Omnivores
        * Baboon
        * Chimpanzee
        * Gorilla

General approach

When Ruby sorts an array of arrays, such as:

a = [1,2,4]
b = [4,5,6]
c = [1,2,3,5]
[a, b, c]

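it first sorts by the first element of each array. Since a and c both have the element 1 at offset zero, and b has a 4 at that offset, a and c both come before b in the sorted array. Ruby then looks at the second element of a and c to break the tie. As they both equal 2, Ruby goes on to the third element, where the tie is broken: c precedes a because 3 < 4.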
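To make the comparison rule concrete, here is a small sketch using the same three arrays. The variable names first_vs_third and sorted are just for illustration.

```ruby
a = [1, 2, 4]
b = [4, 5, 6]
c = [1, 2, 3, 5]

# Array#<=> compares element by element and stops at the first difference.
# a and c agree at offsets 0 and 1; at offset 2 we have 4 vs 3, so a > c.
first_vs_third = a <=> c

# Array#sort uses the same comparison, so c comes first and b last.
sorted = [a, b, c].sort

p first_vs_third  # 1
p sorted          # [[1, 2, 3, 5], [1, 2, 4], [4, 5, 6]]
```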

I will convert arr into the following array:

result =     
[["* Zoo Animals"     , ""                , ""],
 ["* Zoo Animals"     , "    * Herbivores", ""],
 ["* Zoo Animals"     , "    * Herbivores", "        * Zebra"],
 ["* Zoo Animals"     , "    * Herbivores", "        * Gazelle"],
 ["* Zoo Animals"     , "    * Carnivores", ""],
 ["* Zoo Animals"     , "    * Carnivores", "        * Tiger"],
 ["* Zoo Animals"     , "    * Carnivores", "        * Lion"], 
 ["* Zoo Animals"     , "    * Omnivores" , ""],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Gorilla"],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Baboon"],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Chimpanzee"],
 ["* Domestic Animals", ""                , ""],
 ["* Domestic Animals", "    * Canines"   , ""],
 ["* Domestic Animals", "    * Canines"   , "        * German Shepherd"],
 ["* Domestic Animals", "    * Canines"   , "        * Cocker Spaniel"]]

In this form we can sort:

result.sort
  #=> [["* Domestic Animals", "", ""],
  #    ["* Domestic Animals", "    * Canines", ""],
  #    ["* Domestic Animals", "    * Canines", "        * Cocker Spaniel"],
  #    ["* Domestic Animals", "    * Canines", "        * German Shepherd"],
  #    ["* Zoo Animals", "", ""], ["* Zoo Animals", "    * Carnivores", ""],
  #    ["* Zoo Animals", "    * Carnivores", "        * Lion"],
  #    ["* Zoo Animals", "    * Carnivores", "        * Tiger"],
  #    ["* Zoo Animals", "    * Herbivores", ""],
  #    ["* Zoo Animals", "    * Herbivores", "        * Gazelle"],
  #    ["* Zoo Animals", "    * Herbivores", "        * Zebra"],
  #    ["* Zoo Animals", "    * Omnivores", ""],
  #    ["* Zoo Animals", "    * Omnivores", "        * Baboon"],
  #    ["* Zoo Animals", "    * Omnivores", "        * Chimpanzee"],
  #    ["* Zoo Animals", "    * Omnivores", "        * Gorilla"]] 

The final step is to extract the last non-empty string from each element of the sorted array.
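Two details of this approach may be worth spelling out: a parent row sorts before its children because the empty string sorts before any non-empty string, and the original line is always the last non-empty slot of its row. A minimal sketch (the helper name last_line and the sample rows are mine, for illustration only):

```ruby
parent = ["* Zoo Animals", "", ""]
child  = ["* Zoo Animals", "    * Carnivores", ""]
leaf   = ["* Zoo Animals", "    * Carnivores", "        * Lion"]

# "" sorts before every non-empty string, so a parent row precedes
# the rows of its children, and a child precedes its own leaves.
parent_first = (parent <=> child) == -1
child_first  = (child <=> leaf) == -1

# Hypothetical helper: the last non-empty slot holds the original line.
def last_line(row)
  row[row.rindex { |s| s != '' }]
end

p parent_first      # true
p child_first       # true
p last_line(child)  # "    * Carnivores"
```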

Detailed explanation

First we define a helper method to compute the indentation of a string:

def indentation(s)
  s[/^\s*/].size
end

For example,

            #1234
indentation("    * Herbivores")
  #=> 4

Now let's convert the string to an array of lines:

a = str.lines
  #=> ["* Zoo Animals\n",
  #    "    * Herbivores\n",
  #    "        * Zebra\n",
  #    "        * Gazelle\n",
  #    "    * Carnivores\n",
  #    "        * Tiger\n",
  #    "        * Lion\n",
  #    "    * Omnivores\n",
  #    "        * Gorilla\n",
  #    "        * Baboon\n",
  #    "        * Chimpanzee\n",
  #    "* Domestic Animals\n",
  #    "    * Canines\n",
  #    "        * German Shepherd\n",
  #    "        * Cocker Spaniel\n"]

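Next, we convert a to an array of pairs. The first element of each pair is the line's indentation; the second is the line itself (a string) with the trailing newline chopped off: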

arr = a.map { |s| [indentation(s), s.chomp] }
  # => [[0, "* Zoo Animals"],        [4, "    * Herbivores"],
  #     [8, "        * Zebra"],      [8, "        * Gazelle"],
  #     [4, "    * Carnivores"],     [8, "        * Tiger"],
  #     [8, "        * Lion"],       [4, "    * Omnivores"],
  #     [8, "        * Gorilla"],    [8, "        * Baboon"],
  #     [8, "        * Chimpanzee"], [0, "* Domestic Animals"],
  #     [4, "    * Canines"],        [8, "        * German Shepherd"],
  #     [8, "        * Cocker Spaniel"]] 

In fact, we would perform the first two operations in one step:

arr = str.lines.map { |s| [indentation(s), s.chomp] }

Next, we need to know which indents are used:

indents = arr.map { |pair| pair.first }
  #=> [0, 4, 8, 8, 4, 8, 8, 4, 8, 8, 8, 0, 4, 8, 8] 

We could write this more economically like so:

indents = arr.map(&:first)

To find the unique indents we write:

unique = indents.uniq
  #=> [0, 4, 8] 

In case they are not in order, we should sort them:

sorted = unique.sort
  #=> [0, 4, 8] 

Each of the three indents corresponds to an offset in the arrays we will sort, so it is convenient to construct a hash:

indent_offset = sorted.each_with_index.with_object({}) do |(indent, i),h|
  h[indent] = i
end
  #=> {0=>0, 4=>1, 8=>2}

Again, we can perform this calculation by combining several steps:

indent_offset = arr.map(&:first).uniq.sort.each_with_index.
  with_object({}) { |(indent, i),h| h[indent] = i }
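As a side note, I believe the same mapping can be built a little more compactly with Enumerable#to_h, since each_with_index without a block yields [element, index] pairs. This is only an alternative sketch, not a change to the method above; the indents array is copied from the earlier step.

```ruby
indents = [0, 4, 8, 8, 4, 8, 8, 4, 8, 8, 8, 0, 4, 8, 8]

# each_with_index yields [indent, position] pairs, which to_h turns
# directly into a hash mapping each indent to its offset.
indent_offset = indents.uniq.sort.each_with_index.to_h

p indent_offset  # maps 0 to 0, 4 to 1 and 8 to 2
```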

Next we replace each element of arr with a three-element array of strings:

dim_size = indent_offset.size 
  #=> 3
prev = []
result = arr.map do |indent, s|
  a = ['']*dim_size
  offset = indent_offset[indent]
  a[offset] = s
  a[0,offset] = prev.first(offset)
  prev = a
  a
end

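The result of this calculation is the first array I gave under General approach above. We can now sort result to obtain the second array I gave under General approach:

sorted = result.sort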

The last two steps are to replace each element of sorted (a three-element array) with its last non-empty string:

sorted_strings = sorted.map { |a| a[a.rindex { |s| s != '' }] }

and then join those strings into a single string:

sorted_strings.join("\n")
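Note that the method above treats every line as a sortable item, so the code blocks and numbered lists in the question's second example would be reordered along with everything else. Below is a rough sketch of a tree-based variant, which is not the method described above. It assumes that only lines starting with -, * or + are list items and that any other line belongs to the most recent item. All names (BULLET, Node, sort_key, render_sorted, sort_scoped) are made up for illustration, and ignoring backticks, quotes and letter case when comparing is my own guess at what "no sorting significance" should mean.

```ruby
# Hypothetical sketch: every name here is invented for illustration.
BULLET = /\A(\s*)[-*+]\s/ # a line is an item only if it starts with -, * or +
Node = Struct.new(:indent, :lines, :children)

# Assumed comparison rule: drop the bullet, backticks and quotes, ignore case.
def sort_key(line)
  line.sub(BULLET, '').delete("`'\"").downcase
end

# Sort siblings (original position breaks ties), then emit each item's
# own lines followed by its recursively sorted children.
def render_sorted(nodes)
  nodes.each_with_index
       .sort_by { |node, i| [sort_key(node.lines.first), i] }
       .flat_map { |node, _| node.lines + render_sorted(node.children) }
end

def sort_scoped(str)
  root = Node.new(-1, [], [])
  stack = [root]
  str.lines.map(&:chomp).each do |line|
    m = BULLET.match(line)
    if m.nil?
      # Code fence, numbered item, table row, etc.: keep it attached
      # to the most recent item, in its original order.
      stack.last.lines << line
      next
    end
    indent = m[1].size
    # Close every open item that is at the same depth or deeper.
    stack.pop while stack.last.indent >= indent
    node = Node.new(indent, [line], [])
    stack.last.children << node
    stack << node
  end
  (root.lines + render_sorted(root.children)).join("\n")
end
```

Calling sort_scoped(str) on the zoo example should give the same result as sort_indented(str). One known limitation of this sketch: a line inside a code block that happens to start with a bullet character would be mistaken for a list item.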

If you are interested in using Atom (I highly recommend it as a free alternative to Sublime), I just made a package to do what you need.

https://atom.io/packages/markdown-sort-list

If anyone is still interested, I created a VS Code extension for this.

It can not only do the scoped sort, but it can also remove duplicate values, sort nested items recursively, sort case-insensitively, and so on.

It also satisfies OP's other request of allowing content underneath a list item.

The code can be found on GitHub, and here's the file for the actual implementation.

Here it is in text form, a bit outdated since the newer code adds more specific options:

// @ts-check
const getValuesRegex = /^(?<indentation>\s*)(?<char>[-*+])/;

/**
 *  @typedef {object} Options
 *  @property {boolean} [recursive]
 *  @property {boolean} [reverse]
 *  @property {boolean} [unique]
 *  @property {boolean} [caseInsensitive]
 */

/**
 * @param {string} a
 * @param {string} b
 */
function stringSortCaseInsensitive(a, b) {
    const lowerA = a.toLowerCase();
    const lowerB = b.toLowerCase();

    if (lowerA < lowerB) {
        return -1;
    } else if (lowerA > lowerB) {
        return 1;
    }

    return 0;
}

/** @param {string} str **/
function calculateSpaceLength(str) {
    // Use a global regex so every tab is expanded, not just the first one.
    return str.replace(/\t/g, '    ').length;
}

/**
 * @param {string[]} sections
 * @param {Options} options
 */
function getModifiedSections(sections, options) {
    if (options.caseInsensitive) {
        sections.sort(stringSortCaseInsensitive);
    } else {
        sections.sort();
    }

    if (options.reverse) {
        sections.reverse();
    }

    if (options.unique) {
        /** @type {Set<string>} */
        const haveSeen = new Set();
        const unique = [];

        for (const section of sections) {
            const adjustedSection = options.caseInsensitive
                ? section.toLowerCase()
                : section;

            if (!haveSeen.has(adjustedSection)) {
                unique.push(section);
                haveSeen.add(adjustedSection);
            }
        }

        return unique;
    }

    return sections;
}

/**
 * @param {string[]} lines
 * @param {number} index
 * @param {Options} options
 */
function sortInnerSection(lines, index, options) {
    /** @type {string[]} */
    const sections = [];
    let currentIndentation = '';
    let amountAdded = 0;

    for (let i = index; i < lines.length; i++) {
        const line = lines[i];
        const match = line.match(getValuesRegex);
        const indentation = match?.groups?.indentation || '';
        const listChar = match?.groups?.char;

        if (!currentIndentation && indentation) {
            currentIndentation = indentation;
        }

        const indentationLength = calculateSpaceLength(indentation);
        const currentIndentationLength =
            calculateSpaceLength(currentIndentation);

        if (!listChar) {
            amountAdded++;
            sections[sections.length - 1] += '\n' + line;
        } else if (indentationLength === currentIndentationLength) {
            amountAdded++;
            sections.push(line);
        } else if (indentationLength > currentIndentationLength) {
            const child = sortInnerSection(lines, i, options);
            sections[sections.length - 1] += '\n' + child.content;
            i += child.amountAdded - 1;
            amountAdded += child.amountAdded;
        } else {
            break;
        }
    }

    return {
        content: getModifiedSections(sections, options).join('\n'),
        amountAdded,
    };
}

/**
 *  @param {string} text
 *  @param {Options} options
 */
function sort(text, options) {
    const lines = text.trimEnd().split(/\r?\n/);
    let sections = [];
    let currentSection = [];
    let currentIndentation = '';

    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        const match = line.match(getValuesRegex);
        const indentation = match?.groups?.indentation || '';
        const listChar = match?.groups?.char;

        if (currentSection.length && listChar) {
            if (indentation === currentIndentation) {
                sections.push(currentSection.join('\n'));
                currentSection = [line];
            } else if (options.recursive) {
                const child = sortInnerSection(lines, i, options);
                currentSection.push(child.content);
                i += child.amountAdded - 1;
            } else {
                currentSection.push(line);
            }
        } else {
            currentSection.push(line);
        }
    }

    if (currentSection) {
        sections.push(currentSection.join('\n'));
    }

    return getModifiedSections(sections, options).join('\n');
}

module.exports = sort;