Robustly generate anchors in Markdown

Question

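I have some Ruby code that automatically generates a table of contents for GitHub Flavored Markdown. If other Markdown flavors differ in ways that matter for this question, I would like to know about those too.

At the moment, this code works 99% of the time: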

  def header_to_anchor
    @header
      .downcase
      .gsub(/[^a-z\d\- ]+/, "")
      .gsub(/ /, "-")
  end

This is based on a comment I found in a GitHub issue here. It says:

The code that creates the anchors is here: https://github.com/jch/html-pipeline/blob/master/lib/html/pipeline/toc_filter.rb

It downcases the string

remove anything that is not a letter, number, space or hyphen (see the source for how Unicode is handled)

changes any space to a hyphen.

If that is not unique, add "-1", "-2", "-3",... to make it unique

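For my purposes, I don't need to handle uniqueness.

This worked well until I hit another edge case that fails, namely this heading in my Markdown document: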

### shunit2/_shared.sh

The anchor my code generates is:

* [shunit2/_shared.sh](#shunit2sharedsh)

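which creates another broken link, at least as far as GitHub Flavored Markdown is concerned.

I have also seen this answer here, but the rules given there don't seem very reliable either.

Does anyone know of authoritative documentation that explains these anchor-generation rules?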

Answer 1

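OK, the confusion here arises¹ from a discrepancy between the Ruby regular expression in the code the GitHub comment refers to and what the comment itself says. The code uses this regular expression: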

PUNCTUATION_REGEXP = RUBY_VERSION > '1.9' ? /[^\p{Word}\- ]/u : /[^\w\- ]/

to remove "punctuation". Ruby regular expressions are documented here.

Meanwhile, \p{Word} actually means alphanumeric characters plus the underscore.

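So the comment in the GitHub issue, "remove anything that is not a letter, number, space or hyphen (see the source for how Unicode is handled)", is a misreading of the code: the code also keeps underscores.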
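To make the difference concrete, here is a small sketch that runs both patterns against the heading from the question. The variable names are made up for illustration, and the second pattern is only the Ruby 1.9+ branch of the regular expression quoted above.

```ruby
# Pattern from the question: keeps only a-z, digits, hyphens and spaces.
question_regexp = /[^a-z\d\- ]+/
# Pattern quoted from toc_filter.rb (Ruby 1.9+ branch): also keeps underscores.
source_regexp = /[^\p{Word}\- ]/u

header = "shunit2/_shared.sh"

puts header.gsub(question_regexp, "")  # prints "shunit2sharedsh"
puts header.gsub(source_regexp, "")    # prints "shunit2_sharedsh"
```

The only character the two patterns disagree on here is the underscore, which is what breaks the link in the question.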

The correct rules should be:

It downcases the string

Remove anything that is not a letter, number, space, underscore or hyphen (see the source for how Unicode is handled)

Change any space to a hyphen.

If that is not unique, add "-1", "-2", "-3",... to make it unique
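The corrected rules could be sketched in Ruby roughly as follows. This is a minimal sketch, assuming the regular expression quoted above really is what GitHub applies. Both method names, header_to_anchor (taking the heading as an argument) and unique_anchors, are hypothetical helpers, and the uniqueness step is a simple counter-based reading of the "-1", "-2" rule rather than GitHub's actual implementation.

```ruby
# Hypothetical helper: applies the corrected rules to a single heading.
def header_to_anchor(header)
  header
    .downcase
    .gsub(/[^\p{Word}\- ]/u, "")  # drop everything except word chars, hyphens, spaces
    .gsub(" ", "-")               # spaces become hyphens
end

# Hypothetical helper: appends -1, -2, ... to repeated anchors.
# Assumption: a plain per-anchor counter; it does not guard against a
# generated suffix colliding with another heading's anchor.
def unique_anchors(headers)
  seen = Hash.new(0)
  headers.map do |header|
    base = header_to_anchor(header)
    count = seen[base]
    seen[base] += 1
    count.zero? ? base : "#{base}-#{count}"
  end
end

puts header_to_anchor("shunit2/_shared.sh")    # prints "shunit2_sharedsh"
p unique_anchors(["Usage", "Usage", "Usage"])  # prints ["usage", "usage-1", "usage-2"]
```

With the underscore preserved, the table-of-contents entry for the failing heading becomes [shunit2/_shared.sh](#shunit2_sharedsh).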

¹ Assuming, of course, that the toc_filter.rb file mentioned in the GitHub issue really is the "source of truth" and not an implementation of rules defined elsewhere.
