Getting h1 from markdown via Python's pandoc library
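I am writing a Python batch script that processes many markdown files and extracts the h1-like text from each, in order to generate a 'title' metadata variable (I forgot to add 'title' to the frontmatter). I am not using it as a pandoc filter.
So I want to process these files with the pandoc Python library, but I am not familiar with it and do not know how to get only the h1.
    content = pandoc.read(post.content)
'content' is in pandoc's native format. I see something like this:
    (Pdb) content
    Pandoc(Meta({}), [Header(1, ('foobar', [], []), [Str('foobar:')]), Para(...
I want to get the h1 as plain text.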
One could also try to configure pandoc to do this for us. Here is what the manual says about the --shift-heading-level-by option:
    --shift-heading-level-by=NUMBER
        Shift heading levels by a positive or negative integer. For example, with --shift-heading-level-by=-1, level 2 headings become level 1 headings, and level 3 headings become level 2 headings. Headings cannot have a level less than 1, so a heading that would be shifted below level 1 becomes a regular paragraph. Exception: with a shift of -N, a level-N heading at the beginning of the document replaces the metadata title. --shift-heading-level-by=-1 is a good choice when converting HTML or Markdown documents that use an initial level-1 heading for the document title and level-2+ headings for sections. --shift-heading-level-by=1 may be a good choice for converting Markdown documents that use level-1 headings for sections to HTML, since pandoc uses a level-1 heading to render the document title.
So running pandoc with --shift-heading-level-by=-1 may be enough for your needs.
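As a rough illustration of that suggestion, the sketch below calls the pandoc executable from a batch script. The function names build_pandoc_command and promote_h1_to_title are hypothetical, and it assumes pandoc is installed and on PATH. It also assumes a markdown-to-markdown conversion with --standalone is acceptable, so that the title should end up in the output's metadata block. Note that pandoc will re-serialize the rest of the document, and every other heading moves up one level as well.

```python
import subprocess


def build_pandoc_command(src, dst):
    """Return the pandoc argument list for a markdown-to-markdown conversion."""
    return [
        "pandoc",
        "--from=markdown",
        "--to=markdown",
        "--standalone",                 # include the metadata block in the output
        "--shift-heading-level-by=-1",  # a leading level-1 heading replaces the title
        f"--output={dst}",
        src,
    ]


def promote_h1_to_title(src, dst):
    """Run pandoc on one file (assumes the pandoc executable is on PATH)."""
    subprocess.run(build_pandoc_command(src, dst), check=True)
```

Calling promote_h1_to_title('post.md', 'post.out.md') for each file would then do the conversion; writing to a separate output file makes it easy to inspect the result before replacing the originals.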
I have the following snippet, which works for headers written with # or with an ======= underline.
    import pandoc
    from pandoc.types import *

    with open('README.md') as f:
        content = pandoc.read(f.read())
    # Or use your own content here instead.

    headers = []
    for elt in pandoc.iter(content):
        if isinstance(elt, Header):
            if elt[0] == 1:  # level-1 header; remove this check if you want all headers
                headers.append(elt[1][0])  # the header's identifier, e.g. 'foobar'
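Or, if you want the exact string, including capital letters and so on:
    for elt in pandoc.iter(content):
        if isinstance(elt, Header):
            if elt[0] == 1:  # level-1 header; remove this check if you want all headers
                # elt[-1] is the header's list of inline elements; render it back to text
                headers.append(pandoc.write(elt[-1]).strip())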
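Once the header text is available, the remaining step from the question is to store it as 'title' in the frontmatter. The following is only a minimal sketch using plain string handling; add_title is a hypothetical helper, and it assumes the frontmatter is a YAML block opened by a '---' line at the very top of the file. The title is quoted with json.dumps, on the assumption that a JSON string is also a valid YAML scalar for ordinary titles, so a value such as 'foobar:' is not misread as a mapping.

```python
import json


def add_title(markdown_text, title):
    """Return markdown_text with a YAML title entry added to its frontmatter."""
    # Quote the title so characters such as ':' or '#' stay part of the value.
    title_line = "title: " + json.dumps(title, ensure_ascii=False)
    if markdown_text.startswith("---\n"):
        # Existing frontmatter: insert the title right after the opening line.
        return "---\n" + title_line + "\n" + markdown_text[4:]
    # No frontmatter yet: create a minimal block above the content.
    return "---\n" + title_line + "\n---\n\n" + markdown_text
```

For example, add_title(text, headers[0]) could be applied to each file that yielded at least one level-1 header. The sketch does not check whether a title entry already exists, so files that already have one would need to be skipped first.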