Pandoc Filter via Panflute not Working as Expected

Problem

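For a Markdown document, I want to filter out all sections whose header title is not in the list to_keep. A section consists of a header and the body that follows it, up to the next section or the end of the document. For simplicity, assume the document only has level 1 headers.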

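When I make a simple distinction based on whether the current element is preceded by a header in to_keep, and either return None or return [], I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (code, example files and the complete error message are at the end).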

I am using Python 3.7.4, panflute 1.12.5 and pandoc 2.2.3.2.

Question

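If I make a more fine-grained distinction about when to return [], it works (function action_working). My question is: why is this more fine-grained distinction necessary? My solution seems to work, but that may well be accidental... How can I get this to work properly?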

Files

Error

Traceback (most recent call last):
  File "filter.py", line 42, in <module>
    main()
  File "filter.py", line 39, in main
    return run_filter(action_not_working, doc=doc)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
    return run_filters([action], *args, **kwargs)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
    dump(doc, output_stream=output_stream)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
    raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1

input.md

# English 
Some cool english text this is!

# Deutsch 
Dies ist die deutsche Übersetzung!

# Sources
Some source.

# Priority
**Medium** *[Low | Medium | High]*

# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*

# Interested Persons (mailing list)
- Franz, Heinz, Karl

filter.py

from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep. 
    If it is, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []

def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []

def update_keep(elem):
    '''If the element is a header, update keep_current.'''
    global to_keep, keep_current
    if isinstance(elem, Header):
        # Keep if the title of a section is in to_keep
        keep_current = stringify(elem) in to_keep


def main(doc=None):
    return run_filter(action_not_working, doc=doc) 

if __name__ == '__main__':
    main()

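I think what happens is that panflute calls the action on all elements, including the Doc root element. Whether Doc is visited first or last, keep_current is False at that moment (it starts out False, and the last header, "Interested Persons (mailing list)", also sets it to False), so action_not_working replaces the Doc with a list. That causes the error message you see, because panflute expects the root node to always be a Doc.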
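To see why the root node matters, here is a minimal model of the traversal that does not use panflute at all. Node, walk, make_action and build_doc are hypothetical stand-ins, not panflute's API; the model assumes that children are visited before their parent and that a returned list is spliced into the parent, so [] deletes the element (which is how panflute's Element.walk appears to behave).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass
class Node:
    """Hypothetical stand-in for a panflute element: a kind, some text, children."""
    kind: str
    text: str = ''
    children: List['Node'] = field(default_factory=list)


Result = Union[Node, list]
Action = Callable[[Node], Optional[Result]]


def walk(node: Node, action: Action) -> Result:
    """Bottom-up walk: children first, then the node itself."""
    new_children: List[Node] = []
    for child in node.children:
        result = walk(child, action)
        # A list result is spliced into the parent, so [] deletes the child
        new_children.extend(result if isinstance(result, list) else [result])
    node.children = new_children
    altered = action(node)
    return node if altered is None else altered


def make_action(to_keep: List[str], spare_root: bool) -> Action:
    """Mimic action_not_working; optionally never delete the Doc root."""
    state = {'keep': False}

    def action(node: Node) -> Optional[Result]:
        if node.kind == 'Header':
            state['keep'] = node.text in to_keep
        if state['keep'] or (spare_root and node.kind == 'Doc'):
            return None  # keep the node unchanged
        return []        # delete the node
    return action


def build_doc() -> Node:
    """Block-level skeleton of input.md (headers carry their title as text)."""
    sections = [('English', 'Para'), ('Deutsch', 'Para'), ('Sources', 'Para'),
                ('Priority', 'Para'), ('Status', 'Para'),
                ('Interested Persons (mailing list)', 'BulletList')]
    blocks = []
    for title, body_kind in sections:
        blocks.append(Node('Header', title))
        blocks.append(Node(body_kind, 'body of ' + title))
    return Node('Doc', children=blocks)


TO_KEEP = ['Deutsch', 'Status']

if __name__ == '__main__':
    print(walk(build_doc(), make_action(TO_KEEP, spare_root=False)))  # prints []
    fixed = walk(build_doc(), make_action(TO_KEEP, spare_root=True))
    print([c.text for c in fixed.children])
```

With spare_root=False the whole walk returns [], the same kind of list that panflute.dump rejects; sparing the root leaves a Doc that contains only the Deutsch and Status sections.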

The updated filter only acts on Header, Para and BulletList elements, so the Doc root node is left unchanged. You may want to use something more general, such as isinstance(elem, Block).


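Another approach is to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over all of its top-level blocks, remove the unwanted ones, and then dump the resulting document back to the output stream.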

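from panflute import Header, dump, load, stringify

to_keep = ['Deutsch', 'Status']
keep_current = False

doc = load()  # read the pandoc JSON AST from stdin
kept_blocks = []
for top_level_block in doc.content:
    # A level-1 header starts a new section: decide whether to keep it
    if isinstance(top_level_block, Header):
        keep_current = stringify(top_level_block) in to_keep
    # Collect every block that belongs to a section we want to keep
    if keep_current:
        kept_blocks.append(top_level_block)

# Replace the top-level blocks with the kept ones and write the result to stdout
doc.content = kept_blocks
dump(doc)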