Pandoc Filter via Panflute not Working as Expected

Question

Problem

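For a Markdown document, I want to filter out all sections whose header title is not in the list to_keep. A section consists of a header and its body, which runs until the next section or the end of the document. For simplicity, assume the document only has level 1 headers.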

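When I make a simple distinction on whether the current element is preceded by a header in to_keep, and return None or return [] accordingly, I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (code, example file, and the complete error message are at the end).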

I am using Python 3.7.4, panflute 1.12.5, and pandoc 2.2.3.2.

Question

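With a more fine-grained distinction of when to return [], it works (function action_working). My question is: why is this more fine-grained distinction needed? My solution seems to work, but that may well be by accident... How can I get this to work properly?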

Files

Error

Traceback (most recent call last):
  File "filter.py", line 42, in <module>
    main()
  File "filter.py", line 39, in main
    return run_filter(action_not_working, doc=doc)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
    return run_filters([action], *args, **kwargs)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
    dump(doc, output_stream=output_stream)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
    raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1

input.md

# English 
Some cool english text this is!

# Deutsch 
Dies ist die deutsche Übersetzung!

# Sources
Some source.

# Priority
**Medium** *[Low | Medium | High]*

# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*

# Interested Persons (mailing list)
- Franz, Heinz, Karl

filter.py

from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep. 
    If it is, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []

def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []

def update_keep(elem):
    '''If the element is a header, update keep_current.'''
    global to_keep, keep_current
    if isinstance(elem, Header):
        # Keep the section if its title is in to_keep
        keep_current = stringify(elem) in to_keep


def main(doc=None):
    return run_filter(action_not_working, doc=doc) 

if __name__ == '__main__':
    main()

Answer 1

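I think what is happening is that panflute calls the action on all elements, including the Doc root element. If keep_current is False when the Doc element is visited, the root is replaced by a list. This leads to the error message you are seeing, because panflute expects the root node to always be there.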

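The updated filter only acts on Header, Para, and BulletList elements, so the Doc root node is left untouched. You will probably want to use something more generic instead, such as isinstance(elem, Block).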
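To make the explanation above concrete, here is a minimal sketch of the traversal using only the standard library. The Node class, make_doc, and make_action are hypothetical stand-ins written for this illustration, not panflute's real classes or implementation; the sketch assumes the walk visits children first and then applies the action to the node itself, root included.

```python
class Node:
    """Toy stand-in for a panflute element (hypothetical, not the real class)."""

    def __init__(self, kind, text="", children=None):
        self.kind = kind
        self.text = text
        self.children = children or []

    def walk(self, action):
        # Visit the children first; a list result is spliced into the parent,
        # so returning [] removes the child.
        new_children = []
        for child in self.children:
            result = child.walk(action)
            new_children.extend(result if isinstance(result, list) else [result])
        self.children = new_children
        # Then apply the action to this node itself, the root included.
        altered = action(self)
        return self if altered is None else altered


def make_doc():
    # Shortened version of input.md: three level 1 sections.
    return Node("Doc", children=[
        Node("Header", "English"), Node("Para", "Some cool english text this is!"),
        Node("Header", "Deutsch"), Node("Para", "Dies ist die deutsche Übersetzung!"),
        Node("Header", "Sources"), Node("Para", "Some source."),
    ])


def make_action(to_keep, skip_root):
    state = {"keep": False}

    def action(node):
        if node.kind == "Header":
            state["keep"] = node.text in to_keep
        if state["keep"]:
            return None  # keep the element unchanged
        if skip_root and node.kind == "Doc":
            return None  # never replace the root
        return []        # remove the element

    return action


broken = make_doc().walk(make_action(["Deutsch"], skip_root=False))
fixed = make_doc().walk(make_action(["Deutsch"], skip_root=True))

print(type(broken).__name__)             # list
print(type(fixed).__name__)              # Node
print([c.text for c in fixed.children])  # ['Deutsch', 'Dies ist die deutsche Übersetzung!']
```

In this model the last header ("Sources") is not in to_keep, so the keep flag is False when the root is finally visited and the unguarded action hands back a list instead of the document. Restricting the action to block-level elements, as isinstance(elem, Block) would in a real filter, has the same effect as the skip_root guard here.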

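Another approach is to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over its top-level blocks (held in doc.content) and drop all that are unwanted, then dump the resulting doc back into the output stream.

from panflute import Header, dump, load, stringify

to_keep = ['Deutsch', 'Status']
keep_current = False
kept_blocks = []

doc = load()
# The top-level blocks of a Doc live in doc.content
for top_level_block in doc.content:
    if isinstance(top_level_block, Header):
        # A header starts a new section; decide whether to keep it
        keep_current = stringify(top_level_block) in to_keep
    if keep_current:
        kept_blocks.append(top_level_block)

doc.content = kept_blocks
dump(doc)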
