Pandoc Filter via Panflute not Working as Expected
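Problem
For a Markdown document, I want to filter out all sections whose header titles are not in the list to_keep. A section consists of a header and the body up to the next section or the end of the document. For simplicity, assume the document only has level 1 headers.
When I make a simple case distinction on whether the current element is preceded by a header in to_keep and then either return None or return [], I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (code, example file, and the complete error message are at the end).
I use Python 3.7.4, panflute 1.12.5, and pandoc 2.2.3.2.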
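Question
If I make a more fine-grained distinction on when to return [], it works (function action_working). My question is: why is this more fine-grained distinction necessary? My solution seems to work, but that might well be accidental. How can I get this to work properly?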
Files
Error
    Traceback (most recent call last):
      File "filter.py", line 42, in <module>
        main()
      File "filter.py", line 39, in main
        return run_filter(action_not_working, doc=doc)
      File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
        return run_filters([action], *args, **kwargs)
      File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
        dump(doc, output_stream=output_stream)
      File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
        raise TypeError(msg)
    TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
    Error running filter filter.py:
    Filter returned error status 1
input.md
    # English

    Some cool english text this is!

    # Deutsch

    Dies ist die deutsche Übersetzung!

    # Sources

    Some source.

    # Priority

    **Medium** *[Low | Medium | High]*

    # Status

    **Open for Discussion** *\[Draft | Open for Discussion | Final\]*

    # Interested Persons (mailing list)

    - Franz, Heinz, Karl
filter.py
    from panflute import *

    to_keep = ['Deutsch', 'Status']
    keep_current = False

    def action_not_working(elem, doc):
        '''For every element we check if it occurs in a section we wish to keep.
        If it does, we keep it and return None (indicating to keep the element unchanged).
        Otherwise we remove the element (return []).'''
        global to_keep, keep_current
        update_keep(elem)
        if keep_current:
            return None
        else:
            return []

    def action_working(elem, doc):
        global to_keep, keep_current
        update_keep(elem)
        if keep_current:
            return None
        else:
            if isinstance(elem, Header):
                return []
            elif isinstance(elem, Para):
                return []
            elif isinstance(elem, BulletList):
                return []

    def update_keep(elem):
        '''If the element is a header, we update keep_current.'''
        global to_keep, keep_current
        if isinstance(elem, Header):
            # Keep the section if its title is in to_keep
            keep_current = stringify(elem) in to_keep

    def main(doc=None):
        return run_filter(action_not_working, doc=doc)

    if __name__ == '__main__':
        main()
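I think what happens is that panflute calls the action on all elements, including the Doc root element. If keep_current is False when the Doc element is visited, the root is replaced by a list. This causes the error message you are seeing, because panflute expects the root node to always be there.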
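The updated filter only acts on Header, Para, and BulletList elements, so the Doc root node is left untouched. You will probably want to use something more generic instead, such as isinstance(elem, Block).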
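An alternative approach is to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over all blocks in its content (the args passed to Doc) and remove the unwanted ones, then dump the resulting doc back to the output stream.
    from panflute import *

    to_keep = ['Deutsch', 'Status']
    keep_current = False

    doc = load()
    kept = []
    for top_level_block in doc.content:
        # A header decides whether the section it opens is kept
        if isinstance(top_level_block, Header):
            keep_current = stringify(top_level_block) in to_keep
        if keep_current:
            kept.append(top_level_block)
    doc.content = kept
    dump(doc)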