Python html-sanitizer: allow img tag

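Hi all, I'm using the html-sanitizer Python package, but I can't enable the img tag, which is disabled by default.

I tried editing sanitizer.py in site-packages (as shown below), but it still doesn't work.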

DEFAULT_SETTINGS = {
    "tags": {
        "a",
        "h1",
        "h2",
        "h3",
        "strong",
        "em",
        "p",
        "ul",
        "ol",
        "li",
        "br",
        "sub",
        "sup",
        "hr",
        "img"
    },
    "attributes": {"a": ("href", "name", "target", "title", "id", "rel"),"img": ("src")},
    "empty": {"hr", "a", "br"},
    "separate": {"a", "p", "li"},
    "whitespace": {"br"},
    "add_nofollow": False,
    "autolink": False,
    "sanitize_href": sanitize_href,
    "element_preprocessors": [
        # convert span elements into em/strong if a matching style rule
        # has been found. strong has precedence, strong & em at the same
        # time is not supported
        bold_span_to_strong,
        italic_span_to_em,
        tag_replacer("b", "strong"),
        tag_replacer("i", "em"),
        tag_replacer("form", "p"),
        target_blank_noopener,
    ],
    "element_postprocessors": [],
}

Can anyone help? I want to allow the img tag with only the src attribute.

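Sanitizer won't use DEFAULT_SETTINGS if different settings are supplied through the settings={} argument when Sanitizer() is initialized. That may be what is happening here, but I suspect the real problem is the empty setting.

The sanitizer removes empty tags, e.g. <em></em> is sanitized to ''. That's fine, but <img .../> is also an empty tag (it has no children), so the sanitizer strips it as well.

You need to add img to the settings['empty'] set, alongside the current {"hr", "a", "br"}.

While you're at it, don't edit DEFAULT_SETTINGS in place; define your own settings instead (starting from a copy of the defaults). For example: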

import copy
import html_sanitizer.sanitizer

# Make a deep copy, so the nested sets and dicts in DEFAULT_SETTINGS are not mutated
my_settings = copy.deepcopy(html_sanitizer.sanitizer.DEFAULT_SETTINGS)

# Add your changes
my_settings['tags'].add('img')
my_settings['empty'].add('img')
my_settings['attributes'].update({'img': ('src',)})

# Use it
s = html_sanitizer.Sanitizer(settings=my_settings)
s.sanitize('<em><img src="/index.html"/></em>')
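
One caveat when copying a settings dict like this: dict(...) only makes a shallow copy, so nested sets such as 'tags' and 'empty' are still shared with the original, and calling .add() on them changes the defaults too. Below is a minimal sketch of the difference. It uses a made-up DEFAULTS dict as a stand-in (it is not the library's real DEFAULT_SETTINGS), and only the standard library:

```python
import copy

# Hypothetical stand-in for a defaults dict with nested sets
# (NOT html_sanitizer's actual DEFAULT_SETTINGS; only two keys shown).
DEFAULTS = {
    "tags": {"a", "p", "br"},
    "empty": {"hr", "a", "br"},
}

# dict() copies only the top level: the nested sets are still shared.
shallow = dict(DEFAULTS)
shallow["tags"].add("img")
shallow_leaked = "img" in DEFAULTS["tags"]  # the default set was mutated

# Undo the leak, then use deepcopy so nested containers are copied too.
DEFAULTS["tags"].discard("img")
deep = copy.deepcopy(DEFAULTS)
deep["tags"].add("img")
deep["empty"].add("img")
deep_leaked = "img" in DEFAULTS["tags"] or "img" in DEFAULTS["empty"]

print(shallow_leaked, deep_leaked)  # prints: True False
```

With copy.deepcopy, every Sanitizer you build from a modified copy stays independent of the defaults, which matters if other code in the same process relies on the default settings.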