Pandoc filter in Lua to insert span into existing string

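I am writing a Lua filter for pandoc that adds glossary functionality to the HTML output of a Markdown file. The goal is to add mouseover text to every occurrence of an acronym or key definition in the document.

I would like to match acronyms that appear in a list (surrounded by punctuation), but not acronyms surrounded by letters (e.g. CO should not be highlighted inside a word such as cobalt).

My MWE fails on this count because strings in the Pandoc AST include the adjacent punctuation (e.g. Str "CO/DBP/SBP," or Str "CO,", Space, Str "SBP,").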

-- # MWE
-- Parse glossary file (summarised here for brevity)
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}

-- Substitute glossary term for span with a mouseover link
function Str(elem)
  for key, value in next, glossary do
    if elem.text == key then
      return pandoc.Span (key, {title = value, class = "glossary"})
    end
  end
end
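
To illustrate the failure without pandoc, here is a minimal sketch in plain Lua. It assumes the Str texts listed above; the helper name exact_match is hypothetical and only mirrors the comparison made in the MWE:

```lua
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}

-- Hypothetical helper: the same exact comparison the MWE applies to elem.text
local function exact_match(text)
    for key, value in next, glossary do
        if text == key then
            return value
        end
    end
    return nil
end

print(exact_match("CO"))           -- Cardiac Output
print(exact_match("CO,"))          -- nil
print(exact_match("CO/DBP/SBP,"))  -- nil
```

Only a Str whose text is exactly the acronym is replaced; any attached punctuation makes the comparison fail.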

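I've played around with string.sub and string.find but couldn't get anything workable, mainly because I'm not sure how to return both the new Span and the Str (minus its new Span). Any help would be appreciated!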


My test Markdown contains:

# Acronyms: SBP, DBP & CO

Spaced acronyms: CO and SBP and DBP.

In a comma-separated list: CO, SBP, DBP; with backslashes; CO/DBP/SBP, and in bullet points:
  
* CO
* SBP
* DBP

You can return a table containing multiple elements. My idea is to look for the first separator and then replace each glossary entry with a span:

-- Parse glossary file (summarised here for brevity)
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}

local Set = function(list)
    local set = {}
    for i,v in ipairs(list) do
        set[v] = true
    end
    return set
end

local findSeparator = function(text)
    local separator = Set{",", "/", " "}
    for i = 1, #text do
        local s = string.sub(text,i,i)
        if separator[s] then
            return s
        end
    end
end

local separatedList = function(text)
    local found
    local t = {}
    local separator = findSeparator(text)
    if not separator then return end
    for abb in string.gmatch(text, "%P+") do
        if glossary[abb] then
            found = true
            t[#t+1] = pandoc.Span(abb, {title = glossary[abb], class = "glossary"})
            t[#t+1] = pandoc.Str(separator)
        end
    end
    if found then
        -- remove the last separator if there is more than one element in the list,
        -- because otherwise the separator is part of the element and needs to stay
        if #t > 2 then t[#t] = nil end
        return t
    end
end

local glossarize = {
    Str = function(el)
        if glossary[el.text] then
            return pandoc.Span(el.text, {title = glossary[el.text], class = "glossary"})
        else
            return separatedList(el.text)
        end
    end
}

function Pandoc(doc)
    local div = pandoc.Div(doc.blocks)
    local blocks = pandoc.walk_block(div, glossarize).content
    return pandoc.Pandoc(blocks, doc.meta)
end
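
As far as I can tell, the filter above only recognises ",", "/" and " " as separators, so a Str such as "DBP;" or "DBP." is left untouched, and for "CO/DBP/SBP," the trailing comma is dropped. A possible variation, offered as a sketch rather than a definitive implementation, is to split each string into alternating runs of punctuation and non-punctuation and keep every run. The names tokenize and split_glossary are hypothetical helpers, not part of pandoc, and the `pandoc` global is assumed to exist only when the file is run as a filter:

```lua
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}

-- Hypothetical helper: split text into alternating runs of non-punctuation
-- (%P+) and punctuation (%p+), keeping every character of the input.
local function tokenize(text)
    local tokens = {}
    local pos = 1
    while pos <= #text do
        -- "^" anchors the match at position pos
        local s, e = string.find(text, "^%P+", pos)
        local is_word = s ~= nil
        if not s then
            s, e = string.find(text, "^%p+", pos)
        end
        tokens[#tokens + 1] = {text = string.sub(text, s, e), word = is_word}
        pos = e + 1
    end
    return tokens
end

-- Hypothetical helper: return the pieces of text in order, plus a flag saying
-- whether any piece is a glossary term. Glossary pieces carry a title field.
local function split_glossary(text)
    local pieces, found = {}, false
    for _, tok in ipairs(tokenize(text)) do
        local title = tok.word and glossary[tok.text] or nil
        if title then found = true end
        pieces[#pieces + 1] = {text = tok.text, title = title}
    end
    return pieces, found
end

-- When run by pandoc, turn the pieces into Span and Str inlines.
if pandoc then
    function Str(el)
        local pieces, found = split_glossary(el.text)
        if not found then return nil end  -- leave the Str unchanged
        local inlines = {}
        for _, p in ipairs(pieces) do
            if p.title then
                inlines[#inlines + 1] = pandoc.Span({pandoc.Str(p.text)},
                    {title = p.title, class = "glossary"})
            else
                inlines[#inlines + 1] = pandoc.Str(p.text)
            end
        end
        return inlines
    end
end
```

With this approach "CO/DBP/SBP," becomes three spans with the two slashes and the trailing comma kept as Str elements, while "cobalt," is returned unchanged because the run "cobalt" is not a glossary key. A hyphenated string such as "CO-driven" would still have its "CO" highlighted, which may or may not be what you want.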