Pandoc filter in Lua to insert span into existing string
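I am writing a Lua filter for pandoc that adds glossary functionality to the HTML output of a markdown file. The goal is to add mouseover text to every occurrence of an acronym or key definition in the document.
I want to match acronyms that appear in lists (surrounded by punctuation), but not acronyms that are merely letters inside a longer word (e.g. CO should not be highlighted inside a word such as cobalt).
My MWE fails on this count because strings in the pandoc AST include adjacent punctuation (e.g. Str "CO/DBP/SBP," or Str "CO,", Space, Str "SBP,").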
-- # MWE
-- Parse glossary file (summarised here for brevity)
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}
-- Substitute glossary term for span with a mouseover link
function Str(elem)
  for key, value in pairs(glossary) do
    -- Exact comparison: only matches when the Str is the bare acronym
    if elem.text == key then
      return pandoc.Span(key, {title = value, class = "glossary"})
    end
  end
end
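The failure described above can be reproduced in plain Lua, without pandoc. The sketch below contrasts the exact comparison used in the MWE with a whole-word search built on Lua frontier patterns (`%f`). The helper name `has_term` is hypothetical, and the sketch assumes glossary keys contain only ASCII letters and digits, so they need no pattern escaping.

```lua
-- Hypothetical helper: true if `term` occurs in `text` as a whole word.
-- %f[%w] marks a transition into an alphanumeric run, %f[%W] a transition out.
-- Assumes `term` has no pattern magic characters (plain ASCII letters/digits).
local function has_term(text, term)
  return text:find("%f[%w]" .. term .. "%f[%W]") ~= nil
end

print("CO," == "CO")                   -- false: the comma is part of the string
print(has_term("CO,", "CO"))           -- true
print(has_term("CO/DBP/SBP,", "DBP"))  -- true
print(has_term("COBALT", "CO"))        -- false: CO is followed by a letter
```

A search like this only says whether a term is present. Replacing it still requires splitting the string into the parts before and after the match.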
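I have played around with string.sub and string.find but could not get anything workable, mainly because I am not sure how to return both the new Span and the Str (minus its new Span). Any help would be appreciated!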
My test markdown contains:
# Acronyms: SBP, DBP & CO
Spaced acronyms: CO and SBP and DBP.
In a comma-separated list: CO, SBP, DBP; with backslashes; CO/DBP/SBP, and in bullet points:
* CO
* SBP
* DBP
You can return a table containing multiple elements. My idea was to look for the first separator and then replace the glossary entries with spans:
-- Parse glossary file (summarised here for brevity)
local glossary = {CO = "Cardiac Output", DBP = "Diastolic Blood Pressure", SBP = "Systolic Blood Pressure"}
-- Build a lookup set from a list of values
local Set = function(list)
  local set = {}
  for _, v in ipairs(list) do
    set[v] = true
  end
  return set
end
-- Return the first separator character found in text, or nil if there is none
local findSeparator = function(text)
  local separator = Set{",", "/", " "}
  for i = 1, #text do
    local s = string.sub(text, i, i)
    if separator[s] then
      return s
    end
  end
end
-- Turn a separated list such as "CO/DBP/SBP" into spans joined by the separator
local separatedList = function(text)
  local found
  local t = {}
  local separator = findSeparator(text)
  if not separator then return end
  -- "%P+" matches each run of non-punctuation characters
  for abb in string.gmatch(text, "%P+") do
    if glossary[abb] then
      found = true
      -- use the glossary definition, not the acronym itself, as the mouseover title
      t[#t+1] = pandoc.Span(abb, {title = glossary[abb], class = "glossary"})
      t[#t+1] = pandoc.Str(separator)
    end
  end
  if found then
    -- remove the last separator if there is more than one element in the list,
    -- because otherwise the separator is part of the element and needs to stay
    if #t > 2 then t[#t] = nil end
    return t
  end
end
local glossarize = {
  Str = function(el)
    if glossary[el.text] then
      -- the whole Str is a glossary term
      return pandoc.Span(el.text, {title = glossary[el.text], class = "glossary"})
    else
      -- otherwise look for terms inside a separated list
      return separatedList(el.text)
    end
  end
}
function Pandoc(doc)
  -- wrap all blocks in a Div so a single walk_block call covers the document
  local div = pandoc.Div(doc.blocks)
  local blocks = pandoc.walk_block(div, glossarize).content
  return pandoc.Pandoc(blocks, doc.meta)
end
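One limitation worth noting: separatedList rebuilds the string from the matched terms and a single separator, so trailing punctuation (the final comma in "CO/DBP/SBP,") or non-glossary words inside the same Str may not survive. A possible variation, sketched below under stated assumptions rather than as a definitive fix, splits each string into glossary terms and the literal text between them, so every character of the input is kept. The helper `split_terms` and the `kind`/`value`/`title` fields are hypothetical names; the sketch assumes ASCII acronyms. The pandoc part only runs inside pandoc and uses `pandoc.Str`, `pandoc.Span`, and `pandoc.Attr`.

```lua
local glossary = {
  CO = "Cardiac Output",
  DBP = "Diastolic Blood Pressure",
  SBP = "Systolic Blood Pressure",
}

-- Hypothetical helper: split `text` into glossary terms and the text between them.
local function split_terms(text)
  local pieces = {}
  local last = 1  -- index of the first character not yet emitted
  -- "()" captures positions: s is the start of the word, e is one past its end
  for s, word, e in text:gmatch("()(%w+)()") do
    if glossary[word] then
      if s > last then
        pieces[#pieces + 1] = {kind = "text", value = text:sub(last, s - 1)}
      end
      pieces[#pieces + 1] = {kind = "term", value = word, title = glossary[word]}
      last = e
    end
  end
  if last <= #text then
    pieces[#pieces + 1] = {kind = "text", value = text:sub(last)}
  end
  return pieces
end

-- Only define the filter when running inside pandoc, so split_terms can
-- also be exercised with a plain Lua interpreter.
if pandoc then
  function Str(el)
    local pieces = split_terms(el.text)
    local inlines, found = {}, false
    for _, p in ipairs(pieces) do
      if p.kind == "term" then
        found = true
        local attr = pandoc.Attr("", {"glossary"}, {title = p.title})
        inlines[#inlines + 1] = pandoc.Span({pandoc.Str(p.value)}, attr)
      else
        inlines[#inlines + 1] = pandoc.Str(p.value)
      end
    end
    -- Returning nil leaves the element unchanged
    if found then return inlines end
  end
end
```

Because whole alphanumeric runs are looked up in the glossary, CO is matched in "CO," and "CO/DBP/SBP," but not inside a longer word, and separators such as ";" need no special handling.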