Bash script / utility to convert UK English to US spellings in TeX document

Question

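I am looking for a quick Bash script to convert British/New Zealand spelling to American spelling in a TeX document (for working with US academics and for journal submissions). This is a formal mathematical biology paper with little or no regional terminology or grammar: prior work is given as formulae rather than as quotations.

For example,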

Generalise -> Generalize

Colour -> Color

Centre -> Center

Surely there must be a sed or awk based script that replaces most of the common spelling differences.

See the related TeX forum question for more details.

https://tex.stackexchange.com/questions/312138/converting-uk-to-us-spellings

N.B. I currently compile with PDFLaTeX using Kile on Ubuntu 16.04 or Elementary OS 0.3 Freya, but I can use another TeX compiler/package if there is a built-in fix elsewhere.

Thanks for your assistance.

Answer 1

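I think you need to keep a list of replacements at hand and apply it to do the translation. You will have to extend your dictionary file for it to translate your text files effectively.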

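#!/bin/bash
sourceFile=$1
dict=$2

# Each dictionary line has two columns: the word to find and its replacement.
while read -r word updatedWord extra
    do
     # Skip blank lines and lines that do not have exactly two columns.
     [ -n "$updatedWord" ] && [ -z "$extra" ] || continue
     # Replace every occurrence in place (GNU sed).
     sed -i "s/$word/$updatedWord/g" "$sourceFile" 2>/dev/null

   done < "$dict"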

Run the script above like this:

./scriptName source.txt dict.txt

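Here is a sample dictionary I used. The first column is the word to search for and the second is its replacement, so this one converts US spellings to UK spellings; swap the columns for the opposite direction.

> cat dict.txt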
characterize characterise
prioritize prioritise
specialize specialise
analyze analyse
catalyze catalyse
size size
exercise exercise
behavior behaviour
color colour
favor favour
contour contour
center centre
fiber fibre
liter litre
parameter parameter
ameba amoeba
anesthesia anaesthesia
diarrhea diarrhoea
esophagus oesophagus
leukemia leukaemia
cesium caesium
defense defence
practice practice
license licence
defensive defensive
advice advice
aging ageing
acknowledgment acknowledgement
judgment judgement
analog analogue
dialog dialogue
fulfill fulfil
enroll enrol
skill skill
skillful skilful
labeled labelled
signaling signalling
propelled propelled
revealing revealing

Execution result:

cat source.txt
color of this fiber is great and we should analyze it.

./scriptName source.txt dict.txt

cat source.txt
colour of this fibre is great and we should analyse it.
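
The sample dictionary above lists the US spelling first and the UK spelling second, while the question asks for UK to US, and a plain s/word/replacement/g also matches inside longer words (for instance, replacing center with centre would turn centered into centreed). The following is a minimal sketch of one way around both points, not the answerer's method. It assumes GNU sed (for the \b word-boundary extension) and the two-column layout shown above; the function name uk2us is hypothetical.

```shell
#!/bin/bash
# Sketch only: uk2us is a hypothetical helper name, not part of the answer above.
# Assumes GNU sed (\b word boundary) and a dictionary with "US UK" columns.
uk2us() {
    dict=$1
    src=$2
    # Build one sed command per usable dictionary line: s/\bUK\b/US/g
    # Lines without exactly two columns, and identical pairs, are skipped.
    script=$(awk 'NF == 2 && $1 != $2 { printf "s/\\b%s\\b/%s/g\n", $2, $1 }' "$dict")
    # Print the converted text to stdout; the source file is left unchanged.
    sed -e "$script" "$src"
}

# When run as a script with two arguments, convert the given file.
if [ "$#" -eq 2 ]; then
    uk2us "$1" "$2"
fi
```

It could be used as uk2us dict.txt paper.tex > paper_us.tex. Matching is case-sensitive in this sketch, so a capitalised word such as Colour is left alone unless the dictionary also lists it.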

Answer 2

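Here is my solution using awk, which I think is more flexible than sed. The program leaves LaTeX commands untouched (a word that starts with "\") and preserves the capitalisation of a word's first letter. The arguments of LaTeX commands (and normal text) are replaced according to the dictionary file. When the third argument [rev] is given, the program substitutes in the reverse direction using the same dictionary file. Any non-alphabetic character acts as a word separator (this is necessary in LaTeX source files). The program writes its output to the screen (stdout), so you need to redirect it to a file (>output_f). (I assume the input encoding of your LaTeX source is 1 byte/char.)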

> cat dic.sh
#!/bin/bash
(($#<2))&& { echo "Usage $0 dictionary_file latex_file [rev]"; exit 1; }
((d= $#==3 ? 0:1))   # d=1: replace column 1 by column 2; d=0 (rev given): the reverse
awk -v d=$d '
 BEGIN {cm=fx=0; fn="";}
 fn!=FILENAME {fx++; fn=FILENAME;}
 fx==1 {if(!NF)next; if(d)a[$1]=$2; else a[$2]=$1; next;} # read dict (or reversed dict) into an associative array
 fx==2 { for(i=1; i<=length($0); i++)
            {c=substr($0,i,1);                            # read the LaTeX source line character by character
             if(cm){printf("%s",c); if(c~/[^A-Za-z0-9\\]/)cm=0;}  # inside a LaTeX command: copy verbatim until it ends
             else if(c~/[A-Za-z]/)w=w c; else{pr(); printf("%s",c); if(c=="\\")cm=1;} # collect letters into a word, or flush it
            }
         pr(); printf("\n"); cm=0;                        # flush the last word; a command ends at the end of the line
       }
function pr(  s){   # print the collected word or its dictionary substitution, restoring first-letter case
   if(!length(w))return;
   s=tolower(w);
   if(!(s in a))printf("%s",w);
   else printf("%s", s==w ? a[s] : toupper(substr(a[s],1,1)) substr(a[s],2));
   w="";}
' "$1" "$2"

The dictionary file:

> cat dictionary
apple      lemon
raspberry  cherry
pear       banana

The input LaTeX source:

> cat src.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].

Raspberry12Apple,pear.

Execution results:

> ./dic.sh 
Usage ./dic.sh dictionary_file latex_file [rev]

> ./dic.sh dictionary src.txt >out1.txt; cat out1.txt
Lemon123banana,lemon "banana".
\Apple123pear{cherry}{banana}[lemon].

Cherry12Lemon,banana.

> ./dic.sh dictionary out1.txt >out2.txt rev; cat out2.txt
Apple123pear,apple "pear".
\Apple123pear{raspberry}{pear}[apple].

Raspberry12Apple,pear.

> diff src.txt out2.txt   # they are identical
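
Since the substitution is driven entirely by the dictionary, it can help to preview which dictionary words actually occur in a file before converting it. The following is a minimal sketch of such a dry run, assuming GNU grep and a two-column dictionary; the helper name list_hits and its optional column argument are hypothetical. Note that grep -w treats digits as part of a word, whereas dic.sh treats them as separators, so the counts are only a guide.

```shell
#!/bin/bash
# Sketch only: list_hits is a hypothetical helper name.
# Usage: list_hits dictionary_file source_file [column]
# Counts whole-word occurrences of the words in the given dictionary column
# (default 1, the column that is searched for when rev is not given).
list_hits() {
    dict=$1
    src=$2
    col=${3:-1}
    # Collect the words to look for, one per line, skipping identical pairs.
    words=$(awk -v c="$col" 'NF == 2 && $1 != $2 { print $c }' "$dict")
    [ -n "$words" ] || return 0
    # -o print only the match, -w whole words, -i ignore case, -F fixed strings
    grep -o -w -i -F -e "$words" "$src" | sort | uniq -c
}
```

For example, list_hits dict.txt paper.tex 2 would count the occurrences of the second-column words in paper.tex without modifying it.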

bash

awk

latex

sed

spelling