OpenXML tag search
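I am writing a .NET application that reads .docx files close to 200 pages long (via DocumentFormat.OpenXML 2.5) to find all occurrences of certain tags that the document should contain.
To be clear, I am not looking for OpenXML tags, but for tags that the document writer puts into the document as placeholders for values I need to fill in at a second stage. Such tags have the format `<!TAG!>` (where TAG can be any sequence of characters).
As I said, I have to find all occurrences of such tags and, if possible, also the 'page' on which each tag occurs.
I found a few things online, but more than once the basic approach was to dump the whole content of the file into a string and search that string without regard to the .docx encoding. This led to false positives or no matches at all (even though the test .docx file contains several tags), and the other examples were probably a bit beyond my knowledge of OpenXML.
The tag can appear anywhere in the document (in tables, body text, paragraphs, and in headers and footers).
I am coding in Visual Studio 2013 with .NET 4.5, but I can move to an earlier version if needed.
P.S. I would prefer code that does not use the Office Interop API, because the target platform will not run Office.
The smallest .docx sample I could produce stores this in its document part: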
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="" xmlns:mc="" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="" xmlns:m="" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="" xmlns:wp="" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="" xmlns:w14="" xmlns:w15="" xmlns:wpg="" xmlns:wpi="" xmlns:wne="" xmlns:wps="" mc:Ignorable="w14 w15 wp14">
<w:p w:rsidR="00CA7780" w:rsidRDefault="00815E5D">
<w:lang w:val="en-GB"/>
<w:lang w:val="en-GB"/>
<w:p w:rsidR="00815E5D" w:rsidRDefault="00815E5D">
<w:lang w:val="en-GB"/>
<w:proofErr w:type="gramStart"/>
<w:lang w:val="en-GB"/>
<w:proofErr w:type="gramEnd"/>
<w:lang w:val="en-GB"/>
<w:p w:rsidR="00815E5D" w:rsidRPr="00815E5D" w:rsidRDefault="00815E5D">
<w:lang w:val="en-GB"/>
<w:lang w:val="en-GB"/>
<w:bookmarkStart w:id="0" w:name="_GoBack"/>
<w:bookmarkEnd w:id="0"/>
<w:sectPr w:rsidR="00815E5D" w:rsidRPr="00815E5D">
<w:pgSz w:w="11906" w:h="16838"/>
<w:pgMar w:top="1417" w:right="1134" w:bottom="1134" w:left="1134" w:header="708" w:footer="708" w:gutter="0"/>
<w:cols w:space="708"/>
<w:docGrid w:linePitch="360"/>
```
Regards, Mike
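On the 'page' part of the question: a .docx file does not store page numbers, because pagination is computed when Word lays the document out. Word does, however, write `w:lastRenderedPageBreak` markers where pages began the last time it paginated the file, so counting them gives an estimate. The sketch below assumes the file was last saved by Word (files produced by other tools may contain no markers at all), works on the raw `word/document.xml` with LINQ to XML, and counts only `w:lastRenderedPageBreak`; `TagPageFinder` and `FindTagsWithPages` are hypothetical names. It matches tags against the concatenated text of each paragraph, so a tag split across runs is still found.
```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class TagPageFinder
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly Regex TagRegex = new Regex(@"<!(?<TagName>.*?)!>");

    // Returns (tag name, estimated 1-based page) for every <!TAG!>, in document order.
    public static List<KeyValuePair<string, int>> FindTagsWithPages(XDocument doc)
    {
        var result = new List<KeyValuePair<string, int>>();
        int page = 1;
        // Outermost paragraphs only (body and table cells), so paragraphs nested in
        // text boxes are not visited twice and their markers are not counted twice.
        var paragraphs = doc.Descendants(W + "p").Where(x => !x.Ancestors(W + "p").Any());
        foreach (XElement p in paragraphs)
        {
            var text = new StringBuilder();
            var pageOfChar = new List<int>();   // estimated page of each character in 'text'
            foreach (XElement e in p.Descendants())
            {
                if (e.Name == W + "lastRenderedPageBreak")
                    page++;                     // a new page started here at the last layout
                else if (e.Name == W + "t")
                {
                    text.Append(e.Value);
                    pageOfChar.AddRange(Enumerable.Repeat(page, e.Value.Length));
                }
            }
            foreach (Match m in TagRegex.Matches(text.ToString()))
                result.Add(new KeyValuePair<string, int>(m.Groups["TagName"].Value, pageOfChar[m.Index]));
        }
        return result;
    }
}
```
Usage would be along the lines of `TagPageFinder.FindTagsWithPages(XDocument.Load(pathToDocumentXml))`. The numbers are only as current as Word's last layout of the file, and tags in headers and footers have no single page, since those repeat across the pages of their section.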
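Not sure whether the SDK is better, but this works and produces a dictionary containing the name of each tag and an element you can set the new value on: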
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

namespace ConsoleApplication8
{
    class Program
    {
        static void Main(string[] args)
        {
            Dictionary<string, XElement> lookupTable = new Dictionary<string, XElement>();
            // Lazy ".*?" so that two tags in the same w:t are not captured as one name
            Regex reg = new Regex(@"<!(?<TagName>.*?)!>");
            XDocument doc = XDocument.Load("document.xml"); // word/document.xml extracted from the .docx
            XNamespace ns = doc.Root.GetNamespaceOfPrefix("w");
            IEnumerable<XElement> elements = doc.Root.Descendants(ns + "t").Where(x => x.Value.StartsWith("<!")).ToArray();

            foreach (var item in elements)
            {
                XElement grammar = item.Parent.PreviousNode as XElement;
                if (grammar != null && grammar.Name == ns + "proofErr") // the tag was split by the grammar checker
                {
                    #region remove the grammar tag
                    grammar.Remove();                            // <w:proofErr w:type="gramStart"/>
                    grammar = item.Parent.NextNode as XElement;
                    grammar.Remove();                            // <w:proofErr w:type="gramEnd"/>
                    #endregion

                    #region merge the two nodes
                    XElement next = (item.Parent.NextNode as XElement).Element(ns + "t");
                    item.Value = string.Format("{0}{1}", item.Value, next.Value);
                    next.Parent.Remove();                        // its text now lives in item
                    #endregion
                }

                // insert the name and the XElement into the dictionary
                // (a tag that occurs more than once keeps only its last w:t element)
                Match match = reg.Match(item.Value);
                if (match.Success)
                    lookupTable[match.Groups["TagName"].Value] = item;
            }

            foreach (var item in lookupTable)
                Console.WriteLine("The document contains a tag {0}", item.Key);
        }
    }
}
```
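Whether the tag-name pattern is greedy matters as soon as one `w:t` holds two tags: with `.*` the captured name runs from the first `<!` to the last `!>`, while `.*?` stops at the first `!>`. A small sketch showing both (`TagNameDemo` is a hypothetical name used only for illustration):
```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagNameDemo
{
    // Greedy: ".*" runs to the last "!>" in the text
    public static readonly Regex Greedy = new Regex(@"<!(?<TagName>.*)!>");
    // Lazy: ".*?" stops at the first "!>", so each tag is matched separately
    public static readonly Regex Lazy = new Regex(@"<!(?<TagName>.*?)!>");

    // Returns the captured tag names, in order of appearance
    public static List<string> Names(Regex regex, string text)
    {
        return regex.Matches(text).Cast<Match>().Select(m => m.Groups["TagName"].Value).ToList();
    }
}
```
For `"Dear <!NAME!>, your order <!ORDER!> has shipped."` the greedy pattern yields a single name, `NAME!>, your order <!ORDER`, and the lazy one yields `NAME` and `ORDER`.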
```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression; //you will have to add a reference to System.IO.Compression.FileSystem(.dll)
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

namespace ConsoleApplication28
{
    public class MyWordDocument
    {
        #region fields
        private string fileName;
        private XDocument document;
        //todo: create fields for all document xml files that can contain the placeholders
        private Dictionary<string, List<XElement>> lookUpTable;
        #endregion

        #region properties
        public IEnumerable<string> Tags { get { return lookUpTable.Keys; } }
        #endregion

        #region construction
        public MyWordDocument(string fileName)
        {
            this.fileName = fileName;
            ExtractDocument();
            CreateLookUp();
        }
        #endregion

        #region methods
        public void ReplaceTagWithValue(string tagName, string value)
        {
            foreach (var item in lookUpTable[tagName])
                item.Value = item.Value.Replace(string.Format(@"<!{0}!>", tagName), value);
        }

        public void Save(string fileName)
        {
            document.Save(@"temp\word\document.xml");
            //todo: save other parts of document here i.e. footer header or other stuff
            if (File.Exists(fileName))
                File.Delete(fileName); // CreateFromDirectory throws if the target file already exists
            ZipFile.CreateFromDirectory("temp", fileName);
        }

        private void CreateLookUp()
        {
            //todo: make this work for all cases and for all files that can contain the placeholders
            //tip: open the raw document in word and replace the tags,
            //     save the file to different location and extract the xmlfiles of both versions and compare to see what you have to do
            lookUpTable = new Dictionary<string, List<XElement>>();
            Regex reg = new Regex(@"<!(?<TagName>.*?)!>");
            document = XDocument.Load(@"temp\word\document.xml");
            XNamespace ns = document.Root.GetNamespaceOfPrefix("w");
            IEnumerable<XElement> elements = document.Root.Descendants(ns + "t").Where(NodeGotSplitUpIn2PartsDueToGrammarCheck).ToArray();

            foreach (var item in elements)
            {
                // remove the <w:proofErr> markers before and after the first run
                XElement grammar = item.Parent.PreviousNode as XElement;
                grammar.Remove();
                grammar = item.Parent.NextNode as XElement;
                grammar.Remove();

                // merge the text of the following run into this one, then drop that run
                XElement next = (item.Parent.NextNode as XElement).Element(ns + "t");
                string totalTagName = string.Format("{0}{1}", item.Value, next.Value);
                next.Parent.Remove();
                item.Value = totalTagName;

                string tagName = reg.Match(totalTagName).Groups["TagName"].Value;
                if (lookUpTable.ContainsKey(tagName))
                    lookUpTable[tagName].Add(item);
                else
                    lookUpTable.Add(tagName, new List<XElement> { item });
            }
        }

        private bool NodeGotSplitUpIn2PartsDueToGrammarCheck(XElement node)
        {
            XNamespace ns = node.Document.Root.GetNamespaceOfPrefix("w");
            XElement previous = node.Parent.PreviousNode as XElement; // may be null (first child) or not an element
            return node.Value.StartsWith("<!") && previous != null && previous.Name == ns + "proofErr";
        }

        private void ExtractDocument()
        {
            if (Directory.Exists("temp"))
                Directory.Delete("temp", true); // ExtractToDirectory throws if the files already exist
            ZipFile.ExtractToDirectory(fileName, "temp");
        }
        #endregion
    }

    class Program
    {
        static void Main(string[] args)
        {
            MyWordDocument doc = new MyWordDocument("somedoc.docx"); //todo: fix path
            foreach (string name in doc.Tags) //name would be the extracted name from the placeholder
                doc.ReplaceTagWithValue(name, "Example");
            doc.Save("output.docx"); //todo: fix path
        }
    }
}
```
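The class above only reads `word/document.xml`, and its todo notes that headers, footers and other parts can also hold placeholders. A minimal sketch that reads those parts straight from the .docx with `System.IO.Compression.ZipArchive`, without extracting to a temp folder. It assumes Word's conventional part names (`word/document.xml`, `word/header1.xml`, `word/footer1.xml`, `word/footnotes.xml`, `word/endnotes.xml`); the format does not guarantee these names (parts are located through the package relationships), so the name filter is an assumption. `DocxPartScanner` is a hypothetical name.
```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class DocxPartScanner
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly Regex TagRegex = new Regex(@"<!(?<TagName>.*?)!>");
    // Assumption: Word's usual part names. A robust version would follow the
    // relationships of the main document part instead of matching entry names.
    static readonly Regex PartName = new Regex(@"^word/(document|header\d*|footer\d*|footnotes|endnotes)\.xml$");

    // Maps each tag name to the parts it occurs in (one list entry per occurrence).
    public static Dictionary<string, List<string>> FindTags(Stream docx)
    {
        var result = new Dictionary<string, List<string>>();
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries.Where(e => PartName.IsMatch(e.FullName)))
            {
                XDocument part;
                using (Stream s = entry.Open())
                    part = XDocument.Load(s);

                foreach (XElement p in part.Descendants(W + "p"))
                {
                    // Paragraph-level text, so tags split across runs are still found
                    string text = string.Concat(p.Descendants(W + "t")
                        .Where(t => t.Ancestors(W + "p").First() == p)
                        .Select(t => t.Value));
                    foreach (Match m in TagRegex.Matches(text))
                    {
                        string name = m.Groups["TagName"].Value;
                        if (!result.ContainsKey(name))
                            result[name] = new List<string>();
                        result[name].Add(entry.FullName);
                    }
                }
            }
        }
        return result;
    }
}
```
Typical use: `using (var fs = File.OpenRead("somedoc.docx")) { var tags = DocxPartScanner.FindTags(fs); }`. This only finds tags; writing values back would still need the parts to be saved into the package.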
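The problem with trying to find the tags is that the words are not always stored in the underlying XML in the form in which they appear in Word. For example, in your sample XML, `<!TAG1!>` is split into separate runs with a grammar-check marker between them:
```xml
<w:lang w:val="en-GB"/>
<w:proofErr w:type="gramEnd"/>
<w:lang w:val="en-GB"/>
```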
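One way to handle this is to take the `InnerText` of each `Paragraph` and compare it against your `Regex`. The `InnerText` property returns the plain text of the paragraph, without any formatting or other XML from the underlying document getting in the way.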
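On the raw XML used in the first answer, a rough counterpart of `Paragraph.InnerText` is the concatenation of the paragraph's `w:t` values (the SDK property may also pick up other text-bearing elements, such as field codes, which this sketch ignores). The hypothetical `ParagraphTextDemo` class shows the difference: searching each `w:t` on its own misses a tag that a `w:proofErr` split in two, while searching the paragraph text finds it.
```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class ParagraphTextDemo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly Regex TagRegex = new Regex(@"<!(?<TagName>.*?)!>");

    // Concatenates the w:t values that belong to this paragraph (not to nested paragraphs)
    public static string ParagraphText(XElement p)
    {
        return string.Concat(p.Descendants(W + "t")
            .Where(t => t.Ancestors(W + "p").First() == p)
            .Select(t => t.Value));
    }

    // Searches each w:t separately: misses tags that are split across runs
    public static List<string> TagsPerTextNode(XDocument doc)
    {
        return doc.Descendants(W + "t")
            .SelectMany(t => TagRegex.Matches(t.Value).Cast<Match>())
            .Select(m => m.Groups["TagName"].Value).ToList();
    }

    // Searches the whole paragraph text: finds split tags as well
    public static List<string> TagsPerParagraph(XDocument doc)
    {
        return doc.Descendants(W + "p")
            .SelectMany(p => TagRegex.Matches(ParagraphText(p)).Cast<Match>())
            .Select(m => m.Groups["TagName"].Value).ToList();
    }
}
```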
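Once you have the tags, the next problem is replacing the text. For the reasons above, you cannot simply replace the `InnerText` with some new text, because it is not clear which parts of the text belong to which `Run`. The simplest approach is to remove any existing `Run`s and add a new `Run` with a `Text` that contains the new text: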
```csharp
using System.Linq;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class TagReplacer // wrapper class so the two methods compile on their own
{
    public static void ReplaceTags(string filename)
    {
        Regex regex = new Regex("<!(.)*?!>", RegexOptions.Compiled);
        using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(filename, true))
        {
            //grab the header parts and replace tags there
            foreach (HeaderPart headerPart in wordDocument.MainDocumentPart.HeaderParts)
                ReplaceParagraphParts(headerPart.Header, regex);
            //now do the document
            ReplaceParagraphParts(wordDocument.MainDocumentPart.Document, regex);
            //now replace the footer parts
            foreach (FooterPart footerPart in wordDocument.MainDocumentPart.FooterParts)
                ReplaceParagraphParts(footerPart.Footer, regex);
        }
    }

    private static void ReplaceParagraphParts(OpenXmlElement element, Regex regex)
    {
        // ToList(): the paragraphs are modified inside the loop
        foreach (var paragraph in element.Descendants<Paragraph>().ToList())
        {
            Match match = regex.Match(paragraph.InnerText);
            if (match.Success)
            {
                //create a new run and set its value to the correct text
                //this must be done before the child runs are removed otherwise
                //paragraph.InnerText will be empty
                Run newRun = new Run();
                newRun.AppendChild(new Text(paragraph.InnerText.Replace(match.Value, "some new value")));
                //remove any child runs
                paragraph.RemoveAllChildren<Run>();
                //add the newly created run
                paragraph.AppendChild(newRun);
            }
        }
    }
}
```
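`paragraph.InnerText.Replace(match.Value, ...)` only replaces the text of the first match (and identical copies of it), so a paragraph with two different tags keeps the second one, and every tag receives the same value. `Regex.Replace` with a `MatchEvaluator` substitutes all tags in one pass and can look each name up. A string-only sketch, assuming the values come from a dictionary keyed by tag name (`TagValues.Fill` is a hypothetical helper; unknown tags are left untouched):
```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagValues
{
    static readonly Regex TagRegex = new Regex(@"<!(?<TagName>.*?)!>");

    // Replaces every <!NAME!> whose NAME is in 'values'; unknown tags are kept as they are
    public static string Fill(string text, IDictionary<string, string> values)
    {
        return TagRegex.Replace(text, m =>
        {
            string value;
            return values.TryGetValue(m.Groups["TagName"].Value, out value) ? value : m.Value;
        });
    }
}
```
Inside `ReplaceParagraphParts`, the new `Text` could then be built from `TagValues.Fill(paragraph.InnerText, values)` whenever `regex.IsMatch(paragraph.InnerText)` is true.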
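A drawback of the approach above is that any styling you may have is lost. It could be copied from the existing `Run`, but if there are several `Run`s with different properties, you will have to work out which ones need to be copied where. Nothing stops you from creating several `Run`s in the code above if you need to.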
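One way to keep the formatting without rebuilding runs is to map the match offsets in the paragraph text back to the individual `w:t` elements, write the replacement into the `w:t` where the tag starts, and cut the rest of the tag out of the following ones. Every run keeps its own `w:rPr`, and the inserted value takes the formatting of the run holding the tag's first character. A LINQ to XML sketch of that idea, working on the raw part XML rather than the SDK classes; `FormattingPreservingReplacer` is a hypothetical name:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class FormattingPreservingReplacer
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every match of 'tag' in the paragraph text while leaving the runs (and their w:rPr) in place
    public static void ReplaceInParagraph(XElement p, Regex tag, Func<Match, string> valueFor)
    {
        // The paragraph's own w:t elements in document order (nested text-box paragraphs excluded)
        List<XElement> texts = p.Descendants(W + "t")
            .Where(x => x.Ancestors(W + "p").First() == p)
            .ToList();
        string all = string.Concat(texts.Select(x => x.Value));

        // Right to left, so replacing a later tag never shifts the offsets of an earlier one
        foreach (Match m in tag.Matches(all).Cast<Match>().Reverse())
        {
            int start = 0;      // offset of the current w:t within the paragraph text
            bool first = true;  // the first w:t touched by the tag receives the replacement
            foreach (XElement t in texts)
            {
                string v = t.Value;
                int from = Math.Max(m.Index, start);
                int to = Math.Min(m.Index + m.Length, start + v.Length);
                if (from < to)  // this w:t holds part of the tag
                {
                    t.Value = v.Substring(0, from - start) + (first ? valueFor(m) : "") + v.Substring(to - start);
                    t.SetAttributeValue(XNamespace.Xml + "space", "preserve"); // keep leading/trailing spaces
                    first = false;
                }
                start += v.Length;
            }
        }
    }
}
```
`w:t` elements emptied by a replacement are left in place rather than removed.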
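Same goal here, except that I wanted to use `${...}` entries instead of `<!...!>`.
The following code works with plain XML as well as with Open XML nodes. I tested it with plain XML, because with Word documents it is hard to control how Word arranges the paragraph, run and text elements. I suppose it is not impossible, but this way I have more control: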
```csharp
// Needs C# 6 (string interpolation, auto-property initializers), i.e. VS2015 or later.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

class Program
{
    static void Main(string[] args)
    {
        //FillInValues(FileName("test01.docx"), FileName("test01_out.docx"));
        string[,] tests =
        {
            { "<r><t>${abc</t><t>}$</t><t>{tha}</t></r>", "<r><t>ABC</t><t>THA</t><t></t></r>"},
            { "<r><t>$</t><t>{</t><t>abc</t><t>}</t></r>", "<r><t>ABC</t><t></t></r>"},
            {"<r><t>${abc}</t></r>", "<r><t>ABC</t></r>" },
            {"<r><t>x${abc}</t></r>", "<r><t>xABC</t></r>" },
            {"<r><t>x${abc}y</t></r>", "<r><t>xABCy</t></r>" },
            {"<r><t>x${abc}${tha}z</t></r>", "<r><t>xABCTHAz</t></r>" },
            {"<r><t>x${abc}u${tha}z</t></r>", "<r><t>xABCuTHAz</t></r>" },
            {"<r><t>x${ab</t><t>c}u</t></r>", "<r><t>xABC</t><t>u</t></r>" },
            {"<r><t>x${ab</t><t>yupeekaiiei</t><t>c}u</t></r>", "<r><t>xABYUPEEKAIIEIC</t><t>u</t></r>" },
            {"<r><t>x${ab</t><t>yupeekaiiei</t><t>}</t></r>", "<r><t>xABYUPEEKAIIEI</t><t></t></r>" },
        };
        for (int i = 0; i < tests.GetLength(0); i++)
        {
            string value = tests[i, 0];
            string expectedValue = tests[i, 1];
            string actualValue = Test(value);
            Console.WriteLine($"{value} => {actualValue} == {expectedValue} = {actualValue == expectedValue}");
        }
    }

    public interface ITextReplacer
    {
        string ReplaceValue(string value);
    }

    public class DefaultTextReplacer : ITextReplacer
    {
        public string ReplaceValue(string value) { return $"{value.ToUpper()}"; }
    }

    public interface ITextElement
    {
        string Value { get; set; }
        void RemoveFromParent();
    }

    public class XElementWrapper : ITextElement
    {
        private XElement _element;
        public XElementWrapper(XElement element) { _element = element; }
        string ITextElement.Value
        {
            get { return _element.Value; }
            set { _element.Value = value; }
        }
        public XElement Element
        {
            get { return _element; }
            set { _element = value; }
        }
        public void RemoveFromParent() { _element.Remove(); }
    }

    public class OpenXmlTextWrapper : ITextElement
    {
        private Text _text;
        public OpenXmlTextWrapper(Text text) { _text = text; }
        public string Value
        {
            get { return _text.Text; }
            set { _text.Text = value; }
        }
        public Text Text
        {
            get { return _text; }
            set { _text = value; }
        }
        public void RemoveFromParent() { _text.Remove(); }
    }

    private static void FillInValues(string sourceFileName, string destFileName)
    {
        File.Copy(sourceFileName, destFileName, true);
        using (WordprocessingDocument doc =
            WordprocessingDocument.Open(destFileName, true))
        {
            var body = doc.MainDocumentPart.Document.Body;
            var paras = body.Descendants<Paragraph>();
            SimpleStateMachine stateMachine = new SimpleStateMachine();
            //stateMachine.TextReplacer = <your implementation object >
            ProcessParagraphs(paras, stateMachine);
        }
    }

    private static void ProcessParagraphs(IEnumerable<Paragraph> paras, SimpleStateMachine stateMachine)
    {
        foreach (var para in paras)
        {
            foreach (var run in para.Elements<Run>())
            {
                //Console.WriteLine("New run:");
                var texts = run.Elements<Text>().ToArray();
                for (int k = 0; k < texts.Length; k++)
                {
                    OpenXmlTextWrapper wrapper = new OpenXmlTextWrapper(texts[k]);
                    stateMachine.HandleText(wrapper);
                }
            }
        }
    }

    public class SimpleStateMachine
    {
        // 0 - outside - initial state
        // 1 - $ matched
        // 2 - ${ matched
        // 3 - } - final state
        // 0 -> 1 $
        // 0 -> 0 anything other than $
        // 1 -> 2 {
        // 1 -> 0 anything other than {
        // 2 -> 3 }
        // 2 -> 2 anything other than }
        // 3 -> 0
        public ITextReplacer TextReplacer { get; set; } = new DefaultTextReplacer();
        public int State { get; set; } = 0;
        public List<ITextElement> TextsList { get; } = new List<ITextElement>();
        public StringBuilder Buffer { get; } = new StringBuilder();
        /// <summary>
        /// The index inside the Text element where the $ is found
        /// </summary>
        public int Position { get; set; }

        public void Reset()
        {
            State = 0;
            Buffer.Clear();
            TextsList.Clear();
        }

        // Remember each Text element that holds part of the current placeholder (only once)
        public void Add(ITextElement text)
        {
            if (TextsList.Count == 0 || TextsList.Last() != text)
                TextsList.Add(text);
        }

        public void HandleText(ITextElement text)
        {
            // Scan the characters
            for (int i = 0; i < text.Value.Length; i++)
            {
                char c = text.Value[i];
                switch (State)
                {
                    case 0:
                        if (c == '$')
                        {
                            State = 1;
                            Position = i;
                            Add(text);
                        }
                        break;
                    case 1:
                        if (c == '{')
                        {
                            State = 2;
                            Add(text);
                        }
                        else
                            Reset();
                        break;
                    case 2:
                        if (c == '}')
                        {
                            Add(text);
                            Console.WriteLine("Found: " + Buffer);
                            // We are on the final State
                            // I will use the first text in the stack and discard the others
                            // Here I am going to distinguish between whether I have only one item or more
                            if (TextsList.Count == 1)
                            {
                                // Happy path - we have only one item - set the replacement value and then continue scanning
                                string prefix = TextsList[0].Value.Substring(0, Position) + TextReplacer.ReplaceValue(Buffer.ToString());
                                // Set the current index to point to the end of the prefix. The program will continue with the next items
                                TextsList[0].Value = prefix + TextsList[0].Value.Substring(i + 1);
                                i = prefix.Length - 1;
                            }
                            else
                            {
                                // We have more than one item - discard the inbetweeners
                                for (int j = 1; j < TextsList.Count - 1; j++)
                                    TextsList[j].RemoveFromParent();
                                // I will set the value under the first Text item where the $ was found
                                TextsList[0].Value = TextsList[0].Value.Substring(0, Position) + TextReplacer.ReplaceValue(Buffer.ToString());
                                // Set the text for the current item to the remaining chars
                                text.Value = text.Value.Substring(i + 1);
                                i = -1;
                            }
                            Reset();
                        }
                        else
                        {
                            Buffer.Append(c);
                            Add(text);
                        }
                        break;
                }
            }
        }
    }

    public static string Test(string xml)
    {
        XElement root = XElement.Parse(xml);
        SimpleStateMachine stateMachine = new SimpleStateMachine();
        // ToList(): elements are removed while the text is being processed
        foreach (XElement element in root.Descendants()
            .Where(desc => !desc.Elements().Any()).ToList())
        {
            XElementWrapper wrapper = new XElementWrapper(element);
            stateMachine.HandleText(wrapper);
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```
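Update: the code does not work when the `${...}` placeholder is placed inside a table. This is a problem with the code that scans the document (the `FillInValues` function).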
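Paragraphs inside a table are ordinary `w:p` elements under `w:tbl/w:tr/w:tc`, so `Descendants` of the body does reach them. When narrowing the problem down, it may help to check which placeholders a paragraph-level scan sees at all. The hypothetical `PlaceholderLocator` below is only a diagnostic sketch on the raw part XML, not a fix for `FillInValues`: it concatenates every `w:t` of a paragraph through `Descendants`, so runs wrapped in other elements such as `w:hyperlink` are included, whereas `para.Elements<Run>()` only returns runs that are direct children of the paragraph.
```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class PlaceholderLocator
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly Regex Placeholder = new Regex(@"\$\{(?<name>[^}]*)\}");

    // Lists every ${name} in the part, with a flag telling whether it sits inside a table cell
    public static List<KeyValuePair<string, bool>> Locate(XDocument part)
    {
        var found = new List<KeyValuePair<string, bool>>();
        // Descendants reaches paragraphs at any depth, including w:tbl/w:tr/w:tc/w:p
        foreach (XElement p in part.Descendants(W + "p"))
        {
            // All w:t of this paragraph, including runs wrapped in w:hyperlink and similar elements
            string text = string.Concat(p.Descendants(W + "t")
                .Where(t => t.Ancestors(W + "p").First() == p)
                .Select(t => t.Value));
            bool inTableCell = p.Ancestors(W + "tc").Any();
            foreach (Match m in Placeholder.Matches(text))
                found.Add(new KeyValuePair<string, bool>(m.Groups["name"].Value, inTableCell));
        }
        return found;
    }
}
```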