OpenDocumentXML Excel exponentially slows when inserting cell SharedStringTable values
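The standard approach in the code samples is to iterate over the shared string table, searching for the value to add: append it if it is not found, or reuse the existing entry if the string is already there. However, this search slows down dramatically after a few hundred values.
I tried a LINQ query to search for the string, and it appears to be faster. See the commented-out lines in the code. However, I cannot determine the position of the item that was found. Does anyone know a way to identify an item's position in the shared string table?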
public static int InsertSharedStringItem(string text, InfoReg.OpenXML.Spreadsheet mySpreadsheet)
{
    // If the part does not contain a SharedStringTable, create one.
    if (mySpreadsheet.sharedstringtable == null)
    {
        mySpreadsheet.shareStringPart.SharedStringTable = new SharedStringTable();
    }

    //SharedStringItem item = mySpreadsheet.sharedstringtable.Elements<SharedStringItem>()
    //    .Where(t => t.InnerText == text)
    //    .FirstOrDefault();
    //if(item != null)
    //{
    //    return item's location in the shared string table;
    //}

    // Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
    int i = 0;
    foreach (SharedStringItem item1 in mySpreadsheet.sharedstringtable.Elements<SharedStringItem>())
    {
        if (item1.InnerText == text)
        {
            return i;
        }
        i++;
    }

    // The text does not exist in the part. Create the SharedStringItem and return its index.
    mySpreadsheet.sharedstringtable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
    return i;
}
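On the position question, one option (a sketch, not a tested fix) is LINQ's indexed `Select` overload, which pairs each element with its zero-based position so the index survives the filter. The example uses a plain sequence of strings as a stand-in for the `InnerText` values of the shared string table; `IndexedLookup` and `IndexOfText` are hypothetical names. It is still a linear scan, so on its own it does not remove the slowdown.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexedLookup
{
    // Returns the zero-based position of the first entry equal to text, or -1 if absent.
    public static int IndexOfText(IEnumerable<string> items, string text)
    {
        return items
            .Select((value, index) => new { value, index }) // carry the position along
            .Where(pair => pair.value == text)
            .Select(pair => pair.index)
            .DefaultIfEmpty(-1)                              // -1 when nothing matches
            .First();
    }
}
```

Against the real table, the input sequence would presumably be something like `mySpreadsheet.sharedstringtable.Elements<SharedStringItem>().Select(i => i.InnerText)`.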
SharedStringTable values are added using:
index = InfoReg.OpenXML.Spreadsheet.InsertSharedStringItem(string.IsNullOrEmpty(datarow.AssignedTo) ? string.Empty : datarow.AssignedTo, myspreadsheet);
// Insert cell into the new worksheet.
cell = InfoReg.OpenXML.Spreadsheet.InsertCellInWorksheet(InfoReg.OpenXML.Spreadsheet.GetExcelColumnName(colAddr++), rowAddr, myspreadsheet.worksheetPart.Worksheet);
// Set the value of cell.
cell.CellValue = new CellValue(index.ToString());
cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
To limit the problem, I use the non-shared string approach for cells that have a low probability of a match. I use this code:
Cell cell = InfoReg.OpenXML.Spreadsheet.InsertCellInWorksheet(InfoReg.OpenXML.Spreadsheet.GetExcelColumnName(colAddr++), rowAddr, myspreadsheet.worksheetPart.Worksheet);
// Set the value of cell.
cell.CellValue = new CellValue(datarow.ForeName);
cell.DataType = new EnumValue<CellValues>(CellValues.String);
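Some articles suggest using a dictionary that holds a copy of the shared string table. I did not see any significant performance improvement with a dictionary.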
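The dictionary idea should give you better performance, but if you re-create the dictionary every time you want to insert an element, you will not save any time; in fact it will only take longer.
My take is to wrap the SharedStringTable in a wrapper that caches and keeps track of the elements, and to make sure the wrapper is the only way you access the shared string table.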
public class SharedStringWrapper
{
    private readonly SharedStringTable _table;
    private readonly Lazy<Dictionary<string, (int index, OpenXmlElement element)>> _lazyDict;
    private int _maxElement;

    public SharedStringWrapper(SharedStringTable table)
    {
        _table = table;
        _maxElement = _table.Elements<SharedStringItem>().Count();
        // Lazy initialize: building the dictionary could take a while, so defer it until it is needed.
        // Keys use the default string comparer, so lookups are case-sensitive.
        // ToDictionary throws if the existing table already contains duplicate text.
        _lazyDict = new Lazy<Dictionary<string, (int index, OpenXmlElement element)>>(() =>
            _table.Elements<SharedStringItem>()
                .Select((element, index) => (element, index))
                .ToDictionary(k => k.element.InnerText, v => (v.index, (OpenXmlElement)v.element)));
    }

    /// <summary>
    /// Inserts text into the shared string table and returns the index of the inserted text.
    /// If the text already exists, it returns the index of the existing element
    /// </summary>
    /// <param name="text">text to insert</param>
    /// <returns>index of the element in the shared strings table</returns>
    public int InsertTextElement(string text)
    {
        // This is where you get the huge time saving: the first call builds the dictionary, later calls are plain lookups.
        if (_lazyDict.Value.TryGetValue(text, out var value))
            return value.index;

        // Append the child, record it in the dictionary, and return its zero-based index.
        var item = new SharedStringItem(new Text(text));
        _table.AppendChild(item);
        int newIndex = _maxElement++;
        _lazyDict.Value[text] = (newIndex, item);
        return newIndex;
    }
}
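The same caching idea can be tried without the Open XML SDK. In this minimal sketch a `List<string>` stands in for the shared string table, and `StringIndexCache` and `GetOrAdd` are hypothetical names; the point is only that the dictionary is built once and then kept in step with every append, so each lookup avoids a scan.

```csharp
using System.Collections.Generic;

// Stand-in for the wrapper idea: a list plays the role of the shared string table.
public class StringIndexCache
{
    private readonly List<string> _table;
    private readonly Dictionary<string, int> _indexByText = new Dictionary<string, int>();

    public StringIndexCache(List<string> table)
    {
        _table = table;
        // Build the lookup once; keep the first index when the table holds duplicates.
        for (int i = 0; i < _table.Count; i++)
        {
            if (!_indexByText.ContainsKey(_table[i]))
                _indexByText[_table[i]] = i;
        }
    }

    // Returns the existing index of text, or appends it and returns the new index.
    public int GetOrAdd(string text)
    {
        if (_indexByText.TryGetValue(text, out int existing))
            return existing;

        int index = _table.Count;   // zero-based index the new entry will occupy
        _table.Add(text);
        _indexByText[text] = index; // keep the cache in step with the table
        return index;
    }
}
```

This only stays correct if nothing else appends to the underlying list, which is why the wrapper should be the single access path.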
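Using the Lazy<Dictionary...> from jAnderson's answer put me on the path to a working solution.
The code below provides a significant performance improvement over the standard iterative approach documented at https://docs.microsoft.com/en-us/office/open-xml/working-with-the-shared-string-table.
The modified wrapper class uses a dictionary that initially mirrors the shared string table, and when new entries are added to the table they are also added to the dictionary. The dictionary can therefore be searched for existing entries instead of iterating over the shared string table.
Some of our data is incomplete, and null and empty strings are encountered frequently. I therefore added a null/empty test and a fast path that returns the empty string entry from the SharedStringTable.
While this is a huge improvement, the internal management of the SharedStringTable could use some binary index or a similar, more scalable solution.
The modified wrapper class: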
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

namespace MyOpenXML
{
    public class SharedStringWrapper
    {
        private readonly SharedStringTable _table;
        private readonly Dictionary<string, int> _dict;
        private int _maxElement;
        private int _emptyStringElement;

        public SharedStringWrapper(SharedStringTable table)
        {
            _table = table;                                            // The table as it is in the spreadsheet
            _maxElement = _table.Elements<SharedStringItem>().Count(); // Count of items at the start
            _dict = new Dictionary<string, int>();                     // Shadow copy of the shared string table

            // Give _dict its initial copy of the table.
            int i = 0;
            foreach (SharedStringItem item in _table.Elements<SharedStringItem>())
            {
                // Keep the first index if the existing table contains duplicate text.
                if (!_dict.ContainsKey(item.InnerText))
                {
                    _dict.Add(item.InnerText, i);
                }
                i++;
            }

            // Add an empty string entry if one does not exist.
            // Its index is cached for fast handling of null and empty values.
            if (!_dict.TryGetValue(string.Empty, out _emptyStringElement))
            {
                _emptyStringElement = _maxElement;
                _table.AppendChild(new SharedStringItem(new Text(string.Empty)));
                _dict.Add(string.Empty, _maxElement++);
            }
        }

        /// <summary>
        /// Inserts text into the shared string table and returns the index of the inserted text.
        /// If the text already exists, it returns the index of the existing element.
        /// Also inserts a copy in _dict so that it is a copy of the shared string table.
        /// </summary>
        /// <param name="text">text to insert</param>
        /// <returns>index of the element in the shared strings table</returns>
        public int InsertTextElement(string text)
        {
            if (string.IsNullOrEmpty(text)) return _emptyStringElement; // Treat null as empty string

            // If the text is found, return its existing index.
            if (_dict.TryGetValue(text, out int indx))
            {
                return indx;
            }

            // Not found: add it to both the shared string table and the dictionary.
            indx = _maxElement++;
            _table.AppendChild(new SharedStringItem(new Text(text)));
            _dict.Add(text, indx);
            return indx;
        }
    }
}
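As a rough illustration of why the dictionary helps so much: although the slowdown is described as exponential, the scan-based approach grows closer to quadratically. Inserting n distinct strings, each after a full scan of the entries already present, costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 comparisons. The sketch below just counts them; `ScanCost` and `ComparisonsForLinearScan` are hypothetical names and the model ignores the cost of reading each item's text.

```csharp
public static class ScanCost
{
    // Counts string comparisons made when n distinct strings are inserted
    // one by one, each after a full scan of the entries already present.
    public static long ComparisonsForLinearScan(int n)
    {
        long comparisons = 0;
        for (int existing = 0; existing < n; existing++)
        {
            comparisons += existing; // a miss compares against every existing entry
        }
        return comparisons;
    }
}
```

For 1,000 distinct strings this gives 499,500 comparisons, whereas the dictionary version does roughly one hash lookup per insert.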