Dot-Net
從 XLSX 導出大量數據 - OutOfMemoryException
我即將以 Excel OpenXML 格式 (xlsx) 導出大量數據(115.000 行 x 30 列)。我正在使用一些庫,如 DocumentFormat.OpenXML、ClosedXML、NPOI。
對於每一個,都會引發 OutOfMemoryException,因為記憶體中工作表的表示會導致記憶體呈指數增長。
同樣每 1000 行關閉文件文件(並釋放記憶體),下一次載入會導致記憶體增加。
有沒有更高效的方式來導出 xlsx 中的數據而不佔用大量記憶體?
OpenXML SDK 是完成這項工作的正確工具,但您需要小心使用SAX(XML 的簡單 API)方法而不是DOM方法。來自 SAX 的連結維基百科文章:
DOM 對整個文件進行操作,而 SAX 解析器則按順序對 XML 文件的每一部分進行操作
這大大減少了處理大型 Excel 文件時消耗的記憶體量。
這裡有一篇很好的文章 - http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/
改編自那篇文章,這是一個輸出 115k 行和 30 列的範例:
public static void LargeExport(string filename) { using (SpreadsheetDocument document = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook)) { //this list of attributes will be used when writing a start element List<OpenXmlAttribute> attributes; OpenXmlWriter writer; document.AddWorkbookPart(); WorksheetPart workSheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>(); writer = OpenXmlWriter.Create(workSheetPart); writer.WriteStartElement(new Worksheet()); writer.WriteStartElement(new SheetData()); for (int rowNum = 1; rowNum <= 115000; ++rowNum) { //create a new list of attributes attributes = new List<OpenXmlAttribute>(); // add the row index attribute to the list attributes.Add(new OpenXmlAttribute("r", null, rowNum.ToString())); //write the row start element with the row index attribute writer.WriteStartElement(new Row(), attributes); for (int columnNum = 1; columnNum <= 30; ++columnNum) { //reset the list of attributes attributes = new List<OpenXmlAttribute>(); // add data type attribute - in this case inline string (you might want to look at the shared strings table) attributes.Add(new OpenXmlAttribute("t", null, "str")); //add the cell reference attribute attributes.Add(new OpenXmlAttribute("r", "", string.Format("{0}{1}", GetColumnName(columnNum), rowNum))); //write the cell start element with the type and reference attributes writer.WriteStartElement(new Cell(), attributes); //write the cell value writer.WriteElement(new CellValue(string.Format("This is Row {0}, Cell {1}", rowNum, columnNum))); // write the end cell element writer.WriteEndElement(); } // write the end row element writer.WriteEndElement(); } // write the end SheetData element writer.WriteEndElement(); // write the end Worksheet element writer.WriteEndElement(); writer.Close(); writer = OpenXmlWriter.Create(document.WorkbookPart); writer.WriteStartElement(new Workbook()); writer.WriteStartElement(new Sheets()); writer.WriteElement(new Sheet() { Name = "Large Sheet", SheetId = 1, Id = document.WorkbookPart.GetIdOfPart(workSheetPart) }); // End Sheets writer.WriteEndElement(); // End Workbook writer.WriteEndElement(); writer.Close(); document.Close(); } } //A simple helper to get the column name from the column index. This is not well tested! private static string GetColumnName(int columnIndex) { int dividend = columnIndex; string columnName = String.Empty; int modifier; while (dividend > 0) { modifier = (dividend - 1) % 26; columnName = Convert.ToChar(65 + modifier).ToString() + columnName; dividend = (int)((dividend - modifier) / 26); } return columnName; }