Dot-Net

Exporting a large amount of data to XLSX - OutOfMemoryException

  • January 7, 2020

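I need to export a large amount of data (115,000 rows x 30 columns) in Excel OpenXML format (xlsx). I have tried several libraries: DocumentFormat.OpenXML, ClosedXML, and NPOI.

With each of them an OutOfMemoryException is thrown, because the in-memory representation of the worksheet makes memory usage grow exponentially.

Even if I close the document file every 1,000 rows (and release the memory), the next load makes memory usage grow again.

Is there a more efficient way to export data to xlsx without using so much memory?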

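The OpenXML SDK is the right tool for this job, but you need to be careful to use the SAX (Simple API for XML) approach rather than the DOM approach. From the linked Wikipedia article on SAX:

Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially

This can greatly reduce the amount of memory consumed when handling large Excel files.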
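For contrast, a DOM-style export might look like the following minimal sketch. It is not taken from the answer or the linked article; the class name DomStyleExport and its parameters are hypothetical, chosen only for illustration. Every Row and Cell is appended to an in-memory SheetData tree before anything is written to the file, which is why memory use grows with the size of the sheet.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical illustration of the DOM approach (the pattern to avoid for large sheets).
public static class DomStyleExport
{
    public static void Export(string filename, int rowCount, int columnCount)
    {
        using (SpreadsheetDocument document =
            SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
        {
            WorkbookPart workbookPart = document.AddWorkbookPart();
            workbookPart.Workbook = new Workbook();
            WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

            // The whole sheet lives in this object tree until it is saved.
            SheetData sheetData = new SheetData();
            worksheetPart.Worksheet = new Worksheet(sheetData);

            for (int r = 1; r <= rowCount; r++)
            {
                Row row = new Row { RowIndex = (uint)r };
                for (int c = 1; c <= columnCount; c++)
                {
                    row.Append(new Cell
                    {
                        DataType = CellValues.String,
                        CellValue = new CellValue("Row " + r + ", Cell " + c)
                    });
                }
                // Each appended row stays referenced by sheetData, so nothing can be collected.
                sheetData.Append(row);
            }

            // Serialize the complete tree in one go.
            worksheetPart.Worksheet.Save();

            Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
            sheets.Append(new Sheet
            {
                Name = "Sheet1",
                SheetId = 1,
                Id = workbookPart.GetIdOfPart(worksheetPart)
            });
            workbookPart.Workbook.Save();
        }
    }
}
```

With a few thousand rows this is fine. At 115,000 x 30 the tree holds about 3.45 million Cell objects at once, which is the situation the SAX approach is meant to avoid.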

There is a good article about it here: http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/

Adapted from that article, here is an example that writes 115k rows with 30 columns:

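// Namespaces required by this snippet (both methods below belong inside a class)
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static void LargeExport(string filename)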
{
   using (SpreadsheetDocument document = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
   {
       //this list of attributes will be used when writing a start element
       List<OpenXmlAttribute> attributes;
       OpenXmlWriter writer;

       document.AddWorkbookPart();
       WorksheetPart workSheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>();

       writer = OpenXmlWriter.Create(workSheetPart);            
       writer.WriteStartElement(new Worksheet());
       writer.WriteStartElement(new SheetData());

       for (int rowNum = 1; rowNum <= 115000; ++rowNum)
       {
           //create a new list of attributes
           attributes = new List<OpenXmlAttribute>();
           // add the row index attribute to the list
           attributes.Add(new OpenXmlAttribute("r", null, rowNum.ToString()));

           //write the row start element with the row index attribute
           writer.WriteStartElement(new Row(), attributes);

           for (int columnNum = 1; columnNum <= 30; ++columnNum)
           {
               //reset the list of attributes
               attributes = new List<OpenXmlAttribute>();
               // add data type attribute - "str" stores the text directly in the cell value
               // (a true inline string would use "inlineStr"; for repeated text you might want to look at the shared strings table)
               attributes.Add(new OpenXmlAttribute("t", null, "str"));
               //add the cell reference attribute
               attributes.Add(new OpenXmlAttribute("r", "", string.Format("{0}{1}", GetColumnName(columnNum), rowNum)));

               //write the cell start element with the type and reference attributes
               writer.WriteStartElement(new Cell(), attributes);
               //write the cell value
               writer.WriteElement(new CellValue(string.Format("This is Row {0}, Cell {1}", rowNum, columnNum)));

               // write the end cell element
               writer.WriteEndElement();
           }

           // write the end row element
           writer.WriteEndElement();
       }

       // write the end SheetData element
       writer.WriteEndElement();
       // write the end Worksheet element
       writer.WriteEndElement();
       writer.Close();

       writer = OpenXmlWriter.Create(document.WorkbookPart);
       writer.WriteStartElement(new Workbook());
       writer.WriteStartElement(new Sheets());

       writer.WriteElement(new Sheet()
       {
           Name = "Large Sheet",
           SheetId = 1,
           Id = document.WorkbookPart.GetIdOfPart(workSheetPart)
       });

       // End Sheets
       writer.WriteEndElement();
       // End Workbook
       writer.WriteEndElement();

       writer.Close();

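       // no explicit document.Close() is needed here: the using block disposes the document
       // (newer versions of the SDK no longer provide Close())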
   }
}

//A simple helper to get the column name from the column index. This is not well tested!
private static string GetColumnName(int columnIndex)
{
   int dividend = columnIndex;
   string columnName = String.Empty;
   int modifier;

   while (dividend > 0)
   {
       modifier = (dividend - 1) % 26;
       columnName = Convert.ToChar(65 + modifier).ToString() + columnName;
       dividend = (int)((dividend - modifier) / 26);
   }

   return columnName;
}
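Since the helper is described as not well tested, here is a minimal check that can be run on its own. It copies the same algorithm into a hypothetical ColumnNameCheck class (the class name and the chosen test indexes are assumptions, not part of the original answer) and compares the result with known Excel column names, including the boundaries at Z/AA, ZZ/AAA, and the last Excel column XFD.

```csharp
using System;

// Hypothetical stand-alone check for the column name helper.
public static class ColumnNameCheck
{
    // Same algorithm as GetColumnName above, copied so this sketch is self-contained.
    public static string GetColumnName(int columnIndex)
    {
        int dividend = columnIndex;
        string columnName = String.Empty;

        while (dividend > 0)
        {
            int modifier = (dividend - 1) % 26;
            columnName = Convert.ToChar(65 + modifier).ToString() + columnName;
            dividend = (dividend - modifier) / 26;
        }

        return columnName;
    }

    public static void Main()
    {
        // 1-based column indexes and the names Excel uses for them.
        int[] indexes = { 1, 26, 27, 30, 52, 53, 702, 703, 16384 };
        string[] expected = { "A", "Z", "AA", "AD", "AZ", "BA", "ZZ", "AAA", "XFD" };

        for (int i = 0; i < indexes.Length; i++)
        {
            string actual = GetColumnName(indexes[i]);
            if (actual != expected[i])
            {
                throw new InvalidOperationException(
                    "Column " + indexes[i] + ": expected " + expected[i] + " but got " + actual);
            }
        }

        Console.WriteLine("All column name checks passed.");
    }
}
```

Column 30, the last column written by the export example, maps to AD.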

Source: https://stackoverflow.com/questions/32690851