Streaming Generation of Excel Files

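When we export database data to an Excel file, a very large data set can consume enough memory to cause an OutOfMemoryError (OOM). Even without an OOM, generating the Excel file may take so long that the request times out. This is where POI's SXSSF (org.apache.poi.xssf.streaming) comes in.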

The two Excel formats

  • Excel 97(-2007) file format

  • Excel 2007 OOXML (.xlsx) file format

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format. XSSF is the POI Project's pure Java implementation of the Excel 2007 OOXML (.xlsx) file format.

HSSF and XSSF provide ways to create, modify, read and write XLS spreadsheets. They provide:

  • low level structures for those with special needs
  • an eventmodel api for efficient read-only access
  • a full usermodel api for creating, reading and modifying XLS files

Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.

SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

In auto-flush mode the size of the access window can be specified, to hold a certain number of rows in memory. When that value is reached, the creation of an additional row causes the row with the lowest index to be removed from the access window and written to disk. Alternatively, the window size can be set to grow dynamically; it can then be trimmed periodically by an explicit call to flushRows(int keepRows) as needed.
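To make the sliding window concrete, here is a toy model in plain Java. It is not POI's implementation: RowWindowSketch and its methods are hypothetical names, and a StringWriter stands in for the temporary file. It only mimics the two behaviours described above, auto-flush at a fixed window size and explicit trimming via flushRows(keepRows) when the size is -1.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a row access window. Hypothetical; NOT how POI implements SXSSF. */
final class RowWindowSketch {
    private final int windowSize; // -1 means "never auto-flush"
    private final Writer spill;   // stands in for the temporary file on disk
    private final TreeMap<Integer, String> window = new TreeMap<>();

    RowWindowSketch(int windowSize, Writer spill) {
        this.windowSize = windowSize;
        this.spill = spill;
    }

    /** Adds a row; in auto-flush mode the lowest-index rows are evicted once the window is full. */
    void createRow(int index, String content) throws IOException {
        window.put(index, content);
        if (windowSize > 0) {
            flushRows(windowSize);
        }
    }

    /** Writes the lowest-index rows to the spill writer until at most keepRows remain in memory. */
    void flushRows(int keepRows) throws IOException {
        while (window.size() > keepRows) {
            Map.Entry<Integer, String> oldest = window.pollFirstEntry();
            spill.write(oldest.getKey() + ":" + oldest.getValue() + "\n");
        }
    }

    /** Returns the row if it is still inside the window, or null once it has been flushed. */
    String getRow(int index) {
        return window.get(index);
    }

    int rowsInMemory() {
        return window.size();
    }

    public static void main(String[] args) throws IOException {
        RowWindowSketch sheet = new RowWindowSketch(100, new StringWriter());
        for (int i = 0; i < 1000; i++) {
            sheet.createRow(i, "row-" + i);
        }
        System.out.println(sheet.rowsInMemory()); // prints 100
        System.out.println(sheet.getRow(0));      // prints null: row 0 left the window
        System.out.println(sheet.getRow(999));    // prints row-999
    }
}
```

The point of the model is the trade-off: memory stays bounded by the window size, and the price is that flushed rows can no longer be read or modified.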

Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:

  • Only a limited number of rows are accessible at a point in time.
  • Sheet.clone() is not supported.
  • Formula evaluation is not supported.

SXSSF

How does SXSSF reduce memory consumption? It writes data to a temporary file, which cuts memory usage and lowers the chance of an OOM error.

// turn off auto-flushing and accumulate all rows in memory
SXSSFWorkbook wb = new SXSSFWorkbook(-1); 

You can also pass -1 to the constructor to turn off automatic flushing to the file, in which case all rows stay in memory.


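This takes care of the OOM problem, but all the data still has to be written to a temporary file before the response can be sent, so the request-timeout problem remains unsolved.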

Streaming generation

The Excel 2007 OOXML (.xlsx) file format is essentially a zip file. We can change the .xlsx extension to .zip and unpack it:

$ mv output.xlsx output.zip
$ unzip output.zip -d output
$ tree output/
output/
├── [Content_Types].xml
├── _rels
├── docProps
│   ├── app.xml
│   └── core.xml
└── xl
    ├── _rels
    │   └── workbook.xml.rels
    ├── sharedStrings.xml
    ├── styles.xml
    ├── workbook.xml
    └── worksheets
        └── sheet1.xml

5 directories, 8 files

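We can see that the unpacked Excel file contains the files listed above. styles.xml holds the styles we defined (font, font size, color, centering and similar properties), and the worksheets directory holds our data.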

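Once we understand the data format, we can control how the xlsx file is written ourselves. Writing the data directly to the response stream instead of a temporary file means the client starts receiving bytes while rows are still being generated, which solves the request-timeout problem.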
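As a rough illustration of writing the format by hand, the sketch below streams a one-sheet .xlsx to any OutputStream using only java.util.zip. Several things here are assumptions rather than anything from the original post: the class and method names are hypothetical, cell values are written as inline strings so that no sharedStrings.xml is needed, styles.xml is left out, and the five parts written are what I believe to be the minimum Excel accepts (not verified against every Excel version).

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Sketch: stream a one-sheet .xlsx straight to an OutputStream, with no temporary file. */
final class StreamingXlsxSketch {
    static final String DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
    static final String MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static final String PKG = "http://schemas.openxmlformats.org/package/2006";
    static final String DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String CT = "application/vnd.openxmlformats-officedocument.spreadsheetml.";

    static void write(OutputStream out, Iterator<List<String>> rows) throws IOException {
        ZipOutputStream zos = new ZipOutputStream(out);
        Writer w = new OutputStreamWriter(zos, StandardCharsets.UTF_8);
        // The small fixed parts go first ...
        part(zos, w, "[Content_Types].xml", "<Types xmlns=\"" + PKG + "/content-types\">"
            + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
            + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
            + "<Override PartName=\"/xl/workbook.xml\" ContentType=\"" + CT + "sheet.main+xml\"/>"
            + "<Override PartName=\"/xl/worksheets/sheet1.xml\" ContentType=\"" + CT + "worksheet+xml\"/>"
            + "</Types>");
        part(zos, w, "_rels/.rels", rels("officeDocument", "xl/workbook.xml"));
        part(zos, w, "xl/_rels/workbook.xml.rels", rels("worksheet", "worksheets/sheet1.xml"));
        part(zos, w, "xl/workbook.xml", "<workbook xmlns=\"" + MAIN + "\" xmlns:r=\"" + DOC_REL + "\">"
            + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets></workbook>");
        // ... and the sheet data goes last, one row at a time, so only one row is held in memory.
        zos.putNextEntry(new ZipEntry("xl/worksheets/sheet1.xml"));
        w.write(DECL + "<worksheet xmlns=\"" + MAIN + "\"><sheetData>");
        for (int r = 1; rows.hasNext(); r++) {
            List<String> cells = rows.next();
            w.write("<row r=\"" + r + "\">");
            for (int c = 0; c < cells.size(); c++) {
                w.write("<c r=\"" + colName(c) + r + "\" t=\"inlineStr\"><is><t>"
                    + escape(cells.get(c)) + "</t></is></c>");
            }
            w.write("</row>");
        }
        w.write("</sheetData></worksheet>");
        w.flush();    // the writer buffers, so flush before closing the entry
        zos.closeEntry();
        zos.finish(); // writes the zip central directory; closing `out` is left to the caller
    }

    private static void part(ZipOutputStream zos, Writer w, String name, String xml) throws IOException {
        zos.putNextEntry(new ZipEntry(name));
        w.write(DECL + xml);
        w.flush();
        zos.closeEntry();
    }

    private static String rels(String type, String target) {
        return "<Relationships xmlns=\"" + PKG + "/relationships\"><Relationship Id=\"rId1\" Type=\""
            + DOC_REL + "/" + type + "\" Target=\"" + target + "\"/></Relationships>";
    }

    /** Converts a zero-based column index to a column name: 0 is A, 25 is Z, 26 is AA. */
    static String colName(int index) {
        StringBuilder sb = new StringBuilder();
        for (int n = index; n >= 0; n = n / 26 - 1) {
            sb.insert(0, (char) ('A' + n % 26));
        }
        return sb.toString();
    }

    /** Escapes the characters that are unsafe in XML text; control characters are not handled. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("id", "name"),
            Arrays.asList("1", "a & b"));
        try (OutputStream out = new FileOutputStream("output.xlsx")) {
            write(out, rows.iterator());
        }
    }
}
```

In a web handler the FileOutputStream would be replaced by the response's output stream, and the iterator would wrap a database cursor so that rows are fetched as they are written.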

Sample code

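// Build a template workbook that holds only the styles and the (empty) sheets.
// genHeaderStyle, genTemplateFile and the sheet wrapper objects are helpers not shown in this post.
XSSFWorkbook wb = new XSSFWorkbook()
XSSFCellStyle headerStyle = genHeaderStyle(wb)
sheets.each { sheet ->
    def xssfSheet = wb.createSheet(sheet.name)
    sheet.setXSSFSheet(xssfSheet)
    sheet.setHeaderStyle(headerStyle)
}
File template = genTemplateFile(wb)
ZipOutputStream zos = new ZipOutputStream(responseStream);
ZipFile templateZip = new ZipFile(template);
Enumeration<? extends ZipEntry> templateEntries = templateZip.entries();
try {
  while (templateEntries.hasMoreElements()) {
    ZipEntry entry = templateEntries.nextElement();
    // copy all template content to the ZipOutputStream zos
    // except the sheet itself
  }
  // sheetName is the zip entry name of the sheet part, e.g. xl/worksheets/sheet1.xml
  zos.putNextEntry(new ZipEntry(sheetName)); // now the sheet
  OutputStreamWriter sheetOut = new OutputStreamWriter(zos, "UTF-8");
  try {
    sheetOut.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
    // the SpreadsheetML namespace is required on the root element
    sheetOut.write("<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>");
    // write the content: rows and cells
    sheetOut.write("</sheetData></worksheet>");
  } finally { sheetOut.close(); } // closing the writer also closes zos
} finally {
  zos.close();
  templateZip.close();
}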

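Here, template holds the index information, such as which styles were created and how many sheets there are. These entries are placed at the front of the ZIP file, and the sheet data comes last.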
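The copy step that the sample code leaves as a comment could look roughly like the following. This is a sketch under assumptions: TemplateCopySketch and copyExcept are hypothetical names, only one sheet entry is skipped, and main builds a throwaway template with placeholder content instead of a real POI-generated workbook.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Sketch: copy the template's entries to the output zip first, then append the sheet last. */
final class TemplateCopySketch {

    /** Copies every entry of the template zip into zos, except the entry named skip. */
    static void copyExcept(File template, ZipOutputStream zos, String skip) throws IOException {
        try (ZipFile zip = new ZipFile(template)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            byte[] buf = new byte[8192];
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.getName().equals(skip)) {
                    continue; // the sheet is written separately, after everything else
                }
                // Use a fresh ZipEntry so sizes and CRC are recomputed for the new stream.
                zos.putNextEntry(new ZipEntry(entry.getName()));
                try (InputStream in = zip.getInputStream(entry)) {
                    for (int n; (n = in.read(buf)) > 0; ) {
                        zos.write(buf, 0, n);
                    }
                }
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String sheetName = "xl/worksheets/sheet1.xml";
        // Throwaway template with placeholder content, standing in for the POI-generated file.
        File template = File.createTempFile("template", ".zip");
        try (ZipOutputStream t = new ZipOutputStream(new FileOutputStream(template))) {
            for (String name : new String[] {"xl/styles.xml", sheetName}) {
                t.putNextEntry(new ZipEntry(name));
                t.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                t.closeEntry();
            }
        }
        ByteArrayOutputStream response = new ByteArrayOutputStream(); // stands in for the response stream
        try (ZipOutputStream zos = new ZipOutputStream(response)) {
            copyExcept(template, zos, sheetName);
            zos.putNextEntry(new ZipEntry(sheetName));
            zos.write("<worksheet/>".getBytes(StandardCharsets.UTF_8)); // real rows would be streamed here
            zos.closeEntry();
        }
        System.out.println(response.size() + " bytes written");
        template.delete();
    }
}
```

Because the template is small, copying it is cheap; the cost of the export is dominated by the sheet entry, which is exactly the part that gets streamed.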

Original post on my blog: blog.yu000hong.com/2018/07/24/…


Reposted from juejin.im/post/5b590fbb51882561da21616e