Continuously stream rows (fastexcel, closed)

dhatim commented on August 12, 2024
Continuously stream rows

from fastexcel.

Comments (6)

ochedru commented on August 12, 2024

Yes, this is working as designed.
We keep all rows of the Worksheet in memory using compact data structures (not an XML DOM). This has the advantage of enabling proper optimization of shared strings and styles.
I guess we could do better and achieve actual streaming by keeping in memory only shared data. This would restrict addressing of upper rows though, because the worksheet would have to be filled from top to bottom.
Did you hit an out of memory condition? With how many cells?
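
To illustrate the top-to-bottom restriction mentioned above, here is a minimal sketch of a row buffer that frees rows once they are written out. It is not fastexcel code: `TopDownRowBuffer`, `setRow` and `bufferedRowCount` are hypothetical names made up for this example, and rows are plain strings rather than cells.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not fastexcel code: rows are buffered, then flushed in order. */
class TopDownRowBuffer {
    private final Writer out;
    private final List<String> pending = new ArrayList<>(); // rows not yet written
    private int flushedRows = 0;                            // rows already written out

    TopDownRowBuffer(Writer out) {
        this.out = out;
    }

    /** Sets the content of a row; rows that were already flushed can no longer change. */
    void setRow(int rowIndex, String content) {
        if (rowIndex < flushedRows) {
            throw new IllegalStateException("row " + rowIndex + " was already flushed");
        }
        int offset = rowIndex - flushedRows;
        while (pending.size() <= offset) {
            pending.add(""); // pad skipped rows with empty content
        }
        pending.set(offset, content);
    }

    /** Writes all buffered rows in order and frees them. */
    void flush() throws IOException {
        for (String row : pending) {
            out.write(row);
            out.write('\n');
        }
        flushedRows += pending.size();
        pending.clear();
        out.flush();
    }

    int bufferedRowCount() {
        return pending.size();
    }
}

class TopDownRowBufferDemo {
    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        TopDownRowBuffer buffer = new TopDownRowBuffer(sink);
        buffer.setRow(0, "a,b");
        buffer.setRow(1, "c,d");
        buffer.flush();              // rows 0 and 1 leave memory
        buffer.setRow(2, "e,f");     // fine: below the flushed region
        try {
            buffer.setRow(0, "x,y"); // rejected: row 0 is already written
        } catch (IllegalStateException expected) {
            System.out.println("cannot rewrite: " + expected.getMessage());
        }
        buffer.flush();
        System.out.print(sink);
    }
}
```

The trade-off is visible in `setRow`: memory stays bounded by the rows written since the last flush, but anything above the flushed region becomes read-only.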

johanhaleby commented on August 12, 2024

> I guess we could do better and achieve actual streaming by keeping in memory only shared data. This would restrict addressing of upper rows though, because the worksheet would have to be filled from top to bottom.

This is my use case, I just want to stream row by row.

> Did you hit an out of memory condition? With how many cells?

I haven't measured it, but at least when using POI (non-streaming) we run into memory limits, which severely impacts our server. I'm targeting something like a hundred thousand rows.

ochedru commented on August 12, 2024

100k rows should be ok with the current design, provided you do not have hundreds of columns. Can I suggest you try the library first? It was created in the first place to overcome the same POI shortcomings you are experiencing.

johanhaleby commented on August 12, 2024

We have about 20 columns so I suppose I could probably give it a try.

But I still think streaming would be very beneficial. We're actually streaming data from a database that we use as the foundation for the Excel file. So regardless of how memory-efficient fastexcel is, it would be better for us to just pipe the data to the client with as little memory consumption as possible.

Given that streaming doesn't currently exist, another feature I would find useful is returning the data as an InputStream instead of writing to an OutputStream. If everything is held in memory before being written to the OutputStream, I think it would make sense to support this. I.e. instead of doing:

try (OutputStream os = ...) {
    Workbook wb = new Workbook(os, "MyApplication", "1.0");
    ...
    wb.finish();
}

it would be nice to be able to do:

try (OutputStream os = ...) {
    Workbook wb = new Workbook("MyApplication", "1.0");
    ...
    wb.finish(os);
}

or call finish without any arguments and then make it return an InputStream:

Workbook wb = new Workbook("MyApplication", "1.0");
...
InputStream is = wb.finish();

Would that make sense? If so I could open a new issue.

ochedru commented on August 12, 2024

Indeed, the OutputStream passed to the constructor is not used until the workbook is serialized in the finish() method.
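
Given that, one way to get an InputStream without any API change is to point the writer at an in-memory buffer and wrap the resulting bytes. Below is a minimal sketch using only `java.io`. `InMemoryExport`, `StreamWriter` and `toInputStream` are hypothetical names for this example, and the lambda writes placeholder bytes where the workbook code would go.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class InMemoryExport {
    /** Hypothetical callback: whatever writes the finished document to a stream. */
    interface StreamWriter {
        void writeTo(OutputStream os) throws IOException;
    }

    /** Runs the writer against an in-memory buffer and exposes the bytes as an InputStream. */
    static InputStream toInputStream(StreamWriter writer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.writeTo(buffer);
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // With fastexcel, the lambda body would create the Workbook on `os`,
        // fill it, and call wb.finish(); plain bytes stand in for that here.
        InputStream is = toInputStream(os -> os.write("demo".getBytes(StandardCharsets.UTF_8)));
        System.out.println(is.available());
    }
}
```

Note that this holds the whole serialized file in memory (and `toByteArray()` makes a copy), so it only suits files that comfortably fit in the heap.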

rzymek commented on August 12, 2024

Continuously streaming rows to the OutputStream is now possible with periodic calls to flush():

try (OutputStream os = ...) {
    Workbook wb = new Workbook(os, "MyApplication", "1.0");
    Worksheet ws = wb.newWorksheet("Sheet 1");
    for (int row = 0; ...) {
        // ....
        ws.value(row, 0, ...);
        if (row % 100 == 0) { // optionally, you could flush after every row
            ws.flush();
        }
    }
    wb.finish();
}
