Comments (13)

Arkanosis commented on September 24, 2024

I think you meant the opposite: “sequential access and not random access”.

Anyway, since fastQt is dealing with gzip files only (as opposed to gzip streams), it's possible to consider this as random access (seeking wouldn't be efficient, but that's not needed here, so never mind). It's just a matter of patching QuaZIP a little (or replacing it with a custom thin wrapper).

The current position in a gzip file is given by gztell() (position in uncompressed data) or gzoffset() (position in compressed data). As there's no (reliable) way to know the size of the uncompressed data without decompressing it first, the best thing to do here is probably to use gzoffset() as pos() and lseek(SEEK_END) as size(): that gives the current progress in the compressed fastq file, and that's fine.
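
The idea above can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not FastQt's or QuaZIP's actual code: `compressedFileSize` and `progressPercent` are hypothetical helper names, and the compressed offset is assumed to come from zlib's `gzoffset()` on the open file.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: size of the compressed file on disk, in bytes.
// Opening at the end and asking for the position yields the same number
// that lseek(fd, 0, SEEK_END) would report. Returns -1 if the file
// cannot be opened.
std::int64_t compressedFileSize(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return -1;
    }
    return static_cast<std::int64_t>(in.tellg());
}

// Hypothetical helper: progress in percent, computed from the position in
// the compressed data (e.g. the value returned by gzoffset()) and the
// compressed size. The result is clamped to the range [0, 100].
int progressPercent(std::int64_t compressedOffset, std::int64_t compressedSize) {
    if (compressedSize <= 0 || compressedOffset <= 0) {
        return 0;
    }
    if (compressedOffset >= compressedSize) {
        return 100;
    }
    return static_cast<int>(compressedOffset * 100 / compressedSize);
}
```

With zlib, the progress bar could then be refreshed after each read with something like `progressPercent(gzoffset(file), compressedFileSize(path))`, where `file` is the `gzFile` being read. The file is decompressed only once and nothing beyond the read buffer is kept in memory.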

from fastqt.

dridk commented on September 24, 2024

Got it!
I will use https://github.com/vic20/QZip.

natir commented on September 24, 2024

Using QZip isn't a good idea:

  • it's written by a single person
  • the last commit is old (13 Jul 2015)
  • the author hasn't had any activity on GitHub since this project
  • it isn't packaged in any distribution

dridk commented on September 24, 2024

But I agree, I should use KArchive then. But that one is built with CMake (CMakeLists). Maybe we'll switch to CMake for the next release.

dridk commented on September 24, 2024

@natir So you were right... QZip is a fork of KArchive, but it only works for gzip files. Using xz or bz2 crashes the application.
I then tried KArchive, and it works for all of them. But it's not easy to install; on Windows in particular it will probably be difficult. I'm now going to create a branch.

dridk commented on September 24, 2024

I fixed it by reading the whole file at the beginning.

Arkanosis commented on September 24, 2024

Doesn't that mean wasting several GiBs of memory? Per file? For a simple progress bar?

dridk commented on September 24, 2024

No! It's really fast.
I'm doing it like this now:

device.readAll();
totalSize = device.pos();

Then I reset the device:

device.seek(0);

Arkanosis commented on September 24, 2024

Sorry, but I don't get it:

QByteArray QIODevice::readAll()
Reads all remaining data from the device, and returns it as a byte array.

So, the returned byte array is at least:

  • if you read a WES fastq with a 60x coverage depth, 30 Mbp × 60x × 2 = 3.6 GiB uncompressed;
  • if you read a WGS fastq with a 30x coverage depth, 3 Gbp × 30x × 2 = 180 GiB uncompressed.

Multiply it by the number of concurrently loaded fastqs, by two for paired-end, and add whatever else is running on the machine.

How does it fit in a typical desktop machine memory?
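
The arithmetic behind these two figures can be reproduced with a small sketch. The function names are hypothetical, and the factor of 2 is read here as an assumption: one byte for each base plus one byte for its quality score, ignoring read headers, `+` lines and newlines (so the result is a lower bound).

```cpp
#include <cstdint>

// Rough lower bound on the uncompressed FASTQ size, in bytes.
// Assumption: 2 bytes per sequenced base (1 base character + 1 quality
// character); headers, '+' separator lines and newlines are ignored.
constexpr std::uint64_t estimateFastqBytes(std::uint64_t targetBases,
                                           std::uint64_t coverage) {
    return targetBases * coverage * 2;
}

// Converts a byte count to GiB (1 GiB = 2^30 bytes).
constexpr double toGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

Under these assumptions, `estimateFastqBytes(30000000, 60)` gives 3.6 × 10^9 bytes and `estimateFastqBytes(3000000000, 30)` gives 1.8 × 10^11 bytes. Strictly speaking those are decimal gigabytes: `toGiB` puts them at roughly 3.35 GiB and 168 GiB, which does not change the conclusion.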

Arkanosis commented on September 24, 2024

… and if there's a seek(0) at the end, I guess that means the file is read twice from disk (unless there's enough memory to have it remain in the page cache — which is unlikely if you've a 180 GiB fastq in memory already) and uncompressed twice as well.

dridk commented on September 24, 2024

OK, I understand the problem. So what about reading line by line instead of reading everything at once?
Otherwise, I can't find any way to get the size of the file.
I will try something else: computing the size by opening the file as a binary random-access file.

Arkanosis commented on September 24, 2024

Reading it line by line would fix the memory issue, but not the performance issue. The same goes for treating gzip / bzip2 / xz files as random-access files: it always implies decompressing the file twice.
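
To illustrate the first point, here is a minimal sketch of counting the uncompressed size line by line in constant memory. The function names are hypothetical, and the input is assumed to be any `std::istream` that yields the decompressed text. Only one line is held in memory at a time, but the whole stream still has to be decompressed once just to obtain the total, and then a second time for the actual analysis.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Counts the bytes of a text stream one line at a time, so memory use is
// bounded by the longest line rather than by the size of the whole file.
std::uint64_t countBytesLineByLine(std::istream& in) {
    std::uint64_t total = 0;
    std::string line;
    while (std::getline(in, line)) {
        total += line.size();
        // std::getline drops the '\n'; count it unless the last line
        // ended at end-of-stream without a trailing newline.
        if (!in.eof()) {
            total += 1;
        }
    }
    return total;
}

// Convenience wrapper for in-memory text, handy for quick checks.
std::uint64_t countBytesInText(const std::string& text) {
    std::istringstream in(text);
    return countBytesLineByLine(in);
}
```

For example, `countBytesInText("@r1\nACGT\n+\nIIII\n")` counts the 16 bytes of a single four-line record.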

On the other hand, if you rely on the compressed size and position, you get an arguably less accurate estimate of the progress, but that's probably good enough given how fast the computation is (who cares if it's 48% instead of 49% for a few ms?).
