Reading a gz file means random access and not sequential access. So, actually, I ca

Processing a fastq.gz file doesn't show percent progress (fastqt issue, 13 comments, closed)

labsquare commented on September 24, 2024

process fastq.gz file doesn't show percent progress

from fastqt.

Comments (13)

Arkanosis commented on September 24, 2024

I think you meant the opposite: “sequential access and not random access”.

Anyway, since fastQt is dealing with gzip files only (as opposed to gzip streams), it's possible to treat this as random access (seeking wouldn't be efficient, but that's not needed here, so never mind). It's just a matter of patching QuaZIP a little (or replacing it with a custom thin wrapper).

The current position in a gzip file is given by gztell() (position in the uncompressed data) or gzoffset() (position in the compressed data). As there's no (reliable) way to know the size of the uncompressed data without decompressing it first, the best thing to do here is probably to use gzoffset() as pos() and lseek(SEEK_END) as size(): that gives the current progress through the compressed fastq file, and that's fine.
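A minimal sketch of that estimate, assuming the compressed offset is obtained elsewhere (for instance from gzoffset()). The helper names `compressedSize` and `progressPercent` are hypothetical and are not part of fastQt, QuaZIP or zlib. The file size is read by seeking to the end of the file, which is the same idea as lseek(SEEK_END).

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Size of the compressed file on disk, in bytes (plays the role of size()).
// Opens the file positioned at its end and reads that position back.
// Returns 0 if the file cannot be opened.
std::uint64_t compressedSize(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        return 0;
    }
    const std::streamoff end = in.tellg();
    return end < 0 ? 0 : static_cast<std::uint64_t>(end);
}

// Progress in percent, from the current offset in the compressed file
// (what gzoffset() would report; here it is just a parameter) and the
// compressed size. Clamped to 100; an empty file counts as done.
int progressPercent(std::uint64_t compressedOffset, std::uint64_t totalCompressed) {
    if (totalCompressed == 0 || compressedOffset >= totalCompressed) {
        return 100;
    }
    return static_cast<int>(compressedOffset * 100 / totalCompressed);
}
```

Nothing is decompressed to get the total, so the file is still read and decompressed only once.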

dridk commented on September 24, 2024

Got it!
And I will use QZip: https://github.com/vic20/QZip

natir commented on September 24, 2024

Using QZip isn't a good idea:

it's written by a single person
the last commit is old (13 Jul 2015)
the author hasn't had any activity on GitHub since this project
it isn't packaged in any distribution

dridk commented on September 24, 2024

That's what I thought. But it seems to be the same code as KArchive from KDE. He just replaced the K with a Q :)

dridk commented on September 24, 2024

But I agree, I should use KArchive then. However, it is built with CMake (CMakeLists). Maybe we'll switch to CMake for the next release.

dridk commented on September 24, 2024

@natir So you were right... QZip is a fork of KArchive, but it only works for gzip files. Using xz or bz2 crashes the application.
I then tried KArchive, and it works for all of them. But it's not easy to install; on Windows it will probably be difficult. I'm now going to create a branch

dridk commented on September 24, 2024

I fixed it by reading the whole file at the beginning.

Arkanosis commented on September 24, 2024

Doesn't that mean wasting several GiBs of memory? Per file? For a simple progress bar?

dridk commented on September 24, 2024

No! It's really fast.
I'm doing it like this now:

device.readAll();
totalSize = device.pos();

Then I reset the device:

device.seek(0);

Arkanosis commented on September 24, 2024

Sorry, but I don't get it:

QByteArray QIODevice::readAll()
Reads all remaining data from the device, and returns it as a byte array.

So, the returned byte array is at least:

if you read a WES fastq with a 60x coverage depth, 30 Mbp × 60x × 2 = 3.6 GiB uncompressed;
if you read a WGS fastq with a 30x coverage depth, 3 Gbp × 30x × 2 = 180 GiB uncompressed.
Multiply it by the number of concurrently loaded fastqs, by two for paired-end, and add whatever else is running on the machine.

How does that fit in a typical desktop machine's memory?
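The arithmetic above can be checked with a short sketch. It assumes the factor of 2 stands for one byte per base for the sequence plus one byte per base for the quality string; that reading is an assumption, as is the hypothetical helper name `fastqBytes`. Read headers and separator lines are ignored, so the result is a lower bound.

```cpp
#include <cstdint>

// Rough uncompressed FASTQ size in bytes. ASSUMPTION: the factor of 2 is
// one byte per base for the sequence plus one byte per base for the
// quality string. Headers and separators are not counted.
constexpr std::uint64_t fastqBytes(std::uint64_t targetBases, std::uint64_t coverage) {
    return targetBases * coverage * 2;
}

// Converts a byte count to binary gibibytes (2^30 bytes).
constexpr double toGiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1073741824.0;
}

// WES: 30 Mbp target at 60x coverage.
constexpr std::uint64_t kWesBytes = fastqBytes(30000000ULL, 60);
// WGS: 3 Gbp genome at 30x coverage.
constexpr std::uint64_t kWgsBytes = fastqBytes(3000000000ULL, 30);
```

In strict units the two results are 3.6 × 10^9 and 180 × 10^9 bytes, that is, about 3.35 GiB and 168 GiB. The order of magnitude, and therefore the argument, is unchanged.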

Arkanosis commented on September 24, 2024

… and if there's a seek(0) at the end, I guess that means the file is read twice from disk (unless there's enough memory to have it remain in the page cache — which is unlikely if you've a 180 GiB fastq in memory already) and uncompressed twice as well.

dridk commented on September 24, 2024

OK, I understand the problem. So what about reading line by line instead of reading everything?
Otherwise, I can't find any way to get the size of the file.
I will try something else: compute the size by opening the file as a binary, random-access file.

Arkanosis commented on September 24, 2024

Reading it line by line would fix the memory issue, but not the performance issue. The same goes for treating gzip / bzip2 / xz files as random-access files: it always implies decompressing the file twice.

On the other hand, if you rely on the compressed size and position, you get an arguably less accurate estimate of the progress, but that's probably good enough given how fast the computation is (who cares if it's 48% instead of 49% for a few ms?).
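A rough sketch of such a single-pass loop, under stated assumptions: `processLines` is a hypothetical helper, the input is any std::istream standing in for the decompressed data, and `compressedPos` is a caller-supplied function standing in for whatever reports the compressed offset (gzoffset() in the zlib case). None of this is fastQt's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Reads the stream line by line in one pass, so memory use is bounded by the
// longest line. After each line it asks compressedPos() for the current
// offset in the compressed file and reports the percentage, but only when the
// value changes. Returns the number of lines read.
std::uint64_t processLines(std::istream& in,
                           std::uint64_t totalCompressed,
                           const std::function<std::uint64_t()>& compressedPos,
                           const std::function<void(int)>& onPercent) {
    assert(compressedPos && onPercent);  // both callbacks must be set
    std::string line;
    std::uint64_t lines = 0;
    int lastPercent = -1;
    while (std::getline(in, line)) {
        ++lines;  // real per-line work (parsing, statistics) would go here
        const std::uint64_t pos = compressedPos();
        const int percent = (totalCompressed == 0 || pos >= totalCompressed)
                                ? 100
                                : static_cast<int>(pos * 100 / totalCompressed);
        if (percent != lastPercent) {
            lastPercent = percent;
            onPercent(percent);
        }
    }
    return lines;
}

// Usage sketch: an in-memory record and a fake position source that pretends
// 10 compressed bytes are consumed per line out of 40 in total.
int demoLastPercent() {
    std::istringstream in("@r1\nACGT\n+\nIIII\n");
    std::uint64_t pos = 0;
    int last = -1;
    processLines(in, 40, [&pos] { pos += 10; return pos; },
                 [&last](int p) { last = p; });
    return last;
}
```

Because a decompressor reads its input in blocks, the compressed offset moves in steps rather than line by line, which is one reason the estimate is slightly coarse.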
