Comments (13)
I think you meant the opposite: “sequential access and not random access”.
Anyway, since fastQt only deals with gzip files (as opposed to gzip streams), they can be treated as random access (seeking wouldn't be efficient, but it isn't needed here, so never mind). It's just a matter of patching QuaZIP a little (or replacing it with a custom thin wrapper).
The current position in a gzip file is given by gztell() (position in uncompressed data) or gzoffset() (position in compressed data). As there's no (reliable) way to know the size of the uncompressed data without decompressing it first, the best thing to do here is probably to use gzoffset() as pos() and lseek(SEEK_END) as size(): that gives the current progress in the compressed fastq file, and that's fine.
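To make the idea concrete, here is a minimal sketch of progress measured in compressed bytes. It uses only the C++ standard library: the helper names `compressedFileSize` and `progressPercent` are hypothetical, and the compressed offset is passed in as a plain number (in the scheme above it would come from gzoffset()). This is not fastQt's actual code.

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Size in bytes of the file on disk, i.e. the compressed size for a .gz file.
// Returns 0 if the file cannot be opened.
std::uint64_t compressedFileSize(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open at end
    if (!in) return 0;
    const std::streamoff end = in.tellg();
    return end < 0 ? 0 : static_cast<std::uint64_t>(end);
}

// Progress in percent, given how many compressed bytes have been consumed.
// `compressedOffset` is assumed to be supplied by the decompressor
// (for zlib, the value reported by gzoffset()).
int progressPercent(std::uint64_t compressedOffset, std::uint64_t compressedSize) {
    if (compressedSize == 0) return 0;                   // avoid division by zero
    if (compressedOffset >= compressedSize) return 100;  // clamp at 100 %
    return static_cast<int>(compressedOffset * 100 / compressedSize);
}
```

Both values are known without decompressing anything ahead of time, so the file is only read and decompressed once.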
from fastqt.
Got it!
And I will use https://github.com/vic20/QZip!
from fastqt.
Using QZip isn't a good idea:
- it's written by a single person
- the last commit is old (13 Jul 2015)
- the author hasn't had any activity on GitHub since this project
- it isn't packaged in any distribution
from fastqt.
But I agree, I should use KArchive then. But that one is built with CMake (CMakeLists). Maybe we'll switch to CMake for the next release.
from fastqt.
@natir So you were right... QZip is a fork of KArchive, but it only works for gzip files. Using xz or bz2 crashes the application.
I then tried KArchive, and it works for all of them. But it's not easy to install; on Windows it will probably be difficult. I'm now going to create a branch.
from fastqt.
I fixed it by reading the whole file at the beginning.
from fastqt.
Doesn't that mean wasting several GiBs of memory? Per file? For a simple progress bar?
from fastqt.
No! It's really fast.
I'm doing it like this now:
device.readAll();
totalSize = device.pos();
Then reset the device:
device.seek(0);
from fastqt.
Sorry, but I don't get it:
QByteArray QIODevice::readAll()
Reads all remaining data from the device, and returns it as a byte array.
So, the returned byte array is at least:
- if you read a WES fastq with a 60x coverage depth, 30 Mbp × 60x × 2 = 3.6 GB uncompressed;
- if you read a WGS fastq with a 30x coverage depth, 3 Gbp × 30x × 2 = 180 GB uncompressed.
Multiply it by the number of concurrently loaded fastqs, by two for paired-end, and add whatever else is running on the machine.
How does it fit in a typical desktop machine memory?
from fastqt.
… and if there's a seek(0) at the end, I guess that means the file is read twice from disk (unless there's enough memory for it to remain in the page cache, which is unlikely if the whole uncompressed fastq is already held in memory) and uncompressed twice as well.
from fastqt.
OK, I understand the problem. So what about reading line by line instead of reading everything at once?
Otherwise, I can't find any way to get the size of the file.
I will try something else: computing the size by opening the file as a binary random-access file.
from fastqt.
Reading it line by line would fix the memory issue, but not the performance issue. The same goes for treating gzip / bzip2 / xz files as random access files: it always implies uncompressing the file twice.
On the other hand, if you rely on the compressed size and position, you get an arguably less accurate estimate of the progress, but that's probably good enough given how fast the computation is (who cares if it's 48% instead of 49% for a few ms?).
from fastqt.