
gatb / simka


Simka and SimkaMin are comparative metagenomics methods dedicated to NGS datasets.

Home Page: https://gatb.inria.fr/software/simka/

License: GNU Affero General Public License v3.0

CMake 1.22% Shell 4.58% Python 9.50% R 2.60% C++ 80.38% C 0.24% Dockerfile 1.48%

simka's People

Contributors

cdeltel, clemaitre, gatouresearch, genscale-admin, pgdurand, pierrepeterlongo, rchikhi


simka's Issues

Issue with test_simkaMin.py

Hello there,

I'm running into an issue when executing test_simkaMin.py after the package builds. I have tested this both with the Debian package I am building and with a fresh, unmodified copy of simka from this GitHub repository. The test does not reach a 100% success rate; it reports the following failure:

...
python  ../../simkaMin/simkaMin.py -in  ../../example/simka_input.txt  -out __results__/k21_filter_0-1000_n1 -nb-cores 1 -max-memory 100  -kmer-size 21 -nb-kmers 1000 -bin ../../build/bin/simkaMinCore  -max-reads 0 -filter 
	- TEST ERROR:    mat_presenceAbsence_jaccard.csv
res
;A;B;C;D;E
A;0.000000;0.780808;0.940741;0.780808;0.446000
B;0.000000;0.000000;0.733333;0.000000;0.873737
C;0.000000;0.000000;0.000000;0.733333;0.970370
D;0.000000;0.000000;0.000000;0.000000;0.873737
E;0.000000;0.000000;0.000000;0.000000;0.000000

truth
;A;B;C;D;E
A;0.000000;0.783000;0.984000;0.783000;0.446000
B;0.000000;0.000000;0.918000;0.000000;0.875000
C;0.000000;0.000000;0.000000;0.918000;0.992000
D;0.000000;0.000000;0.000000;0.000000;0.875000
E;0.000000;0.000000;0.000000;0.000000;0.000000

This is currently blocking the completion of the Debian package. If there are any recommendations or remedies for how this can be patched or fixed upstream, I'd greatly appreciate it.

Many thanks,
Shayan Doust
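For anyone triaging this: a quick way to quantify how far the computed matrix ("res") is from the expected one ("truth") is to diff the two semicolon-separated files cell by cell. A minimal sketch follows (a hypothetical helper, not Simka's own test code; the file names are placeholders):

import pandas as pd

def max_abs_diff(res_csv, truth_csv):
    # Both files are semicolon-separated with sample IDs in the first row/column,
    # as in the matrices shown above.
    res = pd.read_csv(res_csv, sep=";", index_col=0)
    truth = pd.read_csv(truth_csv, sep=";", index_col=0)
    return float((res - truth).abs().max().max())

print(max_abs_diff("mat_presenceAbsence_jaccard.csv", "truth_mat_presenceAbsence_jaccard.csv"))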

simple test stuck

My test has been stuck for an hour at:

[Merging datasets ] 81.8 % elapsed: 0 min 1 sec remaining: 0 min 0 sec cpu: 0.0 % mem: [ 11, 11, 12] MB

I tried the option -max-merge 4, but now it is stuck at another spot:
[Merging datasets ] 86.4 % elapsed: 0 min 0 sec remaining: 0 min 0 sec cpu: 0.0 % mem: [ 10, 10, 10] MB 21 already merged (remove file /home/bio/Desktop/simka/example/simka_temp_output/simka_output_temp//merge_synchro/21.ok to merge again)

Simka k-mers frequencies table

Hello :)
Thanks for maintaining Simka!
Is it possible to obtain the k-mer count table for each sample analyzed with Simka, so it can then be used in R to calculate alpha diversity metrics?
Thanks,
L

Several plist problems: It installs the third-party executable 'h5cc', etc.

  1. It reinstalls h5cc that is installed by hdf5.
  2. It installs an unconventional file lib/libhdf5.settings that is most likely misplaced or not needed.
  3. It installs directories that mimic the build directory
  4. Headers are installed into my build directory: /usr/ports/biology/simka/work/.build/ext/gatb-core/include/Release/hdf5/H5ACpublic.h

For example, it installs /usr/ports/biology/simka/work/.build/ext/gatb-core/include/Release/hdf5, a directory where I built simka.

Cannot compile source archive from release page

Because the release archive of the source is missing the .git folder, the commands git submodule init and git submodule update from the INSTALL file each fail with the following error:

fatal: Not a git repository (or any of the parent directories): .git

convert output matrix to triangular format in R

Greetings, I enjoy using Simka, but I have a question about manipulating the output files.
Specifically, is there a way to convert 'mat_abundance_braycurtis.csv' into a lower triangular matrix?
I'm really interested in working with a triangular matrix, much like the one produced by vegan.
For example:

library(vegan)
mat <- matrix(1:9, 3, 3)
mat.dis<-vegdist(mat)
mat.dis
           1          2
2 0.11111111           
3 0.20000000 0.09090909
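One way to get this shape from Simka's output, sketched here in Python with pandas (this is not a Simka feature; the file name follows the question above, and gzipped matrices such as .csv.gz are read transparently by pandas):

import numpy as np
import pandas as pd

def lower_triangle(path):
    # Read the semicolon-separated Simka matrix (sample IDs in the first row/column).
    mat = pd.read_csv(path, sep=";", index_col=0)
    # Blank out the diagonal and the upper triangle, mirroring vegan's display.
    keep = np.tril(np.ones(mat.shape, dtype=bool), k=-1)
    return mat.where(keep, other="")

print(lower_triangle("mat_abundance_braycurtis.csv").to_string())

Alternatively, staying in R, reading the matrix with read.csv(..., sep=";", row.names=1) and applying as.dist(as.matrix(...)) yields the lower-triangular 'dist' object that vegan functions expect.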

Reuse simka merge data on subset of samples

Hello, thanks for your work on this tool! I wanted to ask whether it is okay to reuse previous Simka merge results on a subset of the data. E.g., I have already run Simka on a large set of samples; if I rerun Simka using the same temporary directory but only pass a subset of the original files as the files of interest, will this give the correct distance metrics for that subset of samples?

Rscripts fail to generate heatmap images

Hi
I have made a simple comparison between two FASTA files. Simka performed well; here is one of the matrix files:

;id1;id2
id1;0.000000;0.991121
id2;0.991121;0.000000

but the scripts fail to create the heatmaps:
python create_heatmaps.py ../build/bin/test.txt/

Error in graphics:::plotHclust(n1, merge, height, order(x$order), hang, :
invalid dendrogram input
Calls: plot -> plot.hclust ->
Execution halted

Second question:
Concerning the results, are the values a factor or a percentage?

Thanks for the help.

run does not finish

Hi,

I'm running simka on around 1000 samples with a total of 45 billion reads.

simka -in simka_input.txt -out results_simka -out-tmp temp_output -simple-dist -max-count 6 -max-merge 18 -nb-cores 112 -max-memory 100000

In the first two days it created the following folders:

drwxr-xr-t   2 28039 Jun 17 14:32 input
drwxr-xr-t   2     0 Jun 17 14:32 merge_synchro
drwxr-xr-t   2     0 Jun 17 14:32 stats
drwxr-xr-t   2     0 Jun 17 14:32 job_count
drwxr-xr-t   2     0 Jun 17 14:32 job_merge
-rw-r--r--   1 10989 Jun 17 14:32 datasetIds
-rw-r--r--   1 46824 Jun 17 14:38 config.h5
drwxr-xr-t 344  8782 Jun 17 14:38 solid
drwxr-xr-t   2 38016 Jun 19 06:13 log
drwxr-xr-t   7   140 Jun 19 06:15 temp
drwxr-xr-t   2 31850 Jun 19 06:15 kmercount_per_partition
drwxr-xr-t   2 30854 Jun 19 06:15 count_synchro

Since June 19th nothing has happened, but the job is still running. Is this normal? Should I keep waiting?

Job does not end

Sometimes, especially when we have a lot of input files, the job stops at the merge stage (v1.3.2 and v1.4.0).

Inches

Dear developers,

I find it really strange that the figures are set in inches. The SI system uses meters; even NASA does its science in meters.
Could we at least have the option to set them in cm?

.fastq.gz support

Hi, I ran into this error using the following input (built from the latest version):

WP1310: /condo/ieg/qiqi/Haibei_metaG/WP1310_paired_1.fastq.gz ; /condo/ieg/qiqi/Haibei_metaG/WP1310_paired_2.fastq.gz

The error is: ERROR: Can't open dataset: WP1310

Any idea why?

Testing ran successfully.

It took me half a day to figure out why. Is support for fastq.gz not ready?

Thanks,

Jianshu

Test regression on Arm64 (Debian Med)

Hello,

Simka has been packaged by the Debian Med Team. However, there is a regression on the Arm64 architecture only, which is blocking the migration from unstable to testing. I'd rather sort this out before the next Debian release freeze.

Here is the log. The autopkgtest script is here (this is what gets executed and what causes the regression). Any ideas?

Thanks!

-max-reads 0

Hello! I hope you are well. simka looks like a great tool!

How are the samples normalized with the -max-reads 0 flag? I did not see a description of this in the paper.

Have you considered normalization options such as suggested here:
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531 ?

Or transformation options suggested here:
https://www.frontiersin.org/articles/10.3389/fmicb.2017.02224/full ?

Since the default is not to normalize, is the intended workflow to subset all samples to the same number of reads prior to running simka?

Have you tested how much the size discrepancies actually affect the various distance metrics?

Thanks for the clarification.

best,
Roth
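Regarding the subsetting workflow mentioned above: one user-side option (not a Simka feature) is to downsample every FASTQ file to the same number of reads before building the Simka input list. Dedicated tools such as seqtk can do this; the idea is sketched below as a small reservoir sampler, assuming 4-line FASTQ records and plain or gzipped input, with placeholder file names:

import gzip
import random
from itertools import islice

def read_fastq(path):
    # Yield FASTQ records as 4-line lists; supports plain and gzipped files.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            record = list(islice(fh, 4))
            if len(record) < 4:
                return
            yield record

def subsample_fastq(path, out_path, n_reads, seed=42):
    # Reservoir sampling: keep a uniform random subset of n_reads records.
    # For paired files, use the same seed and count on both mates so the
    # same records are kept in the same order.
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(read_fastq(path)):
        if i < n_reads:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)
            if j < n_reads:
                reservoir[j] = rec
    with open(out_path, "w") as out:
        for rec in reservoir:
            out.writelines(rec)

subsample_fastq("sample_1.fastq.gz", "sample_1.sub.fastq", 1000000)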

Bioconda package

I'd like to congratulate the developers on a great metagenomics software tool, and to recommend adding simka to the Bioconda channel as a package: it would enable easier installation and adoption by the community.

Best wishes,
Muslih.

Add option to trim all reads to a given length

Hi,

I would like to run Simka on multiple samples with the -max-reads option to deal with varying sequencing depths.
However, the samples also have varying read lengths.
I guess this may slightly bias the results, as longer reads increase the total number of k-mers.
Would it be possible to add an option to trim all reads to a given length?

Florian
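In the meantime, a possible pre-processing workaround for the request above (outside Simka; file names are placeholders) is to hard-trim every read to a fixed length before running Simka, so that read length no longer inflates the k-mer counts of some samples:

import gzip
from itertools import islice

def trim_fastq(path, out_path, length):
    # Truncate every read (and its quality string) to `length` bases;
    # reads already shorter than `length` are written unchanged.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh, open(out_path, "w") as out:
        while True:
            record = list(islice(fh, 4))
            if len(record) < 4:
                break
            header, seq, plus, qual = record
            out.write(header)
            out.write(seq.rstrip("\n")[:length] + "\n")
            out.write(plus)
            out.write(qual.rstrip("\n")[:length] + "\n")

trim_fastq("sample_1.fastq.gz", "sample_1.trimmed.fastq", 100)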

Issues during linking stage

Hello,

I am packaging simka as a Debian package.

Cloning and compiling directly from source completes successfully. However, with regard to Debian policy, gatb-core is already available as a Debian package in the repository, and a packaged library takes precedence over the gatb-core bundled with simka. As a result, I have made a patch that adapts CMake to use the system-wide gatb-core instead of the copy in thirdparty. This seemingly brings up issues during the linking stage with SimkaAlgorithm.cpp.

I would be grateful if this patch could be looked at, or if I could be pointed in the right direction. I am unsure whether gatb-core, or some CMake logic within gatb-core, injects something into simka that is required for a successful build.

Thanks,
Shayan Doust

SimkaMin output file symmetry

Dear SimkaMin Dev,

I recently stumbled upon your SimkaMin tool and tried to use it to compare my 4000 datasets against each other to get information on the similarity of these samples.

I found something odd in the output matrices: they don't seem to be symmetric. While the upper triangle mostly contains values between 0.0 and 1, the lower triangle contains mostly, but not exclusively, zeros. I could understand the lower triangle being left empty, but a non-symmetric output is strange.

In fact, there always seems to be a sub-part that is symmetric, but most of the matrix is not.

I attached a screenshot of parts of the matrix.

Do you know what to do with this information? Should I only use the column-based distances?


Best and thanks,
Hans
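While waiting for an answer on which triangle is authoritative, a pragmatic workaround (not an official fix) is to mirror the upper triangle onto the lower one to obtain a symmetric matrix, assuming the semicolon-separated layout used by the other Simka matrices and a placeholder file name:

import numpy as np
import pandas as pd

def symmetrize(path):
    # Read the SimkaMin matrix and mirror its upper triangle onto the lower one.
    mat = pd.read_csv(path, sep=";", index_col=0)
    values = mat.to_numpy(dtype=float)
    upper = np.triu(values, k=1)
    sym = upper + upper.T
    np.fill_diagonal(sym, np.diag(values))
    return pd.DataFrame(sym, index=mat.index, columns=mat.columns)

symmetrize("mat_presenceAbsence_jaccard.csv").to_csv("mat_symmetric.csv", sep=";")

Whether the lower triangle is supposed to be filled at all is exactly the question above, so treat this only as a way to get a usable distance matrix in the meantime.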

multifasta file as input

Hi,

I have to compare a multi-FASTA file (200,000 sequences) with a chromosomic region. I have already done this with the kmer-db tool (https://doi.org/10.1093/bioinformatics/bty610) and I need to compare the results of kmer-db with another tool such as simka,
but I can't find the correct command to do this.
kmer-db computes a list of distances between each sequence of the multi-FASTA file and the chromosomic region.

This is my input file:

simka_input.txt:
A: multifasta.fasta
B: chr1_region.fasta

The command line I used:
simka -in ./simka_input.txt -out ./simka_results/ -out-tmp ./simka_temp_output -max-memory 128000 -nb-cores 24

In the simka_results directory: zcat mat_abundance_jaccard.csv.gz
;A;B
A;0.000000;0.999993
B;0.999993;0.000000

I get only 2 values, whereas I have 200,000 sequences in my input file; I don't understand. It seems that simka concatenates all the sequences of the multi-FASTA file and then compares the whole against the other file. How can I avoid that?

Thank you.
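A possible workaround for the behaviour described above, assuming (as the report suggests) that each line of the Simka input file is treated as one dataset: split the multi-FASTA into one file per sequence and list each file as its own dataset, keeping the chromosomic region as a separate dataset. A sketch (not a Simka feature; paths and dataset names are illustrative, input file names follow the example above):

import os

def split_multifasta(fasta_path, out_dir, input_list, region_fasta):
    # Write one FASTA file per sequence and build a Simka input list from them.
    os.makedirs(out_dir, exist_ok=True)
    entries = []
    seq_file = None
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                if seq_file:
                    seq_file.close()
                name = line[1:].split()[0]
                path = os.path.join(out_dir, name + ".fasta")
                seq_file = open(path, "w")
                entries.append(name + ": " + path)
            if seq_file:
                seq_file.write(line)
    if seq_file:
        seq_file.close()
    # Keep the chromosomic region as its own dataset.
    entries.append("region: " + region_fasta)
    with open(input_list, "w") as out:
        out.write("\n".join(entries) + "\n")

split_multifasta("multifasta.fasta", "split_seqs", "simka_input_split.txt", "chr1_region.fasta")

Note that 200,000 sequences means 200,000 datasets, so the resulting all-vs-all comparison will be large.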

simple test stuck

Hi,
The simple test has been stuck for some hours and all files in the temporary folder merge_synchro/ are empty. Nothing has changed in my folders for 5 hours.
Here is the error file:
[progress log excerpt: the "Counting datasets" bar advances from 0 % to 100 % in about 1 second (cpu 16–83 %, mem ~13 MB); the "Merging datasets" bar then advances from 0 % to 95.8 % in under a second (cpu ~18–22 %, mem ~13 MB) and makes no further progress]

And the log file:
Creating input
Nb input datasets: 5

Reads per sample used: all

Maximum ressources used by Simka:
- 5 simultaneous processes for counting the kmers (per job: 9 cores, 1000 MB memory)
- 48 simultaneous processes for merging the kmer counts (per job: 1 cores, memory undefined)

Nb partitions: 48 partitions

Counting k-mers... (log files are ./simka_temp_output/simka_output_temp//log/count_*)

Kmer repartition
0: 270
1: 294
2: 300
3: 323
4: 344
5: 324
6: 275
7: 239
8: 296
9: 298
10: 297
11: 304
12: 278
13: 312
14: 311
15: 320
16: 362
17: 320
18: 303
19: 340
20: 293
21: 269
22: 358
23: 293
24: 291
25: 274
26: 344
27: 305
28: 323
29: 308
30: 302
31: 277
32: 300
33: 309
34: 338
35: 278
36: 310
37: 300
38: 284
39: 330
40: 303
41: 280
42: 317
43: 324
44: 286
45: 388
46: 294
47: 282

Merging k-mer counts and computing distances... (log files are /simka_temp_output/simka_output_temp//log/merge_*)

How long should this test take?

Thank you very much in advance,
Best regards

Can't open object can't open my reads

Hi there,
I built an input file of my reads, but when I ran simka it failed with the error below, saying it can't open them. The example tested successfully. Could someone show me where the problem is? Thanks very much!

Here is the error:

Creating input

Nb input datasets: 1

HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: /scratch/fwang/simka-v1.5.3-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 425 in H5Aopen(): unable to load attribute info from object header for attribute: 'version'
    major: Attribute
    minor: Can't open object
  #001: /scratch/fwang/simka-v1.5.3-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Aint.c line 433 in H5A__open(): unable to load attribute info from object header for attribute: 'version'
    major: Attribute
    minor: Can't open object
  #002: /scratch/fwang/simka-v1.5.3-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Oattribute.c line 515 in H5O__attr_open_by_name(): can't locate attribute: 'version'
    major: Attribute
    minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: /scratch/fwang/simka-v1.5.3-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 704 in H5Aget_space(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: /scratch/fwang/simka-v1.5.3-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5S.c line 1013 in H5Sget_simple_extent_dims(): not a dataspace
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: /scratch/fwang/simka-v1.5.3-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5A.c line 662 in H5Aread(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type

ERROR: Can't open dataset: ID1

Here are the IDs of my reads as input:

ID1: 2018031910_paired_1.fasta ; 2018031910_paired_2.fasta

ID2: 201803193_paired_1.fasta ; 201803193_paired_2.fasta

ID3: 20180319_paired_1.fasta ; 20180319_paired_2.fasta

ID4: 20180403_paired_1.fasta ; 20180403_paired_2.fasta

ID5: 20180405_paired_1.fasta ; 20180405_paired_2.fasta

ID6: 20180410_paired_1.fasta ; 20180410_paired_2.fasta

ID7: 2018041210_paired_1.fasta ; 2018041210_paired_2.fasta

ID8: 201804123_paired_1.fasta ; 201804123_paired_2.fasta

ID9: 20180412_paired_1.fasta ; 20180412_paired_2.fasta

ID10: 2018041710_paired_1.fasta ; 2018041710_paired_2.fasta

ID11: 201804173_paired_1.fasta ; 201804173_paired_2.fasta

ID12: 20180417_paired_1.fasta ; 20180417_paired_2.fasta

ID13: 20180419_paired_1.fasta ; 20180419_paired_2.fasta

ID14: 20180424_paired_1.fasta ; 20180424_paired_2.fasta

ID15: 2018042610_paired_1.fasta ; 2018042610_paired_2.fasta

ID16: 201804263_paired_1.fasta ; 201804263_paired_2.fasta

ID17: 20180426_paired_1.fasta ; 20180426_paired_2.fasta

ID18: 20180502_paired_1.fasta ; 20180502_paired_2.fasta

ID19: 20180503_paired_1.fasta ; 20180503_paired_2.fasta

ID20: 2018050810_paired_1.fasta ; 2018050810_paired_2.fasta

ID21: 201805083_paired_1.fasta ; 201805083_paired_2.fasta

ID22: 20180508_paired_1.fasta ; 20180508_paired_2.fasta

ID23: 2018051110_paired_1.fasta ; 2018051110_paired_2.fasta

ID24: 201805113_paired_1.fasta ; 201805113_paired_2.fasta

ID25: 20180511_paired_1.fasta ; 20180511_paired_2.fasta

ID26: 20180515_paired_1.fasta ; 20180515_paired_2.fasta

ID27: 20180517_paired_1.fasta ; 20180517_paired_2.fasta

ID28: 2018052210_paired_1.fasta ; 2018052210_paired_2.fasta

ID29: 201805223_paired_1.fasta ; 201805223_paired_2.fasta

ID30: 20180522_paired_1.fasta ; 20180522_paired_2.fasta

ID31: 20180524_paired_1.fasta ; 20180524_paired_2.fasta

ID32: 2018052910_paired_1.fasta ; 2018052910_paired_2.fasta

ID33: 201805293_paired_1.fasta ; 201805293_paired_2.fasta

ID34: 20180529_paired_1.fasta ; 20180529_paired_2.fasta

Paired reads input file outputting duplicate rows

Hi, thanks for this useful tool! I'm running into an error with a paired reads file, organized as:

1_1b28d4: 1_1b28d4-t_1.fq.gz ; 1_1b28d4-t_2.fq.gz
1_89e808: 1_89e808-t_1.fq.gz ; 1_89e808-t_2.fq.gz
...

I would expect my output matrices to have one row for each sample, like:

1_1b28d4 1_89e808
1_1b28d4 0 0.774246
1_89e808 0.774246 0

However, instead my table has duplicate rows for each of the paired ends, which cannot be distinguished:

1_1b28d4 1_89e808
1_1b28d4 0 0.774246
1_89e808 0.774246 0
1_1b28d4 0 0.774246
1_89e808 0.774246 0

Is there a way to work around this?
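As a stop-gap (the duplication itself looks like something to fix in Simka), the repeated rows can be dropped after the fact, assuming the matrix uses the same semicolon-separated layout as the other Simka outputs and the duplicate rows really are identical; the file name below is illustrative:

import pandas as pd

mat = pd.read_csv("mat_abundance_braycurtis.csv", sep=";", index_col=0)
mat = mat[~mat.index.duplicated(keep="first")]   # keep one row per sample ID
mat.to_csv("mat_abundance_braycurtis.dedup.csv", sep=";")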

simka crashes with empty files

Hi,

Simka crashes with a segfault when using an empty file.

$touch sample.fastq
$echo 'sample: sample.fastq' > simka_in.txt
$/usr/local/bin/simka -in simka_in.txt -out-tmp /tmp

Creating input
        Nb input datasets: 1

Reads per sample used: all


Maximum ressources used by Simka:
         - 1 simultaneous processes for counting the kmers (per job: 16 cores, 5000 MB memory)
         - 16 simultaneous processes for merging the kmer counts (per job: 1 cores, memory undefined)

Segmentation fault

An explicit error message would be welcomed.

Thanks,
Florian
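Until such a message exists, a small user-side pre-flight check can catch this case. A sketch, assuming the "sample: file1 ; file2" input format shown in these issues and the simka_in.txt name from the example above:

import os
import sys

def check_simka_input(list_path):
    # Report input files that are missing or empty before launching Simka.
    problems = []
    with open(list_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            _, _, files = line.partition(":")
            for path in (p.strip() for p in files.split(";")):
                if path and (not os.path.exists(path) or os.path.getsize(path) == 0):
                    problems.append(path)
    if problems:
        sys.exit("Empty or missing input files: " + ", ".join(problems))

check_simka_input("simka_in.txt")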

Fails to run

lala
[0.0%] Computing k-mer spectrums [Time: 0:00:00]
A job failed (simka_test_tmp/simka_database/kmer_spectrums/A/), exiting simka

test scripts hang with high cores count

Greetings,

While stabilizing the upcoming Debian 11, the CI team noted that simka's test scripts hang under certain circumstances. Further investigation on my end seemed to reveal that the test hangs past 9 cores, so as a workaround we are capping the test suite at no more than 8 cores for the moment. You can refer to Debian bug #986256 for more details.

Do you think this is an issue within simka, or rather something intrinsic to the test data topology?

Kind Regards,
Étienne.
