midas2's People

Contributors

boris-dimitrov, bsmith89, zhaoc1

midas2's Issues

run_species error

Hi! I downloaded the database for one selected species using the command midas2 database --download --midasdb_name gtdb --midasdb_dir my_midasdb_Catellicoccus --species_list species_list.txt (only GTDB has the species I want, and species_list.txt contains only one species ID).
After that, I wanted to run the SPECIES flow, but it keeps printing 'Processing 64000 queries' constantly.

1654588151.2: Single sample abundant species profiling in subcommand run_species with args
1654588151.2: {
1654588151.2: "subcommand": "run_species",
1654588151.2: "force": false,
1654588151.2: "debug": true,
1654588151.2: "zzz_worker_mode": false,
1654588151.2: "batch_branch": "master",
1654588151.2: "batch_memory": 378880,
1654588151.2: "batch_vcpus": 48,
1654588151.2: "batch_queue": "pairani",
1654588151.2: "batch_ecr_image": "pairani:latest",
1654588151.2: "midas_outdir": "midas2_output",
1654588151.2: "sample_name": "DYG1",
1654588151.2: "r1": "reads/DYG1.decon_1.fastq.gz",
1654588151.2: "r2": "reads/DYG1.decon_2.fastq.gz",
1654588151.2: "midasdb_name": "gtdb",
1654588151.2: "midasdb_dir": "my_midasdb_Catellicoccus",
1654588151.2: "word_size": 28,
1654588151.2: "aln_mapid": null,
1654588151.2: "aln_cov": 0.75,
1654588151.2: "marker_reads": 2,
1654588151.2: "marker_covered": 2,
1654588151.2: "max_reads": null,
1654588151.2: "num_cores": 8
1654588151.2: }
1654588151.2: Create OUTPUT directory for DYG1.
1654588151.2: 'rm -rf midas2_output/DYG1/species'
1654588151.2: 'mkdir -p midas2_output/DYG1/species'
1654588151.2: Create TEMP directory for DYG1.
1654588151.2: 'rm -rf midas2_output/DYG1/temp/species'
1654588151.2: 'mkdir -p midas2_output/DYG1/temp/species'
1654588151.3: MIDAS2::fetch_midasdb_files::start
1654588161.3: MIDAS2::fetch_midasdb_files::finish
1654588161.5: MIDAS2::map_reads_hsblastn::start
[HS-BLASTN] Loading database.
Loading /media/atm3/user02/MIDAS2.0/sample/my_midasdb_Catellicoccus/markers/phyeco/phyeco.fa.sequence, size = 0.4GB
Loading /media/atm3/user02/MIDAS2.0/sample/my_midasdb_Catellicoccus/markers/phyeco/phyeco.fa.bwt, size = 0.8GB
Loading /media/atm3/user02/MIDAS2.0/sample/my_midasdb_Catellicoccus/markers/phyeco/phyeco.fa.sa, size = 0.8GB
[HS-BLASTN] done. Time elapsed: 1.70 secs.

[HS-BLASTN] Processing /dev/stdin.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.
Processing 64000 queries.

UHGG database download error

Hi! I am trying to download the UHGG database for all species and I am receiving an error. I already ran
midas2 database --init --midasdb_name uhgg --midasdb_dir /wynton/protected/scratch/clairedubin/midasdb_uhgg with no errors.

midas2 database --download --midasdb_name uhgg --midasdb_dir /wynton/protected/scratch/clairedubin/midasdb_uhgg --species all
1689099642.8:    Downloading MIDAS database for sliced species 3 with 12 cores in total::start
1689099642.8:    Downloading MIDAS database for sliced species 10 with 12 cores in total::start
1689099643.0:    Downloading MIDAS database for sliced species 4 with 12 cores in total::start
1689099643.1:    Downloading MIDAS database for sliced species 2 with 12 cores in total::start
1689099643.2:    Downloading MIDAS database for sliced species 1 with 12 cores in total::start
1689099643.5:    Downloading MIDAS database for sliced species 11 with 12 cores in total::start
1689099643.5:    Downloading MIDAS database for sliced species 8 with 12 cores in total::start
1689099643.6:    Downloading MIDAS database for sliced species 0 with 12 cores in total::start
1689099643.6:    Downloading MIDAS database for sliced species 7 with 12 cores in total::start
1689099643.7:    Downloading MIDAS database for sliced species 9 with 12 cores in total::start
1689099643.7:    Downloading MIDAS database for sliced species 5 with 12 cores in total::start
1689099643.7:    Downloading MIDAS database for sliced species 6 with 12 cores in total::start

Traceback (most recent call last):
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/__main__.py", line 28, in <module>
    main()
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
    return subcommand_main(subcommand_args)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/database.py", line 148, in main
    download_midasdb(args)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/database.py", line 37, in download_midasdb
    download_midasdb_worker(args)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/database.py", line 91, in download_midasdb_worker
    midasdb.fetch_files("pangenome", species_id_list)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 167, in fetch_files
    return self.fetch_tarball(filename, list_of_species)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 192, in fetch_tarball
    md5_fetched = file_md5sum(_fetched_file)
  File "/wynton/protected/home/lynchlab/clairedubin/anaconda3/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 341, in file_md5sum
    return md5(open(local_file, "rb").read()).hexdigest()
FileNotFoundError: [Errno 2] No such file or directory: '/wynton/protected/scratch/clairedubin/midasdb_uhgg/pangenomes_filtered/100007/centroids.ffn'

Here is the output of ls /wynton/protected/scratch/clairedubin/midasdb_uhgg/:

chunks  gene_annotations  genomes.tsv  markers  markers_models  md5sum.json  metadata.tsv  pangenomes

So there is no pangenomes_filtered directory, but there is a pangenomes directory. I didn't have this error with an older version of MIDAS2, so I'm wondering if a recent update is having an issue with directory creation or naming. I also receive the same error when attempting to download for select species instead of all species.

Use binning data as reference genome

Thanks for your remarkable work. I use MIDAS2 to analyze CNVs, but I found that CNVs cannot be detected in all samples. So I want to use my own binned genomes as the reference database; is that feasible for CNV analysis?

exception handling in multiprocessing

During multiprocessing_map(), an exception raised in a child process is not re-raised until all processes have finished. I set up four ValueErrors in the branch 2021-01-13-exception to reproduce this issue.

Ideally, we want to terminate the whole multiprocessing pool as soon as an exception occurs in any worker process.
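
A minimal sketch of the desired fail-fast behavior, using a plain multiprocessing.Pool rather than the actual midas2.common.utils.multiprocessing_map helper (function names here are illustrative): consuming results with imap_unordered surfaces the first worker exception as soon as it happens, at which point the remaining workers can be terminated.

```python
from multiprocessing import Pool

def _work(item):
    # Hypothetical worker: raises on a poison value to simulate a failing child process.
    if item == 13:
        raise ValueError(f"bad item: {item}")
    return item * item

def fail_fast_map(func, items, num_procs):
    """Map func over items, but stop all workers as soon as any of them raises."""
    with Pool(num_procs) as pool:
        try:
            # imap_unordered yields results (and re-raises worker exceptions)
            # as they complete, instead of waiting for the whole map to finish.
            # Note: results come back in completion order, not input order.
            return list(pool.imap_unordered(func, items))
        except Exception:
            pool.terminate()  # kill the remaining workers right away
            raise

if __name__ == "__main__":
    print(fail_fast_map(_work, range(8), num_procs=4))
```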

`gene_info.txt` only for centroid genes in the future

In the pan-genome workflow, we need to figure out where gene_info.txt is actually used, especially for species with a large number of genomes (e.g. species 102506, with 8288 genomes in total).

There seems to be no usage of the full gene_info.txt (covering all genes) in the genes workflow of MIDAS.

For future database builds, we could generate gene_info.txt for only the centroids.
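
A rough sketch of producing a centroids-only gene_info.txt, assuming the current file is a TSV with a gene_id column and a centroid_99 column (these column names are assumptions for illustration, not necessarily the actual MIDAS-DB schema):

```python
import csv

def write_centroid_only_gene_info(gene_info_in, gene_info_out):
    """Stream gene_info.txt and keep only rows describing 99% centroid genes."""
    # Pass 1: collect the set of 99% centroid gene IDs (column name assumed).
    with open(gene_info_in, newline="") as fin:
        centroid_ids = {row["centroid_99"] for row in csv.DictReader(fin, delimiter="\t")}
    # Pass 2: write only the rows whose gene_id is itself a centroid.
    with open(gene_info_in, newline="") as fin, open(gene_info_out, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(row for row in reader if row["gene_id"] in centroid_ids)
```

The two-pass streaming avoids holding the whole table in memory, which matters for species with thousands of genomes.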

handle 99% clusters that do not contain their own centroids

Let cX and cY be 99% clusters with centroids X and Y, respectively. Normally X is an element of cX and does not belong to any other 99% clusters. In some rare degenerate cases, X is also a member of cY.

Subsequent coarser reclustering at 95, 90, ... ANI for the elements of cX would then produce incorrect results. We need to modify the reclustering assignments to handle this case correctly.

There is a hypothesis that this case occurs primarily when contig IDs clash between genomes, so ongoing work to rename contigs during import to prevent such clashes could possibly address this problem. The hypothesis is just a wild guess at this point.
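
A sketch of one possible repair step before the coarser reclustering. The data layout (a dict mapping each gene to its assigned 99% centroid) is an assumption for illustration, not the actual MIDAS-DB build code: if a centroid X is itself assigned to another cluster cY, fold cX's members into cY.

```python
def repair_degenerate_clusters(member_to_centroid):
    """Fix 99% clusters whose centroid belongs to another cluster.

    member_to_centroid: dict mapping gene_id -> assigned 99% centroid gene_id.
    Returns a new mapping in which every centroid is a member of its own cluster.
    """
    fixed = dict(member_to_centroid)
    for gene, centroid in member_to_centroid.items():
        # Follow centroid chains: X assigned to Y, Y possibly assigned to Z, ...
        seen = set()
        final = centroid
        while member_to_centroid.get(final, final) != final and final not in seen:
            seen.add(final)
            final = member_to_centroid[final]
        fixed[gene] = final
    return fixed

# Example: centroid "X" is itself a member of cluster "Y".
assignments = {"g1": "X", "g2": "X", "X": "Y", "Y": "Y", "g3": "Y"}
print(repair_degenerate_clusters(assignments))
# {'g1': 'Y', 'g2': 'Y', 'X': 'Y', 'Y': 'Y', 'g3': 'Y'}
```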

Install from source needs additional channel

I was able to install MIDAS2 using the "From Source" instructions only after modifying the midas2.yml file to include your Anaconda channel "zhaoc1".

https://midas2.readthedocs.io/en/latest/installation.html#from-source

Without the zhaoc1 channel, I get the following error, because the midas2 conda package is only available from zhaoc1:

Solving environment: failed

ResolvePackageNotFound: 
  - midas2=1.0.9
 

Additionally, the install provided under "Quickstart" didn't work for me. This was related to a Python version issue (Python 3.7.9 was installed, but Python 3.9 was required for something). This may have something to do with the difference between v1.0.0 (the version indicated in those instructions) and v1.0.9.

https://midas2.readthedocs.io/en/latest/quickstart.html#install-midas2

Finally, the installation instructions under "Conda" didn't work for me in practice, as the "solving environment" step hung for several hours before I killed it.

https://midas2.readthedocs.io/en/latest/installation.html#conda

I look forward to using the tool.

Mike

add genome_info when clean-importing genomes

The SNPs workflow needs metadata for the representative genome, e.g. genome_length, genome_name (if we have it), contig_counts, etc. It would be better to collect this piece of information when we import the genomes into iggtools.
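
A minimal sketch of collecting these fields at import time from the representative genome FASTA (the field names mirror the issue text; the integration point into the iggtools import step is not shown and is an assumption):

```python
import gzip

def genome_info_from_fasta(fasta_path, genome_name=None):
    """Compute basic metadata (genome_length, contig_counts) from a FASTA file."""
    opener = gzip.open if fasta_path.endswith(".gz") else open
    genome_length = 0
    contig_counts = 0
    with opener(fasta_path, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                contig_counts += 1          # each header line is one contig
            else:
                genome_length += len(line.strip())  # sum of sequence characters
    return {
        "genome_name": genome_name or fasta_path,
        "genome_length": genome_length,
        "contig_counts": contig_counts,
    }
```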

add MIDAS shared analysis code to analysis/midas.py

In the future, perhaps we can group together the MIDAS shared analysis code in analysis/midas.py, with a view to separating code paths used for analysis from those used for DB construction, because the analysis paths will likely run outside of AWS at some point. [from pull request #30]

merge_snps issue still exists

Hi,

From your command line history midas2_output/LRDYA/snps/145629.snps.tsv.lz4, can you make sure that the sample name provided to the merge_snps command --samples_list list_of_samples.tsv is LRDYA?

Thanks,
Chunyu

Originally posted by @zhaoc1 in #88 (comment)

I referenced this issue, but this kind of bug still exists.

command:
midas2 merge_snps --samples_list list_of_samples.tsv --midasdb_name uhgg --midasdb_dir ~/database/midas2/ midas2/ --debug

issue:
1661742862.8: Across samples population SNV calling in subcommand merge_snps with args
1661742862.8: {
1661742862.8: "subcommand": "merge_snps",
1661742862.8: "force": false,
1661742862.8: "debug": true,
1661742862.8: "zzz_worker_mode": false,
1661742862.8: "batch_branch": "master",
1661742862.8: "batch_memory": 378880,
1661742862.8: "batch_vcpus": 48,
1661742862.8: "batch_queue": "pairani",
1661742862.8: "batch_ecr_image": "pairani:latest",
1661742862.8: "midas_outdir": "midas2/",
1661742862.8: "samples_list": "list_of_samples.tsv",
1661742862.8: "midasdb_name": "uhgg",
1661742862.8: "midasdb_dir": "/home/lbl/database/midas2/",
1661742862.8: "species_list": null,
1661742862.8: "genome_depth": 5.0,
1661742862.8: "genome_coverage": 0.4,
1661742862.8: "sample_counts": 2,
1661742862.8: "site_depth": 5,
1661742862.8: "site_ratio": 3.0,
1661742862.8: "site_prev": 0.9,
1661742862.8: "snv_type": "common",
1661742862.8: "snp_pooled_method": "prevalence",
1661742862.8: "snp_maf": 0.05,
1661742862.8: "snp_type": "bi, tri, quad",
1661742862.8: "locus_type": "any",
1661742862.8: "num_cores": 16,
1661742862.8: "chunk_size": 1000000,
1661742862.8: "advanced": false,
1661742862.8: "robust_chunk": false
1661742862.8: }
1661742863.7: 248 species pass the filter
1661742863.7: Create OUTPUT directory.
1661742863.7: 'rm -rf midas2/snps'
1661742863.7: 'mkdir -p midas2/snps'
1661742863.7: Create TEMP directory.
1661742863.7: 'rm -rf midas2/temp/snps'
1661742863.7: 'mkdir -p midas2/temp/snps'
1661742870.0: MIDAS2::write_species_summary::start
1661742870.0: MIDAS2::write_species_summary::finish
1661742870.6: MIDAS2::design_chunks::start
Traceback (most recent call last):
  File "/home/lbl/miniconda3/envs/midas2.0/bin/midas2", line 8, in <module>
    sys.exit(main())
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
    return subcommand_main(subcommand_args)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 664, in main
    merge_snps(args)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 658, in merge_snps
    raise error
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 639, in merge_snps
    arguments_list = design_chunks(species_ids_of_interest, midas_db)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 220, in design_chunks
    all_site_chunks = multithreading_map(design_chunks_per_species, [(sp, midas_db) for sp in dict_of_species.values()], num_cores) #<---
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 540, in multithreading_map
    return _multi_map(func, items, num_threads, ThreadPool)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 520, in _multi_map
    return p.map(func, items, chunksize=1)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 205, in design_chunks_per_species
    return sp.compute_snps_chunks(midas_db, chunk_size, "merge")
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/models/species.py", line 84, in compute_snps_chunks
    chunks_of_sites = load_chunks_cache(local_file)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/models/species.py", line 181, in load_chunks_cache
    chunks_dict = json.load(stream)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/lbl/miniconda3/envs/midas2.0/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How can I fix it?

merge_snps error

Hello, I downloaded the database for the selected species '145629' and finished the run_snps step with the following command:
midas2 run_snps --sample_name ${line} -1 reads/${line}.decon_1.fastq.gz -2 reads/${line}.decon_2.fastq.gz --midasdb_name gtdb --midasdb_dir my_midasdb_Catellicoccus --species_list 145629 --select_threshold=-1 --num_cores 10 --advanced --ignore_ambiguous midas2_output
But something goes wrong with merge_snps:
midas2 merge_snps --samples_list list_of_samples.tsv --midasdb_name gtdb --midasdb_dir my_midasdb_Catellicoccus --genome_coverage 0.7 --num_cores 10 midas2_output/merge

1654743241.2: Across samples population SNV calling in subcommand merge_snps with args
1654743241.2: {
1654743241.2: "subcommand": "merge_snps",
1654743241.2: "force": false,
1654743241.2: "debug": false,
1654743241.2: "zzz_worker_mode": false,
1654743241.2: "batch_branch": "master",
1654743241.2: "batch_memory": 378880,
1654743241.2: "batch_vcpus": 48,
1654743241.2: "batch_queue": "pairani",
1654743241.2: "batch_ecr_image": "pairani:latest",
1654743241.2: "midas_outdir": "midas2_output/merge",
1654743241.2: "samples_list": "list_of_samples.tsv",
1654743241.2: "midasdb_name": "gtdb",
1654743241.2: "midasdb_dir": "my_midasdb_Catellicoccus",
1654743241.2: "species_list": null,
1654743241.2: "genome_depth": 5.0,
1654743241.2: "genome_coverage": 0.7,
1654743241.2: "sample_counts": 2,
1654743241.2: "site_depth": 5,
1654743241.2: "site_ratio": 3.0,
1654743241.2: "site_prev": 0.9,
1654743241.2: "snv_type": "common",
1654743241.2: "snp_pooled_method": "prevalence",
1654743241.2: "snp_maf": 0.05,
1654743241.2: "snp_type": "bi, tri, quad",
1654743241.2: "locus_type": "any",
1654743241.2: "num_cores": 10,
1654743241.2: "chunk_size": 1000000,
1654743241.2: "advanced": false,
1654743241.2: "robust_chunk": false
1654743241.2: }
1654743241.6: 1 species pass the filter
1654743241.6: Create OUTPUT directory.
1654743241.6: 'rm -rf midas2_output/merge/snps'
1654743241.6: 'mkdir -p midas2_output/merge/snps'
1654743241.6: Create TEMP directory.
1654743241.6: 'rm -rf midas2_output/merge/temp/snps'
1654743241.6: 'mkdir -p midas2_output/merge/temp/snps'
1654743241.7: MIDAS2::write_species_summary::start
1654743241.7: MIDAS2::write_species_summary::finish
1654743242.7: MIDAS2::design_chunks::start
1654743242.7: ================= Total number of compute chunks: 2
1654743242.7: MIDAS2::design_chunks::finish
1654743242.7: MIDAS2::multiprocessing_map::start
1654743242.8: MIDAS2::process::145629-0::start snps_worker
1654743242.8: MIDAS2::chunk_worker::145629-0::start accumulate_samples
1654743242.8: MIDAS2::process::145629-1::start snps_worker
1654743242.8: MIDAS2::process::145629--1::wait collect_chunks
1654743242.8: MIDAS2::chunk_worker::145629-1::start accumulate_samples
1654743242.8: WARNING: Non-zero exit code 141 from reader of midas2_output/LRDYA/snps/145629.snps.tsv.lz4.
1654743243.1: WARNING: Non-zero exit code 141 from reader of midas2_output/LRDYA/snps/145629.snps.tsv.lz4.
1654743243.1: MIDAS2::process::145629--1::start collect_chunks
cat: midas2_output/merge/temp/snps/145629/cid.0_snps_info.tsv.lz4: No such file or directory
cat: midas2_output/merge/temp/snps/145629/cid.1_snps_info.tsv.lz4: No such file or directory
1654743243.1: Bugs in the codes, keep the outputs for debugging purpose.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 270, in process
    snps_worker(species_id, chunk_id)
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 293, in snps_worker
    chunk_worker(chunks_of_sites[chunk_id][0])
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 345, in chunk_worker
    accumulate(accumulator, proc_args)
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 378, in accumulate
    for row in select_from_tsv(stream, schema=curr_schema, selected_columns=snps_pileup_basic_schema, result_structure=dict):
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 392, in select_from_tsv
    assert False, f"Line {i + j} has {len(values)} columns; was expecting {len(headers)}."
AssertionError: Line 0 has 13 columns; was expecting 8.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/media/home/user02/miniconda3/envs/midas2.0/bin/midas2", line 8, in <module>
    sys.exit(main())
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
    return subcommand_main(subcommand_args)
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 664, in main
    merge_snps(args)
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 653, in merge_snps
    raise error
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 643, in merge_snps
    proc_flags = multiprocessing_map(process, arguments_list, args.num_cores)
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 532, in multiprocessing_map
    return _multi_map(func, items, num_procs, multiprocessing.Pool)
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/site-packages/midas2/common/utils.py", line 520, in _multi_map
    return p.map(func, items, chunksize=1)
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/media/home/user02/miniconda3/envs/midas2.0/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
AssertionError: Line 0 has 13 columns; was expecting 8.

Thank you so much

bwa-meme

From the MIDAS2 paper:

with database customization and Bowtie2 alignment taking up to 75% of run time

Given the apparent computational bottleneck of bowtie2, are there any plans to add bwa-meme as an alternative aligner?

merge function writing chunks in database folder which may not be writeable

Hello,

We are running MIDAS2 on our computer cluster, where I was given temporary rights to install the database in a shared location so that many users can enjoy your tool. The admins then removed my rights to write to the install location (which also relieves my storage quota ;) ).

The problem is that when running the merge command, MIDAS2 tries to write chunk files to this database location, where a user may not have write access.

See below for the command and the stderr of the affected job:

  • Command
midas2 merge_snps \
    --num_cores 12 \
    --midasdb_name gtdb \
    --midasdb_dir /cluster/shared/databases/MIDSA2/latest/gtdb \
    --genome_depth 5.0 \
    --sample_counts 2 \
    --site_depth 2 \
    --site_ratio 3.0 \
    --site_prev 0.9 \
    --snv_type common \
    --snp_pooled_method prevalence \
    --snp_maf 0.1 \
    --snp_type {bi,tri,quad} \
    --locus_type any \
    --force \
    --samples_list \
    ${SCRATCH_FOLDER}/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species/sample_list.txt \
    ${SCRATCH_FOLDER}/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species
  • stderr
$ cat outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/jobs/output/slurm-midas2_merge.fdprf.mds2_78f86426289cf46f1cc5._illumina-midas2_snps_7425188.e
The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) StdEnv
1673959603.1:  Across samples population SNV calling in subcommand merge_snps with args
1673959603.1:  {
1673959603.1:      "subcommand": "merge_snps",
1673959603.1:      "force": true,
1673959603.1:      "debug": false,
1673959603.1:      "zzz_worker_mode": false,
1673959603.1:      "batch_branch": "master",
1673959603.1:      "batch_memory": 378880,
1673959603.1:      "batch_vcpus": 48,
1673959603.1:      "batch_queue": "pairani",
1673959603.1:      "batch_ecr_image": "pairani:latest",
1673959603.1:      "midas_outdir": "/cluster/work/jobs/7425188/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species",
1673959603.1:      "samples_list": "/cluster/work/jobs/7425188/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species/sample_list.txt",
1673959603.1:      "midasdb_name": "gtdb",
1673959603.1:      "midasdb_dir": "/cluster/shared/databases/MIDSA2/latest/gtdb",
1673959603.1:      "species_list": null,
1673959603.1:      "genome_depth": 5.0,
1673959603.1:      "genome_coverage": 0.4,
1673959603.1:      "sample_counts": 2,
1673959603.1:      "site_depth": 2,
1673959603.1:      "site_ratio": 3.0,
1673959603.1:      "site_prev": 0.9,
1673959603.1:      "snv_type": "common",
1673959603.1:      "snp_pooled_method": "prevalence",
1673959603.1:      "snp_maf": 0.1,
1673959603.1:      "snp_type": [
1673959603.1:          "bi",
1673959603.1:          "tri",
1673959603.1:          "quad"
1673959603.1:      ],
1673959603.1:      "locus_type": [
1673959603.1:          "any"
1673959603.1:      ],
1673959603.1:      "num_cores": 12,
1673959603.1:      "chunk_size": 1000000,
1673959603.1:      "advanced": false,
1673959603.1:      "robust_chunk": false
1673959603.1:  }
1673959603.5:  98 species pass the filter
1673959603.5:  Create OUTPUT directory.
1673959603.5:  'rm -rf /cluster/work/jobs/7425188/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species/snps'
1673959603.5:  'mkdir -p /cluster/work/jobs/7425188/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species/snps'
1673959603.5:  Create TEMP directory.
1673959603.5:  'rm -rf /cluster/work/jobs/7425188/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species/temp/snps'
1673959603.5:  'mkdir -p /cluster/work/jobs/7425188/cluster/projects/nn8075k/federica/outputs/midas2_merge/after_midas2_78f86426289cf46f1cc5/illumina/gtdb/all_species/temp/snps'
1673959606.4:  MIDAS2::write_species_summary::start
1673959606.4:  MIDAS2::write_species_summary::finish
1673959607.9:  MIDAS2::design_chunks::start
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/120476’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/110537’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/144385’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/106379’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/126839’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/113335’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/131364’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/123321’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/141587’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/102787’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/117262’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/108799’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/141985’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/101791’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/106238’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/116023’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/101899’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/123465’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/113950’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/143698’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/125635’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/130996’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/118769’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/110920’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/139883’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/121846’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/127445’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/122185’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/141780’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/147354’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/102854’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/114718’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/110932’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/108804’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/136285’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/137039’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/128710’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/126638’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/142444’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/114432’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/147309’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/116721’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/134149’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/138056’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/111522’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/128099’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/117326’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/106410’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/109726’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/130446’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/135207’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/125475’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/140833’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/104567’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/136688’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/116478’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/140766’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/113916’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/107040’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/116797’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/142790’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/117786’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/136029’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/120329’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/103143’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/140084’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/106335’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/144859’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/129716’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/110859’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/143588’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/131067’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/138633’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/121261’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/127200’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/139834’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/130720’: Permission denied
mkdir: cannot create directory ‘/cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/140619’: Permission denied
1673959648.2:  Deleting untrustworthy outputs due to error. Specify --debug flag to keep.
Traceback (most recent call last):
  File "/cluster/projects/nn8075k/conda_envs/midas2/bin/midas2", line 10, in <module>
    sys.exit(main())
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
    return subcommand_main(subcommand_args)
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 664, in main
    merge_snps(args)
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 658, in merge_snps
    raise error
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 639, in merge_snps
    arguments_list = design_chunks(species_ids_of_interest, midas_db)
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 220, in design_chunks
    all_site_chunks = multithreading_map(design_chunks_per_species, [(sp, midas_db) for sp in dict_of_species.values()], num_cores) #<---
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 540, in multithreading_map
    return _multi_map(func, items, num_threads, ThreadPool)
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 520, in _multi_map
    return p.map(func, items, chunksize=1)
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 205, in design_chunks_per_species
    return sp.compute_snps_chunks(midas_db, chunk_size, "merge")
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/models/species.py", line 103, in compute_snps_chunks
    command(f"mkdir -p {os.path.dirname(loc_fp)}")
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 246, in command
    return subprocess.run(cmd, shell=shell, **subproc_args)
  File "/cluster/projects/nn8075k/conda_envs/midas2/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'mkdir -p /cluster/shared/databases/MIDSA2/latest/gtdb/temp/chunksize.1000000/120476' returned non-zero exit status 1.

Is there a way to tell MIDAS2 to write elsewhere, maybe in a $TMPDIR location?

Thanks! F
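
One possible user-side workaround, not a MIDAS2 feature and only a sketch: mirror the read-only database into a writable location with symlinks, so chunk files under temp/ land somewhere the job can write, and point --midasdb_dir at the mirror. The paths below are illustrative.

```python
import os

def make_writable_db_mirror(readonly_db, writable_db):
    """Symlink every file of a read-only MIDAS DB into a writable mirror directory.

    Directories are recreated locally (so new temp/ subdirs can be made there),
    while individual files remain symlinks back to the shared, read-only copy.
    """
    for dirpath, _dirnames, filenames in os.walk(readonly_db):
        rel = os.path.relpath(dirpath, readonly_db)
        target_dir = os.path.join(writable_db, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            link = os.path.join(target_dir, name)
            if not os.path.exists(link):
                os.symlink(os.path.join(dirpath, name), link)

# Illustrative usage, then pass the mirror path to merge_snps via --midasdb_dir:
# make_writable_db_mirror("/cluster/shared/databases/MIDSA2/latest/gtdb",
#                         os.path.join(os.environ.get("TMPDIR", "/tmp"), "gtdb_mirror"))
```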

centroids.ffn clean up

We didn't clean up centroids.ffn to our standard format (one sequence per line); instead we just copied the output from the vsearch clustering.
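
A small sketch of the intended clean-up, assuming the standard is plain FASTA with each record's sequence on exactly one line (function name is illustrative):

```python
def linearize_fasta(in_path, out_path):
    """Rewrite a FASTA file so every record has its sequence on a single line."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        seq_parts = []
        for line in fin:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if seq_parts:                     # flush the previous record
                    fout.write("".join(seq_parts) + "\n")
                    seq_parts = []
                fout.write(line + "\n")           # write the header unchanged
            elif line:
                seq_parts.append(line)            # accumulate wrapped sequence lines
        if seq_parts:                             # flush the last record
            fout.write("".join(seq_parts) + "\n")
```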

download failed

$ midas2 run_species --sample_name ${sample_name} -1 reads/${sample_name}_R1.fastq.gz --midasdb_name uhgg --midasdb_dir my_midasdb_uhgg --num_cores 8 my_midas2_output

1652982326.0: Species abundance estimation in subcommand run_species with args
1652982326.0: {
1652982326.0: "subcommand": "run_species",
1652982326.0: "force": false,
1652982326.0: "debug": false,
1652982326.0: "zzz_worker_mode": false,
1652982326.0: "batch_branch": "master",
1652982326.0: "batch_memory": 378880,
1652982326.0: "batch_vcpus": 48,
1652982326.0: "batch_queue": "pairani",
1652982326.0: "batch_ecr_image": "pairani:latest",
1652982326.0: "midas_outdir": "my_midas2_output",
1652982326.0: "sample_name": "sample1",
1652982326.0: "r1": "reads/sample1_R1.fastq.gz",
1652982326.0: "r2": null,
1652982326.0: "midasdb_name": "uhgg",
1652982326.0: "midasdb_dir": "my_midasdb_uhgg",
1652982326.0: "word_size": 28,
1652982326.0: "aln_mapid": null,
1652982326.0: "aln_cov": 0.75,
1652982326.0: "marker_reads": 2,
1652982326.0: "marker_covered": 2,
1652982326.0: "max_reads": null,
1652982326.0: "num_cores": 8
1652982326.0: }
1652982326.0: Create OUTPUT directory for sample1.
1652982326.0: 'rm -rf my_midas2_output/sample1/species'
1652982326.0: 'mkdir -p my_midas2_output/sample1/species'
1652982326.0: Create TEMP directory for sample1.
1652982326.0: 'rm -rf my_midas2_output/sample1/temp/species'
1652982326.0: 'mkdir -p my_midas2_output/sample1/temp/species'
1652982326.0: MIDAS2::fetch_midasdb_files::start
download failed: s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 to - An error occurred (403) when calling the HeadObject operation: Forbidden
1652982332.0: Sleeping 4.433524636219189 seconds before retry 1 of <function download_reference at 0x155553e93440> with ('s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4', '/global/u1/s/snayfach/test/my_midasdb_uhgg'), {}.
download failed: s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 to - An error occurred (403) when calling the HeadObject operation: Forbidden
1652982337.9: Sleeping 11.755280753849886 seconds before retry 2 of <function download_reference at 0x155553e93440> with ('s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4', '/global/u1/s/snayfach/test/my_midasdb_uhgg'), {}.
download failed: s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 to - An error occurred (403) when calling the HeadObject operation: Forbidden
1652982351.2: Deleting untrustworthy outputs due to error. Specify --debug flag to keep.
Traceback (most recent call last):
  File "/global/homes/s/snayfach/.conda/envs/midas2/bin/midas2", line 10, in <module>
    sys.exit(main())
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
    return subcommand_main(subcommand_args)
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/run_species.py", line 498, in main
    run_species(args)
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/run_species.py", line 492, in run_species
    raise error
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/subcommands/run_species.py", line 443, in run_species
    midas_db = MIDAS_DB(os.path.abspath(args.midasdb_dir), args.midasdb_name)
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 60, in __init__
    self.local_toc = self.fetch_files("table_of_contents")
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 118, in fetch_files
    return _fetch_file_from_s3((s3_path, local_path))
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/models/midasdb.py", line 165, in _fetch_file_from_s3
    return download_reference(s3_path, local_dir)
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 467, in wrapped_operation
    return operation(*args, **kwargs)
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 643, in download_reference
    command(f"set -o pipefail; aws s3 cp --only-show-errors --no-sign-request {ref_path} - | {uncompress_cmd} > {local_path}")
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/site-packages/midas2/common/utils.py", line 245, in command
    return subprocess.run(cmd, shell=shell, **subproc_args)
  File "/global/homes/s/snayfach/.conda/envs/midas2/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'set -o pipefail; aws s3 cp --only-show-errors --no-sign-request s3://microbiome-pollardlab/uhgg_v1/genomes.tsv.lz4 - | lz4 -dc > /global/u1/s/snayfach/test/my_midasdb_uhgg/genomes.tsv' returned non-zero exit status 1.

How to keep the alignment?

Hello,

On this ReadTheDocs page, the anatomy of the per-sample run_snps outputs shows that the alignments are kept:

|- temp
|  |- snps
|     |- repgenomes.bam              run_snps        Rep-genome alignment file
|     |- {species}/snps_XX.tsv.lz4
|- bt2_indexes
|  |- snps/repgenomes.*              run_snps        Sample-specific rep-genome database

However, both the temp and bt2_indexes folders turn out empty after a successful run of run_snps.

I do not remove these files myself, and I do see lines in the stdout such as:

1695133677.6:  'samtools index -@ 4 /<my_path>/temp/snps/111210/111210.sorted.bam'

but this file is nowhere to be found after completion.

I am considering hacking the tool so it does not remove these files, which would be useful for my work.

Please, is there a way/flag to tell midas2 to keep these output files?

Thanks,
Franck

merge_snps

I have finished the run_species and run_snps steps, but something goes wrong with merge_snps.

I used midas2 merge_snps --samples_list list_of_samples.tsv --midasdb_name gtdb --midasdb_dir my_midasdb_Escherichia_coli --num_cores 10 --genome_coverage 0.0 --force midas2_output/merge

Here is the error:
Traceback (most recent call last):
  File "/home/luoqingqing/.conda/envs/midas2.0/bin/midas2", line 8, in <module>
    sys.exit(main())
  File "/home/luoqingqing/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/__main__.py", line 24, in main
    return subcommand_main(subcommand_args)
  File "/home/luoqingqing/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 664, in main
    merge_snps(args)
  File "/home/luoqingqing/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 653, in merge_snps
    raise error
  File "/home/luoqingqing/.conda/envs/midas2.0/lib/python3.7/site-packages/midas2/subcommands/merge_snps.py", line 617, in merge_snps
    assert species_ids_of_interest, f"No (specified) species pass the genome_coverage filter across samples, please adjust the genome_coverage or species_list"
AssertionError: No (specified) species pass the genome_coverage filter across samples, please adjust the genome_coverage or species_list

But I have already adjusted genome_coverage to 0; how can I set the parameters so that I get the merge results?
Waiting for your kind reply, many thanks.

midas2 v1.0.9

After downloading the v1.0.9 source code and changing into the directory, running
conda env create -n midas2 -f midas2.yml
returns the error message:
Retrieving notices: ...working... done
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • midas2=1.0.9

need to apply `temp` dir consistently

Once we know exactly which files are needed for MIDAS analysis, we need to rename the MIDAS-DB built by iggtools so that the /temp dir is only needed for the DB-build purpose and is not required in the MIDAS analysis step. Then, when users download the 2.0 version of the database, we can skip the temp dirs.

how to build uhgg database?

Using the midas2 database command, I cannot download the UHGG database locally. Could you write up a workflow showing how to build the UHGG database?

bowtie2-build non-zero exit does not propagate up to midas_run_genes subcommand

I currently have a problematic installation of Bowtie2, resulting in a bowtie2-build failure:

$ bowtie2-build --threads 10 data/SS01120.m.proc.iggtools/genes/temp_sc3.0/pangenomes.fa data/SS01120.m.proc.iggtools/genes/temp_sc3.0/pangenomes > data/SS01120.m.proc.iggtools/genes/temp_sc3.0/bowtie2-build.log
/pollard/home/bsmith/anaconda3/envs/ucfmt4/bin/bowtie2-build-s: /pollard/home/bsmith/anaconda3/envs/ucfmt4/bin/../lib/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /pollard/home/bsmith/anaconda3/envs/ucfmt4/bin/../lib/libtbb.so.2)

which includes an exit code of 1.

However, the iggtools exit code is still 0.

$ iggtools midas_run_genes --threads 10 --debug -1 data/SS01120.m.proc.r1.fq.gz -2 data/SS01120.m.proc.r2.fq.gz data/SS01120.m.proc.iggtools
1580506602.8:  Doing important work in subcommand midas_run_genes with args
[....debug output removed....]
1580506645.8:  'bowtie2-build --threads 10 data/SS01120.m.proc.iggtools/genes/temp_sc3.0/pangenomes.fa data/SS01120.m.proc.iggtools/genes/temp_sc3.0/pangenomes > data/SS01120.m.proc.iggtools/genes/temp_sc3.0/bowtie2-build.log'
/pollard/home/bsmith/anaconda3/envs/ucfmt4/bin/bowtie2-build-s: /pollard/home/bsmith/anaconda3/envs/ucfmt4/bin/../lib/libstdc++.so.6: version `CXXABI_1.3.11' not found (required by /pollard/home/bsmith/anaconda3/envs/ucfmt4/bin/../lib/libtbb.so.2)
$ echo $?
0

As a result, pipelines that include this iggtools call do not fail, and try to run the next step.

The problem is that the iggtools.subcommands.midas_run_genes.midas_run_genes function catches ALL exceptions and performs cleanup without a final non-zero exit.

This same (IMO anti-) pattern is also used in midas_run_snps, but doesn't seem to be used in midas_run_species.

The expected behavior is for errors in subcommands to result in iggtools exiting with a non-zero exit code.
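
A sketch of the expected pattern (function names and paths below are illustrative, not the actual iggtools code): run the child command with a checked return code, and after any cleanup re-raise so the top-level CLI exits non-zero and downstream pipeline steps can fail.

```python
import subprocess
import sys

def command(cmd):
    """Run a shell command; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, shell=True, check=True)

def cleanup(outdir):
    """Hypothetical placeholder for removing partial outputs."""
    pass

def midas_run_genes(outdir):
    try:
        command(f"bowtie2-build --threads 10 {outdir}/pangenomes.fa {outdir}/pangenomes > {outdir}/bowtie2-build.log")
    except Exception:
        cleanup(outdir)
        raise  # re-raise after cleanup instead of swallowing the error ...

def main():
    try:
        midas_run_genes("genes/temp_sc3.0")
    except Exception as e:
        print(f"error: {e}", file=sys.stderr)
        sys.exit(1)  # ... so the process exits non-zero

if __name__ == "__main__":
    main()
```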
