christophertbrown / fix_assembly_errors
script for fixing assembly scaffolding errors
License: GNU General Public License v2.0
This repository contains scripts for microbial genome sequence curation. The key scripts are re_assemble_errors.py, ra2.py, and mapped.py.

The scripts require python3 and the following programs:
* velvet
* Bowtie2
* shrinksam (https://github.com/bcthomas/shrinksam)

The basic method for identifying and resolving assembly errors is described in:
C. T. Brown, L. A. Hug, B. C. Thomas, I. Sharon, and C. J. Castelle, “Unusual biology across a group comprising more than 15% of domain Bacteria,” Nature, 2015.

The general strategy is to identify assembly errors as regions with zero coverage by stringently mapped reads, and then attempt to correct them by re-assembling the region of the genome that contains the error using velvet. Reads (and their pairs) that mapped to the region containing the error are used in the re-assembly. The re-assembled fragments are then checked for errors and incorporated into the scaffold if possible. See docs/ra2_assembly_curation.pdf for more information.

The original version of the software (re_assemble_errors.py) would break scaffolds at errors if they could not be corrected. The updated version (ra2.py) will only break scaffolds at errors when 1) the error could not be corrected and 2) the region is not spanned by paired read sequences.

ra2.py output will be in the <genome>.curated directory:
* re_assembled.fa: curated genome
* re_assembled.report.txt: report (e = extended, n = error that was not fixed, f = error that was fixed, b = scaffold broken at error, removed = scaffold removed due to having no coverage)

See genome_curation_using_ra2.pdf for more information and example usage in the context of genome curation.

mapped.py is a script for filtering SAM files based on the number of mismatches in the read mapping.

Note: These scripts have been tested on Mac and Linux systems, but may not work with Windows.

Chris Brown [email protected]
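To illustrate the strategy described in the README above (flagging regions with zero coverage by stringently mapped reads), here is a minimal sketch in Python. It is not the code used in ra2.py or mapped.py. The function names, the `max_mismatches` parameter, and the use of the standard SAM `NM:i` tag (edit distance, which also counts indels) as the stringency measure are all assumptions made for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def edit_distance(fields):
    """Return the NM:i value from a SAM record's optional fields, or None."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def aligned_span(pos, cigar):
    """Return the 0-based, half-open reference interval covered by an alignment."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos - 1, pos - 1 + length


def zero_coverage_regions(sam_lines, scaffold_lengths, max_mismatches=1):
    """Find regions with no read whose NM:i value is <= max_mismatches.

    Assumption: reads without an NM:i tag are treated as not stringently mapped.
    """
    coverage = {name: [0] * length for name, length in scaffold_lengths.items()}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, ref, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or ref not in coverage or cigar == "*":  # unmapped or unknown
            continue
        nm = edit_distance(fields)
        if nm is None or nm > max_mismatches:
            continue
        start, end = aligned_span(pos, cigar)
        for i in range(start, min(end, len(coverage[ref]))):
            coverage[ref][i] += 1

    regions = {}
    for name, cov in coverage.items():
        found, start = [], None
        for i, depth in enumerate(cov):
            if depth == 0 and start is None:
                start = i  # a zero-coverage run begins
            elif depth > 0 and start is not None:
                found.append((start, i))  # the run ended at the previous base
                start = None
        if start is not None:
            found.append((start, len(cov)))
        regions[name] = found
    return regions
```

Called with the lines of a SAM file and a dictionary of scaffold lengths, the function returns, for each scaffold, a list of 0-based half-open intervals that no stringently mapped read covers. A real curation tool would likely treat scaffold ends and paired-read information separately; this sketch does not.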
Hello,
I am trying to get ra2.py up and running and I have installed the dependencies. When I try to execute the command however, I get the following message:
$ python ra2.py -h
Traceback (most recent call last):
  File "ra2.py", line 31, in <module>
    import ctbBio.mapped as map_tool
ModuleNotFoundError: No module named 'ctbBio'
I have also already installed your ctbBio repo with pip install ctbBio, but I'm not sure if I need to do some other step so that ra2.py can access it...
Any help you could provide would be greatly appreciated, thanks!
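One common cause of a ModuleNotFoundError after a seemingly successful pip install is that pip and the python command belong to different interpreters or virtual environments. Whether that is what happened here is not known from the report. The following sketch, with a hypothetical helper name `describe_module`, shows one way to check which interpreter is running and whether it can locate a given module.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable by the interpreter running this code."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised, for example, when the parent package of a dotted name is missing.
        spec = None
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for module in ("ctbBio", "ctbBio.mapped"):
        print(describe_module(module))
```

If the module is reported as not found, installing with the same interpreter that runs the script (for example, python -m pip install ctbBio) ties the installation to that interpreter.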
Now I get the new error below (I'm using Python 3.6.2; is the error related to that?):
(virtualenv3) user@XXXXX:~/tools/fix_assembly_errors/ctbRA$ ./ra2.py -i contigs.fasta -1 read_1.fastq.gz -2 read_2.fastq.gz
Building a SMALL index
Traceback (most recent call last):
  File "./ra2.py", line 1119, in <module>
    window = window)
  File "./ra2.py", line 975, in curate_assembly
    allow_orphan = False, allow_orphan_ends = False, save_mapping = save_mapping)
  File "./ra2.py", line 915, in check_assembly
    mapping, pr_split = map_reads(assembly, scaffolds, pr, threads, multiple, pr_split = pr_split)
  File "./ra2.py", line 86, in map_reads
    return run_bowtie(assembly, sam, pr, pr_split, sr, threads, multiple, bt_dir) # run bowtie, return sam file
  File "./ra2.py", line 66, in run_bowtie
    % (matches_command, threads, bt_dir, assembly.rsplit('/', 1)[-1], pr_command, sr_command, sam)
TypeError: not all arguments converted during string formatting
Originally posted by @Anto007 in #3 (comment)
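For context on the message in the traceback above: Python's % operator raises this TypeError when the tuple on the right contains more values than the template has conversion specifiers. The sketch below reproduces that with a hypothetical template. It is not the command string from ra2.py, and it does not show which specifier or value is mismatched there.

```python
def format_error(template, values):
    """Apply %-formatting and return the error message if it fails, else None."""
    try:
        template % values
    except TypeError as err:
        return str(err)
    return None


# Hypothetical template for illustration only; not taken from ra2.py.
two_slots = "bowtie2 -p %s -x %s"

print(format_error(two_slots, (4, "index")))           # two values, two specifiers
print(format_error(two_slots, (4, "index", "extra")))  # one value too many
print(format_error(two_slots, (4,)))                   # one value too few
```

The second call yields "not all arguments converted during string formatting", and the third yields "not enough arguments for format string". Comparing the number of specifiers in the template with the number of values in the tuple is therefore a reasonable first check.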
I get the error below when I try to run the ra2.py script. Also, is it OK if I leave out shrinksam (I had difficulty getting the tool installed and got no support from the developer) and modify the bowtie2 command line to include the --no-unal flag? If I'm not mistaken, this will circumvent the need to use shrinksam. Many thanks for your kind help!
(virtualenv3) user@XXXX:~/tools/fix_assembly_errors/ctbRA$ ./ra2.py
Traceback (most recent call last):
  File "./ra2.py", line 36, in <module>
    from ctbRA.assemble import velvet as velvet
ModuleNotFoundError: No module named 'ctbRA'
I am getting the following KeyError for some FASTA files (about 1 in every 4-5 files; the rest run fine):
Building a SMALL index
Traceback (most recent call last):
  File "/opt/bin/bio/ra2.py", line 1117, in <module>
    add_Ns = args['add_Ns'], mask = args['mask'], ignore_insert = args['ignore_insert_cov'], window = window)
  File "/opt/bin/bio/ra2.py", line 984, in curate_assembly
    add_Ns, mask, ignore_insert)
  File "/opt/bin/bio/ra2.py", line 782, in merge_assemblies
    merged = patch_contig(seq, s2c[id], errors, re_assembled[id], cov_thresh)
KeyError: 'XXXXX(scaffold name)'
Any idea why this is the case? I am running this with the parameters -m 5 -c 6 for read lengths of 250 bp. Thanks in advance!