
fix_assembly_errors's Introduction

This repository contains scripts for microbial genome sequence curation. The key scripts are re_assemble_errors.py, ra2.py, and mapped.py. 

The scripts require python3 and the following programs:
* velvet
* Bowtie2
* shrinksam (https://github.com/bcthomas/shrinksam)
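Before running ra2.py it can help to confirm that the external programs are reachable on PATH. The sketch below uses Python's `shutil.which`. The executable names are assumptions: velvet is normally run as `velveth`/`velvetg`, Bowtie2 as `bowtie2`/`bowtie2-build`, and shrinksam is assumed to install as `shrinksam`. Adjust them to match your installation.

```python
import shutil

# Assumed executable names; edit to match your installation.
REQUIRED = ["velveth", "velvetg", "bowtie2", "bowtie2-build", "shrinksam"]


def missing_tools(tools=REQUIRED):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required programs found.")
```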

The basic method for identifying and resolving assembly errors is described in:

C. T. Brown, L. A. Hug, B. C. Thomas, I. Sharon, and C. J. Castelle, “Unusual biology across a group comprising more than 15% of domain Bacteria,” Nature, 2015.

The general strategy is to identify assembly errors as regions with zero coverage by stringently mapped reads, and then attempt to correct them by re-assembling the affected region of the genome with velvet. Reads (and their pairs) that map to the region containing the error are used in the re-assembly. The re-assembled fragments are then checked for errors and incorporated into the scaffold where possible. See docs/ra2_assembly_curation.pdf for more information.
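The zero-coverage step can be illustrated with a short sketch. It assumes read alignments on one scaffold have already been filtered and reduced to 0-based, half-open `(start, end)` intervals. The function name and input format are hypothetical. This is not the ra2.py implementation, which may treat details such as scaffold ends differently.

```python
def zero_coverage_regions(length, reads):
    """Return (start, end) runs, 0-based half-open, with no read coverage.

    `reads` is an iterable of (start, end) alignment intervals, 0-based and
    half-open (hypothetical input format for this sketch).
    """
    # Difference array: +1 where a read starts, -1 where it ends.
    diff = [0] * (length + 1)
    for start, end in reads:
        start, end = max(0, start), min(length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    regions, depth, run_start = [], 0, None
    for pos in range(length):
        depth += diff[pos]
        if depth == 0 and run_start is None:
            run_start = pos  # a zero-coverage run begins here
        elif depth > 0 and run_start is not None:
            regions.append((run_start, pos))  # the run ended at pos
            run_start = None
    if run_start is not None:
        regions.append((run_start, length))  # run reaches the scaffold end
    return regions


# Two reads leave positions 40-49 uncovered on a 100 bp scaffold.
print(zero_coverage_regions(100, [(0, 40), (50, 100)]))  # [(40, 50)]
```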

The original version of the software (re_assemble_errors.py) would break scaffolds at errors if they could not be corrected. The updated version (ra2.py) will only break scaffolds at errors when 1) the error could not be corrected and 2) the region is not spanned by paired read sequences.
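The ra2.py rule for breaking a scaffold can be written as a small predicate. This sketch makes several assumptions. An error region and each read pair are `(start, end)` intervals. A pair's interval runs from its leftmost mate start to its rightmost mate end. A pair counts as spanning the error if that interval extends past both sides of it. The function names are hypothetical.

```python
def is_spanned(error, pair_spans):
    """True if any read pair's insert interval extends past both ends of error."""
    err_start, err_end = error
    return any(start < err_start and end > err_end for start, end in pair_spans)


def should_break(error, fixed, pair_spans):
    """Break only if the error was not fixed AND no read pair spans it."""
    return not fixed and not is_spanned(error, pair_spans)


# An unfixed error at 40-50 that a pair spanning 10-200 bridges is kept intact.
print(should_break((40, 50), fixed=False, pair_spans=[(10, 200)]))  # False
```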

ra2.py writes its output to the <genome>.curated directory:
* re_assembled.fa: curated genome
* re_assembled.report.txt: report (e = extended, n = error that was not fixed, f = error that was fixed, b = scaffold broken at error, removed = scaffold removed due to having no coverage)

See genome_curation_using_ra2.pdf for more information and example usage in the context of genome curation. 

mapped.py is a script for filtering SAM files based on the number of mismatches in the read mapping. 
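The following is a minimal sketch of mismatch-based SAM filtering. It assumes the aligner writes the `NM:i` optional tag, which Bowtie2 does for aligned reads. `NM` is the edit distance to the reference, so it counts indels as well as mismatches. mapped.py's actual options and counting rules are not shown here and may differ. The function names are hypothetical.

```python
def nm_distance(fields):
    """Return the NM:i edit distance from SAM optional fields, or None."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def filter_sam(lines, max_mismatches):
    """Yield header lines and mapped reads with NM <= max_mismatches."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:  # FLAG bit 0x4: read unmapped
            continue
        nm = nm_distance(fields)
        if nm is not None and nm <= max_mismatches:
            yield line
```

For example, `filter_sam(open("mapping.sam"), 2)` would keep reads with at most two edits. The file name is illustrative.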

Note: These scripts have been tested on Mac and Linux systems, but may not work with Windows.

Chris Brown
[email protected]

fix_assembly_errors's People

Contributors

christophertbrown

fix_assembly_errors's Issues

ModuleNotFoundError: No module named 'ctbBio'

Hello,
I am trying to get ra2.py up and running and I have installed the dependencies. When I try to execute the command however, I get the following message:

$ python ra2.py -h
Traceback (most recent call last):
  File "ra2.py", line 31, in <module>
    import ctbBio.mapped as map_tool
ModuleNotFoundError: No module named 'ctbBio'

I have also already installed your ctbBio package with pip install ctbBio, but I'm not sure whether another step is needed so that ra2.py can find it.
Any help you could provide would be greatly appreciated, thanks!
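One common cause of this error is that pip installed ctbBio for a different Python interpreter than the one running ra2.py. The standard-library sketch below prints the active interpreter and where, if anywhere, `ctbBio` and `ctbRA` would be imported from. If ctbBio shows as not found, one option is to install it with `python3 -m pip install ctbBio` using the same python3 that runs ra2.py.

```python
import importlib.util
import sys


def check_modules(names=("ctbBio", "ctbRA")):
    """Map each top-level module name to its import origin, or None if absent."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        found[name] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for name, origin in check_modules().items():
        print(f"{name}: {origin or 'NOT FOUND on sys.path'}")
```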

TypeError: not all arguments converted during string formatting

Now I get the new error below (I'm using Python 3.6.2; could the error be related to that?):
(virtualenv3) user@XXXXX:~/tools/fix_assembly_errors/ctbRA$ ./ra2.py -i contigs.fasta -1 read_1.fastq.gz -2 read_2.fastq.gz
Building a SMALL index
Traceback (most recent call last):
  File "./ra2.py", line 1119, in <module>
    window = window)
  File "./ra2.py", line 975, in curate_assembly
    allow_orphan = False, allow_orphan_ends = False, save_mapping = save_mapping)
  File "./ra2.py", line 915, in check_assembly
    mapping, pr_split = map_reads(assembly, scaffolds, pr, threads, multiple, pr_split = pr_split)
  File "./ra2.py", line 86, in map_reads
    return run_bowtie(assembly, sam, pr, pr_split, sr, threads, multiple, bt_dir) # run bowtie, return sam file
  File "./ra2.py", line 66, in run_bowtie
    % (matches_command, threads, bt_dir, assembly.rsplit('/', 1)[-1], pr_command, sr_command, sam)
TypeError: not all arguments converted during string formatting

Originally posted by @Anto007 in #3 (comment)
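Python's %-formatting raises this message when the tuple on the right of `%` holds more values than the format string has placeholders. The message is not specific to Python 3.6. The reported line passes seven values, so the format string reached at ra2.py line 66 presumably has fewer placeholders in that code path. The string itself is not visible here. The sketch reproduces the message with a hypothetical bowtie2 template. It also shows a common pitfall: `%` binds more tightly than `+`, so formatting applies only to the last piece of a `+`-concatenated string.

```python
def apply_format(template, values):
    """Return template % values, or the TypeError message if they don't match."""
    try:
        return template % values
    except TypeError as err:
        return f"TypeError: {err}"


# More values than %-placeholders reproduces the reported message.
print(apply_format("bowtie2 -p %s -x %s", ("4", "index", "extra")))

# Pitfall: only "-x %s" is formatted, and it has one placeholder for two values.
try:
    cmd = "bowtie2 -p %s " + "-x %s" % ("4", "index")
except TypeError as err:
    print(f"TypeError: {err}")
```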

ModuleNotFoundError: No module named 'ctbRA'

I get the error below when I try to run the ra2.py script. Also, is it OK to leave out shrinksam (it is difficult to install and the developer has not offered support) and instead add the --no-unal flag to the bowtie2 command line? If I'm not mistaken, this would remove the need for shrinksam. Many thanks for your kind help!

(virtualenv3) user@XXXX:~/tools/fix_assembly_errors/ctbRA$ ./ra2.py
Traceback (most recent call last):
  File "./ra2.py", line 36, in <module>
    from ctbRA.assemble import velvet as velvet
ModuleNotFoundError: No module named 'ctbRA'
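On the shrinksam question: Bowtie2's `--no-unal` option suppresses SAM records for reads that failed to align. For comparison, the sketch below applies a similar filter after mapping. It uses only the SAM FLAG bits 0x1 (paired), 0x4 (unmapped) and 0x8 (mate unmapped), and keeps a record if the read or its mate aligned. It is an assumption, not confirmed here, that this matches what shrinksam does or what ra2.py expects. The function name is hypothetical.

```python
def drop_unaligned(lines):
    """Yield SAM header lines and records where the read or its mate aligned.

    A record is dropped only when the read is unmapped (0x4) and it is either
    unpaired or, if paired (0x1), its mate is also unmapped (0x8).
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        flag = int(line.split("\t", 2)[1])
        read_unmapped = bool(flag & 0x4)
        paired = bool(flag & 0x1)
        mate_unmapped = bool(flag & 0x8)
        if read_unmapped and (not paired or mate_unmapped):
            continue  # nothing aligned for this read or its pair
        yield line
```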

KeyError while running ra2.py

I am getting the following KeyError for some FASTA files (about 1 in 4-5 files; the rest run fine):

Building a SMALL index
Traceback (most recent call last):
  File "/opt/bin/bio/ra2.py", line 1117, in <module>
    add_Ns = args['add_Ns'], mask = args['mask'], ignore_insert = args['ignore_insert_cov'], window = window)
  File "/opt/bin/bio/ra2.py", line 984, in curate_assembly
    add_Ns, mask, ignore_insert)
  File "/opt/bin/bio/ra2.py", line 782, in merge_assemblies
    merged = patch_contig(seq, s2c[id], errors, re_assembled[id], cov_thresh)
KeyError: 'XXXXX(scaffold name)'

Any idea why this is the case? I am running this with the parameters -m 5 -c 6 for read lengths of 250bp. Thanks in advance!
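The traceback does not show whether the missing key is in `s2c` or in `re_assembled`. A temporary check before the failing line could narrow it down by testing which lookup table lacks the scaffold ID. The helper below is hypothetical. The table names mirror the traceback, but the data is made up for illustration.

```python
def tables_missing(scaffold_id, **tables):
    """Return the names of the lookup tables that do not contain scaffold_id."""
    return [name for name, table in tables.items() if scaffold_id not in table]


# Made-up stand-ins for the s2c and re_assembled dicts named in the traceback.
s2c = {"scaffold_1": [], "scaffold_2": []}
re_assembled = {"scaffold_1": {}}
print(tables_missing("scaffold_2", s2c=s2c, re_assembled=re_assembled))
# ['re_assembled']
```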
