soybase / map-markers-to-assembly

Given a file of marker locations in one genome, report the locations of those markers in a second genome.

map-markers-to-assembly

This little workflow is driven by the shell script map-markers-to-assembly.sh.

It extracts flanking sequence around a SNP marker from one genome assembly, then searches for the best corresponding sequence in a second assembly, and reports the locations of the SNP marker in the second assembly.

The following are the main steps:

  • Create an index on the uncompressed FROM genome;
  • Put the marker information into four-column BED format, with 1000 bases on each side of the SNP;
  • Extract the sequences from the FROM genome;
  • Run BLAST against the TO genome;
  • Filter the BLAST output and write the new marker file (as a TSV file).
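
The steps above can be sketched in shell as follows. This is an illustrative outline only, not the contents of map-markers-to-assembly.sh or marker_gff_to_bed.pl. The function names, the intermediate file names (markers.bed, markers.fna, to_db, markers.blast.tsv), and the choice to read the marker name from an ID= attribute are all assumptions. The marker file is assumed to be uncompressed GFF3.

```shell
# Hypothetical helper: turn GFF3 marker lines (stdin) into four-column BED
# (chrom, start, end, name) with a flank on each side of the marker.
# Assumption: the marker name is stored in the ID= attribute of column 9.
gff_to_flank_bed() {
  flank="${1:-1000}"
  awk -v flank="$flank" 'BEGIN { FS = OFS = "\t" }
    /^#/ || NF < 9 { next }              # skip comments and short lines
    {
      name = "."
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^ID=/) { name = substr(attrs[i], 4) }
      }
      start = $4 - 1 - flank             # GFF3 is 1-based; BED start is 0-based
      if (start < 0) start = 0           # clamp at the start of the sequence
      print $1, start, $5 + flank, name
    }'
}

# Hypothetical outline of the remaining steps; all file names are placeholders.
sketch_pipeline() {
  gff="$1"; from="$2"; to="$3"; threads="$4"
  samtools faidx "$from"                                   # index FROM genome
  gff_to_flank_bed 1000 < "$gff" > markers.bed             # markers to BED
  bedtools getfasta -fi "$from" -bed markers.bed -name -fo markers.fna
  makeblastdb -in "$to" -dbtype nucl -out to_db            # database for TO genome
  blastn -query markers.fna -db to_db -outfmt 6 \
    -num_threads "$threads" -out markers.blast.tsv         # tabular BLAST output
}
```

For a SNP at position 5000 on Chr01, gff_to_flank_bed 1000 produces the BED interval 3999 to 6000, which covers the SNP plus 1000 bases on each side.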

The dependencies are:

  • samtools
  • bedtools
  • blast+

and three scripts in the bin directory:

  • filter_marker_blast_data.awk
  • marker_gff_to_bed.pl
  • top_line.awk
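
The contents of these scripts are not shown here. As a rough illustration of what a "top line" filter on BLAST results might look like, the sketch below keeps only the first line seen for each query in tabular BLAST output (-outfmt 6), where the query name is in column 1. The function name top_hit_per_query is hypothetical, and whether top_line.awk works this way is an assumption.

```shell
# Hypothetical helper: keep the first line for each query ID (column 1)
# from tab-separated BLAST output read on stdin. Tabular BLAST output
# usually lists the strongest hit for a query first, so this approximates
# "best hit per marker".
top_hit_per_query() {
  awk -F '\t' '!seen[$1]++'
}
```

Usage would be along the lines of `top_hit_per_query < markers.blast.tsv > markers.top.tsv`, with both file names being placeholders.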

OPERANDS

  Paths to four files (including the new marker-locations file to be created)
  and the number of threads to use in the BLAST search:
    marker_locs    - Path to file of marker names and locations on genome assembly 1, in GFF3 format
    genome_from    - Path to genome assembly 1, corresponding to the coordinates in the marker_locs file
    genome_to      - Path to genome assembly 2, onto which the markers will be projected
    marker_to_file - Path for the marker-locations file to be created; may include a directory path
    threads        - Number of threads to use in the BLAST search
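
A wrapper could check the five operands before starting a long run. The sketch below is one possible approach, not part of the repository; the function name check_operands and its error messages are hypothetical.

```shell
# Hypothetical pre-flight check for the five operands described above.
# Returns 0 if the operands look usable, non-zero otherwise.
check_operands() {
  if [ "$#" -ne 5 ]; then
    echo "usage: marker_locs genome_from genome_to marker_to_file threads" >&2
    return 2
  fi
  # threads must consist of digits only and be greater than zero
  case "$5" in
    ''|*[!0-9]*) echo "threads must be a positive integer: $5" >&2; return 2 ;;
  esac
  [ "$5" -gt 0 ] || { echo "threads must be greater than zero" >&2; return 2; }
  # the three input files must exist and be readable
  for f in "$1" "$2" "$3"; do
    [ -r "$f" ] || { echo "cannot read input file: $f" >&2; return 1; }
  done
}
```

It could be called as `check_operands "$@" || exit 1` at the top of a wrapper script.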

Note that the genome files will be uncompressed (with gunzip) if they are compressed, because the bedtools getfasta command sometimes fails on compressed data.
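
The uncompress-if-needed behavior could look something like the following. This is a minimal sketch under the assumption that compressed files are recognized by a .gz suffix; the function name ensure_uncompressed is hypothetical and the workflow script may handle this differently.

```shell
# Hypothetical helper: if the path ends in .gz, gunzip it (when the
# compressed file is present) and print the uncompressed path; otherwise
# print the path unchanged. Fails if the resulting file does not exist.
ensure_uncompressed() {
  path="$1"
  case "$path" in
    *.gz)
      if [ -f "$path" ]; then
        gunzip "$path" || return 1
      fi
      path="${path%.gz}"        # strip the .gz suffix
      ;;
  esac
  [ -f "$path" ] || { echo "missing file: $path" >&2; return 1; }
  printf '%s\n' "$path"
}
```

For example, `genome_from=$(ensure_uncompressed "$genome_from")` would leave genome_from pointing at the uncompressed file, whether or not it had already been unpacked by an earlier run.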

On a multiprocessor machine with job management, the script should be called from a batch script, for example:

  module load samtools bedtools blast+

  marker_locs=markers/FILE.gff3.gz
  genome_from=genomes/GENOME1/FILE.fna.gz
  genome_to=genomes/GENOME2/FILE.fna.gz
  marker_to_file=markers/FILE_NEW.tsv
  threads=20

  ./map-markers-to-assembly.sh "$marker_locs" "$genome_from" "$genome_to" "$marker_to_file" "$threads"

Contributors

  • stevencannon-usda
