gear-genomics / dicey

In-silico PCR, primer design and padlock design for in-situ sequencing

Home Page: https://www.gear-genomics.com/

License: GNU General Public License v3.0

Makefile 0.13% C++ 99.44% Dockerfile 0.13% Python 0.30%
gear-genomics pcr primer primer-design in-silico sanger-sequencing amplicon amplicon-sequencing padlock in-situ-sequencing

dicey's Introduction

Installing dicey

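Dicey is available as a Bioconda package, as a pre-compiled statically linked binary from Dicey's GitHub release page, as a Singularity container (SIF file), or as a minimal Docker container. To build it from source instead, install the dependencies, clone the repository recursively, and run make: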

apt-get install -y build-essential g++ cmake zlib1g-dev libbz2-dev liblzma-dev libboost-all-dev autoconf

git clone --recursive https://github.com/gear-genomics/dicey.git

cd dicey/

make all

make install

This will generate the binary bin/dicey.

Running Dicey

dicey -h

Sequence search in an indexed reference genome

Searching a large reference genome requires a pre-built index of the bgzip-compressed genome.

dicey index -o hg19.fa.fm9 hg19.fa.gz

samtools faidx hg19.fa.gz

The indexing step is only required once. You can then search for nucleotide sequences at a user-defined edit or Hamming distance.

dicey hunt -g hg19.fa.gz TCTCTGCACACACGTTGT | python scripts/json2txt.py
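The difference between the two distance measures matters when choosing a search mode. The following is a minimal Python sketch of the textbook definitions, not Dicey's own implementation: Hamming distance counts mismatches only, whereas edit (Levenshtein) distance also allows insertions and deletions. The function names and the `shifted` example sequence are illustrative assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]


query = "TCTCTGCACACACGTTGT"      # sequence from the example above
shifted = query[1:] + "A"         # hypothetical hit: first base dropped, one base appended

print("Hamming:", hamming(query, shifted))
print("Edit:   ", edit_distance(query, shifted))
```

A one-base shift like this is cheap in edit distance (one deletion plus one insertion) but expensive in Hamming distance, because nearly every position then mismatches.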

You can also redirect the output in JSON format to a file.

dicey hunt -g hg19.fa.gz -o out.json.gz TCTCTGCACACACGTTGT
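To inspect such an output file without the json2txt.py script, a small Python sketch like the one below can help. It makes no assumption about Dicey's JSON schema and only reports the top-level structure. The helper names `load_json` and `describe` are hypothetical and not part of Dicey.

```python
import gzip
import json


def load_json(path):
    """Load a JSON document, handling gzip-compressed and plain files alike."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic bytes
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def describe(doc):
    """Map each top-level key to a short description of its value."""
    if not isinstance(doc, dict):
        return {"<root>": type(doc).__name__}
    summary = {}
    for key, value in doc.items():
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} with {len(value)} entries"
        else:
            summary[key] = type(value).__name__
    return summary
```

For example, `describe(load_json("out.json.gz"))` lists the top-level keys of the file written by the command above, which is a reasonable first step before writing a custom parser.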

Pre-built genome indices for commonly used reference genomes are available for download here.
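Whether you build the index yourself or download it, the index files need to be present alongside the genome. The Python sketch below checks for them before a search is started. It assumes the naming used in the commands above (hg19.fa.gz with hg19.fa.fm9) and the .fai and .gzi files that samtools faidx writes for a bgzip-compressed FASTA. The function names are hypothetical.

```python
from pathlib import Path


def expected_index_files(genome):
    """Return the index files assumed to accompany a bgzip-compressed genome."""
    genome = Path(genome)
    if genome.suffix != ".gz":
        raise ValueError("expected a bgzip-compressed FASTA ending in .gz")
    return [
        genome.with_suffix(".fm9"),   # hg19.fa.gz -> hg19.fa.fm9 (naming assumed from the example)
        Path(str(genome) + ".fai"),   # written by samtools faidx
        Path(str(genome) + ".gzi"),   # written by samtools faidx for bgzip input
    ]


def missing_index_files(genome):
    """List the genome or index files that do not exist on disk."""
    candidates = [Path(genome)] + expected_index_files(genome)
    return [p for p in candidates if not p.exists()]


# Show which files would be checked for the example genome.
for path in expected_index_files("hg19.fa.gz"):
    print(path)
```

An empty list from `missing_index_files("hg19.fa.gz")` means all four files were found. It does not verify that the index was built from that exact genome file.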

In-silico PCR for a set of primers

Dicey can search for multiple primer pairs, show off-target products and determine PCR amplicons.

echo -e ">FGA_f\nGCCCCATAGGTTTTGAACTCA\n>FGA_r\nTGATTTGTCTGTAATTGCCAGC" > primers.fa

dicey search -c 45 -g hg19.fa.gz primers.fa | python scripts/json2txt.py

The default output is in JSON format and can also be written to a file.

dicey search -c 45 -o out.json.gz -g hg19.fa.gz primers.fa
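For more than a couple of primers, writing the FASTA by hand with echo becomes error-prone. The Python sketch below builds the primer FASTA from a dictionary and assembles the same command line as above. It uses only the flags shown in this README (-c, -o, -g), and the helper names are hypothetical.

```python
import shlex


def primers_to_fasta(primers):
    """Format a {name: sequence} mapping as FASTA text, one record per primer."""
    records = []
    for name, seq in primers.items():
        seq = seq.strip().upper()
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError(f"primer {name!r} is empty or has non-ACGTN characters")
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"


def search_command(genome, primer_fasta, out_json):
    """Assemble the dicey search call from the example above as an argument list."""
    # "-c 45" is copied verbatim from the README example.
    return ["dicey", "search", "-c", "45", "-o", out_json, "-g", genome, primer_fasta]


primers = {
    "FGA_f": "GCCCCATAGGTTTTGAACTCA",
    "FGA_r": "TGATTTGTCTGTAATTGCCAGC",
}
print(primers_to_fasta(primers), end="")
print(shlex.join(search_command("hg19.fa.gz", "primers.fa", "out.json.gz")))
```

Write the FASTA text to primers.fa and pass the argument list to `subprocess.run` to execute the search. Keeping the command as a list avoids shell quoting problems with unusual file names.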

Padlock probe design

Dicey can design padlock probes for imaging mRNA in single cells. You need to download an indexed reference genome and a matching GTF file, e.g., for GRCh38:

wget http://ftp.ensembl.org/pub/release-107/gtf/homo_sapiens/Homo_sapiens.GRCh38.107.gtf.gz

With these files, you can then design padlock probes for a given gene using

dicey padlock -g Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz -t Homo_sapiens.GRCh38.107.gtf.gz -b data/bar.fa.gz ENSG00000136997

Graphical user interface

You can search primers interactively using our web application silica.

FAQ

  • Dicey cannot find the primer3 config directory
    The primer3 config directory is included in the repository. Clone the repository with git clone --recursive https://github.com/gear-genomics/dicey.git, then point dicey at the cloned config directory using -i: dicey search -i dicey/src/primer3_config/ -g hg19.fa.gz primers.fa

  • The script json2txt.py is not found
    The json2txt.py Python script is included in the repository. Clone the repository with git clone --recursive https://github.com/gear-genomics/dicey.git; the script is in the ./scripts/ subdirectory.

Citation

Dicey is part of the GEAR genomics framework, which is described in the following publication.

Rausch, T., Fritz, M.H., Untergasser, A. and Benes, V.
Tracy: basecalling, alignment, assembly and deconvolution of sanger chromatogram trace files.
BMC Genomics 21, 230 (2020).
https://doi.org/10.1186/s12864-020-6635-8

License

Dicey is distributed under the GPL license. Consult the accompanying LICENSE file for more details.

dicey's People

Contributors

gtbil, mhyfritz, tobiasrausch

dicey's Issues

Understanding the -k parameter

I am trying to understand how to adjust -k, since it seems to affect the results dramatically.
For example, with primers of length 25 and -k 15 I get no amplicons.
I think this is the same as changing the "Number of bp from the 3' End of the Primer Used to Search for Matches (>14)" parameter on the Silica website.
https://www.gear-genomics.com/silica/index.html?UUID=0101f455-249c-4e7f-beb5-63f2f50a7501

If I change this to -k 20, or change the same parameter on the website, I get an amplicon:
https://www.gear-genomics.com/silica/index.html?UUID=c40f28ba-5abd-420a-8d45-6667d72de0fe

How long should indexing take?

I'm trying to dicey index a 2.4Gb plant genome assembly. How long should I expect creating the FM-Index to take, roughly?

Most efficient way to multi-thread dicey searches

Hi,

I want to perform multiple in-silico PCRs using Dicey, but every time I start a new search it seems to load the full reference genome again. Is there a more efficient way to load the reference genome only once and run multiple searches against it?

Thanks!
Dion

Issues compiling dicey on Mac M1

I am unable to compile dicey on a Mac (MacBook Pro, M1 Max chip, 32 GB RAM, Ventura 13.4).

I have installed it using Bioconda in a conda environment. When trying to compile it I run into the following issue:

(probe_dicey) boehm0002@bz-rgmr01-plm02 dicey % make all
if [ -r src/htslib/Makefile ]; then cd src/htslib && autoreconf -i && ./configure --disable-s3 --disable-gcs --disable-libcurl --disable-plugins && /Library/Developer/CommandLineTools/usr/bin/make && /Library/Developer/CommandLineTools/usr/bin/make lib-static && cd ../../ && touch .htslib; fi
Can't exec "aclocal": No such file or directory at /opt/homebrew/Cellar/autoconf/2.71/share/autoconf/Autom4te/FileUtils.pm line 274.
autoreconf: error: aclocal failed with exit status: 2
make: *** [.htslib] Error 2

I was able to figure out that I should install automake as this contains aclocal, but I have a University managed computer and using brew to do this does not work well. Even after installation in admin mode I get the same message.

I then also tried to run dicey using docker afterwards and I run into the following issue when pulling it (I used Docker Desktop):
no matching manifest for linux/arm64/v8 in the manifest list entries

After some digging on Stack Overflow I found that this is due to an issue with M1 chips, but after resolving that I still received the same error as above. I am now stuck and unable to track down whether this is actually a Mac issue or something else.

Many thanks for your help!!

Dicey Chop of Human reference genome

Hi there,

Any idea how long Dicey takes to process a human reference genome with the chop function? My Macbook Pro has been running for 30 hours straight. Also, what size can I expect for the read1 and read2 files?

Thanks a million.

Niel

Are the releases up-to-date?

The last release (0.1.7) was from last year, and there have been a few commits since. Should that binary release continue to be used?

Error opening params file

I get an error when I run dicey.
My commands are:

script="/media/bioadmin/24e12c01-7941-49f5-9647-eeebabd15082/packages/dicey/scripts/json2txt.py"
config="/media/bioadmin/24e12c01-7941-49f5-9647-eeebabd15082/packages/dicey/src/primer3_config"

echo -e ">566F\CAGCAGCCGCGGTAATTCC\n>1200R\nCCCGTGTTGAGTCAAATTAAGC" > primers.fa

dicey search -c 45 -l 1000 -g ../18S_All.fasta primers.fa -i $config | python $script

I get the error:

Error opening params file
Error opening params file
Segmentation fault (core dumped)

dicey chop arguments for creating reads for mappability map

Hello! I'm interested in generating a mappability map for a genome for Delly germline CNV calling. There are several arguments for dicey chop, including insert size, read length, etc. Should I set those arguments to match the reads that I'll be using, or should they remain at their defaults? Thank you so much for your time.

Add option to provide fasta of sequences to hunt command

Hello! Thanks for this great tool.

I'm using dicey hunt to check the uniqueness of primers in my target genome, and would like to do this routinely on a set of sequences instead of one by one. Could you/we add a FASTA file as an input CLI parameter?

Best,
Brice
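Independent of any built-in option, one possible workaround is to loop over the sequences outside of Dicey. The Python sketch below is an assumption-laden illustration: it parses a FASTA file and builds one dicey hunt call per record, using only the -g and -o flags shown in the README. The helper names and the per-record output file naming are hypothetical, and each call still loads the genome index separately.

```python
def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks, count = None, [], 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            count += 1
            header = line[1:].split()
            # Use the first header token as the record name; fall back to a counter.
            name, chunks = (header[0] if header else f"seq{count}"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def hunt_commands(genome, fasta_text):
    """Build one 'dicey hunt' argument list per FASTA record."""
    return [
        ["dicey", "hunt", "-g", genome, "-o", f"{name}.json.gz", seq]
        for name, seq in read_fasta(fasta_text)
    ]


example = ">p1\nTCTCTGCACACACGTTGT\n>p2\nGCCCCATAGGTTTTGAACTCA\n"
for cmd in hunt_commands("hg19.fa.gz", example):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to execute the search.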

Support symlinks?

Dicey could support symlinks for some regular inputs instead of just regular files. For example, this would allow a symlinked assembly or indices for that assembly to be used for dicey search, and also process substitution could be used to send the input sequences fasta from standard input or the output of another application:

dicey search -i /etc/primer3_config -g symlink.fa.gz <(echo "FASTA here")

Relevant lines:
https://github.com/gear-genomics/dicey/blob/main/src/silica.h#L300
https://github.com/gear-genomics/dicey/blob/main/src/silica.h#L363

Error with FM-Index

I got an error with "FM-Index cannot be loaded!"

Here is my command

dicey search -c 45 -g Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz primers.fa -i /dicey/src/primer3_config/

I am not familiar with C++, but I searched for the error for a long time and I think there is something wrong with the genomeindex.json file. However, I have loaded all the reference files from the recommended website, as follows: the fa.gz file, the fa.gz.fai file, the fa.gz.gzi file and the fa.fm9 file.

I have really struggled with this error for many days, so I would very much appreciate any advice!

Thanks a ton!
