dcouvin / getsequenceinfo

A Perl script for retrieving sequence information from the GenBank, RefSeq, or ENA sequence repositories

License: GNU General Public License v3.0

Perl 98.04% Batchfile 1.18% Shell 0.78%

getSequenceInfo

A simple Perl script for retrieving sequence information from the GenBank, RefSeq, or ENA sequence repositories.

Requirements

Perl (version 5.26 or later) must be available on your system to run getSequenceInfo. On Windows, you can get Perl by installing Strawberry Perl; if necessary, consult the Windows documentation on how to launch and use the Command Prompt. On Unix-like systems (Linux or macOS), Perl is generally already installed; if it is not, follow the Perl installation instructions for your platform, and see the documentation on the Shell Prompt if you are new to the terminal. You can then check the installation by typing the following command:

perl -v
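The output of `perl -v` has to be read by eye. As a minimal sketch, the 5.26 requirement could also be checked in a script. The helper names `version_ge` and `perl_version` are hypothetical and are not part of getSequenceInfo; the comparison assumes plain dotted versions such as 5.30.0.

```shell
# Hypothetical helper (not part of getSequenceInfo):
# succeed if dotted version $1 is greater than or equal to dotted version $2.
version_ge() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        # Compare major, minor, patch in turn; missing fields count as 0.
        for (i = 1; i <= 3; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Hypothetical helper: print the running Perl version as "major.minor.patch".
perl_version() {
    perl -e 'printf "%vd\n", $^V'
}
```

For example, `version_ge "$(perl_version)" 5.26.0 || echo "Perl 5.26 or later is required"` prints a warning only when the installed Perl is too old.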

Quick Installation

Please first verify that Perl is installed on your system by following the requirements above. You will probably need to install the X11 development package first:
On Debian or Ubuntu, this is the package libx11-dev: sudo apt-get install libx11-dev
On CentOS, RedHat, or Fedora, this is the package libX11-devel.
macOS users may need Xcode/XQuartz and Fink.
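The package names above can be looked up per distribution. The following is a sketch only: the function names are hypothetical, and it assumes the distribution is identified by the `ID` field of /etc/os-release (`debian`, `ubuntu`, `centos`, `rhel`, `fedora`). It prints the package name and installs nothing.

```shell
# Hypothetical helper: map an os-release ID to the X11 development package
# named in the installation notes. Fails (prints nothing) for other IDs.
x11_package_for() {
    case $1 in
        debian|ubuntu)      echo "libx11-dev" ;;
        centos|rhel|fedora) echo "libX11-devel" ;;
        *)                  return 1 ;;
    esac
}

# Hypothetical helper: print the ID field of /etc/os-release, if the file exists.
current_distro_id() {
    [ -r /etc/os-release ] || return 1
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . /etc/os-release && printf '%s\n' "$ID" )
}
```

A call such as `x11_package_for "$(current_distro_id)"` then prints the package to install, or fails on a distribution not listed above.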

Linux or MacOS (Unix)

git clone https://github.com/dcouvin/getSequenceInfo.git
cd getSequenceInfo/
bash install/installer_Unix.sh

Windows

Windows users can install the tool by running the installer_Windows.bat file, either by double-clicking it or from the Command Prompt:

install\installer_Windows.bat

How to use

The tool can be used directly from the command line or through a graphical user interface (GUI).

GUI version

The GUI version of the tool (getSequenceInfoGUI.pl) can be launched either by double-clicking it or by typing the following command:

perl getSequenceInfoGUI.pl

Command line version

Type the following command to display the help message:

perl getSequenceInfo.pl -h

Help message:

	Name: 
		getSequenceInfo.pl
	
	Synopsis:
		A Perl script allowing to get sequence information from GenBank RefSeq or ENA repositories.
		
	Usage:
	  perl getSequenceInfo.pl [options]
	  examples: 
	     perl getSequenceInfo.pl -k bacteria -s "Helicobacter pylori" -l "Complete Genome" -date 2019-06-01 
	     perl getSequenceInfo.pl -k viruses -n 5 -date 2019-06-01
	     perl getSequenceInfo.pl -k "bacteria" -taxid 9,24 -n 10 -c plasmid -dir genbank -o Results
	     perl getSequenceInfo.pl -ena BN000065
	     perl getSequenceInfo.pl -fastq ERR818002
	     perl getSequenceInfo.pl -fastq ERR818002,ERR818004
						 	
	Kingdoms:
		archaea
		bacteria
		fungi
		invertebrate
		plant
		protozoa
		vertebrate_mammalian
		vertebrate_other
		viral
	
	Assembly levels:
		"Complete Genome"
		Chromosome
		Scaffold
		Contig 
	
	General:
		-help or -h			displays this help 	
		-version or -v			displays the current version of the program
		
	Options ([XXX] represents the expected value):
		-directory or -dir [XXX]	allows to indicate the NCBIs nucleotide sequences repository (default: genbank)
		-get or -getSummaries [XXX]	allows to obtain a new assembly summary files in function of given kingdoms (bacteria,fungi,protozoa...)	
		-k or -kingdom [XXX]		allows to indicate kingdom of the organism (see the examples above)
		-s or -species [XXX]		allows to indicate the species (must be combined with -k option)
		-taxid [XXX]			allows to indicate a specific taxid (must be combined with -k option)
		-assembly_or_project [XXX]	allows to indicate a specific assembly accession or bioproject (must be combined with -k option)
		-date [XXX]			indicates the release date (with format yyyy-mm-dd) from which sequence information are available
		-l or -level [XXX]		allows to select a specific assembly level (e.g. "Complete Genome")
		-o or -output [XXX]		allows users to name the output result folder
		-n or -number [XXX]		allows to limit the total number of assemblies to be downloaded
		-c or -components [XXX]		allows to select specific components of the assembly (e.g. plasmid, chromosome, ...)
		-ena [XXX] 			allows to download report and fasta file given a ENA sequence ID 
		-fastq [XXX]			allows to download FASTQ sequences from ENA given a run accession (https://ena-docs.readthedocs.io/en/latest/faq/archive-generated-files.html)
		-log				allows to create a log file
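
The help message shows that -fastq accepts several run accessions separated by commas and that -date expects the format yyyy-mm-dd. As a minimal sketch, the helpers below prepare such arguments before calling the script. All three function names are hypothetical and are not part of getSequenceInfo; `fastq_command` only prints the command (a dry run) and `is_release_date` checks the shape of the date, not whether it is a valid calendar day.

```shell
# Hypothetical helper: join all arguments with commas, e.g. "A B" -> "A,B".
join_accessions() {
    # "$*" joins arguments with the first character of IFS;
    # the subshell keeps the IFS change local.
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical helper: print (without running) the -fastq command
# for one or more run accessions.
fastq_command() {
    printf 'perl getSequenceInfo.pl -fastq %s\n' "$(join_accessions "$@")"
}

# Hypothetical helper: succeed if $1 looks like yyyy-mm-dd (digits and dashes only).
is_release_date() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, `fastq_command ERR818002 ERR818004` prints `perl getSequenceInfo.pl -fastq ERR818002,ERR818004`, which matches the last usage example in the help message.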

Contributors

dcouvin, vincentmoco
