go-FAnnoT: Functional Annotation Transfer tool in Golang

About

go-FAnnoT is a functional annotation transfer tool based on protein homology. Our motivations for developing this tool were manifold:

  • Defining a precise strategy to build reference datasets. Most of the time, transfer tools take the annotation of a single closely related species as the reference and copy its possible errors. While reference proteins must be adapted to the organism, a more robust strategy is required to ensure the quality of functional annotations.
  • Evaluating homology from a global alignment rather than a local one. Most existing tools identify matches on the basis of a BLAST search. Unfortunately, measuring homology from a BLAST alignment alone is not sufficient, and sequences should be realigned with a global alignment tool.
  • Allowing flexible threshold settings. Like the reference datasets, homology thresholds should depend on the organism to annotate. For example, it can be necessary to lower thresholds for species that have no closely related species in the reference databases.
  • Standardizing functional annotations in sequence files. This last aspect is critical to facilitate annotation comparisons.
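To illustrate the second and third points, here is a minimal sketch (not go-FAnnoT's actual code) of how identity can be measured over the full length of a global alignment and compared against a user-chosen threshold. The function names `globalIdentity` and `passes`, the sequences, and the threshold values are all hypothetical. The denominator used here is the alignment length; other conventions exist.

```go
package main

import (
	"errors"
	"fmt"
)

// globalIdentity returns the percentage of identical columns over the full
// length of a pairwise global alignment. Both arguments must be the aligned
// sequences: same length, with '-' marking gaps.
func globalIdentity(alnA, alnB string) (float64, error) {
	if len(alnA) == 0 || len(alnA) != len(alnB) {
		return 0, errors.New("aligned sequences must be non-empty and of equal length")
	}
	matches := 0
	for i := 0; i < len(alnA); i++ {
		// A column counts as a match only if it is not a gap.
		if alnA[i] != '-' && alnA[i] == alnB[i] {
			matches++
		}
	}
	return 100 * float64(matches) / float64(len(alnA)), nil
}

// passes reports whether an identity value reaches a user-chosen threshold.
func passes(identity, minIdentity float64) bool {
	return identity >= minIdentity
}

func main() {
	// Made-up sequences: the subject only covers the first half of the query.
	query := "MKTAYIAKQRQISFVK"
	subject := "MKTAYIAK--------"

	id, err := globalIdentity(query, subject)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("global identity: %.1f%%\n", id)
	fmt.Println("passes a strict threshold (80%):", passes(id, 80))
	fmt.Println("passes a relaxed threshold (40%):", passes(id, 40))
}
```

In this toy case, a local alignment restricted to the first eight residues would be fully identical, while the global measure is 50%, which is why the two approaches can lead to different transfer decisions.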

Hence, go-FAnnoT broadly consists of the following steps:

  1. Extracting reference datasets from rich, high-quality databases. We decided to use the UniProt databases (SwissProt and TrEMBL).
  2. Building a hierarchy between the different reference datasets.
  3. Defining rules (different levels of homology) to transfer annotations.
  4. Processing each input protein iteratively against each dataset until a suitable annotation is found.
  5. (Optional) Completing the annotation with InterProScan functional domain predictions.
  6. Producing standardized functional annotations.

Requirements

Download Uniprot and TrEMBL databases

Our tool was designed to use the UniProt databases (SwissProt or TrEMBL). The complete SwissProt database can be downloaded here (choose the file uniprot_sprot.dat.gz).

For the TrEMBL data, it is recommended to download only a subset of the database, as the complete one is too large. Taxon-level subsets are available here.

External tools

To run go-FAnnoT, the NCBI-BLAST+ tool suite and NEEDLE (from the EMBOSS tool suite) must be in the system PATH. There are several ways to achieve this:

  • Use a conda environment with these two tools.
  • (Or) Install these tools. Binaries are available at the following urls:
  • (Or, for Linux only) Most recent distributions provide these tools directly in their repositories:
# Example with Ubuntu
apt-get install ncbi-blast+ emboss
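Whichever option is chosen, it can be useful to confirm that the executables are actually reachable from PATH before running the tool. The following sketch uses `exec.LookPath` from the Go standard library. The helper `missingTools` is hypothetical, and the list of executables (`blastp` and `makeblastdb` from BLAST+, `needle` from EMBOSS) is an assumption about which programs are needed, not something taken from go-FAnnoT's source.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names that cannot be resolved through PATH.
func missingTools(names []string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Assumed executable names; adjust to the programs actually required.
	required := []string{"blastp", "makeblastdb", "needle"}

	if missing := missingTools(required); len(missing) > 0 {
		fmt.Println("missing from PATH:", missing)
		return
	}
	fmt.Println("all required tools found in PATH")
}
```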

Install go-FAnnoT

Build the project from source (GitHub)

To build the project you will have to install Go (see instructions here).

Then clone this repository:

git clone https://github.com/hdevillers/go-fannot.git

Enter the go-fannot directory and build the project with make:

cd go-fannot
make
make test

On Linux and macOS, the binary can be installed by running make install with administrator rights. The default installation path is /usr/local/bin/. A different installation path can be specified as follows:

make install -prefix my/install/path

Download binaries

Precompiled binaries for all platforms will be available soon.

Licence

MIT
