
This project forked from cogent3/ensembllite


A new approach to obtaining local copies of ensembl data



EnsemblLite

Warning: EnsemblLite is not ready for use! We will remove this notice when we are ready to post to PyPI, at which point it will be ready for trialling. In the meantime, you can check the project's progress towards being usable via the EnsemblLite roadmap.

A screencast of an early prototype

🎬 Very early proof-of-concept demo and plan for a new style terminal user interface
demo-tui.mp4

NOTE: the command line name has changed since this early version. See text below for the new name.

Developer installs

Fork the repo and clone your fork to your local machine. In the terminal, create either a Python virtual environment or a new conda environment and activate it. In that environment, install flit:

$ pip install flit

Then do the flit version of a "developer install". (It basically creates a symlink to the repo's source directory.)

$ flit install -s --python `which python`

Installation

We suggest creating a conda environment or a Python virtual environment, using Python 3.11. Then install directly into that environment from the GitHub repo:

$ python -m pip install "ensembl_lite @ git+https://github.com/cogent3/EnsemblLite.git@develop"

Then run for the first time using

$ elt tui

The first start takes a while because, behind the scenes, cogent3 is transpiling various functions into C and compiling them. Eventually, you get a very neat terminal interface you can click around in. To exit, make sure "root" is selected in the left panel, then press ^+r.

Usage

The setup is (for now) controlled by a config file in INI format. To get a starting template, use the exportrc subcommand.

Usage: elt exportrc [OPTIONS]

  exports sample config and species table to the nominated path

  setting an environment variable ENSEMBLDBRC with this path will force its
  contents to override the default ensembl_lite settings

Options:
  -o, --outpath PATH  path to directory to export all rc contents
  --help              Show this message and exit.

A sample config file I've been using for development

Using this config, it takes approximately 16 minutes to download (over a ~200MB/s WiFi connection) and ~45 minutes to install on my M2 MacBook Pro (note the install is incomplete). The install step uses up to 10 CPU cores.

[remote path]
host=ftp.ensembl.org
path=pub
[local path]
staging_path=~/Desktop/Outbox/ensembl_download
install_path=~/Desktop/Outbox/ensembl_install
[release]
release=110
[Mouse Lemur]
db=core
[Macaque]
db=core
[Gibbon]
db=core
[Orangutan]
db=core
[Bonobo]
db=core
[Human]
db=core
[Chimp]
db=core
[Gorilla]
db=core
[compara]
align_names=10_primates.epo
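
Because the config is plain INI, it can be inspected with Python's standard `configparser` module. The following is a minimal sketch and not EnsemblLite's own parser: the function name `summarise` and the `NON_SPECIES` set are assumptions made for illustration, and the config text is a shortened copy of the sample above.

```python
import configparser

# A shortened copy of the sample config above, inlined so the sketch is self-contained.
SAMPLE = """
[remote path]
host=ftp.ensembl.org
path=pub
[local path]
staging_path=~/Desktop/Outbox/ensembl_download
install_path=~/Desktop/Outbox/ensembl_install
[release]
release=110
[Human]
db=core
[Chimp]
db=core
[compara]
align_names=10_primates.epo
"""

# Assumption: every section that is not one of these names a species.
NON_SPECIES = {"remote path", "local path", "release", "compara"}


def summarise(text):
    """Return the host, release, species names and alignment names in a config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # sections() preserves the order in which sections appear in the file
    species = [name for name in parser.sections() if name not in NON_SPECIES]
    return {
        "host": parser["remote path"]["host"],
        "release": parser.getint("release", "release"),
        "species": species,
        "align_names": parser.get("compara", "align_names", fallback=""),
    }


summary = summarise(SAMPLE)
print(summary)
```

A check like this can be handy for confirming which species a config selects before starting a long download.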

Download

Downloads the species indicated in the config file:

  • genome sequences in FASTA format
  • annotations in GFF3 format
  • gene homologies for individual genomes in TSV format

Alignments indicated in the config file will be downloaded in .maf format.

Downloads are written to a local directory, specified in the config file. Downloads are done in parallel (using threads).
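
The thread-based pattern described above can be sketched with the standard library's `ThreadPoolExecutor`. This is an illustration of the general technique, not EnsemblLite's implementation: `download_all`, `fake_fetch` and the remote paths are hypothetical, and `fake_fetch` stands in for a real FTP transfer so the example runs without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(remote_paths, fetch, max_workers=4):
    """Call fetch(path) for every path on a thread pool.

    Returns (results, errors), two dicts keyed by remote path, so that one
    failed transfer does not abort the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, path): path for path in remote_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as err:  # record the failure and carry on
                errors[path] = err
    return results, errors


def fake_fetch(path):
    # Hypothetical stand-in for an FTP download; returns a made-up byte count.
    if path.endswith(".missing"):
        raise FileNotFoundError(path)
    return len(path)


# Made-up remote paths, for illustration only
paths = ["pub/a.fa.gz", "pub/b.gff3.gz", "pub/c.missing"]
done, failed = download_all(paths, fake_fetch)
print(sorted(done), sorted(failed))
```

Threads suit this step because the work is dominated by waiting on the network rather than by computation.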

Install

"Installation" presently involves transforming downloaded files into local sqlite3 databases which are saved to the location specified in the config file.

From the maf alignment files, the "ancestral" sequences are discarded and, for every aligned sequence, only the gap data (i.e. gap position and length) is stored along with the genomic coordinates. The alignments can be reconstructed by combining this information with the whole genome sequence. (This approach reduces storage requirements ~5-fold.)
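
To make the gap-only idea concrete, here is a minimal sketch of splitting an aligned sequence into its ungapped sequence plus (position, length) gap records, and of rebuilding the aligned form from the two. It assumes gap positions are expressed in ungapped-sequence coordinates; the representation EnsemblLite actually stores may differ, and the function names `to_gaps` and `from_gaps` are made up for this example.

```python
def to_gaps(aligned, gap_char="-"):
    """Split an aligned sequence into (ungapped sequence, gaps).

    Each gap is (position, length), where position is the index in the
    ungapped sequence before which the gap is inserted.
    """
    seq = []
    gaps = []
    run = 0  # length of the gap run currently being read
    for char in aligned:
        if char == gap_char:
            run += 1
            continue
        if run:
            gaps.append((len(seq), run))
            run = 0
        seq.append(char)
    if run:  # the alignment ended inside a gap
        gaps.append((len(seq), run))
    return "".join(seq), gaps


def from_gaps(seq, gaps, gap_char="-"):
    """Rebuild the aligned sequence from an ungapped sequence and its gaps."""
    parts = []
    last = 0
    for pos, length in gaps:
        parts.append(seq[last:pos])
        parts.append(gap_char * length)
        last = pos
    parts.append(seq[last:])
    return "".join(parts)


aligned = "AC--GT---A-"
seq, gaps = to_gaps(aligned)
print(seq, gaps)
rebuilt = from_gaps(seq, gaps)
print(rebuilt == aligned)
```

In this sketch the ungapped sequence is produced by `to_gaps`; in the scheme described above it would instead be sliced from the whole genome sequence using the stored genomic coordinates, so only the gap list needs to be kept per aligned sequence.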

Installation is done in parallel on multiple CPUs (since the data need to be decompressed on the fly).
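
As a rough illustration of the install step, the sketch below streams a gzip-compressed GFF3 file line by line (so no decompressed copy is written to disk) and loads the features into a sqlite3 database. It is not EnsemblLite's code: the table layout, the `.sqlite` naming and the function names `install_gff3` and `install_all` are assumptions, and the GFF3 records are made up.

```python
import gzip
import os
import sqlite3
import tempfile
from concurrent.futures import ProcessPoolExecutor


def parse_line(line):
    """Turn one tab-delimited GFF3 record into a row for the feature table."""
    seqid, _source, ftype, start, stop, _score, strand, _phase, attrs = (
        line.rstrip("\n").split("\t")
    )
    return seqid, ftype, int(start), int(stop), strand, attrs


def install_gff3(gz_path):
    """Stream gz_path into a sqlite3 database; return (db_path, row count)."""
    db_path = gz_path + ".sqlite"  # hypothetical naming scheme
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature "
        "(seqid TEXT, feature_type TEXT, start INTEGER, stop INTEGER, "
        "strand TEXT, attributes TEXT)"
    )
    # gzip.open in text mode decompresses on the fly as lines are read
    with gzip.open(gz_path, "rt") as handle:
        rows = (
            parse_line(line)
            for line in handle
            if line.strip() and not line.startswith("#")
        )
        conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    n_rows = conn.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
    conn.close()
    return db_path, n_rows


def install_all(gz_paths, max_workers=2):
    """Install several files at once, one worker process per file."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(install_gff3, gz_paths))


# Single-file demonstration on a small made-up annotation file
demo_dir = tempfile.mkdtemp()
demo_gz = os.path.join(demo_dir, "demo.gff3.gz")
with gzip.open(demo_gz, "wt") as out:
    out.write("##gff-version 3\n")
    out.write("1\tensembl\tgene\t1000\t5000\t.\t+\t.\tID=gene:G1\n")
    out.write("1\tensembl\tmRNA\t1000\t5000\t.\t+\t.\tID=transcript:T1\n")
    out.write("2\tensembl\tgene\t200\t900\t.\t-\t.\tID=gene:G2\n")

demo_db, demo_rows = install_gff3(demo_gz)
conn = sqlite3.connect(demo_db)
genes = conn.execute(
    "SELECT seqid, start, stop FROM feature "
    "WHERE feature_type = 'gene' ORDER BY seqid"
).fetchall()
conn.close()
print(demo_rows, genes)
```

To spread the work across cores, call `install_all` with a list of paths from under an `if __name__ == "__main__":` guard, which the process start methods used on macOS and Windows require. Processes rather than threads are used here because decompression and parsing are CPU-bound.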

Contributors

gavinhuttley, huaemilyying
