
lcsb-biocore / cobrexa.jl


Constraint-Based Reconstruction and EXascale Analysis

Home Page: http://bit.ly/COBREXA

License: Apache License 2.0

Julia 99.63% Dockerfile 0.03% TeX 0.33%
julia exascale high-performance hpc constraint-based-modeling cobra exascale-analysis

cobrexa.jl's Introduction


COnstraint-Based Reconstruction and EXascale Analysis


This package provides constraint-based reconstruction and analysis tools for exa-scale metabolic modeling in Julia.

How to get started

Prerequisites and requirements

  • Operating system: Use Linux (Debian, Ubuntu, or CentOS), macOS, or Windows 10 as your operating system; COBREXA has been tested on these systems.
  • Julia language: To use COBREXA, you need to install Julia 1.0 or higher. Download Julia and follow its installation instructions here.
  • Hardware requirements: COBREXA runs on any hardware that can run Julia, and can easily use resources from multiple computers interconnected over a network. For processing large datasets, ensure that the total amount of RAM available across all involved computers is larger than the data size.
  • Optimization solvers: COBREXA uses JuMP.jl to formulate optimization problems and is compatible with all JuMP-supported solvers. However, at least one of these solvers must be installed on your machine to perform any analysis. For a pure Julia implementation, you may use e.g. Tulip.jl, but other solvers (GLPK, Gurobi, ...) work just as well.

💡 If you are new to Julia, it is advisable to familiarize yourself with the environment first. Use the Julia documentation to solve language-related issues, and the Julia package manager docs for installation-related difficulties. Of course, the Julia channel is another fast and easy way to find answers to Julia-specific questions.

Quick start

COBREXA.jl documentation is available online (also for the development version of the package).

You can install COBREXA from the Julia package repositories. Start julia, press ] to switch to the packaging environment, and type:

add COBREXA

You also need to install your favorite solver supported by JuMP.jl (such as Gurobi, Mosek, CPLEX, GLPK, Clarabel, etc.; see the list here). For example, you can install the Tulip.jl solver by typing:

add Tulip

Alternatively, you may use prebuilt Docker and Apptainer images.

If you are running COBREXA.jl for the first time, it is very likely that upon installing and importing the packages, your Julia installation will need to precompile their source code from scratch. In fresh installations, the precompilation process should take less than 5 minutes.

When the packages are installed, switch back to the "normal" julia shell by pressing Backspace (the prompt should change color back to green). After that, you can download an SBML model from the internet and perform a flux balance analysis as follows:

using COBREXA   # loads the package
using Tulip     # loads the optimization solver

# download the model
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")

# open the SBML file and load the contents
model = load_model("e_coli_core.xml")

# run an FBA
fluxes = flux_balance_analysis_dict(model, Tulip.Optimizer)

The variable fluxes will now contain a dictionary of the computed optimal flux of each reaction in the model:

Dict{String,Float64} with 95 entries:
  "R_EX_fum_e"    => 0.0
  "R_ACONTb"      => 6.00725
  "R_TPI"         => 7.47738
  "R_SUCOAS"      => -5.06438
  "R_GLNS"        => 0.223462
  "R_EX_pi_e"     => -3.2149
  "R_PPC"         => 2.50431
  "R_O2t"         => 21.7995
  "R_G6PDH2r"     => 4.95999
  "R_TALA"        => 1.49698
  ⋮               => ⋮
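The result is a plain Julia dictionary, so all the usual Dict operations apply. A small sketch, reusing a few entries from the output above, that finds the largest fluxes by absolute value:

```julia
# a few entries copied from the FBA result above
fluxes = Dict("R_EX_fum_e" => 0.0, "R_ACONTb" => 6.00725, "R_SUCOAS" => -5.06438)

# sort reactions by the magnitude of their computed flux
top = sort(collect(fluxes); by = p -> abs(p.second), rev = true)

first(top)  # "R_ACONTb" => 6.00725
```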

The main feature of COBREXA.jl is the ability to easily specify and process a huge number of analyses in parallel. You can have a look at a longer guide that describes the parallelization and screening functionality, or dive into the example analysis workflows.
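The parallelization builds on Julia's standard Distributed machinery. A generic sketch of distributing independent analyses over local workers (this is plain Distributed usage, not COBREXA-specific API; `analyze_variant` is a hypothetical stand-in for one analysis):

```julia
using Distributed

addprocs(2)  # start two local worker processes; adjust to your hardware

# a stand-in for one independent analysis variant (hypothetical workload)
@everywhere analyze_variant(i) = i^2

# run the variants in parallel on the workers, collecting results in order
results = pmap(analyze_variant, 1:10)
```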

Testing the installation

If you run on a non-standard platform (e.g. a customized operating system), or if you have added any modifications to the COBREXA source code, you may want to run the test suite to ensure that everything works as expected:

] test COBREXA

Prebuilt images

A Docker image is available from Docker Hub as lcsbbiocore/cobrexa.jl, and from the GitHub container repository. Download and use it as usual with docker:

docker run -ti --rm lcsbbiocore/cobrexa.jl:latest

# or alternatively from ghcr.io
docker run -ti --rm ghcr.io/lcsb-biocore/docker/cobrexa.jl:latest

In the container, you should get a julia shell with the important packages already installed, and you may immediately continue the tutorial above from the using COBREXA step.

Apptainer (aka Singularity) images are available from GitHub container repository. To start one, run:

singularity run oras://ghcr.io/lcsb-biocore/apptainer/cobrexa.jl:latest

...which gives you a running Julia session with COBREXA.jl loaded.

If you require precise reproducibility, use a tag like v1.2.2 instead of latest (all releases since 1.2.2 are tagged this way).

Acknowledgements

COBREXA.jl is developed at the Luxembourg Centre for Systems Biomedicine of the University of Luxembourg (uni.lu/lcsb), cooperating with the Institute for Quantitative and Theoretical Biology at the Heinrich Heine University in Düsseldorf (qtb.hhu.de).

The development was supported by the European Union's Horizon 2020 Programme under the PerMedCoE project (permedcoe.eu), agreement no. 951773.

If you use COBREXA.jl and want to refer to it in your work, use the following citation format (also available as BibTeX in cobrexa.bib):

Miroslav Kratochvíl, Laurent Heirendt, St Elmo Wilken, Taneli Pusa, Sylvain Arreckx, Alberto Noronha, Marvin van Aalst, Venkata P Satagopam, Oliver Ebenhöh, Reinhard Schneider, Christophe Trefois, Wei Gu, COBREXA.jl: constraint-based reconstruction and exascale analysis, Bioinformatics, Volume 38, Issue 4, 15 February 2022, Pages 1171–1172, https://doi.org/10.1093/bioinformatics/btab782


cobrexa.jl's People

Contributors

bertonoronha, cylon-x, exaexa, github-actions[bot], hettiec, htpusa, josepereiro, laurentheirendt, marvinvanaalst, stelmo, syarra


cobrexa.jl's Issues

Bounds should not be sparse vectors, maybe a new data type...

Bound vectors are typically not sparse in the usual sense. They are populated with max/min bounds and not very many zeros. It might make sense to make our own "sparse" vector format where the zeros are actually the max or min bounds. This might save significant storage at the exa-scale...
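The "sparse with a non-zero default" idea can be sketched in a few lines: a vector type that stores one default value plus only the positions that differ from it. All names here are hypothetical illustrations, not COBREXA API:

```julia
# A vector with a non-zero default value; only entries that differ from
# the default are actually stored.
struct DefaultVector{T} <: AbstractVector{T}
    default::T
    overrides::Dict{Int,T}  # positions that differ from the default
    len::Int
end

Base.size(v::DefaultVector) = (v.len,)
Base.getindex(v::DefaultVector, i::Int) = get(v.overrides, i, v.default)

# a lower-bound vector that is -1000 everywhere except position 3
lb = DefaultVector(-1000.0, Dict(3 => 0.0), 5)

lb[1], lb[3]  # (-1000.0, 0.0)
```

Storage is proportional to the number of overrides rather than the model size, which is the saving the issue asks for.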

Overview issue: documentation structure

The docs are more like a collection of ideas now, we should give it a clear tutorial structure.

My idea for the structure is as follows:

  • Introduction, what does COBREXA solve and what it does not solve
  • Totally primitive example ((down)load 1 model and do 1 FBA)
  • Section with tutorials
    • Model loading and conversion
    • Running FBA, FVA and sampling on small models
    • Running FVA on HPCs
    • Modifying and saving the models (Serialize, "export" through conversion to JSON/MAT and saving.)
  • Section with advanced tutorials
    • Using modifications to scan for many model properties at once (this still needs to be implemented, but it's hopefully 50LOC :D )
    • Using a custom model structure
    • Writing own reconstructions and modifications and running them on HPC
    • Using the extra information in StdModel to screen through many model variants, e.g. knockouts or something (I guess some of the original tutorials may sink in here)
  • Examples (aka backing notebooks)
    • Loading and saving
    • Simple FBA and FVA
    • Sampling and seeing the results
    • Parallel FVA
    • Knockouts
    • Custom models
  • Function documentation (REFERENCE, this should correspond to the structure in src/ as much as possible. I'd still separate it into some roughly consistent functions ordered by kinda bottom-up structure)
    • Types
    • IO
    • Analysis functions
    • Sampling
    • Modifications and reconstruction functions
    • Utils+misc

Feel free to edit/suggest.

add CodeCov

We need to add Codecov and Coveralls to the testing pipeline

Automatic file loading

I still don't like the automatic file loading because it introduces alphabetical-ordering issues with package loading; e.g., I have to prepend "a" to reaction.jl to make sure it loads before cobraModel.jl... I think it solves the slight inconvenience of listing all the files, but introduces a bigger one: having to invent creative (less descriptive) file names...

So far this only seems to affect me, but what do y'all think?

Fix samplers and create good tests for them

Currently the samplers are not super robust and the testing leaves much to be desired.

  1. Fix ACHR
  2. Add better tests
  3. Add projections to ensure robust sampling in case the samplers go out of bounds

`loadModel` is broken

There's no haskey() for MAT files

using MAT
file=matopen("test/data/toyModel1.mat")
haskey(file, "model")
ERROR: MethodError: no method matching haskey(::MAT.MAT_v5.Matlabv5File, ::String)
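Until haskey is supported on the file handle upstream, a possible workaround is to read all variables into a Dict first, since read on a MAT file without a variable name returns a Dict{String,Any}. A sketch, assuming the same test file as in the report:

```julia
using MAT

file = matopen("test/data/toyModel1.mat")
vars = read(file)       # reads all variables into a Dict{String,Any}
haskey(vars, "model")   # haskey works fine on the plain Dict
close(file)
```

This reads the whole file eagerly, so it is only a stopgap for small models.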

Homogenize tests

After #92 the package structure will change somewhat. Let's fix the test layout to match the src directory structure better, and perhaps implement better tests (@stelmo I am looking at you)

SBMLModel should just be CobraModel

SBML.jl imports the dense version of the model on file, we should only have one struct that stores this information (avoid clutter). I think this should be CobraModel since model construction will likely happen in it... ?

Clean-up downloading of test models

  • check if the file exists before downloading
  • always check against a hash and print an error if the hashes do not match, so that we can quickly spot that something fishy happened to the models
  • preferably wrap the Download.download in something that does all this automagically
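The three bullets above can be wrapped in one small helper; a sketch using the SHA and Downloads standard libraries (the function name and signature are hypothetical, not part of COBREXA):

```julia
using SHA, Downloads

# Download `url` to `path` unless it already exists, then verify its
# SHA-256 hash, erroring loudly on mismatch so fishy models get spotted.
function verified_download(url, path, expected_sha256)
    isfile(path) || Downloads.download(url, path)
    actual = bytes2hex(open(sha256, path))
    actual == expected_sha256 ||
        error("hash mismatch for $path: expected $expected_sha256, got $actual")
    return path
end
```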

Implement knockouts in an efficient way

We should definitely have some way of doing knockouts on models. This makes the most sense for StandardModel. Currently the plan is to add a field to StandardModel:

mutable struct StandardModel <: MetabolicModel
    id::String
    reactions::Array{Reaction,1}
    metabolites::Array{Metabolite,1}
    genes::Array{Gene,1}
    gene_reaction::Dict{Gene, Array{Reaction,1}}
end

This should make looking up reactions affected by deletions quicker than looping over all reactions for each gene.

Then a new function,

knockout_modification = knockout(model, gene1, gene2, ..., geneN)

needs to be made that will intelligently create a callback that can be passed to the modifications argument of the analysis functions to actually perform the knockout. WIP
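The lookup idea behind the gene_reaction field can be sketched with plain dictionaries (types simplified to strings, all names hypothetical):

```julia
# gene id -> reactions whose GRR involves that gene
gene_reactions = Dict(
    "g1" => ["r1", "r2"],
    "g2" => ["r2"],
)

# reactions affected by knocking out a set of genes, found by direct
# lookup instead of scanning every reaction for every gene
affected(genes) =
    unique(r for g in genes for r in get(gene_reactions, g, String[]))

affected(["g1", "g2"])  # ["r1", "r2"]
```

A real knockout would additionally evaluate the full gene-reaction rule of each affected reaction before zeroing its bounds.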

Make sure the accessors for MetabolicModel are kinda feature-complete

State:

  • reactions 🆗
  • n_reactions 🆗 (efficient "just tell me how many reaccs you have")
  • metabolites 🆗
  • n_metabolites 🆗
  • genes 🆗
  • n_genes -- this would be great to have done in #101
  • stoichiometry 🆗
  • bounds 🆗
  • objective 🆗
  • balance 🆗
  • gene_associations 🆗 (basically GRRs)
  • metabolite_chemistry 🆗 (basically formulas+charges)
  • metabolite_annotations -- TODO, is there any expected standard to pull out e.g. the standardized identifiers? I'd go for metabolite_identifiers directly tbh. Done in #102 together with the below 2
  • reaction_annotations
  • gene_annotations
  • reaction_subsystem -- TODO
  • metabolite_compartment -- TODO
  • reaction/gene/metabolite_notes -- notes are extremely random, we might postpone them. If there's no conveyable structure for notes, I'd suggest not having them in the generic MetabolicModels at all.
  • reaction/gene/metabolite_name

Consistent model variable naming

It is a bit ugly that the JSON and SBML models contain .m but MATModel contains .mat. Either clean up everything to .m, or use .json and .sbml.

Same for the model names used in function parameters: there's m, model, and a, with occasional excesses.

Community model + tutorial

  • load a heap of whatever models
  • have the model structure re-ID them correctly and add exchange reactions

Implement macros for all functions (where appropriate)

Let's make macros to really make the user interface clean. I have implemented fba macros in #53 (see the last few commits) and @marvinvanaalst has implemented a macro for reaction adding in #59. @exaexa and I were also talking about this extensively on Slack; it would be really cool to have something like this (from Slack, @exaexa):

vs = @variants
  knockout(123)
  knockout(4345)
  ...
end

@mod_variants! vs
  remove_reaction(123)
end

@combine_variants! vs
  no_modification()
  add_random_ATP()
  add_some_toxin()
end

Currently I have an fba mini version of this working as shown below:

using COBREXA
using Tulip

model = read_model(joinpath("e_coli_core.json"))
biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM")
glucose = findfirst(model.reactions, "EX_glc__D_e")

vec = @flux_balance_analysis_vec model Tulip.Optimizer begin
    modify_objective(biomass)
    modify_constraint(glucose, -8.0, -8.0)
end

Some SBML formats (level 3?) are not read correctly

For example:

download("https://www.vmh.life/files/reconstructions/AGORA/1.03/reconstructions/sbml/Abiotrophia_defectiva_ATCC_49176.xml", "testModel.xml");
model = readSBML("testModel.xml");

findall(getOCs(model) .!= 0)
# all zeros
getLBs(model)
# -Inf when should be -1000

Ongoing design considerations

From our discussion today via Slack (@exaexa @laurentheirendt):

  1. LinearModel -> CoreModel
  2. Creation of SBMLModel, MATModel, JSONModel, (maybe YAMLModel?) types to store models read in from those files. Use all fields supported by the various file types.
  3. Analysis functions should work on all model types. Input: model type. Output: numbers etc.
  4. Reconstruction functions will output StandardModel or CoreModel depending on the input type with restrictions on which type of model depending on the purpose of the reconstruction functions
  5. Squashing models: example: CoreModel and SBMLModel can at most output a CoreModel
  6. Reconstruction functions will have as an input StandardModel or CoreModel
  7. accessors rather than deep copies when converting model types

reorganize file contents

Transcript:

Mo  1:03 PM
we should change the file names of "modeling.jl" and "model_manipulations.jl" to something like "manipulate_linearmodel.jl" and "manipulate_fullmodel.jl"
mirek  1:04 PM
or modeling/fullmodel and modeling/linearmodel
would also make it easier to split into doc sections
Mo  1:05 PM
also I think find_exchange_metabolites should be near my exchange_reactions and we should just have one name and dispatch on model type
mirek  1:05 PM
yeah
Mo  1:05 PM
this is of course not for this PR but later
should I make an issue to remind us?
mirek  1:05 PM
same thing for the prettyprinting&misc functions probably, they are now mixed with the model docs
I'll make one

Regroup tests to files that match src/

Lots of tests should be homogenized, e.g. test/io/io_test.jl tests reading and writing of StandardModel and test/io/writer.jl does the same thing but in a different way. I think we can get rid of test/testing_functions.jl by making the testing style more uniform. Will add more comments as I spot things. Also, #64 will change all the file names in src/io and this should be reflected in test/io.

Implement function callbacks to make function modifications uniform and streamlined

@exaexa had a great idea of using callbacks as function arguments to modify the function applied to a model, e.g. applying user-defined bounds on some reactions in FBA. In fba(...), fva(...) and pfba(...) I have a bunch of keyword arguments that are largely repeats and should all be wrapped in some way. This is likely a feature the average user will end up using often.
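The pattern can be sketched with plain functions, independent of any COBREXA internals (every name below is a hypothetical illustration): a "modification" is just a function that mutates the problem, and the analysis function applies the whole list before solving.

```julia
# A modification is a closure that rewrites one piece of the problem;
# here the "problem" is just a Dict of reaction id -> (lb, ub).
change_bound(rxn, lb, ub) = problem -> (problem[rxn] = (lb, ub))

function analyze(problem; modifications = [])
    for mod! in modifications
        mod!(problem)      # apply each modification before solving
    end
    return problem         # a real analysis would optimize here
end

p = Dict("EX_glc" => (-10.0, 0.0))
analyze(p; modifications = [change_bound("EX_glc", -8.0, -8.0)])
p["EX_glc"]  # (-8.0, -8.0)
```

This keeps the analysis functions' signatures small: instead of many repeated keyword arguments, each tweak becomes one composable callback.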
