
lcsb-biocore / cobrexa.jl


Constraint-Based Reconstruction and EXascale Analysis

Home Page: http://bit.ly/COBREXA

License: Apache License 2.0

Julia 99.63% Dockerfile 0.03% TeX 0.33%
julia exascale high-performance hpc constraint-based-modeling cobra exascale-analysis

cobrexa.jl's Introduction


COnstraint-Based Reconstruction and EXascale Analysis


This package provides constraint-based reconstruction and analysis tools for exa-scale metabolic modeling in Julia.

How to get started

Prerequisites and requirements

  • Operating system: Use Linux (Debian, Ubuntu, or CentOS), macOS, or Windows 10 as your operating system; COBREXA has been tested on these systems.
  • Julia language: To use COBREXA, you need to install Julia 1.0 or higher. Download Julia and follow its installation instructions here.
  • Hardware requirements: COBREXA runs on any hardware that can run Julia, and can easily use resources from multiple computers interconnected over a network. For processing large datasets, ensure that the total amount of RAM available across all involved computers is larger than the data size.
  • Optimization solvers: COBREXA uses JuMP.jl to formulate optimization problems and is compatible with all JuMP-supported solvers. However, at least one of these solvers must be installed on your machine to perform any analysis. For a pure Julia implementation, you may use e.g. Tulip.jl, but other solvers (GLPK, Gurobi, ...) work just as well.

💡 If you are new to Julia, it is advisable to familiarize yourself with the environment first. Use the Julia documentation to solve language-related issues, and the Julia package manager docs for installation-related difficulties. Of course, the Julia channel is another fast and easy way to find answers to Julia-specific questions.

Quick start

COBREXA.jl documentation is available online (also for the development version of the package).

You can install COBREXA from the Julia package repositories. Start julia, press ] to switch to the packaging environment, and type:

add COBREXA

You also need to install your favorite solver supported by JuMP.jl (such as Gurobi, Mosek, CPLEX, GLPK, Clarabel, etc.; see the list here). For example, you can install the Tulip.jl solver by typing:

add Tulip

Alternatively, you may use prebuilt Docker and Apptainer images.

If you are running COBREXA.jl for the first time, it is very likely that upon installing and importing the packages, your Julia installation will need to precompile their source code from scratch. In fresh installations, the precompilation process should take less than 5 minutes.

When the packages are installed, switch back to the "normal" julia shell by pressing Backspace (the prompt should change color back to green). After that, you can download an SBML model from the internet and perform a flux balance analysis as follows:

using COBREXA   # loads the package
using Tulip     # loads the optimization solver

# download the model
download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml")

# open the SBML file and load the contents
model = load_model("e_coli_core.xml")

# run an FBA
fluxes = flux_balance_analysis_dict(model, Tulip.Optimizer)

The variable fluxes will now contain a dictionary of the computed optimal flux of each reaction in the model:

Dict{String,Float64} with 95 entries:
  "R_EX_fum_e"    => 0.0
  "R_ACONTb"      => 6.00725
  "R_TPI"         => 7.47738
  "R_SUCOAS"      => -5.06438
  "R_GLNS"        => 0.223462
  "R_EX_pi_e"     => -3.2149
  "R_PPC"         => 2.50431
  "R_O2t"         => 21.7995
  "R_G6PDH2r"     => 4.95999
  "R_TALA"        => 1.49698
  ⋮               => ⋮
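The result is a plain Julia dictionary, so all the usual Dict operations apply. A small sketch, reusing a few entries from the output above, that finds the largest fluxes by absolute value:

```julia
# a few entries copied from the FBA result above
fluxes = Dict("R_EX_fum_e" => 0.0, "R_ACONTb" => 6.00725, "R_SUCOAS" => -5.06438)

# sort reactions by the magnitude of their computed flux
top = sort(collect(fluxes); by = p -> abs(p.second), rev = true)

first(top)  # "R_ACONTb" => 6.00725
```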

The main feature of COBREXA.jl is the ability to easily specify and process a huge number of analyses in parallel. You can have a look at a longer guide that describes the parallelization and screening functionality, or dive into the example analysis workflows.
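The parallelization builds on Julia's standard Distributed machinery. A generic sketch of distributing independent analyses over local workers (this is plain Distributed usage, not COBREXA-specific API; `analyze_variant` is a hypothetical stand-in for one analysis):

```julia
using Distributed

addprocs(2)  # start two local worker processes; adjust to your hardware

# a stand-in for one independent analysis variant (hypothetical workload)
@everywhere analyze_variant(i) = i^2

# run the variants in parallel on the workers, collecting results in order
results = pmap(analyze_variant, 1:10)
```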

Testing the installation

If you run on a non-standard platform (e.g. a customized operating system), or if you have added any modifications to the COBREXA source code, you may want to run the test suite to ensure that everything works as expected:

] test COBREXA

Prebuilt images

A Docker image is available from Docker Hub as lcsbbiocore/cobrexa.jl, and from the GitHub container repository. Download and use it as usual with docker:

docker run -ti --rm lcsbbiocore/cobrexa.jl:latest

# or alternatively from ghcr.io
docker run -ti --rm ghcr.io/lcsb-biocore/docker/cobrexa.jl:latest

In the container, you should get a julia shell with the important packages already installed, and you may immediately continue the tutorial above from the using COBREXA step.

Apptainer (aka Singularity) images are available from GitHub container repository. To start one, run:

singularity run oras://ghcr.io/lcsb-biocore/apptainer/cobrexa.jl:latest

...which gives you a running Julia session with COBREXA.jl loaded.

If you require precise reproducibility, use a tag like v1.2.2 instead of latest (all releases since 1.2.2 are tagged this way).

Acknowledgements

COBREXA.jl is developed at the Luxembourg Centre for Systems Biomedicine of the University of Luxembourg (uni.lu/lcsb), cooperating with the Institute for Quantitative and Theoretical Biology at the Heinrich Heine University in Düsseldorf (qtb.hhu.de).

The development was supported by the European Union's Horizon 2020 Programme under the PerMedCoE project (permedcoe.eu), agreement no. 951773.

If you use COBREXA.jl and want to refer to it in your work, use the following citation format (also available as BibTeX in cobrexa.bib):

Miroslav Kratochvíl, Laurent Heirendt, St Elmo Wilken, Taneli Pusa, Sylvain Arreckx, Alberto Noronha, Marvin van Aalst, Venkata P Satagopam, Oliver Ebenhöh, Reinhard Schneider, Christophe Trefois, Wei Gu, COBREXA.jl: constraint-based reconstruction and exascale analysis, Bioinformatics, Volume 38, Issue 4, 15 February 2022, Pages 1171–1172, https://doi.org/10.1093/bioinformatics/btab782


cobrexa.jl's People

Contributors

bertonoronha, cylon-x, exaexa, github-actions[bot], hettiec, htpusa, josepereiro, laurentheirendt, marvinvanaalst, stelmo, syarra


cobrexa.jl's Issues

Bounds should not be sparse vectors, maybe a new data type...

Bound vectors are typically not sparse in the usual sense. They are populated with max/min bounds and not very many zeros. It might make sense to make our own "sparse" vector format where the zeros are actually the max or min bounds. This might save significant storage at the exa-scale...
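The "sparse with a non-zero default" idea can be sketched in a few lines: a vector type that stores one default value plus only the positions that differ from it. All names here are hypothetical illustrations, not COBREXA API:

```julia
# A vector with a non-zero default value; only entries that differ from
# the default are actually stored.
struct DefaultVector{T} <: AbstractVector{T}
    default::T
    overrides::Dict{Int,T}  # positions that differ from the default
    len::Int
end

Base.size(v::DefaultVector) = (v.len,)
Base.getindex(v::DefaultVector, i::Int) = get(v.overrides, i, v.default)

# a lower-bound vector that is -1000 everywhere except position 3
lb = DefaultVector(-1000.0, Dict(3 => 0.0), 5)

lb[1], lb[3]  # (-1000.0, 0.0)
```

Storage is proportional to the number of overrides rather than the model size, which is the saving the issue asks for.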

Overview issue: documentation structure

The docs are more like a collection of ideas now, we should give it a clear tutorial structure.

My idea for the structure is as follows:

  • Introduction, what does COBREXA solve and what it does not solve
  • Totally primitive example ((down)load 1 model and do 1 FBA)
  • Section with tutorials
    • Model loading and conversion
    • Running FBA, FVA and sampling on small models
    • Running FVA on HPCs
    • Modifying and saving the models (Serialize, "export" through conversion to JSON/MAT and saving.)
  • Section with advanced tutorials
    • Using modifications to scan for many model properties at once (this still needs to be implemented, but it's hopefully 50LOC :D )
    • Using a custom model structure
    • Writing own reconstructions and modifications and running them on HPC
    • Using the extra information in StdModel to screen through many model variants, e.g. knockouts or something (I guess some of the original tutorials may sink in here)
  • Examples (aka backing notebooks)
    • Loading and saving
    • Simple FBA and FVA
    • Sampling and seeing the results
    • Parallel FVA
    • Knockouts
    • Custom models
  • Function documentation (REFERENCE, this should correspond to the structure in src/ as much as possible. I'd still separate it into some roughly consistent functions ordered by kinda bottom-up structure)
    • Types
    • IO
    • Analysis functions
    • Sampling
    • Modifications and reconstruction functions
    • Utils+misc

Feel free to edit/suggest.

add CodeCov

We need to add Codecov and Coveralls to the testing pipeline

Automatic file loading

I still don't like the automatic file loading because it introduces alphabetical-ordering issues with package loading; e.g., I have to prepend "a" to reaction.jl to make sure it loads before cobraModel.jl... I think it solves the slight inconvenience of listing all the files, but introduces a bigger one: having to invent creative (less descriptive) file names...

So far this only seems to affect me, but what do y'all think?

Fix samplers and create good tests for them

Currently the samplers are not super robust and the testing leaves much to be desired.

  1. Fix ACHR
  2. Add better tests
  3. Add projections to ensure robust sampling in case the samplers go out of bounds

`loadModel` is broken

There's no haskey() for MAT files

using MAT
file=matopen("test/data/toyModel1.mat")
haskey(file, "model")
ERROR: MethodError: no method matching haskey(::MAT.MAT_v5.Matlabv5File, ::String)
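Until haskey is supported on the file handle upstream, a possible workaround is to read all variables into a Dict first, since read on a MAT file without a variable name returns a Dict{String,Any}. A sketch, assuming the same test file as in the report:

```julia
using MAT

file = matopen("test/data/toyModel1.mat")
vars = read(file)       # reads all variables into a Dict{String,Any}
haskey(vars, "model")   # haskey works fine on the plain Dict
close(file)
```

This reads the whole file eagerly, so it is only a stopgap for small models.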

Homogenize tests

After #92 the package structure will change somewhat. Let's fix the test layout to match the src directory structure better, and perhaps implement better tests (@stelmo I am looking at you)

SBMLModel should just be CobraModel

SBML.jl imports the dense version of the model on file, we should only have one struct that stores this information (avoid clutter). I think this should be CobraModel since model construction will likely happen in it... ?

Clean-up downloading of test models

  • check if the file exists before downloading
  • always check against a hash and print an error if the hashes do not match, so that we can quickly spot that something fishy happened to the models
  • preferably wrap the Download.download in something that does all this automagically
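The three bullets above can be wrapped in one small helper; a sketch using the SHA and Downloads standard libraries (the function name and signature are hypothetical, not part of COBREXA):

```julia
using SHA, Downloads

# Download `url` to `path` unless it already exists, then verify its
# SHA-256 hash, erroring loudly on mismatch so fishy models get spotted.
function verified_download(url, path, expected_sha256)
    isfile(path) || Downloads.download(url, path)
    actual = bytes2hex(open(sha256, path))
    actual == expected_sha256 ||
        error("hash mismatch for $path: expected $expected_sha256, got $actual")
    return path
end
```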

Implement knockouts in an efficient way

We should definitely have some way of doing knockouts on models. This makes the most sense for StandardModel. Currently the plan is to add a field to StandardModel:

mutable struct StandardModel <: MetabolicModel
    id::String
    reactions::Array{Reaction,1}
    metabolites::Array{Metabolite,1}
    genes::Array{Gene,1}
    gene_reaction::Dict{Gene, Array{Reaction,1}}
end

This should make looking up reactions affected by deletions quicker than looping over all reactions for each gene.

Then a new function,

knockout_modification = knockout(model, gene1, gene2, ..., geneN)

needs to be made that will intelligently create a callback that can be passed to the modifications argument of the analysis functions to actually perform the knockout. WIP
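The lookup idea behind the gene_reaction field can be sketched with plain dictionaries (types simplified to strings, all names hypothetical):

```julia
# gene id -> reactions whose GRR involves that gene
gene_reactions = Dict(
    "g1" => ["r1", "r2"],
    "g2" => ["r2"],
)

# reactions affected by knocking out a set of genes, found by direct
# lookup instead of scanning every reaction for every gene
affected(genes) =
    unique(r for g in genes for r in get(gene_reactions, g, String[]))

affected(["g1", "g2"])  # ["r1", "r2"]
```

A real knockout would additionally evaluate the full gene-reaction rule of each affected reaction before zeroing its bounds.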

Make sure the accessors for MetabolicModel are kinda feature-complete

State:

  • reactions 🆗
  • n_reactions 🆗 (efficient "just tell me how many reaccs you have")
  • metabolites 🆗
  • n_metabolites 🆗
  • genes 🆗
  • n_genes -- this would be great to have done in #101
  • stoichiometry 🆗
  • bounds 🆗
  • objective 🆗
  • balance 🆗
  • gene_associations 🆗 (basically GRRs)
  • metabolite_chemistry 🆗 (basically formulas+charges)
  • metabolite_annotations -- TODO, is there any expected standard to pull out e.g. the standardized identifiers? I'd go for metabolite_identifiers directly tbh. Done in #102 together with the below 2
  • reaction_annotations
  • gene_annotations
  • reaction_subsystem -- TODO
  • metabolite_compartment -- TODO
  • reaction/gene/metabolite_notes -- notes are extremely random, we might postpone them. If there's no conveyable structure for notes, I'd suggest not having them in the generic MetabolicModels at all.
  • reaction/gene/metabolite_name

Consistent model variable naming

It is a bit ugly that the JSON and SBML models contain .m but MATModel contains .mat. Either clean up everything to .m, or use .json and .sbml.

Same for the model names used in function parameters: there's m, model, and a, with occasional excesses.

Community model + tutorial

  • load a heap of whatever models
  • have the model structure re-ID them correctly and add exchange reactions

Implement macros for all functions (where appropriate)

Let's make macros to really make the user interface clean. I have implemented fba macros in #53 (see the last few commits) and @marvinvanaalst has implemented a macro for reaction adding in #59. @exaexa and I were also talking about this extensively on Slack; it would be really cool to have something like this (from Slack, @exaexa):

vs = @variants
  knockout(123)
  knockout(4345)
  ...
end

@mod_variants! vs
  remove_reaction(123)
end

@combine_variants! vs
  no_modification()
  add_random_ATP()
  add_some_toxin()
end

Currently I have an fba mini version of this working as shown below:

using COBREXA
using Tulip

model = read_model(joinpath("e_coli_core.json"))
biomass = findfirst(model.reactions, "BIOMASS_Ecoli_core_w_GAM")
glucose = findfirst(model.reactions, "EX_glc__D_e")

vec = @flux_balance_analysis_vec model Tulip.Optimizer begin
    modify_objective(biomass)
    modify_constraint(glucose, -8.0, -8.0)
end

Some SBML formats (level 3?) are not read correctly

For example:

download("https://www.vmh.life/files/reconstructions/AGORA/1.03/reconstructions/sbml/Abiotrophia_defectiva_ATCC_49176.xml", "testModel.xml");
model = readSBML("testModel.xml");

findall(getOCs(model) .!= 0)
# all zeros
getLBs(model)
# -Inf when should be -1000

Ongoing design considerations

From our discussion today via Slack (@exaexa @laurentheirendt):

  1. LinearModel -> CoreModel
  2. Creation of SBMLModel, MATModel, JSONModel, (maybe YAMLModel?) types to store models read in from those files. Use all fields supported by the various file types.
  3. Analysis functions should work on all model types. Input: model type. Output: numbers etc.
  4. Reconstruction functions will output StandardModel or CoreModel depending on the input type with restrictions on which type of model depending on the purpose of the reconstruction functions
  5. Squashing models: example: CoreModel and SBMLModel can at most output a CoreModel
  6. Reconstruction functions will have as an input StandardModel or CoreModel
  7. accessors rather than deep copies when converting model types

reorganize file contents

Transcript:

Mo  1:03 PM
we should change the file names of "modeling.jl" and "model_manipulations.jl" to something like "manipulate_linearmodel.jl" and "manipulate_fullmodel.jl"
mirek  1:04 PM
or modeling/fullmodel and modeling/linearmodel
would also make it easier to split into doc sections
Mo  1:05 PM
also I think find_exchange_metabolites should be near my exchange_reactions and we should just have one name and dispatch on model type
mirek  1:05 PM
yeah
Mo  1:05 PM
this is of course not for this PR but later
should I make an issue to remind us?
mirek  1:05 PM
same thing for the prettyprinting&misc functions probably, they are now mixed with the model docs
I'll make one

Regroup tests to files that match src/

Lots of tests should be homogenized, e.g. test/io/io_test.jl tests reading and writing of StandardModel and test/io/writer.jl does the same thing but in a different way. I think we can get rid of test/testing_functions.jl by making the testing style more uniform. Will add more comments as I spot things. Also, #64 will change all the file names in src/io and this should be reflected in test/io.

Implement function callbacks to make function modifications uniform and streamlined

@exaexa had a great idea of using callbacks as function arguments to modify the function applied to a model, e.g. applying user-defined bounds on some reactions in FBA. In fba(...), fva(...) and pfba(...) I have a bunch of keyword arguments that are largely repeats and should all be wrapped in some way. This is likely a feature the average user will end up using often.
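The pattern can be sketched with plain functions, independent of any COBREXA internals (every name below is a hypothetical illustration): a "modification" is just a function that mutates the problem, and the analysis function applies the whole list before solving.

```julia
# A modification is a closure that rewrites one piece of the problem;
# here the "problem" is just a Dict of reaction id -> (lb, ub).
change_bound(rxn, lb, ub) = problem -> (problem[rxn] = (lb, ub))

function analyze(problem; modifications = [])
    for mod! in modifications
        mod!(problem)      # apply each modification before solving
    end
    return problem         # a real analysis would optimize here
end

p = Dict("EX_glc" => (-10.0, 0.0))
analyze(p; modifications = [change_bound("EX_glc", -8.0, -8.0)])
p["EX_glc"]  # (-8.0, -8.0)
```

This keeps the analysis functions' signatures small: instead of many repeated keyword arguments, each tweak becomes one composable callback.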
