sysbiochalmers / yeast-gem

The consensus GEM for Saccharomyces cerevisiae

Home Page: http://sysbiochalmers.github.io/yeast-GEM/

License: Creative Commons Attribution 4.0 International

MATLAB 89.64% Python 4.68% Jupyter Notebook 5.68%
genome-scale-models yeast systems-biology hacktoberfest saccharomyces-cerevisiae matlab biology reconstruction consensus python

yeast-gem's Introduction

yeast-GEM: The consensus genome-scale metabolic model of Saccharomyces cerevisiae

Join the chat at https://gitter.im/SysBioChalmers/yeast-GEM

Description

This repository contains the current consensus genome-scale metabolic model of Saccharomyces cerevisiae. It is the continuation of the legacy project yeastnet. For the latest release please click here.

Citation

  • If you use yeast-GEM please cite the yeast9 paper:

    Zhang, C. et al. Yeast9: a consensus yeast metabolic model enables quantitative analysis of cellular metabolism by incorporating big data. bioRxiv (2023) doi:10.1101/2023.12.03.569754

  • For pre-yeast9 versions:

    Lu, H. et al. A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism. Nature Communications 10, 3586 (2019). doi:10.1038/s41467-019-11581-3

  • Additionally, all yeast-GEM releases are archived in Zenodo, so that you can cite the specific version of yeast-GEM used in your study and ensure reproducibility. You should always cite the original publication plus the specific version, for instance:

    The yeast consensus genome-scale model [Lu et al. 2019], version 8.3.4 [Sánchez et al. 2019], was used.

    Find the citation details for your specific version here.

Keywords

Utilisation: experimental data reconstruction; multi-omics integrative analysis; in silico strain design; model template
Field: metabolic-network reconstruction
Type of model: reconstruction; curated
Model source: YeastMetabolicNetwork
Omic source: genomics; metabolomics
Taxonomic name: Saccharomyces cerevisiae
Taxonomy ID: taxonomy:559292
Genome ID: insdc.gca:GCA_000146045.2
Metabolic system: general metabolism
Strain: S288C
Condition: aerobic, glucose-limited, defined media

Model overview

Taxonomy                 | Latest update | Version | Reactions | Metabolites | Genes
Saccharomyces cerevisiae | 04-Dec-2023   | 9.0.0   | 4130      | 2805        | 1162

Gene essentiality prediction

  • Accuracy: 0.882
  • True non-essential genes: 928
  • True essential genes: 63
  • False non-essential genes: 95
  • False essential genes: 38
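The reported accuracy can be reproduced from the four counts above. This is a minimal sketch, assuming accuracy means the fraction of tested genes that were classified correctly; the variable names are illustrative and not taken from the repository.

```python
# Counts copied from the list above.
true_non_essential = 928
true_essential = 63
false_non_essential = 95
false_essential = 38

correct = true_non_essential + true_essential
total = correct + false_non_essential + false_essential

# Assumed definition: accuracy = correctly classified genes / all genes tested.
accuracy = correct / total
print(f"{correct}/{total} = {accuracy:.3f}")
```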

Growth prediction

  • Correlation coefficient R²: 0.880

Growth curve

Installation & usage

Obtain model

You can obtain the model by any of the following methods:

  1. If you have a Git client installed on your computer, you can clone the main branch of the yeast-GEM repository.
  2. You can directly download the latest release as a ZIP file.
  3. If you want to contribute to the development of yeast-GEM (see below), it is best to fork the yeast-GEM repository to your own GitHub account.

Required software

Basic user

If you want to use the model for your own model simulations, you can use any software that accepts SBML L3V1 FBCv3 formatted model files. This includes any of the following:

Please see the installation instructions for each software package.

Developer

  • MATLAB-based
    If you want to contribute to the development of yeast-GEM, or otherwise want to run any of the provided MATLAB functions, then the following software is required:

  • Python-based
    Contribution via Python (cobrapy) is not yet functional. In essence, if you can retain the same format of the model files, you can still contribute to the development of yeast-GEM. However, you cannot use the MATLAB functions.

    If you want to use any of the provided Python functions, you may create an environment with all requirements:

    pip install -r code/requirements/requirements.txt  # install all dependencies
    touch .env # create a .env file for locating the root

If you want to locally run memote run or memote report history, you should also install git lfs, as results.db (the database that stores all memote results) is tracked with git lfs.

Model usage

Make sure to load/save the model with the corresponding wrapper functions:

  • In MATLAB:
    cd ./code
    model = loadYeastModel(); % loading
    saveYeastModel(model);    % saving
    • If RAVEN is not installed, you can also use COBRA-native functions (readCbModel, writeCbModel), but these model-files cannot be committed back to the GitHub repository.
  • In Python:
    Before opening Python, the following command should (once) be run in the yeast-GEM root folder:
    touch .env # create a .env file for locating the root
    Afterwards, the model can be loaded in Python with:
    import code.io as io
    model = io.read_yeast_model() # loading
    io.write_yeast_model(model)   # saving

Online visualization/simulation

  • You can visualize selected pathways of yeast-GEM and perform online constraint-based simulations using Caffeine, by creating a simulation with the latest yeast-GEM version available, and choosing any S. cerevisiae map (currently only iMM904 maps are available). Learn more about Caffeine.
  • Additionally, you can interactively navigate model components and visualize 3D representations of all compartments and subsystems of yeast-GEM at Metabolic Atlas. Learn more about Metabolic Atlas.

Contributing

Contributions are always welcome! Please read the contributions guideline to get started.

Contributors

Code contributors are reported automatically by GitHub under Contributors, while other contributions come in as Issues.

yeast-gem's People

Contributors

ae-tafur, benjasanchez, cheng-yu-zhang, demilappa, dependabot[bot], edkerk, eiden309, feiranl, gitter-badger, hongzhonglu, mihai-sysbio, snmendoz, tpfau, wtscott31

yeast-gem's Issues

fix: drop storage of binaries

We might need to rethink storing the binary .mat file, at least in devel and the feature branches. If 2 projects are working in parallel, this will immediately create conflicts in that file, so changes from one PR would be lost as soon as a PR from the other branch merges. If we want to still offer the .mat version we could have it only in master, being created for each release in the update version commit.

@demilappa @mihai-sysbio @simas232 @edkerk ??

fix: flux prediction through PPP

Recently I helped Liming with some flux analysis and found that the present yeast model can't predict the flux through the PPP, while the model corrected by Rui predicts it well. So some correction of the present yeast7.8.1 is essential.
@BenjaSanchez Have you met such an issue during flux analysis?

duplicated reactions

Description of the issue:

@BenjaSanchez I found 5 duplicated reactions. They are the same but written in the opposite direction.

  1. 'r_2115' 'r_0163'
    '1 s_0359[c] + 1 s_0794[c] + 1 s_1203[c] -> 1 s_0680[c] + 1 s_1198[c]'
    '1 s_0680[c] + 1 s_1198[c] -> 1 s_0359[c] + 1 s_0794[c] + 1 s_1203[c]'

  2. 'r_0919' 'r_0342'
    '1 s_1085[er] + 1 s_1366[er] -> 1 s_0481[er]'
    '1 s_0481[er] -> 1 s_1085[er] + 1 s_1366[er]'

  3. 'r_0920' 'r_0343'
    '1 s_0507[er] + 1 s_1366[er] -> 1 s_0484[er]'
    '1 s_0484[er] -> 1 s_0507[er] + 1 s_1366[er]'

  4. 'r_1760' 'r_1148'
    '1 s_0666[c] <=> 1 s_0665[ce]'
    '1 s_0665[ce] <=> 1 s_0666[c]'

  5. 'r_4566' 'r_4232'
    '1 s_0340[c] + 1 s_0394[c] + 1 s_0794[c] -> 1 s_0434[c] + 1 s_3875[c]'
    '1 s_0434[c] + 1 s_3875[c] -> 1 s_0340[c] + 1 s_0394[c] + 1 s_0794[c]'

I would suggest removing one reaction of each duplicated pair and making the reaction that is kept reversible.
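A check of this kind can be sketched as follows. This is only an illustration, assuming equations are written in the text form shown above and that each ID pairs with the equation in the order listed; the helper names and the unrelated reaction `r_0001` are hypothetical.

```python
from itertools import combinations


def parse_side(side):
    """Turn '1 s_0359[c] + 1 s_0794[c]' into a frozenset of (metabolite, coefficient)."""
    terms = []
    for term in side.split(" + "):
        coeff, met = term.split()
        terms.append((met, float(coeff)))
    return frozenset(terms)


def parse_equation(equation):
    """Return the (left, right) sides of an equation written with '->' or '<=>'."""
    arrow = "<=>" if "<=>" in equation else "->"
    left, right = equation.split(f" {arrow} ")
    return parse_side(left), parse_side(right)


def reversed_duplicates(reactions):
    """Yield ID pairs whose equations are mirror images of each other."""
    parsed = {rid: parse_equation(eq) for rid, eq in reactions.items()}
    for (id_a, (left_a, right_a)), (id_b, (left_b, right_b)) in combinations(parsed.items(), 2):
        if left_a == right_b and right_a == left_b:
            yield id_a, id_b


reactions = {
    "r_0919": "1 s_1085[er] + 1 s_1366[er] -> 1 s_0481[er]",
    "r_0342": "1 s_0481[er] -> 1 s_1085[er] + 1 s_1366[er]",
    "r_1760": "1 s_0666[c] <=> 1 s_0665[ce]",
    "r_1148": "1 s_0665[ce] <=> 1 s_0666[c]",
    "r_0001": "1 s_0001[c] -> 1 s_0002[c]",  # hypothetical unrelated reaction
}
pairs = list(reversed_duplicates(reactions))
print(pairs)
```

Comparing sides as sets makes the check independent of the order in which metabolites are written, which a plain string comparison would not be.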

This new issue is related to issue #172. It is important to avoid duplicated reactions because doing so would facilitate the translation of the model to BIGG identifiers.

Is there a particular reason to keep these duplicated reactions? If there is, I can handle them for the translation to BIGG identifiers (#172). Please tell me what you think or decide.

If you agree, please assign this task to me

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

feat: biomass pseudo-reaction requirements

The current yeastGEM lacks some specific cofactors and metal ions. These need to be added as model coverage improves and hundreds of new reactions are added to yeastGEM.

fix: gene ids to change in COBRA

Older COBRA versions replaced geneIDs with numbered ids (g1, g2 etc.). However fbc:id in SBML and .genes in COBRA and RAVEN should contain the systematic gene names (YAB123C), while fbc:label in SBML and .gene(Short)Names in COBRA and RAVEN should contain abbreviations such as HXK1 etc.

The newest COBRA shouldn't do this replacement anymore. @BenjaSanchez Can you please export the model with the newest COBRA version and check that the field <fbc:geneProduct fbc:id= at the end of the SBML file no longer contains the numbered geneIDs?
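The requested check can be sketched as below. It is a rough illustration, assuming the old numbered IDs look like `g1`, `g2` as described above; the XML snippet, the namespace URI version, and the function name are assumptions made for the example, not content of the actual model file.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace for the snippet only; the matching below ignores the namespace version.
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"

# Hypothetical fragment standing in for the end of the SBML file.
SBML_SNIPPET = f"""
<listOfGeneProducts xmlns:fbc="{FBC_NS}">
  <fbc:geneProduct fbc:id="YAB123C" fbc:label="HXK1"/>
  <fbc:geneProduct fbc:id="g2" fbc:label="HXK2"/>
</listOfGeneProducts>
"""

NUMBERED = re.compile(r"g\d+")  # assumed shape of the old numbered gene IDs


def numbered_gene_ids(sbml_text):
    """Return every geneProduct id that still looks like a numbered placeholder."""
    root = ET.fromstring(sbml_text.strip())
    hits = []
    for elem in root.iter():
        if elem.tag.endswith("}geneProduct"):
            for key, value in elem.attrib.items():
                if key.endswith("}id") and NUMBERED.fullmatch(value):
                    hits.append(value)
    return hits


print(numbered_gene_ids(SBML_SNIPPET))
```

An empty list would indicate that no numbered IDs remain in the inspected text.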

feat: indicate origin of mets and rxns added in recent updates

Following what we did in PR #156, we should state in metNotes or rxnNotes why the corresponding mets/rxns were added in the following past updates to the model:

  • SLIMEr update: For mets: "pseudo-metabolite part of the SLIMEr formalism (PR #112)". For rxns: "pseudo-reaction part of the SLIMEr formalism (PR #112) - units in mg/gDWh"
  • New annotation update: For both mets and rxns indicate "added after new annotation (PR #142)"
  • Biolog update: For both mets and rxns indicate "added after the Biolog update (PR #149)"

For consistency, we should also update the notes added in #156:

  • Metabolomics update: For mets: "added from metabolomics data (PR #156)". For rxns: "metabolites observed in metabolomics data (PR #156)"

I hereby confirm that I have:

  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

feat: track COBRA/RAVEN versions

We have started tracking some dependencies, so now if a commit in a feature branch creates several changes we can easily detect if this is due to different dependencies in someone's setup. However I couldn't find a way of tracking the COBRA version, as they are not properly releasing their stable versions nor keeping track of it in their repo. So as long as they don't address this it's not possible to track the COBRA version. Maybe the latest commit in the master branch? @demilappa @mihai-sysbio input here would be appreciated. @edkerk @simas232 @Hao-Chalmers how is RAVEN handling this? Do you have some file in the repo that states the version number? As example see SBML toolbox, which makes it easy for us to keep track of the version.

@feiranl @hongzhonglu I have only tested the tracking of dependencies on a Windows platform. Could you help me test the devel branch on your own computers and let me know if the file dependencies.txt gets properly created?

fix: metCharges field

The original Yeast 7.6 model has charge values only for metabolites for which such data is available. In this repository we falsely assumed a neutral charge for metabolites with missing charge information. As FBC does not force charge values to be defined for all metabolites, our changes should be reverted to how it was in the original Yeast 7.6.

The only thing that needs to be done here is to change the charge values of the relevant metabolites to NaN in the Matlab model and then generate new xml and txt files.

compatibility of identifiers with BIGG database

Description of the issue:

Dear all,

I would like to use yeast-GEM as a template for reconstructing other yeast metabolic networks. For example, I could use RAVEN, AuReMe or MetaDraft to create draft models for my yeast from yeast-GEM.
The problem is that if I wanted to use other templates in addition to yeast-GEM (for example, models from the BIGG database), the automatic merge of the output draft networks (one from each template) wouldn't be possible because the identifiers are incompatible.

For me, it would be of great help if the model would be written with BIGG identifiers but I can imagine a series of problems:

  1. It would be time-consuming to change all the identifiers
  2. It could be possible that some metabolites in the model have different charges with regard to the ones in the BIGG database, so strictly, they are not the same because they would have a different chemical formula.
  3. It could be possible that the one-to-one metabolite mapping would not be possible because sometimes two metabolites (alpha-D-glucose, beta-D-glucose) are represented as one in other databases (D-glucose)

@BenjaSanchez already mentioned two additional problems:

  • Backward compatibility: People are already using the model as it is, so new identifiers would mean that their current code won't work anymore.
    Possible solution: I could write a script with a dictionary, so people could convert one format to the other.
  • Missing IDs: What should we do with identifiers that are not in the BIGG database?
    Possible solution: create new identifiers in a systematic and automatic way.

What do you think? I already have some scripts to change identifiers, so I think it wouldn't be so problematic for me to make this change. If I change the identifiers, would it be worthwhile for other people too?

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

style: repo cleanup/order

  • move all data to ComplementaryData
  • drop all .mat files
  • change .gitignore to prevent undesired files
  • on master, allow certain files (.mat & .xlsx)

fix: basic information tests from memote

Description of the issue:

From the latest memote report, the following basic information tests are giving unexpected results:

  • Only 1 unique metabolite: discussed in #106 (comment)
  • 684 reactions appear as duplicated (due to repeated annotation ids)
  • Almost all rxns are detected as transport
  • NGAM is not detected (the test errors)

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

abbreviation of metabolite name for yeast map

@edkerk @BenjaSanchez @feiranl @zhengmingzhu
As we are drawing the yeast map, we find that some metabolite names are so long that they are not convenient to display, so it would be better to use short names. The following are some examples:
3-(3-sn-phosphatidyl)-sn-glycerol 1-phosphate (1-16:0, 2-16:1)[mm]
3-(3-sn-phosphatidyl)-sn-glycerol 1-phosphate (1-16:1, 2-16:1)[mm]
3-(3-sn-phosphatidyl)-sn-glycerol 1-phosphate (1-18:0, 2-16:1)[mm]
3-(3-sn-phosphatidyl)-sn-glycerol 1-phosphate (1-18:1, 2-16:1)[mm]
3-(3-sn-phosphatidyl)-sn-glycerol 1-phosphate (1-16:0, 2-18:1)[mm]
3-(3-sn-phosphatidyl)-sn-glycerol 1-phosphate (1-16:1, 2-18:1)[mm]

Do you have any suggestion about this issue?

fix: a compartment error in one reaction

Description of the issue:

  • r_4237 is the same as r_4238; only the compartment of H2O differs.

Expected feature/value/output:

  • r_4238 should be a reaction which transports Ca(2+) into Golgi.

Current feature/value/output:

ID | Name | Equation | EC number | Gene
r_4237 | Calcium-transporting ATPase 2 (EC 3.6.3.8) (Vacuolar Ca(2+)-ATPase) | H2O[v] + Ca(2+)[c] + ATP[v] => H+[v] + phosphate[v] + Ca(2+)[v] + ADP[v] | 3.6.3.8 | YGL006W
r_4238 | Calcium-transporting ATPase 2 (EC 3.6.3.8) (Vacuolar Ca(2+)-ATPase) | H2O[c] + Ca(2+)[c] + ATP[v] => H+[v] + phosphate[v] + Ca(2+)[v] + ADP[v] | 3.6.3.8 | YGL167C

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

fix: delete anaerobic branch

Considering that:

  • Now that there is a lot of active development of the model, it is taking too much time (and too many pull requests) to keep the anaerobic branch of the model up to date as well.
  • The anaerobic branch differs from the main model by only a couple of changes.
  • For now we don't have any intentions of further improving anaerobic metabolism predictions.

I'm wondering if it is worth it to keep the anaerobic branch alive. I think the best is to delete this branch and just keep the anaerobicModel.m file available for anyone that wants to try those conditions. Any thoughts on this @demilappa @shaqHosseini @edkerk @hongzhonglu ?

feat: confidence scores for reactions

Based on the following confidence scoring scheme, we can add confidence scores to the reactions in the yeast model.

Evidence type | Confidence score | Examples
Biochemical data | 4 | Direct evidence for gene product function and biochemical reaction: Protein purification, biochemical assays, experimentally solved protein structures, and comparative gene-expression studies.
Genetic data | 3 | Direct and indirect evidence for gene function: Knock-out characterization, knock-in characterization, and over-expression.
Physiological data | 2 | Indirect evidence for biochemical reactions based on physiological data: secretion products or defined medium components serve as evidence for transport and metabolic reactions.
Sequence data | 2 | Evidence for gene function: Genome annotation, SEED annotation.
Modeling data | 1 | No evidence is available but reaction is required for modeling. The included function is a hypothesis and needs experimental verification. The reaction mechanism may be different from the included reaction(s).
Not evaluated | 0 | -

With confidence scores in place, it would be clearer how to improve the model quality consistently. What do you think? @ALL.

fix: add rxnGeneMat to COBRA structure

@BenjaSanchez @edkerk @feiranl
When we run the essential gene analysis, we get the following error, so maybe we should add the field 'rxnGeneMat'. The same analysis works for the original yeast7.6.

Have you met the same issue?

grRatio = singleGeneDeletion(model)
Single gene deletion analysis in progress ...
0% [ ]Reference to non-existent field 'rxnGeneMat'.

Error in deleteModelGenes (line 71)
rxnInd = find(any(model.rxnGeneMat(:,geneInd),2));

Error in singleGeneDeletion (line 130)
[modelDel,hasEffect(i),constrRxnNames] =
deleteModelGenes(model,geneList{i});

ATP/NADH issue

In our model, 1 NADH is equivalent to 1 NADPH, and 1 NADH can produce 1.266 ATP through the respiratory chain. However, with one glucose as input, the model produces 12 NADH but 20.5253 ATP when the NGAM reaction is used as the objective, because hydrogen[c] serves as the proton pump in the model. Do you think this is a problem we should fix?

fix: subSystems field

Currently the fields rxnECNumbers & subSystems are retrieved from KEGG & Swissprot, using an automatic script. However, this leads to many cases in which there is more than one match, undesired in the case of subsystems. At some previous version of the Yeast model, this information was indeed present, but someone deleted those fields. @hongzhonglu could you look further into this? The desired data should be available in the original sourceForge repository

fix: add missing annotation

The following information was lost since original Yeast 7.6

  • for reactions: pubmed
  • for genes: kegg.genes, sgd, uniprot

restore 'boundary' compartment

The model in this repository lacks 'boundary' compartment, unlike original Yeast 7.6.

It seems that the COBRA toolbox I/O functions are not suitable for curating models that contain a 'boundary' compartment, as this compartment is always lost during an I/O cycle. The corresponding functions from the RAVEN toolbox are useful for this purpose, but this would require an update to the GEMs SOP.

feat: yaml as format for better diff

Can we move from using .txt to .yaml to diff changes between models? Then we can really check all changes in the XML models, instead of just changes related to stoichiometry, gene association, metabolite identifiers etc.

Not sure what the best implementation would be:

  1. Using a hook makes it easy, don't need to make yaml on own computer. However, as I understand this, the yaml is only made after the commit?
  2. Generate yaml on own computer and include in commit. This way changes are already visible in commit stage, but not sure how easy it is to implement xml->yaml on own computer.

restore the list of Yeast 7.6 genes

The original Yeast 7.6 model contains 988 genes. This number is higher than in this repository because the original model contained gene compartment information. So these 988 genes are the same 909 genes as in this repository; in the original Yeast 7.6, several of them are simply included in several compartments. As readCbModel omits gene compartments, it merges such overlapping genes into a non-redundant gene set and adjusts the gene-reaction associations accordingly.

The main question is whether we should retain gene compartment information. Such knowledge doesn't really contribute to growth simulations, but may be useful in more specific, compartment-wide computational approaches. If the answer is 'yes', then we can use the original gene IDs as they were in Yeast 7.6, e.g. e_0765. We would also need to discuss how gene compartments should be exported to SBML, as a designated field for them is not included in SBML level 3. Here are several ideas:
a) if sticking with the COBRA toolbox, we can add compartment information to the end of the gene ID, the same way the SBML L2 COBRA format did, e.g. e_0765_c or e_0765[c].
b) if using the I/O functions of the RAVEN toolbox (plus a changed GEMs SOP), we can quickly tailor the I/O functions, as it is currently not possible to export gene compartments when exporting to SBML L3.

fix: remove genes not found in S288C annotation

Description of the issue:

I was contacted by @snmendoz who pointed out that 2 of the current genes in the model cannot be found in the annotation of strain S288C, according to NCBI: YAR069W-A and YHR214W-F. @hongzhonglu do you see these genes in any of the newly added annotations? If so, are the annotations of those genes publicly available somewhere?

I hereby confirm that I have:

  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

feat: add gene annotation for transport reactions

There are currently 1029 transport reactions in total, but only 174 of them have a gene association. It is essential to add gene associations for the transport reactions that lack one.

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

feat: data files in repository

@ALL, I want to add a dataSource file to our yeast model repository, so that the data used for model updates can be stored and reused. Do you have any suggestions about the dataSource file?

fix: some SBO tests failing in memote

Description of the issue:

From the latest memote report, the following SBO tests are still causing problems:

  • Reactions:
    • Wrong scores for metabolic and transport reactions, as most reactions are being detected as transport. Discussed in #106 (comment)
    • No sink reactions detected, even though there are 3.
  • Genes: Currently there's no option for saving them with cobratoolbox.

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

fix: metabolite annotation information

@edkerk @BenjaSanchez, Dear both, we have recently checked the metabolite annotation information further. So far, about 65 old chebiIDs from yeast7.7 have been corrected. Now a total of 906 metabolites have a specific chebiID (out of 1059 unique metabolites in the model; 780 metabolites had a chebiID in the original yeast7.7). However, there are still 153 metabolites for which we could not find a chebiID. For part of them a general chebiID exists, so I think we can add the general chebiID for these. What do you think of this idea?

For example, the following metabolites derive from 1-monolysocardiolipin (CHEBI:65106):
monolysocardiolipin (2-16:1, 3-16:0, 4-16:1)
monolysocardiolipin (2-16:1, 3-16:1, 4-16:1)
monolysocardiolipin (2-16:1, 3-18:0, 4-16:1)
monolysocardiolipin (2-16:1, 3-18:1, 4-16:1)
monolysocardiolipin (2-16:1, 3-16:0, 4-18:1)
monolysocardiolipin (2-16:1, 3-16:1, 4-18:1)
monolysocardiolipin (2-18:1, 3-16:0, 4-16:1)
monolysocardiolipin (2-18:1, 3-16:1, 4-16:1)
monolysocardiolipin (2-18:1, 3-18:0, 4-16:1)
monolysocardiolipin (2-18:1, 3-18:1, 4-16:1)
monolysocardiolipin (2-18:1, 3-16:0, 4-18:1)
monolysocardiolipin (2-18:1, 3-16:1, 4-18:1)
monolysocardiolipin (1-16:0, 2-16:1, 4-16:1)
monolysocardiolipin (1-16:0, 2-16:1, 4-18:1)
monolysocardiolipin (1-16:1, 2-16:1, 4-16:1)
monolysocardiolipin (1-16:1, 2-16:1, 4-18:1)
monolysocardiolipin (1-18:0, 2-16:1, 4-16:1)
monolysocardiolipin (1-18:0, 2-16:1, 4-18:1)
monolysocardiolipin (1-18:1, 2-16:1, 4-16:1)
monolysocardiolipin (1-18:1, 2-16:1, 4-18:1)
monolysocardiolipin (1-16:0, 2-18:1, 4-16:1)
monolysocardiolipin (1-16:0, 2-18:1, 4-18:1)
monolysocardiolipin (1-16:1, 2-18:1, 4-16:1)
monolysocardiolipin (1-16:1, 2-18:1, 4-18:1)
So we can add the general chebiID 65106 for this series of metabolites.

fix: inter-OS issues

Description of the issue:

When switching between Mac and Windows machines to save the model, stoichiometric coefficients that use scientific notation are stored differently in the .xml file: Mac stores them as, e.g., 6e-05, but Windows stores them as 6e-005. This creates unnecessary changes in the .xml file when diffing (example here), even though the .txt file (and now also the .yml file) remains the same. We could adapt saveYeastModel() to fix this after calling writeCbModel(), but maybe there is a better way... @tpfau do you know if this is due to COBRA toolbox or libSBML? As the .txt file remains the same I would think it's due to the latter. Would there be in either case an easy fix?

Example:

Trying an IO cycle in Windows and MAC:

model = loadYeastModel;
saveYeastModel(model);
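One possible post-processing step is to normalise the three-digit exponents after the file is written. The sketch below is only an illustration of that idea, not the fix used in the repository; it assumes the coefficients appear in a `stoichiometry="..."` attribute, and the sample line and species ID are hypothetical.

```python
import re

# Match a stoichiometry attribute whose exponent is padded with a leading zero
# to three digits (e.g. 6e-005) and capture everything except that zero.
PADDED_EXPONENT = re.compile(r'(stoichiometry="[0-9.]+[eE][+-]?)0(\d{2}")')


def normalize_exponents(xml_text):
    """Rewrite exponents such as e-005 to e-05 inside stoichiometry attributes only."""
    return PADDED_EXPONENT.sub(r"\1\2", xml_text)


windows_line = '<speciesReference species="M_s_0001" stoichiometry="6e-005" constant="true"/>'
mac_line = '<speciesReference species="M_s_0001" stoichiometry="6e-05" constant="true"/>'

print(normalize_exponents(windows_line))
print(normalize_exponents(mac_line) == mac_line)
```

Restricting the pattern to the attribute avoids touching identifiers or notes that happen to contain a similar character sequence.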

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

fix: stoichiometric consistency due to SLIME rxns

Description of the issue:

In the consistency tests from the memote report, the "Stoichiometric Consistency" dropped significantly after adding SLIME reactions, as these reactions are by definition mass unbalanced. We should find a way of fixing this, by e.g. modifying the chemical formulas of the backbone and/or chain pseudo-metabolites.

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

fix: compatibility with memote

Description of the issue:

We would like to include memote as provider of QC, as we are about to include many new genes/reactions/mets, and it would be best to see how the model changes as we include them. In order to do that we need to:

  • Solve issue #19 (a requirement for memote is to have an sbml-compliant file)
  • As the current report has several problems, the model should also be fixed in order to yield proper results. This includes:
    • Scored tests: No SBO terms detected for metabolites/reactions/genes. This is probably leading to some of the other problems, so should be fixed first.
    • Unscored tests:
      • Basic information:
        • Only 1 unique metabolite
        • Several errored scores due to absence of SBO terms
        • NGAM is not detected
        • Model identifier is "COBRAmodel", should be "yeastGEM"
      • Biomass: Need to specify with SBO term

Please comment here if anyone finds more problems using memote @hongzhonglu @feiranl @edkerk @ChristianLieven @Midnighter

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Checked that a similar issue does not exist already

fix: metabolite id compliance with cobrapy

Description of the issue:

Following up on this tweet, a small annoyance for cobrapy users of the .xml file is the metabolite id format metID[comp] from COBRA toolbox, as the characters [] are stored in an odd way for SBML compatibility, and then read literally by cobrapy.

Example:

  • Original metabolite ID: s_0001[c]
  • Stored in .xml file/opened by cobrapy: s_0001__91__c__93__
  • How it should be opened by cobrapy: s_0001_c
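The escaped form in the example can be decoded mechanically, since 91 and 93 are the character codes of `[` and `]`. A minimal sketch, assuming every escape has the `__NN__` shape shown above; both function names are hypothetical and not part of cobrapy or COBRA toolbox.

```python
import re


def decode_sbml_id(escaped):
    """Replace every __NN__ escape with the character whose code point is NN."""
    return re.sub(r"__(\d+)__", lambda m: chr(int(m.group(1))), escaped)


def to_underscore_style(met_id):
    """Rewrite metID[comp] as metID_comp."""
    return re.sub(r"\[(\w+)\]$", r"_\1", met_id)


escaped = "s_0001__91__c__93__"
original = decode_sbml_id(escaped)
print(original, to_underscore_style(original))
```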

@tpfau are there any plans to make the resulting .xml file compatible with cobrapy in this sense? Is there any other way of storing the compartment information in the metabolite id other than with [] characters?

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

fix: drop SBML toolbox dependency

COBRA stopped requiring SBML toolbox as a dependency, so we should look into dropping the dependency (this includes correcting saveYeastModel.m and updating the readme file).

fix: biomass tests from memote

Description of the issue:

From the latest memote report, the following biomass tests are giving unexpected results:

  • Biomass consistency = 0
  • GAM is not detected (the test errors) (updated: is detected)
  • Unrealistic Growth Rate In Default Condition = true (updated: false)
  • Ratio of Direct Metabolites in Biomass Reaction = 0.7 (updated: 0.14)
  • Number of Missing Essential Biomass Precursors = 37 (updated: 30)

Many of these failures may result from our biomass pseudoreaction not being properly detected as "clustered", i.e. separated into protein pseudoreaction, lipid pseudoreaction, etc.

More relevant info at opencobra/memote#243

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

fix: lipid biomass composition

The expansive description of lipid metabolism is confusing, so I might be wrong in the following. To my understanding, the model doesn't specify a distribution of different FA chain lengths, but demands all FA chains in equal amounts (for the majority of lipid metabolism; sterol esters seem to be specific):

There are all these individual reactions that build up for instance TAG(16:0, 18:1, 18:1).

oleoyl-CoA[erm] + diglyceride (1-16:0, 2-18:1)[erm] => coenzyme A[erm] + triglyceride (1-16:0, 2-18:1, 3-18:1)[erm]

They use specific acyl-CoAs (so no pooled pseudometabolite), and nowhere in those reactions is there any specification of how abundant each fatty acid is. With that in mind, it would be cheapest to make TAG(16:0, 16:1, 16:0), and this is what I actually see when I run FBA and minimize the number of fluxes.

The model has so-called 'ISA' reactions that 'convert' FA-chain-specific TAG species into a generic TAG species:

triglyceride (1-16:0, 2-18:1, 3-18:1)[erm] => 0.67901 triglyceride[erm]

I don't understand what these coefficients mean, but they seem to be connected to the chain length (e.g. triglyceride (1-18:0, 2-18:1, 3-16:0) gets the same coefficient, even though it has a different number of saturations).

These generic lipid species are then used in the lipid pseudoreaction:

[...] + 0.000206 fatty acid[c] + [...] + 0.000781 triglyceride[c] + 1.5e-05 zymosterol[c] => lipid[c]

So nowhere along that path is there any specification of distribution of FA chain lengths & saturation, all TAG species are as likely to be made, with some correction for the amount of carbons (but not hydrogens, as the two species mentioned above with similar coefficients do have different molecular weights).

The ISA reactions for fatty acids have no influence in this for two reasons:

palmitate[c] => 0.61538 fatty acid[c]

1: the coefficients are again just representing the number of carbons, not any measured abundance
2: fatty acid[c] is only used in the lipid pseudoreaction to represent free fatty acids, it is not used to be incorporated in any other lipid species.
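
The carbon-count reading can be checked against the two coefficients quoted in this issue. The sketch below assumes that each coefficient is the carbon count of the species divided by that of the same lipid class built from C26 chains. This is a guess that the source does not confirm, although it reproduces both quoted numbers.

```python
REFERENCE_CHAIN = 26  # assumption: coefficients are scaled to a C26 acyl chain


def carbons_tag(chains):
    """Carbon count of a triglyceride: three acyl chains plus the C3 glycerol backbone."""
    return sum(chains) + 3


# palmitate (16:0) relative to a C26 fatty acid
palmitate = 16 / REFERENCE_CHAIN

# TAG (1-16:0, 2-18:1, 3-18:1) and TAG (1-18:0, 2-18:1, 3-16:0) relative to a tri-C26 TAG
reference_tag = carbons_tag([REFERENCE_CHAIN] * 3)
tag_a = carbons_tag([16, 18, 18]) / reference_tag
tag_b = carbons_tag([18, 18, 16]) / reference_tag

print(round(palmitate, 5))  # 0.61538
print(round(tag_a, 5))      # 0.67901
print(tag_a == tag_b)       # True: saturation does not enter the ratio
```

If this reading is right, the ISA coefficients carry no information about measured chain-length abundance, which is the point made above.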

fix: incorrect metabolite annotation

Description of the issue:

There are multiple metabolites with incorrect annotations (CHEBI, KEGG).

Expected feature/value/output:

s_0511, s_0512 and s_0513 are all choline in different compartments, correctly annotated with CHEBI:15354 and KEGG C00114.

In addition, s_2807 is also annotated with those two CHEBI and KEGG IDs, even though it's (S)-3-hydroxyhexacosanoyl-CoA. Meanwhile, s_0045, which is the same compound but located in the peroxisome instead of the ER membrane, is correctly annotated with CHEBI:52976.

Current feature/value/output:

The RAVEN function checkModelStruct indicates that the following annotations are repeated for metabolites with different names:

WARNING: The following MIRIAM strings are associated to more than one unique metabolite name:
	chebi/CHEBI:15354
	chebi/CHEBI:15377
	chebi/CHEBI:15428
	chebi/CHEBI:15471
	chebi/CHEBI:15676
	chebi/CHEBI:15698
	chebi/CHEBI:15699
	chebi/CHEBI:15740
	chebi/CHEBI:15837
	chebi/CHEBI:15873
	chebi/CHEBI:15891
	chebi/CHEBI:16004
	chebi/CHEBI:16024
	chebi/CHEBI:16182
	chebi/CHEBI:16235
	chebi/CHEBI:16335
	chebi/CHEBI:16347
	chebi/CHEBI:16450
	chebi/CHEBI:16638
	chebi/CHEBI:16643
	chebi/CHEBI:16708
	chebi/CHEBI:16750
	chebi/CHEBI:16810
	chebi/CHEBI:16865
	chebi/CHEBI:16933
	chebi/CHEBI:16947
	chebi/CHEBI:16977
	chebi/CHEBI:16988
	chebi/CHEBI:17038
	chebi/CHEBI:17071
	chebi/CHEBI:17108
	chebi/CHEBI:17115
	chebi/CHEBI:17191
	chebi/CHEBI:17203
	chebi/CHEBI:17295
	chebi/CHEBI:17368
	chebi/CHEBI:17536
	chebi/CHEBI:17549
	chebi/CHEBI:17562
	chebi/CHEBI:17596
	chebi/CHEBI:17600
	chebi/CHEBI:17754
	chebi/CHEBI:17836
	chebi/CHEBI:17924
	chebi/CHEBI:18050
	chebi/CHEBI:24636
	chebi/CHEBI:27689
	chebi/CHEBI:27750
	chebi/CHEBI:27989
	chebi/CHEBI:28789
	chebi/CHEBI:28938
	chebi/CHEBI:29033
	chebi/CHEBI:29806
	chebi/CHEBI:29985
	chebi/CHEBI:30849
	chebi/CHEBI:31725
	chebi/CHEBI:32682
	chebi/CHEBI:36655
	chebi/CHEBI:4167
	chebi/CHEBI:48943
	chebi/CHEBI:48945
	chebi/CHEBI:49000
	chebi/CHEBI:50569
	chebi/CHEBI:50585
	chebi/CHEBI:57457
	chebi/CHEBI:57925
	chebi/CHEBI:58210
	chebi/CHEBI:58297
	chebi/CHEBI:58343
	chebi/CHEBI:62014
	chebi/CHEBI:62501
	chebi/CHEBI:70712
	chebi/CHEBI:72001
	chebi/CHEBI:75074
	chebi/CHEBI:87781
	chebi/CHEBI:88008
	chebi/CHEBI:88980
	chebi/CHEBI:88984
	chebi/CHEBI:89019
	chebi/CHEBI:89763
	chebi/CHEBI:89765
	chebi/CHEBI:89959
	chebi/CHEBI:90051
	kegg.compound/C00001
	kegg.compound/C00025
	kegg.compound/C00026
	kegg.compound/C00031
	kegg.compound/C00037
	kegg.compound/C00041
	kegg.compound/C00048
	kegg.compound/C00051
	kegg.compound/C00054
	kegg.compound/C00058
	kegg.compound/C00061
	kegg.compound/C00062
	kegg.compound/C00064
	kegg.compound/C00065
	kegg.compound/C00073
	kegg.compound/C00079
	kegg.compound/C00080
	kegg.compound/C00114
	kegg.compound/C00116
	kegg.compound/C00121
	kegg.compound/C00122
	kegg.compound/C00127
	kegg.compound/C00147
	kegg.compound/C00148
	kegg.compound/C00158
	kegg.compound/C00159
	kegg.compound/C00212
	kegg.compound/C00216
	kegg.compound/C00242
	kegg.compound/C00245
	kegg.compound/C00256
	kegg.compound/C00259
	kegg.compound/C00262
	kegg.compound/C00263
	kegg.compound/C00266
	kegg.compound/C00294
	kegg.compound/C00318
	kegg.compound/C00334
	kegg.compound/C00352
	kegg.compound/C00387
	kegg.compound/C00407
	kegg.compound/C00430
	kegg.compound/C00475
	kegg.compound/C00499
	kegg.compound/C00504
	kegg.compound/C00517
	kegg.compound/C00526
	kegg.compound/C00568
	kegg.compound/C00794
	kegg.compound/C00849
	kegg.compound/C00881
	kegg.compound/C01342
	kegg.compound/C01551
	kegg.compound/C01571
	kegg.compound/C01694
	kegg.compound/C01722
	kegg.compound/C02223
	kegg.compound/C02504
	kegg.compound/C02944
	kegg.compound/C03221
	kegg.compound/C03479
	kegg.compound/C04525
	kegg.compound/C05272
	kegg.compound/C05275
	kegg.compound/C05853
	kegg.compound/C07328
	kegg.compound/C07329
	kegg.compound/C12296
	kegg.compound/C14818

Reproducing these results:

checkModelStruct(model,true,false)

(note that the current checkModelStruct version has a bug that limits the output to the first 10 mistakes)
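
For Python users, the same kind of check can be sketched without RAVEN. The code below is not RAVEN's implementation. It groups metabolite names by annotation string and reports annotations shared by more than one unique name. The records are a toy list built from the IDs mentioned in this issue, and the dictionary layout is an assumption made for illustration.

```python
from collections import defaultdict


def shared_annotations(mets):
    """Map each annotation to the names using it, keeping only those with more than one name."""
    names_by_annotation = defaultdict(set)
    for met in mets:
        for annotation in met["annotations"]:
            names_by_annotation[annotation].add(met["name"])
    return {a: names for a, names in names_by_annotation.items() if len(names) > 1}


# Toy records (layout is hypothetical; IDs and annotations as described in this issue)
mets = [
    {"id": "s_0511", "name": "choline",
     "annotations": ["chebi/CHEBI:15354", "kegg.compound/C00114"]},
    {"id": "s_0512", "name": "choline",
     "annotations": ["chebi/CHEBI:15354", "kegg.compound/C00114"]},
    {"id": "s_2807", "name": "(S)-3-hydroxyhexacosanoyl-CoA",
     "annotations": ["chebi/CHEBI:15354", "kegg.compound/C00114"]},
    {"id": "s_0045", "name": "(S)-3-hydroxyhexacosanoyl-CoA",
     "annotations": ["chebi/CHEBI:52976"]},
]

for annotation, names in sorted(shared_annotations(mets).items()):
    print(annotation, sorted(names))
```

The same metabolite in several compartments shares one name, so it is not reported; only annotations reused across different names are.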

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already

fix: chemical formulas field

When using http://sbml.org/validator/ on the ecYeast7 sbml file, 50 errors appear of the sort:

Error (SBML Validation Rule #fbc-20303): The value of attribute 'fbc:chemicalFormula' on the SBML object must be set to a string consisting only of atomic names or user defined compounds and their occurrence. Reference: L3V1 Fbc V2 Section 3.4 Encountered '(' when expecting a capital letter.

so these formulas should be fixed
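
To find the offending formulas before running the online validator, a simple pattern check may be enough. The sketch below is a simplified approximation of the rule quoted above, not the validator's actual logic: it accepts only runs of tokens made of a capital letter, optional lowercase letters and an optional count. The metabolite IDs and formulas in the example are hypothetical.

```python
import re

# Simplified pattern: one or more tokens such as C6, H12, Na, each with an optional count.
FORMULA_PATTERN = re.compile(r"(?:[A-Z][a-z]*\d*)+")


def is_plain_formula(formula: str) -> bool:
    """True if the formula contains only element-like tokens and counts."""
    return FORMULA_PATTERN.fullmatch(formula) is not None


def offending_formulas(formulas):
    """Return the non-empty formulas that would trip the check, keyed by metabolite ID."""
    return {met_id: f for met_id, f in formulas.items() if f and not is_plain_formula(f)}


# Hypothetical examples: one plain formula, one with parentheses, one empty
formulas = {"met_a": "C6H12O6", "met_b": "(C6H10O5)n", "met_c": ""}
print(offending_formulas(formulas))  # {'met_b': '(C6H10O5)n'}
```

Empty formulas are skipped here on the assumption that a missing formula is a separate matter from a malformed one.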

fix: correction of gene relation for transport reactions

A comparison with the gene annotations in SGD and UniProt shows multiple errors in the gene associations of transport reactions, for example the genes assigned to r_1101 and r_1133, so these need to be corrected.
The following transport reactions need to be checked:
r_1101
r_1133
r_1173
r_1183
r_1238

I hereby confirm that I have:

  • Tested my code with all requirements for running the model
  • Done this analysis in the master branch of the repository
  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

Add gene annotation for aminophospholipid transport in model

@edkerk @BenjaSanchez @feiranl
We recently checked the new GPRs from the database and obtained the following reaction:
phospholipid (in) + ATP + H2O <=> phospholipid (out) + ADP + phosphate + H+
YMR162C, YAL026C, YER166W, YIL048W, YDR093W

Based on the annotation of the above genes, the phospholipid could be phosphatidylserine (PS), phosphatidylethanolamine (PE) or phosphatidylcholine (PC).

In the present model there are many transport reactions for these three metabolites, such as:

phosphatidyl-L-serine (1-16:0, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:0, 2-16:1) [Golgi membrane]
phosphatidyl-L-serine (1-16:1, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:1, 2-16:1) [Golgi membrane]
phosphatidyl-L-serine (1-18:0, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:0, 2-16:1) [Golgi membrane]
phosphatidyl-L-serine (1-18:1, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:1, 2-16:1) [Golgi membrane]
phosphatidyl-L-serine (1-16:0, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:0, 2-18:1) [Golgi membrane]
phosphatidyl-L-serine (1-16:1, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:1, 2-18:1) [Golgi membrane]
phosphatidyl-L-serine (1-18:0, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:0, 2-18:1) [Golgi membrane]
phosphatidyl-L-serine (1-18:1, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:1, 2-18:1) [Golgi membrane]

According to the gene annotation, these reactions should be changed so that ATP is included.
The problem is that the model contains many similar reactions. If we change one, the formulas of the others would also be changed, even though their compartments lack gene annotation evidence. How should we handle this situation?

phosphatidyl-L-serine (1-16:0, 2-16:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-16:0, 2-16:1) [vacuolar membrane]
phosphatidyl-L-serine (1-16:1, 2-16:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-16:1, 2-16:1) [vacuolar membrane]
phosphatidyl-L-serine (1-18:0, 2-16:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-18:0, 2-16:1) [vacuolar membrane]
phosphatidyl-L-serine (1-18:1, 2-16:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-18:1, 2-16:1) [vacuolar membrane]
phosphatidyl-L-serine (1-16:0, 2-18:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-16:0, 2-18:1) [vacuolar membrane]
phosphatidyl-L-serine (1-16:1, 2-18:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-16:1, 2-18:1) [vacuolar membrane]
phosphatidyl-L-serine (1-18:0, 2-18:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-18:0, 2-18:1) [vacuolar membrane]
phosphatidyl-L-serine (1-18:1, 2-18:1) [endoplasmic reticulum membrane] <=> phosphatidyl-L-serine (1-18:1, 2-18:1) [vacuolar membrane]

phosphatidyl-L-serine (1-16:0, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:0, 2-16:1) [cell envelope]
phosphatidyl-L-serine (1-16:1, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:1, 2-16:1) [cell envelope]
phosphatidyl-L-serine (1-18:0, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:0, 2-16:1) [cell envelope]
phosphatidyl-L-serine (1-18:1, 2-16:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:1, 2-16:1) [cell envelope]
phosphatidyl-L-serine (1-16:0, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:0, 2-18:1) [cell envelope]
phosphatidyl-L-serine (1-16:1, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-16:1, 2-18:1) [cell envelope]
phosphatidyl-L-serine (1-18:0, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:0, 2-18:1) [cell envelope]
phosphatidyl-L-serine (1-18:1, 2-18:1) [endoplasmic reticulum membrane]  <=> phosphatidyl-L-serine (1-18:1, 2-18:1) [cell envelope]

feat: xls as release format

For people working with GEM-type repos:

The Sysbio SOP now defines a folder ModelFiles for storing different types of model files. So far there are three requested subfolders: mat, xml, txt.

Here I propose adding an optional xls subfolder for the Excel format, which is a useful way of presenting a model. With this option, modelers may add an Excel file for publishing or exchange purposes. Please feel free to comment on this suggestion.

feat: MetaNetX IDs from yeast 7.6

Description of the issue:

Now that both model.metMetaNetXID and model.rxnMetaNetXID are valid fields in our model, we should add all available met and rxn IDs from yeast 7.6, which is present in the MetaNetX repository under model id yeastnet_v7_6.

I hereby confirm that I have:

  • Checked that a similar issue does not exist already

feat: additional branches

After some thought on how to optimize the development in our repo, talking to @ChristianLieven from memote, and reading this, I thought of the following branching system for the yeast model:

  • master branch: Only stable versions from devel would be pushed here; each time this happens a new release should be made (i.e. go up in version).
  • devel branch: this would be the development branch into which we continuously push changes (and not master as we are doing now). Can be occasionally changed directly if it's some important fix, but otherwise it's only changed by pushing from user branches.
  • user branches: one for each team member (but potentially more than one if needed for different projects). Here each member makes changes directly and can push to devel whenever needed. Examples: hongzhong, feiran, benjamin_annotation, benjamin_slimer

Thoughts on this? If OK, I can implement it asap, and then we can avoid having forked versions of the repo. This will fit very well with the memote framework (which will allow comparisons between branches).

@demilappa @edkerk @hongzhonglu @feiranl @Hao-Chalmers @simas232

fix: compatibility with cobrapy

Description of the issue:

We should ensure compatibility with cobrapy, so that ideally the model can be fully used by Python users. What needs to be done:

  • To be able to use the full model:
    • Solve issue #101
    • Make sure no field is lost when the model is opened with cobrapy:
      • Metabolites:
        • id
        • name
        • formula
        • charge
        • annotation
          • chebi
          • kegg.compound
      • Reactions:
        • id
        • name
        • metabolites (with stoich coeffs)
        • lower_bound
        • upper_bound
        • gene_reaction_rule
        • subsystem
        • annotation
          • ec-code
          • kegg.reaction
          • pmid
        • confidence_score
      • Genes:
        • id
        • name
      • Compartments
  • To be able to also contribute: Make sure the model can go through a full IO cycle without losing any of the fields, and keeping the same order (at least in the .yml and .txt)
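
The "full IO cycle" requirement in the list above can be expressed as a small comparison routine. The sketch below uses JSON from the standard library as a stand-in for the real model formats, and a toy dictionary whose values are placeholders, not taken from the model. In practice the round trip would go through whichever reader and writer the repository adopts. The function name `lost_fields` is hypothetical.

```python
import copy
import json


def lost_fields(before, after, path="model"):
    """Return paths whose value is missing, changed or reordered after a round trip."""
    problems = []
    if isinstance(before, dict):
        if not isinstance(after, dict):
            return [path]
        if list(before) != list(after):
            problems.append(f"{path} (keys or key order changed)")
        for key, value in before.items():
            if key not in after:
                problems.append(f"{path}.{key}")
            else:
                problems.extend(lost_fields(value, after[key], f"{path}.{key}"))
    elif isinstance(before, list):
        if not isinstance(after, list) or len(before) != len(after):
            return [path]
        for i, (b, a) in enumerate(zip(before, after)):
            problems.extend(lost_fields(b, a, f"{path}[{i}]"))
    elif before != after:
        problems.append(path)
    return problems


# Toy model; every value is a placeholder chosen for illustration.
model = {
    "metabolites": [
        {"id": "met_1", "name": "placeholder", "formula": "H2O", "charge": 0,
         "annotation": {"chebi": "placeholder", "kegg.compound": "placeholder"}}
    ],
    "reactions": [
        {"id": "rxn_1", "name": "placeholder", "metabolites": {"met_1": -1.0},
         "lower_bound": 0.0, "upper_bound": 1000.0, "gene_reaction_rule": "gene_1",
         "subsystem": "placeholder", "annotation": {"ec-code": "placeholder"},
         "confidence_score": 2}
    ],
    "genes": [{"id": "gene_1", "name": "placeholder"}],
    "compartments": {"c": "cytoplasm"},
}

round_tripped = json.loads(json.dumps(model))
print(lost_fields(model, round_tripped))  # []

# Simulate a writer that drops a field
damaged = copy.deepcopy(model)
del damaged["metabolites"][0]["charge"]
print(lost_fields(model, damaged))
```

Comparing key order as well as content matters here because the stated goal is stable .yml and .txt files, where reordering would produce noisy diffs.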

I have crossed out all the fields I could find in the model loaded with cobrapy, but maybe I missed something @Midnighter @phantomas1234? The last of the to-dos is probably the most challenging one, so if it's too hard we can create a separate issue once we address the other points. Suggestions/things I might have forgotten are welcome @edkerk @ChristianLieven @demilappa @simas232 @hongzhonglu

I hereby confirm that I have:

  • Checked that a similar issue does not exist already
  • If needed, asked first in the Gitter chat room about the issue

fix: notation of gene IDs that use the '-' character

The SGD notation for new genes that uses the -X format, where X can be any letter, is not compatible with the writeCbModel function of COBRA, as the - character is not properly parsed into the SBML file. For now, every - character has been replaced with an _ (for instance, gene #202 is YDR322C_A). However, this is not ideal, and hopefully COBRA will fix their parser. @simas232 I remember that you contacted them about this issue. Did they ever answer? Should we create a new issue in their repo? This issue will remain open until they fix it.
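
While the workaround is in place, the mapping between the two notations can be kept reversible. The sketch below is an illustration with hypothetical function names: it assumes that only systematic nuclear ORF names of the form Y + chromosome letter + arm + three digits + strand, with an optional single-letter suffix, need to be handled. Other identifiers are returned unchanged.

```python
import re

# Assumed shape of a suffixed ORF name after the workaround, e.g. YDR322C_A
SUFFIXED = re.compile(r"(Y[A-P][LR]\d{3}[WC])_([A-Z])")


def to_sbml_safe(gene_id: str) -> str:
    """Apply the current workaround: replace every '-' with '_'."""
    return gene_id.replace("-", "_")


def from_sbml_safe(gene_id: str) -> str:
    """Restore the SGD notation for IDs that match the assumed suffixed pattern."""
    match = SUFFIXED.fullmatch(gene_id)
    if match is None:
        return gene_id
    return f"{match.group(1)}-{match.group(2)}"


print(to_sbml_safe("YDR322C-A"))    # YDR322C_A
print(from_sbml_safe("YDR322C_A"))  # YDR322C-A
print(from_sbml_safe("YAL026C"))    # YAL026C
```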
