
postnovo's Introduction


Post-processing peptide de novo sequences to improve their accuracy

Journal article: https://pubs.acs.org/doi/10.1021/acs.jproteome.8b00278

Quick Start

Much more detail on Postnovo options can be found in the Postnovo Wiki pages.

  1. Postnovo runs on a Unix-like OS. Use a powerful server or desktop with >25 GB available disk space.

  2. Download and decompress the latest release.

  3. Use Python 3. The Anaconda distribution contains all necessary package dependencies.

  4. Download DeNovoGUI and large pre-trained models with Postnovo setup.

    For low-res MS2 data (example uses nohup to avoid termination upon logout and & to run in background):

    nohup python main.py setup --denovogui --postnovo_low --deepnovo_low &

    For high-res MS2 data:

    python main.py setup --denovogui --postnovo_high --deepnovo_high

  5. Use the ProteoWizard msconvert tool to convert your RAW file to an MGF file whose spectrum titles follow the format set by the titleMaker filter below.

    msconvert preformatted_spectra.raw --mgf --filter "titleMaker Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"

    Add further filters as needed. This example adds peak picking and removal of zero-intensity peaks.

    msconvert preformatted_spectra.raw --mgf --filter "peakPicking vendor" --filter "zeroSamples removeExtra" --filter "titleMaker Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"

  6. Reformat the input MGF file for compatibility with all de novo sequencing tools.

    python main.py format_mgf --mgfs /path/to/preformatted_spectra.mgf --out /path/to/spectra.mgf

  7. Set up a container with TensorFlow to run DeepNovo (optional but recommended for Postnovo). Superuser privileges may be required.

    singularity build tensorflow.simg docker://tensorflow/tensorflow:latest

  8. Generate DeepNovo de novo sequences (can take up to ~12 hours depending on resources). The following examples consider low-res MS2 spectra. Processing high-res spectra requires more memory (see Predicting with Deepnovo). Postnovo only supports standard fixed C and variable M PTMs at the moment.

    Using a single machine with 32 cores:

    python main.py predict_deepnovo --mgf /path/to/spectra.mgf --container /path/to/tensorflow.simg --frag_resolution low --cpus 32

    Using a compute cluster via Slurm with 16 cores per node and sufficient memory allocation:

    python main.py predict_deepnovo --mgf /path/to/spectra.mgf --container /path/to/tensorflow.simg --frag_resolution low --cpus 16 --slurm --partition partition_with_16GB_mem --time_limit 36

  9. Download MaRaCluster to cluster spectra by peptide species. As MaRaCluster input, use the reformatted MGF file. Set "log10(p-value) threshold" (in the GUI) or "--pvalThreshold" (in the CLI) to -2.

  10. Run Postnovo, which in the process generates Novor and PepNovo+ de novo sequences via DeNovoGUI (can take up to ~3 hours depending on resources). Results are written by default to the directory of the MGF input.

    python main.py predict --mgf /path/to/spectra.mgf --clusters /path/to/MaRaCluster.clusters_p2.tsv --frag_method CID --frag_resolution low --denovogui --deepnovo --cpus 32
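
Steps 5 and 6 depend on every spectrum title in the msconvert output following the "Run: <RunId>, Index: <Index>, Scan: <ScanNumber>" pattern. A minimal sketch for checking this before running format_mgf is given below. It assumes that titles sit on lines starting with TITLE=, as is usual for MGF files, and that Index and Scan are integers. The name find_bad_titles is hypothetical and is not part of Postnovo.

```python
import re

# Pattern mirroring the msconvert titleMaker filter from step 5:
# "Run: <RunId>, Index: <Index>, Scan: <ScanNumber>"
# Assumption: Index and Scan are written as plain integers.
TITLE_PATTERN = re.compile(
    r"^TITLE=Run: (?P<run>.+), Index: (?P<index>\d+), Scan: (?P<scan>\d+)\s*$"
)


def find_bad_titles(mgf_lines):
    """Return (line_number, text) pairs for TITLE lines that do not match."""
    bad = []
    for line_number, line in enumerate(mgf_lines, start=1):
        if line.startswith("TITLE=") and not TITLE_PATTERN.match(line):
            bad.append((line_number, line.rstrip("\n")))
    return bad


# Two toy spectra: the first title matches, the second does not.
example = [
    "BEGIN IONS\n",
    "TITLE=Run: sample1, Index: 0, Scan: 1\n",
    "END IONS\n",
    "BEGIN IONS\n",
    "TITLE=sample1.2.2.2\n",
    "END IONS\n",
]
print(find_bad_titles(example))
```

To check a real file, pass an open file handle instead of the list, for example find_bad_titles(open("/path/to/preformatted_spectra.mgf")).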

Copyright 2018, Samuel E. Miller. All rights reserved.

Postnovo is publicly available for non-commercial uses.

Licensed under GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007.

See postnovo/LICENSE.txt.


postnovo's Issues

setup error

Hi,

I would like to test Postnovo, but I get an error message when I try to set it up. The same error occurs for every command line, e.g., python main.py setup --deepnovo_low --deepnovo_high --container /path/container or python main.py setup --check_updates

> python3 main.py setup --deepnovo_low --deepnovo_high --container /mnt/DATA_DISKS/WORKSPACE2/cguetot/installers/postnovo/tensorflow.simg
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    import interspec
  File "/mnt/DATA_DISKS/WORKSPACE2/cguetot/installers/postnovo/postnovo/interspec.py", line 100
    global cluster_dfs
    ^
SyntaxError: name 'cluster_dfs' is used prior to global declaration

or

> python3 main.py setup --check_update
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    import interspec
  File "/mnt/DATA_DISKS/WORKSPACE2/cguetot/installers/postnovo/postnovo/interspec.py", line 100
    global cluster_dfs
    ^
SyntaxError: name 'cluster_dfs' is used prior to global declaration

I tried with version 1.0.9-alpha and also from the master branch.

thank you in advance,

Carlos
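
This error is consistent with running the code under Python 3.6 or later, where using a name before its global declaration in the same function is a SyntaxError (earlier versions only issued a warning). A minimal sketch reproducing and fixing the pattern follows. The two function bodies are hypothetical and are not the actual contents of interspec.py.

```python
# Hypothetical source: the name is used before the global statement.
BROKEN = '''
cluster_dfs = []

def update():
    print(cluster_dfs)
    global cluster_dfs
    cluster_dfs = []
'''

# Same hypothetical function with the global statement moved to the top.
FIXED = '''
cluster_dfs = []

def update():
    global cluster_dfs
    print(cluster_dfs)
    cluster_dfs = []
'''


def compiles(source):
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError as error:
        print("SyntaxError:", error.msg)
        return False


print(compiles(BROKEN))
print(compiles(FIXED))
```

Under this assumption, moving each global statement to the first line of its function, or running the release under an older Python 3 version, would be the two things to try.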

Not all Google drive locations accessible

Dear Samuel, thanks for the nice tool. I've been trying to get it to work, but I am running into problems downloading your files from Google Drive.
I also tried gdown, but some files have permission problems and cannot be opened by URL either.

Common error message:
Access denied with the following error:

Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses. 

You may still be able to access the file from the browser:

 https://drive.google.com/uc?id=1XAipgL7afq16rHgJ_tOsAW7n7_YFVOc_ 

But the supplied link also does not seem to work.

Do you have any suggestions?
Kind regards,
Hugo Kleikamp

List of files that work or not:

#   Filename                                 Google Drive ID                    Size        Access (y/n)
0   DeNovoGUI-1.16.2-mac_and_linux.tar.gz    12WKUfRMpNjY7YilRg7zGNISgycNBsZ9M  77725399    y
1   MSGFPlus.zip                             1k66SctfqBQyYAdKhQJbie005-ZDPfZyb  19202332    y
2   postnovo_low_default_models.zip          1k-jSc4SG-yqJf2vYXoEE_dk0xYTHMaBK  882282390   y
3   postnovo_high_default_models.zip         1xkxT5_oVydFRMzwMcGIjncKSrlSHEPPR  882474786   y
4   postnovo_low_default_training_data.zip   1iIxJB9JdPg-eDI7v_LLYSrpzKK_nHRJm  620061720   n
5   postnovo_high_default_training_data.zip  1JvoWm4mtWDGO1RKengTZiIrtas2tC-E1  709973469   n
6   train_record_low.tsv                     1pd4k9ETr8UutNIfioC7zvfAh5Rx4rG8m  244         y
7   train_record_high.tsv                    1IennZj1QxxUOyDeIZZw3Z2ohuWneXg5U  586         y
8   deepnovo_low_default_models.zip          1Fx_gCoXzUTr7FMgmYpbL5xYkuY8-nR6-  820270761   y
9   deepnovo_high_default_models.zip         1v3F8ZKU88Gz7ViEWDlUVt-W6orYBLdmc  3192686775  y
10  knapsack.npy                             1N_CQPQlSbAW_fShocE9gMB1oFUjcZKOY  775577350   y
11  postnovo_low_default_spectra.zip         1H0yD5jhrcvAzxNf1fIr6SSoR9wDUKnlB  2521313775  y
12  postnovo_high_default_spectra.zip        1XAipgL7afq16rHgJ_tOsAW7n7_YFVOc_  1507107028  n
13  deepnovo_low_default_spectra.zip         1Bbz0QAqmAgizb6sbVDWNV791AA9fjx9U  521178064   y
14  deepnovo_high_default_spectra.zip        1JkgBJM6qsE8BB2kgBkrl8JprO2fHO2Oj  299785319   y
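
For anyone retrying the downloads, the three entries marked as not accessible can be turned into browser links using the URL form that appears in the error message. This is only a small sketch: drive_url is a hypothetical helper, and the links will keep failing until the sharing permissions on those files are changed.

```python
# (filename, Google Drive ID) pairs for the entries reported as inaccessible.
INACCESSIBLE = [
    ("postnovo_low_default_training_data.zip", "1iIxJB9JdPg-eDI7v_LLYSrpzKK_nHRJm"),
    ("postnovo_high_default_training_data.zip", "1JvoWm4mtWDGO1RKengTZiIrtas2tC-E1"),
    ("postnovo_high_default_spectra.zip", "1XAipgL7afq16rHgJ_tOsAW7n7_YFVOc_"),
]


def drive_url(file_id):
    """Build a link in the same form as the one shown in the error message."""
    return "https://drive.google.com/uc?id=" + file_id


for name, file_id in INACCESSIBLE:
    print(name, drive_url(file_id))
```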
         

problems with tensorflow and python version

Hello,
I encountered problems with the TensorFlow version (2.3.1) contained in the Singularity container. Modules are not found when they are imported in each of the Python DeepNovo scripts. One suggestion is to replace "import tensorflow as tf" with "import tensorflow.compat.v1 as tf".
Do you have any ideas or suggestions about this?
Thank you in advance,
Benjamin Dartigues
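
The suggested replacement can be applied mechanically to a script's source text. The sketch below does only that substitution. The function name use_compat_v1 is hypothetical, and the change may not be sufficient on its own: code written for TensorFlow 1.x often also needs tf.disable_v2_behavior() after the import, and anything relying on tf.contrib would still fail because that module is not available in TensorFlow 2.

```python
import re

# Matches a line that is exactly "import tensorflow as tf", allowing indentation.
TF_IMPORT = re.compile(r"^([ \t]*)import tensorflow as tf[ \t]*$", re.MULTILINE)


def use_compat_v1(source):
    """Return source with the TensorFlow import switched to the v1 compatibility module."""
    return TF_IMPORT.sub(r"\1import tensorflow.compat.v1 as tf", source)


# Toy script text; the other lines are left untouched.
example = "import os\nimport tensorflow as tf\nprint(tf.__version__)\n"
print(use_compat_v1(example))
```

Pinning the container to a TensorFlow 1.x image instead of the latest tag would be an alternative that leaves the scripts unchanged.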
