
frobnitzem / sars_docking

Summary secondary data and scripts for working with the SARS-CoV2 gigadocking dataset: https://dx.doi.org/10.13139/OLCF/1783186

License: Other



SARS Cov2 Docking Summary Data

This work is based on the SARS-CoV2 Docking Dataset, by David M. Rogers, Jens Glaser, Rupesh Agarwal, Josh Vermaas, Micholas Smith, Jerry Parks, Connor Cooper, Ada Sedova, Swen Boehm, Matthew Baker, and Jeremy Smith. It is licensed under a Creative Commons Attribution 4.0 International License.

https://creativecommons.org/licenses/by/4.0/

This repository also includes the file rossetti.csv, a list of noncovalent inhibitors of the SARS-CoV2 main protease disclosed in the reference: Rossetti, G.G., Ossorio, M.A., Rempel, S. et al. Non-covalent SARS-CoV-2 Mpro inhibitors developed from in silico screen hits. Sci Rep 12, 2505 (2022). https://doi.org/10.1038/s41598-022-06306-4.

The content of that file is a concatenation of all Supplementary Information tables from the Rossetti article. Four additional columns hold the IC50 reported in the Rossetti article's main text and Supplementary Figures, the IC50 unit (always uM, i.e. micromolar, in this file), a free-text notes column, and a number from 1 to 8 indicating which of the article's supplementary tables the record came from.
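A minimal sketch of reading rossetti.csv with Python's standard csv module and grouping the reported IC50 values by source table. The column headers used here (IC50, IC50_unit, notes, table) are hypothetical, since this README describes the four added columns but not their names; check the first line of the file and adjust the constants.

```python
import csv
import io
from collections import defaultdict

# Hypothetical header names for the added columns; adjust to match rossetti.csv.
IC50_COL = "IC50"
UNIT_COL = "IC50_unit"
TABLE_COL = "table"


def ic50_by_table(lines):
    """Group IC50 values (in uM) by the supplementary table (1-8) they came from."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        value = (row.get(IC50_COL) or "").strip()
        if not value:
            continue  # no IC50 reported for this record
        unit = (row.get(UNIT_COL) or "").strip()
        if unit != "uM":
            raise ValueError(f"unexpected IC50 unit: {unit!r}")
        groups[int(row[TABLE_COL])].append(float(value))
    return dict(groups)


if __name__ == "__main__":
    # Made-up rows for illustration only, not data from the article.
    sample = io.StringIO(
        "name,IC50,IC50_unit,notes,table\n"
        "cmpd_a,12.5,uM,,1\n"
        "cmpd_b,,uM,not measured,1\n"
        "cmpd_c,3.0,uM,,4\n"
    )
    print(ic50_by_table(sample))  # {1: [12.5], 4: [3.0]}
```

For the real file, open it with `open("rossetti.csv", newline="")` and pass the file object to `ic50_by_table`.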

The Rossetti article has also been released by its authors under a Creative Commons Attribution 4.0 International License.

Layout

The top-level directory contains one subdirectory per protein target, named like MPro_6WQF. It also contains shared directories, such as src, which holds the processing scripts.
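One way to enumerate the per-target subdirectories programmatically is sketched below. It assumes, as a heuristic not stated by the repository, that any top-level directory containing a docked subdirectory is a target, and that everything else (such as src) is shared.

```python
from pathlib import Path


def find_targets(root):
    """Return sorted names of per-target subdirectories under ``root``.

    Heuristic assumption: a target directory is one that has a ``docked``
    subdirectory; shared directories such as src do not.
    """
    root = Path(root)
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and (p / "docked").is_dir()
    )


if __name__ == "__main__":
    # Run from the top of a repository checkout.
    print(find_targets("."))
```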

Per-Target Data

  • docked - summary plots and raw array data for 2D histograms

    • atoms-tors.pdf, atoms_tors.nc
    • atoms-score.pdf, atoms_score.nc
    • tors-score.pdf, tors_score.nc
    • score-r3.pdf, score_rf3.nc
    • v2-r3.pdf, dude_rf3.nc
    • v2-score.pdf, dude_score.nc
    • summary_i.pq
    • summary.pq
  • docked_lists - short lists of compounds and scores

    • rand.out.pq - random selection
    • AD.out.pq - sorted by AutoDock-GPU score
    • rf3.out.pq - sorted by RF3 score
    • v2AD.out.pq - sorted by both DUD-E v2 and AutoDock-GPU
    • ADrf3.out.pq - sorted by both AutoDock-GPU and RF3
    • v2rf3.out.pq - sorted by both DUD-E v2 and RF3
  • target

    • tgz files containing AutoDock data prepared for docking (e.g. 7jir+w2.tgz)
    • extracted protein pdbqt used for docking (e.g. 7jir+w2.pdbqt)
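The docked directory stores raw array data for 2D histograms such as atoms versus torsions. The layout of those .nc arrays is not documented here, so the following is only an illustrative sketch of what such a histogram counts: integer (x, y) pairs, for example a per-ligand (atom count, torsion count), binned into a grid. The function name hist2d is hypothetical and is not taken from plot_atom_hist.py.

```python
def hist2d(pairs, nx, ny):
    """Count integer (x, y) pairs into an nx-by-ny grid indexed grid[x][y].

    Pairs outside 0 <= x < nx, 0 <= y < ny are dropped rather than clipped.
    """
    grid = [[0] * ny for _ in range(nx)]
    for x, y in pairs:
        if 0 <= x < nx and 0 <= y < ny:
            grid[x][y] += 1
    return grid


if __name__ == "__main__":
    # Toy (atom count, torsion count) pairs for four ligands.
    pairs = [(2, 1), (2, 1), (0, 0), (5, 0)]
    print(hist2d(pairs, nx=3, ny=2))  # [[1, 0], [0, 0], [0, 2]]
```

Scores are continuous, so a score histogram would additionally need bin edges to map each value to an integer bin index.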

Source Files

  • src

    • write_confs.py - extract pdbqt files from parquet files
    • docked_sum.py - print count, min, max, avg summaries from summary.pq files
    • plot_atom_hist.py - create 2D plots containing atoms, torsions, etc.
    • plot_score_hist.py - create 2D plots containing scores
    • maccs.py - compute MACCS fingerprints for molecules within a parquet file
    • interaction.py - list neighboring protein/ligand atoms by chemical interaction
  • dataset

    • requirements.txt - list of python package dependencies
    • helpers.py - utility functions for common tasks
    • lazydf.py - low-memory wrapper for parquet files
    • read_sizes.py - utility program to display parquet file sizes
    • expt.sh, fish.py - batch script and source file for extracting compounds by name
    • list_10k.py - initial script to gather molecules based on cutoff
    • lists.sh, sublists.py, lists.000 - create sub-lists of the score dataset based on score selection. These are used as input to get_lists.py.
    • top.sh, top_andes.sh, get_lists.py, topN.000 - batch script, source file, and example job output for extracting compounds by full name (including _T_0 suffix).
    • summary.sh, summary.py, summary.52533 - batch script, source file, and example output for creating bounds and summary histograms for the dataset
    • atoms_tors.sh, atoms_tors.py, atoms_tors.000 - batch script, source file, and example output for creating histograms of atoms and torsions
  • dash - a plotly/dash (https://dash.plotly.com) viewer for docked structures and scores
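The count/min/max/avg summaries printed by docked_sum.py and the score-sorted short lists can be approximated as below. This is a hypothetical sketch, not the code in docked_sum.py, sublists.py, or get_lists.py: records are (name, score) tuples with made-up names, and lower_is_better=True assumes an AutoDock-GPU-style score, where more negative estimated binding energies rank higher. This README does not state the ranking direction for RF3 or DUD-E v2 scores, so set the flag accordingly.

```python
import heapq
from operator import itemgetter
from statistics import fmean


def summarize(scores):
    """Return count, min, max and mean of numeric scores (None for empty input)."""
    values = [float(s) for s in scores]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": fmean(values),
    }


def top_n(records, n, lower_is_better=True):
    """Return the n best (name, score) records, best first."""
    pick = heapq.nsmallest if lower_is_better else heapq.nlargest
    return pick(n, records, key=itemgetter(1))


if __name__ == "__main__":
    # Made-up compound names and scores, for illustration only.
    records = [("cmpd1_T_0", -7.1), ("cmpd2_T_0", -9.3), ("cmpd3_T_0", -8.0)]
    print(summarize(score for _, score in records))
    print(top_n(records, 2))  # [('cmpd2_T_0', -9.3), ('cmpd3_T_0', -8.0)]
```

In practice the scores would come from a column of a summary.pq or docked_lists file loaded with a parquet reader such as pandas.read_parquet; the column names are not documented in this README.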

Cite this work as:

SARS Cov2 Docking Summary Data, "https://github.com/frobnitzem/sars_docking" Oak Ridge National Laboratory / CC-BY-4.0, 2020-2022.

or

Rogers, Agarwal, et al., "SARS-CoV2 Billion-Compound Docking." Scientific Data, 2022.
