This work is based on the SARS-CoV2 Docking Dataset, by David M. Rogers, Jens Glaser, Rupesh Agarwal, Josh Vermaas, Micholas Smith, Jerry Parks, Connor Cooper, Ada Sedova, Swen Boehm, Matthew Baker, and Jeremy Smith. It is licensed under a Creative Commons Attribution 4.0 International License.
It includes the file rossetti.csv, which is a list of noncovalent inhibitors of SARS-CoV2 main protease disclosed in the reference:
Rossetti, G.G., Ossorio, M.A., Rempel, S. et al. Non-covalent SARS-CoV-2 Mpro inhibitors developed from in silico screen hits. Sci Rep 12, 2505 (2022). https://doi.org/10.1038/s41598-022-06306-4.
The content of that file is a concatenation of all Supplementary Information tables from the Rossetti article. Four additional columns contain the IC50 listed in the Rossetti article's main text and Supplementary Figures, the IC50 unit (always uM, for micromolar, here), a free-text notes column, and a number from 1 to 8 indicating which of their supplementary tables the record originated from.
The Rossetti article has also been released by its authors under a Creative Commons Attribution 4.0 International License.
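As a rough illustration of working with a file laid out this way, the sketch below groups records by their source table and collects the IC50 values that are present. The column names (`name`, `IC50`, `IC50_unit`, `notes`, `table`) and the sample rows are assumptions made up for the example; check the actual header of rossetti.csv before relying on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real header of rossetti.csv may differ.
TABLE_COL = "table"
IC50_COL = "IC50"

def records_by_table(handle, table_col=TABLE_COL):
    """Group CSV records by the supplementary table number they came from."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[int(row[table_col])].append(row)
    return dict(groups)

def measured_ic50(rows, ic50_col=IC50_COL):
    """Return IC50 values as floats, skipping records with a blank entry."""
    return [float(r[ic50_col]) for r in rows if r[ic50_col].strip()]

# Made-up sample rows standing in for the real file.
SAMPLE = """name,IC50,IC50_unit,notes,table
cpd_a,1.5,uM,,1
cpd_b,,uM,not measured,1
cpd_c,0.2,uM,,3
"""

groups = records_by_table(io.StringIO(SAMPLE))
print(sorted(groups))            # table numbers present in the sample
print(measured_ic50(groups[1]))  # IC50 values recorded for table 1
```

To try this on the real file, pass `open("rossetti.csv", newline="")` in place of the `io.StringIO` sample and adjust the column names to match.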
The top-level directory contains protein names, like MPro_6WQF, as its subdirectories. It also contains common information, like src, holding processing files.
- docked - summary plots and raw array data for 2D histograms
  - atoms-tors.pdf, atoms_tors.nc
  - atoms-score.pdf, atoms_score.nc
  - tors-score.pdf, tors_score.nc
  - score-r3.pdf, score_rf3.nc
  - v2-r3.pdf, dude_rf3.nc
  - v2-score.pdf, dude_score.nc
  - summary_i.pq
  - summary.pq
- docked_lists - short lists of compounds and scores
  - rand.out.pq - random selection
  - AD.out.pq - sorted by AutoDock-GPU score
  - rf3.out.pq - sorted by RF3 score
  - v2AD.out.pq - sorted by both DUD-E v2 and AutoDock-GPU
  - ADrf3.out.pq - sorted by both AutoDock-GPU and RF3
  - v2rf3.out.pq - sorted by both DUD-E v2 and RF3
- target
  - tgz files containing AutoDock data prepared for docking (e.g. 7jir+w2.tgz)
  - extracted protein pdbqt used for docking (e.g. 7jir+w2.pdbqt)
- src
  - write_confs.py - extract pdbqt files from parquet files
  - docked_sum.py - print count, min, max, avg summaries from summary.pq files
  - plot_atom_hist.py - create 2D plots containing atoms, torsions, etc.
  - plot_score_hist.py - create 2D plots containing scores
  - maccs.py - compute MACCS fingerprints for molecules within a parquet file
  - interaction.py - list neighboring protein/ligand atoms by chemical interaction
- dataset
  - requirements.txt - list of python package dependencies
  - helpers.py - utility functions for common tasks
  - lazydf.py - low-memory wrapper for parquet files
  - read_sizes.py - utility program to display parquet file sizes
  - expt.sh, fish.py - batch script and source file for extracting compounds by name
  - list_10k.py - initial script to gather molecules based on cutoff
  - lists.sh, sublists.py, lists.000 - create sub-lists of the score dataset based on score selection. These are used as input to get_lists.py.
  - top.sh, top_andes.sh, get_lists.py, topN.000 - batch script, source file, and example job output for extracting compounds by full name (including _T_0 suffix).
  - summary.sh, summary.py, summary.52533 - batch script, source file, and example output for creating bounds and summary histograms for dataset
  - atoms_tors.sh, atoms_tors.py, atoms_tors.000 - batch script, source file, and example output for creating histograms of atoms and torsions
- dash - a [plotly/dash](https://dash.plotly.com) viewer for docked structures and scores
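The listing above describes docked_sum.py as printing count, min, max, and avg summaries from summary.pq files. The sketch below shows that kind of per-column summary in plain Python. It is not the repository's script: the `summarize` helper, the column names `score` and `atoms`, and the sample values are all assumptions invented for illustration.

```python
from statistics import mean

def summarize(values):
    """Return count, min, max and mean of a non-empty sequence of numbers."""
    vals = list(values)
    if not vals:
        raise ValueError("no values to summarize")
    return {
        "count": len(vals),
        "min": min(vals),
        "max": max(vals),
        "avg": mean(vals),
    }

# Made-up columns standing in for data read from a summary.pq file.
columns = {
    "score": [-9.0, -7.5, -6.0],
    "atoms": [20, 30, 40],
}

for name, vals in columns.items():
    s = summarize(vals)
    print(f"{name}: n={s['count']} min={s['min']} max={s['max']} avg={s['avg']}")
```

With pandas and a parquet engine installed, the columns could instead be taken from `pandas.read_parquet("summary.pq")`; the actual column names in those files are not documented here, so inspect them first.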
SARS Cov2 Docking Summary Data, "https://github.com/frobnitzem/sars_docking" Oak Ridge National Laboratory / CC-BY-4.0, 2020-2022.
or
Rogers, Agarwal, Agarwal, et al., "SARS-CoV2 Billion-Compound Docking." Scientific Data, 2022.