
cinnabar's People

Contributors

annamherz, dfhahn, dotsdl, glass-w, hannahbaumann, hannahbrucemacdonald, ialibay, ijpulidos, jchodera, mikemhenry, richardjgowers, yoshanuikabundi, zhang-ivy


cinnabar's Issues

Input CSV

At the moment we pass in the relative experimental and relative calculated errors in the CSV:

# liga, ligb, exp DDG, ...........
A , B, -1 ...........
B, C, -3 ...........
A , C, -4...............

However, this allows for human error, and experimental values that are passed in might not be self consistent.

@jchodera suggested that we pass in experimental DGs, and computed DDGs.

EXP
#lig, exp DG, exp dDG, 
A -8
B -9
C -11

CALC
#ligA, ligB, calc DDG, calc dDDG, 
A , B, -1 ...........
B, C, -3 ...........
A , C, -4...............

This would reduce the chance of inconsistent experimental DDGs, and we could simply add more columns for additional computational methods to plot?
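
For illustration, here is a minimal sketch (using pandas, with hypothetical column names rather than any actual cinnabar schema) showing that experimental DDGs derived from absolute DGs are self-consistent by construction:

import pandas as pd
from io import StringIO

# hypothetical two-section input, mirroring the EXP/CALC tables above
exp = pd.read_csv(StringIO("lig,exp_DG\nA,-8\nB,-9\nC,-11\n"))
calc = pd.read_csv(StringIO("ligA,ligB,calc_DDG\nA,B,-1\nB,C,-3\nA,C,-4\n"))

dg = dict(zip(exp["lig"], exp["exp_DG"]))
# experimental DDGs computed from absolute DGs cannot be mutually inconsistent
calc["exp_DDG"] = [dg[b] - dg[a] for a, b in zip(calc["ligA"], calc["ligB"])]
print(calc)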

Expand example notebooks & add static versions to docs page

The notebooks currently have limited information, which makes it hard for users to know what they are meant to show (especially untitled.ipynb).

We should a) improve the notebooks to better explain what they are meant to show, b) add static versions of them to an "examples" section in the docs so that they are more easily accessible.

Allow for node names

Some of the code (stats.mle, or specifically stats.form_edge_matrix) expects the nodes of the graph to have integer names and assumes that these are range(0, len(nodes)), which may not always be true. Other node specifications might be preferable, such as ligand name or SMILES.
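
As a possible workaround, here is a sketch (plain networkx, not a cinnabar API) that relabels arbitrary node names to 0..N-1 integers before calling the stats functions, keeping a mapping back to the original names:

import networkx as nx

g = nx.DiGraph()
g.add_edge("lig_a", "lig_b", f_ij=-1.0)
g.add_edge("lig_b", "lig_c", f_ij=-3.0)

# relabel to consecutive integers; the old name is kept as a node attribute
g_int = nx.convert_node_labels_to_integers(g, label_attribute="name")
index_to_name = {i: data["name"] for i, data in g_int.nodes(data=True)}
print(index_to_name)  # {0: 'lig_a', 1: 'lig_b', 2: 'lig_c'}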

basic cli usage broken

Not sure how long this has been broken, but running the CLI entry point with the given example CSV doesn't work:

>>> cinnabar ./example.csv 
Traceback (most recent call last):
  File "/home/richard/miniconda3/envs/openfe/bin/cinnabar", line 33, in <module>
    sys.exit(load_entry_point('cinnabar', 'console_scripts', 'cinnabar')())
  File "/home/richard/code/cinnabar/cinnabar/arsenic.py", line 44, in main
    plotting.plot_DDGs(network.results, title=args.title, filename=f"{args.prefix}DDGs.png")
  File "/home/richard/code/cinnabar/cinnabar/plotting.py", line 252, in plot_DDGs
    x = [x[2]["exp_DDG"] for x in graph.edges(data=True)]
AttributeError: 'dict' object has no attribute 'edges'

plot_DGs unusual result

Using the provided example.csv data, I cannot reproduce the DGs plot - the experimental data seems to be the negative of the true value.

Please let me know if I've done something wrong - PDF of notebook attached. Everything else looks correct.

[attachment: absolute_wrong.pdf]

Compare different systems/targets in a single plot

It might be useful to compare different graphs and/or systems in a single plot. Say, for example, you want to compare two different force fields against each other across a range of targets (not just a single one). Imagine comparing espaloma vs openff on all the targets of some protein-ligand-benchmark.

Right now we don't have an easy way to do this comparison from the cinnabar objects. I wonder if we want to include this capability in the future, such that we can extract the points that would be plotted for each target (as x-y pairs or similar) from the cinnabar objects and merge them into a single plot with the information from different targets.
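
A rough sketch of what such a merged plot could look like, assuming the per-target (experimental, calculated) pairs had already been extracted from the cinnabar objects (target names and values below are made up):

import matplotlib.pyplot as plt

per_target = {  # hypothetical absolute DG pairs per target, kcal/mol
    "tyk2": ([-9.1, -8.4, -10.2], [-8.7, -8.9, -9.8]),
    "p38": ([-7.2, -6.5, -8.0], [-7.6, -6.1, -8.3]),
}
fig, ax = plt.subplots(figsize=(5, 5))
for target, (exp, calc) in per_target.items():
    ax.scatter(exp, calc, label=target)
ax.set_xlabel("experimental DG [kcal/mol]")
ax.set_ylabel("calculated DG [kcal/mol]")
ax.legend()
fig.savefig("combined_targets.png")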

Graphs containing self-edges will produce erroneous MLE estimates

It appears that graphs provided to arsenic.stats.mle can have self-edges (edges that connect to the same node at both ends).

This line assumes that cannot be the case, and will produce biased, incorrect estimates if provided with self-edges.

We should either automatically omit self-edges or raise an exception if they are encountered.
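
A minimal sketch of such a check (plain networkx; whether to drop or raise is exactly the open question here):

import networkx as nx

def check_self_edges(graph, drop=False):
    # self-edges connect a node to itself and would bias the MLE estimate
    loops = list(nx.selfloop_edges(graph))
    if not loops:
        return graph
    if drop:
        graph.remove_edges_from(loops)
        return graph
    raise ValueError(f"Graph contains self-edges: {loops}")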

When some of the experimental data is missing?

Hi, I recently used this package to generate some statistics, but it failed when experimental values were not provided for some of the molecules. Is there a solution for this kind of data?

Error:

  File "wrangle.py", line 97, in generate_graph_from_results
    node[1]["exp_DG"] = self.results["Experimental"][name].DG
KeyError: 'mol20'
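
A possible workaround, sketched as a standalone helper (the "Experimental" lookup and .DG attribute follow the traceback above; the skip-and-report behaviour is only a suggestion):

def attach_experimental(graph, experimental):
    # experimental maps ligand name -> object with a .DG attribute
    missing = []
    for name, data in graph.nodes(data=True):
        if name in experimental:
            data["exp_DG"] = experimental[name].DG
        else:
            missing.append(name)  # e.g. 'mol20'
    return missing  # caller can warn about or drop these nodes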

RMSE/AUE error calculation

The error on the RMSE and AUE is calculated differently in this package (freeenergyframework) and in pmx. The freeenergyframework method draws bootstrap samples from the original sample, which does not account for the errors of the individual data points. The pmx method takes the data points from the original sample and draws each bootstrap sample's data point from a normal distribution centered at the original data point with its standard deviation. Which is the correct way?

See the example below:

import numpy as np
from sklearn import metrics
import scipy

Example data

Two sets of delta G values with associated errors

# fake data
DG1  = np.random.random(100) * 10.0 - 5.0
dDG1 = np.random.normal(0, 1, size = 100)
DG2  = DG1 + np.random.normal(0, 2, size = 100)
dDG2 = np.random.normal(0, 1, size = 100)
from matplotlib import pyplot as plt
plt.figure(figsize=(6,6))
plt.errorbar(DG1, DG2, xerr=dDG1, yerr=dDG2, ls='', marker='o')
plt.xlabel('DG1')
plt.ylabel('DG2')
plt.xlim(-7,7)
plt.ylim(-7,7)
[output plot: DG2 vs DG1 scatter with error bars]

freeenergyframework method

Take bootstrap samples from the actual sample, which does not account for errors of the sample

nbootstrap = 1000
ci = 0.95
assert len(DG1) == len(DG2)
sample_size = len(DG1)
s_n = np.zeros([nbootstrap], np.float64) # s_n[n] is the statistic computed for bootstrap sample n
for replicate in range(nbootstrap):
    indices = np.random.choice(np.arange(sample_size), size=[sample_size], replace=True)
    DG1_sample, DG2_sample = DG1[indices], DG2[indices]
    s_n[replicate] = np.sqrt(metrics.mean_squared_error(DG1_sample, DG2_sample))
    
rmse_stats = dict()
rmse_stats['mle'] = np.sqrt(metrics.mean_squared_error(DG1, DG2))
rmse_stats['stderr'] = np.std(s_n)
rmse_stats['mean'] = np.mean(s_n)
# TODO: Is there a canned method to do this?
s_n = np.sort(s_n)
low_frac = (1.0-ci)/2.0
high_frac = 1.0 - low_frac
rmse_stats['low'] = s_n[int(np.floor(nbootstrap*low_frac))]
rmse_stats['high'] = s_n[int(np.ceil(nbootstrap*high_frac))]
print(rmse_stats)
{'mle': 1.852492378187966, 'stderr': 0.12139385458708932, 'mean': 1.8450353834007485, 'low': 1.601269239221521, 'high': 2.0890130880257103}

The pmx method:

Take the original sample and draw each bootstrap sample's data point from a normal distribution with the center and standard deviation of the original sample's data point.

nbootstrap = 1000
ci = 0.95
assert len(DG1) == len(DG2)
sample_size = len(DG1)
s_n = np.zeros([nbootstrap], np.float64) # s_n[n] is the statistic computed for bootstrap sample n
for replicate in range(nbootstrap):
    DG1_sample = np.zeros_like(DG1)
    DG2_sample = np.zeros_like(DG2)
    for i in range(sample_size):
        DG1_sample[i] = np.random.normal(loc=DG1[i], scale=np.fabs(dDG1[i]), size=1)
        DG2_sample[i] = np.random.normal(loc=DG2[i], scale=np.fabs(dDG2[i]), size=1)
    s_n[replicate] = np.sqrt(metrics.mean_squared_error(DG1_sample, DG2_sample))
    
rmse_stats = dict()
rmse_stats['mle'] = np.sqrt(metrics.mean_squared_error(DG1, DG2))
rmse_stats['stderr'] = np.std(s_n)
rmse_stats['mean'] = np.mean(s_n)
# TODO: Is there a canned method to do this?
s_n = np.sort(s_n)
low_frac = (1.0-ci)/2.0
high_frac = 1.0 - low_frac
rmse_stats['low'] = s_n[int(np.floor(nbootstrap*low_frac))]
rmse_stats['high'] = s_n[int(np.ceil(nbootstrap*high_frac))]
print(rmse_stats)
{'mle': 1.852492378187966, 'stderr': 0.13816081347307912, 'mean': 2.3424216885058007, 'low': 2.091178479342661, 'high': 2.6185886859019063}

The estimate ('mle') is obviously the same for both methods, and the standard deviation ('stderr') is quite similar. However, the confidence intervals differ a lot. For the pmx method the confidence interval has a similar width, but the mean, lower and upper bounds are much higher; the estimate ('mle') does not even lie within the confidence interval.

Do both

Bootstrap on datapoints AND errors

nbootstrap = 1000
ci = 0.95
assert len(DG1) == len(DG2)
sample_size = len(DG1)
s_n = np.zeros([nbootstrap], np.float64) # s_n[n] is the statistic computed for bootstrap sample n
for replicate in range(nbootstrap):
    DG1_sample = np.zeros_like(DG1)
    DG2_sample = np.zeros_like(DG2)
    # enumerate ensures every position is filled with a resampled, noise-perturbed value
    for i, j in enumerate(np.random.choice(np.arange(sample_size), size=[sample_size], replace=True)):
        DG1_sample[i] = np.random.normal(loc=DG1[j], scale=np.fabs(dDG1[j]), size=1)
        DG2_sample[i] = np.random.normal(loc=DG2[j], scale=np.fabs(dDG2[j]), size=1)
    s_n[replicate] = np.sqrt(metrics.mean_squared_error(DG1_sample, DG2_sample))
    
rmse_stats = dict()
rmse_stats['mle'] = np.sqrt(metrics.mean_squared_error(DG1, DG2))
rmse_stats['stderr'] = np.std(s_n)
rmse_stats['mean'] = np.mean(s_n)
# TODO: Is there a canned method to do this?
s_n = np.sort(s_n)
low_frac = (1.0-ci)/2.0
high_frac = 1.0 - low_frac
rmse_stats['low'] = s_n[int(np.floor(nbootstrap*low_frac))]
rmse_stats['high'] = s_n[int(np.ceil(nbootstrap*high_frac))]
print(rmse_stats)
{'mle': 1.852492378187966, 'stderr': 0.1628450105340105, 'mean': 1.8693378181772506, 'low': 1.5683291549599805, 'high': 2.2080338926207124}

Passing in relative experimental results in csv

@jchodera pointed out that when we pass in relative experimental errors we don't check that they are self-consistent, and we need a single absolute value to correctly offset the absolute plots anyway. This also makes the tracking of experimental errors much more confusing.

The solution is to change the format of the csv file to be something like

### EXP
# lig, exp DG kcal/mol, exp dDG kcal/mol
a, -8.40, 0.22
b, -9.23, 0.33
c, -7.05, 0.22

### CALC
# ligA, ligB, calc DG kcal/mol, calc dDG kcal/mol
a, b, 1.32, 0.65
b, c, 3.45, 0.34

This format would also help with what @dfhahn mentioned about handling multiple simulation methods; we could just chain them together.

Fix node ordering issue

The return of stats.mle() is ordered according to the graph.nodes() passed in; however, this can cause issues if the return values are expected to be in the order of the ligands.

Graph nodes are ordered in the way that they are added, so

edges = [(2, 0), (0, 1)]

will make a graph where graph.nodes() is ordered 2, 0, 1.

The f_i from stats.mle will then be ordered 2, 0, 1, NOT 0, 1, 2, which causes confusion.

This is fine if

for f, n in zip(f_i, g.nodes()):
    g.nodes[n]["fe"] = f

is used, but it's not very intuitive.

Change the function to return something more robust, like a dict, or another graph
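
For example (plain networkx; the stats.mle call is only indicated in a comment), keying the result on the node labels themselves would remove the ambiguity:

import networkx as nx

g = nx.DiGraph()
g.add_edges_from([(2, 0), (0, 1)])
print(list(g.nodes()))  # [2, 0, 1] -- insertion order, not sorted

# f_i = stats.mle(g)                          # returns values in g.nodes() order
# free_energies = dict(zip(g.nodes(), f_i))   # {2: ..., 0: ..., 1: ...}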

Travis CI Security Breach Notice

MolSSI is reaching out to every repository created from the MolSSI Cookiecutter-CMS with a .travis.yml file present to alert them to a potential security breach in using the Travis-CI service.

Between September 3 and September 10, 2021, the secure environment variables Travis-CI uses were leaked for ALL projects and injected into the publicly available runtime logs. See more details here. All Travis-CI users should cycle any secure variables/files and associated objects as soon as possible. We are reaching out to our users as good stewards of the third-party products we recommended and that might still be in use, to provide a duty-to-warn to our end users given the potential severity of the breach.

We at MolSSI recommend moving away from Travis-CI to another CI provider as soon as possible. Given the nature of this breach and the way the response was mishandled by Travis-CI, MolSSI cannot recommend the Travis-CI platform for any reason at this time. We suggest either GitHub Actions (as used from v1.5 of the Cookiecutter-CMS) or another service offered on GitHub.

If you have already addressed this security concern or it does not apply to you, feel free to close this issue.

This issue was created programmatically to reach as many potential end-users as possible. We do apologize if this was sent in error.

95% CI sometimes doesn't contain the RMSE or MUE

I generated a cinnabar plot for my terminally-blocked amino acid RBFE calc data, comparing the forward vs reverse DDGs, but the RMSE and MUE do not lie within their respective 95% CIs.

[attached plot: forward vs reverse DDGs, with RMSE and MUE lying outside their 95% CIs]

Here is the code that does the bootstrapping:

cinnabar/cinnabar/stats.py

Lines 111 to 130 in 1dbe248

for replicate in range(nbootstrap):
    y_true_sample = np.zeros_like(y_true)
    y_pred_sample = np.zeros_like(y_pred)
    for i, j in enumerate(
        np.random.choice(np.arange(sample_size), size=[sample_size], replace=True)
    ):
        y_true_sample[i] = np.random.normal(loc=y_true[j], scale=np.fabs(dy_true[j]), size=1)
        y_pred_sample[i] = np.random.normal(loc=y_pred[j], scale=np.fabs(dy_pred[j]), size=1)
    s_n[replicate] = compute_statistic(y_true_sample, y_pred_sample, statistic)

rmse_stats = dict()
rmse_stats["mle"] = compute_statistic(y_true, y_pred, statistic)
rmse_stats["stderr"] = np.std(s_n)
rmse_stats["mean"] = np.mean(s_n)
# TODO: Is there a canned method to do this?
s_n = np.sort(s_n)
low_frac = (1.0 - ci) / 2.0
high_frac = 1.0 - low_frac
rmse_stats["low"] = s_n[int(np.floor(nbootstrap * low_frac))]
rmse_stats["high"] = s_n[int(np.ceil(nbootstrap * high_frac))]


@jchodera points out that "The bootstrapping is both (1) resampling with replacement and (2) adding noise related to the experimental error to the experimental data and noise related to the predicted error in each replicate.
If we intend to capture the finite sample size effect, we only want (1), and do not want to include (2).
Can you try eliminating the extra addition of normal error in lines 117 and 118? You can just say "scale=0*..." to set the scale to zero by just adding the 0* in place rather than rewriting those lines."
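
A small self-contained demonstration of the point (numbers are made up): with the scale set to zero, the normal draw collapses to plain resampling with replacement.

import numpy as np

y_true = np.array([-1.0, -3.0, -4.0])
dy_true = np.array([0.2, 0.3, 0.2])
idx = np.random.choice(np.arange(len(y_true)), size=len(y_true), replace=True)
# scale = 0 * |dy| means no extra noise is added to the resampled values
y_true_sample = np.random.normal(loc=y_true[idx], scale=0 * np.fabs(dy_true[idx]))
assert np.allclose(y_true_sample, y_true[idx])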

When I re-generate the plot with John's suggested fix, the problem goes away:

[attached plot: the same comparison regenerated with the suggested fix; RMSE and MUE now fall within their 95% CIs]

Fix badges

Codecov points to the wrong branch, and the CI badge isn't in place.

feature: error per cycle graph

  • traverse the computational graph, find all cycles of size < n, and calculate the cycle-closure error for each (see the sketch below)
  • add a nice function for plotting the above data
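
A rough sketch of the first bullet, using networkx's cycle_basis as a stand-in for full cycle enumeration and assuming each edge carries its computed DDG under a hypothetical calc_DDG attribute:

import networkx as nx

def cycle_closure_errors(graph, max_size=4, ddg_attr="calc_DDG"):
    # sum the signed DDG around each small cycle; ~0 for self-consistent results
    errors = {}
    for cycle in nx.cycle_basis(graph.to_undirected()):
        if len(cycle) > max_size:
            continue
        total = 0.0
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            if graph.has_edge(a, b):
                total += graph[a][b][ddg_attr]
            else:  # edge stored in the opposite direction
                total -= graph[b][a][ddg_attr]
        errors[tuple(cycle)] = total
    return errors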

Add module level docstring or general info about what each module does

It's currently unclear what each component of arsenic does; indeed, an inexperienced user may not understand the difference between plotlying.py and plotting.py, or what wrangle.py does. It would be good to have either module-level docstrings or some RST files documenting what each part of the code does.

Clean up the CI yaml files

Generally we should be able to clean up CI, avoid pulling in unnecessary dependencies (e.g. sphinx), and improve the testing matrix (possibly even add a Windows runner if we feel like it).

Rework FEMap API

The current plan is to create a new API that will allow both connected and disconnected nodes (i.e. relative and absolute results) to be passed directly into arsenic.

To be discussed further over the next few weeks - @mikemhenry is currently in charge of the first pass of this API.

Bootstrapping

Need to figure out what we should actually be bootstrapping. Edges run? Ligands and their associated edges?

Also, the FEMap should carry around the bootstrap estimates and statistics so that it's easy to reload a FEMap object and replot for doing stuff online. This would involve changing the point at which we are bootstrapping to outside the plotting code (which also makes sense).

Add experimental mean ΔG back to calculated and experimental ΔGs when plotting

Right now, we subtract the mean calculated ΔG and the mean experimental ΔG before plotting, making for a ridiculous plot that includes ΔG >= 0 (Kd > 1 Molar), which is totally nonsensical for an absolute ΔG.

[attached plot: absolute ΔG plot with values shifted to include ΔG >= 0]

We should instead add back the experimental mean ΔG (if there are no absolute calculated ΔG values) so that the plotted absolute ΔGs are in fact absolute ΔGs.
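
A sketch of the proposed shift (illustrative numbers only): center the calculated ΔGs, then add back the experimental mean so both series sit on the absolute experimental scale.

import numpy as np

exp_dg = np.array([-8.4, -9.2, -11.1])  # experimental absolute values, kcal/mol
calc_dg = np.array([0.3, -0.6, -2.4])   # MLE output, centered near zero
calc_dg_abs = calc_dg - calc_dg.mean() + exp_dg.mean()
print(calc_dg_abs)  # all negative, on the experimental absolute scale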

Warn if node is overwritten

Right now, if a molecule is named the same as another, we overwrite its data, which can cause issues if you have different protonation states but the same molecule name.
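
A sketch of such a guard (results is a plain dict here, not cinnabar's internal structure):

import warnings

def add_ligand(results, name, data):
    if name in results:
        warnings.warn(
            f"Ligand '{name}' already present; overwriting its data. "
            "Check for e.g. different protonation states sharing one name."
        )
    results[name] = data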

Units on statistics

@dfhahn

The plots currently have statistics written on them, but some of these have no units associated (rho/R2).

Could we just remove the units for all statistics? I think they're not necessary if there are units on the axes?

[screenshot: plot with statistics annotated, some with units and some without]

Test notebooks

We should test the notebooks using nbval or equivalent. In the first instance just making sure all the cells run should be fine.

Release on conda-forge?

Since this codebase doesn't depend on any exclusively-omnia packages, it would be great if we could get this deployed via conda-forge and/or pip!

Add tests for plotting functions

While working on #74, we discovered that the functions in plotting.py and plotlying.py are not being tested. It would be great to add these tests in the future.

Catch the exception for the R2 computation

In
https://github.com/OpenFreeEnergy/arsenic/blob/main/arsenic/stats.py#L81-L83
scipy.stats.linregress is used to find the R2.
In older versions (scipy < 1.8), when all x and y values are identical, for example:

import scipy.stats
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress([1.33, 1.33, 1.33], [0.33, 0.33, 0.33])

will return a runtime warning. Now this raises a ValueError; it might be good to catch this exception and return 0? Thanks.

The code to reproduce this error is

from openff.arsenic.stats import bootstrap_statistic
bootstrap_statistic([1.33, 2.00], [0.33, 1.00], statistic="R2")
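
A sketch of the suggested guard (a standalone helper, not the actual function used in cinnabar's stats module):

import numpy as np
import scipy.stats

def safe_r2(x, y):
    try:
        result = scipy.stats.linregress(x, y)
    except ValueError:
        return 0.0  # scipy >= 1.8 raises when all x values are identical
    if np.isnan(result.rvalue):
        return 0.0  # older scipy returns NaN (with a warning) instead
    return result.rvalue ** 2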

Effect of centralizing

For plotting, when centralizing is set to true, both experimental and predicted dG values are centered around 0.

plotting.plot_DGs(network.graph,method_name='openfe',target_name='name',filename=f'test_DGs.png',centralizing=True)

When centralizing is set to false, experimental dG values have their correct absolute value but predicted dG values are centered around 0.

plotting.plot_DGs(network.graph,method_name='openfe',target_name='name',filename=f'test_DGs.png',centralizing=False)

How can one get both experimental and predicted dG values to be plotted at their true absolute value?

Make README say what the point of this is/show example output

Popped on here to point this out to someone, and noticed the README does not:
a) lead with a summary of what the point of this package is/what it's for
b) show examples of what is generated (example plots or analysis)
c) really state any conclusions or recommendations
d) Link to the benchmarking best practices paper

Perhaps these can be quickly addressed?

Separate out stats calls from plotting

Ideally, all the plotting functions should accept x and y points (and error bars) rather than calculating statistics internally. This would let the plotting functions focus on best-practice visualisation, regardless of the chosen statistical methods.
