choderalab / protons

OpenMM testbed for constant-pH methodologies.

Home Page: http://protons.readthedocs.io/

License: MIT License

Shell 0.34% Python 99.32% HTML 0.01% Gherkin 0.33%

constant-ph simulation ph-methodologies python proton openmm

protons's Introduction

Protons

Testbed for constant-pH methodologies using OpenMM.

Manifest

protons/ - Python module implementing constant-pH methodologies

   calibration.py        - Calibration engines
   cnstphgbforces.py     - CustomGBForces that exclude contributions from discharged protons
   ligutils.py           - Work-in-progress code for ligand parametrization
   tests/                - Unit tests

protons/examples/

   explicit-solvent-example.py - explicit solvent NCMC example
   amber-example/        - example system set up with AmberTools constant-pH tools
   calibration-implicit/ - terminally-blocked amino acids parameterized for implicit solvent relative free energy calculations
   calibration-explicit/ - terminally-blocked amino acids parameterized for explicit solvent relative free energy calculations

references/ - some relevant literature references

Dependencies

protons will eventually be made conda installable. The list of dependencies can be found here.

Contributors / coauthors

Bas Rustenburg
Gregory Ross
John D. Chodera
Patrick Grinaway
Jason Swails
Jason Wagoner

protons's People


swails nilmeier slucore pgrinaway juliebehr jchodera x9p msultan bas-rustenburg yongwangcph somous-jhzhao layeqa bbraunsfeld

protons's Issues

Speed up driver init by caching exceptions.

This is a reminder of the slow parts of init, in case we ever want to speed it up. We may be able to cache some exception parameters instead of retrieving them for every titratable group.

Relevant code statements

protons/protons/driver.py

Line 595 in 43b9734

group['exception_indices'] = self._get14exceptions(self.system, atom_indices)

protons/protons/driver.py

Line 374 in 43b9734

 [particle1, particle2, chargeProd, sigma, epsilon] = force.getExceptionParameters(exception_index) 

Not a priority.
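
One way to avoid repeated lookups would be to read every exception once and index it by atom. The following is a minimal sketch, assuming the exception parameters are available as the five-element lists returned by getExceptionParameters. The helper names build_exception_index and exceptions_for_group are hypothetical and not part of protons, and the selection rule used by _get14exceptions may differ from the "any atom in the group" rule assumed here.

```python
from collections import defaultdict


def build_exception_index(exception_params):
    """Map each atom index to the exception indices that involve it.

    exception_params: list of [particle1, particle2, chargeProd, sigma, epsilon],
    e.g. collected once with force.getExceptionParameters(i) for every i.
    """
    by_atom = defaultdict(set)
    for exception_index, params in enumerate(exception_params):
        particle1, particle2 = params[0], params[1]
        by_atom[particle1].add(exception_index)
        by_atom[particle2].add(exception_index)
    return by_atom


def exceptions_for_group(by_atom, atom_indices):
    """Return sorted exception indices touching any atom of a titratable group."""
    found = set()
    for atom in atom_indices:
        found |= by_atom.get(atom, set())
    return sorted(found)
```

With the index built once during init, each titratable group only needs a dictionary lookup per atom.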

Clean the data repo

In the https://github.com/choderalab/constant-ph-systems repo, add new data and run scripts, clean up files that use the old API, and start adding new systems.

Units of g_k in the code

The titration state weight is assumed to be in units of molar energy:
https://github.com/choderalab/protons/blob/master/protons/driver.py#L515

If you store the numbers obtained directly from calibration, this would give you beta^2 * g_k. I overlooked this when setting up the abl-imatinib calculations.
https://github.com/choderalab/protons/blob/master/protons/driver.py#L1836

Maybe we want to make sure that beta is appropriately removed from the results spat out by the calibration, so that the current API works when you store the same numbers in the ffxml file.

Or, we could adjust the code to only take a unitless reference energy for each state (beta * g_k being the standard), to avoid any confusion with unit conversion.
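
To make the unit question concrete, here is a minimal sketch of the conversion between the unitless form beta * g_k and a molar energy. It assumes kJ/mol as the molar energy unit; the function names are hypothetical and not part of the protons API.

```python
# Molar gas constant in kJ/(mol K); one kT per mole equals R * T.
R_KJ_PER_MOL_K = 0.008314462618


def reduced_to_molar(beta_gk, temperature_kelvin):
    """Convert a unitless weight (beta * g_k) to kJ/mol."""
    return beta_gk * R_KJ_PER_MOL_K * temperature_kelvin


def molar_to_reduced(gk_kj_per_mol, temperature_kelvin):
    """Convert a weight in kJ/mol to the unitless form beta * g_k."""
    return gk_kj_per_mol / (R_KJ_PER_MOL_K * temperature_kelvin)
```

Storing a calibration result that is already unitless where a molar energy is expected would apply the factor beta a second time, which is the beta^2 * g_k problem described above.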

Add LICENSE file

It looks like we are missing a LICENSE file containing the MIT License in the top-level directory.

Bring documentation up to date

This is a reminder for me to update the documentation.

One thing that needs attention is the paths in the documentation configuration, so that references to different classes work again. Additionally, many sections will need to be rewritten.

Split off stable Amber-based API

Based on discussion in #57.

The simplest, most flexible thing I can think of is to change the main ProtonDrive class to a base class without an __init__ function, and then have an Amber-based init function for an AmberProtonDrive or CpinProtonDrive.

When we change the format around, we can implement a new constructor for an XmlProtonDrive, or something similar. I think that should work.
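
A minimal sketch of that layout, with the shared logic in a base class and one constructor per input format. The class bodies, the describe method, and the groups attribute are placeholders for illustration only, not the actual protons implementation.

```python
class BaseProtonDrive:
    """Shared titration logic; deliberately has no __init__ of its own."""

    def describe(self):
        return "{} with {} titratable groups".format(type(self).__name__, len(self.groups))


class AmberProtonDrive(BaseProtonDrive):
    """Hypothetical Amber-based constructor taking groups parsed from a cpin file."""

    def __init__(self, cpin_groups):
        self.groups = list(cpin_groups)


class XmlProtonDrive(BaseProtonDrive):
    """Hypothetical constructor for a future XML-based format."""

    def __init__(self, xml_groups):
        self.groups = list(xml_groups)
```

Code that only relies on the base-class interface keeps working when a new input format is added.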

Automated plotting module

Now that we've settled on a datafile format, we should make a module that can plot the data we have. This should include ways to

visualize a calibration
visualize an equilibrium simulation
visualize NCMC protocols/acceptance rates

Add counter-ion swaps for maintaining charge neutrality

Greg has added functionality to saltswap that should allow us to plug saltswappers into our Simulation style classes. This should allow the code to maintain charge neutrality.

I should design a CounterIonProtonDrive that can interact with a salinator/swapper from saltswap code.
We should extend functionality of ConstantPHSimulation and ConstantPHCalibration as needed to facilitate the new class.
Add a salt reporter?

As John points out below, in the future we will also want to support compatibility with a full osmostat simulation.

Create a module for BAR analysis of calibration data

We have settled on the netCDF4 format and fixed most of its bugs. We should automate the BAR analysis procedure.

Ions are accumulating over the course of several NCMC protocols

There seems to be accumulation of ions over the course of a simulation.

I thought I had built ways to prevent this. I tested the selection of salt here:
https://github.com/choderalab/protons/blob/master/protons/tests/test_explicit.py#L733

So perhaps I overlooked something, or something is wrong inside of the NCMCProtonDrive.

Decide on simulation settings:

We want to pick a default:

NCMC protocol (could use linear for everything)
Desired simulation length
pH values of interest

Support for modified version of forces in alchemy

We should add support for custom forces. The parameter names used in software such as Yank should have been standardized by now, so we can adhere to the same standard in the code to maintain compatibility.

Use getter functions of ExternalPerturbation integrators

The new openmmtools integrators have some new getter functions that we may want to use inside of the ncmc protocol work method.

Implementing inherent-pKa biases

I had a few thoughts on implementing the two-step method from the Chen & Roux paper (doi: 10.1021/acs.jctc.5b00261), and I wanted to leave them here to make sure I am interpreting things correctly, as well as to discuss details of the implementation.

Equation 25 of the paper is defined for a single transition, for one residue.

Implementation plan

Randomly pick a residue from all titratable residues
Randomly pick a physically accessible state (since it needs to have a pKa).
Accept/reject with probability given by eq. 25.

If accepted:

Compute the probability of accepting the new parameters using _compute_log_probability like we do currently.

Because of the pKa bias, we have a higher chance of these being accepted.

If rejected:

Try again, but we won't need to calculate the expensive second part, so we can increase the number of trials.
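
The plan above can be sketched as a two-stage trial. This is only an illustration under stated assumptions: the first-stage probability is a Henderson-Hasselbalch-style form chosen for the sketch and is not a transcription of Eq. 25; each residue has just two states (lam = 1 protonated, lam = 0 deprotonated); and expensive_log_probability stands in for a call such as _compute_log_probability. All names are hypothetical.

```python
import math
import random


def pka_bias_probability(pka, ph, lam_old, lam_new):
    """Cheap first-stage acceptance from the pKa alone (assumed form, see lead-in).

    lam = 1 means protonated, lam = 0 means deprotonated.
    """
    sign = lam_new - lam_old  # -1 for deprotonation, +1 for protonation
    return 1.0 / (1.0 + 10.0 ** (sign * (ph - pka)))


def two_step_trial(residues, ph, expensive_log_probability, max_trials=10, rng=random):
    """Return (residue_name, new_lam) for an accepted move, or None.

    residues: dict name -> {"pka": float, "lam": 0 or 1}
    expensive_log_probability: callable(name, new_lam) -> log acceptance of stage two.
    """
    names = sorted(residues)
    for _ in range(max_trials):
        name = rng.choice(names)
        lam_old = residues[name]["lam"]
        lam_new = 1 - lam_old
        # Stage one is cheap, so a rejected proposal is simply retried.
        if rng.random() >= pka_bias_probability(residues[name]["pka"], ph, lam_old, lam_new):
            continue
        # Stage two: the expensive energy-based test, run once per cheap acceptance.
        log_p = expensive_log_probability(name, lam_new)
        if log_p >= 0.0 or rng.random() < math.exp(log_p):
            return name, lam_new
        return None
    return None
```

The sign argument is where the direction of the transition (λ' - λ) enters, which is why it would need to be encoded for every allowed pair of states.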

Changes required to other parts of the code:

We need an equilibrium constant for pairs of states, not single states (pKa), which likely means that it needs to be a physical process.

Even for proteins we would need to adapt our code because residues have more than one state (e.g. ASP has 5), and we need to define which transitions are valid, and what the pKa is.

We'd need to encode the sign of the transition (λ' - λ).

Some questions/issues:

Our old solution would work for tautomers, but using this, we'd have to come up with a separate expression to update tautomers (use the old method?).
Would we allow picking a transition to the current state (old state = new state)? (always accept?)
For states that always should be accepted (equally probable tautomers of ASP), set pKa = pH (kind of a hack)?

Wisdoms, Part 1; the carboxylates

I've skimmed through the source code here (mostly looking at class methods and docstrings) and I'm providing unsolicited comments based on my experience with Mongan's method and its implementation in Amber. I cleaned up and fixed the implementation in sander and implemented the method in pmemd and pmemd.cuda, so my experience is reasonably comprehensive. I'll share what I've learned so far in a series of posts. Some of these will appear in an upcoming publication (that's on my Ph.D. advisor's desk...)

The first is primarily relevant to carboxylate residues, but applies to some extent to every titratable residue. The good results reported by Mongan and myself [DOI 10.1021/ct300512h] for carboxylate residues are largely accidental and result from the dynamics adopting 'bad' conformations. When I tried using a hybrid GB/explicit method that samples conformations in explicit solvent and protonation states in GB, I found that the computed pKas of the carboxylate residues were systematically low (in several cases by more than 3-4 pK units). I was using the same GB model for protonation state sampling as I did for the all-implicit method, so the only difference with respect to the protonation state sampling was the difference in conformational state sampling between implicit and explicit calculations. The only way the implicit solvent calculations got reasonable carboxylate pKas was by adopting structures that resulted in more exposed carboxylates.

After too many months of experimentation (and 3 weeks before my defense), I realized that the reason the carboxylate pKas were being so badly underestimated was that the effective GB radii on the carboxylate functional group were about 0.1 A too large on average. It was pretty obvious in retrospect: the ghost hydrogens (2 attached to each carboxylate oxygen) are responsible. The effective radius of the oxygens in the 'deprotonated' AS4 model compound is about 0.1 A larger than that of ASP. When I decreased AS4 and GL4 carboxylate oxygen radii by 0.1 A (and recomputed the reference energies), the RMSE of my computed pKas to experiment went from ~2.2 pK units to 0.7 - 0.8 pK units.

The best way to handle this would be to exclude 'inactive' protons from the effective radius calculation. I'll modify the OpenMM implementations of the Amber GB models and drop them in this repo. This would require recomputing the reference energies, but it may be worthwhile. I only had time to verify that uniformly reducing the oxygen radii by 0.1 A yielded good results, so I'm not sure what (if any) improvement to expect compared to computing the 'correct' effective radii. Regardless, it is important to account for this for carboxylate residues at least.

Change the way we deal with avoiding errors when changing number of exclusions

See this thread:
openmm/openmm#252 (comment)

For any exceptions/exclusions that may change during the constant-pH simulation, we should set chargeProd and epsilon to be non-zero before creating the context, and then set them back to whatever value we like.
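
A minimal sketch of that pattern, written against the NonbondedForce methods getExceptionParameters, setExceptionParameters, and updateParametersInContext. The helper names and the choice of placeholder values are assumptions, and unit handling is left to the caller.

```python
def prime_exceptions(force, exception_indices, placeholder_charge_prod, placeholder_epsilon):
    """Give exceptions non-zero placeholder values before the Context is created.

    Returns the original parameters so they can be restored afterwards.
    """
    originals = {}
    for index in exception_indices:
        p1, p2, charge_prod, sigma, epsilon = force.getExceptionParameters(index)
        originals[index] = (p1, p2, charge_prod, sigma, epsilon)
        force.setExceptionParameters(
            index, p1, p2, placeholder_charge_prod, sigma, placeholder_epsilon
        )
    return originals


def restore_exceptions(force, originals, context):
    """Write the original values back and push them to the existing Context."""
    for index, (p1, p2, charge_prod, sigma, epsilon) in originals.items():
        force.setExceptionParameters(index, p1, p2, charge_prod, sigma, epsilon)
    force.updateParametersInContext(context)
```

The intended order is prime_exceptions, then Context creation, then restore_exceptions.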

Renaming of <Protons/> ffxml block for consistency.

https://github.com/choderalab/protons/blob/master/examples/Ligand%20example/imidazole.xml#L24

@jchodera notes: For example, <Residue> denotes a residue and <Atom> denotes an atom. The <Protons> tag doesn't denote a proton, but a block of information defining parameters for different titration states.

The suggestion is to rename it to <TitrationStates/>. We could potentially come up with a term that encompasses tautomers as well, since @wiederm is using the code for that purpose.

Todo:

Modify the setup code that generates the files
Modify the code that reads the blocks
Update any example files (can probably just use sed).
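
As an alternative to sed for the example files, the rename could be done with the standard library XML parser. This is a sketch only: the function name is hypothetical, and ElementTree does not preserve comments or the original whitespace layout of the file.

```python
import xml.etree.ElementTree as ET


def rename_tag(xml_text, old="Protons", new="TitrationStates"):
    """Return xml_text with every <old> element renamed to <new>."""
    root = ET.fromstring(xml_text)
    for element in root.iter(old):
        element.tag = new
    return ET.tostring(root, encoding="unicode")
```

Attributes and child elements of the renamed block are left untouched.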

Update the documentation.

The documentation is out of date. Some examples would now work differently, and some code has been deprecated/removed.
We should add new documentation and new examples before releasing.

Link to docs from README.md

@bas-rustenburg: Can you add a link to the docs from the README.md?

Update the netCDF recording to support SAMS details updates

As mentioned in #90, it would be helpful to add SAMS details to netCDF recording. We could potentially use the netCDF files to checkpoint/restart a simulation.

Decide on rest of figures for paper

We need to decide on the rest of the figures for the paper.

For example, we will need to summarize results of constant-pH simulations of complexes and their impact on free energies.

Fix broken remove_temp_files argument

The remove_temp_files keyword argument of protons.ligands.generate_protons_ffxml appears to be broken. Some files cannot be removed because they are still open at the time of deletion. The recommended workaround for now is to pass False for this argument and delete the files manually if desired.

Low priority issue.

Abl benchmarks

Breakdown of an Abl NCMC benchmark.

Could protons be used for membrane proteins?

Hi, I am not sure if this is the right place to ask for help. If not, sorry for this.

I want to titrate a few residues in the TM regions of a membrane protein. I have tried but have not found a good protocol for this yet. It seems protons is the right tool. I'm considering preparing the system with CHARMM-GUI and AmberTools, and then running it with OpenMM and protons. But I still have a few concerns:

Do you think the charmm36 force field (the only choice in CHARMM-GUI when building a membrane protein system) could be used with protons? If not, do you have a good protocol for preparing this kind of system?
I noticed that most CpHMD work is on globular proteins. Could the parameters also be used for membrane proteins?

Thanks a lot for any comments.

Yong

Potential bug or expected behavior in SAMS?

Weird behavior observed in a SAMS run. Potentially related to the restart/continue mechanism, or a bug in the second stage of the SAMS script. Unsure whether this is expected behavior or a bug.

Calibration bugs

Opening a separate issue to discuss calibration API and bugs. Tagging @jchodera.

Add adjust-to-pH feature

The code currently relies on the g_k values in the input xml files. These are assumed to be at the right value to produce the desired target weights.

The user can override the g_k values (current approach) using driver.import_gk_values, but it would be more foolproof to standardize the input files to be for equal πs, and have the code adjust to target πs instead of manually adjusting g_k.

We could also automate that:

For amino acids, pick the π based on pKa.
For ligands, record Epik target weights in the ligand xml per pH in each state block.

I would propose a format that adds a line such as
<Weight pH="7.4" log_pi="-1.3"/> to each state in the protons xml file. That way we could also in the future encode multiple pHs in one xml file.
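
A minimal sketch of reading such a line back with the standard library. The enclosing <State> element name and the function name are assumptions; only the <Weight pH=... log_pi=.../> line comes from the proposal above.

```python
import xml.etree.ElementTree as ET


def read_log_pi(state_xml, ph):
    """Return log_pi for the requested pH from one state block, or None if absent."""
    state = ET.fromstring(state_xml)
    for weight in state.iter("Weight"):
        if abs(float(weight.get("pH")) - ph) < 1e-6:
            return float(weight.get("log_pi"))
    return None
```

Returning None for a missing pH lets the caller decide whether to fall back to the stored g_k values or to raise an error.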

Calibration plotting tweaks

Tweaks for calibration plots suggested by @jchodera

make sure the figure width is set to one column (3.5 inches?)
reduce the axis label font size
increase the axis tick labels to at least 6pt
Use log bias weight $g$ (kT) for the y-axis (no capitalization)
Use update for the x-axis (no capitalization)
Option to add a legend that indicates which state index is which color

New NCMC implementation

I'm about to start a new NCMC implementation PR, which will also integrate parts of #8 from @nilmeier.

Here is the proposed procedure:

Add an optional argument to MonteCarloTitration constructor called maintainChargeNeutrality to select whether water molecules should be replaced with ions to maintain charge neutrality.
Monovalent cation/anion parameters would be optionally set with cationName='Na+' and anionName='Cl-', which specify the name of the atom in prmtop.topology from which to take (charge, sigma, epsilon) parameters when waters are converted into monovalent ions to maintain charge neutrality. This would obviously only be used in explicit solvent.
Build a list of water molecules.
A class method nsteps_per_trial (optionally set during the MonteCarloTitration constructor) will control whether we are using instantaneous switching (0 steps) or NCMC (>= 1 steps).
For each MC update attempt, the original positions and velocities will be cached, and we will use the integrator described in this paper (where the Hamiltonian switching updates happen in the middle of the timestep) to integrate dynamics. The NCMC criteria will be used to accept/reject the move, as in the recent Roux paper.
For now, I will assign a new momentum at the beginning of each NCMC switching iteration, but we can implement more clever approaches to avoid momentum reversal (e.g. those from Roux) later once we have benchmarking data.
If maintainChargeNeutrality=True and there are waters in the system, we will convert waters to/from monovalent ions to maintain charge neutrality during the switching process. We will only convert either waters or monovalent ions that have been converted from waters by the constant-pH facility---we will not modify any existing ions, since we can't easily convert those back into waters. We will keep the hydrogen masses constant, but turn off the electrostatics on the hydrogens as waters are converted into ions. Initially, we will just select waters or ions at random, but later, we may be able to select waters/ions based on electrostatics to increase acceptance probabilities.
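
The switching and accept/reject logic of the procedure above can be illustrated on a toy system. This is a sketch under loud assumptions: a single 1D particle in a harmonic well whose force constant plays the role of the titration state, plain velocity Verlet with the perturbation applied before each step (not the midpoint scheme of the cited integrator), and the work taken as the total energy change over the deterministic switching trajectory. No water/ion swapping is included, and all names are hypothetical.

```python
import math
import random


def potential(x, k):
    """Harmonic potential energy with force constant k (toy stand-in for a state)."""
    return 0.5 * k * x * x


def ncmc_trial(x, v, k_old, k_new, nsteps, dt=0.01, beta=1.0, mass=1.0, rng=random):
    """Attempt one switch k_old -> k_new; return (x, v, k, accepted)."""
    x0, v0 = x, v  # cache the state so a rejection can restore it
    if nsteps == 0:
        # Instantaneous switching: the work is the potential energy difference.
        work = potential(x, k_new) - potential(x, k_old)
    else:
        e_initial = potential(x, k_old) + 0.5 * mass * v * v
        for step in range(nsteps):
            # Perturb the Hamiltonian, then propagate with velocity Verlet.
            k = k_old + (k_new - k_old) * (step + 1) / nsteps
            v += 0.5 * dt * (-k * x) / mass
            x += dt * v
            v += 0.5 * dt * (-k * x) / mass
        e_final = potential(x, k_new) + 0.5 * mass * v * v
        work = e_final - e_initial
    # Metropolis-style test on the accumulated work.
    if work <= 0.0 or rng.random() < math.exp(-beta * work):
        return x, v, k_new, True
    return x0, v0, k_old, False
```

Setting nsteps to 0 reproduces instantaneous switching, and any value of 1 or more gives an NCMC-style move, mirroring the role of nsteps_per_trial.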

Debug simulation crashes

As a reminder to do some debugging, I'm logging some simulations that failed, in case the issues are related to the code.

Failed simulations that will need debugging.

./ALK-Alectinib/3AOX_fixed_ph7.4.pdb.ligand/failed.txt failed fast
./VEGFR1-Axitinib/4AG8_fixed_ph7.4.pdb.ligand/failed.txt failed fast
./ALK-Crizotinib/2XP2_fixed_ph7.4.pdb.ligand/failed.txt failed fast
./ALK-Crizotinib/2XP2_fixed_ph7.4.pdb.apo/failed.txt failed fast
./MET-Crizotinib/2WGJ_fixed_ph7.4.pdb.ligand/failed.txt failed fast
./BCRABL-Dasatinib/2GQG_fixed_ph7.4.pdb.ligand/failed.txt failed fast
./VEGFR1-Regorafenib/2QU5_fixed_ph7.4.pdb.ligand/failed.txt failed fast
./ALK-Crizotinib/2YFX_fixed_ph7.4.pdb.ligand/failed.txt late failure, restarted.
./BCRABL-Imatinib/3PYY_fixed_ph7.4.pdb.ligand/failed.txt late failure, restarted, failed again
./VEGFR1-Axitinib/4AGC_fixed_ph7.4.pdb.ligand/failed.txt late failure, restarted
./BRAF-Dabrafenib/5CSW_fixed_ph7.4.pdb.ligand/failed.txt late failure
./BRAF-Dabrafenib/5HIE_fixed_ph7.4.pdb.ligand/failed.txt late failure
./BCRABL-Imatinib/2HYY_fixed_ph7.4.pdb.ligand/failed.txt late failure
./BRAF-Dabrafenib/4XV2_fixed_ph7.4.pdb.ligand/failed.txt late failure

List of all simulations (failed ones are listed above)

./ALK-Alectinib/3AOX_fixed_ph7.4.pdb.ligand/trajectory.dcd
./ALK-Alectinib/3AOX_fixed_ph7.4.pdb.apo/trajectory.dcd
./VEGFR1-Axitinib/4AG8_fixed_ph7.4.pdb.ligand/trajectory.dcd
./VEGFR1-Axitinib/4AGC_fixed_ph7.4.pdb.ligand/trajectory.dcd
./VEGFR1-Axitinib/4AG8_fixed_ph7.4.pdb.apo/trajectory.dcd
./VEGFR1-Axitinib/4AGC_fixed_ph7.4.pdb.apo/trajectory.dcd
./ALK-Crizotinib/2XP2_fixed_ph7.4.pdb.ligand/trajectory.dcd
./ALK-Crizotinib/2YFX_fixed_ph7.4.pdb.ligand/trajectory.dcd
./ALK-Crizotinib/4ANQ_fixed_ph7.4.pdb.ligand/trajectory.dcd
./ALK-Crizotinib/4ANS_fixed_ph7.4.pdb.ligand/trajectory.dcd
./ALK-Crizotinib/2XP2_fixed_ph7.4.pdb.apo/trajectory.dcd
./ALK-Crizotinib/2YFX_fixed_ph7.4.pdb.apo/trajectory.dcd
./ALK-Crizotinib/4ANQ_fixed_ph7.4.pdb.apo/trajectory.dcd
./ALK-Crizotinib/4ANS_fixed_ph7.4.pdb.apo/trajectory.dcd
./MET-Crizotinib/2WGJ_fixed_ph7.4.pdb.ligand/trajectory.dcd
./MET-Crizotinib/2WGJ_fixed_ph7.4.pdb.apo/trajectory.dcd
./BRAF-Dabrafenib/4XV2_fixed_ph7.4.pdb.ligand/trajectory.dcd
./BRAF-Dabrafenib/5CSW_fixed_ph7.4.pdb.ligand/trajectory.dcd
./BRAF-Dabrafenib/5HIE_fixed_ph7.4.pdb.ligand/trajectory.dcd
./BRAF-Dabrafenib/4XV2_fixed_ph7.4.pdb.apo/trajectory.dcd
./BRAF-Dabrafenib/5CSW_fixed_ph7.4.pdb.apo/trajectory.dcd
./BRAF-Dabrafenib/5HIE_fixed_ph7.4.pdb.apo/trajectory.dcd
./BCRABL-Dasatinib/2GQG_fixed_ph7.4.pdb.ligand/trajectory.dcd
./BCRABL-Dasatinib/4XEY_fixed_ph7.4.pdb.ligand/trajectory.dcd
./BCRABL-Dasatinib/2GQG_fixed_ph7.4.pdb.apo/trajectory.dcd
./BCRABL-Dasatinib/4XEY_fixed_ph7.4.pdb.apo/trajectory.dcd
./BCRABL-Imatinib/2HYY_fixed_ph7.4.pdb.ligand/trajectory.dcd
./BCRABL-Imatinib/3PYY_fixed_ph7.4.pdb.ligand/trajectory.dcd
./BCRABL-Imatinib/2HYY_fixed_ph7.4.pdb.apo/trajectory.dcd
./BCRABL-Imatinib/3PYY_fixed_ph7.4.pdb.apo/trajectory.dcd
./VEGFR1-Regorafenib/2QU5_fixed_ph7.4.pdb.ligand/trajectory.dcd
./VEGFR1-Regorafenib/2QU5_fixed_ph7.4.pdb.apo/trajectory.dcd

Streamlining protein:ligand constant-pH simulation

Right now, there are a lot of manual steps involved in setting up a constant-pH simulation.

I've put the instructions from @bas-rustenburg on setting up a kinase:inhibitor simulation on this wiki page as an example:
https://github.com/choderalab/protons/wiki/Setting-up-a-kinase:inhibitor-constant-pH-simulation

@peastman: As we move toward getting this code feature-ready, we would love to make it easier for users to set up these simulations. Can you take a look at the setup steps and help us brainstorm how we might be able to streamline this?

For example, we might be able to add features to PDBFixer or app.Modeller to help the user rename their residues to titratable forms, but we presumably also want to give them flexibility in specifying which residues should be allowed to titrate.

@bas-rustenburg : Let's brainstorm this further when you return from vacation this week.

Don't need to pass ffxml file to constructor

It looks like the ForceField object doesn't cache the XML files, so we will need to either request that this be added to OpenMM or you would need to register a parser.

OpenMM API for C++ layer

This issue will eventually hold a discussion about the API to be implemented in the OpenMM C++ layer, but for now, I am just referencing the prior discussion on the OpenMM GitHub repo:

openmm/openmm#172

Ideas for increasing acceptance rates

Marilyn Gunner suggests we can propagate just the first few solvation shells (or atoms in a radius around the residue that is changing protonation states) during NCMC to increase acceptance rates.
We could get a lot of mileage out of more clever schemes for proposing which residues (or groups of residues) should have their protonation states modified
@pgrinaway suggests using particle filtering (with resample-move) as a way of refreshing the entire protonation state periodically
Marilyn Gunner also suggests we could use MCCE to enumerate the O(100) populated coordinated protonation states and try perturbations to these in parallel
Better integrators might support larger timesteps and fewer overall protocol steps
Better nonequilibrium protocols could also improve acceptance rates

Todo list

Brainstorming things to do before we can start using this code more reliably.
Tagging @jchodera @gregoryross, feel free to add points by editing (assuming you can) or replying.

Lower priority/ extra features:

pKa biasing
Selection of residue pairs for simultaneous (de)protonation
Getting rid of cpin files
- Protein constant-pH ffxml?

Good ways to handle atom types between protonation states

At some point in the future, we should come up with a preferred way of dealing with atom types. For now, a low priority since we have a working solution.

At the moment we want to avoid changing van der Waals and bonded parameters between different protonation states of ligands.

I recently implemented a simple algorithm to resolve these, which uses the most populated protonation state as the initial set of atom types and adds atom types from subsequent states to fill in the missing protons. It then checks for the atoms that lack bond parameters and swaps the atom type of one of the two atoms in the bond to an atom type from another state that does have bonded parameters. See protons/ligands.py for the implementation.
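
The first half of that algorithm (seeding from the most populated state and filling in missing atoms) can be sketched as follows. The data layout and function name are assumptions made for illustration; the bond-parameter repair step is not covered, and protons/ligands.py remains the reference for the actual behavior.

```python
def merge_atom_types(states):
    """Build one atom-type map from several protonation states.

    states: list of (population, {atom_name: atom_type}) tuples.
    The most populated state supplies the initial types; atoms that only
    exist in other states (extra protons) are filled in afterwards.
    """
    ordered = sorted(states, key=lambda item: item[0], reverse=True)
    merged = {}
    for _population, atom_types in ordered:
        for atom_name, atom_type in atom_types.items():
            # Keep the type from the more populated state if one is already set.
            merged.setdefault(atom_name, atom_type)
    return merged
```

For example, an atom present in every state keeps the type it has in the dominant state, while a proton that exists only in a minor state takes its type from that state.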

In the future we may want to consider other options as well, such as

A new scheme creating new hybrid atom types or new bond parameters to describe the molecule.
Updating van der Waals parameters as well (whilst still requiring bonds to be fixed to a single set).
Updating vdW parameters and bonds.
Other options?

Create mechanism of checkpointing constant-pH simulation

We currently have no straightforward way to continue simulations after termination. It would be helpful to add a feature to instantiate the ProtonDrive and Sams sampler from a previous simulation.

This would include:

Saving the protonation states, and (potentially updated) weights of each state.
Sams iteration numbers for resuming calibrations.
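
A minimal sketch of what such a checkpoint could hold, using JSON for readability. The file layout, key names, and function names are assumptions; restoring a ProtonDrive or SAMS sampler from this dictionary is not shown.

```python
import json


def save_checkpoint(path, protonation_states, state_weights, sams_iteration):
    """Write the minimal restart information to a JSON file."""
    payload = {
        "protonation_states": list(protonation_states),
        "state_weights": [list(weights) for weights in state_weights],
        "sams_iteration": int(sams_iteration),
    }
    with open(path, "w") as handle:
        json.dump(payload, handle)


def load_checkpoint(path):
    """Read back the dictionary written by save_checkpoint."""
    with open(path) as handle:
        return json.load(handle)
```

The same three items could equally be stored as variables in the existing netCDF files rather than in a separate file.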

Brainstorm simulation API proposal

So we can discuss it here:

from __future__ import print_function
from simtk.openmm import app
# TODO  from constph import ConstpHForceField ?
import simtk.openmm as mm
from simtk import unit
from sys import stdout

pdb = app.PDBFile('input.pdb') # preprocessed with right residue names to indicate constph?

forcefield = app.ConstpHForceField('amber99cph.xml', 'tip3p.xml', 'ligandcph.xml') # subclass of forcefield that supports custom format residues

system = forcefield.createSystem(pdb.topology,
    nonbondedMethod=app.PME, nonbondedCutoff=1.0*unit.nanometers,
    constraints=app.HBonds, rigidWater=True, ewaldErrorTolerance=0.0005,
    cph_indices=None, ph=7.4)
# cph_indices, if None, all that can be matched set up as constant ph, else, list of indices
integrator = mm.LangevinIntegrator(300*unit.kelvin, 1.0/unit.picoseconds, 2.0*unit.femtoseconds)
integrator.setConstraintTolerance(0.00001)

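# TODO define compound integrator here too, or leave for simulation?
ncmc_timestep = 1.0*unit.femtoseconds  # placeholder value; not fixed by this proposal
ncmcintegrator = VelocityVerletIntegrator(ncmc_timestep)  # e.g. from openmmtools.integrators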
system.addForce(mm.MonteCarloBarostat(1*unit.atmospheres, 300*unit.kelvin, 25))

platform = mm.Platform.getPlatformByName('CUDA')
properties = {'CudaPrecision': 'mixed'}
# compound integrator under the hood, hide system update under the hood
simulation = app.Simulation(pdb.topology, system, {'md': integrator, 'ph': ncmcintegrator}, platform, properties)
simulation.context.setPositions(pdb.positions)

print('Minimizing...')
simulation.minimizeEnergy()
simulation.context.setVelocitiesToTemperature(300*unit.kelvin)

simulation.reporters.append(app.DCDReporter('trajectory.dcd', 1000)) # modify this to only write out active protons?
# report protonation states
simulation.reporters.append(app.StateDataReporter(stdout, 1000, step=True,
    potentialEnergy=True, temperature=True, progress=True, remainingTime=True,
    speed=True, totalSteps=1000, protonationStates=True, separator='\t'))

print('Running Production...')
simulation.calibrate(10000, ph_every=(100,1)) # 10000 cph calibration steps, updating ref energies
simulation.step(1000000, ph_every=(6000, 1)) # run mc step every 6000 md steps, 1 attempt
print('Done!')

Dealing with velocities in ncmc

As @gregoryross points out, we have to reset velocities to original values upon rejecting. Currently, we only change their sign on acceptance.

Handle syn- and anti- protons for carboxylic acids in ligands

We currently don't have a way to handle syn- and anti- protons for carboxylic acids in ligands.

The octa-acids may need to have this feature.

Latest cpinutil.py and explicit solvent messages?

I realize we're using a possibly-outdated version of cpinutil.py for our current tests.

@swails: Where should we get the most up-to-date version of this tool?

Also, I'm wondering what this warning means:

Warning: Carboxylate residues in explicit  solvent simulations require a modified
topology file! Use the -op flag to print one.

Add documentation for setting up a system using tleap

Document how to use tleap and cpinutil.py to set up a constant-pH simulation.

Wisdoms, Part 2; Protonation exchange attempts

This part deals with the protonation state change strategy that yields the most efficient sampling of the total semi-grand ensemble.

I've thought about this quite a bit, and I think Amber's approach is about as good as you can do. To lay the background, Amber attempts to change a random residue's protonation state to a random (but different) state. Unless two residues are coupled together energetically, attempting two protonation state changes at the same time will hurt the probability of accepting the move.
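
The proposal step described here (one random residue, one random different state) can be sketched in a few lines. This is an illustration with hypothetical names, not Amber's code, and it assumes every residue has at least two states.

```python
import random


def propose_state_change(current_states, n_states, rng=random):
    """Pick one random residue and a random state different from its current one.

    current_states: list of current state indices, one per titratable residue.
    n_states: list with the number of states available to each residue.
    """
    residue = rng.randrange(len(current_states))
    candidates = [s for s in range(n_states[residue]) if s != current_states[residue]]
    return residue, rng.choice(candidates)
```

Excluding the current state from the candidates means no trial is spent on a move that changes nothing.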

I'm pretty sure the maximum acceptance probability for the reference compound will occur when the pH is set to the pKa of the residue. At this point, the model CYS residue (which has 2 protonation states) has a 45% acceptance rate. Titrating two independent CYS residues will drop that to <25%. (This code already takes the single-residue approach.) One difference in Amber is that the multi-state move is attempted with a random residue in the pairlist of the first chosen residue. I don't think this makes too much of a difference from your approach, but for a large number of residues in a large system it might.
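
As a rough check on those numbers, a naive estimate treats the two single-residue acceptances as independent events and multiplies them. This is a simplification for illustration (the true joint acceptance depends on the summed energy change), and the function name is hypothetical.

```python
def joint_acceptance(single_rates):
    """Naive joint acceptance if each residue's change were accepted independently."""
    joint = 1.0
    for rate in single_rates:
        joint *= rate
    return joint


# Two model CYS residues, each at 45% acceptance when pH = pKa.
print(round(joint_acceptance([0.45, 0.45]), 4))  # prints 0.2025
```

The estimate of about 20% is consistent with the "<25%" figure quoted above.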

I also think that only attempting one protonation state change per MC attempt is the best way to go. My argument for this point is not that many change attempts will hurt, just that the simulation becomes less efficient doing that. In the process of computing the energy of a protonation state, you also get the forces essentially for free. With those forces in hand, there is little cost in advancing the simulation, and this way you utilize all of the energy calculations for the different trial states. In my experience, even in typical proteins with 10-20+ titratable residues, protonation state sampling occurs so rapidly that I've always focused on trying to improve conformational sampling. Most papers that focus on improving CpHMD focus on enhancing conformational sampling for this reason.