reactionmechanismgenerator / rmg-database

The database of chemical parameters used with Reaction Mechanism Generator

Home Page: http://rmg.mit.edu/database/

Python 98.52% Shell 0.01% Jupyter Notebook 1.41% PostScript 0.06%

rmg-database's People

Contributors

rwest shamelmerchant pierrelb faribas sean-v8 calebclass comocheng bslakman bbuesser jbarlow3 nyee vrlambert keceli enochd agvandeputte connie jabrownmit kehang alaraen cainja nickvandewiele chatelak olusade lin82 mliu49 goldmanm dgreen18 alui1 cfgoldsmith alongd yunsiechung rgillis8 iyhsieh aelong cgrambow pengzhang13 zjburas lyle-zhang soumsrani adeeljamal wangyuran nateharms gitter-badger jimchu10 harlock0083 fredwbacon phalgunlolur mjohnson541 awaggett yplitw wayneyann pattanaikl mcs16 ajocher sarahkha wdhua2008 xiaoruidong hiroumitani davidfarinajr kblondal gyg16 amarkpayne lily90502 yfyh2013 cookman2019 oscarwumit mingyuwan pw0908 hwpang mbprend ericwein chrisbneu skrsna ramirjos solmazptb sevyharris michaelgeuking toyegoke iamjiazhang jonwzheng marks72 nickdewey jeaersse jcquim071 rvkmr1989 venoosam maiti2 yeseulchoiau danielokuo tranlucsy ehguzman raschwind grigorevae tcbbcc pk-organics taikikato skethirajan phantom-balance vbarber820 arun1pal

rmg-database's Issues

Be careful while typing SMILES for molecules on the dev website

I am opening this issue to make people aware that, with the new adjlist changes, you need to write correct SMILES to get the molecule you want.

For example if you want CH2=CHNH2

You cannot type CH2=CH2N ... you will get some random molecule
http://tinyurl.com/lhu8sme

The correct way to type this molecule in the website is C=CN
http://tinyurl.com/lwmmb3t

Aaron and I spent some time confused by this today and thought it better that other people are aware.

Conversion from abraham parameters for solvent to coefficients in RMG

Is there any documentation from @ajalan on how he converted the solvent parameters from the LSER database into the parameters stored in the RMG libraries? For solutes it is easy, since the parameters are just used as they are, but not for solvents.

It would be worth documenting this at some point. Or maybe something already exists and I did not find it.

Do you have any information about it? @rwest @bslakman @connie

Duplicate reactions

There are many duplicate reactions in the Chemkin file RMG-Py generates (chem.inp), amounting to about 10%-20% of the total reactions in my experience. The following is a representative example from my trials:

! Reaction index: Chemkin #56; RMG #556
! Library reaction: Nitrogen_Glarborg_Lucassen_et_al
NCC(1)+CH3(28)=C2H6N(93)+CH4(37) 7.300e+02 2.990 7.950

! Reaction index: Chemkin #60; RMG #1712
! Template reaction: H_Abstraction
! Estimated using template (C_sec;C_methyl) for rate rule (C/H2/NonDeN;C_methyl)
! Multiplied by reaction path degeneracy 2
CH3(28)+NCC(1)=CH4(37)+C2H6N(93) 9.203e-03 4.351 3.987

In the above example you can see two duplicate reactions (#56, #60), the first taken from a defined kinetic library, whereas the second was generated via the kinetic rules. Needless to say, Chemkin will not run unless one of the reactions is deleted or both are marked with "DUP".

At other times in the same file RMG gives a duplicate reaction with kinetic data taken from two different libraries, instead of using by default the earliest defined library:

! Reaction index: Chemkin #19; RMG #480
! Library reaction: combustion_core/version5
H(10)+H(10)+M=H2(11)+M 1.870e+18 -1.000 0.000
CH4(37)/3.00/ N2(3)/0.40/ C2H6(84)/3.00/ H2(11)/0.00/ Ar/0.35/
DUPLICATE

! Reaction index: Chemkin #21; RMG #1641
! Library reaction: Nitrogen_Glarborg_Gimenez_et_al
H(10)+H(10)+M=H2(11)+M 7.000e+17 -1.000 0.000
H2(11)/0.00/ N2(3)/0.00/
DUPLICATE

At least in this example it wrote "DUP" after the reactions; nevertheless, the second reaction should have been suppressed.

I'm aware that similar issues were brought up and solved in the past (#49, #83, #146, #334, #337).
Somehow the phenomenon returned.

Any suggestions?
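A quick way to list such duplicates is to compare reactions by their reactant and product sets, ignoring species order and direction. The following is a minimal sketch, not RMG-Py code: the function names are hypothetical, and it assumes each reaction line starts with the equation followed by three Arrhenius parameters, as in the excerpts above.

```python
import re
from collections import defaultdict


def reaction_key(equation):
    """Return a key that ignores species order and reaction direction."""
    equation = equation.replace("(+M)", "")  # drop pressure-dependence markers
    left, right = re.split(r"<=>|=>|=", equation, maxsplit=1)
    sides = [tuple(sorted(side.split("+"))) for side in (left, right)]
    return frozenset(sides)


def find_duplicates(chemkin_text):
    """Group reaction equations that describe the same reaction."""
    seen = defaultdict(list)
    for line in chemkin_text.splitlines():
        line = line.split("!", 1)[0].strip()  # strip Chemkin comments
        tokens = line.split()
        # Assumed layout: equation, A, n, Ea (auxiliary lines contain no "=")
        if len(tokens) >= 4 and "=" in tokens[0]:
            seen[reaction_key(tokens[0])].append(tokens[0])
    return [equations for equations in seen.values() if len(equations) > 1]


sample = """
! Library reaction: Nitrogen_Glarborg_Lucassen_et_al
NCC(1)+CH3(28)=C2H6N(93)+CH4(37) 7.300e+02 2.990 7.950
! Template reaction: H_Abstraction
CH3(28)+NCC(1)=CH4(37)+C2H6N(93) 9.203e-03 4.351 3.987
"""
for group in find_duplicates(sample):
    print(group)
```

Running it over chem.inp would only report candidates; whether to delete one entry or mark both as DUPLICATE is still a judgment call.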

Biradicals 2S and 2T thermo and reactivity

Noticed this when importing USC-Mech-ii.
RMG predicted some reactions that matched USC-Mech reactions, and created a H2CC species:

1 C 0  {2,D}
2 C 2T {1,D}

But the USC thermo is for the singlet:

1 C 0  {2,D}
2 C 2S {1,D}

Which is much more stable.
The DFT_QCI_Thermo database contains both the singlet and the triplet.
I am not sure what the reaction should make.

We should probably have a state-crossing reaction, a bit like the 1,2-Birad_to_alkene family, that reacts H2CC(T) + M <=> H2CC(S) + M.

There may be other issues though - I'm not sure when RMG makes a 2T and when a 2S, nor am I sure if they are equally reactive in terms of matching nodes in reaction families.

got an unexpected keyword argument 'rank'

I updated to the RMG-database version 724c086 (running RMG v1.0.3), and now all my runs crash while loading the thermo group database:

Loading thermodynamics library from thermo_DFT_CCSDTF12_BAC.py in /home/alongd/ws/RMG-database/input/thermo/libraries...
Loading thermodynamics library from CHN.py in /home/alongd/ws/RMG-database/input/thermo/libraries...
Loading thermodynamics library from DFT_QCI_thermo.py in /home/alongd/ws/RMG-database/input/thermo/libraries...
Loading thermodynamics library from primaryThermoLibrary.py in /home/alongd/ws/RMG-database/input/thermo/libraries...
Loading thermodynamics library from 1Cu4.py in /home/alongd/ws/RMG-database/input/thermo/libraries...
Loading thermodynamics group database from /home/alongd/ws/RMG-database/input/thermo/groups...
Error: Error while reading database '/home/alongd/ws/RMG-database/input/thermo/groups/ring.py'.
Traceback (most recent call last):
  File "/home/alongd/ws/RMG-Py/rmg.py", line 165, in <module>
    rmg.execute(inputFile, output_dir, **kwargs)
  File "/home/alongd/ws/RMG-Py/rmgpy/rmg/main.py", line 514, in execute
    self.initialize(inputFile, output_directory, **kwargs)
  File "/home/alongd/ws/RMG-Py/rmgpy/rmg/main.py", line 380, in initialize
    self.loadDatabase()
  File "/home/alongd/ws/RMG-Py/rmgpy/rmg/main.py", line 303, in loadDatabase
    depository = False, # Don't bother loading the depository information, as we don't use it
  File "/home/alongd/ws/RMG-Py/rmgpy/data/rmg.py", line 94, in load
    self.loadThermo(os.path.join(path, 'thermo'), thermoLibraries, depository)
  File "/home/alongd/ws/RMG-Py/rmgpy/data/rmg.py", line 114, in loadThermo
    self.thermo.load(path, thermoLibraries, depository)
  File "/home/alongd/ws/RMG-Py/rmgpy/data/thermo.py", line 382, in load
    self.loadGroups(os.path.join(path, 'groups'))
  File "/home/alongd/ws/RMG-Py/rmgpy/data/thermo.py", line 422, in loadGroups
    self.groups['ring']    =    ThermoGroups(label='ring').load(os.path.join(path, 'ring.py'   ), self.local_context, self.global_context)
  File "/home/alongd/ws/RMG-Py/rmgpy/data/base.py", line 230, in load
    exec f in global_context, local_context
  File "/home/alongd/ws/RMG-database/input/thermo/groups/ring.py", line 27, in <module>
    rank = 10,
TypeError: loadEntry() got an unexpected keyword argument 'rank'
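The traceback shows ring.py passing a `rank` keyword that this version of `loadEntry()` does not accept. The mismatch can be reproduced in isolation with a stand-in function. This is only an illustration: `load_entry_old` and its parameter list are hypothetical and are not the actual RMG-Py signature.

```python
def load_entry_old(index, label, group, thermo, shortDesc="", longDesc=""):
    """Hypothetical loader that predates the 'rank' field."""
    return {"index": index, "label": label}


# Keywords as a newer database file might supply them (values are placeholders).
entry_kwargs = dict(index=1, label="Ring", group="placeholder", thermo=None, rank=10)

try:
    load_entry_old(**entry_kwargs)
    message = ""
except TypeError as error:
    message = str(error)

print(message)
```

If that reading is right, the database checkout is newer than the installed RMG-Py, and the two need to be brought to matching versions.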

Reaction libraries error: too many values to unpack

When I tried running input.py with reaction libraries (reactionLibraries = ['ERC-FoundationFuelv0.9']), I got the error:

Traceback (most recent call last):
File "/Users/phillipwestmoreland/anaconda/bin/rmg.py", line 147, in <module>
rmg.execute(args)
File "/Users/phillipwestmoreland/anaconda/lib/python2.7/site-packages/rmgpy/rmg/main.py", line 462, in execute
self.initialize(args)
File "/Users/phillipwestmoreland/anaconda/lib/python2.7/site-packages/rmgpy/rmg/main.py", line 354, in initialize
self.loadDatabase()
File "/Users/phillipwestmoreland/anaconda/lib/python2.7/site-packages/rmgpy/rmg/main.py", line 276, in loadDatabase
reactionLibraries = [library for library, option in self.reactionLibraries],
ValueError: too many values to unpack

When I took out 'ERC-FoundationFuelv0.9', it ran okay.
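The failing line in the traceback unpacks every item of `reactionLibraries` into two names, so a bare string is split character by character. A minimal reproduction, assuming each entry is expected to be a `(name, flag)` tuple (the meaning of the flag is an assumption here, and `library_names` is a hypothetical helper that only copies the list comprehension from the traceback):

```python
def library_names(reaction_libraries):
    """Same unpacking as the list comprehension in the traceback."""
    return [library for library, option in reaction_libraries]


# Assumed valid form: a list of (library name, flag) tuples.
print(library_names([("ERC-FoundationFuelv0.9", False)]))

# A bare string has more than two characters, so the unpacking fails.
try:
    library_names(["ERC-FoundationFuelv0.9"])
except ValueError as error:
    print("ValueError:", error)
```

If this is the cause, writing the entry as a tuple in input.py should let the library load.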

importChemkinLibrary not user friendly

Most users won't go into rmgpy/chemkin.py and add print statements to debug formatting errors in their Chemkin mechanism or mismatches with what RMG expects. It would be nice to report the offending species, reaction, thermo entry, etc., to make this tool easier to use.
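One way to get there is to wrap each parsing step and re-raise with the line number and the offending text. A minimal sketch of the idea follows; `ChemkinParseError`, `parse_reaction_line`, and `parse_reactions` are hypothetical names and are not functions in rmgpy/chemkin.py.

```python
class ChemkinParseError(Exception):
    """Raised with enough context to locate the offending input line."""


def parse_reaction_line(line):
    """Toy parser: an equation followed by A, n, Ea."""
    tokens = line.split()
    if len(tokens) < 4:
        raise ValueError("expected an equation and three Arrhenius parameters")
    A, n, Ea = (float(token) for token in tokens[-3:])
    return tokens[0], A, n, Ea


def parse_reactions(lines):
    """Parse all lines, reporting where and on what text a failure occurred."""
    results = []
    for number, line in enumerate(lines, start=1):
        try:
            results.append(parse_reaction_line(line))
        except ValueError as error:
            raise ChemkinParseError(
                "line {0}: {1!r}: {2}".format(number, line, error)
            ) from error
    return results


try:
    parse_reactions(["H+O2=OH+O 1.0e14 0.0 16.0", "H+H+M=H2+M 1.87e18 -1.0"])
except ChemkinParseError as error:
    print(error)
```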

Handling CO in RMG-Py

With recent code changes in RMG-Py we now form both triplet CO and singlet CO in our mechanisms.

To explain what is happening, we first need to note the legacy way of handling CO.

In RMG-Java, CO is represented as CO(T), even though CO(S) is the more stable electronic state. We also assigned CO(S) thermo to CO(T), for example in the DFT-QCI thermo library. The seed mechanisms also have CO(T), and I think the families produce only CO(T). Hence we were chemically wrong but consistent throughout, which I think is OK.

In RMG-Py, some libraries are imported from RMG-Java and have CO(T), while some nitrogen-specific libraries have CO(S). The families in RMG-Py have no restriction and produce both CO(T) and CO(S). If you were to load DFT-QCI as a thermo library now, you would assign CO(S) thermo to CO(T). This hodgepodge can lead to strange mechanisms being formed; we should treat CO correctly and consistently.

Looks like there is a typo in rate rules file

Location: RMG-database\input\kinetics\families\H_Abstraction\rules.py
Line: 2038 / Entry 205
"E0 = (1.45, 'kcal/mol')"

I think it should be negative: -1.45.

I checked the source literature.
It looks like the parameters for this entry are taken from the rate for i-Butane T000 site.
They were written as logA = 3.41, n = 1.9, B = -730 in the original paper.
Then B * 1.987 / 1000 = -1.45 kcal/mol.
So maybe it's a typo in the rate rules file?
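The conversion above can be checked in a few lines. This sketch assumes that B is a temperature in K and that the paper's rate form makes the activation energy equal to B times the gas constant, which is how the issue reads it; the function name is hypothetical.

```python
R_CAL = 1.987  # gas constant in cal/(mol*K), the value used in the issue


def b_to_kcal_per_mol(b_kelvin):
    """Convert B (in K) to an energy in kcal/mol."""
    return b_kelvin * R_CAL / 1000.0


e0 = b_to_kcal_per_mol(-730.0)
print(round(e0, 2))
```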

USC-Mech-ii thermo library has completely wrong adjlists

https://github.com/GreenGroup/RMG-database/blob/master/input/thermo/libraries/USC-Mech-ii.py

Anyone want to help fix them?

Incorrect indexes

There's an index attribute for all entries in training reactions, kinetics groups and libraries, and thermo groups and libraries. The original purpose was just for referencing: it's easier to point somebody to index number 60 than to a possibly long string title. Additionally, these indexes originally matched up with the order in Java.

Currently, we know that they no longer match Java, and there are accidental duplicates. It is also inconvenient to pick a non-duplicate and non-arbitrary index when adding new entries. It seems like there are two options to clean these up:

1. Eliminate the attribute altogether. I have personally never used an index number to reference it to another person, and there's an argument that the string label is easy enough to copy and paste.
2. Write a script to clean up indexes in the files. If we do this, we can maintain the original function, but it would be a nice feature to put the largest index on the top of every file.

I'm okay with either of these options. How do other people feel about this?
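For the second option, finding the accidental duplicates is the easy part. A minimal sketch, assuming entries are written with one `index = N,` line each as in the database files (the function name is hypothetical, and this only reports duplicates without renumbering anything):

```python
import re
from collections import Counter

# Matches lines such as "    index = 551," at the start of a line.
INDEX_PATTERN = re.compile(r"^\s*index\s*=\s*(-?\d+)\s*,", re.MULTILINE)


def duplicate_indexes(file_text):
    """Return the sorted index values that occur more than once."""
    counts = Counter(int(value) for value in INDEX_PATTERN.findall(file_text))
    return sorted(index for index, count in counts.items() if count > 1)


sample = """
entry(
    index = 551,
    label = "A",
)
entry(
    index = 552,
    label = "B",
)
entry(
    index = 551,
    label = "C",
)
"""
print(duplicate_indexes(sample))
```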

H + CO2 <=> [O]C=O has two occurrences under R_Addition_MultipleBond

In the Chemkin output file I get the following duplicate reactions for H + CO2 <=> [O]C=O

! Reaction index: Chemkin #551; RMG #11336
! PDep reaction: PDepNetwork #1126
! Flux pairs: H(5), CHO2(3053); CO2(30), CHO2(3053); 
H(5)+CO2(30)(+M)=CHO2(3053)(+M)                     1.000e+00 0.000     0.000    
    TCHEB/ 1000.000  1600.000 /
    PCHEB/ 0.493     2.961    /
    CHEB/ 6 4/
    CHEB/ 9.976e+00    3.654e-01    -1.313e-03   -5.167e-05  /
    CHEB/ 2.585e-01    9.412e-03    4.366e-04    1.175e-05   /
    CHEB/ 5.590e-03    -1.645e-04   -2.254e-06   4.709e-07   /
    CHEB/ 7.225e-04    -7.087e-06   1.669e-07    1.970e-08   /
    CHEB/ 2.234e-05    -1.396e-06   1.525e-08    -6.190e-09  /
    CHEB/ -1.768e-06   -2.518e-07   -8.667e-09   -1.136e-10  /
DUPLICATE

! Reaction index: Chemkin #557; RMG #13857
! PDep reaction: PDepNetwork #1639
! Flux pairs: CHO2(3053), H(5); CHO2(3053), CO2(30); 
CHO2(3053)(+M)=H(5)+CO2(30)(+M)                     1.000e+00 0.000     0.000    
    TCHEB/ 1000.000  1600.000 /
    PCHEB/ 0.493     2.961    /
    CHEB/ 6 4/
    CHEB/ 8.885e+00    3.654e-01    -1.313e-03   -5.167e-05  /
    CHEB/ 2.827e-01    9.412e-03    4.366e-04    1.175e-05   /
    CHEB/ 2.605e-03    -1.645e-04   -2.254e-06   4.709e-07   /
    CHEB/ 4.510e-04    -7.087e-06   1.669e-07    1.970e-08   /
    CHEB/ -8.878e-06   -1.396e-06   1.525e-08    -6.190e-09  /
    CHEB/ -5.664e-06   -2.518e-07   -8.667e-09   -1.136e-10  /
DUPLICATE

From the PDepNetwork files I get respectively:

reaction(
    label = 'reaction1',
    reactants = ['H(5)', 'CO2(30)'],
    products = ['[O]C=O(1169)'],
    transitionState = 'TS1',
    kinetics = Arrhenius(A=(16820,'m^3/(mol*s)'), n=1.23, Ea=(52.7723,'kJ/mol'), T0=(1,'K'), comment="""Estimated using template (Cdd_Od;HJ) for rate rule (CO2;HJ)
Multiplied by reaction path degeneracy 2
Ea raised from 48.7 to 52.8 kJ/mol to match endothermicity of reaction."""),
)

reaction(
    label = 'reaction1',
    reactants = ['H(5)', 'CO2(30)'],
    products = ['O=[C]O(3053)'],
    transitionState = 'TS1',
    kinetics = Arrhenius(A=(0.7734,'m^3/(mol*s)'), n=2.941, Ea=(35.6477,'kJ/mol'), T0=(1,'K'), comment="""Estimated using template (Od_R;HJ) for rate rule (Od_Cdd-Od;HJ)
Multiplied by reaction path degeneracy 2."""),
)

For the first reaction, CO2 is recognized as Cdd_Od, and for the second as Od_Cdd-Od, both in R_Addition_MultipleBond (unrelated: could be nice to have the family name integrated into the PDepNetwork comments, see ReactionMechanismGenerator/RMG-Py#605)

From my understanding, this approximately doubles the reaction rate as both duplicate reactions are accounted for when generating the flux.

PDep reactions in Reaction Libraries have units error

See ReactionMechanismGenerator/RMG-Py#96 for details.

But basically,

        arrheniusHigh = Arrhenius(A=(2.1e+18,"s^-1"), n=-0.6148, Ea=(92540,"cal/mol"), T0=(1,"K")),
        arrheniusLow = Arrhenius(A=(2.6e+49,"s^-1"), n=-8.8, Ea=(101500,"cal/mol"), T0=1),

the arrheniusLow should have units of cm3/mol/s.

This results in a 10^6 error in the resulting rate once it's been through RMG-Py.
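The factor comes from the volume unit: 1 m^3 is 10^6 cm^3. The sketch below assumes the low-pressure value was meant in cm^3/(mol*s) but ends up being treated as if it were already in m^3/(mol*s). That is one reading of the linked issue, not a statement about RMG-Py internals, and the helper name is hypothetical.

```python
CM3_PER_M3 = 1.0e6


def cm3_to_m3(a_cm3):
    """Convert a second-order A factor from cm^3/(mol*s) to m^3/(mol*s)."""
    return a_cm3 / CM3_PER_M3


intended = cm3_to_m3(2.6e49)  # value if the cm^3 units had been declared
misread = 2.6e49              # value if the number is taken as m^3/(mol*s)
ratio = misread / intended
print("{0:.3g}".format(ratio))
```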

Classifying the NO2 + OH <=> HNO3 reaction

The reaction NO2 + OH <=> HNO3 does not currently fall into any of the reaction families.
It might resemble R_Addition_MultipleBond, but in this case we are still left with the double bond in the product due to charge separation in the resonance structure, and the reaction results in decrementing the nitrogen's lone pair.

Searching for this reaction in the web kinetics search brings up two library reactions with identical coefficients (Nitrogen_Glarborg_Zhang_et_al no. 713, Nitrogen_Glarborg_Gimenez_et_al no. 937), but these libraries won't always be put to work in nitrogen model generations.

I guess we should create a new reaction family for it.

(message edited, original message with trace moved to issue #145 )

'kineticsGroups.py generate' dies with KeyError: 'HOCH[OO]CH3'

When I run python kineticsGroups.py generate I get:

Categorizing reactions in training and test sets for HO2_Elimination_from_PeroxyRadical
Traceback (most recent call last):
  File "kineticsGroups.py", line 253, in <module>
    args.run(args, database)
  File "kineticsGroups.py", line 208, in generate
    plot = False,
  File "kineticsGroups.py", line 55, in generateKineticsGroupValues
    reaction, template = database.kinetics.getForwardReactionForFamilyEntry(entry=entry, family=family, thermoDatabase=database.thermo)
  File "/Users/rwest/XCodeProjects/RMGpy/RMG-Py/rmgpy/data/kinetics.py", line 2670, in getForwardReactionForFamilyEntry
    template = [groups.entries[label] for label in entry.label.split(';')]
KeyError: 'HOCH[OO]CH3'

Phenoxy thermo problems?

See ReactionMechanismGenerator/RMG-Java#287 for details.
I am cross-posting here in case it is a general database issue not an RMG-Java problem.

Parent/child nodes written as siblings for thermo

Because of the confusing issue of overlapping children, it looks like we have never written many checks about sibling nodes. In groups.py (and possibly other trees in thermo and kinetics) we have some nodes in the following pattern:

L1. Grandparent
L2. Parent
L2. Child

In the above case, the node child is completely inaccessible, because a molecule will always match parent first. It should probably be written as:
L1. Grandparent
L2. Parent
L3. Child

I think it should be simple to write a script to correct this. I'll just load in the database, search for cases like above, then change the attributes node.children and node.parents appropriately, and finally re-save the database. Please let me know if you think there will be any other complications.
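The search-and-reparent step could look roughly like this. It is a toy sketch: real RMG groups would need a subgraph match to decide whether one node is a special case of another, so a set of constraint labels with strict-subset comparison stands in for that test, and all names are hypothetical.

```python
def find_hidden_children(children, constraints):
    """Return (parent, child) sibling pairs where child is a strict special case of parent."""
    hidden = []
    for siblings in children.values():
        for a in siblings:
            for b in siblings:
                # b carries every constraint of a plus at least one more
                if a != b and constraints[a] < constraints[b]:
                    hidden.append((a, b))
    return hidden


def reparent(children, hidden):
    """Move each hidden child from the shared sibling list to under its true parent."""
    for parent, child in hidden:
        for siblings in list(children.values()):
            if parent in siblings and child in siblings:
                siblings.remove(child)
                children.setdefault(parent, []).append(child)
                break


children = {"Grandparent": ["Parent", "Child"]}
constraints = {
    "Grandparent": frozenset(),
    "Parent": frozenset({"C"}),
    "Child": frozenset({"C", "ring"}),
}
hidden = find_hidden_children(children, constraints)
reparent(children, hidden)
print(children)
```

Overlapping siblings that are not strict special cases of each other are left untouched, which is where the real complications would be.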

New families not passing new unit tests

The three new families are not passing the brand new database unit tests, causing the RMG-Py project to fail its Travis-CI test. (This is not surprising, as the long-established families were failing them until very recently)

@faribas do you think you could take a look, and either comment here or perhaps open a new pull request with some fixes? (from the GreenGroup/master branch)

You can run the tests by doing

$ cd $RMGpy
$ nosetests -v -d --nologcapture rmgpy/databaseTest.py

or just

$ python rmgpy/databaseTest.py

I am currently getting

test_kinetics_checkChildParentRelationships (__main__.TestDatabase) ... skipped 'WIP test failed: In Disproportionation family, group Y_rad_birad_trirad_quadrad is not a proper parent of its child Y_2centerbirad.'
test_kinetics_checkCorrectNumberofNodesInRules (__main__.TestDatabase) ... FAIL
test_kinetics_checkGroupsFoundInTree (__main__.TestDatabase) ... FAIL
test_kinetics_checkGroupsNonidentical (__main__.TestDatabase) ... FAIL
test_kinetics_checkNodesInRulesFoundInGroups (__main__.TestDatabase) ... FAIL

======================================================================
FAIL: test_kinetics_checkCorrectNumberofNodesInRules (__main__.TestDatabase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "rmgpy/databaseTest.py", line 35, in test_kinetics_checkCorrectNumberofNodesInRules
    self.assertEqual(len(nodes), expectedNumberNodes, "Wrong number of groups or semicolons in family {family} rule {entry}.  Should be {num_nodes}".format(family=family_name,entry=entry,num_nodes=expectedNumberNodes))
AssertionError: Wrong number of groups or semicolons in family Intra_R_Add_ExoTetCyclic rule R2OOR_SCO;Cs_rad_intra.  Should be 3

======================================================================
FAIL: test_kinetics_checkGroupsFoundInTree (__main__.TestDatabase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "rmgpy/databaseTest.py", line 59, in test_kinetics_checkGroupsFoundInTree
    self.assertTrue(ascendParent is not None, "Group {group} in {family} family was found in the tree without a proper parent.".format(group=child,family=family_name))
AssertionError: Group RHadd_intra in Intra_RH_Add_Endocyclic family was found in the tree without a proper parent.

======================================================================
FAIL: test_kinetics_checkGroupsNonidentical (__main__.TestDatabase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "rmgpy/databaseTest.py", line 74, in test_kinetics_checkGroupsNonidentical
    self.assertFalse(family.matchNodeToNode(nodeGroup, nodeGroupOther), "Group {group} in {family} family was found to be identical to group {groupOther}".format(group=nodeName, family=family_name, groupOther=nodeNameOther))
AssertionError: Group multiplebond_intra in Intra_RH_Add_Endocyclic family was found to be identical to group doublebond_intra

======================================================================
FAIL: test_kinetics_checkNodesInRulesFoundInGroups (__main__.TestDatabase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "rmgpy/databaseTest.py", line 46, in test_kinetics_checkNodesInRulesFoundInGroups
    self.assertTrue(node in family.groups.entries, "In {family} family, no group definition found for label {label} in rule {entry}".format(family=family_name, label=node, entry=entry))
AssertionError: In Intra_RH_Add_Endocyclic family, no group definition found for label double_bond_intra_Nd in rule R6_SSS;double_bond_intra_Nd;radadd_intra_O

----------------------------------------------------------------------
Ran 5 tests in 121.067s

FAILED (failures=4, skipped=1)

@connie, one disadvantage with the unit test framework (for debugging) is that it only tells you one error at a time per test then gives up on that test. As it takes 2 minutes to run the tests (on my old laptop) this could be tedious for fixing many errors. Is it still easy to run the script that gives all the errors at once?
For me, running

$ python databaseTester.py

doesn't report any problems.

Unable to generate HONO + OH <=> N(O)(O)[O] in R_Addition_MultipleBond

Running a nitrogen simulation in RMG without defining reaction libraries (to test the rate rules) resulted in an error (one of many):

Error: Unable to calculate degeneracy for reaction <Molecule "ON=O"> + <Molecule "[OH]"> <=> <Molecule "N(O)(O)[O]"> in reaction family R_Addition_MultipleBond. Expected 1 reaction but generated 0
Reactant: <Molecule "ON=O">
Reactant: <Molecule "[OH]">
Product: <Molecule "N(O)(O)[O]">
Traceback (most recent call last):
  File "C:\Code\RMG-Py\rmgpy\scoop_framework\util.py", line 112, in __call__
    return self.myfn(*args, **kwargs)
  File "C:\Code\RMG-Py\rmgpy\rmg\react.py", line 96, in reactMolecules
    rxns = family.generateReactions(molecules)
  File "C:\Code\RMG-Py\rmgpy\data\kinetics\family.py", line 1367, in generateReactions
    reactionList.extend(self.__generateReactions(reactants, forward=False))
  File "C:\Code\RMG-Py\rmgpy\data\kinetics\family.py", line 1587, in __generateReactions
    reaction.degeneracy = self.calculateDegeneracy(reaction)
  File "C:\Code\RMG-Py\rmgpy\data\kinetics\family.py", line 1386, in calculateDegeneracy
    return reactions[0].degeneracy
IndexError: list index out of range

Traceback (most recent call last):
  File "C:\Code\RMG-Py/rmg.py", line 152, in <module>
    rmg.execute(**kwargs)
  File "C:\Code\RMG-Py\rmgpy\rmg\main.py", line 533, in execute
    bimolecularReact=self.bimolecularReact)
  File "C:\Code\RMG-Py\rmgpy\rmg\model.py", line 621, in enlarge
    rxns = reactAll(self.core.species, numOldCoreSpecies, unimolecularReact, bimolecularReact)
  File "C:\Code\RMG-Py\rmgpy\rmg\react.py", line 155, in reactAll
    rxns = list(react(*spcTuples))
  File "C:\Code\RMG-Py\rmgpy\rmg\react.py", line 75, in react
    combos
  File "C:\Code\RMG-Py\rmgpy\scoop_framework\util.py", line 154, in map_
    return map(WorkerWrapper(args[0]), *args[1:], **kwargs)
  File "C:\Anaconda\envs\rmg_env\lib\site-packages\scoop-0.7.2.0-py2.7.egg\scoop\fallbacks.py", line 49, in wrapper
  File "C:\Code\RMG-Py\rmgpy\scoop_framework\util.py", line 112, in __call__
    return self.myfn(*args, **kwargs)
  File "C:\Code\RMG-Py\rmgpy\rmg\react.py", line 96, in reactMolecules
    rxns = family.generateReactions(molecules)
  File "C:\Code\RMG-Py\rmgpy\data\kinetics\family.py", line 1367, in generateReactions
    reactionList.extend(self.__generateReactions(reactants, forward=False))
  File "C:\Code\RMG-Py\rmgpy\data\kinetics\family.py", line 1587, in __generateReactions
    reaction.degeneracy = self.calculateDegeneracy(reaction)
  File "C:\Code\RMG-Py\rmgpy\data\kinetics\family.py", line 1386, in calculateDegeneracy
    return reactions[0].degeneracy
IndexError: list index out of range

The reaction HONO + OH <=> N(O)(O)[O] fits the R_Addition_MultipleBond family, yet no reaction was generated.

Searching for ON=O and OH in the kinetics search brings up only the addition reaction of the OH group to the O atom, not N. This is strange, since the right groups for the above reaction exist in this reaction family: N3d-NonDe_Od for ON=O, and OJ_pri for OH.

import/export: troe parameters losing precision

In kinetics_libraries/Dooley/C1/pdepreactions.txt, for example, there's a Troe T** that starts as 6964.00 in the published model, but by the time it has been imported and exported it becomes 7e+03. We should increase the precision of the output formatting (perhaps use g instead of e, and/or just more characters).
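The effect is easy to see with Python's format specifiers. Which specifier the exporter actually uses is not stated here; `.0e` is assumed only because it reproduces the 7e+03 reported above, and the function name is hypothetical.

```python
def format_variants(value):
    """Format one number with three candidate specifiers."""
    return {
        ".0e": "{:.0e}".format(value),  # one significant digit
        "g": "{:g}".format(value),      # up to six significant digits
        ".4e": "{:.4e}".format(value),  # five significant digits
    }


variants = format_variants(6964.00)
for spec, text in variants.items():
    print(spec, text)
```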

Problem with Forbidden Structure: "N_birad_singlet_2singleBonds"

This forbidden structure:
entry(
    label = "N_birad_singlet_2singleBonds",
    group =
"""
1 N u0 p1 {2,S} {3,S}
2 R ux {1,S}
3 R ux {1,S}
""",
    shortDesc = u"""""",
    longDesc =
u"""

""",
)

chokes on amines. From the title, I wonder if the issue occurs because it should read u2 instead of u0 in the connectivity diagram (i.e. biradical?)

Importing and exporting to java saves template for reverse reactions.

I believe this reverse template is generated at startup from the forwards template, for both Java and Python versions, and should not need to be saved in the database file.

This may in fact be a bug from importing the Java database, rather than exporting it. ( edit: I now think it's most likely an export problem). Either way, a round-trip from Java to Python to Java adds some entries to the dictionary that RMG-Java then cannot read.

eg. in R_Addition_MultipleBond/dictionary.txt, this appears:

YXZ.6
1 *1 Os        0 {2,S} {4,S}
2 *2 {Sid,SiO} 1 {1,S} {3,D}
3    Od        0 {2,D}
4 *3 R         0 {1,S}

Kinetics families that still need to be updated following Cd/CO/CS changes

While running a job, I found that the Intra_R_Add_Exocyclic family still needs to be updated following the changes in #107. RMG cannot match C=S at all.

This may already be on Nathan's todo list (@nyee ), but I'm putting it here so it's not forgotten.

radicals exported incorrectly to Java

In Py, the groups are flexible to take multiple radicals such as:

CO_birad
1 *1 C {2S,2T} {2,D}
2 O 0 {1,D}

Java can only handle one number for the radicals, but the exporting feature doesn't account for this. In addition, sometimes a weird radical set gets added to the Java version. See from 1,2 Insertion:

R_CO_R'1
1 *1 C {0,0} {2,D} {3,S} {4,S}
2 O 0 {1,D}
3 *2 {Cb,Sis,Sid,H,Cs,O,Cd} 0 {1,S}
4 *3 H 0 {1,S}

Ea in ROOH_sec;C_rad/H/NonDeC - Amrit Jalan's Thesis - Is the entry in the database correct?

The reaction ROOH + R. = ROO. + RH has an exact match in the RMG database, indicated with '[AJ]'.
It was suggested that this was calculated by Amrit Jalan - however reading his thesis, unless I am making a mistake, a positive activation energy is suggested while the database contains a negative activation energy? (Thesis link: https://dspace.mit.edu/handle/1721.1/91059 - page 195)

Entry in the database (same applies for the entry above with ROOH_pri):

entry(
    index = 551,
    label = "ROOH_sec;C_rad/H/NonDeC",
    kinetics = ArrheniusEP(
        A = (2.51e-11, 'cm^3/(mol*s)'),
        n = 6.77,
        alpha = 0,
        E0 = (-8.6, 'kcal/mol'),
        Tmin = (500, 'K'),
        Tmax = (1000, 'K'),
    ),
    rank = 3,
    shortDesc = u"""[AJ]CBS-QB3 calculations with 1DHR corrections, reverse rates computed using DFT_QCI_thermo""",
)

And in Jalan's Thesis (page 195) I find:

Table 6.1: Predicted Arrhenius parameters (units: mol, cm^3, sec, kcal/mol) for CH3OO* + RH* H-abstraction reactions with R = CH3CH2CH2_, CH3C_(OOH)CH3, CH3COCH2_, CH3CH2C_O.

Reaction                                      A               n     Ea
CH3CH2CH3 -> CH3OOH + CH3CH_CH3               1.3 x 10^-8     6.2   8.5
CH3CH(OOH)CH3 -> CH3C(O)CH3 + *OH + CH3OOH    1.1 x 10^-13    7.2   2.9
CH3C(O)CH3 -> CH3OOH + *CH2C(O)CH3            4.4 x 10^-12    7.1   8.9
CH3CH2CHO -> CH3OOH + CH3CH2C_O               1.9 x 10^-3     4.5   4.4

The units for A are sec^-1 for unimolecular reactions and cm^3 mol^-1 sec^-1 for bimolecular reactions, with Ea in kcal/mol. The rate coefficient is k = A (T / 1 K)^n exp(-Ea / RT).
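To get a feel for how much the sign matters, the database entry can be evaluated with E0 = -8.6 and, hypothetically, with E0 = +8.6 kcal/mol. The sketch assumes the form k = A T^n exp(-(E0 + alpha*dH) / RT) with alpha = 0 as in the entry. It says nothing about which sign is correct, and the function name is an illustration only.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def arrhenius_ep(A, n, E0, T, alpha=0.0, dH=0.0):
    """Evaluate k = A * T**n * exp(-(E0 + alpha*dH) / (R*T)); energies in kcal/mol."""
    return A * T**n * math.exp(-(E0 + alpha * dH) / (R_KCAL * T))


T = 750.0  # K, midpoint of the entry's 500-1000 K range
k_database = arrhenius_ep(2.51e-11, 6.77, -8.6, T)
k_flipped = arrhenius_ep(2.51e-11, 6.77, +8.6, T)  # hypothetical sign reversal
ratio = k_database / k_flipped
print("k(E0=-8.6) / k(E0=+8.6) at {0:.0f} K = {1:.3g}".format(T, ratio))
```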

importOldDatabase.py does not import individual family forbidden groups

The overall forbidden groups are imported, but not the ones pertaining just to specific families. I'm not even sure this functionality exists yet in Py.

Logic Nodes with "OR"

If we have for example a group name containing "OR" inside of an "OR" logic node:

group = "OR{ORing1, ORing2}"

RMG interprets the second OR as a logic OR, even though there are no curly braces next to it. This can easily be fixed by changing the group names but should there be a better checking method to make sure this is not happening?
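One stricter check is to treat a string as a logic node only when the whole string has the form `OR{...}`, and to treat everything between the braces as plain names. A minimal sketch, not the RMG-Py parser (`parse_group` is a hypothetical name, and nested logic nodes are not handled):

```python
import re

# The entire string must be OR{...}; an "OR" inside a member name is ignored.
LOGIC_OR = re.compile(r"^\s*OR\s*\{(.*)\}\s*$", re.DOTALL)


def parse_group(text):
    """Return ("OR", [members]) for a logic node, else ("group", name)."""
    match = LOGIC_OR.match(text)
    if match is None:
        return ("group", text.strip())
    members = [name.strip() for name in match.group(1).split(",")]
    return ("OR", members)


print(parse_group("OR{ORing1, ORing2}"))
print(parse_group("ORing1"))
```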

import/export: kinetics library default units

Setting the default units as cm3 instead of m3 when exporting RMG-Java style kinetics libraries would make comparison easier with the published chemkin files from which these reaction libraries came.

Creating new training reaction through website does not apply starred atoms

The starred atoms for training reactions seem to work for some families but not for others. For instance, in intra_H_migration, the website does not apply starred atoms. For some other families, such as HO2_Elimination_from_PeroxyRadical, the starring of the reactants/products does occur. When there are no starred atoms for species in the training reaction, it generally produces an UndeterminableKineticsError.

We need to make sure starred atoms always appear when saving the training reaction, or allow for the fact that they aren't there. The user should only have to insert the kinetics and comments for this to work.

Diels-Alder reactions of cyclopentene and 1,4-pentadiene are missing

These should react via Diels-Alder to form JP-10 at the 2,4 positions of pentadiene.
This is what the RMG website currently gives.

NetworkError: Unexpected type of path reaction (specifically related to Cdd_Od-N3d in R_Addition_MultipleBond)

I got the following trace (also related to RMGPy 577):

Traceback (most recent call last):
  File "C:\Code\RMG-Py/rmg.py", line 152, in <module>
    rmg.execute(**kwargs)
  File "C:\Code\RMG-Py\rmgpy\rmg\main.py", line 603, in execute
    self.reactionModel.enlarge(objectToEnlarge)
  File "C:\Code\RMG-Py\rmgpy\rmg\model.py", line 675, in enlarge
    self.updateUnimolecularReactionNetworks()
  File "C:\Code\RMG-Py\rmgpy\rmg\model.py", line 1471, in updateUnimolecularReactionNetworks
    network.update(self, self.pressureDependence)
  File "C:\Code\RMG-Py\rmgpy\rmg\pdep.py", line 561, in update
    K = self.calculateRateCoefficients(Tlist, Plist, method)
  File "C:\Code\RMG-Py\rmgpy\pdep\network.py", line 210, in calculateRateCoefficients
    self.setConditions(T, P)
  File "C:\Code\RMG-Py\rmgpy\pdep\network.py", line 317, in setConditions
    self.calculateMicrocanonicalRates()
  File "C:\Code\RMG-Py\rmgpy\pdep\network.py", line 569, in calculateMicrocanonicalRates
    raise NetworkError('Unexpected type of path reaction "{0}"'.format(rxn))
rmgpy.pdep.network.NetworkError: Unexpected type of path reaction "HNCO(74) + H(5) <=> C(=N)[O](180)"

The reaction is N=C=O + H <=> N=C[O], and the reaction site here should be Cdd_Od-N3d (no such group is yet defined under the R_Add_MulBond family, though). While this group should be added, it currently falls under the more general group Cdd_Od, but no rate rules are defined for the latter.

import/export: save ratelibrary.txt comments to rules.py?

The rules.py file has a longDesc field which could store comments but is currently empty. We could put unparseable lines (mostly comments) from the ratelibrary.txt files in here when we import, so they don't get lost.

Incorrect parent-child relationships in Solvation abraham groups

Using the new database unit tests, I've found some mismatched parent-child relationships in the solvation abraham groups:

In abraham group, node N3sH-N5ring is not a proper parent of its child N3sH-pyrrole.
In abraham group, node N3s-noH-N5ring is not a proper parent of its child N3s-noH-pyrrole.
In abraham group, node N3s-noH is not a proper parent of its child NO2.
In abraham group, node Sds is not a proper parent of its child SdsOsOdOd.

@bslakman maybe you could take a look quickly and fix them? I am a bit unsure on how to fix since I don't know the original source.

import/export: collision efficiency formatting

The e formatting loses some precision and makes it harder to read. Could we use g?
Also, there's trailing whitespace, lots of spaces, and some pretty long lines (one is 161 characters long).
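
As a rough illustration of the difference, here is a small sketch comparing the two format specifiers on made-up collision efficiencies. The precision `.2e` is an assumption; the exporter's actual format string is not shown in this issue.

```python
# Illustrative only: the values and the '.2e' precision are assumptions,
# not taken from the actual export script.
values = [2.0, 0.7, 1.2345, 12.0]

def fmt_e(x):
    """Fixed two-decimal scientific notation."""
    return "{0:.2e}".format(x)

def fmt_g(x):
    """General format: up to 6 significant digits, no padding zeros."""
    return "{0:g}".format(x)

for v in values:
    print("{0!r:>8}  e: {1:>9}  g: {2}".format(v, fmt_e(v), fmt_g(v)))
```

With `g`, 1.2345 keeps all of its digits and 2.0 is written as a short string, whereas `.2e` truncates the former and pads the latter.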

Fatal error with explicit valence for nitrogen

We're trying to build a mechanism for 2-nitrodiphenylamine:
species(
label='2NDPA',
reactive=True,
structure=adjacencyList(
"""
1 O u0 p3 c-1 {2,S}
2 N u0 p0 c+1 {1,S} {3,D} {4,S}
3 O u0 p2 c0 {2,D}
4 C u0 p0 c0 {2,S} {5,D} {9,S}
5 C u0 p0 c0 {4,D} {6,S} {17,S}
6 C u0 p0 c0 {5,S} {7,D} {18,S}
7 C u0 p0 c0 {6,D} {8,S} {19,S}
8 C u0 p0 c0 {7,S} {9,D} {20,S}
9 C u0 p0 c0 {4,S} {8,D} {10,S}
10 N u0 p1 c0 {9,S} {11,S} {21,S}
11 C u0 p0 c0 {10,S} {12,D} {16,S}
12 C u0 p0 c0 {11,D} {13,S} {22,S}
13 C u0 p0 c0 {12,S} {14,D} {23,S}
14 C u0 p0 c0 {13,D} {15,S} {24,S}
15 C u0 p0 c0 {14,S} {16,D} {25,S}
16 C u0 p0 c0 {11,S} {15,D} {26,S}
17 H u0 p0 c0 {5,S}
18 H u0 p0 c0 {6,S}
19 H u0 p0 c0 {7,S}
20 H u0 p0 c0 {8,S}
21 H u0 p0 c0 {10,S}
22 H u0 p0 c0 {12,S}
23 H u0 p0 c0 {13,S}
24 H u0 p0 c0 {14,S}
25 H u0 p0 c0 {15,S}
26 H u0 p0 c0 {16,S}
"""),
)
Once we comment out the forbidden structure "N_birad_singlet_2singleBonds", RMG-Py starts but then crashes with:

[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 6 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 12 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 12 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 12 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 12 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 10 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 10 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 10 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 10 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 12 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 12 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N greater than permitted
[15:01:43] Explicit valence for atom # 13 N, 4, is greater than permitted
[15:01:43] Explicit valence for atom # 13 N greater than permitted
Traceback (most recent call last):
File "../../../rmg.py", line 165, in
rmg.execute(inputFile, output_dir, **kwargs)
File "/Users/etierney/RMG-Py/rmgpy/rmg/main.py", line 541, in execute
bimolecularReact=self.bimolecularReact)
File "/Users/etierney/RMG-Py/rmgpy/rmg/model.py", line 748, in enlarge
self.processNewReactions(self.react(database, self.core.species[i]), self.core.species[i], None)
File "/Users/etierney/RMG-Py/rmgpy/rmg/model.py", line 649, in react
reactionList.extend(database.kinetics.generateReactionsFromFamilies([moleculeA], products=None, only_families=only_families))
File "/Users/etierney/RMG-Py/rmgpy/data/kinetics/database.py", line 419, in generateReactionsFromFamilies
reactionList.extend(family.generateReactions(reactants))
File "/Users/etierney/RMG-Py/rmgpy/data/kinetics/family.py", line 1244, in generateReactions
reactionList.extend(self.__generateReactions(reactants, forward=False))
File "/Users/etierney/RMG-Py/rmgpy/data/kinetics/family.py", line 1414, in __generateReactions
products0 = [product.generateResonanceIsomers() for product in products0]
File "rmgpy/molecule/molecule.py", line 1463, in rmgpy.molecule.molecule.Molecule.generateResonanceIsomers (build/pyrex/rmgpy/molecule/molecule.c:27204)
File "rmgpy/molecule/molecule.py", line 1464, in rmgpy.molecule.molecule.Molecule.generateResonanceIsomers (build/pyrex/rmgpy/molecule/molecule.c:27141)
File "rmgpy/molecule/resonance.py", line 10, in rmgpy.molecule.resonance.generateResonanceIsomers (build/pyrex/rmgpy/molecule/resonance.c:2463)
File "rmgpy/molecule/resonance.py", line 26, in rmgpy.molecule.resonance.generateResonanceIsomers (build/pyrex/rmgpy/molecule/resonance.c:2255)
File "rmgpy/molecule/resonance.py", line 254, in rmgpy.molecule.resonance.generateKekulizedResonanceIsomers (build/pyrex/rmgpy/molecule/resonance.c:5615)
File "rmgpy/molecule/resonance.py", line 267, in rmgpy.molecule.resonance.generateKekulizedResonanceIsomers (build/pyrex/rmgpy/molecule/resonance.c:5446)
File "rmgpy/molecule/generator.py", line 288, in rmgpy.molecule.generator.toRDKitMol (build/pyrex/rmgpy/molecule/generator.c:6645)
File "rmgpy/molecule/generator.py", line 328, in rmgpy.molecule.generator.toRDKitMol (build/pyrex/rmgpy/molecule/generator.c:6387)
ValueError: Sanitization error: Explicit valence for atom # 13 N greater than permitted

Since the atom number for the Nitrogen changes, I wonder if it happens during a check for resonance structures? Either way, how do we fix it?

Intra_R_Add_ExoTetCyclic family rules.py is empty

https://github.com/GreenGroup/RMG-database/blob/master/input/kinetics/families/Intra_R_Add_ExoTetCyclic/rules.py contains no data. It looks like a copy error of the groups.

Errors now show up where reactions in this family's kinetics cannot be determined. Can you also check the other 2 new families to make sure all the rules are in rules.py?

Some molecules need multiple adjlists in NIST/dictionary.txt

Am I correct in thinking that the label for each entry in NIST/reactions.py is used to determine the species in the reaction, which we get from NIST/dictionary.txt? If so, then for some species we might need to repeat some adjlists with differently labeled atoms. If the dictionary is only used for connectivity, shouldn't we get rid of the atom labels?

For example, looking at the H_Abstraction data https://github.com/GreenGroup/RMG-database/tree/master/input/kinetics/families/H_Abstraction/NIST.
HO2 has an adjlist of:

HO2
1    O 0 2 {2,S} {3,S}
2 *3 O 1 2 {1,S}
3    H 0 0 {1,S}

This would be the abstracting radical in the reaction, as it is in entry 247:
label = "H2 + HO2 <=> H2O2 + H"

If it is not the abstracting radical, this adjlist is incorrect, as in the case of entry 264:
label = "HO2 + H <=> H2 + O2"

In this case the adjlist should be:

1 *1 O 0 2 {2,S} {3,S}
2    O 1 2 {1,S}
3 *2 H 0 0 {1,S}

A possible solution would be to add the second adjlist and give it another label in NIST/dictionary.txt. Then in NIST/reactions.py, change the cases where HO2 is not an abstracting radical to the new dictionary label.

Removed database entries

I'm writing this issue as a place to store entries removed from RMG-database. Sometimes changes in code or questionable sources may make us want to remove some entries (hopefully temporarily). Putting them here prevents them from being lost forever, especially if they are unpublished or unrecorded elsewhere.

The following entry was removed because we recently changed our nitrogen atom types to be more specific. Previously it was 'swept under the rug', probably falling under some atom type that was not intended. In the current working version of the NAtomType branch, there is no atom type to describe a nitrogen with a charge of -2:

~~from thermo_DFT_CCSDTF12_BAC thermo library:~~

entry(
index = 434,
label = "NNH2(S)",
molecule =
"""
1 N u0 p0 c+2 {2,S} {3,S} {4,S}
2 H u0 p0 c0 {1,S}
3 H u0 p0 c0 {1,S}
4 N u0 p3 c-2 {1,S}
""",
thermo = ThermoData(
Tdata = ([300,400,500,600,800,1000,1500],'K'),
Cpdata = ([8.61,9.45,10.46,11.48,13.24,14.63,16.86],'cal/(mol*K)'),
H298 = (71.66,'kcal/mol'),
S298 = (52.09,'cal/(mol*K)'),
),
shortDesc = u"""""",
longDesc =
u"""
level of theory: CCSD(T)F12A/cc-pVTZ-F12//B3LYP/6-311++g(d,p) + BAC
""",
)

Edit: The above entry is now re-implemented with #163.

PrIMe database assumes T0=298K as default?

Many of the reactions that I have corrected in the PrIMe database (by comparison with the NIST database) have a T0=(1,"K") whereas the A factor is actually such that they should have T0=(298,"K"). I wonder if this is true of all modified Arrhenius expressions in PrIMe, and we should just change the import script and import them all again?

I still haven't figured out where the Ea numbers reported in units of "K" come from - I can't figure out what combination of mistakes would give those numbers - but that's a separate issue.
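
A minimal sketch of why the T0 label matters, assuming the usual modified Arrhenius form k(T) = A (T/T0)^n exp(-Ea/(R T)). The parameter values below are hypothetical and are not taken from PrIMe or NIST; the helper names are made up for illustration.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def modified_arrhenius(T, A, n, Ea, T0=1.0):
    """k(T) = A * (T/T0)**n * exp(-Ea/(R*T)), with Ea in J/mol."""
    return A * (T / T0) ** n * math.exp(-Ea / (R * T))

def rescale_A(A, n, T0_old, T0_new):
    """Return the A factor that gives the same k(T) after changing T0."""
    return A * (T0_new / T0_old) ** n

# Hypothetical parameters fitted with T0 = 298 K.
A_298, n, Ea = 1.0e10, 2.0, 2.0e4

# Correct conversion to T0 = 1 K leaves k(T) unchanged.
A_1 = rescale_A(A_298, n, 298.0, 1.0)

# Mislabelling T0 (keeping A_298 but writing T0 = 1 K) inflates k by 298**n.
error_factor = (modified_arrhenius(1000.0, A_298, n, Ea, T0=1.0)
                / modified_arrhenius(1000.0, A_298, n, Ea, T0=298.0))
```

Under this form the error from a wrong T0 is a temperature-independent factor of 298^n, so it only shows up for entries with a nonzero n.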

H abstraction by :CO much too slow?

Hydrogen abstraction by carbon monoxide, which in RMG is a bi-radical, seems too slow.
The reverse reaction is abstraction of the H from [CH]=O.
I'm not clear which direction is used to predict the rate, but the end result is ~13 orders of magnitude slower than in USC-Mech II, e.g.:

USC-Mech:

iC3H7+HCO=C3H8+CO 1.200e+14 0.000 0.000

T/[K]                 500   1000    1500    2000
log10(k/[mole,m,s]) +8.1    +8.1    +8.1    +8.1

RMG:

! HCO(105) + iC3H7(21545) <=> CO(74) + C3H8(79)
! Reaction index: Chemkin #527; RMG #56434
!! Template reaction: H_Abstraction [Xrad_H,InChI=1/C3H7/c1-3-2/h3H,1-2H3]
! Flux pairs: S(21545), C3H8(79); HCO(105), CO(74)
! Kinetics comments:
!!   H_Abstraction estimate: [InChI=1/C3H8/c1-3-2/h3H2,1-2H3,Y_1centerbirad]
HCO(105)+S(21545)=CO(74)+C3H8(79)                   1.000e+05 0.000     10.000   

T/[K]                500    1000    1500    2000
log10(k/[mole,m,s]) -5.4    -3.2    -2.5    -2.1
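
The two tables above can be reproduced with a short sketch, assuming the Chemkin lines use the default units (A in cm^3/(mol*s), Ea in kcal/mol) and the tabulated values are in m^3/(mol*s). Those unit assumptions are mine; the function name is made up for illustration.

```python
import math

R_CAL = 1.987204  # gas constant, cal/(mol*K)

def log10_k_m3(T, A_cm3, n, Ea_kcal):
    """log10 of k in m^3/(mol*s), from Chemkin-style A [cm^3/(mol*s)], n, Ea [kcal/mol]."""
    k_cm3 = A_cm3 * T ** n * math.exp(-Ea_kcal * 1000.0 / (R_CAL * T))
    return math.log10(k_cm3 * 1.0e-6)  # convert cm^3 to m^3

temps = [500, 1000, 1500, 2000]

# Parameters copied from the two Chemkin lines quoted above.
usc = [round(log10_k_m3(T, 1.2e14, 0.0, 0.0), 1) for T in temps]
rmg = [round(log10_k_m3(T, 1.0e5, 0.0, 10.0), 1) for T in temps]

print("T/[K]   ", temps)
print("USC-Mech", usc)
print("RMG     ", rmg)
```

Under these assumptions the gap is about 13.5 orders of magnitude at 500 K and about 10 at 2000 K, dominated by the nine orders of magnitude between the two A factors.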

Universal Database for both RMG-Py and RMG-Java

We just discussed (again) forming a universal database structure for both RMG-Py and RMG-Java. Here are my notes from the videoconference. Add discussion and development notes below...

Universal Database

human editable and readable
- look more like a spreadsheet - eg. one line per reaction.
- minimize boilerplate
- excel sheets? csv?
- benefit of plain text
  - editable using emacs over ssh
  - works with git
should store
- dictionary in new 4-column adjacency list with explicit hydrogens
- comments
- uncertainties
- confidences
- libraries
- rules
- groups
- depositories (actual reactions, with labeled atoms)
exportable to Java
- remove N and fourth column
- or modify Java so it can parse the universal database
exportable to Python
- use "import to database" script,
- or make it read the universal database
- modify the .load() methods
Instead of exporting, make both read the universal database
- Java could ignore the extra column in the adjacency list, and all N atom types
- Benefit: people won't be tempted to edit the exported database!
- Should be designed so new features are backwards compatible,
  - perhaps programs ignore stuff they don't understand
  - eg. temperature-dependent viscosity parameters in solvent database.
  - transition state estimate databases
  - new atom types
- Possible drawback
  - risk slowing development of new features (barrier: have to implement in both)
  - or keep breaking RMG-Java

importOldDatabase.py cannot import old transport groups

TransportDatabase currently has no functional loadOld() function.

We might want to write this function in the future for completeness' sake.

import/export: primaryAbrahamLibrary

This imported as garbage (because RMG-Py currently doesn't know anything about Abraham's model) but luckily enough the exported version looks quite like the original. Perhaps we should exclude it from the import/export until we handle it properly.

Exporting to Java syntax messes up uncertainty units

Import this:

533.    C_methane   C2b                 294-376     7.5e12  0.0  0  1.05       1.6e12  0.0 0   0.12 4   Matsugi et al 10.1021/jp1012494

and you get this

    kinetics = ArrheniusEP(
        A = (7500000000000.0, 'cm^3/(mol*s)', '+|-', 1600000000000.0),
        n = 0,
        alpha = 0,
        E0 = (1.05, 'kcal/mol', '+|-', 0.12),
        Tmin = (294, 'K'),
        Tmax = (376, 'K'),
    ),

which becomes this

533   C_methane    C2b   294-376  7.50e+12  0.00  0.00  1.05 1.6e+18  0  0 2.86807e-05   4   Matsugi et al 10.1021/jp1012494

when you export it again.
The uncertainty on A is six orders of magnitude too large (a factor of cm^3 per m^3), and the uncertainty on Ea is 4184 times too small (a factor of J per kcal).
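
The two wrong numbers can be reproduced with simple arithmetic. This is only a sketch of the suspected unit mix-up, using conversion factors inferred from the values above; it is not the exporter's actual code.

```python
# Conversion factors inferred from the exported values (assumption).
CM3_PER_M3 = 1.0e6
J_PER_KCAL = 4184.0

dA_original = 1.6e12   # uncertainty on A, cm^3/(mol*s), from the imported line
dE_original = 0.12     # uncertainty on Ea, kcal/mol, from the imported line

# Applying each factor once, in the wrong direction, gives the exported values.
dA_exported = dA_original * CM3_PER_M3
dE_exported = dE_original / J_PER_KCAL

print("{0:g}".format(dA_exported))
print("{0:g}".format(dE_exported))
```

The central values of A and Ea survive the round trip, which suggests that only the uncertainties miss the conversion back to the original units.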

Labelling bugs with importChemkinLibrary.py

I recently tried to use the importChemkinLibrary script on a merged mechanism and noticed a couple bugs that made it so the user couldn't use the libraries without making modifications:

For kinetics library:

dictionary.txt saves different names from reactions.py. The dictionary removes the parenthetical indices, but reactions.py does not (e.g. reactions.py has the label CH3(1)+CH3(1) <=> C2H6(2), whereas dictionary.txt has only CH3 and C2H6).
The script does not catch species names that contain an "=", which causes parsing errors in the kinetics library.
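
One possible way to handle both points is sketched below. The helper names are hypothetical and are not part of importChemkinLibrary.py; the sketch only shows the kind of label check that would keep the two files consistent.

```python
import re

# Matches a trailing numeric index such as "(1)" or "(42)".
# Suffixes like "(S)" are left alone because they are part of the name.
_INDEX_SUFFIX = re.compile(r"\(\d+\)$")

def strip_index(label):
    """Remove a trailing parenthetical integer index from a species label."""
    return _INDEX_SUFFIX.sub("", label)

def check_label(label):
    """Raise ValueError for labels that would break reaction-label parsing."""
    if "=" in label:
        raise ValueError("species label contains '=': {0!r}".format(label))
    return label

labels = ["CH3(1)", "C2H6(2)", "CH2(S)"]
cleaned = [check_label(strip_index(name)) for name in labels]
```

Applying the same function when writing both dictionary.txt and reactions.py would make the names agree, whichever convention is chosen.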

For thermo library:
RMG needs thermo values at 298K, but the script faithfully copies over temperature ranges that are in the chemkin file. We probably should have it throw an error, instead of changing the temperature range.

SubstitutionS radical tree not sufficiently exhaustive

The SubstitutionS kinetics family consists of RSR reacting with a radical species. It is currently turned off by default, but I've been using it for the model I've been working on with sulfur.

Based on the comments in the Java database, it seems that the radical tree is based off of the radical tree in H_Abstraction. However, while the H_Abstraction tree has grown, the SubstitutionS tree has not.

The two cases I've encountered were only caught because a reaction was found, but RMG was unable to find the reverse reaction because a species did not fit anywhere in the tree. I'm not sure what happens if a reactant can't be matched to a node to begin with. Does RMG just assume that it can't react in that particular family and move on?

Case 1: [S] (triplet)

Similar to [O] (triplet), which does have a node in the tree (O_atom_triplet under Y_1centerbirad)
Does not match anything because Y_1centerbirad only allows Cs, Cd, O
Could be easily fixed by adding more atom types under Y_1centerbirad (H_Abstraction has Cs, Cd, CO, CS, O, S, N)

Case 2: C=C(C)[CH][S]

Would fall under Y_rad in the H_Abstraction tree, nothing more specific applies
Does not match anything because there is no Y_rad group in SubstitutionS
Could be fixed by adding Y_rad group and moving some of the existing head nodes (HJ, CJ, SJ, O_rad) under it

I think the SubstitutionS radical tree should definitely be expanded/restructured. However, I'm not sure what would be the optimal way in terms of the chemistry. In theory, it could be an exact copy of the H_Abstraction radical tree, but there might be some groups that we don't want to react in SubstitutionS.

groups written with Cd=Sd not detectable

With the changes in atomType detection from this RMG-Py commit, Cd=Sd is no longer a valid way to write groups. The carbon will always be parsed as a CS, and never as a Cd. This has always been true for CO as well.

Also Cdd is a bit tricky too. Sd=Cdd=Sd or Sd=C=Sd are valid, but Sd=CS=Sd is not.

More generally, we have a problem where some groups are inaccessible, but we don't know it. I think we should write the following unit test:

For every group object

Convert Group object to simplest Molecule object
Descend the Tree searching for Molecule
Verify that original Group is found
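
The steps above can be sketched on a toy tree. Everything here is a simplified stand-in: `Node`, `descend`, and `inaccessible_groups` are hypothetical names, a group is reduced to a set of allowed atom types, and `sample` stands for the atom type that a molecule built from the group would receive. It does not use the RMG-Py API.

```python
class Node(object):
    """Toy group: matches an atom type if it is in `allowed`."""
    def __init__(self, name, allowed, sample, children=()):
        self.name = name
        self.allowed = set(allowed)
        self.sample = sample            # atom type assigned to the sample molecule
        self.children = list(children)

    def matches(self, atom_type):
        return atom_type in self.allowed

def descend(root, atom_type):
    """Return the nodes visited when descending the tree, first match wins."""
    path = []
    node = root if root.matches(atom_type) else None
    while node is not None:
        path.append(node)
        node = next((c for c in node.children if c.matches(atom_type)), None)
    return path

def inaccessible_groups(root):
    """Names of groups that are not reached by their own sample molecule."""
    missing, stack = [], [root]
    while stack:
        node = stack.pop()
        if node not in descend(root, node.sample):
            missing.append(node.name)
        stack.extend(node.children)
    return missing

# A group written as Cd=Sd only accepts "Cd", but its carbon is typed "CS".
cd_sd = Node("Cd=Sd", {"Cd"}, sample="CS")
root = Node("C", {"Cs", "Cd", "CS"}, sample="Cs", children=[
    Node("Cd", {"Cd"}, sample="Cd", children=[cd_sd]),
    Node("CS", {"CS"}, sample="CS"),
])

print(inaccessible_groups(root))
```

In this toy case only the Cd=Sd group is reported, because its sample descends into the CS branch instead.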

treatment of multiple (conflicting) rules needs fixing

In RMG-java it looks at the "rank" column (I think) and/or chooses the first (or last?). Perhaps a design decision is needed here.

We should have library/depository values for statistical mechanics (vibrational frequencies)

We are currently estimating vibrational frequencies (in order to get densities of states for the PDep Master Equation calculations) for all sorts of small molecules, for which we should use the known values from high level calculations (eg. those used to generate the thermo).

Group tree and rate rule templates must have matching order in kinetic families

As I found in 6ded3d8, the order in which the groups are listed in a reaction template matters, as it must match that of the tree's L1 groups. Example:

For the kinetic family H_Abstraction, the top nodes are X_H_or_Xrad_H_Xbirad_H_Xtrirad_H and Y_rad_birad_trirad_quadrad. The order they are listed in the group tree is

L1:  X_H_or_Xrad_H_Xbirad_H_Xtrirad_H
      L2: .....
L1: Y_rad_birad_trirad_quadrad
      L2: ....

And all the rate rules must be defined like so:

label = "X_H_or_Xrad_H_Xbirad_H_Xtrirad_H;Y_rad_birad_trirad_quadrad"

label = "Subgroup of X_H_or_Xrad_H_Xbirad_H_Xtrirad_H; Subgroup of Y_rad_birad_trirad_quadrad"

I found that rate rules defined in the reverse order, with the Y_rad subgroup first and the X_H subgroup second, are simply not used. When I reversed the order of the L1 nodes in the tree, we still got a working minimal example in RMG-Py, but the model was completely different, with an edge about 300 reactions smaller. This suggests a hidden error: rate rules whose templates are out of order are silently ignored. We need to fix this and add a unit test to ensure it does not recur.

There are 2 ways to fix:

Algorithmically change RMG-Py to recognize out of order rate rule templates
Order them correctly in the raw database

In this situation I advocate for solution 2, because it is more transparent to the user where the groups come from when browsing the database.
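
A unit test for this could look roughly like the sketch below. The parent map and the child group names (`X_H_child`, `Y_rad_child`) are hypothetical; a real test would read the tree and the rule labels from the family's groups.py and rules.py.

```python
# Top-level (L1) nodes in the order they appear in the group tree.
TOP_NODES = ["X_H_or_Xrad_H_Xbirad_H_Xtrirad_H", "Y_rad_birad_trirad_quadrad"]

# Toy child -> parent map standing in for the real tree (hypothetical names).
PARENT = {
    "X_H_child": "X_H_or_Xrad_H_Xbirad_H_Xtrirad_H",
    "Y_rad_child": "Y_rad_birad_trirad_quadrad",
}

def root_of(group):
    """Climb the parent map until a top-level node is reached."""
    while group in PARENT:
        group = PARENT[group]
    return group

def label_in_tree_order(label):
    """True if the groups in a rule label descend from the L1 nodes in tree order."""
    roots = [root_of(part.strip()) for part in label.split(";")]
    return roots == TOP_NODES

rule_labels = ["X_H_child;Y_rad_child", "Y_rad_child;X_H_child"]
out_of_order = [lbl for lbl in rule_labels if not label_in_tree_order(lbl)]
```

Running such a check over every family would flag the ignored rules without changing how RMG-Py matches templates, which fits solution 2.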
