cclib / cclib

319.0 20.0 162.0 75.38 MB

Parsers and algorithms for computational chemistry logfiles

Home Page: https://cclib.github.io/

License: BSD 3-Clause "New" or "Revised" License

Shell 0.07% Python 78.24% DIGITAL Command Language 1.57% Arc 0.83% TeX 1.32% C++ 0.95% Roff 16.89% Nix 0.13%

python computational-chemistry quantum-chemistry hacktoberfest

cclib's Introduction

cclib

IMPORTANT for upcoming 2.0 release: We are preparing for the 2.0 release now that 1.8.1 is done. Although most of the new features are on the unstable main branch, we will now be making some breaking changes to the default master branch. See #1395 for more information.

If you choose to follow main, we reserve the right to rewrite history until the final v2.0 tag is created, after which main will replace master as the default branch.
We do not expect to make any further tagged or versioned releases on the master branch.
This message will disappear when the final release of 2.0 is made, after any alphas, release candidates, etc.

cclib is a Python library that provides parsers for output files of computational chemistry packages. It also provides a platform for computational chemists to implement algorithms in a platform-independent way.

For more information, go to https://cclib.github.io. There is a mailing list for questions at https://groups.google.com/g/cclib.

cclib's People

Contributors

langner berquist atenderholt ghutchis mattbernst czajkowska clyde-fare avirshup ben-albrecht chrisjsewell mwykes martinp23 jchodera andersx keceli joanwa yidapa binpeng bwang2453 elekezem schamnad solccp cstein rcrehuet jaimergp fruttodelmondo srtlg ccjalal sandeepkrjha klemensnoga invalidpointer dennissheberla nitish6174 gaursagar saisankargochhayat zli37 xymaxim psychodevil richardjgowers chemistry-scripts khatiaxomiya mcocdawc chayast europj baoilleach joe83830 pikulsomesh adabbott mithlesh4257 alokkr01 pasenor szl0072 mrauha aspirincode maxscheurer spearous0001 nsm120 kunalsharma05 yishutu grzegorzmazur masker-li alesgenova shivupa agamat peter-reinholdt abotiamnot bhavaygg ayushgupta6598 amandadumi shibalik fdroessler the-grayson-group micaela-matta amarkpayne kuriba eimrek cks-coil plin1112 jamesetsmith pandaxtc markperri kuustudio pstjohn shijunang stair-lab-cit aalexmmaldonado marcin-witkowski jjgoings dpadula85 mscho527 ezpzbz gomarusai mkatouda pvrt-research khadev balticpinguin spkorhonen simonaxelrod oliver-s-lee jcerezochem

cclib's Issues

relaxed scans

I'm looking to handle relaxed potential energy scans, starting with Gaussian. These jobs are started with the Opt=ModRedundant keyword.

However, because the optfinished flag is set after the first scan point is finished, the remaining coordinates are not parsed. I've also had requests to handle IRC calculations, so it probably affects those as well.

Is there any opposition to changing the behavior of parsing coordinates? Note that geovalues contains convergence information for each point in the scan.

My proposal is to change optfinished to a list that contains the indices of the atom coords corresponding to optimized structures. This list could be populated as the file is parsed, or in an after_parsing step.

Thoughts?
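
A minimal sketch of the proposal above, not the cclib implementation. The names `mark_converged_steps` and `converged_flags` are hypothetical; the flags are assumed to come from comparing each row of geovalues against its targets. For a relaxed scan the result holds one index per scan point.

```python
def mark_converged_steps(converged_flags):
    """Return the indices of geometries flagged as converged.

    converged_flags is a hypothetical list of booleans, one per parsed
    geometry (for example derived from geovalues and geotargets).
    """
    return [i for i, done in enumerate(converged_flags) if done]


# A made-up scan with three points; each point converges after a few steps.
flags = [False, False, True, False, True, False, False, True]
optfinished = mark_converged_steps(flags)
print(optfinished)  # [2, 4, 7]
```

With this shape, `atomcoords[optfinished]` style indexing would pick out the optimized structure of every scan point, and an empty list would mean nothing converged.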

Make scfvalues available from non-#P Gaussian files

(Moved from https://sourceforge.net/p/cclib/feature-requests/5/)

Created: 2009-12-27
Creator: xaverxn

So far, cclib can extract certain information only from Gaussian files with additional output activated (#P keyword). As you can see from my last bug report, this is true for the number of steps needed for SCF convergence (but maybe others, too). As this information is available from non-#P logs as well, I'd like to be able to get it from cclib, too. Of course the content of 'scfvalues' can't be extracted. If you don't want to change its content structure, you could simply create as many empty lists as steps needed, so len(data.scfvalues) gives the right number for #P and non-#P logs.

Update unit tests for Jaguar

I contacted Schrodinger for a trial license for this purpose.

Move content from Sourceforge

Creating this as a follow up to #11, since the content is going into this repository.

Gaussian #t files fail to parse

(Originally mentioned by me in #16.)

When parsing mocoeffs, it fails because nmo hasn't been set. Any thoughts on how this should be handled? I see two possibilities:

1. If nmo isn't present, log a warning and skip (i.e. return).
2. Assume nmo == nbasis.

I'm leaning towards Option 1 since the assumption in Option 2 might produce errors in the future if nmo != nbasis, which we've seen. I'd rather require #n or #p calculations for the mocoeff attribute.
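
Option 1 could look roughly like the sketch below. `parse_mocoeffs_block` and the `attributes` dict are hypothetical stand-ins for the parser's mocoeffs branch and its already-parsed values; only the guard at the top is the point.

```python
import logging

logger = logging.getLogger("mocoeffs_sketch")


def parse_mocoeffs_block(attributes, lines):
    """Hypothetical stand-in for the mocoeffs branch of a parser.

    attributes: dict of values parsed so far; lines: the coefficient rows.
    Returns the parsed rows, or None when nmo is unknown (Option 1).
    """
    if "nmo" not in attributes:
        # Option 1: warn and skip instead of guessing nmo == nbasis.
        logger.warning("nmo has not been set; skipping mocoeffs")
        return None
    nmo = attributes["nmo"]
    return [[float(x) for x in line.split()] for line in lines[:nmo]]
```

Returning early keeps the rest of the file parseable, at the cost of the mocoeffs attribute being absent for #t output.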

Version string in __init__.py needs updating

It currently says 1.1.

ORCA: difference between electric/velocity transition dipole moments

The issue of whether we should parse both types and which is the more standard one came up in #44. I believe the electric (length gauge) numbers are usually the ones calculated (and the more reliable ones), but I'm creating this issue to confirm that.

I asked a question on the ORCA forum about this: https://cec.mpg.de/forum/viewtopic.php?f=8&t=910 (need to be logged in to see it)

Move content from Sourceforge

After moving the code and data itself, we should also eventually move the remaining content. That's the intent of this ticket.

I think there are several things:

open tickets: I've started to move them manually (some may fit better into cclib-data)
closed tickets: do we care about these?
the wiki: eventually we probably want to move this content, too; some pages, such as the parsed attributes list, could be easily regenerated automatically then

TURBOMOLE and Spartan'08 formats

(moved manually from https://sourceforge.net/p/cclib/feature-requests/3/)

(this is a feature request from user jbaltrus in 2009)

I would like to request implementation for Turbomole and Spartan'08 files. I can upload output files if needed

Many unit tests fail for Molpro 2012

After upgrading the unit test files to Molpro 2012 (#37), a significantly larger number of tests fail. Marking this for 1.3 so that we can focus on the release, though.

Gaussian03/borane-opt.log regression

I've looked into this regression failure. The nmo attribute changes over the course of the optimization, so the assertions that nmo == self.nmo fail.

I see two fixes:

1. Don't use the assert statement for optimizations.
2. Keep track of nmo at each step in the optimization, and only assert for the current step.

Fix 1 seems most straight-forward, but fix 2 gives better error checking.

Thoughts?
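
For discussion, fix 2 might be sketched like this. `NmoTracker` is a hypothetical helper, not existing cclib code; it assumes the parser can tell when a new optimization step starts.

```python
class NmoTracker:
    """Hypothetical helper that remembers nmo for every optimization step."""

    def __init__(self):
        self.history = []

    def new_step(self, nmo):
        # Called when a new geometry step reports its orbital count.
        self.history.append(nmo)

    def check(self, nmo):
        # Compare only against the current (latest) step, not the first one.
        assert nmo == self.history[-1], "nmo changed within a single step"
```

The per-step history would also be a natural starting point if nmo itself ever became a list.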

From old progress: improve help(...)

Taken from http://cclib.sourceforge.net/wiki/index.php/Progress:

help(cclib) doesn't say anything useful, AFAIK.
help(myparser) doesn't say anything useful, AFAIK.

Support for gfprint

There are two sp_basis unit tests for Gaussian; one uses GFINPUT and the other GFPRINT. We currently only support the first output. Perhaps it would not be so much work to support the second also.

If we do not want to do that, it would be good to make the GFPRINT version into a regression and perhaps merge the sp_basis logfile into sp.

Switch to atomic units

We might have had this discussion on the mailing list before, but it would be nice to be able to get all attributes in alternative units. Mostly atomic units for me (hartrees, etc), since I find myself constantly converting to those.

It would be useful both in the parsers and ccopen/ccget.
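
As an illustration of the kind of conversion meant here, a standalone sketch that assumes energies are held in eV. The helper name `ev_to_hartree` is hypothetical and the constant is the CODATA 2018 value, which may differ slightly from whatever cclib uses internally.

```python
# CODATA 2018 value, an assumption of this sketch (cclib may use another).
EV_PER_HARTREE = 27.211386245988


def ev_to_hartree(values):
    """Convert a sequence of energies from eV to hartree."""
    return [v / EV_PER_HARTREE for v in values]


scfenergies_ev = [-10000.0, -10000.5]  # made-up numbers for illustration
print(ev_to_hartree(scfenergies_ev))
```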

ORCA: len(atomcoords) != len(geovalues)

In working on pull #29, I've discovered a potential bug with ORCA.

It seems that ORCA adds an extra structure at the end of an optimization so that the length of geovalues is one less than the length of atomcoords. For the dvb_gopt file:

>>> data.geovalues.shape
(21, 5)
>>> data.atomcoords.shape
(22, 20, 3)

Any idea what the last structure is, or the best way to handle this? It doesn't appear that data.atomcoords[-1] is a rotated version of data.atomcoords[-2].

Review of attributes used by bridges (was: From old progress: bridge status of charge and multiplicity)

(From http://cclib.sourceforge.net/wiki/index.php/Progress)

Attributes for charge and multiplicity were recently added; see how they fit in with passing objects to OpenBabel or PyQuante.

Also could check for newer attributes.

The way Jaguar prints basis set information...

The new Jaguar logfiles for 8.3 print basis sets, but some of the coefficients given for STO-3G crash the unit test.

Link to line in Jaguar output with the basis set for hydrogen:
https://github.com/cclib/cclib/blob/master/data/Jaguar/basicJaguar8.3/dvb_sp_hf.out#L144

The exponents for hydrogen are OK, but the coefficients are different than the standard values. Notice how the last primitive has a coefficient of 1.0. That leads me to think this is actually two contractions, but that wouldn't make much sense.

Fix double counting of MBOs for restricted calculations

The Mayer bond order method appears to give values twice the expected amount for restricted calculations. This is not a problem with unrestricted calculations.
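
For reference, the textbook closed-shell and unrestricted forms of the Mayer bond order are written out below (standard definitions, not taken from the cclib source). Here $P$ is the total density matrix, $P^\alpha$ and $P^\beta$ are the spin density matrices, and $S$ is the overlap matrix.

```latex
% Closed shell, total density P
B_{AB} = \sum_{\mu \in A} \sum_{\nu \in B} (PS)_{\mu\nu} (PS)_{\nu\mu}

% Unrestricted, spin densities P^\alpha and P^\beta
B_{AB} = 2 \sum_{\mu \in A} \sum_{\nu \in B}
         \left[ (P^\alpha S)_{\mu\nu} (P^\alpha S)_{\nu\mu}
              + (P^\beta S)_{\mu\nu} (P^\beta S)_{\nu\mu} \right]

% Restricted limit: P^\alpha = P^\beta = P/2
2 \left[ \tfrac{1}{4} + \tfrac{1}{4} \right] (PS)_{\mu\nu} (PS)_{\nu\mu}
  = (PS)_{\mu\nu} (PS)_{\nu\mu}
```

The two forms agree in the restricted limit, so one possible source of a factor of two (an assumption, not confirmed here) is applying the unrestricted prefactor to the total density of a restricted calculation.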

dipole moments (and/or higher multipoles)

I can think of many reasons it would be great to parse dipole moment vectors and/or higher multipoles (via numpy perhaps?)

I can help implement some of the parsing for Gaussian, GAMESS-US, etc.

Future of automatic SVN revision assignment

Most files have this:

__revision__ = "$Revision$"

After moving to git, this will not work. Should we just throw it away, or replace with something git-friendly?

Add unit test for Firefly and retire PC-GAMESS

I now added outputs for unit tests from Firefly 8.0.1 (d61da18 which is included in PR #111). I think that means we can retire the PC-GAMESS unit tests and make them regressions, like we did with WinGAMESS a couple of years ago. Any objections?

Make sure optdone is empty but set when geometry is not converged

So we've turned optdone into a list, and I've made sure it is not set by default, because it is not appropriate for most jobs (such as SP). Actually added a test for this in 80e27d7.

It would be nice to make sure, however, that it is set to an empty list for geometry optimizations that do not converge. Perhaps a regression or two would be a nice way to check this.

Unit tests for methods

The methods are not currently tested systematically and it is becoming apparent the code would benefit from that (start from #60 and #67).

Support for parsing multiple files

(moved manually from Sourceforge feature request 1)

Many computational chemistry programs print output across multiple files (most notably GAMESS, Molpro, Turbomole). Parsing more than one can provide better data (more precision) and in some cases is necessary. Concatenating files is a solution, but it should also be possible to pass multiple file names to parsers and utility scripts.
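
A rough sketch of how several file names could be presented to a line-oriented parser as a single stream, using the standard library's fileinput module. The helper name `iter_lines` is hypothetical, and this is not a description of how cclib handles it.

```python
import fileinput


def iter_lines(filenames):
    """Yield lines from several files as if they were one stream.

    filenames must be a non-empty list; fileinput reads stdin otherwise.
    """
    with fileinput.input(files=filenames) as stream:
        for line in stream:
            yield line
```

A call such as `iter_lines(["job.out", "job.log"])` (file names made up) would hand the lines of both files to the same parsing loop, in the order given.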

Many ORCA unit tests do not pass

Many ORCA unit tests still do not pass, for both 2.9 and 3.0, as discussed in #37. In particular, the Raman and IR values are sometimes quite different than expected.

Better error handling in CDA module

(moved from Sourceforge feature request 2, https://sourceforge.net/p/cclib/feature-requests/2/)

Check to see if atomnos from fragments are the same as in the entire molecule before proceeding. If not, throw an appropriate error.

Next, check for atomcoords (I think it does this already) and finally the number of basis functions. Exit sanely.
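
The checks described above might be sketched as follows. `check_fragments` is a hypothetical function working on plain dicts rather than on cclib data objects, and it assumes the fragments are listed in the same atom order as the full molecule.

```python
def check_fragments(molecule, fragments):
    """Raise ValueError if the fragments are inconsistent with the molecule.

    Each argument is a hypothetical dict with "atomnos" (list of atomic
    numbers) and "nbasis" (int).
    """
    combined = [z for frag in fragments for z in frag["atomnos"]]
    if combined != list(molecule["atomnos"]):
        raise ValueError("atomnos of the fragments do not match the molecule")
    if sum(frag["nbasis"] for frag in fragments) != molecule["nbasis"]:
        raise ValueError("basis functions of the fragments do not add up")
```

Raising a specific exception lets the calling script report the problem and exit cleanly instead of failing later inside the matrix algebra.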

Review of unit tests

Many of the unit tests were updated for 1.2b, and it would be worthwhile to quickly check whether they can be made more comparable with little effort. I'm thinking mainly about checking whether input coordinates and basis sets are the same everywhere. For example, the TD unit tests for GAMESS-US2012 use 6-31G, but STO-3G is used most of the time for the dvb* series.

There are also things of the sort we discussed in #44, which may or may not be caused by similar differences in the input.

Add optdone attribute to remaining parsers

Seems like only Gaussian has it right now, so add it for the remaining.

progress information appears to be broken

The following no longer works:

import sys
from cclib.parser import ccopen
from cclib.progress import TextProgress

progress = TextProgress()
parser = ccopen(sys.argv[1], progress)
data = parser.parse(cupdate=1.0)

Output:

adam@mqair:~/cclib/src/cclib/parser$ python3 test_gaussian.py dvb_gopt.log
Traceback (most recent call last):
  File "test_gaussian.py", line 9, in <module>
    data = parser.parse(cupdate=1.0)
  File "/usr/local/lib/python3.3/site-packages/cclib/parser/logfileparser.py", line 223, in parse
    self.updateprogress(inputfile, "Unsupported information", cupdate)
  File "/usr/local/lib/python3.3/site-packages/cclib/parser/logfileparser.py", line 287, in updateprogress
    newstep = inputfile.tell()
OSError: telling position disabled by next() call

It's somewhat related to the io.open call introduced with pull #34. Removing it fixes the problem for Python2. However, calling open with Python3 has the same problem because open is an alias for io.open.

Before Python3, were we calling inputfile.next() or inputfile.readline()? What was the rationale for changing to next(inputfile)?

I think to fix this problem, we need to change all instances of next(inputfile) to inputfile.readline().decode().
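
The underlying behaviour can be reproduced without cclib. The sketch below (helper name `tell_after` is hypothetical) opens a small temporary file in text mode on Python 3, reads one line either with next() or with readline(), and then calls tell().

```python
import os
import tempfile


def tell_after(read_line):
    """Read one line with read_line(f), then return f.tell() or the error text."""
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("first line\nsecond line\n")
    try:
        with open(tmp.name, "r") as f:
            read_line(f)
            try:
                return f.tell()
            except OSError as exc:
                return str(exc)
    finally:
        os.remove(tmp.name)


print(tell_after(next))                    # error message raised by tell()
print(tell_after(lambda f: f.readline()))  # position after the first line
```

Iterating with next() switches the text wrapper into a read-ahead mode in which tell() is refused, whereas readline() leaves tell() usable, which is what the progress code needs.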

Update shebang in scripts?

Shall we update the shebangs in the script to use python3? Currently they are:
#!/usr/bin/env python

regression.py no longer doing specific tests

There are 41 specific regression tests (e.g. testADF_basicADF2004_01_dvb_sp_c_adfout) that don't appear to be called anymore. It's not obvious to me why this is the case...

Support more types of archives, and multiple files

This is an afterthought of #12 which would be useful. Currently, if a zip archive is given to cclib with multiple files inside, it complains. It should be able to iterate over the files inside, though. This should also, in principle, work for tar archives.

One question to be settled is whether to treat the files inside in multifile mode by default. I would say no.
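
Iterating over the members of a zip archive is straightforward with the standard library. The sketch below uses a hypothetical helper `iter_zip_members` and treats every member as an independent text file, in line with the "no multifile mode by default" suggestion.

```python
import zipfile


def iter_zip_members(archive):
    """Yield (name, text) for every regular file inside a zip archive.

    archive may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            with zf.open(info) as member:
                text = member.read().decode("utf-8", errors="replace")
            yield info.filename, text
```

The tarfile module offers a similar member-by-member interface, so tar archives could follow the same pattern.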

Molpro: parser crashes on some files when parsed alone

The Molpro parser needs both the .out and .log file to parse some attributes, which is reflected in the unit tests. It should not crash, however, even if it cannot extract something from a file:

data/Molpro/basicMolpro2012$ ccget -l dvb_gopt.log
Attempting to parse dvb_gopt.log
cclib can parse the following attributes from dvb_gopt.log:
  aonames
  atombasis
  atomcoords
  atomnos
  charge
  coreelectrons
  gbasis
  geotargets
  homos
  mult
  natom
  nbasis
  nmo
  scftargets
  scfvalues

data/Molpro/basicMolpro2012$ ccget -l dvb_gopt.out
Attempting to parse dvb_gopt.out
Traceback (most recent call last):
  File "/usr/local/bin/ccget", line 119, in <module>
    main()
  File "/usr/local/bin/ccget", line 100, in main
    data = log.parse()
  File "/usr/local/lib/python3.2/dist-packages/cclib/parser/logfileparser.py", line 229, in parse
    self.extract(inputfile, line)
  File "/usr/local/lib/python3.2/dist-packages/cclib/parser/molproparser.py", line 392, in extract
    moenergy = float(line.split()[2])
ValueError: could not convert string to float: '='
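
A defensive version of the failing conversion could look like the sketch below. `parse_moenergy` is a hypothetical helper; nothing about the Molpro line layout is assumed beyond what the traceback shows, namely that the token at index 2 is sometimes '='.

```python
def parse_moenergy(line):
    """Return the third whitespace-separated token as a float, else None."""
    tokens = line.split()
    try:
        return float(tokens[2])
    except (IndexError, ValueError):
        # Too few tokens, or the token is not a number (e.g. '=').
        return None
```

The caller can then skip the attribute when None comes back, so a file that lacks the expected block still parses.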

Unicode errors should not stop parser

This was raised in #28 but I think it warrants a separate issue if not solved right away. I think the simplest way to handle this is to somehow ignore the unicode characters that trigger an error, so the parser can continue. Generally the unicode will not be within the data we parse, anyway.
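
On Python 3, one way to keep going past undecodable bytes is the errors argument of open(). The sketch below uses a hypothetical helper `read_lines_lenient` and assumes UTF-8 as the nominal encoding.

```python
def read_lines_lenient(path, encoding="utf-8"):
    """Read a text file, replacing undecodable bytes instead of raising."""
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.readlines()
```

With errors="replace" each bad byte becomes U+FFFD; errors="ignore" would drop it silently instead.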

Unit tests for scripts

Discussion about this started with #53

GAMESS 2012 water_cis_det.out fails to parse

The parser crashes when attempting to parse optically inactive transitions:

adam@mqair:~/cclib/data/GAMESS/basicGAMESS-US2012$ ccget -l water_cis_dets.out 
Attempting to parse water_cis_dets.out
Traceback (most recent call last):
  File "/usr/local/bin/ccget", line 145, in <module>
    main()
  File "/usr/local/bin/ccget", line 126, in main
    data = log.parse()
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cclib/parser/logfileparser.py", line 248, in parse
    self.extract(inputfile, line)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cclib/parser/gamessparser.py", line 233, in extract
    statenumber = int(line.split()[-1])
ValueError: invalid literal for int() with base 10: 'INACTIVE,'

This issue was brought to my attention by Daniel R. Haney.

Update CDA script to handle multi-file logfiles

For logfiles that are split into multiple files (e.g. Turbomole), the CDA script will not work.

ADF: scftargets is not parsed correctly for 2013.01

This came up after updating the ADF unit tests with the 2013.01 output files (see #37). Basically, the numerical integration in the SCF is done with a different method (fuzzy cells instead of Voronoi polyhedra), hence the output is different.

It is not obvious to me how to parse scftargets in this case, which is what this issue is for. We probably want to fix this only after merging #37 into master, though.

CML output

(Moved from https://sourceforge.net/p/cclib/feature-requests/6/)

Created: 2011-11-03
Creator: Karol M. Langner

It might be nice to be able to save parsed data in CML.

First release on github

I created a 'retro' release on github for 1.1, and I think it would be a good idea to plan the next release (first real one on github).

Also... with the tag for 1.1, can we delete the branch? That's what I read here: http://stackoverflow.com/questions/1307114/how-can-i-archive-git-branches

Any thoughts?

optdone vs. optfinished

I was looking at the Gaussian parser and found optdone and optfinished flags. Only optdone is listed in data.py. The only other parser with optfinished is the GAMESS parser.

Is there a reason for optdone instead of optfinished?

Add regression test with FULLSCF

See #39 for details.

Future of nmo, nbasis and similar attributes

Following #94 and similar bugs in the past, there was an idea to change nmo into an array, since it can change between SCF cycles as orbitals are dropped for various reasons. This issue is meant to collect our discussion on what to do with this.

Two aspects to consider:

As Noel pointed out, changing the API would imply a major version change to cclib 2.x
Changing nmo in this way suggests that other attributes that can behave similarly, including nbasis, should also change.

A list of lines from which info was extracted

(Moved from https://sourceforge.net/p/cclib/feature-requests/4/)

Created: 2009-12-27
Creator: xaverxn

Hi,
I'd like to have access to the actual lines cclib extracted information from. They could simply be stored in a (python) list during parsing. That way cclib users would have a kind of 'concentrated' log file, with just the lines 'cclib deems most important'.
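
A toy sketch of the idea, with a hypothetical `LineRecorder` class standing in for a parser. The keyword match is only a placeholder for whatever test a real extract step applies.

```python
class LineRecorder:
    """Hypothetical wrapper that remembers the lines a parser made use of."""

    def __init__(self):
        self.used_lines = []

    def extract(self, line, keyword):
        # Stand-in for a parser's extract step: keep the line if it matches.
        if keyword in line:
            self.used_lines.append(line.rstrip("\n"))
            return True
        return False
```

After parsing, `used_lines` would be the 'concentrated' log file described above.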

Release v1.2

This is meant for discussing the v1.2 release, which is due on April 15th.

Things to get done by me:

update changelog (finished, please double check if I missed something)
update website and docs
update versions in code

Marking for v1.2
