
cclib / cclib

319.0 20.0 162.0 75.38 MB

Parsers and algorithms for computational chemistry logfiles

Home Page: https://cclib.github.io/

License: BSD 3-Clause "New" or "Revised" License

Shell 0.07% Python 78.24% DIGITAL Command Language 1.57% Arc 0.83% TeX 1.32% C++ 0.95% Roff 16.89% Nix 0.13%
python computational-chemistry quantum-chemistry hacktoberfest

cclib's Introduction

cclib

IMPORTANT for the upcoming 2.0 release: We are preparing for the 2.0 release now that 1.8.1 is done. Although most of the new features are on the unstable main branch, we will now be making some breaking changes to the default master branch. See #1395 for more information.

  • If you choose to follow main, we reserve the right to rewrite history until the final v2.0 tag is created, after which main will replace master as the default branch.
  • We do not expect to make any further tagged or versioned releases on the master branch.
  • This message will disappear once the final 2.0 release is made, following any alphas, release candidates, etc.


cclib is a Python library that provides parsers for output files of computational chemistry packages. It also provides a platform for computational chemists to implement algorithms in a platform-independent way.

For more information, go to https://cclib.github.io. There is a mailing list for questions at https://groups.google.com/g/cclib.

cclib's People

Contributors

adabbott, alesgenova, amandadumi, andrew-s-rosen, atenderholt, baoilleach, berquist, bwang2453, cks-coil, dependabot[bot], eimrek, elliotfarrar, gaursagar, ghutchis, jaimergp, jevandezande, kunalsharma05, langner, mcocdawc, migatt, mscho527, oliver-s-lee, pre-commit-ci[bot], schamnad, schneiderfelipe, sheepforce, shivupa, tdi-tenderholt, weronikazak, xymaxim


cclib's Issues

relaxed scans

I'm looking to handle relaxed potential energy scans, starting with Gaussian. These jobs are started with the Opt=ModRedundant keyword.

However, because the optfinished flag is set after the first scan point is finished, the remaining coordinates are not parsed. I've also had requests to handle IRC calculations, so it probably affects those as well.

Is there any opposition to changing the behavior of parsing coordinates? Note that geovalues contains convergence information for each point in the scan.

My proposal is to change optfinished to a list that contains the indices of the atomcoords entries corresponding to the optimized structures. This list could be populated as the file is parsed, or in an after_parsing step.
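A minimal sketch of the proposal, assuming the parser can tell when each scan point has converged. The helper name record_optimized and the simulated flags are hypothetical, not cclib's API: every time a point converges, the index of the latest geometry in atomcoords is appended to the list.

```python
def record_optimized(optdone, atomcoords):
    """Append the index of the most recent geometry to optdone.

    Hypothetical helper: a parser would call it whenever the current
    scan point (or a plain optimization) is reported as converged.
    """
    optdone.append(len(atomcoords) - 1)
    return optdone


# Simulated relaxed scan: True marks the step where a scan point converged.
converged_flags = [False, False, True, False, True, False, False, True]
atomcoords = []
optdone = []
for step, converged in enumerate(converged_flags):
    atomcoords.append(f"geometry {step}")  # stand-in for an (natom, 3) array
    if converged:
        record_optimized(optdone, atomcoords)

print(optdone)  # [2, 4, 7]
optimized = [atomcoords[i] for i in optdone]
```

With this layout, a single optimization simply gives a one-element list, so the same attribute covers both cases.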

Thoughts?

Make scfvalues available from non-#P Gaussian files

(Moved from https://sourceforge.net/p/cclib/feature-requests/5/)

Created: 2009-12-27
Creator: xaverxn

So far, cclib can extract certain information only from Gaussian files with additional output activated (#P keyword). As you can see from my last bug report, this is true for the number of steps needed for SCF convergence (but maybe others, too). As this information is available from non-#P logs as well, I'd like to be able to get it from cclib, too. Of course the content of 'scfvalues' can't be extracted. If you don't want to change its content structure, you could simply create as many empty lists as steps needed, so len(data.scfvalues) gives the right number for #P and non-#P logs.
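A rough sketch of the suggested workaround, assuming the parser can count SCF cycles per geometry in a non-#P log (the cycle_counts input and the function name are hypothetical). Each geometry gets an array with one row per cycle and zero columns, so its length is right even though no convergence values are stored.

```python
import numpy as np


def placeholder_scfvalues(cycle_counts):
    """Build scfvalues-like placeholders from SCF cycle counts.

    cycle_counts: hypothetical list with the number of SCF cycles for each
    geometry. Each entry is an (ncycles, 0) array: len() gives the number
    of cycles, but it holds no values because none were printed.
    """
    return [np.empty((n, 0)) for n in cycle_counts]


scfvalues = placeholder_scfvalues([12, 7, 5])
print([len(v) for v in scfvalues])  # [12, 7, 5]
```

Using zero-width arrays rather than plain empty lists keeps the per-geometry entries as numpy arrays, which is a design choice of this sketch, not something the request requires.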

Gaussian #t files fail to parse

(Originally mentioned by me in #16 ).

When parsing mocoeffs, it fails because nmo hasn't been set. Any thoughts on how this should be handled? I see two possibilities:

  1. If nmo isn't present, log a warning and skip (i.e. return).
  2. Assume nmo == nbasis.

I'm leaning towards Option 1 since the assumption in Option 2 might produce errors in the future if nmo != nbasis, which we've seen. I'd rather require #n or #p calculations for the mocoeff attribute.
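Option 1 could look roughly like the sketch below. It assumes the parsed attributes are available as a plain dict (a hypothetical stand-in for the parser object), and the coefficient parsing itself is only a placeholder.

```python
import logging

logger = logging.getLogger(__name__)


def parse_mocoeffs(attrs):
    """Option 1: log a warning and skip mocoeffs when nmo is unknown.

    attrs is a plain dict standing in for the parser's attributes
    (hypothetical; the real parser keeps them on itself).
    """
    if "nmo" not in attrs:
        logger.warning("nmo not set (e.g. a #T job); skipping mocoeffs")
        return None
    # Placeholder for the real parsing: an nmo x nbasis coefficient block.
    return [[0.0] * attrs["nbasis"] for _ in range(attrs["nmo"])]
```

Returning early keeps the rest of the file parseable, and nothing has to guess whether nmo equals nbasis.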

Move content from Sourceforge

After moving the code and data itself, we should also eventually move the remaining content. That's the intent of this ticket.

I think there are several things:

  1. open tickets: I've started to move them manually (some may fit better into cclib-data)
  2. closed tickets: do we care about these?
  3. the wiki: eventually we probably want to move this content, too; some pages, such as the parsed attributes list, could then easily be regenerated automatically

Many unit tests fail for Molpro 2012

After upgrading the unit test files to Molpro 2012 (#37), a significantly larger number of tests fail. Marking this for 1.3 so that we can focus on the release, though.

Gaussian03/borane-opt.log regression

I've looked into this regression failure. The nmo attribute changes over the course of the optimization, so the assertions that nmo == self.nmo fail.

I see two fixes:

  1. Don't use the assert statement for optimizations.
  2. Keep track of nmo at each step in the optimization, and only assert for the current step.

Fix 1 seems the most straightforward, but fix 2 gives better error checking.
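Fix 2 can be sketched with a small per-step record, assuming the parser knows the current optimization step when it reads nmo. The nmo_by_step dict and check_nmo are hypothetical names, and the sketch raises ValueError instead of using assert so the check survives python -O.

```python
def check_nmo(nmo_by_step, step, nmo):
    """Fix 2: compare nmo only with earlier values from the same step.

    nmo_by_step maps optimization step -> nmo; it is hypothetical
    bookkeeping the parser would keep while reading the file.
    """
    expected = nmo_by_step.setdefault(step, nmo)
    if nmo != expected:
        raise ValueError(f"nmo changed within step {step}: {expected} -> {nmo}")
    return nmo_by_step


history = {}
check_nmo(history, 0, 40)
check_nmo(history, 0, 40)  # same step, same value: accepted
check_nmo(history, 1, 38)  # a new step may drop orbitals: accepted
print(history)  # {0: 40, 1: 38}
```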

Thoughts?

Support for gfprint

There are two sp_basis unit tests for Gaussian, one uses GFINPUT and the other GFPRINT. We currently only support the first output. Perhaps it would not be so much work to support the second also.

If we do not want to do that, it would be good to make the GFPRINT version into a regression and perhaps merge the sp_basis logfile into sp.

Switch to atomic units

We might have had this discussion on the mailing list before, but it would be nice to be able to get all attributes in alternative units. Mostly atomic units for me (hartrees, etc.), since I find myself constantly converting to those.

It would be useful both in the parsers and ccopen/ccget.
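cclib reports energies such as scfenergies in eV and atomcoords in angstrom, and it ships a convertor function in cclib.parser.utils for this purpose. The sketch below is a dependency-free illustration of the arithmetic using CODATA 2018 constants; it is not cclib's implementation, and the function names are hypothetical.

```python
# CODATA 2018 recommended values.
HARTREE_TO_EV = 27.211386245988
BOHR_TO_ANGSTROM = 0.529177210903


def ev_to_hartree(energy_ev):
    """Convert energies (floats or numpy arrays) from eV to hartree."""
    return energy_ev / HARTREE_TO_EV


def angstrom_to_bohr(length_angstrom):
    """Convert lengths (floats or numpy arrays) from angstrom to bohr."""
    return length_angstrom / BOHR_TO_ANGSTROM
```

Because both functions only divide, they work unchanged on whole numpy arrays such as data.scfenergies or data.atomcoords.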

ORCA: len(atomcoords) != len(geovalues)

In working on pull #29, I've discovered a potential bug with ORCA.

It seems that ORCA adds an extra structure at the end of an optimization so that the length of geovalues is one less than the length of atomcoords. For the dvb_gopt file:

>>> data.geovalues.shape
(21, 5)
>>> data.atomcoords.shape
(22, 20, 3)

Any idea what the last structure is, or the best way to handle this? It doesn't appear that data.atomcoords[-1] is a rotated version of data.atomcoords[-2].
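A small sketch for diagnosing the mismatch, assuming numpy arrays shaped like the session above. Both helper names are hypothetical. Comparing interatomic distance matrices tests the "rotated version" idea directly, since distances do not change under a rigid rotation or translation.

```python
import numpy as np


def extra_geometries(atomcoords, geovalues):
    """How many more geometries than convergence entries were parsed."""
    return len(atomcoords) - len(geovalues)


def same_up_to_rigid_motion(a, b, tol=1e-4):
    """True if two (natom, 3) geometries share all interatomic distances.

    Distances do not change under rotation, translation or reflection, so
    matching distance matrices are consistent with a rigidly moved copy.
    """
    da = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    db = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
    return bool(np.allclose(da, db, atol=tol))


# Shapes from the dvb_gopt example; the values here are dummies.
print(extra_geometries(np.zeros((22, 20, 3)), np.zeros((21, 5))))  # 1
```

With parsed data, same_up_to_rigid_motion(data.atomcoords[-1], data.atomcoords[-2]) would return False if the last structure really is a different geometry.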

The way Jaguar prints basis set information...

The new Jaguar logfiles for 8.3 print basis sets, but some of the coefficients given for STO-3G crash the unit test.

Link to line in Jaguar output with the basis set for hydrogen:
https://github.com/cclib/cclib/blob/master/data/Jaguar/basicJaguar8.3/dvb_sp_hf.out#L144

The exponents for hydrogen are OK, but the coefficients are different than the standard values. Notice how the last primitive has a coefficient of 1.0. That leads me to think this is actually two contractions, but that wouldn't make much sense.

dipole moments (and/or higher multipoles)

I can think of many reasons it would be great to parse dipole moment vectors and/or higher multipoles (via numpy perhaps?)
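A possible starting point, assuming a summary line of the form "X= ... Y= ... Z= ... Tot= ..." as Gaussian prints for the dipole moment in Debye. The exact layout is an assumption here, and other programs would need their own patterns; parse_dipole_line is a hypothetical name.

```python
import re

import numpy as np

# Matches "label= value" pairs, with or without a space after "=".
_PAIR = re.compile(r"([A-Za-z]+)=\s*(-?\d+\.\d+)")


def parse_dipole_line(line):
    """Turn 'X= ... Y= ... Z= ... Tot= ...' into an (x, y, z) numpy vector.

    The line layout is an assumption modelled on Gaussian's dipole summary;
    the 'Tot' value is read but not needed, since it is the vector norm.
    """
    values = {label: float(number) for label, number in _PAIR.findall(line)}
    return np.array([values["X"], values["Y"], values["Z"]])


line = "    X=              0.0000    Y=             -0.3000    Z=              0.4000  Tot=              0.5000"
dipole = parse_dipole_line(line)
magnitude = np.linalg.norm(dipole)
```

Higher multipoles could follow the same pattern, collecting labelled components into larger arrays.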

I can help implement some of the parsing for Gaussian, GAMESS-US, etc.

Add unit test for Firefly and retire PC-GAMESS

I have now added outputs for unit tests from Firefly 8.0.1 (d61da18, which is included in PR #111). I think that means we can retire the PC-GAMESS unit tests and make them regressions, like we did with WinGAMESS a couple of years ago. Any objections?

Make sure optdone is empty but set when geometry is not converged

So we've turned optdone into a list, and I've made sure it is not set by default, because it is not appropriate for most jobs (such as SP). I actually added a test for this in 80e27d7.

It would be nice to make sure, however, that it is set to an empty list for geometry optimizations that do not converge. Perhaps a regression or two would be a nice way to check this.

Unit tests for methods

The methods are not currently tested systematically and it is becoming apparent the code would benefit from that (start from #60 and #67).

Support for parsing multiple files

(moved manually from Sourceforge feature request 1)

Many computational chemistry programs print output across multiple files (most notably GAMESS, Molpro, Turbomole). Parsing more than one can provide better data (more precision) and in some cases is necessary. Concatenating files is a solution, but it should also be possible to pass multiple file names to parsers and utility scripts.
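A lazy version of the concatenation workaround can be sketched with a generator that reads several files in order, so a parser sees one continuous stream of lines. The function name is hypothetical, and this sketch does not record which file each line came from, which a real multifile mode might need.

```python
import os
import tempfile


def iter_lines(paths):
    """Yield lines from several output files, in order, as one stream."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            yield from handle


# Usage: two small files standing in for, e.g., a Molpro .out and .log pair.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("job.out", "first\n"), ("job.log", "second\nthird\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        paths.append(path)
    lines = list(iter_lines(paths))

print(lines)  # ['first\n', 'second\n', 'third\n']
```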

Many ORCA unit tests do not pass

Many ORCA unit tests still do not pass, for both 2.9 and 3.0, as discussed in #37. In particular, the Raman and IR values are sometimes quite different from what is expected.

Review of unit tests

Many of the unit tests were updated for 1.2b, and it would be worthwhile to quickly check whether they can be made more comparable with little effort. I'm thinking mainly about checking whether input coordinates and basis sets are the same everywhere. For example, the TD unit tests for GAMESS-US2012 use 6-31G, but STO-3G is used most of the time for the dvb* series.

There are also things of the sort we discussed in #44, which may or may not be caused by similar differences in the input.

progress information appears to be broken

The following no longer works:

import sys
from cclib.parser import ccopen
from cclib.progress import TextProgress

progress = TextProgress()
parser = ccopen(sys.argv[1], progress)
data = parser.parse(cupdate=1.0)

Output:

adam@mqair:~/cclib/src/cclib/parser$ python3 test_gaussian.py dvb_gopt.log
Traceback (most recent call last):
  File "test_gaussian.py", line 9, in <module>
    data = parser.parse(cupdate=1.0)
  File "/usr/local/lib/python3.3/site-packages/cclib/parser/logfileparser.py", line 223, in parse
    self.updateprogress(inputfile, "Unsupported information", cupdate)
  File "/usr/local/lib/python3.3/site-packages/cclib/parser/logfileparser.py", line 287, in updateprogress
    newstep = inputfile.tell()
OSError: telling position disabled by next() call

It's somewhat related to the io.open call introduced with pull #34. Removing it fixes the problem for Python2. However, calling open with Python3 has the same problem because open is an alias for io.open.

Before Python 3, were we calling inputfile.next() or inputfile.readline()? What was the rationale for changing to next(inputfile)?

I think to fix this problem, we need to change all instances of next(inputfile) to inputfile.readline().decode().
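The sketch below reproduces the failure and shows the readline() pattern, assuming a Python 3 text-mode file. Note that readline() on a text-mode handle already returns str, so no .decode() is needed there. read_with_progress is a hypothetical name; the offsets it collects are what an updateprogress-style callback could use.

```python
import os
import tempfile


def read_with_progress(handle):
    """Read every line with readline() and record tell() after each one.

    Because next(handle) is never called, tell() stays available on a
    Python 3 text-mode file.
    """
    positions = []
    while True:
        line = handle.readline()
        if not line:  # an empty string means end of file
            break
        positions.append(handle.tell())
    return positions


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.log")
    with open(path, "w", encoding="utf-8", newline="") as fh:
        fh.write("a\nbb\nccc\n")
    size = os.path.getsize(path)

    # The pattern that breaks: tell() after next() on a text-mode file.
    with open(path, encoding="utf-8") as fh:
        next(fh)
        try:
            fh.tell()
            tell_after_next_failed = False
        except OSError:
            tell_after_next_failed = True

    # The readline() pattern keeps tell() working.
    with open(path, encoding="utf-8") as fh:
        positions = read_with_progress(fh)

print(tell_after_next_failed)  # True
print(positions[-1] == size)   # True
```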

Support more types of archives, and multiple files

This is an afterthought of #12 which would be useful. Currently, if a zip archive is given to cclib with multiple files inside, it complains. It should be able to iterate over the files inside, though. This should also, in principle, work for tar archives.

One question to be settled is whether to treat the files inside in multifile mode by default. I would say no.
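A sketch of iterating over every member of a zip archive with the standard zipfile module, treating each member as a separate file (not multifile mode, matching the default suggested above). The function name is hypothetical; tar archives could follow the same pattern with the standard tarfile module.

```python
import io
import zipfile


def iter_archive_members(source):
    """Yield (name, text) for each regular file in a zip archive.

    source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as member:
                text = member.read().decode("utf-8", errors="replace")
            yield info.filename, text


# Usage: an in-memory archive with two logfiles.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.log", "alpha\n")
    archive.writestr("b.log", "beta\n")
buffer.seek(0)
members = dict(iter_archive_members(buffer))
```

Each (name, text) pair could then be handed to a parser on its own.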

Molpro: parser crashes on some files when parsed alone

The Molpro parser needs both the .out and .log files to parse some attributes, which is reflected in the unit tests. It should not crash, however, even if it cannot extract something from a file:

data/Molpro/basicMolpro2012$ ccget -l dvb_gopt.log
Attempting to parse dvb_gopt.log
cclib can parse the following attributes from dvb_gopt.log:
  aonames
  atombasis
  atomcoords
  atomnos
  charge
  coreelectrons
  gbasis
  geotargets
  homos
  mult
  natom
  nbasis
  nmo
  scftargets
  scfvalues

data/Molpro/basicMolpro2012$ ccget -l dvb_gopt.out
Attempting to parse dvb_gopt.out
Traceback (most recent call last):
  File "/usr/local/bin/ccget", line 119, in <module>
    main()
  File "/usr/local/bin/ccget", line 100, in main
    data = log.parse()
  File "/usr/local/lib/python3.2/dist-packages/cclib/parser/logfileparser.py", line 229, in parse
    self.extract(inputfile, line)
  File "/usr/local/lib/python3.2/dist-packages/cclib/parser/molproparser.py", line 392, in extract
    moenergy = float(line.split()[2])
ValueError: could not convert string to float: '='

Unicode errors should not stop parser

This was raised in #28, but I think it warrants a separate issue if not solved right away. I think the simplest way to handle this is to somehow ignore the Unicode characters that trigger an error, so the parser can continue. Generally the Unicode will not be within the data we parse, anyway.
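One way to do this in Python 3 is to open the file with an error handler, so undecodable bytes never raise UnicodeDecodeError. A minimal sketch with a hypothetical function name: errors="replace" substitutes U+FFFD, while errors="ignore" would drop the bytes entirely.

```python
import os
import tempfile


def read_lines_tolerant(path):
    """Yield text lines, replacing undecodable bytes with U+FFFD."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        yield from handle


# Usage: a logfile with stray non-UTF-8 bytes on its second line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "odd.log")
    with open(path, "wb") as fh:
        fh.write(b"Energy = -1.0\n\xff\xfe comment\nDone\n")
    lines = list(read_lines_tolerant(path))
```

The numeric lines come through unchanged; only the comment line carries replacement characters.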

GAMESS 2012 water_cis_det.out fails to parse

The parser crashes when attempting to parse optically inactive transitions:

adam@mqair:~/cclib/data/GAMESS/basicGAMESS-US2012$ ccget -l water_cis_dets.out 
Attempting to parse water_cis_dets.out
Traceback (most recent call last):
  File "/usr/local/bin/ccget", line 145, in <module>
    main()
  File "/usr/local/bin/ccget", line 126, in main
    data = log.parse()
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cclib/parser/logfileparser.py", line 248, in parse
    self.extract(inputfile, line)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cclib/parser/gamessparser.py", line 233, in extract
    statenumber = int(line.split()[-1])
ValueError: invalid literal for int() with base 10: 'INACTIVE,'

This issue was brought to my attention by Daniel R. Haney.
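A defensive version of the failing conversion might look like this. It assumes, as in the traceback, that the state number is the last whitespace-separated token, and it treats any non-integer token (such as 'INACTIVE,') as "no number on this line". The function name and both sample lines are made up for illustration.

```python
def parse_state_number(line):
    """Return the trailing state number, or None if the line has none.

    Assumption: the number is the last whitespace-separated token, as in
    the failing parser line; anything else (e.g. 'INACTIVE,') yields None.
    """
    token = line.split()[-1]
    try:
        return int(token)
    except ValueError:
        return None
```

The caller can then skip optically inactive transitions instead of crashing; the same guard would fit the Molpro float() failure reported above.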

ADF: scftargets is not parsed correctly for 2013.01

This came up after updating the ADF unit tests with the 2013.01 output files (see #37). Basically, the numerical integration in the SCF is done with a different method (fuzzy cells instead of Voronoi polyhedra), hence the output is different.

It is not obvious to me how to parse scftargets in this case, which is what this issue is for. We probably want to fix this only after merging #37 into master, though.

optdone vs. optfinished

I was looking at the Gaussian parser and found optdone and optfinished flags. Only optdone is listed in data.py. The only other parser with optfinished is the GAMESS parser.

Is there a reason for optdone instead of optfinished?

Future of nmo, nbasis and similar attributes

Following #94 and similar bugs in the past, there was an idea to change nmo into an array, since it can change between SCF cycles as orbitals are dropped for various reasons. This issue is meant to collect our discussion on what to do with this.

Two aspects to consider:

  • As Noel pointed out, changing the API would imply a major version change to cclib 2.x
  • Changing nmo in this way suggests other attributes should also change that can behave similarly, including nbasis.

Release v1.2

This is a means for discussing the v1.2 release, which is due on April 15th.

Things to get done by me:

  1. update changelog (finished, please double-check if I missed something)
  2. update website and docs
  3. update versions in code

Type checks for data attributes

Since we do store the expected types of data attributes, it would be reasonable to do a check on the parsed attributes, either directly after parsing or in the course of unit tests.
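A sketch of such a check, run on a dict of parsed attributes. The EXPECTED_TYPES table below is a hypothetical subset written for illustration; cclib keeps its own table of attribute types, whose exact name and layout may differ. Integers accept numpy integer types too, because isinstance(np.int64(1), int) is False.

```python
import numpy as np

# Hypothetical subset of expected attribute types, for illustration only.
EXPECTED_TYPES = {
    "natom": (int, np.integer),
    "charge": (int, np.integer),
    "mult": (int, np.integer),
    "atomnos": np.ndarray,
    "atomcoords": np.ndarray,
    "scfenergies": np.ndarray,
}


def check_types(attributes, expected=EXPECTED_TYPES):
    """Return (name, actual type name) for each attribute of the wrong type.

    Attributes missing from the expected table are not checked.
    """
    problems = []
    for name, value in attributes.items():
        wanted = expected.get(name)
        if wanted is not None and not isinstance(value, wanted):
            problems.append((name, type(value).__name__))
    return problems
```

The same function could be called right after parsing or from a unit test that asserts the returned list is empty.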

Provide a 64 bit Windows installer

There is at least one person who has asked about a 64 bit Windows installer, so it would be worth the few extra minutes to organize one sometime.

Various fixes and additions from drhaney

This is meant as a placeholder for merging changes and additions from drhaney/cclib into the production code, in case no pull requests are created for them in the future. The changes are mostly for GAMESS-US, and were initially mentioned here in #66.
