Comments (25)
How about this:
Set up two buckets: (a) a Staging bucket and (b) a Production bucket. All newly added files are uploaded to the Staging bucket, where people have write permission but no read permission. After a file is uploaded (which the uploader cannot read back), we review it and move it to the Production bucket if the file is good. Finally, everyone can read from the Production bucket, but only we have write permission to it.
Also, even for the write permission to Staging bucket, we could control - we only give write permission to vetted people.
https://cloud.google.com/storage/docs/access-control/iam-permissions
Otherwise, people could always send the file to our private bucket, we review it, then we add to the Production bucket.
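The two-bucket workflow described above can be sketched as a small in-memory model. This is only an illustration of the permission logic, not Google Cloud Storage code: the `Bucket` class, the `promote` helper, the user names, and the `"*"` marker for public read access are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """In-memory stand-in for a storage bucket with per-user permissions."""
    name: str
    readers: set = field(default_factory=set)   # users allowed to read ("*" = everyone)
    writers: set = field(default_factory=set)   # users allowed to write
    objects: dict = field(default_factory=dict)

    def write(self, user, key, data):
        if user not in self.writers:
            raise PermissionError(f"{user} may not write to {self.name}")
        self.objects[key] = data

    def read(self, user, key):
        if "*" not in self.readers and user not in self.readers:
            raise PermissionError(f"{user} may not read from {self.name}")
        return self.objects[key]


def promote(reviewer, key, staging, production, approve):
    """Move an object from staging to production only if the review passes."""
    data = staging.read(reviewer, key)
    if not approve(data):
        return False
    production.write(reviewer, key, data)
    del staging.objects[key]
    return True


# Vetted users can write to staging but cannot read it back.
staging = Bucket("staging", readers={"maintainer"}, writers={"maintainer", "vetted_user"})
# Everyone can read production, but only the maintainer can write to it.
production = Bucket("production", readers={"*"}, writers={"maintainer"})

staging.write("vetted_user", "H2.hdf5", b"data")
promoted = promote("maintainer", "H2.hdf5", staging, production, approve=lambda d: len(d) > 0)
```

In a real deployment the same separation would presumably be expressed through the bucket-level IAM roles in the link above rather than through application code.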
from openfermion.
I'm still confused about why you would want to store the Gaussian integrals. The point of using Gaussians is that the integrals can be generated on the fly. Depending on molecular size and integral usage, it's probably faster to generate the necessary integral when it's required. This is how Psi4 and most electronic structure packages work.
from openfermion.
There are a number of reasons for this. One is simply that we have noticed that less advanced users are not using our electronic structure package plugins. Having a library with the integrals precomputed makes things very convenient for them. Furthermore, it can be challenging for non-chemists to get SCF convergence in some cases. As you know, this often involves fiddling around with the type of SCF routines used, knowing the spin multiplicity of the system ahead of time, adjusting convergence criteria, etc. This is especially true for non-equilibrium geometries. Such a library provides standards that anyone can use without needing to worry about running the calculations themselves. But perhaps we can also write the download scripts to provide the option to pull down only the molecule specification (e.g. basis set, geometry, multiplicity, etc.).
from openfermion.
Hi Ryan, Jhonathan and I have been working on something similar so we might be able to contribute some examples soon. Also, is there a "github-like" platform for data? If so, could we put the data there and link OpenFermion?
from openfermion.
I don't know what would be the best platform for this. Anyone else? Looping in the software engineering core:
@Spaceenter
@maffoo
@dabacon
from openfermion.
I think the easiest thing might be Google Cloud Storage. It's very easy to configure and use, and it's very cheap.
cloud.google.com/storage
(We used it to store millions of publicly-available videos and pictures and it works well.)
from openfermion.
Hi Ryan!
I work with Kanav Setia on testing and implementing fermion-to-qubit mappings on molecules. I have, actually, been doing exactly what you have suggested above. I use the NIST Webbook for equilibrium geometries (I use the HF optimized geometries, but I can easily move to a better geometry), and then run calculations on them. In fact, I have been running them in different bases - the MO, the AO symmetrically orthogonalized, and the AO canonically orthogonalized basis. I have the MolecularData files for quite a few molecules at this point (for instance, the AE6 ensemble). To the best of my knowledge, the files aren't that large and should not pose great storage issues.
Let me know if/how you'd like to proceed!
Tarini
from openfermion.
Perhaps the data files should be in a plugin stored in a separate Github repository?
from openfermion.
Hi Tarini,
We definitely do want to proceed with this. Let me push the issue internally at Google and try to find a nice solution. The thing is that I don't think we should use GitHub for a variety of reasons. First, many of the computed files are actually quite large and we should take care not to blow up the size of the repo. Stay tuned!
from openfermion.
Sounds good. Pull via a plugin from http://webbook.nist.gov/. There used to be some work on Standards and Reference Data for First-Principles Simulations; re-use would be of interest as this field moves along. HD5
from openfermion.
An update: we are still looking into options for this. We are considering setting up GitHub LFS and are also looking at some other options involving Google Cloud. One cannot simply use this repo for this purpose: GitHub does not allow files larger than 100 MB, and they get upset if your repo is over 1 GB. The integrals for even some small molecules take up more than 1 GB.
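To get a rough feel for when integral files hit those limits, here is a back-of-the-envelope sketch. It assumes the two-electron integrals are stored as a dense n^4 array of 8-byte floats with no symmetry reduction or compression; that storage layout is an assumption made for illustration, and the function names are hypothetical.

```python
GITHUB_FILE_LIMIT = 100 * 10**6   # 100 MB per-file limit quoted above
GITHUB_REPO_LIMIT = 10**9         # 1 GB repository size quoted above


def two_body_tensor_bytes(n_orbitals, bytes_per_value=8):
    """Bytes for a dense n^4 tensor of two-electron integrals (no symmetry)."""
    return n_orbitals ** 4 * bytes_per_value


def smallest_n_exceeding(limit):
    """Smallest orbital count whose dense tensor is larger than `limit` bytes."""
    n = 1
    while two_body_tensor_bytes(n) <= limit:
        n += 1
    return n


print(smallest_n_exceeding(GITHUB_FILE_LIMIT))  # 60
print(smallest_n_exceeding(GITHUB_REPO_LIMIT))  # 106
```

Under these assumptions a single dense tensor passes the per-file limit at 60 orbitals and the repository limit at 106 orbitals; exploiting permutational symmetry or compression would push those thresholds higher.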
from openfermion.
I still feel Google Cloud Storage is the best option: easy to use, cheap, and reliable. You can think of it as a giant Google Drive that everyone can access. If needed, I could help set it up (takes 15 minutes at most).
from openfermion.
Wei, how would you recommend we control access to the Google Cloud Storage? For instance, how do we do a review of what people would upload? I am told that it is necessary to have some separation between when people can upload and when they can download (during which time we could review the submission); otherwise, the internet will spam our account for free storage.
from openfermion.
I think this is a good suggestion. I'll look into it now. One other thing we should have is perhaps a few lines of code in OpenFermion which can be used to automatically pull things from the Cloud library. The idea is that when people want to add a new dataset, they edit those lines in OpenFermion, which is the prompt for us to take a look at what they put in the staging bucket. We can also discuss the content of what they uploaded on the pull request in OpenFermion.
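A minimal sketch of what such "pull" code might look like, assuming the data ends up in a publicly readable Google Cloud Storage bucket. The bucket name and object names are hypothetical, and this is not code from OpenFermion; it only relies on the standard public object URL form `https://storage.googleapis.com/<bucket>/<object>`.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-molecule-library"


def public_url(object_name, bucket=BUCKET):
    """URL of a publicly readable object in a Google Cloud Storage bucket."""
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def download(object_name, destination, bucket=BUCKET):
    """Fetch one object from the bucket and save it to a local path."""
    urllib.request.urlretrieve(public_url(object_name, bucket), destination)
    return destination
```

A new dataset would then be registered by adding its object name to a list that a function like `download` iterates over, which gives reviewers a concrete diff to look at in the pull request.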
from openfermion.
Ryan,
I was wondering, most of the experimental data and calculations for small molecules have been stored in the Computational Chemistry Comparison and Benchmark DataBase at http://cccbdb.nist.gov/introx.asp
It contains experimental and theoretical geometries and energies for many levels of theory (DFT, MP2, MP4, CCSD) and basis sets (cc-pVDZ, cc-pVTZ, cc-pVQZ, aug-cc-pVDZ, aug-cc-pVTZ, aug-cc-pVQZ). It's a bit hard to navigate (one should use the FAQ) but it contains very rich information.
I wonder if this could be used as a starting resource for geometries (example CO): http://cccbdb.nist.gov/geom2x.asp or if calculations need to be performed natively to populate the MolecularData objects.
Here is a complete table of all finished calculations: http://cccbdb.nist.gov/calcdonex.asp
I am sure there is a way to get all such data in an automated way, potentially asking permission from the CCCBD maintainer (https://www.nist.gov/people/russell-d-johnson-iii).
from openfermion.
@tobigithub, I believe that's a good starting point for the geometries, and I've had success in the distant past getting information from the CCCBD maintainers in a computationally friendly format. Loading the energies might be a different story, just because we'd want the integrals to be present for people as well, and something done in an aug-cc-pVQZ basis set is probably larger than what we want to serve via the cloud at the moment. However, starting from the geometries and a basic early cap on the number of orbitals should provide a quick way to get things initially populated.
from openfermion.
We've made some progress on this and should have everything configured by the end of next week at the latest. We have set up a Google Cloud account where the data will be stored. The workflow for contributing to the molecule data library will be roughly as follows: once a pull request is opened, we will email the pull requester a signed URL that can be used to upload the data to a staging bucket. We will then inspect the data for security and other compliance issues and transfer it into the Google Cloud Storage location from which the public may download it. We will explain all of this in the README and provide examples next week (presumably). Thanks to @Spaceenter, who has been instrumental in helping us figure this out.
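From the contributor's side, uploading to a signed URL is typically a plain HTTP PUT. The sketch below uses only the Python standard library; the function names and the content type are assumptions for illustration (a signed URL may require specific headers that match how it was generated), and this is not the project's actual upload script.

```python
import urllib.request


def build_upload_request(signed_url, payload, content_type="application/octet-stream"):
    """Prepare an HTTP PUT that uploads `payload` (bytes) to a signed URL."""
    return urllib.request.Request(
        signed_url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def upload(signed_url, path):
    """Read a local file and PUT it to the signed URL; returns the HTTP status."""
    with open(path, "rb") as handle:
        request = build_upload_request(signed_url, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

Because the URL itself carries the authorization, the contributor would not need a Google Cloud account, and the URL can be made to expire after the upload window.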
from openfermion.
Exactly. It should be possible to embed the integral generation routines into OpenFermion or at least the test cases. There are standalone integral generation packages that can be embedded into the test case libraries (libxc and libint come to mind immediately) to avoid downloading and working with needlessly large integral files.
from openfermion.
So you are suggesting saving the molecular orbitals but not the integrals and then providing code which can quickly regenerate the integrals. Is that right?
from openfermion.
I think it is a good suggestion to give people the freedom to download just (a) the molecule information, (b) the molecular orbitals, or (c) everything. I think more advanced users would prefer (a) or (b), since I agree that in most cases it will be faster to run Psi4 with a good initial guess for the orbitals (or just feed them to libint, etc.), but I know for a fact that some users would prefer to skip all the electronic structure packages / integral packages and just use option (c). This should be fairly easy given that HDF5 files allow you to load part of the file without loading the whole thing.
from openfermion.
Sure. I would select only smaller molecules for option (c); otherwise deflating the integral file will be as bad as anything.
Even if one has no interest in running electronic structure, a computation that is fast enough can be considered part of the pre-processing. Since these are test cases, the electronic structure part can be hardcoded into the test case, so that a user would just run e.g. test_silane and the electronic structure is embedded as part of the program.
from openfermion.
Upon thinking about it some more, I doubt that one can download just part of the HDF5 file in a straightforward fashion. Thus, it might be best if one were to upload both the HDF5 file with integrals and HDF5 file without integrals (or if it is large molecules, just the HDF5 file without integrals).
from openfermion.
So we now have a working prototype of what this data repository might look like. Jarrod and I put up a few examples. Check out cloud_library (from the top level of the GitHub repo). We'll add some more datasets over the coming weeks, and I hope you will too if you're interested. The procedure for uploading the files is a bit elaborate, but we were constrained in some of the design decisions by security considerations / liability associated with us sponsoring a database where external parties can upload data that is downloaded by others.
from openfermion.
@TariniHardikar @kanavsetia @hsim13372 did you have any thoughts on our implementation? I know you seemed interested in contributing to the molecule library.
from openfermion.
The cloud library is live. I am going to close this issue and open a new one about populating the library.
from openfermion.