Spaceenter commented on July 2, 2024

How about this:
Set up two buckets: a) a Staging bucket and b) a Production bucket. All newly added files are uploaded to the Staging bucket, where people have write permission but no read permission. After a file is uploaded (which the uploader cannot read back), we review it and, if the file is good, move it to the Production bucket. Finally, everyone can read from the Production bucket, but only we have write permission to it.
We can also restrict write access to the Staging bucket itself, granting it only to vetted people.
https://cloud.google.com/storage/docs/access-control/iam-permissions
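
A minimal sketch of the two-bucket policy above, written as an in-memory Python model rather than real Google Cloud Storage calls. The bucket names, user names, and the `upload`/`read`/`promote` helpers are hypothetical. In GCS IAM terms, the staging rule roughly corresponds to granting contributors `roles/storage.objectCreator` (create objects without viewing them), and the production rule to granting `allUsers` `roles/storage.objectViewer`.

```python
from dataclasses import dataclass, field

PUBLIC = "*"  # stands in for "anyone on the internet"


@dataclass
class Bucket:
    """Toy bucket: who may write, who may read, and what it holds."""
    name: str
    writers: set
    readers: set
    objects: dict = field(default_factory=dict)

    def can_write(self, user: str) -> bool:
        return PUBLIC in self.writers or user in self.writers

    def can_read(self, user: str) -> bool:
        return PUBLIC in self.readers or user in self.readers


# Hypothetical principals: maintainers review, vetted contributors upload.
MAINTAINERS = {"maintainer"}
VETTED = {"vetted_contributor"}

# Staging: vetted people (and maintainers) can write; only maintainers read.
staging = Bucket("staging", writers=VETTED | MAINTAINERS, readers=set(MAINTAINERS))
# Production: only maintainers can write; everyone can read.
production = Bucket("production", writers=set(MAINTAINERS), readers={PUBLIC})


def upload(user: str, bucket: Bucket, name: str, data: bytes) -> None:
    if not bucket.can_write(user):
        raise PermissionError(f"{user} may not write to {bucket.name}")
    bucket.objects[name] = data


def read(user: str, bucket: Bucket, name: str) -> bytes:
    if not bucket.can_read(user):
        raise PermissionError(f"{user} may not read from {bucket.name}")
    return bucket.objects[name]


def promote(reviewer: str, name: str, approved: bool) -> None:
    """Review a staged file; copy it to production only if approved."""
    data = read(reviewer, staging, name)  # only maintainers pass this check
    if approved:
        upload(reviewer, production, name, data)
    del staging.objects[name]  # staging is cleared either way
```

Anyone who is not a maintainer fails at the staging read inside `promote`, which is exactly the separation between uploading and downloading that keeps the bucket from becoming free storage for spammers.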

Alternatively, people could send their files to our private bucket; we review them and then add them to the Production bucket.

@babbush

from openfermion.

jdwhitfield commented on July 2, 2024

I'm still confused about why you would want to store the Gaussian integrals. The point of using Gaussians is that the integrals can be generated on the fly. Depending on molecular size and integral usage, it's probably faster to generate each integral when it's required. This is how Psi4 and most electronic structure packages work.
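
To make the "on the fly" point concrete: the simplest Gaussian integral, the overlap of two normalized s-type primitives, has a closed form that costs only a handful of floating-point operations. This is an illustration only; the function name is hypothetical, and real integral engines such as those used by Psi4 also handle contracted shells, higher angular momentum, and the far more involved two-electron integrals.

```python
import math


def s_gaussian_overlap(alpha, center_a, beta, center_b):
    """Overlap <g_a|g_b> of two normalized s-type primitive Gaussians.

    Each primitive is g(r) = (2*alpha/pi)**0.75 * exp(-alpha * |r - A|**2).
    """
    # Squared distance between the two centers.
    r2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    p = alpha + beta
    # Closed-form overlap of the unnormalized primitives.
    raw = (math.pi / p) ** 1.5 * math.exp(-alpha * beta / p * r2)
    # Product of the two normalization constants.
    norm = (2 * alpha / math.pi) ** 0.75 * (2 * beta / math.pi) ** 0.75
    return norm * raw
```

Two identical primitives on the same center give an overlap of 1, and the overlap decays as exp(-alpha*beta/(alpha+beta) * R**2) as the centers separate.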

babbush commented on July 2, 2024

There are a number of reasons for this. One is simply that we have noticed that less advanced users are not using our electronic structure package plugins, and a library with precomputed integrals makes things very convenient for them. Furthermore, it can be challenging for non-chemists to get SCF convergence in some cases. As you know, this often involves fiddling with the type of SCF routine used, knowing the spin multiplicity of the system ahead of time, adjusting convergence criteria, etc. This is especially true for non-equilibrium geometries. Such a library provides standards that anyone can use without needing to worry about running the calculations themselves. But perhaps we can also write the download scripts to provide the option to pull down only the molecule specification (e.g., basis set, geometry, multiplicity, etc.).

hsim13372 commented on July 2, 2024

Hi Ryan, Jhonathan and I have been working on something similar, so we might be able to contribute some examples soon. Also, is there a "GitHub-like" platform for data? If so, could we put the data there and link to it from OpenFermion?

babbush commented on July 2, 2024

I don't know what would be the best platform for this. Anyone else? Looping in the software engineering core:
@Spaceenter
@maffoo
@dabacon

Spaceenter commented on July 2, 2024

I think the easiest thing might be Google Cloud Storage. It's very easy to configure and use, and it's very cheap.
cloud.google.com/storage
(We used it to store millions of publicly-available videos and pictures and it works well.)

TariniHardikar commented on July 2, 2024

Hi Ryan!

I work with Kanav Setia on testing and implementing fermion-to-qubit mappings on molecules, and I have actually been doing exactly what you suggested above. I use the NIST Webbook for equilibrium geometries (I use the HF-optimized geometries, but I can easily move to a better geometry) and then run calculations on them. In fact, I have been running them in different bases: the MO basis, the symmetrically orthogonalized AO basis, and the canonically orthogonalized AO basis. I have MolecularData files for quite a few molecules at this point (for instance, the AE6 ensemble). To the best of my knowledge, the files aren't that large and should not pose serious storage issues.
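
For reference, the two orthogonalized AO bases mentioned above are usually defined from the AO overlap matrix $S$ as follows (these are the standard definitions, not something spelled out in the comment): diagonalize $S = U s U^{\dagger}$, with $s$ the diagonal matrix of eigenvalues, and take

```latex
X_{\text{sym}} = S^{-1/2} = U s^{-1/2} U^{\dagger}, \qquad
X_{\text{can}} = U s^{-1/2}, \qquad
X^{\dagger} S X = I \quad \text{for either choice of } X.
```

Canonical orthogonalization has the practical advantage that columns belonging to very small eigenvalues $s_i$ can be dropped when the AO basis is nearly linearly dependent.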

Let me know if/how you'd like to proceed!

Tarini

kevinsung commented on July 2, 2024

Perhaps the data files should be in a plugin stored in a separate GitHub repository?

babbush commented on July 2, 2024

Hi Tarini,

We definitely do want to proceed with this. Let me push the issue internally at Google and try to find a nice solution. I don't think we should use GitHub, for a variety of reasons; chief among them, many of the computed files are quite large, and we should take care not to blow up the size of the repo. Stay tuned!

QuantumLeaves commented on July 2, 2024

Sounds good: pull the data via a plugin from http://webbook.nist.gov/. There used to be some work on Standards and Reference Data for First-Principles Simulations, and re-use would be of interest as this field moves along (HDF5).

babbush commented on July 2, 2024

An update: we are still looking into options for this. We are considering setting up GitHub LFS and are also looking at some other options involving Google Cloud. We cannot simply use this repo for this purpose: GitHub does not allow files larger than 100 MB and discourages repositories over 1 GB, and the integrals for even some small molecules exceed 1 GB.
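
A back-of-the-envelope sketch of why the integrals get large: for N spatial orbitals the two-electron tensor (pq|rs) has N^4 entries, of which roughly N^4/8 are unique for real orbitals. The helper names are hypothetical, and the sketch assumes 8-byte float64 values while ignoring file-format overhead and compression. If the integrals are kept in the spin-orbital basis, N doubles and the dense size grows 16-fold.

```python
def eri_storage_bytes(n_orbitals, use_symmetry=False, bytes_per_value=8):
    """Bytes needed for the two-electron integrals (pq|rs) of n orbitals.

    Dense storage keeps all n**4 values; with the 8-fold permutational
    symmetry of real orbitals only the unique values are kept.
    """
    n = n_orbitals
    if use_symmetry:
        pairs = n * (n + 1) // 2          # unique (p, q) pairs with p >= q
        count = pairs * (pairs + 1) // 2  # unique pairs of such pairs
    else:
        count = n ** 4
    return count * bytes_per_value


def max_orbitals_under(budget_bytes, use_symmetry=False):
    """Largest orbital count whose integrals fit within budget_bytes."""
    n = 0
    while eri_storage_bytes(n + 1, use_symmetry) <= budget_bytes:
        n += 1
    return n
```

With dense storage, 100 orbitals already need 800,000,000 bytes (about 800 MB), and `max_orbitals_under(10**9)` returns 105, so a cap on the number of orbitals, as suggested later in the thread, translates directly into a cap on storage.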

Spaceenter commented on July 2, 2024

I still feel Google Cloud Storage is the best option: easy to use, cheap, and reliable. You can think of it as a giant Google Drive that everyone can access. If needed, I could help set it up (it takes 15 minutes at most).

babbush commented on July 2, 2024

Wei, how would you recommend we control access to the Google Cloud Storage? For instance, how do we do a review of what people would upload? I am told that it is necessary to have some separation between when people can upload and when they can download (during which time we could review the submission); otherwise, the internet will spam our account for free storage.

babbush commented on July 2, 2024

I think this is a good suggestion. I'll look into it now. One other thing we should have is perhaps a few lines of code in OpenFermion which can be used to automatically pull things from the Cloud library. The idea is that when people want to add a new dataset they edit those lines in OpenFermion which is the prompt for us to take a look at what they put in the staging bucket. We can also have discussions about the content of what they uploaded on the pull request in OpenFermion.
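
One way such "few lines of code in OpenFermion" could look: a manifest that maps each dataset to the SHA-256 digest of the reviewed file, so that editing the manifest in a pull request is the review prompt and every download is checked against what was reviewed. This is a hypothetical sketch: the bucket name, the manifest layout, and the function names are assumptions; only the `https://storage.googleapis.com/BUCKET/OBJECT` URL form for publicly readable objects comes from Google Cloud Storage itself.

```python
import hashlib
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical public bucket name; not a real OpenFermion bucket.
PUBLIC_BUCKET = "example-molecule-library"


def public_url(object_name: str, bucket: str = PUBLIC_BUCKET) -> str:
    """Download URL for an object in a publicly readable GCS bucket."""
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


def register(manifest: dict, object_name: str, payload: bytes) -> None:
    """What a contributor's pull request would add: name -> SHA-256 digest."""
    manifest[object_name] = hashlib.sha256(payload).hexdigest()


def verify(manifest: dict, object_name: str, payload: bytes) -> bytes:
    """Reject downloaded bytes that differ from what was reviewed."""
    expected = manifest[object_name]
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected:
        raise ValueError(f"checksum mismatch for {object_name}")
    return payload


def fetch(manifest: dict, object_name: str) -> bytes:
    """Download an object and check it against the manifest (needs network)."""
    with urlopen(public_url(object_name)) as response:
        return verify(manifest, object_name, response.read())
```

Pinning a digest in the pull request also means a file swapped in the bucket after review is rejected on download, rather than silently served.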

tobigithub commented on July 2, 2024

Ryan,
I was wondering, most of the experimental data and calculations for small molecules have been stored in the Computational Chemistry Comparison and Benchmark DataBase at http://cccbdb.nist.gov/introx.asp

It contains experimental and theoretical geometries and energies for many levels of theory (DFT, MP2, MP4, CCSD) and basis sets (cc-pVDZ, cc-pVTZ, cc-pVQZ, aug-cc-pVDZ, aug-cc-pVTZ, aug-cc-pVQZ). It's a bit hard to navigate (one should use the FAQ), but it contains very rich information.

I wonder whether this could be used as a starting resource for geometries (example, CO: http://cccbdb.nist.gov/geom2x.asp) or whether calculations need to be performed natively to populate the MolecularData objects.

Here is a complete table of all finished calculations: http://cccbdb.nist.gov/calcdonex.asp

I am sure there is a way to get all such data in an automated way, potentially by asking permission from the CCCBDB maintainer (https://www.nist.gov/people/russell-d-johnson-iii).

jarrodmcc commented on July 2, 2024

@tobigithub, I believe that's a good starting point for the geometries, and in the distant past I've had success getting information from the CCCBDB maintainers in a computationally friendly format. Loading the energies might be a different story, because we'd want the integrals to be present for people as well, and something done in an aug-cc-pVQZ basis set is probably larger than what we want to serve via the cloud at the moment. However, starting from the geometries with a basic early cap on the number of orbitals should provide a quick way to get things initially populated.

babbush commented on July 2, 2024

We've made some progress on this and should have everything configured by the end of next week at the latest. We have set up a Google Cloud account where the data will be stored. The workflow for contributing to the molecule data library will be roughly as follows: once a pull request is opened, we will email the requester a signed URL that can be used to upload the data to a staging bucket. We will then inspect the data for security and other compliance issues and transfer it to the Google Cloud Storage location from which the public can download it. We will explain all of this in the README and (presumably) provide examples next week. Thanks to @Spaceenter, who has been instrumental in helping us figure this out.
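
The signed-URL step can be pictured with a toy HMAC scheme: the URL carries an expiry time and a signature over the object name, so whoever holds it can upload that one object until it expires, and nothing else. This is not Google Cloud Storage's actual V4 signing process (in practice the `google-cloud-storage` client's `Blob.generate_signed_url` produces those URLs); the host, secret, and helper names here are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, quote, unquote, urlencode, urlparse

# Hypothetical server-side secret; it never leaves the maintainers' side.
SECRET = b"maintainer-only-secret"
# Hypothetical host standing in for the staging bucket.
STAGING_HOST = "https://staging.example.com"


def _signature(object_name: str, expires_at: int, secret: bytes) -> str:
    # Sign the method, object name, and expiry so none can be altered.
    message = f"PUT\n{object_name}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_upload_url(object_name: str, expires_at: int, secret: bytes = SECRET) -> str:
    """URL a maintainer would email to a contributor for one upload."""
    query = urlencode({"expires": expires_at,
                       "signature": _signature(object_name, expires_at, secret)})
    return f"{STAGING_HOST}/{quote(object_name)}?{query}"


def check_upload(url: str, now: int, secret: bytes = SECRET) -> bool:
    """Accept a PUT only if the signature matches and has not expired."""
    parsed = urlparse(url)
    object_name = unquote(parsed.path[1:])  # drop the leading "/"
    params = parse_qs(parsed.query)
    expires_at = int(params["expires"][0])
    expected = _signature(object_name, expires_at, secret)
    return hmac.compare_digest(params["signature"][0], expected) and now <= expires_at
```

Changing the object name or pushing the expiry later invalidates the signature, which is what lets the maintainers hand out upload rights per pull request without opening the staging bucket to everyone.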

jdwhitfield commented on July 2, 2024

Exactly. It should be possible to embed the integral generation routines into OpenFermion or at least the test cases. There are standalone integral generation packages that can be embedded into the test case libraries (libxc and libint come to mind immediately) to avoid downloading and working with needlessly large integral files.

babbush commented on July 2, 2024

So you are suggesting saving the molecular orbitals but not the integrals and then providing code which can quickly regenerate the integrals. Is that right?

babbush commented on July 2, 2024

I think it is a good suggestion to give people the freedom to download just a) the molecule information, b) the molecular orbitals, or c) everything. I think more advanced users would prefer (a) or (b) (I agree that in most cases it will be faster to run Psi4 with a good initial guess for the orbitals, or to just feed them to libint, etc.), but I know for a fact that some users would prefer to skip all the electronic structure and integral packages and just use option (c). This should be fairly easy given that HDF5 files allow you to load part of a file without loading the whole thing.
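
A minimal sketch of the three download options, assuming a molecule record is a flat mapping of named fields. The field names loosely mirror the kinds of data a MolecularData file holds but are assumptions here, not OpenFermion's actual HDF5 layout, and `select_tier` is a hypothetical helper.

```python
# Hypothetical field names; not OpenFermion's actual HDF5 layout.
MOLECULE_INFO = ("geometry", "basis", "multiplicity", "charge")
ORBITALS = MOLECULE_INFO + ("canonical_orbitals",)


def select_tier(record: dict, tier: str) -> dict:
    """Return the subset of a molecule record for download option a, b, or c.

    a: molecule information only
    b: molecule information plus molecular orbitals
    c: everything, including the (potentially huge) integrals
    """
    if tier == "c":
        return dict(record)
    keys = {"a": MOLECULE_INFO, "b": ORBITALS}.get(tier)
    if keys is None:
        raise ValueError(f"unknown tier {tier!r}; expected 'a', 'b' or 'c'")
    return {key: record[key] for key in keys if key in record}
```

Writing the "a" or "b" subset to its own file is one way to produce the separate file without integrals that the thread later settles on, which sidesteps the question of partial downloads entirely.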

jdwhitfield commented on July 2, 2024

Sure. I would select only smaller molecules for option (c); otherwise, deflating the integral file will be as bad as anything.

Even if one has no interest in running electronic structure, a fast enough computation can be considered part of the pre-processing. Since these are test cases, the electronic structure part can be hardcoded into the test case, so that a user would just run, e.g., test_silane, with the electronic structure embedded as part of the program.

babbush commented on July 2, 2024

Upon thinking about it some more, I doubt that one can download just part of the HDF5 file in a straightforward fashion. Thus, it might be best if one were to upload both the HDF5 file with integrals and HDF5 file without integrals (or if it is large molecules, just the HDF5 file without integrals).

babbush commented on July 2, 2024

So we now have a working prototype of what this data repository might look like. Jarrod and I put up a few examples; check out cloud_library (at the top level of the GitHub repo). We'll add more datasets over the coming weeks, and I hope you will too if you're interested. The procedure for uploading files is a bit elaborate, but some of our design decisions were constrained by security considerations and the liability associated with sponsoring a database where external parties upload data that others download.

babbush commented on July 2, 2024

@TariniHardikar @kanavsetia @hsim13372 did you have any thoughts on our implementation? I know you seemed interested in contributing to the molecule library.

babbush commented on July 2, 2024

The cloud library is live. I am going to close this issue and open a new one about populating the library.
