
Full training set? (ase_ani issue, closed)

isayev commented on August 27, 2024
Full training set?

from ase_ani.

Comments (12)

ghutchis commented on August 27, 2024

For one, you probably need a separate repository. How much data is it?


Jussmith01 commented on August 27, 2024

Hello,

We are currently in the process of publishing the data descriptor and data set. We will be submitting before the weekend, and the data should be available shortly after. We will add a link to this repo's README when we have one to share. Thanks!


isayev commented on August 27, 2024

Hey, @proteneer and @ghutchis: this is a multi-gigabyte data set. We will provide a simple Python package to read and slice the data.


ghutchis commented on August 27, 2024

@isayev - I was guessing. But GitHub (even with LFS) may not be the ideal place to store it.


hlwoodcock commented on August 27, 2024


andersx commented on August 27, 2024

For a permanent solution, I suggest storing the dataset somewhere that can assign a DOI. I'm not sure you can get one with Google Drive. Maybe datadryad.org? I guess it also depends on how much data we are talking about, and whether you are willing to spend money on hosting at all.


proteneer commented on August 27, 2024

What type of information is in the training set? Atomic coordinates, types, and predicted QM energies? Bond orders? Topologies? SMILES?

We could also consider hosting it ourselves as a mirror.


isayev commented on August 27, 2024

@andersx yup, we will host it with a DOI!
@proteneer the data is XYZ-file-like. For each molecule we have a 3D array containing the Cartesian coordinates of each conformer, a vector of atom species, and a vector of energies. We don't use bond orders or topologies.
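A minimal sketch of what such a per-molecule record could look like, assuming NumPy arrays. The variable names, dtypes, and the `slice_conformers` helper are hypothetical illustrations, not the format or API of the package mentioned in this thread.

```python
import numpy as np

# Hypothetical record for one molecule; names and dtypes are assumptions.
n_conformers, n_atoms = 4, 3  # e.g. four geometries of a three-atom molecule

# Vector of atom species: one entry per atom, shared by all conformers.
species = ["O", "H", "H"]

# 3D array of Cartesian coordinates: (conformer, atom, xyz).
coordinates = np.zeros((n_conformers, n_atoms, 3), dtype=np.float32)

# Vector of energies: one value per conformer.
energies = np.zeros(n_conformers, dtype=np.float64)


def slice_conformers(coordinates, energies, start, stop):
    """Return conformers start..stop-1 together with their matching energies."""
    if coordinates.shape[0] != energies.shape[0]:
        raise ValueError("coordinates and energies disagree on conformer count")
    return coordinates[start:stop], energies[start:stop]


subset_xyz, subset_e = slice_conformers(coordinates, energies, 1, 3)
```

Keeping the species vector separate from the coordinate array means it is stored once per molecule rather than once per conformer, which matters when a molecule has many conformers.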


proteneer commented on August 27, 2024

I see - I presume you're tossing out formal charges as well then? If you're also throwing away bond orders/topologies, it might be a little difficult to do reconstruction/debugging, but we can live with XYZ-like for now. You can probably get away with 16 bits of precision plus compression to reduce the file sizes (internally we use something like the GROMACS XTC format + gzip with 16 bits to drastically reduce sizes).
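The idea of 16-bit precision plus compression can be sketched with the Python standard library alone. This is not the GROMACS XTC format; it is a plain fixed-point scheme, and the `SCALE` value and the function names are assumptions chosen for illustration.

```python
import gzip
from array import array

# Assumed resolution: store each value as an integer count of 0.001 units.
SCALE = 1000


def pack_coordinates(values, scale=SCALE):
    """Quantize floats to signed 16-bit integers and gzip the raw bytes."""
    quantized = array("h")  # 'h' is a signed short, 16 bits on common platforms
    for v in values:
        q = round(v * scale)
        if not -32768 <= q <= 32767:
            raise OverflowError(f"{v} does not fit in 16 bits at scale {scale}")
        quantized.append(q)
    # tobytes() uses native byte order; a real file format would pin this down.
    return gzip.compress(quantized.tobytes())


def unpack_coordinates(blob, scale=SCALE):
    """Invert pack_coordinates, up to the quantization error."""
    quantized = array("h")
    quantized.frombytes(gzip.decompress(blob))
    return [q / scale for q in quantized]


# Flattened x, y, z values for three atoms (illustrative numbers only).
flat_xyz = [0.0, 0.0, 0.1173, 0.0, 0.7572, -0.4692, 0.0, -0.7572, -0.4692]
blob = pack_coordinates(flat_xyz)
restored = unpack_coordinates(blob)
```

With this scale the round-trip error is at most half a step (0.0005 units), and values outside roughly ±32.767 units are rejected rather than silently wrapped. The trade-off between range and resolution is what the choice of scale controls.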


isayev commented on August 27, 2024


proteneer commented on August 27, 2024

Okay - works for us.


proteneer commented on August 27, 2024

Closing - thanks guys!

