Comments (7)

jboes commented on June 26, 2024

Thanks for letting me know. Can you provide an example unit cell and usage? I'll look into it.

from catkit.

mhangaard commented on June 26, 2024

Here is my example:

import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors

# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()

# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm_2.db')

N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()
    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)
    # Write to ase db.
    atoms.connectivity = cm
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))

I'll send you the data directly.

jboes commented on June 26, 2024

OK, it looks like this is caused by too many atoms being passed to qhull. Either the automatic expansion of the cell in catkit.gen.utils.expand_cell is not intelligent enough, or there is simply no way to guarantee correct bond identification for these "needle-like" structures.

Possible solutions:

  • If it's a bulk structure, standardizing the cell will make it more orthogonal, which makes it easier to get a proper number of atoms surrounding every atom in the cell.
  • Manually specify the number of unit cell repetitions. I've exposed the padding kwarg which will be passed to the catkit.gen.utils.expand_cell function, allowing the user to set their own repetitions of the cell. Only recommended for expert users, as this could result in returning incorrect connectivity.
  • There may be a more intelligent way to get smaller padding that still guarantees correct bonding. I can look into this option further.

Currently, a warning with some instructions is raised if the repeated cell returned is quite large. This gives the user some information on how to proceed until I can find out whether a better padding solution exists. #85

mhangaard commented on June 26, 2024

Applying the get_standardized_cell transformation to the atoms helps a lot, but does not solve the problem in all cases.

I am still getting some index arrays from expand_cell that are over 100k in length.

import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors
from catkit.gen.symmetry import get_standardized_cell
import warnings


# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()

# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm.db')

N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()

    # Conventional standard cell.
    atoms = get_standardized_cell(atoms, primitive=False)
    if len(atoms) != int(row.natoms):
        warnings.warn(str(row.natoms) + ' != ' + str(len(atoms)))

    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)

    # Write to ase db.
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})

    # Print progress.
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))

If I also make the following check after standardizing the cell, most of my structures make it through and the most unreasonable ones get skipped:

    # Check size of expanded cell
    # (requires: from catkit.gen.utils import expand_cell).
    index, coords, offsets = expand_cell(atoms)
    if len(index) > 30000:
        warnings.warn(str(len(index)))
        continue

However, the limit should either be user input or be estimated from the available memory with psutil. I just don't know the relation between len(index) and the required memory; it's a property of qhull.

jboes commented on June 26, 2024

Having a user-defined memory flag would be a really ugly feature. If most nearest-neighbor functions from other programs work for these structures, there must be a simpler solution that just makes the algorithm more efficient. I'll look into this in the very near future.

jboes commented on June 26, 2024

Progress so far:

Memory profiling on the bulk structures provided. Values are reported for structures requiring over 5 MiB:

Primitive form:
MiB memory used: [ 41.9 58.2 175.4]
Structure ID: [190 12 101]

Standard form:
MiB memory used: [ 9.9 10.9 11.1 12.1 13.6 14.8 16.2 20. 28.6]
Structure ID: [185 100 207 222 179 218 149 163 190]

Taking the primitive form helps more often than the standard form, but when the primitive form does make it worse, it's really bad. I wouldn't expect 30 MiB to crash any modern system, but it more than likely doesn't need to be this expensive.

This function gets called by every structure generator, and sometimes multiple times, so its efficiency is important.

The worst structure (shows up in both cases above) is 190:
sample-190

The padding automatically assigned to the extended cell is [7, 7, 7], but [1, 1, 1] produces the same connectivity, so the current automatic padding is excessive.
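
To illustrate how quickly the padding inflates the point set passed to qhull, here is a minimal sketch. It assumes that a padding of p repeats the cell from -p to +p along that axis; this is an assumption about expand_cell, not something confirmed in this thread, and expanded_atom_count is a hypothetical helper.

```python
def expanded_atom_count(n_atoms, padding):
    """Count atoms in the expanded cell.

    Assumes each axis is repeated from -p to +p, i.e. 2 * p + 1 images
    per axis (an assumption, not catkit's documented behavior).
    """
    n_images = 1
    for p in padding:
        n_images *= 2 * p + 1
    return n_atoms * n_images


# Images generated per atom in the unit cell for the two paddings above.
print(expanded_atom_count(1, [7, 7, 7]))  # 3375
print(expanded_atom_count(1, [1, 1, 1]))  # 27
```

Under that assumption, going from [7, 7, 7] to [1, 1, 1] cuts the number of points by a factor of 125.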

jboes commented on June 26, 2024

OK, a user-defined cutoff radius is now the default for determining the number of expansions of the cell, similar to ASE. The default cutoff is 5 angstroms, which is probably more than sufficient for any structure for which a Voronoi method is actually likely to give a reasonable representation of the connectivity matrix.

This gives the same connectivity matrix for all provided examples and reduces the memory requirement to less than 0.1 MiB in all cases. #86
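
For illustration, one way a cutoff radius could be turned into a per-axis padding is sketched below. This is not necessarily how catkit implements it: padding_from_cutoff is a hypothetical helper, and the rule used (repeat each axis until the distance between opposite cell faces covers the cutoff) is an assumption. Only NumPy is required.

```python
import numpy as np


def padding_from_cutoff(cell, cutoff=5.0, pbc=(True, True, True)):
    """Estimate cell repetitions per axis needed to cover a cutoff radius.

    cell is a 3x3 array with lattice vectors as rows (angstroms).
    Hypothetical helper, not part of catkit.
    """
    cell = np.asarray(cell, dtype=float)
    volume = abs(np.linalg.det(cell))

    # Area of the face spanned by the two other lattice vectors.
    face_areas = np.array([
        np.linalg.norm(np.cross(cell[1], cell[2])),
        np.linalg.norm(np.cross(cell[2], cell[0])),
        np.linalg.norm(np.cross(cell[0], cell[1])),
    ])

    # Perpendicular distance between opposite faces along each axis.
    heights = volume / face_areas

    # Repeat until the cutoff is covered; no padding on non-periodic axes.
    repeats = np.ceil(cutoff / heights).astype(int)
    return repeats * np.asarray(pbc, dtype=int)


# A 4 angstrom cubic cell and a needle-like 2 x 2 x 30 angstrom cell.
print(padding_from_cutoff(np.eye(3) * 4.0))
print(padding_from_cutoff(np.diag([2.0, 2.0, 30.0])))
```

With a fixed cutoff, the padding depends only on the cell geometry, so a needle-like cell gets extra repetitions only along its short axes.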
