This seems to happen for high aspect ratio unit cells with many atoms, when the call i


get_voronoi_neighbors exhausts memory about catkit HOT 7 CLOSED

suncat-center commented on June 26, 2024

get_voronoi_neighbors exhausts memory


Comments (7)

jboes commented on June 26, 2024

Thanks for letting me know. Can you provide an example unit cell and usage? I'll look into it.


mhangaard commented on June 26, 2024

Here is my example:

import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors

# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()

# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm_2.db')

N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()
    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)
    # Write to ase db.
    atoms.connectivity = cm
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))

I'll send you the data directly


jboes commented on June 26, 2024

OK, it looks like this happens because too many atoms are being passed to qhull. Either the automatic expansion of the cell in catkit.gen.utils.expand_cell is not intelligent enough, or there is simply no way to guarantee correct bond identification for these "needle-like" structures.

Possible solutions:

1. If it's a bulk structure, standardizing the cell will make it more orthogonal, which makes it easier to get a proper number of atoms surrounding every atom in the cell.
2. Manually specify the number of unit cell repetitions. I've exposed the padding kwarg, which is passed to the catkit.gen.utils.expand_cell function and lets the user set their own repetitions of the cell. This is only recommended for expert users, as it could return incorrect connectivity.
3. There may be a more intelligent way to get a smaller padding that still guarantees correct bonding. I can look into this option further.

Currently, there is a warning that provides some instruction if the repeated cell returned is quite large. This will give the user some information on how to proceed until I can find out if a better padding solution exists. #85


mhangaard commented on June 26, 2024

Applying the get_standardized_cell transformation to the atoms helps a lot, but does not solve the problem in all cases.

I am still getting some index arrays from expand_cell that are over 100k in length.

import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors
from catkit.gen.symmetry import get_standardized_cell
import warnings


# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()

# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm.db')

N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()

    # Conventional standard cell.
    atoms = get_standardized_cell(atoms, primitive=False)
    if len(atoms) != int(row.natoms):
        warnings.warn(str(row.natoms) + ' != ' + str(len(atoms)))

    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)

    # Write to ase db.
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})

    # Print progress.
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))

If I also make the following check, after standardizing the cell, most of my structures make it through and the most unreasonable ones get skipped.

    # Check size of expanded cell.
    # (Needs: from catkit.gen.utils import expand_cell)
    index, coords, offsets = expand_cell(atoms)
    if len(index) > 30000:
        warnings.warn(str(len(index)))
        continue

However, the limit should either be user input or be estimated from the available memory with psutil. I just don't know the relation between len(index) and the required memory; that is a property of qhull.
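As a minimal sketch of the "user input" option, the hard-coded 30000 could be moved into a small helper with a configurable limit. The name within_point_limit and its max_points argument are hypothetical and are not part of catkit; the default simply mirrors the value used in the check above.

```python
import warnings


def within_point_limit(n_points, max_points=30000):
    """Return True if an expanded cell with n_points is small enough to process.

    Hypothetical helper (not part of catkit). max_points is a user-chosen
    limit; 30000 only mirrors the value used in the thread.
    """
    if n_points > max_points:
        # Report the size so the user can see which structures were skipped.
        warnings.warn('expanded cell has {} points (limit {})'.format(
            n_points, max_points))
        return False
    return True
```

Inside the loop this would read `if not within_point_limit(len(index), max_points=user_limit): continue`, where user_limit is whatever value the user supplies.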


jboes commented on June 26, 2024

Having a user-defined memory flag is going to be a really ugly feature. If most other nearest-neighbor functions from other programs are working for this, there must be a simpler solution which involves simply making the algorithm more efficient. I'll look into this in the very near future.


jboes commented on June 26, 2024

Progress so far:

Memory profiling on the bulk structures provided. Values are reported for structures requiring over 5 MiB:

Primitive form:
MiB memory used: [ 41.9 58.2 175.4]
Structure ID: [190 12 101]

Standard form:
MiB memory used: [ 9.9 10.9 11.1 12.1 13.6 14.8 16.2 20. 28.6]
Structure ID: [185 100 207 222 179 218 149 163 190]

Taking the primitive form helps more often than taking the standard form, but when the primitive form does make it worse, it's really bad. I wouldn't expect 30 MiB to crash any modern system, but it more than likely doesn't need to be this expensive.

This function gets called by every structure generator, and sometimes multiple times, so its efficiency is important.
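The thread does not say which profiler produced the numbers above. One way to collect comparable per-call figures, sketched here with the standard-library tracemalloc module, is to record the peak traced memory around a single call. The helper name peak_mib is hypothetical. tracemalloc only counts allocations made through Python's allocators, so memory that a C library such as qhull allocates on its own is not included and the result may underestimate the true footprint.

```python
import tracemalloc


def peak_mib(func, *args, **kwargs):
    """Return the peak traced memory (in MiB) while calling func.

    Hypothetical helper. Only allocations visible to tracemalloc are
    counted; memory allocated directly by C libraries is not.
    """
    tracemalloc.start()
    try:
        func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 2**20
```

A call such as `peak_mib(get_voronoi_neighbors, atoms)` would then give one number per structure, under the caveat above.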

The worst structure (shows up in both cases above) is 190:

The padding automatically assigned to the expanded cell is [7, 7, 7], but [1, 1, 1] produces the same connectivity, so the automatically assigned padding is excessive.
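To put the two paddings in perspective, here is a small arithmetic sketch. It assumes that a padding of p along an axis means the cell is repeated from -p to +p along that axis (2p + 1 copies), which is an assumption about expand_cell rather than something stated in this thread. The function name expanded_point_count is hypothetical.

```python
def expanded_point_count(n_atoms, padding):
    """Number of points in a cell repeated from -p to +p along each axis.

    Assumes a symmetric padding convention (2 * p + 1 copies per axis).
    """
    count = n_atoms
    for p in padding:
        count *= 2 * p + 1
    return count


# Points passed on per atom in the original cell.
per_atom_auto = expanded_point_count(1, [7, 7, 7])  # 15 * 15 * 15
per_atom_min = expanded_point_count(1, [1, 1, 1])   # 3 * 3 * 3
ratio = per_atom_auto // per_atom_min
```

Under that assumption, [7, 7, 7] hands qhull 3375 points per atom against 27 for [1, 1, 1], a factor of 125.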


jboes commented on June 26, 2024

OK, a user-defined cutoff radius is now the default for determining the number of expansions to the cell, similar to ASE. The default cutoff is 5 angstroms, which is probably more than sufficient for any structure for which a Voronoi method is actually likely to give a reasonable representation of the connectivity matrix.

This gives the same connectivity matrix for all provided examples and reduces the memory requirement to less than 0.1 MiB in all cases. #86
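For illustration, one common way to turn a cutoff radius into a number of cell repetitions is to divide the cutoff by the perpendicular distance between opposite cell faces. The sketch below shows that idea in plain Python; it is not necessarily what catkit does in #86, and the function name padding_from_cutoff and its arguments are hypothetical.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def padding_from_cutoff(cell, cutoff=5.0, pbc=(True, True, True)):
    """Repetitions per axis needed to cover a sphere of radius cutoff.

    cell is a sequence of three lattice vectors (in angstroms).
    Non-periodic directions get no padding.
    """
    a, b, c = cell
    volume = abs(dot(a, cross(b, c)))
    # Normal of the face opposite each lattice vector.
    normals = (cross(b, c), cross(c, a), cross(a, b))
    padding = []
    for normal, periodic in zip(normals, pbc):
        # Perpendicular distance between the two opposite faces.
        height = volume / math.sqrt(dot(normal, normal))
        padding.append(math.ceil(cutoff / height) if periodic else 0)
    return padding


cubic = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
needle = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 40.0)]
cubic_padding = padding_from_cutoff(cubic)    # [2, 2, 2]
needle_padding = padding_from_cutoff(needle)  # [3, 3, 1]
```

With this approach the padding follows the cutoff and the face spacing of the cell, so a needle-like cell gets few repetitions along its long axis.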

