Comments (7)
Thanks for letting me know. Can you provide an example unit cell and usage? I'll look into it.
from catkit.
Here is my example:
import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors
# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()
# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm_2.db')
N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()
    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)
    # Write to ase db.
    atoms.connectivity = cm
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))
I'll send you the data directly
from catkit.
OK, it looks like this is caused by too many atoms being passed to qhull. The automatic expansion of the cell in catkit.gen.utils.expand_cell is either not intelligent enough, or there is simply no way to guarantee correct bonding identification for these "needle-like" structures.
Possible solutions:
- If it's a bulk structure, standardizing the cell will make it more orthogonal, making it easier to get a proper number of atoms surrounding all atoms in the cell.
- Manually specify the number of unit cell repetitions. I've exposed the padding kwarg, which is passed to the catkit.gen.utils.expand_cell function, allowing users to set their own repetitions of the cell. This is only recommended for expert users, as it could return incorrect connectivity.
- There may be a more intelligent way to get smaller padding that still guarantees correct bonding. I can look into this option further.
Currently, there is a warning that provides some instruction if the repeated cell returned is quite large. This will give the user some information on how to proceed until I can find out if a better padding solution exists. #85
from catkit.
Applying the get_standardized_cell transformation to the atoms helps a lot, but does not solve the problem in all cases.
I am still getting some index arrays from expand_cell that are over 100k in length.
import warnings
import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors
from catkit.gen.symmetry import get_standardized_cell
# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()
# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm.db')
N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()
    # Conventional standard cell.
    atoms = get_standardized_cell(atoms, primitive=False)
    if len(atoms) != int(row.natoms):
        warnings.warn(str(row.natoms) + ' != ' + str(len(atoms)))
    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)
    # Write to ase db.
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})
    # Print progress.
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))
If I also make the following check, after standardizing the cell, most of my structures make it through and the most unreasonable ones get skipped.
# Check size of expanded cell.
index, coords, offsets = expand_cell(atoms)
if len(index) > 30000:
    warnings.warn(str(len(index)))
    continue
However, the limit should either be user input or be estimated from the available memory with psutil. I just don't know the relation between len(index) and the required memory; it's a property of qhull.
from catkit.
Having a user-defined memory flag is going to be a really ugly feature. If most other nearest-neighbor functions from other programs are working for this, there must be a simpler solution which involves simply making the algorithm more efficient. I'll look into this in the very near future.
from catkit.
Progress so far:
Memory profiling on the bulk structures provided. Values are reported for structures requiring over 5 MiB:
Primitive form:
MiB memory used: [ 41.9 58.2 175.4]
Structure ID: [190 12 101]
Standard form:
MiB memory used: [ 9.9 10.9 11.1 12.1 13.6 14.8 16.2 20. 28.6]
Structure ID: [185 100 207 222 179 218 149 163 190]
Taking the primitive form helps more often than the standard form, but when the primitive form does make it worse, it's really bad. I wouldn't expect 30 MiB to crash any modern system, but it more than likely doesn't need to be this expensive.
This function gets called by every structure generator, and sometimes multiple times, so its efficiency is important.
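For anyone who wants to reproduce this kind of per-structure measurement, here is a minimal sketch using the standard-library tracemalloc module. It is not the profiler used for the numbers above. The helper peak_mib and the stand-in workload build_points are hypothetical names for illustration. Note that tracemalloc only sees allocations made through Python's allocators, so memory allocated inside a C library such as qhull would likely not be counted; a process-level measurement would be needed for that.

```python
import tracemalloc


def peak_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def build_points(n):
    # Hypothetical stand-in workload: n 3D points stored as Python tuples.
    return [(float(i), 0.0, 0.0) for i in range(n)]


if __name__ == "__main__":
    for n in (1_000, 100_000):
        _, mib = peak_mib(build_points, n)
        print(f"{n} points: {mib:.2f} MiB peak")
```

In practice, build_points would be replaced by the call being profiled, for example a function that takes an atoms object and returns its connectivity.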
The worst structure (shows up in both cases above) is 190:
The automatically assigned padding to the extended cell is [7, 7, 7], but [1, 1, 1] will produce the same connectivity, so the current automatically assigned padding is excessive.
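To put the two paddings in perspective, here is a rough count of how many copies of the cell each one implies. This is only a sketch and ASSUMES that a padding of p along an axis means images from -p to +p (2p + 1 copies along that axis); that convention is an assumption about expand_cell, not something confirmed in this thread. The function name expanded_atom_count is hypothetical.

```python
def expanded_atom_count(n_atoms, padding):
    """Atoms in the expanded cell, ASSUMING images from -p to +p per axis."""
    images = 1
    for p in padding:
        images *= 2 * p + 1  # p copies on each side plus the original cell
    return n_atoms * images


if __name__ == "__main__":
    for padding in ([7, 7, 7], [1, 1, 1]):
        # Count per atom in the unit cell.
        print(padding, expanded_atom_count(1, padding))
```

Under that assumption, [7, 7, 7] gives 3375 images per atom and [1, 1, 1] gives 27, so the number of points handed to qhull differs by a factor of 125.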
from catkit.
OK, a user-defined cutoff radius is now used by default to determine the number of expansions of the cell, similar to ASE. The default cutoff is 5 angstroms, which is probably more than sufficient for any structure where a Voronoi method is actually likely to give a reasonable representation of the connectivity matrix.
This gives the same connectivity matrix for all provided examples and reduces the memory requirement to less than 0.1 MiB in all cases. #86
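As an illustration of how a cutoff radius can translate into a number of cell repetitions, here is a minimal sketch based on the perpendicular height of the cell along each lattice vector (cell volume divided by the area of the opposite face). This is not taken from the catkit source and may differ from what #86 actually does; the function name repetitions_for_cutoff is hypothetical, and whether the result is applied on one side or both sides of the cell is a separate convention.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def repetitions_for_cutoff(cell, cutoff=5.0):
    """Repetitions per lattice vector needed to span `cutoff` (same units)."""
    a, b, c = cell
    volume = abs(dot(a, cross(b, c)))
    reps = []
    # The face opposite each lattice vector is spanned by the other two.
    for u, v in ((b, c), (c, a), (a, b)):
        normal = cross(u, v)
        height = volume / math.sqrt(dot(normal, normal))
        reps.append(math.ceil(cutoff / height))
    return reps


if __name__ == "__main__":
    # Hypothetical needle-like orthogonal cell: 2 x 2 x 20 angstroms.
    needle = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 20.0)]
    print(repetitions_for_cutoff(needle))  # [3, 3, 1]
```

With a scheme like this, the repetition count depends only on the cutoff and the cell geometry, so a strongly sheared cell still needs many repetitions along its short perpendicular direction, which is consistent with standardization helping.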
from catkit.