royallgroup / tcc

The Topological Cluster Classification algorithm

Home Page: https://royallgroup.github.io/TCC/

License: GNU General Public License v3.0


tcc's Introduction

README

Latest version of the Topological Cluster Classification (TCC) code.

For documentation, open index.html in the docs folder or visit https://royallgroup.github.io/TCC/.

Citation

If this software is used in the preparation of published work, please cite:
Malins A, Williams SR, Eggers J & Royall CP, "Identification of Structure in Condensed Matter with the Topological Cluster Classification", J. Chem. Phys. 139, 234506 (2013).

Licenses

This software is distributed under the GNU General Public License v3. For more details see the LICENSE file.

This software makes use of libraries released under other licenses.

tcc's People

Contributors: franciturci, fturci, merrygoat

tcc's Issues

Additional file format: DynamO config.*.xml.bz2

The DynamO package has been extensively used to generate hard sphere data. Among its output is "config.*.xml.bz2"

It would be nice to be able to read this into the TCC directly, or perhaps a Python script is more appropriate?

Ben has the file format....
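A minimal sketch of what such a Python reader could look like, assuming DynamO stores positions as XML attributes; the tag names used here (ParticleData, Pt, P) are guesses and must be checked against a real sample file:

    import bz2
    import xml.etree.ElementTree as ET

    def read_dynamo_config(path):
        """Return a list of (x, y, z) positions from a config.*.xml.bz2 file."""
        with bz2.open(path, "rt") as f:
            tree = ET.parse(f)
        positions = []
        # Assumed layout: <ParticleData><Pt><P x=".." y=".." z=".."/></Pt>...
        for pt in tree.getroot().iter("Pt"):
            p = pt.find("P")
            positions.append((float(p.get("x")), float(p.get("y")), float(p.get("z"))))
        return positions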

Unit tests for detection of individual structures

Running the TCC on an isolated cluster geometry should return exactly one instance of the structure, plus possibly multiple subunits if it is a composite structure. Having unit tests would facilitate the implementation of new structures through test-driven design.
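A sketch of what one such test could look like, assuming a hypothetical run_tcc() helper that executes the TCC on an XYZ file and returns per-cluster counts:

    import unittest

    class TestIsolatedClusters(unittest.TestCase):
        def test_isolated_13A(self):
            # run_tcc() is a hypothetical wrapper around the tcc executable
            counts = run_tcc("test_geometries/13A.xyz")
            # An isolated 13A geometry should be detected exactly once
            self.assertEqual(counts["13A"], 1)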

11F performance improvement

When detecting 11F clusters, the function get_bonded_6As can loop over mmemsp4c[5a_common_particle] instead of over all 6As.

Standardisation of coordinate file format

It would be good to have a standard format for xyz type files.

The dinosaur approach of VMD is that the line after N (the comment line) can be anything. This is the situation as it stands, although an earlier version used the PDB format.

Ovito uses an enriched XYZ format, which is presumably compatible with the format above (and would let us read much of what is currently in the *.in files etc. from the coordinate files).
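For reference, the extended XYZ convention used by Ovito (and ASE) puts key-value metadata on the comment line, so a minimal compatible file looks like this (values illustrative):

    2
    Lattice="10.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 10.0" Properties=species:S:1:pos:R:3
    A 1.234 5.678 9.012
    B 3.141 2.718 1.618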

Create key for cluster output

Create a key which describes the order of particle IDs in the cluster output.

E.g. for 11C the cluster is output in the order [s_com, s_i, s_j, r_ca, r_cb, d_i, d_i, d_j, d_j, unc_i, unc_j]

This should take into account the sorting that goes on for each cluster type after it is written to the hc array.
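One possible form for the key is a machine-readable mapping, sketched here in Python using the 11C ordering quoted above (other entries would be filled in from the detection code):

    CLUSTER_OUTPUT_KEY = {
        "11C": ["s_com", "s_i", "s_j", "r_ca", "r_cb",
                "d_i", "d_i", "d_j", "d_j", "unc_i", "unc_j"],
        # one entry per cluster type, reflecting any post-detection sorting
    }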

Decide which new clusters to add to the TCC

Hi Josh

Peter has been refactoring the TCC to be able to take new clusters.

Could we test this by adding one from your HS-morphometric FEL minima?

And/or a cluster from the CuZr stuff that is not already in the TCC.

Output clusters to an XYZ file

This is a common procedure for visualization purposes. At the moment it requires a script to match the RAW files with the coordinates. This would be especially useful for undergrads, since they like to see the clusters they have produced.
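A sketch of the kind of script this would replace, assuming (hypothetically) that the RAW file carries one membership flag per particle per frame, with 'A'/'B' marking in-cluster particles; the real header layout and flag values should be checked against the documentation:

    def write_cluster_xyz(coords, species, flags, out_path, cluster_name):
        """Write only the particles flagged as cluster members to an XYZ file."""
        members = [i for i, f in enumerate(flags) if f in ("A", "B")]
        with open(out_path, "w") as out:
            out.write(f"{len(members)}\n{cluster_name} cluster particles\n")
            for i in members:
                x, y, z = coords[i]
                out.write(f"{species[i]} {x:.6f} {y:.6f} {z:.6f}\n")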

Improvement to 12K method

Symmetry can be used to reduce the number of calls to the bond method in the first part of the 12K identification.

It appears to be the case that the particles in the two rings are ordered sequentially anticlockwise (from the perspective of the center particle) with 0-3 in the first ring and 4-7 in the second ring. This is observed but not rigorously proved yet.

This means that only one pair of bonds (one particle bonded to two from the other ring) is needed to uniquely identify the configuration. This would decrease the calls to the bonds method from 24 per cluster to 3 per cluster.

Build test should fail when build fails

At the moment it is possible for the build to fail but the build test to pass.

This is probably because the return code from subprocess.run does not reflect the return code from cmake.
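A minimal fix sketch, assuming the build test drives CMake and make through subprocess: passing check=True makes any non-zero exit status raise CalledProcessError instead of passing silently.

    import subprocess

    def build(build_dir):
        # check=True raises subprocess.CalledProcessError on a non-zero exit
        subprocess.run(["cmake", ".."], cwd=build_dir, check=True)
        subprocess.run(["make"], cwd=build_dir, check=True)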

Python API to TCC

Many users have already implemented simple Python interfaces to handle the input/output with a Python front end, so it would be good to have a canonical version. A simple implementation would create a temporary working directory, execute the TCC within it, extract the data and then delete the directory.
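A minimal sketch of that implementation, assuming the tcc executable is on the path and reads its input from the working directory (file handling details are illustrative):

    import pathlib
    import shutil
    import subprocess
    import tempfile

    def run_tcc(xyz_path):
        """Run the TCC in a throwaway directory and collect its output files."""
        with tempfile.TemporaryDirectory() as tmp:
            shutil.copy(xyz_path, tmp)
            subprocess.run(["tcc"], cwd=tmp, check=True)
            # Read the outputs before the directory is deleted
            results = {p.name: p.read_text()
                       for p in pathlib.Path(tmp).iterdir() if p.is_file()}
        return results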

Consider removing net clusters

This bit of code is horribly implemented, and it is something that can be (and already is) done as a post-analysis step in Python.

Removing (or fixing) net clusters will also allow testing do_clust on memset in setup.c to improve performance.

Build should include install option

@FTurci comment in #50 thread:

Other remark: "make" should be followed by "make install", and we should not have plenty of executables copied everywhere. "cmake" also has a prefix option. This was a source of confusion when we tried to identify the issue.

Unclear output

If only a subset of particles is analysed, the static clust output file reports those not analysed as having a population of zero. This could be misleading. The static clust file should show when a cluster was not analysed.

Clusters can be detected over PBCs in small boxes

Clusters can be detected twice over the PBCs if the box is less than twice the cutoff length in any dimension.

This is because cluster A can be connected to cluster B twice, once over the PBC. This is exposed when the mem array is used to loop over clusters.
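A minimal pre-flight guard for this failure mode (names illustrative): refuse to run when any box dimension is under twice the cutoff.

    def check_box(box_lengths, r_cut):
        if min(box_lengths) < 2 * r_cut:
            raise ValueError("box is under twice the bond cutoff in some "
                             "dimension; clusters may be counted twice over the PBCs")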

Compilation fails on Mac OS

The latest development version does not compile on the Mac. The problem is that Mac OS is not handled in the make_directory function.

Here is the error during make:

/Users/ft14968/Documents/GitHub/TCC/tcc/src/tools.c: In function 'make_directory':
/Users/ft14968/Documents/GitHub/TCC/tcc/src/tools.c:105:12: warning: implicit declaration of function '_mkdir' [-Wimplicit-function-declaration]
         if(_mkdir(name) != 0) {
            ^~~~~~
[ 97%] Building C object tcc/src/CMakeFiles/tcc.dir/voronoi_bonds.c.o
[100%] Linking C executable ../../../bin/tcc
Undefined symbols for architecture x86_64:
  "__mkdir", referenced from:
      _make_directory in tools.c.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
make[2]: *** [../bin/tcc] Error 1
make[1]: *** [tcc/src/CMakeFiles/tcc.dir/all] Error 2
make: *** [all] Error 2

Unit tests require tcc executable in path

Unit tests fail on @merrygoat's Windows machine because the unit test assumes the tcc executable is on the system path. A solution requires either an install feature (e.g. make install, as suggested in #59), instructions for adding the executable to the path, or a workaround for this requirement, e.g. directly locating the executable within the Python front end.
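A sketch of the workaround option, checking the path first and then falling back to the build tree (the bin/ location relative to the test file is an assumption):

    import shutil
    from pathlib import Path

    def find_tcc():
        """Locate the tcc executable on the path or in the local build tree."""
        return shutil.which("tcc") or str(Path(__file__).parent / ".." / "bin" / "tcc")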

Remove centers magic numbers

11A and 13A centers can be output to XYZ files. The variables to do this are set using magic numbers which are fragile with respect to changes in the cluster lists.

Only the centers of a limited number of clusters are interesting - not all clusters even have centers.

There are multiple ways of fixing this:

  • Output the centers of all clusters using the s_clust lists. This would produce a centers XYZ file for each of the cluster types. While this would be messy (many empty files), it would also be the most general: all clusters with centers, and all future clusters, would be output.
  • Iterate over the cluster name list to identify the positions of 11A and 13A and set the magic numbers at runtime. This reduces the number of files output and is most like the original behaviour. The disadvantage is that outputting the centers of other cluster types would not be supported without further changes.
  • Remove all center XYZ output and make sure that the centers are output as a separate species in the full cluster XYZ files. A Python script could be supplied to strip the centers from the XYZ and make center XYZs if required. This is the cleanest method and reduces the number of output types; however, for people processing cluster centers it increases the number of steps and the complexity required to get the data.

Which is best depends on how much the xyz centers are used and whether the centers of any other clusters are interesting.

Static cluster file broken

The BCC_15 row has one too many columns, which causes attempts to read the table with pandas to fail.

11A performance improvement

At the moment the 11A detection loops over all pairs of 6As to find linked spindles. Since a linked spindle should be stored in the mmem_sp4c array, it should be possible to loop over all 6A_i, loop over the two spindles, and then loop over the spindles in mmem. This is essentially the same as the detection used for the 9K.

13K clusters missed

Sometimes a 13K is detected but the independent particles not in 11F cannot be identified. In this case the cluster is not recorded. It is not currently known if this is an incorrect 13K identification or a correct 13K that is not reported correctly.

Linked to issue #2 which has a configuration which can produce these clusters.

Net cluster script input variables

At the moment the cluster priority list is set directly in the script. The priority list should either be read from the command line or from a file.
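A sketch of the command-line version using argparse (option names illustrative):

    import argparse

    parser = argparse.ArgumentParser(description="Net cluster analysis")
    parser.add_argument("--priority", nargs="+",
                        help="cluster names in priority order, e.g. 13A 12E 11F")
    parser.add_argument("--priority-file",
                        help="file containing one cluster name per line")
    args = parser.parse_args()
    priority = args.priority or open(args.priority_file).read().split()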

BCC15 Clusters not working

The detection of BCC clusters was turned off in commit 464b28f due to a segfault.

This is caused by an index out of range in the clusters.c function Clusters_GetBCC_15: the array "sj" is indexed by particle number even though it is a fixed-length array. It is not clear what purpose this variable serves.

We need to identify the function of sj and either remove it or correct the index.

Static cluster file annoying

The final two rows, containing nrows and other information, are very annoying, as one must know precisely how many structures to expect in order to read the central table correctly. These should be placed at the top, so the user can easily ignore them and reading the table becomes more robust when e.g. new structures are added.
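With the metadata at the top, the table could then be read without knowing the number of structures in advance, e.g. with pandas (filename and separator assumed):

    import pandas as pd

    # Skip the two metadata rows now at the top of the file
    table = pd.read_csv("static_clust", skiprows=2, sep=r"\s+")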

Create a generic XYZ output function

Combine the functions which output 11A centers and 13A centers into a generic function which can output any cluster type. This should be relatively easy given that all raw coordinates are stored in the s_clust name global arrays.

Too many files output

The TCC outputs raw cluster files for every cluster regardless of whether the cluster is selected for analysis. Only those clusters which are analysed should have an output file.

11C not correctly detected

The 11C detection algorithm does not check for bonds between the two pairs of non-common bonded particles in the 7A rings (labelled rd1, rd2, rd3 and rd4 in the methods paper). This allows detection of 11Cs with unbonded pairs. This is not a new issue; it has been present in the code for some years.

Enforcing this bond condition will decrease the number of detected 11C clusters.

Bond cutoff documentation

It should be made much clearer that the bond length cutoff still applies when the Voronoi construction is used to determine the bond network. One would naturally assume that this setting has no effect when the Voronoi method is used.

Perhaps the fixed-length cutoff should be turned off by default when Voronoi bond detection is turned on?

Implement 7T clusters

A 7Z cluster is a 6Z cluster with an extra particle.

There are two types of 7Z cluster depending on where the particle is attached.

The symmetric type is identified by an extra particle bonded to two ring particles and a non-common spindle; this equates to a bond to hc6z[0] and hc6z[2] and (hc6z[4] or hc6z[5]).

The asymmetric type is identified by an extra particle bonded to two ring particles and a common spindle; this equates to a bond to (hc6z[0] or hc6z[3]) and (hc6z[1] or hc6z[2]) and (hc6z[4] or hc6z[5]).

A 7A is created when the extra particle is bonded to both common ring particles and both distinct spindle particles; this is not a valid 7Z.
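A direct transcription of these conditions into a checker, assuming a bonded(i, j) predicate over the bond network and the hc6z ordering used above (both names follow the issue text rather than the current code):

    def classify_7z(extra, hc6z, bonded):
        """Classify the extra particle against a 6Z, per the conditions above."""
        # NB: the 7A exclusion described above would need to be tested before
        # accepting either type.
        spindle = bonded(extra, hc6z[4]) or bonded(extra, hc6z[5])
        if bonded(extra, hc6z[0]) and bonded(extra, hc6z[2]) and spindle:
            return "symmetric"
        if ((bonded(extra, hc6z[0]) or bonded(extra, hc6z[3]))
                and (bonded(extra, hc6z[1]) or bonded(extra, hc6z[2]))
                and spindle):
            return "asymmetric"
        return None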

Constant density box only valid when N is constant

Specifying a density via the ini file requires a constant number of particles from frame to frame, otherwise the density will change. A check for density mode = 0 should be added to the XYZ parse for the case where the number of particles varies.
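A minimal sketch of that check, assuming density mode 0 is the mode set when a density is given in the ini file:

    def check_constant_n(frame_particle_counts, density_mode):
        # Hypothetical flag: mode 0 = density specified in the ini file
        if density_mode == 0 and len(set(frame_particle_counts)) > 1:
            raise ValueError("density given in the ini file but the number of "
                             "particles varies between frames")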

12E performance improvement

Since the new 5A must have its two spindle particles in common with the uncommon spindles of the 6As in the parent, the search can loop over mmemsp4c[5a_spindle_1] and mmemsp4c[5a_spindle_2] instead of over all 5As.

Write a proper XYZ parser

  • Should be able to read XYZ files with varying numbers of particles in each frame
  • Should be able to parse an XYZ file to check that it is valid
  • Should automatically read in the number of particles and their types
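A minimal parser along these lines, which reads N from each frame header and fails fast on a malformed frame:

    def parse_xyz(path):
        """Read an XYZ file frame by frame, tolerating a varying particle count."""
        frames = []
        with open(path) as f:
            while True:
                header = f.readline()
                if not header:
                    break               # end of file
                n = int(header)         # raises ValueError on a bad count line
                f.readline()            # comment line: content is arbitrary
                species, coords = [], []
                for _ in range(n):
                    parts = f.readline().split()
                    if len(parts) < 4:
                        raise ValueError("malformed or truncated frame")
                    species.append(parts[0])
                    coords.append(tuple(float(x) for x in parts[1:4]))
                frames.append((species, coords))
        return frames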

mmem arrays only use spindles

Although all of a cluster is stored in the mmem array, it seems only the spindles are queried for the 6A and 7A clusters. Removing non-spindle particles from the mmem array would improve performance by reducing the looping on mmem access; however, before doing this it is essential to check that no other function uses the non-spindle particles as part of cluster creation.

Improve integration tests

At the moment the integration tests just directly compare the output files with sample output files. This is very vulnerable to any change in the output file format, the addition of new clusters, and floating-point differences between platforms.

An interpreter should be created to read in each output file type and parse the results comparing to the known values.
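A sketch of the tolerant comparison, assuming each output file can be parsed into a dict of cluster name to value; floats are compared to a tolerance rather than byte for byte:

    import math

    def compare_results(result, reference, rel_tol=1e-6):
        for name, expected in reference.items():
            assert name in result, f"missing cluster {name}"
            assert math.isclose(result[name], expected, rel_tol=rel_tol), \
                f"{name}: got {result[name]}, expected {expected}"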

Check validity of coordinates on XYZ parse, not frame read

At the moment the XYZ parser does not check the validity of the coordinates as it builds the list of frame offsets. If a long dataset has an invalid frame near the end, the TCC could run for a long time before failing when it reads the bad frame. Checking the coordinates as the frame offsets are determined would catch this right at the start of the analysis.

This is less important now that output files are written on a per frame basis since most data will still be output on an unexpected exit, just not averages.

It would also be good to have some logic that skips checking frames that will not be read: the sample_frequency parameter means not every frame is necessarily analysed.

13B performance improvement

The speed of 13B detection can be improved by looping over mmemsp5c[7A_i_spindle_1] and mmemsp5c[7A_i_spindle_2] instead of over all sp5c when selecting 7A_j.

Decide on license for TCC

Ideally the TCC should have a software license before it is released to the public properly. This is not a high-priority issue as the repository is private, but it is something that should be decided at some point. There are many options available which we should discuss, perhaps next time we meet with @chryswoods?

@ursacavebear @merrygoat @FTurci

Add a minimum distance cutoff

This would be useful for measuring systems such as ideal gases where we do not want to consider overlapping particles.

Distinguish between isomers of 6A

5A, 6A and 7A have been replaced by sp3c, sp4c and sp5c. While it is good to include these base structures as well, the 6A and 7A in previous versions contained useful point-group number conversions to avoid counting them multiple times due to their symmetries. I suggest they be put back in, in addition to the base structures.

Should be able to analyse a subset of clusters

For some analyses only a subset of clusters need to be found. This would speed up analysis.

This would require some sort of interface to select the clusters which are desired and some internal logic to determine which prerequisite clusters need to be calculated to find the selected cluster.
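A sketch of the prerequisite logic with a hypothetical dependency map; the real map would mirror how each detection routine builds on smaller clusters (the entries below are illustrative only):

    DEPENDS_ON = {
        "11F": ["6A"],   # e.g. 11F detection loops over bonded 6As
        "13B": ["7A"],
        "6A": [],
        "7A": [],
    }

    def clusters_to_compute(selected):
        """Return the selected clusters plus all their prerequisites."""
        needed = set()
        def visit(name):
            if name not in needed:
                needed.add(name)
                for dep in DEPENDS_ON.get(name, []):
                    visit(dep)
        for name in selected:
            visit(name)
        return needed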
