
ricci's People

Contributors

micah541, mwfd541, siudej


ricci's Issues

gcc, OpenMP and Anaconda MKL issue

It seems impossible to run OpenMP 4.0 code generated by gcc inside Anaconda with MKL on Linux. The issue is that MKL is linked against libiomp5, while gcc uses the incompatible libgomp. This breaks task depend clauses.

When libgomp is imported before numpy, depend clauses work, but MKL is no longer multithreaded.

Strangely, there is no conflict on Mac. Maybe Anaconda libiomp5 on Mac does not implement libgomp symbols at all, so the libraries can coexist?

Two possible fixes:

  • Forget depend and revert to taskwait. Then Floyd-Warshall will be slightly slower.
  • Switch to clang-3.8, which generates libiomp5 compatible code. Hopefully this does not break anything else.

In either case, we should use ctypes to import a shared library.
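A minimal sketch of loading a shared library through ctypes before numpy is imported. The helper name load_shared_library and the short name "gomp" are assumptions for illustration; which runtime is actually needed depends on the compiler and the system.

```python
import ctypes
import ctypes.util


def load_shared_library(name, mode=ctypes.RTLD_GLOBAL):
    """Try to load a shared library by short name (e.g. "gomp").

    Returns the ctypes handle, or None if the library cannot be
    found or loaded on this system.
    """
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path, mode=mode)
    except OSError:
        return None


# Load the OpenMP runtime first (assumed name: "gomp").
# numpy should only be imported after this point, mirroring the
# import order described above.
gomp = load_shared_library("gomp")
```

If the library is missing, gomp is None and the script can fall back to the taskwait version.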

latest commit crashes (in Anaconda 2.3)

I'm getting the following error and have no idea what it means:

C:\Users\Micah Warren\Anaconda\Ricci\exp37\clustering\Ricci>python localricci.py
Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Users\Micah Warren\Anaconda\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Users\Micah Warren\Anaconda\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Users\Micah Warren\Anaconda\lib\site-packages\numba\dispatcher.py", line 203, in _explain_matching_error
    raise TypeError(msg)
TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(float64, 2d, C), array(float64, 2d, C)
Exception in thread Thread-4:
Traceback (most recent call last):
  File "C:\Users\Micah Warren\Anaconda\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Users\Micah Warren\Anaconda\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Users\Micah Warren\Anaconda\lib\site-packages\numba\dispatcher.py", line 203, in _explain_matching_error
    raise TypeError(msg)
TypeError: No matching definition for argument type(s) array(float64, 2d, C), array(float64, 2d, C), array(float64, 2d, C)

Temporary matrices

We have reached a point where functions are fast enough to fill memory with temporary computations, especially in the main script. For example:

  • Ricci is computed using the Laplacian and 3 other temporary matrices. The three could probably be reduced to two, but the Laplacian matrix might also count as temporary storage.
  • The localizing kernel is also computed before the Laplacian, instead of being added directly to Ricci.
  • All these matrices are stored after Ricci is applied.
  • Then we create more matrices to check the progress.

With 1000 points or more we will soon run out of memory.
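One way to reduce the number of temporaries is to reuse buffers through NumPy's out= argument. The function below is only a sketch: the name combine_in_place and the particular update it performs are hypothetical and not taken from the project's code.

```python
import numpy as np


def combine_in_place(target, a, b, scale):
    """Compute target += scale * (a - b) with a single scratch matrix."""
    scratch = np.subtract(a, b)               # the only temporary matrix
    np.multiply(scratch, scale, out=scratch)  # reuse it instead of allocating
    np.add(target, scratch, out=target)       # update target in place
    return target


target = np.ones((3, 3))
result = combine_in_place(target, np.full((3, 3), 2.0), np.full((3, 3), 1.0), 0.5)
```

The returned array is the same object as target, so no extra result matrix is kept alive.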

Thresholds for clusters

Perhaps this could be connected to #15. Definitely to Sam's ideas.

The threshold could be a small number, say 0.001, either relative to the entry value or absolute, and we could walk the matrix as a graph and find connected components based on threshold distance between points. This way all points would start separated (and could be plotted using different colors). Then points would start joining into clusters and fewer colors would show up on the plots. We could possibly see the clusters forming, and later joining together.

At first clusters would not be full graphs, but eventually we could report when they turn into clusters according to our current definition.
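A minimal sketch of this idea, assuming a dense symmetric distance matrix and an absolute threshold. The function names threshold_components and is_full_graph are hypothetical; the second one checks whether a component is already a full graph, meaning every pair of its points is within the threshold.

```python
import numpy as np


def threshold_components(dist, threshold):
    """Label connected components of the graph that has an edge i-j
    whenever dist[i, j] <= threshold."""
    n = dist.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            # unlabeled points within the threshold of point i
            neighbors = np.nonzero((dist[i] <= threshold) & (labels == -1))[0]
            labels[neighbors] = current
            stack.extend(neighbors.tolist())
        current += 1
    return labels


def is_full_graph(dist, labels, label, threshold):
    """True if every pair of points in the component is within threshold."""
    idx = np.nonzero(labels == label)[0]
    return bool(np.all(dist[np.ix_(idx, idx)] <= threshold))


# Four points on a line at positions 0, 1, 2 and 10.
points = np.array([0.0, 1.0, 2.0, 10.0])
dist = np.abs(points[:, None] - points[None, :])
labels = threshold_components(dist, 1.5)
```

The labels can be used directly as color indices in a scatter plot, so the number of colors drops as components merge.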

Idea to deal with instability

Perhaps we should be embracing the instability as it may help cluster, but in principle we expect that Ricci flow should not behave chaotically when starting near Einstein type metrics. With our approach we are not properly recomputing the implied volume form at each step, and my guess is that this leads to some of the instability. To counter this, we could try to use Perelman's approach. If I understand this right, we take a weight function f and evolve it by a heat type flow. The weight of each point, which starts at 1/n, would necessarily evolve. So all of the integrals would evolve, and the Ricci curvature would mellow out a bit.

Some of the mathematical details are in http://arxiv.org/abs/math/0605667v5 in the discussion starting around page 27.
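For orientation, the standard smooth-setting form of Perelman's coupled system is sketched below. These are the textbook formulas, not something taken from this code base, and how to discretize them on a point cloud with weights 1/n is left open. Here R is the scalar curvature and u = e^{-f} plays the role of the evolving weight.

```latex
\begin{aligned}
\mathcal{F}(g, f) &= \int_M \left( R + |\nabla f|^2 \right) e^{-f} \, dV, \\
\partial_t g &= -2\,\mathrm{Ric}, \\
\partial_t f &= -\Delta f + |\nabla f|^2 - R, \\
\partial_t u &= -\Delta u + R\,u, \qquad u = e^{-f}.
\end{aligned}
```

The last line follows from the third by substituting u = e^{-f}. It is a backward heat equation, so in the smooth theory the weight is prescribed at the final time and solved backward.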

My vague hope is that the Ricci flow breaks pieces apart, but then each part converges to its own Einstein part. This is the general idea in many geometric flows, in particular geometric flows with surgeries. A simple example is a mean curvature flow that has two bulbs connected along a thin neck. The neck should pinch, but after a small surgery, the two ends should contract to "round" points so that the pieces of the manifold are becoming Einstein. While this is not necessary for clustering, it would be comforting to observe.

metricize uses various names for the same matrix

In metricize we have olddist=dist and the same with d_ij. These do not create copies of the original matrix, just a fresh reference (pointer) to the same numpy array.

So the algorithm changes entries of the matrix and uses the changed entries to compute other entries, within a single pass of the while loop. Also, olddist==dist is automatically true, so the while loop never runs more than once.

Adding np.copy(dist) should fix this, but will also slow down the algorithm significantly. And it is already very slow.
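A small demonstration of the difference between a plain assignment and np.copy. The variable names alias and snapshot are illustrative only.

```python
import numpy as np

dist = np.array([[0.0, 5.0], [5.0, 0.0]])

alias = dist              # same array under a second name
snapshot = np.copy(dist)  # independent copy of the data

dist[0, 1] = 1.0          # modify the original in place

alias_is_same = alias is dist                    # the alias is the very same object
alias_changed = bool(alias[0, 1] == 1.0)         # so it sees the modification
snapshot_unchanged = bool(snapshot[0, 1] == 5.0)  # the copy keeps the old value
converged = np.array_equal(snapshot, dist)       # a meaningful convergence test
```

Comparing against the alias would always report convergence, while comparing against the snapshot detects the change.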

Metricization

Is there an efficient way to metricize the distance function so that it satisfies the triangle inequality? In many tests I'm getting limiting spaces that are nowhere near distance metrics.
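One standard option is to replace each distance by the shortest path length between the two points, which can be computed with a vectorized Floyd-Warshall pass. The sketch below assumes a symmetric matrix of non-negative distances (not squared distances) with zeros on the diagonal; the function name shortest_path_metric is hypothetical. The result satisfies the triangle inequality and never exceeds the input, but it still costs O(n^3) operations.

```python
import numpy as np


def shortest_path_metric(dist):
    """Replace each distance by the shortest path length between the points.

    Expects a symmetric matrix of non-negative distances with zeros
    on the diagonal. Returns a new matrix and leaves the input untouched.
    """
    d = np.array(dist, dtype=float)  # work on a copy
    n = d.shape[0]
    for k in range(n):
        # best route from i to j that is allowed to pass through k
        np.minimum(d, d[:, k, None] + d[None, k, :], out=d)
    return d


raw = np.array([[0.0, 1.0, 5.0],
                [1.0, 0.0, 1.0],
                [5.0, 1.0, 0.0]])
metric = shortest_path_metric(raw)
```

In this example the entry between the first and last point drops from 5 to 2, the length of the path through the middle point.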

Localization

Shouldn't localization be done based on sqdist from the previous round? Right now Ricci only modifies points that were originally close.

thresholds

We normalize sqdist in L1, but the thresholds for clustering are set absolutely. So when the matrix gets large, the elements get small and thresholds become too large.

Shouldn't thresholds be relative to initial_L1 norm?
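One possible reading of a relative threshold, sketched under the assumption that it should scale with the mean absolute entry of the matrix (the L1 norm divided by the number of entries). The function name relative_threshold and the parameter fraction are hypothetical.

```python
import numpy as np


def relative_threshold(sqdist, fraction):
    """Threshold as a fraction of the mean absolute entry of sqdist."""
    l1_norm = np.abs(sqdist).sum()
    return fraction * l1_norm / sqdist.size


sqdist = np.array([[0.0, 2.0], [2.0, 0.0]])
small = relative_threshold(sqdist, 0.1)
large = relative_threshold(10 * sqdist, 0.1)
```

With this choice the threshold rescales together with the matrix, so normalizing sqdist does not change which entries fall below it.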
