GithubHelp home page GithubHelp logo

PAM does not implement PAM about jsat HOT 18 OPEN

kno10 avatar kno10 commented on June 26, 2024
PAM does not implement PAM

from jsat.

Comments (18)

EdwardRaff avatar EdwardRaff commented on June 26, 2024

Can you provide a link/reference to what you think is the correct way to do PAM?

from jsat.

kno10 avatar kno10 commented on June 26, 2024

BUILD and SWAP, as in the PAM paper.

Kaufman, L. and Rousseeuw, P.J. (1987), Clustering by means of Medoids, in Statistical Data Analysis Based on the L1–Norm and Related Methods, edited by Y. Dodge, North-Holland, 405–416.

BUILD is a greedy initialization that always adds the medoid that reduces TD most.
SWAP searches the pair of objects (medoid,non-medoid), such that replacing an existing medoid with a non-medoid (hence keeping the number of clusters constant) yields the maximum decrease in TD; so similar to a steepest-descent approach. When no decrease can be found, the algorithm stops.
As you can see, this is fairly different from k-means.

This finds much better results than the naive k-means-style approach.

from jsat.

EdwardRaff avatar EdwardRaff commented on June 26, 2024

I think what is in the paper is equivalent to the code I have for PAM.

BUILD basically describes just the seed selection step, where the first seed is the medoid, and subsequent seeds are selected in a kind of greedy fashion (its not clear to me from the original paper exactly how that happens). While I don't have this implemented in the exact same style, I would suspect Kmeans++ selection to produce good results of comparable initialization.

the SWAP selects points which should move clusters with the most "profitable" pair, and "do not depend upon the order", which means its not a greedy selection updated on-the-fly, but after checking all points. This is what my current code does with the "k-means" style as you call it.

Ignoring the exact nature of the BUILD step, since its not super precisely specified, can you help me understand a scenario where what is implemented does not produce the same result of PAM as described in the paper?

from jsat.

kno10 avatar kno10 commented on June 26, 2024

No, it is not equivalent.
The SWAP procedure can reassign points to other clusters when swapping. Hence it finds much better improvements, in my experiments the results were even significantly better. Because of the discrete nature of the median, the k-means-style approach does not work well; it gets stuck easily.
BUILD is very clearly specified - greedily add the next medoid that minimizes TD - and it is also very important. BUILD itself usually finds a better solution than the entire k-means-style procedure, and the PAM authors suggested that often you will not need to run SWAP at all. If you are missing detail in the first paper, you can check out the corresponding chapter in the later book:
http://dx.doi.org/10.1002/9780470316801.ch2

from jsat.

EdwardRaff avatar EdwardRaff commented on June 26, 2024

from jsat.

kno10 avatar kno10 commented on June 26, 2024

The new medoid will always be chosen as to cover all currently assigned points.
This misses good medoid candidates when points can be easily moved to neighboring clusters instead. The SWAP procedure does not have this restriction, hence it finds better (and more) changes.
Just benchmark. Get one of the orlib data sets such as kmed20, and try running your procedure, compute the cost. Then try the original PAM from the R cluster package (eventually translated from Fortran to C, but derived from the original code), and it will find substantially better solutions.
I am preparing a paper right now that will include a very simple example for this (which should also prove that in the worst case, the k-means-style approach can be arbitrarily worse than SWAP).

from jsat.

EdwardRaff avatar EdwardRaff commented on June 26, 2024

Erich, I’m just not on the same page in understanding how the SWAP procedure works. If your paper has this in more explicit detail I’m happy to add it to my reading list.

At the moment, I don’t plan on doing the tests you suggest. It’s a good amount of work and I’ve got other items higher on my property list for JSAT. Especially since TRIKMEDS is implemented with the same kind of approach as the current PAM, but with n sqrt(n) complexity. Their paper also mentions that it is the same approach as PAM, so I'm getting conflicting information on this.

If you have tested this previously and have this information well documented, and are interested in doing a pull request, I'm happy to work with you on that!

from jsat.

kno10 avatar kno10 commented on June 26, 2024

My suggestion is to rename the class for now, and adjust documentation, to make clear it is not PAM.

The algorithm you implemented is known in facility location literature by the name "alternate", because it has these two alternating steps (and e.g. Teitz and Bart 1968, but also several other authors in that domain observed that it performs rather poor).
So AlternateKMedians may be an appropriate name; or just KMedians, because that is how the ESL book introduced kmedians as a variation of kmeans (they don't talk about quality, though...).

In my experiments, on the ORlib problems, the "alternate" algorithm with k-means++ initialization at 37%. BUILD alone is 3.14% random, BUILD+PAM is 0.51%; where 0% means always finding the optimum solution (which is known for the ORlib problems), and where 100% is the average of 100x picking medoids randomly. Doing the k-means++-Alternate combination 10 times, and keeping the best, gets you down to 9% on average, but there is a large standard deviation on the data sets, too.

from jsat.

kno10 avatar kno10 commented on June 26, 2024

Thanks for pointing out the TRIKMEDS paper. I wasn't aware of it, but I have seen other papers of the authors, and they were very good; based on that and the abstract I can imagine how they use metric bounds here. They also confirm some observation that I made (that Park's initialization performs poorly), so I'll definitely cite this. I cannot include it in my current experiments, because my data is given as a graph, and supposedly does not fulfill the triangle inequality.
They only mention PAM once, and do not claim they are doing the same. On the contrary, they write "In this paper we do not compare cluster qualities of previous algorithms, but focus on accelerating the lloyd equivalent for K-medoids"; so they are aware that PAM works differently. Also beware that "exact" only means they find the exact 1-medoid of each partition, not the optimum k-medoids partitioning which is NP-hard.
In the (cited) NIPS 2017 paper "K-Medoids For K-Means Seeding" they observe that CLARANS yields significantly better results than lloyd-style k-medoids.

from jsat.

kno10 avatar kno10 commented on June 26, 2024

I just noticed that they also give an example (technically not really for k-medoids though) where CLARANS can reach a better optimum by swapping than the lloyd-style approach. They use it for motivating to use CLARANS for initializing k-means. Figure 1 in https://papers.nips.cc/paper/7104-k-medoids-for-k-means-seeding.pdf

from jsat.

EdwardRaff avatar EdwardRaff commented on June 26, 2024

I am not going to introduce a breaking change like renaming a file without understanding the situation.

I really appreciate your interest, but I am simply not making any changes until I understand what is "wrong". Again, if you are aware of any resources that provide more detailed math/pesudo code as to the nature of SWAP and BUILD, I'm happy to add them to my reading list. I'm afraid from your descriptions in this thread/the original paper, I don't understand what the distinction is on SWAP or how I would implement it. Perhaps your background/training makes this all very obvious to you, and if you would like to write code to fix this and show the difference in results, I'm happy to work with you on merging it. But it is simply not obviously/clear to me from what I have read thus far.

from jsat.

kno10 avatar kno10 commented on June 26, 2024
  • The Wikipedia article describes swap in very easy language:
    https://en.wikipedia.org/wiki/K-medoids
    It's literally trying every combination of medoid and non-medoid and check which swap (making the medoid a regular data point, and using the regular data point as new medoid instead) yields the best decrease in cost, reassigning all points during this process (not as with k-means only considering points in the current cluster). But it doesn't discuss how to implement it efficiently.
  • The classic textbook Han, Pei, Kamber "Data Mining: Concepts and Techniques" also explains PAM briefly, but not completely correct (its not randomly picking objects, its not random started). It discusses the different cases that you need to handle.
  • The CLARANS paper also includes a detailed (2 page?) explanation of the PAM (because they do a randomized approximation to PAM), again they ignore the starting conditions not being random:
    Ng, R. T., & Han, J. (2002). CLARANS: A method for clustering objects for spatial data mining. IEEE Transactions on Knowledge & Data Engineering, (5), 1003-1016.

Likely the best source remains the book by the authors... I'm sure you can find a way to access it.

Given that your "PAM" currently isn't "PAM", I'd argue that renaming wouldn't break more than what already is broken. ;-) In particular since the quality of the results is pretty poor with the k-means-style algorithm.

from jsat.

EdwardRaff avatar EdwardRaff commented on June 26, 2024

I read the wiki page, and did not find it to be very comprehensible. But what you've just written is much more cogent. However, it also does not match what is listed as the PAM algorithm in the book you just mentioned, which I copy below:
Screen Shot 2019-12-29 at 3 28 30 PM

The above also doesn't match the verbiage of the original paper which states order does not matter, as this is a randomized process.

from jsat.

kno10 avatar kno10 commented on June 26, 2024

Yes, PAM tries all of them, not randomly; and it doesn't include BUILD.
Randomly is what CLARANS does.
I only checked the google books preview, and saw that it discusses some four cases that you need to handle when computing the cost in line (5).

from jsat.

kno10 avatar kno10 commented on June 26, 2024

But you see how many wrong versions of PAM are in literature and software; which is really annoying when you want to benchmark and compare them, and half of them turn out to be a different algorithm, which is fast but yields much worse results.

from jsat.

EdwardRaff avatar EdwardRaff commented on June 26, 2024

If you are trying to benchmark and compare, I totally understand the frustration. But please also understand from my side, I have no clarity one what is the real PAM at the moment. Should I trust you, or this book, or poorly worded original paper, or some other paper? So I will not be changing anything for some time as I both 1) do my own research on this, and 2) adress other items on my TODO list that are higher priorities.

from jsat.

kno10 avatar kno10 commented on June 26, 2024

You can check out the R version that is as far as I can tell derived from the original Fortran code, but now includes extensions (pamonce >0) and is not very readable (likely because of fortran-to-c code style, as you can tell from the use of "goto L60").
https://github.com/cran/cluster/blob/45695c0f1232711cd34bcaeba975f8a552e69f52/src/pam.c#L428-L445
https://github.com/cran/cluster/blob/45695c0f1232711cd34bcaeba975f8a552e69f52/src/pam.c#L469-L490
Or you can go to the actual Fortran version:
https://github.com/cran/cluster/blob/25f9de803433055c4632edf5f1997758b73970b8/src/pam.f
There you can see exactly what BUILD and SWAP are supposed to do.

from jsat.

kno10 avatar kno10 commented on June 26, 2024

And sorry, I beg to disagree that the original paper is unclear or poorly worded.
Page 7 describes the algorithm quite precisely; there just is no pseudocode.
It clearly states that BUILD selects the object "which decreases the sum (over all objects) of the dissimilarities to the most similar selected object as much as possible", i.e. the one which minimizes the total deviation if you add it to the medoid set.
Similarly for SWAP, it states "considering all pairs of object (i,h) for which object i has been selected and h has not. It is determined what effect is obtained when a swap is carried out; i.e. when i is no longer selected as representative object but object h is. The most profitable pair is then indeed swapped."
What seems to be missing from a quick glance is the discussion how to implement this more efficiently than for each of the k*(n-k) combinations computing the cost in k*(n-k), but in 'just' O(k(n-k)²) by storing the nearest and second nearest medoid each.

from jsat.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.