Comments (3)

dm13450 commented on August 31, 2024

Hey, thanks for using the package.

dp$numberClusters is the number of clusters at the last iteration of the fitting process, so it will vary; it's just one sample of the DP process. As it's a Bayesian model, each data point has a distribution over likely cluster assignments, rather than just one label.
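To see how much the cluster count moves between iterations, one option is to count the occupied clusters in each stored iteration instead of reading only the last one. The following is a minimal sketch in base R. It assumes the per-iteration labels are available as a list of label vectors, as with the dp$labelsChain field referenced later in this thread; the `labelsChain` object below is a hypothetical toy stand-in, not output from a real fit.

```r
# Hypothetical stand-in for dp$labelsChain:
# one vector of cluster labels per MCMC iteration (4 data points, 4 iterations).
labelsChain <- list(
  c(1, 1, 2, 2),
  c(1, 1, 2, 3),
  c(1, 2, 2, 3),
  c(1, 1, 1, 1)
)

# Number of occupied clusters in each iteration.
clustersPerIteration <- vapply(
  labelsChain,
  function(labels) length(unique(labels)),
  integer(1)
)

# Empirical distribution of the cluster count across iterations.
clusterCountDistribution <- table(clustersPerIteration) / length(labelsChain)
print(clustersPerIteration)
print(clusterCountDistribution)
```

With a real fit, `labelsChain` would be replaced by dp$labelsChain, and early iterations could be dropped as burn-in before summarising.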

Does that help?

from dirichletprocess.

wseis commented on August 31, 2024

Hi, thanks for your answer. So if I understood correctly, `dp$numberClusters` refers only to the last iteration of the MCMC fit. Given that, how can I determine the most likely cluster label for a given data point? In your blog post "Dirichlet Process Cluster Probabilities," you use

numClusters <- dp$numberClusters
clusterLabelProbs <- matrix(nrow=nrow(faithfulTrans), ncol=numClusters + 1)

to determine the required number of columns. As you see in my case, this won't work since the number of clusters strongly varies between iterations.

Given that dp$labelsChain contains the cluster labels for each of the n iterations, does it make sense to always extract the i-th element for the i-th data point and use the most frequently occurring cluster assignment?

Something like:

library(purrr) # provides map(), used to extract the i-th element from a list of lists
labels1 <- list()

for (i in seq_along(dp$clusterLabels)) {
  # Count how often each label was assigned to data point i across iterations
  labelCounts <- sort(table(unlist(map(dp$labelsChain, i))), decreasing = TRUE)
  # Keep the most frequent label
  labels1[[as.character(i)]] <- as.numeric(names(labelCounts)[1])
}

unlist(labels1)
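The same idea can be sketched without a loop or purrr by stacking the per-iteration label vectors into a matrix and taking the mode of each column. This is only an illustration on a hypothetical toy `labelsChain`; the helper `modalLabel` is a name introduced here, not part of the package. It also assumes that a given label refers to the same cluster in every iteration, which may not hold if clusters are renumbered during sampling.

```r
# Hypothetical stand-in for dp$labelsChain:
# one vector of cluster labels per MCMC iteration (4 data points, 4 iterations).
labelsChain <- list(
  c(1, 1, 2, 2),
  c(1, 1, 2, 3),
  c(1, 2, 2, 3),
  c(1, 1, 2, 3)
)

# Rows are iterations, columns are data points.
labelMatrix <- do.call(rbind, labelsChain)

# Most frequent value in x; ties go to the smallest label because
# table() orders numeric labels and which.max() returns the first maximum.
modalLabel <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Modal label for every data point.
modalLabels <- apply(labelMatrix, 2, modalLabel)

# Share of iterations that agree with the modal label, per data point.
agreement <- vapply(
  seq_len(ncol(labelMatrix)),
  function(j) mean(labelMatrix[, j] == modalLabels[j]),
  numeric(1)
)

print(modalLabels)  # 1 1 2 3
print(agreement)    # 1.00 0.75 1.00 0.75
```

The `agreement` vector gives a rough sense of how certain each assignment is, which a single modal label hides.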

Moreover, I have a similar problem with the ClusterLabelPredict function.

When I apply it twice to the same newData and the same fitted dp object, I get varying results. I assume this is also because each call is a random draw from the posterior. However, when I repeat the prediction many times and take the most frequent cluster label, the prediction stabilizes. Something like:

## Generating prediction for first two rows of the training set
prediction1 <- replicate(1000, list(ClusterLabelPredict(dp, newData = dp$data[1:2,])))
prediction2 <- replicate(1000, list(ClusterLabelPredict(dp, newData = dp$data[1:2,])))

## Extracting frequencies for newData item 2

t1 <- sort(table(unlist(map(map(prediction1, 1),2))), decreasing = T)[1]
t2 <- sort(table(unlist(map(map(prediction2, 1),2))), decreasing = T)[1]

Or is there any more convenient way to extract the most likely cluster label?


dm13450 commented on August 31, 2024

Yeah, everything above is correct. ClusterLabelPredict returns just one sample from the posterior, so you need to do multiple draws, like you have.

Unfortunately, nothing more convenient, but your code looks fine.
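For reference, the repeated-draw approach discussed above can be wrapped up in a few lines of base R. The sketch below does not call the package: `drawLabels` is a hypothetical stand-in that returns one random label per new data point, so the example runs on its own. With a fitted model, it would presumably be replaced by the first element of a ClusterLabelPredict(dp, newData = ...) call, matching the `map(prediction1, 1)` indexing used earlier in the thread. Setting a seed makes repeated runs comparable.

```r
set.seed(42)  # fix the RNG so repeated runs produce the same draws

# Hypothetical stand-in for one posterior draw of labels for two new points.
# The probabilities are made up purely for illustration.
drawLabels <- function() {
  c(
    sample(1:3, 1, prob = c(0.7, 0.2, 0.1)),
    sample(1:3, 1, prob = c(0.1, 0.1, 0.8))
  )
}

nDraws <- 1000

# Matrix of draws: one row per new data point, one column per draw.
draws <- replicate(nDraws, drawLabels())

# Most frequent value in x (ties go to the smallest label).
modalLabel <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Most likely label per new data point, and the share of draws supporting it.
mostLikely <- apply(draws, 1, modalLabel)
confidence <- vapply(
  seq_along(mostLikely),
  function(i) mean(draws[i, ] == mostLikely[i]),
  numeric(1)
)

print(mostLikely)
print(confidence)
```

Reporting `confidence` alongside the label shows when two clusters are nearly tied, in which case more draws may be needed before the modal label settles.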

