utrechtuniversity / scca
Spectral Clustering Correspondence Analysis
License: Other
The raw datasets are not available in the package. Therefore, it is not possible to reproduce the package build. Can we access the datasets from the web?
In SCCA_compute, the eigenvalues are computed using the 'trick': first do the decomposition of the 'small matrix' to obtain the matrix of vectors U, then obtain the eigenvectors by pre-multiplying with D^-1 A, so that D^-1 A U = V. This gives the correct eigenvectors, but they are not normalized properly. They should be normalized by setting v = v / sqrt(v^T D v), where D is the appropriate diagonal matrix with either the row or the column sums on the diagonal.
Normalization of eigenvectors: the normalization of the eigenvectors is currently done within the create_y function. It would be more efficient to normalize them directly when they are calculated, so that the eigenvectors are also normalized in other outputs. Perhaps even provide the 'standardized' vectors (i.e. the normalized eigenvectors multiplied by the square root of the corresponding eigenvalue).
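A minimal numpy sketch of the D-normalization v ← v / sqrt(v^T D v) described above (illustrative only, not the package's R code; `d_normalize` is a hypothetical helper name):

```python
import numpy as np

def d_normalize(V, d):
    """Scale each column v of V so that v^T D v = 1,
    where D = diag(d) holds the row (or column) sums."""
    norms = np.sqrt(np.einsum("ij,i,ij->j", V, d, V))  # v^T D v per column
    return V / norms

# toy example: random vectors and positive diagonal entries
rng = np.random.default_rng(0)
V = rng.normal(size=(5, 3))
d = rng.uniform(1.0, 2.0, size=5)
Vn = d_normalize(V, d)
# each normalized column now satisfies v^T D v = 1
print(np.allclose(np.einsum("ij,i,ij->j", Vn, d, Vn), 1.0))  # True
```

Doing this once, where the eigenvectors are computed, would make every downstream output consistent.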
Computation of eigenvectors: this also concerns the question of the symmetry of the Laplacian mentioned in the function. Looking at the rARPACK documentation, it might be computationally more efficient to compute the eigenvectors from a symmetric matrix instead of the Laplacian, and then post-process them to obtain the eigenvectors of L, as follows:
D^{-1/2}SD^{-1/2} = U \Lambda U^T, where U^T U = I
so that
L = D^{-1} S = D^{-1/2} U \Lambda U^T D^{1/2}
so the right eigenvectors are given by
V = D^{-1/2}U and then normalized setting v = v / sqrt(v^T D v) so that V^T D V = I (as required). Normalization is unnecessary if rARPACK returns normalized eigenvectors (to be checked)
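The derivation above can be checked numerically. A small numpy sketch (illustrative, not the package code) builds a symmetric S, decomposes D^{-1/2} S D^{-1/2}, and verifies that V = D^{-1/2} U gives right eigenvectors of L = D^{-1} S with V^T D V = I — confirming the remark that no extra normalization is needed when U is orthonormal:

```python
import numpy as np

rng = np.random.default_rng(1)
# symmetric similarity matrix S with positive row sums
A = rng.uniform(size=(6, 6))
S = A + A.T
d = S.sum(axis=1)                      # diagonal of D
D_isqrt = np.diag(1.0 / np.sqrt(d))

# eigendecomposition of the symmetric matrix D^{-1/2} S D^{-1/2}
lam, U = np.linalg.eigh(D_isqrt @ S @ D_isqrt)

# right eigenvectors of L = D^{-1} S
V = D_isqrt @ U

L = np.diag(1.0 / d) @ S
print(np.allclose(L @ V, V * lam))                      # True: L V = V diag(lam)
print(np.allclose(V.T @ np.diag(d) @ V, np.eye(6)))     # True: V^T D V = I
```

Since `eigh` (like ARPACK on a symmetric matrix) returns orthonormal U, the rescaling step can indeed be dropped.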
We would like to export a flat file of the tree.
CSV files are in SurfDrive.
See the Python code (the last function in the class).
scca <- scca_compute(carnivora, decomp = 'svds')
Error in fun(A, k, nu, nv, opts, mattype = "dgCMatrix") :
nrow(A) and ncol(A) should be at least 3
In addition: Warning messages:
1: In fun(A, k, nu, nv, opts, mattype = "dgCMatrix") :
all singular values are requested, svd() is used instead
2: In fun(A, k, nu, nv, opts, mattype = "dgCMatrix") :
all singular values are requested, svd() is used instead
Put the dataset in the folder data/. The format should be .RData. Do not include the CSV, but include a snippet to convert the data (for example in this issue).
There is a simple trick to publish the vignette on gh-pages (https://hafen.github.io/packagedocs/#more_on_vignettes). This might be our best solution for hosting the vignette. Shall we give it a try? We need to set up Travis or GH Actions, but this is straightforward.
@qubixes You do have experience with gh-pages and Travis, interested in contributing on this?
scca_stability_test implements the overlap measure for stability. There are others: AD, ADM, and FOM. See https://cran.r-project.org/web/packages/clValid/vignettes/clValid.pdf. Should scca_stability_test compute these measures too?
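For reference, one of these measures can be sketched in a few lines. As I read the clValid vignette, AD (average distance) compares, for each observation, the cluster containing it under the full data with the cluster containing it under the column-reduced data; a hedged numpy sketch (the function name and signature are illustrative, not the package API):

```python
import numpy as np

def average_distance(X, labels_full, labels_red):
    """AD stability measure (per the clValid vignette): for each observation i,
    average the pairwise distances between the members of i's cluster under the
    full data and the members of i's cluster under the reduced data."""
    n = X.shape[0]
    # pairwise Euclidean distances on the full data
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    total = 0.0
    for i in range(n):
        members_full = np.flatnonzero(labels_full == labels_full[i])
        members_red = np.flatnonzero(labels_red == labels_red[i])
        total += D[np.ix_(members_full, members_red)].mean()
    return total / n

# toy check: identical clusterings of coincident points give AD = 0
X = np.array([[0.0], [0.0], [10.0], [10.0]])
full = np.array([0, 0, 1, 1])
red = np.array([0, 0, 1, 1])
print(average_distance(X, full, red))  # 0.0
```

Smaller AD values indicate more stable clusterings, so it would complement the overlap measure rather than duplicate it.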
Under what license conditions will we publish our package?
Hi, I succeeded in installing the package. I thought I'd just list here what I ran into:
I can load the data 'carnivora' and 'exports'
The exports dataset seems to contain only the labels (products in entry 1 and countries in entry 2)
Upon running scca_compute on the exports I get back the labels.
Upon running scca_compute on the carnivora I get the error message 'M must contain row and column labels'.
Let me know if this helps. I might be looking at an older version?
Only the skeleton. The content follows later on.
It looks like we could do a bit better on the variability of k-means by setting the nstart parameter to x (which reruns k-means x times and picks the best solution). Sensitivity tests can then still be done to see how variable the resulting clustering is.
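The effect of nstart can be sketched outside the package: run Lloyd's algorithm several times from random initialisations and keep the run with the lowest within-cluster sum of squares. A minimal numpy sketch (`kmeans_best` is a hypothetical helper, not the package's R code, which would use stats::kmeans):

```python
import numpy as np

def kmeans_best(X, k, nstart=10, iters=100, seed=0):
    """Run k-means `nstart` times from random initialisations and
    keep the run with the lowest within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None)
    for _ in range(nstart):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(1)
            new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        wss = ((X - centers[labels]) ** 2).sum()
        if wss < best[0]:
            best = (wss, labels)
    return best[1]

# two well-separated blobs: restarts reliably recover them
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels = kmeans_best(X, k=2, nstart=5)
```

In R this is exactly what `kmeans(..., nstart = x)` does internally, so only the parameter needs exposing.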
@aljevandam What do you prefer for the trade flow dataset? unilaterally_trade?
Warning messages:
1: In max(distance[c1, c1]) : no non-missing arguments to max; returning -Inf
2: In min(distance[c1, c2]) : no non-missing arguments to min; returning Inf
in: validity <- scca_validity_test(scca = scca_species, dist = d_species)
Should we include the coordinates of carnivora? Is this part of the analysis?
Proposal: two datasets, carnivora and carnivora_coords, to replace the list structure (which is complicated for non-R programmers). carnivora then returns the matrix directly. @KvEijden can you give feedback on this?
By Alje:
drop <- sample(ncol(carnivora), ncol(carnivora) %/% 10)
stability <- scca_stability_test(m = carnivora, drop_vars = drop)
Error in clustering_overlap(cl.x, cl.y, plot = plot) :
Clusterings x and y not from the same dataset and category.
Hi Kees, I'm leaving an issue here regarding the vignette. The part that shows how to load the package currently reads:
library(sccar)
but should instead be:
library(SCCA)
Let me know if you want me to correct it.
Sometimes in the clustering process there will be submatrices with disconnected columns and/or disconnected rows. How should these cases be handled?
If disconnections occur, the function will print a warning to the console.
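Before warning, the disconnections have to be detected; for a nonnegative count matrix they are exactly the all-zero rows and columns, which make the row/column sums singular in the spectral step. A minimal numpy sketch (`disconnected` is a hypothetical helper name, not the package API):

```python
import numpy as np

def disconnected(M):
    """Return the indices of all-zero rows and columns of a nonnegative
    matrix; these have zero degree and would produce a division by zero
    when forming D^{-1}."""
    rows = np.flatnonzero(M.sum(axis=1) == 0)
    cols = np.flatnonzero(M.sum(axis=0) == 0)
    return rows, cols

M = np.array([[1, 0, 2],
              [0, 0, 0],
              [3, 0, 1]])
r, c = disconnected(M)
print(r, c)  # [1] [1]: row 1 and column 1 are disconnected
```

One option beyond the warning would be to drop such rows/columns from the submatrix before decomposing and report them back to the caller.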
Error: Can't convert to .
Run rlang::last_error() to see where the error occurred.
15. stop(fallback)
14. signal_abort(cnd)
13. abort(message, class = c(class, "vctrs_error"), ...)
12. stop_vctrs(message, class = c(class, "vctrs_error_incompatible"), x = x, y = y, details = details, ...)
11. stop_incompatible(x, y, x_arg = x_arg, y_arg = y_arg, details = details, ..., message = message, class = c(class, "vctrs_error_incompatible_type"))
10. stop_incompatible_type(x = x, y = to, ..., x_arg = x_arg, y_arg = to_arg, action = "convert", details = details, message = message, class = class)
9. stop_incompatible_cast(x, to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = match_from_dispatch(...))
8. vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = `vctrs:::from_dispatch`, `vctrs:::df_fallback` = `vctrs:::df_fallback`, `vctrs:::s3_fallback` = `vctrs:::s3_fallback`)
7. (function () vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg, `vctrs:::from_dispatch` = `vctrs:::from_dispatch`, `vctrs:::df_fallback` = `vctrs:::df_fallback`, `vctrs:::s3_fallback` = `vctrs:::s3_fallback`))()
6. vec_cast(fill, val)
5. pivot_wider_spec(data, spec, !!id_cols, names_repair = names_repair, values_fill = values_fill, values_fn = values_fn)
4. tidyr::pivot_wider(data = overlap_xy, names_from = .data$cluster.y, names_prefix = "y_", values_from = .data$edge, values_fill = list(edge = 0)) at scca_overlap_test.R#187
3. plot_overlap(cl.xy) at scca_overlap_test.R#157
2. clustering_overlap(cl.x, cl.y, plot = plot) at scca_overlap_test.R#116
1. scca_overlap_test(x = scca, y = scca1, plot = TRUE)