big matrices? (upsetjs_r issue, closed)

upsetjs commented on June 14, 2024
big matrices?

from upsetjs_r.

Comments (9)

sgratzl commented on June 14, 2024

how many columns / sets does your dataset have? can you give me an example R script that I can use as a starting point?

Adafede commented on June 14, 2024

Hi again,

Sorry for the late reply. I tried to give you all the needed info here:

## example for S Gratzl

# loading libraries
library(tidyverse)
library(UpSetR)
library(upsetjs)

# loading example files
toyset_1 <- read_delim(
  file = gzfile("~/Downloads/toyset_1.tsv.gz"),
  delim = "\t",
  escape_double = FALSE,
  trim_ws = TRUE
) %>% 
  data.frame()

toyset_2 <- read_delim(
  file = gzfile("~/Downloads/toyset_2.tsv.gz"),
  delim = "\t",
  escape_double = FALSE,
  trim_ws = TRUE
) %>% 
  data.frame()

# for upsetR aesthetics
count <- toyset_1 %>%
  group_by(attribute1) %>%
  count() %>%
  arrange(attribute1)

# upsetR version of toyset_1
## basic
start <- Sys.time()
upset(
  data = toyset_1,
  sets = c(
    "toy1",
    "toy2",
    "toy3",
    "toy4",
    "toy5",
    "toy6"
  ),
  order.by = "freq",
  set_size.show = TRUE,
  set_size.scale_max = 20000
)
end <- Sys.time()
cat("Plotted  in", format(end - start), "\n")

## advanced (would be really nice to have such coloring options)
start <- Sys.time()
upset(
  data = toyset_1,
  sets = c(
    "toy1",
    "toy2",
    "toy3",
    "toy4",
    "toy5",
    "toy6"
  ),
  query.legend = "top",
  queries = list(
    list(
      query = elements,
      params = list(
        "attribute1",
        c(
          count[1, 1],
          count[2, 1],
          count[3, 1]
        )
      ),
      active = TRUE,
      color = "#b2df8a",
      query.name = "kin"
    ),
    list(
      query = elements,
      params = list(
        "attribute1",
        c(
          count[3, 1],
          count[2, 1]
        )
      ),
      active = TRUE,
      color = "#1f78b4",
      query.name = "ord"
    ),
    list(
      query = elements,
      params = list(
        "attribute1",
        c(count[3, 1])
      ),
      active = TRUE,
      color = "#a6cee3",
      query.name = "spe"
    )
  ),
  order.by = "freq",
  set_size.show = TRUE,
  set_size.scale_max = 20000
)
end <- Sys.time()
cat("Plotted  in", format(end - start), "\n")

## bigger matrix: 209'301 x 33 (still not that big imho)
start <- Sys.time()
upset(
  toyset_2,
  order.by = "freq",
  set_size.show = TRUE,
  set_size.scale_max = 250000
)
end <- Sys.time()
cat("Plotted  in", format(end - start), "\n")

# upsetjs version of toyset_1
## works nicely
start <- Sys.time()
upsetjs() %>% 
  fromDataFrame(toyset_1[,1:6]) %>% 
  interactiveChart()
end <- Sys.time()
cat("Plotted  in", format(end - start), "\n")

# upsetjs version of toyset_2
## lasts for ages, no idea why... never had the patience to wait until the end
start <- Sys.time()
upsetjs() %>% 
  fromDataFrame(toyset_2) %>% 
  interactiveChart()
end <- Sys.time()
cat("Plotted  in", format(end - start), "\n")

# Thanks a lot

toyset_1.tsv.gz
toyset_2.tsv.gz

If something remains unclear just let me know!

Thank you very much

sgratzl commented on June 14, 2024

One of the reasons is that UpSetJS doesn't automatically limit the number of (visible) sets. Thus, in the second case, UpSetJS tries to compute and render all 33 sets and all their possible combinations,

whereas UpSetR seems to limit itself to the top 5 sets by default:

[image]

One possible way around this is to compute the combinations and sets yourself and then use the expression input option (https://upset.js.org/integrations/r/articles/basic.html#expression-input) to pass them to UpSet.js.
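As a rough illustration of "computing the combinations yourself", here is a minimal base-R sketch. It assumes the data frame holds only 0/1 membership columns with no missing values. The helper name `count_distinct_combinations` and its arguments are hypothetical and not part of upsetjs or UpSetR. It counts only the membership patterns that actually occur in the rows (distinct intersections), so the work grows with the number of rows rather than with the number of possible set combinations.

```r
# Hypothetical helper (not part of upsetjs or UpSetR).
# Assumes every column of `df` is a 0/1 membership indicator without NAs.
count_distinct_combinations <- function(df, symbol = "&", limit = NULL) {
  set_names <- colnames(df)
  # Build one "0110..." membership string per row (vectorised over rows).
  bits <- lapply(df, function(col) as.integer(col != 0))
  keys <- do.call(paste0, unname(bits))
  # Count how often each observed membership pattern occurs.
  counts <- table(keys)
  # Decode only the unique patterns back into names such as "a&b".
  labels <- vapply(
    strsplit(names(counts), "", fixed = TRUE),
    function(b) paste(set_names[b == "1"], collapse = symbol),
    character(1)
  )
  sizes <- setNames(as.integer(counts), labels)
  # Drop rows that belong to no set at all.
  sizes <- sizes[labels != ""]
  sizes <- sort(sizes, decreasing = TRUE)
  if (!is.null(limit)) sizes <- head(sizes, limit)
  sizes
}

toy <- data.frame(
  a = c(1, 1, 0, 1, 0),
  b = c(1, 0, 1, 1, 0),
  c = c(0, 0, 1, 0, 0)
)
res <- count_distinct_combinations(toy)
print(res)
```

The resulting named vector of combination sizes is the kind of pre-computed input the expression input option linked above is meant for. How exactly it has to be shaped for UpSet.js should be checked against that article.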

sgratzl commented on June 14, 2024

Re coloring: the queries described at https://upset.js.org/integrations/r/articles/basic.html#queries go in this direction.

Adafede commented on June 14, 2024

Thanks a lot for your answers!

Regarding the top 5: that is the default parameter, but you can quickly plot all 33 sets without problems:

> start <- Sys.time()
> upset(
+   toyset_2,
+   order.by = "freq",
+   set_size.show = TRUE,
+   set_size.scale_max = 250000,
+   nsets = 33
+ )
> end <- Sys.time()
> cat("Plotted  in", format(end - start), "\n")
Plotted  in 2.818649 secs 

Adafede commented on June 14, 2024

Regarding coloring: I was able to obtain what I wanted thanks to your advice, but I am wondering whether it would be possible to place the legend elsewhere or to increase the export padding, since the legend gets cut off when exporting ;(

see here

sgratzl commented on June 14, 2024

Re scalability: I'm happy to include any PR that improves the scalability, see

generateCombinationsImpl = function(sets,
                                    c_type,
                                    min,
                                    max,
                                    empty,
                                    order.by,
                                    limit,
                                    colors = NULL,
                                    symbol = "&") {
  combinations = list()
  set_f = if (c_type == "union")
    union
  else
    intersect
  distinct = (c_type == 'distinctIntersection')
  lsets = length(sets)
  all_indices = 1:lsets
  cc = colorLookup(colors)
  for (l in min:(if (is.null(max)) lsets else max)) {
    combos = combn(all_indices, l, simplify = FALSE)
    for (combo in combos) {
      indices = unlist(combo)
      set_names = sapply(indices, function(i) sets[[i]]$name)
      if (is.list(set_names)) {
        set_names = unlist(set_names)
      }
      if (length(indices) == 0) {
        elems = c()
      } else {
        elems = sets[[indices[1]]]$elems
        for (index in indices) {
          elems = set_f(elems, sets[[index]]$elems)
        }
      }
      if (distinct) {
        not_indices = setdiff(all_indices, indices)
        for (index in not_indices) {
          elems = setdiff(elems, sets[[index]]$elems)
        }
      }
      if (empty || length(elems) > 0) {
        c_name = paste(set_names, collapse = symbol)
        combination = structure(list(
          name = c_name,
          color = cc(c_name),
          type = c_type,
          elems = elems,
          setNames = set_names
        ),
        class = "upsetjs_combination")
        combinations = c(combinations, list(combination))
      }
    }
  }
  names(combinations) = NULL
  sortSets(combinations, order.by, limit)
}

for computing all combinations. I'm not an expert in R, so it is quite a procedural approach.
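
To get a feel for why this loop struggles with 33 sets: `combn(all_indices, l)` is called for every size `l` from `min` to `max`, so the number of candidate combinations is a sum of binomial coefficients. The small sketch below only does that arithmetic. The helper name `n_candidate_combinations` and its default for `min` are assumptions made for illustration and are not part of the package.

```r
# Hypothetical helper: how many index combinations would the nested
# combn() loop enumerate for a given number of sets?
n_candidate_combinations <- function(n_sets, min = 1, max = NULL) {
  upper <- if (is.null(max)) n_sets else max
  # One choose(n_sets, l) term per combination size l, as in the loop.
  sum(choose(n_sets, min:upper))
}

six_sets    <- n_candidate_combinations(6)             # 2^6 - 1
all_33_sets <- n_candidate_combinations(33)            # 2^33 - 1
capped_33   <- n_candidate_combinations(33, max = 3)   # 33 + 528 + 5456

print(c(six_sets = six_sets, all_33_sets = all_33_sets, capped_33 = capped_33))
```

With six sets the loop visits 63 candidates. With 33 sets and no `max` it would visit roughly 8.6 billion, each needing its own set operations. That is consistent with the 33-column example never finishing, and capping `max` or counting only observed membership patterns shrinks the work drastically.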

Adafede commented on June 14, 2024

maybe have a look at:

https://jokergoo.github.io/ComplexHeatmap-reference/book/upset-plot.html

(make_comb_mat)

or:

https://github.com/hms-dbmi/UpSetR/tree/master/R

(more precisely: https://github.com/hms-dbmi/UpSetR/blob/master/R/upset.R)

sgratzl commented on June 14, 2024

tested with latest v1.9.0:

> toyset_1 <- read_delim(
+   file = gzfile("./r_package/tests/testthat/data/toyset_1.tsv.gz"),
+   delim = "\t",
+   escape_double = FALSE,
+   trim_ws = TRUE
+ ) %>% 
+   data.frame()
> 
> toyset_2 <- read_delim(
+   file = gzfile("./r_package/tests/testthat/data/toyset_2.tsv.gz"),
+   delim = "\t",
+   escape_double = FALSE,
+   trim_ws = TRUE
+ ) %>% 
+   data.frame()
> # for upsetR aesthetics
> count <- toyset_1 %>%
+   group_by(attribute1) %>%
+   count() %>%
+   arrange(attribute1)
> ## basic
> start <- Sys.time()
> upset(
+   data = toyset_1,
+   sets = c(
+     "toy1",
+     "toy2",
+     "toy3",
+     "toy4",
+     "toy5",
+     "toy6"
+   ),
+   order.by = "freq",
+   set_size.show = TRUE,
+   set_size.scale_max = 20000,
+ )
> end <- Sys.time()
> cat("Plotted  in", format(end - start), "\n")
Plotted  in 1.149752 secs 
> ## works nicely
> start <- Sys.time()
> upsetjs() %>%
+   fromDataFrame(toyset_1[,1:6], c_type="distinctIntersection") %>%
+   interactiveChart()
> end <- Sys.time()
> cat("Plotted  in", format(end - start), "\n")
Plotted  in 0.749186 secs 
> start <- Sys.time()
> upset(
+  toyset_2,
+    order.by = "freq",
+    set_size.show = TRUE,
+    set_size.scale_max = 250000,
+    nsets = 33
+ )
> end <- Sys.time()
> cat("Plotted  in", format(end - start), "\n")
Plotted  in 3.841624 secs 
> start <- Sys.time()
> 
> upsetjs() %>%
+   fromDataFrame(toyset_2, c_type="distinctIntersection", store.elems=FALSE, limit = 40) %>%
+   interactiveChart()
> 
> end <- Sys.time()
> cat("Plotted  in", format(end - start), "\n")
Plotted  in 3.92589 secs

toyset_1
upsetR: Plotted in 1.149752 secs
upsetjs: Plotted in 0.749186 secs

toyset_2
upsetR: Plotted in 3.841624 secs
upsetjs: Plotted in 3.92589 secs
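
The repeated `start <- Sys.time()` / `end <- Sys.time()` pattern used for these measurements could be wrapped in a small helper. The sketch below is one possible way to do that with base R's `system.time()`. The name `time_expr` is hypothetical. The helper prints the result explicitly because values are not auto-printed inside a function. For an htmlwidget, any time the browser or viewer spends rendering afterwards is presumably not captured.

```r
# Hypothetical helper (not part of UpSetR or upsetjs): evaluate an
# expression, print its result, and report the elapsed wall-clock time.
time_expr <- function(label, expr) {
  # `expr` is evaluated lazily, i.e. inside system.time().
  timing <- system.time(print(expr))
  elapsed <- timing[["elapsed"]]
  cat(label, "plotted in", format(elapsed), "secs\n")
  invisible(elapsed)
}

# Stand-in workload; in the thread this would be an upset(...) or
# upsetjs() %>% ... call instead.
elapsed <- time_expr("toy example:", sum(sqrt(1:1e6)))
```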
