cmap / cmapr

Tools for manipulating annotated data matrices

License: BSD 3-Clause "New" or "Revised" License

R 99.85% Dockerfile 0.15%
bioconductor cmap bioinformatics

cmapr's People

Contributors

dedavison, grimbough, jananiravi, jasiedu, kant, nturaga, nuno-agostinho, oena, oganm, tnat1031

cmapr's Issues

subset.gct is not available

Hello, the function subset.gct is not available in the recent version of cmapR. Can you please provide an alternative?

Problem getting column metadata

Hi,

I tried your example code to get column metadata from the sample .gctx file. It gives me the following error:
Error in withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) :
invalid multibyte string at '<84>'
I am not sure what is going wrong.

Level 4 replicates do not match a specific Level 5 signature

I am trying to plot some genes using level 4 data for my compound (BRD-K55591206) in HepG2 cells.

There are two signatures with HepG2 cells at level 5:
LJP008_HEPG2_24H:J01
POL001_HEPG2_24H:J09
To make sure I was using the same data from these level 5 signatures I checked the replicates at level 4 of each of these signatures above. The average of the two LJP008 experiments (distil_ids: LJP008_HEPG2_24H_X2_B20:J01|LJP008_HEPG2_24H_X3_B20:J01) matches the signature of each gene at level 5. Perfect.

However, the level 4 data for signature POL001_HEPG2 (distil_ids: POL001_HEPG2_24H_X1.L2_B23:J07|POL001_HEPG2_24H_X2.L2_B23:J07|POL001_HEPG2_24H_X3.L2_B23:J07) does not match level 5.

If we use the NAT2 gene as an example, we have the following level 5 value: 0.004413

On the other hand, the values for the level 4 replicates are:
POL001_HEPG2_24H_X1.L2_B23:J07 = -0.386299998
POL001_HEPG2_24H_X2.L2_B23:J07 = 0.110600002
POL001_HEPG2_24H_X3.L2_B23:J07 = 0.38409999
The average, 0.036133, does not match the level 5 value, 0.004413.
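The arithmetic above can be reproduced in a few lines of base R. This is only a sketch that confirms the unweighted mean of the three replicate values quoted in the issue. The weights in the second half are hypothetical (they are not taken from any CMap file) and only illustrate that a weighted combination of the same replicates gives a different number than the plain mean.

```r
# Level 4 replicate values for NAT2, copied from the issue text
reps <- c(X1 = -0.386299998, X2 = 0.110600002, X3 = 0.38409999)

# Unweighted mean, as computed in the issue
plain_mean <- mean(reps)
print(round(plain_mean, 6))

# Hypothetical weights, for illustration only (NOT from any CMap release)
w <- c(0.2, 0.5, 0.3)
weighted <- weighted.mean(reps, w)
print(weighted)
```

If the level 5 value was produced by a weighted scheme, comparing it against `mean()` alone would not be expected to match exactly.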

The compound is BRD-K55591206, 10 µM, 24 h.

Why don’t they match?

I am using cmapR to retrieve the data from these files:
https://clue.io/releases/data-dashboard
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/level5/level5_beta_trt_cp_n720216x12328.gctx
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/level4/level4_beta_all_n3026460x12328.gctx
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/siginfo_beta.txt
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/instinfo_beta.txt

Thank you,
Alex

parse.gctx error

I'm using the file GSE92743_Broad_Affymetrix_training_Level3_Q2NORM_n100000x12320.gctx. I can successfully read the file using RStudio on Linux. The problem occurs when I run a script containing a call to parse.gctx with the Rscript command in a Linux terminal. The error is shown below.

$ Rscript thecode.R

Error in initialize(value, ...) :
invalid names for slots of class “GCT”: set_annot_rownames, matrix_only
Calls: source ... -> -> initialize -> initialize
Execution halted

I call parse.gctx in the script as follows:
parse.gctx(fname = ds_file, rid = dataidx$testidx, matrix_only = T)

It runs fine in RStudio, where I have used it several times, but in the Linux terminal I get the error above. What is the reason?

tutorial section: Parsing a subset of a GCTX file

In the tutorial, section "Parsing a subset of a GCTX file":
sig_ids <- col_meta$sig_id[idx]
This will not work because two lines earlier the metadata file was loaded with row.names=1, so there is no longer a sig_id column. As a result, sig_ids will be NULL (empty).

One solution would be to replace
sig_ids <- col_meta$sig_id[idx]
with
sig_ids <- row.names(col_meta[idx,]).

sig_ids is now a character vector of row names.
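The behaviour described above can be reproduced without any CMap files. The sketch below uses a tiny, made-up metadata table (the sig_id and pert_iname values are hypothetical) and reads it with `row.names = 1`, as in the tutorial. It indexes `row.names(col_meta)` directly rather than `row.names(col_meta[idx, ])`, because subsetting a data frame that has only one remaining column drops it to a plain vector, which has no row names.

```r
# Hypothetical metadata, tab-separated, standing in for the sig_info file
txt <- "sig_id\tpert_iname\nSIG_A\tvorinostat\nSIG_B\tDMSO\nSIG_C\tvorinostat\n"

# row.names = 1 moves the first column (sig_id) into the row names
col_meta <- read.delim(text = txt, sep = "\t", row.names = 1,
                       stringsAsFactors = FALSE)

idx <- which(col_meta$pert_iname == "vorinostat")

# The sig_id column no longer exists, so this is NULL
missing_col <- col_meta$sig_id

# The ids are still available as row names
sig_ids <- row.names(col_meta)[idx]
print(sig_ids)
```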

Error reading GCT downloaded from CLUE

Hi CMap team,

I am trying to access results from a CMap batch query. I downloaded them and read them into R using cmapR::parse.gctx( my_gct_path ), but parse.gctx errs, saying invalid class “GCT” object: cid must be unique. I don't want to upload the whole file, but I find I can reproduce the error with just the head:

demo.txt

Is the error because of all the ... in the metadata? Is this file (the start of) a valid GCT file?
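One way to investigate a "cid must be unique" error is to look at the column ids in the header line directly. The sketch below is a base R helper (the name `dup_gct_cids` is hypothetical, not a cmapR function). It assumes the usual text GCT layout: a version line, a dimensions line, then a header line whose leading fields are `id` plus the row descriptor names (v1.3) or `Name` and `Description` (v1.2), followed by the column ids.

```r
# Hypothetical helper: report column ids that occur more than once
dup_gct_cids <- function(path) {
  hdr <- readLines(path, n = 3)
  dims <- as.integer(strsplit(hdr[2], "\t")[[1]])
  fields <- strsplit(hdr[3], "\t")[[1]]
  # Number of row descriptor columns (third dimension field in v1.3)
  n_rdesc <- if (length(dims) >= 3) dims[3] else 0L
  # Leading header fields that are not column ids
  skip <- if (startsWith(hdr[1], "#1.2")) 2L else 1L + n_rdesc
  cids <- fields[-seq_len(skip)]
  unique(cids[duplicated(cids)])
}

# Small made-up file with a repeated column id "A"
demo_path <- tempfile(fileext = ".gct")
writeLines(c("#1.3", "2\t3\t1\t0", "id\tdesc\tA\tB\tA"), demo_path)
dups <- dup_gct_cids(demo_path)
print(dups)
```

An empty result would suggest the duplicates come from somewhere other than the header line, for example from truncated or placeholder ids.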

Thanks for your help! I'm really excited to try using CMap on my data.

parse.gctx() dimnames error

I am getting an error in parse.gctx() which I can't figure out.
I run parse.gctx("../fpkm_limit_baseline_cd138.gct") and get this error:

parsing as GCT v1.2
../fpkm_limit_baseline_cd138.gct 55861 rows, 770 cols, 0 row descriptors, 0 col descriptors
Error in dimnames(mat) = list(rid, cid) :
length of 'dimnames' [2] not equal to array extent
In addition: Warning messages:
1: In matrix(mat, nrow = nrmat, ncol = ncmat + nrhd + col_offset, byrow = TRUE) :
data length [43068831] is not a sub-multiple or multiple of the number of columns [772]
2: In matrix(as.numeric(mat[, (1 + col_offset):ncol(mat)]), nrow = nrmat, :
NAs introduced by coercion
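The numbers in the first warning can be checked with simple arithmetic. This sketch assumes the reported data length counts only the data rows, and that a GCT v1.2 data row holds a Name field, a Description field, and one value per column. Under those assumptions each row appears to carry one field fewer than the header declares, which would be consistent with a missing Description column or a column count that is off by one.

```r
n_rows <- 55861          # rows reported by the parser
n_cols <- 770            # columns reported by the parser
data_length <- 43068831  # data length from the warning

# GCT v1.2 rows: Name + Description + one value per column
expected_fields <- n_cols + 2
fields_per_row <- data_length / n_rows

print(expected_fields)
print(fields_per_row)
```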

Any ideas how to get past this? My file is a GCT file, not a GCTX file. Its extension is .gct, and here is a snapshot of the first few rows/columns:

[image: snapshot of the first few rows and columns of the file]

I am using cmapR version 1.0.1, and here is my session info:

$platform
[1] "x86_64-apple-darwin15.6.0"

$arch
[1] "x86_64"

$os
[1] "darwin15.6.0"

$system
[1] "x86_64, darwin15.6.0"

$status
[1] ""

$major
[1] "3"

$minor
[1] "5.1"

$year
[1] "2018"

$month
[1] "07"

$day
[1] "02"

$svn rev
[1] "74947"

$language
[1] "R"

$version.string
[1] "R version 3.5.1 (2018-07-02)"

$nickname
[1] "Feather Spray"

parse.gctx fails to work unless user explicitly import rhdf5

Contrary to prior behaviour, where cmapR automatically imported rhdf5, the latest version fails to do so, and executing parse.gctx returns an error:

Error in h5closeAll() : could not find function "h5closeAll"

First-time users may not realize that this is an rhdf5 function, which makes the error hard to diagnose.

exported GCT does not contain all metadata tracks

I'm using write_gct() to export a GCT object with 8 tracks of column metadata. When I open the exported file, only one metadata track is displayed (in addition to the column names), even though the dimensions in the file denote 8.

Inspecting the GCT object itself shows that all 8 tracks are there in cdesc. Any idea why this might be happening? Thanks!

merge.gct should NA pad

If rows/columns don't align, merge.gct should fill with NA instead of reducing to the row/column space of the first GCT object.
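The requested behaviour can be sketched on plain matrices. The function below (`merge_na_pad` is a hypothetical name, not part of cmapR) builds a result over the union of row and column ids and leaves cells that neither input defines as NA. It works on bare matrices only and ignores GCT metadata.

```r
# Hypothetical helper: merge two named matrices over the union of their ids
merge_na_pad <- function(m1, m2) {
  stopifnot(!is.null(rownames(m1)), !is.null(colnames(m1)),
            !is.null(rownames(m2)), !is.null(colnames(m2)))
  rids <- union(rownames(m1), rownames(m2))
  cids <- union(colnames(m1), colnames(m2))
  out <- matrix(NA_real_, nrow = length(rids), ncol = length(cids),
                dimnames = list(rids, cids))
  out[rownames(m1), colnames(m1)] <- m1
  # Where both inputs define the same cell, m2 overwrites m1
  out[rownames(m2), colnames(m2)] <- m2
  out
}

m1 <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))
m2 <- matrix(c(10, 20), nrow = 2, dimnames = list(c("r2", "r3"), "c"))
merged <- merge_na_pad(m1, m2)
print(merged)
```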

Installing cmapR requires R >= 4.0

Hello,

I'm trying to install cmapR on R 3.6.2 and I get this error:

devtools::install_github("cmap/cmapR")
Downloading GitHub repo cmap/cmapR@master
✓ checking for file ‘/tmp/RtmpUSPwSG/remotes10de422e46bd/cmap-cmapR-f4d3879/DESCRIPTION’ ...
─ preparing ‘cmapR’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘cmapR_0.99.18.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'
Warning: invalid gid value replaced by that for user 'nobody'
Installing package into ‘/home/ferreen2/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
ERROR: this R is version 3.6.2, package 'cmapR' requires R >= 4.0
Error: Failed to install 'cmapR' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpUSPwSG/file10de472bd25dd/cmapR_0.99.18.tar.gz’ had non-zero exit status

As of today, R 4.0 has not even been released yet.
Can you please let me know how I can install the package on R 3.6.2?

Thank you,
Enrico

cmapR in CRAN or Bioconductor

I need to parse GCTX files in my own Bioconductor package, but the Bioconductor guidelines only allow dependencies on packages available on CRAN and Bioconductor.

As a workaround, I directly include your code to parse those GCTX files, while crediting your GitHub repository for that part of the code in the documentation of the respective functions. However, I would much prefer if you could publish your package so I could simply use the functions I need.

Do you have any plans on distributing cmapR through either CRAN or Bioconductor? Thank you!

suggestion on simple installation

It may be even easier to install using devtools:

install.packages("devtools") ## if needed
devtools::install_github("cmap/cmapR")

parse_gctx throwing error which I am not able to understand

parse_gctx is not able to read my GCT file with the command below. The cmapR version is 1.9.0.

parse_gctx("BLM_Aged_n83x18577.gct")
parsing as GCT v1.2
BLM_Aged_n83x18577.gct NA rows, NA cols, 0 row descriptors, 0 col descriptors
Error in matrix(m, nrow = nrmat, ncol = ncmat + nrhd + col_offset, byrow = TRUE) :
invalid 'nrow' value (too large or NA)

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] rhdf5_2.38.1 CePa_0.8.0 shiny_1.7.1 gridExtra_2.3 biomaRt_2.50.3
[6] scales_1.2.0 ggpubr_0.4.0.999 ggplot2_3.3.5 tidyr_1.2.0 dplyr_1.0.9
[11] GEOquery_2.62.2 cmapR_1.9.0 GSEABase_1.56.0 graph_1.72.0 annotate_1.72.0
[16] XML_3.99-0.8 AnnotationDbi_1.56.2 IRanges_2.28.0 S4Vectors_0.32.4 Biobase_2.54.0
[21] BiocGenerics_0.40.0 limma_3.50.0

loaded via a namespace (and not attached):
[1] colorspace_2.0-2 ggsignif_0.6.3 ellipsis_0.3.2 cytolib_2.6.2
[5] XVector_0.34.0 GenomicRanges_1.46.1 rstudioapi_0.13 ggrepel_0.9.1
[9] bit64_4.0.5 fansi_1.0.3 xml2_1.3.3 cachem_1.0.6
[13] jsonlite_1.8.0 broom_0.7.12 dbplyr_2.2.0 png_0.1-7
[17] BiocManager_1.30.18 readr_2.1.2 compiler_4.1.2 httr_1.4.3
[21] backports_1.4.1 assertthat_0.2.1 Matrix_1.3-4 fastmap_1.1.0
[25] cli_3.3.0 later_1.3.0 htmltools_0.5.2 prettyunits_1.1.1
[29] tools_4.1.2 igraph_1.2.11 gtable_0.3.0 glue_1.6.2
[33] GenomeInfoDbData_1.2.7 rappdirs_0.3.3 Rcpp_1.0.8.3 carData_3.0-5
[37] jquerylib_0.1.4 vctrs_0.4.1 Biostrings_2.62.0 rhdf5filters_1.6.0
[41] stringr_1.4.0 mime_0.12 lifecycle_1.0.1 rstatix_0.7.0
[45] zlibbioc_1.40.0 RProtoBufLib_2.6.0 hms_1.1.1 promises_1.2.0.1
[49] MatrixGenerics_1.6.0 parallel_4.1.2 SummarizedExperiment_1.24.0 curl_4.3.2
[53] memoise_2.0.1 sass_0.4.1 stringi_1.7.6 RSQLite_2.2.14
[57] flowCore_2.6.0 filelock_1.0.2 GenomeInfoDb_1.30.1 rlang_1.0.2
[61] pkgconfig_2.0.3 matrixStats_0.62.0 bitops_1.0-7 lattice_0.20-45
[65] purrr_0.3.4 Rhdf5lib_1.16.0 bit_4.0.4 tidyselect_1.1.2
[69] magrittr_2.0.3 R6_2.5.1 generics_0.1.2 DelayedArray_0.20.0
[73] DBI_1.1.2 pillar_1.7.0 withr_2.5.0 KEGGREST_1.34.0
[77] abind_1.4-5 RCurl_1.98-1.6 tibble_3.1.7 crayon_1.5.1
[81] car_3.0-12 utf8_1.2.2 BiocFileCache_2.2.1 tzdb_0.2.0
[85] progress_1.2.2 grid_4.1.2 data.table_1.14.2 blob_1.2.3
[89] Rgraphviz_2.38.0 digest_0.6.29 xtable_1.8-4 httpuv_1.6.5
[93] RcppParallel_5.1.5 munsell_0.5.0 bslib_0.3.1

@kant @jasiedu @tnat1031 Could you please help?

melt.gct returns empty data.frame when rdesc and cdesc are empty

example:

build <- parse.gctx('/cmap/projects/PROS/builds/pros_a_pr500/PROS_A_PR500_LEVEL5_MODZ.ZSPC.COMBAT_n2308x489.gct')
ds_melted <- melt.gct(build)

This fails because rdesc and cdesc are empty, so merging the melted matrix with them results in an empty data.frame.
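Until the merge handles empty metadata, the matrix itself can be turned into long form with base R. This is a minimal sketch on a plain named matrix. The function name `melt_matrix` and the output column names `id.x`, `id.y`, and `value` are assumptions chosen for illustration and may not match what melt.gct produces.

```r
# Hypothetical helper: long-form version of a named matrix, no metadata
melt_matrix <- function(mat) {
  stopifnot(!is.null(rownames(mat)), !is.null(colnames(mat)))
  data.frame(
    id.x = rep(rownames(mat), times = ncol(mat)),  # row id varies fastest
    id.y = rep(colnames(mat), each = nrow(mat)),   # column id varies slowest
    value = as.vector(mat),                        # column-major order
    stringsAsFactors = FALSE
  )
}

mat <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2,
              dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
long <- melt_matrix(mat)
print(long)
```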

Error that vector memory exhausted (limit reached?) occurs when parsing GCTX files

I get a memory exhausted error when I try to parse a level 5 GCTX file.

Error: vector memory exhausted (limit reached?)
Error: Error in h5checktype(). H5Identifier not valid.

And my session info is

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5

I have found some answers to this error. Some suggest using 64-bit R, but I am already using it. Others say that setting the memory limit may help, but I have no idea how to do that. I was wondering if anyone might know how to solve this problem. Thanks!
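A rough size estimate shows why a full level 5 matrix may not fit in memory. The sketch below assumes values are stored as 8-byte doubles once loaded, and uses the dimensions from the file name level5_beta_trt_cp_n720216x12328.gctx that appears elsewhere on this page (the file in this issue is not named, so its size may differ). On macOS, this particular message is usually associated with the `R_MAX_VSIZE` setting in `.Renviron`. Reading only the needed columns through the `cid` argument, as in the tutorial, is the other common way to stay within the limit.

```r
# Approximate in-memory size of a numeric matrix, in GiB
# (assumes 8 bytes per value; the function name is hypothetical)
gctx_matrix_gib <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 2^30
}

full <- gctx_matrix_gib(12328, 720216)     # whole matrix
subset_100 <- gctx_matrix_gib(12328, 100)  # 100 columns only

print(round(full, 1))
print(round(subset_100, 4))
```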

L1000 Inference function

Hi Ted,

I came here from https://clue.io/contest, which mentioned that an implementation of the INFERENCE CHALLENGE was in cmapR.

But I could not find a function for that in cmapR. I was wondering whether it is really implemented in cmapR, or whether you know of any other R packages that implement it.

Thank you,
Chen

Changing version during write_gct causes "unrecognised format specification '%'"

I identified an error in io.R that was preventing me from writing out a 1.2 GCT file:

Error in sprintf("#1.%d\n%d\t%", ver, nr, nc) : unrecognised format specification '%'

Code:
sprintf("#1.%d\n%d\t%", ver, nr, nc)

is missing the d after the final %:

sprintf("#1.%d\n%d\t%d", ver, nr, nc)

It works for me now, I hope this helps!
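The difference between the two format strings can be checked in isolation. In this sketch the values of `ver`, `nr`, and `nc` are made up for illustration, and the broken call is wrapped in `tryCatch` so the script keeps running.

```r
ver <- 2L   # hypothetical GCT minor version
nr <- 10L   # hypothetical number of rows
nc <- 5L    # hypothetical number of columns

# Broken format string: the trailing "%" has no conversion character
broken <- tryCatch(
  sprintf("#1.%d\n%d\t%", ver, nr, nc),
  error = function(e) conditionMessage(e)
)
print(broken)

# Fixed format string: every "%" is followed by "d"
fixed <- sprintf("#1.%d\n%d\t%d", ver, nr, nc)
cat(fixed, "\n")
```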

MODZ calculation in R?

Hi,

I'm not sure if I should ask this question here, but I wonder if there is any way to calculate MODZ for a given expression matrix (e.g., from microarray or RNA-seq) if you already have expression values calculated as robust z-scores. I've been trying to do pairwise Spearman correlations with cor(), but I'm not sure if there is any R code similar to the one available in MATLAB.

I want to do this because I have some expression signatures that I want to correlate with L1000 profiles in the CMap database (e.g., to do PCA), and I guess I should have my expression data calculated as MODZ prior to comparison.
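A MODZ-style weighted average can be sketched in base R with cor(). This is not the reference implementation: it follows my reading of the published description, in which each replicate is weighted by the sum of its Spearman correlations with the other replicates and the weights are normalized to sum to 1. The function name `modz_sketch` and the clipping threshold `min_cor = 0.01` are assumptions, and the sketch computes correlations over all rows rather than a landmark gene subset.

```r
# Hypothetical sketch of a MODZ-style weighted average.
# zmat: genes x replicates matrix of z-scores (at least 2 replicates)
modz_sketch <- function(zmat, min_cor = 0.01) {
  stopifnot(is.matrix(zmat), ncol(zmat) >= 2)
  cc <- cor(zmat, method = "spearman")  # replicate-by-replicate correlations
  cc[cc < min_cor] <- min_cor           # assumed floor for low correlations
  diag(cc) <- 0                         # ignore self-correlation
  w <- rowSums(cc)
  w <- w / sum(w)                       # weights sum to 1
  list(weights = w, modz = drop(zmat %*% w))
}

# Made-up example: three noisy replicates of the same signal
set.seed(1)
signal <- rnorm(50)
zmat <- cbind(rep1 = signal + rnorm(50, sd = 0.2),
              rep2 = signal + rnorm(50, sd = 0.2),
              rep3 = rnorm(50))  # an unrelated, poorly correlated replicate
res <- modz_sketch(zmat)
print(round(res$weights, 3))
```

The design intent is that a replicate that disagrees with the others receives a smaller weight than it would in a plain mean.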

Thank you very much in advance.

Best
Gema

tutorial error with R-devel ... second read from gctx fails

this is a heads-up concerning runnability with R-devel

following the tutorial code

> col_meta_path <- "GSE70138_Broad_LINCS_sig_info_2017-03-06.txt.gz"

> col_meta <- read.delim(col_meta_path, sep="\t", stringsAsFactors=F)

> # figure out which signatures correspond to vorinostat by searching the 'pert_iname' column
> idx <- which(col_meta$pert_iname=="vorinostat")

> # and get the corresponding sig_ids
> sig_ids <- col_meta$sig_id[idx]

> # read only those columns from the GCTX file by using the 'cid' parameter
> 
> 
> ds_path <- "GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-0 ..." ... [TRUNCATED] 

> vorinostat_ds <- parse.gctx(ds_path, cid=sig_ids)
reading GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-03-06.gctx
done

all is well. then

> row_meta_from_gctx <- read.gctx.meta(ds_path, dim="row")
HDF5-DIAG: Error detected in HDF5 (1.8.19) thread 0:
  #000: H5O.c line 249 in H5Oopen(): unable to open object
    major: Symbol table
    minor: Can't open object
  #001: H5O.c line 1366 in H5O_open_name(): unable to open object
    major: Symbol table
    minor: Can't open object
  #002: H5O.c line 1407 in H5O_open_by_loc(): unable to open object
    major: Object header
    minor: Can't open object
  #003: H5Goh.c line 226 in H5O_group_open(): unable to register group
    major: Object atom
    minor: Unable to register new atom
...

attempt to continue session, repeating previously successful read, fails with similar errors.

This error does not occur with R 3.4 ... so it could be an issue with changes to
rhdf5 infrastructure. I'll pass this information to the rhdf5 developer.

If you have fixes against R-devel please let me know.

The sessionInfo was

R Under development (unstable) (2017-11-08 r73690)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)

Matrix products: default
BLAS/LAPACK: /app/intelMKL-2017.0.098_i86-rhel6.0/intelMKL/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cmapR_1.0 data.table_1.10.4-3 rhdf5_2.23.5
[4] rmarkdown_1.8

loaded via a namespace (and not attached):
[1] compiler_3.5.0 backports_1.1.2 magrittr_1.5 rprojroot_1.3-1
[5] tools_3.5.0 htmltools_0.3.6 Rcpp_0.12.14 stringi_1.1.6
[9] knitr_1.18 stringr_1.2.0 digest_0.6.13 evaluate_0.10.1
[13] Rhdf5lib_1.1.5

Error in parse.gctx(gct_file, matrix_only = TRUE) : could not find function "parse.gctx", error message shows up for all other functions too

I recently spent 8+ hours trying to get your package functions to work. At first, the only thing that worked was the parse.gctx function. After reinstalling your package and the other necessary packages, the parse.gctx function doesn't work either.

library(cmapR)
library(rhdf5)
gct_file <- system.file("/Users/jamesordaya/Downloads/GinnysBioinformaticsFolder/GSE70138_Broad_LINCS_Level3_INF_mlr12k_n78980x22268_2015-06-30.gct", package="cmapR")
(ds <- parse.gctx(gct_file, matrix_only=TRUE))
print(my_ds)

is my code. I also tried all the other functions, which give a similar error message:

"Error in parse.gctx(gct_file, matrix_only = FALSE) :
could not find function "parse.gctx""

I don't know what to do. To be honest, I am still pretty new to R packages from GitHub, so I may need a bit more direction than others. As I said, I was able to get the parse function to work before the reinstall, and now it gives the same error that all the other functions gave before (and afterwards).
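Separately from the "could not find function" error, the call to system.file() in the code above is worth checking. system.file() looks for files inside an installed package and returns an empty string when nothing is found, so passing it an absolute path to a downloaded file would not be expected to work. The sketch below demonstrates this with the base package and a made-up path; a file on disk can be referenced by its path directly. Several later issues on this page call parse_gctx() rather than parse.gctx(), so it may also be worth checking which name the installed version exports.

```r
# system.file() searches inside a package; a path outside it yields ""
p <- system.file("/definitely/not/inside/a/package.gct", package = "base")
print(nzchar(p))

# A file on disk is referenced by its own path (temporary file for the demo)
user_path <- file.path(tempdir(), "example.gct")
writeLines("#1.3", user_path)
print(file.exists(user_path))
```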

Questions about connectivity map

Hi. I have a few questions about the Connectivity Map. I contacted the Broad Institute several times through the Contact Us section of clue.io but did not get an answer, so I'm raising an issue here. My questions are below.

  1. There are only 8,969 perturbagens in Touchstone version 1.0 even though there are 51,219 perturbagens in LINCS phase I (GSE92742) data. I think it's because those perturbagens are well annotated with MOAs and protein targets so they're included in the Touchstone reference data. Is it right?

  2. How did you collapse the replicates of a perturbagen into a signature if it was tested at several time points and several doses? To understand this, I've read your paper (Subramanian A, Narayan R, Corsello SM, et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell. 2017;171(6):1437-1452.e17. doi:10.1016/j.cell.2017.10.049) and the Connectopedia section of clue.io, but I only found information on how replicates tested under the same condition (dose and duration of treatment) are collapsed into a signature.

  3. Analysis does not work in the latest version of Touchstone. Version 1.0 of Touchstone works well with the same query that I used in the latest version. The same error message keeps popping up, and it says it may be because of a memory problem.

Due to the time difference, I cannot participate in the Zoom meetings held during your work hours. Please understand why I am raising an issue like this.
Any information you could give me that sheds light on these important issues would be greatly appreciated.

Warning when loading package

Hello, I am getting the following warning when I load cmapR, either directly with library(cmapR) or by importing cmapR as a dependency for an R package:

Warning messages:
1: multiple methods tables found for ‘aperm’ 
2: replacing previous import ‘BiocGenerics::aperm’ by ‘DelayedArray::aperm’ when loading ‘SummarizedExperiment’ 

This is preventing me from having a passing R-CMD-check for an R package I am developing with cmapR as a dependency.

sessionInfo:

R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cmapR_1.8.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9                  lattice_0.20-45             matrixStats_0.63.0          IRanges_2.30.1              RProtoBufLib_2.8.0         
 [6] bitops_1.0-7                grid_4.2.2                  GenomeInfoDb_1.32.4         stats4_4.2.2                RcppParallel_5.1.6         
[11] zlibbioc_1.42.0             XVector_0.36.0              flowCore_2.8.0              S4Vectors_0.34.0            Matrix_1.5-3               
[16] tools_4.2.2                 Biobase_2.58.0              RCurl_1.98-1.9              DelayedArray_0.22.0         MatrixGenerics_1.8.1       
[21] compiler_4.2.2              BiocGenerics_0.44.0         cytolib_2.8.0               GenomicRanges_1.48.0        SummarizedExperiment_1.26.1
[26] GenomeInfoDbData_1.2.8     

R CMD check fails (issue with 'testthat')

Hello.

I'm trying to follow the instructions on the readme, and I get stuck on the check.

This is the error I get:

'library' or 'require' call not declared from: ‘testthat’
* checking tests ...
  Running ‘testthat.R’
 ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  Saving file to foo.gct
  Dimensions of matrix: [10x5]
  Setting precision to 4
  Saved.
  Error in parse(text = lines, n = -1, srcfile = srcfile) : 
    testthat/test_GCT_class.R:3:70: unexpected '{'
  2: 
  3: test_that("GCT throws error if rid/cid not unique character vectors" {
                                                                          ^
  Calls: test_check ... FUN -> with_reporter -> force -> source_file -> parse
  testthat results ================================================================
  OK: 87 SKIPPED: 0 FAILED: 0
  Execution halted
* checking PDF version of manual ... OK
* DONE

I guess there is a comma missing there. I put it, and the check finishes without errors, but I wanted to make sure that it is the right thing to do.
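The diagnosis can be confirmed without installing testthat, because the failure happens at parse time. This sketch parses the quoted line with and without the comma. The helper name `parses` is hypothetical, and the empty test body is a placeholder for whatever the real test contains.

```r
# Hypothetical helper: TRUE if the code is syntactically valid R
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

# As in the failing file: no comma between the description and the block
bad <- 'test_that("GCT throws error if rid/cid not unique character vectors" { })'

# With the comma, the description and the block are two separate arguments
good <- 'test_that("GCT throws error if rid/cid not unique character vectors", { })'

print(parses(bad))
print(parses(good))
```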

`write.gctx` for incremental output

Dear developers,

Thanks for developing this software. I wonder if the output function write.gctx supports incremental save, so that there is no need to create a GCT object containing full data set in memory. Thanks.

Annotating level 3 or level 4 data

Hi there,

Thanks for sharing the code and it works perfectly for me in terms of manipulating the level 5 data. When I was trying to use the level 3 or level 4 data, it seems that I couldn't find a good source for the annotation file. I was wondering where I can find those annotation files so that I can use other levels of data.

Thanks in advance!

Best,
Jerome

Prada dependency error

I apologize if this has been addressed previously, but I couldn't find it mentioned in the closed or open issues.

BiocManager::install("cmapR")

returns

ERROR: dependency ‘prada’ is not available for package ‘cmapR’

Same outcome with devtools. Tried a direct install of Prada as well.

Any suggestions?

Thanks!

Marc

parse_gctx Error: segfault from C stack overflow

Hi, I need help using the parsing function of cmapR.

I was testing the cmapR library by following the tutorial,

and I have problems with the function parse_gctx when I try to parse the small 77 KB "modzs_n25x50.gctx" file provided with the cmapR library.

ds_path <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR")
my_ds <- parse_gctx(ds_path)
reading /home/usr/R/x86_64-pc-linux-gnu-library/4.0/cmapR/extdata/modzs_n25x50.gctx
Error: segfault from C stack overflow

I get the same error if I try to parse a subset of the file as described in the tutorial:
my_ds_10_columns <- parse_gctx(ds_path, cid=1:10)

I checked my resource limits, and most of them are set to infinity:

> library(unix)
> rlimit_all() 
$cur
      as     core      cpu     data    fsize  memlock   nofile    nproc    stack 
     Inf        0      Inf      Inf      Inf 67108864     8192    63355  8388608 

$max
      as     core      cpu     data    fsize  memlock   nofile    nproc    stack 
     Inf      Inf      Inf      Inf      Inf 67108864  1048576    63355      Inf 

My operating system is Ubuntu 20.04.
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

Thank you

how to retrieve the connectivity score (tau) for any given combination?

I know that for a given L1000 perturbagen, I can get a list of all other perturbagens sorted by CMap connectivity score (tau) in clue.io.

However, that is interactive, and far too tedious for retrieving connectivity scores for more than a few queries. How can I programmatically (i.e., in batch) get the connectivity score list for a perturbagen of interest?

I want to generate a matrix of all possible connectivity scores between shRNA knock-downs. This would be a matrix with 3,799 x 3,799 entries, with tau for every combination.

This is not feasible with the interactive interface. I am trying to program this, starting from the level 5 files with MODZ scores. Do I have to reproduce the method from the original paper from scratch? Or can the API or cmapR help me with this?

I contacted the L1000 team via the web form and received no answer, so I am now reaching out here on GitHub.

Difference between the example object and the one parsed from file

Hi, thank you for the package!

The structure of the example object is different from that of an object parsed from a file.

In the second one, I can't retrieve the column metadata correctly, and all of the metadata is in the format "REP.A001_A375_24H:A03".

What am I doing wrong?

library(cmapR)
Example_dsPath <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR")
Example_ds <- parse_gctx(Example_dsPath)
reading ......Documents/R/win-library/4.1/cmapR/extdata/modzs_n25x50.gctx
done

GSE70138_Level5_Path <- "./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx"
GSE70138_Level5_ds <- parse_gctx(GSE70138_Level5_Path)
reading ./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx
done

Example_ds
Formal class 'GCT' [package "cmapR"] with 7 slots
..@ mat : num [1:50, 1:25] -1.145 -1.165 0.437 0.139 -0.673 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
.. .. ..$ : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
..@ rid : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
..@ cid : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
..@ rdesc :'data.frame': 50 obs. of 6 variables:
.. ..$ id : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
.. ..$ is_bing : int [1:50] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ is_lm : int [1:50] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ pr_gene_id : int [1:50] 5720 466 6009 2309 387 3553 427 5898 23365 6657 ...
.. ..$ pr_gene_symbol: chr [1:50] "PSME1" "ATF1" "RHEB" "FOXO3" ...
.. ..$ pr_gene_title : chr [1:50] "proteasome (prosome, macropain) activator subunit 1 (PA28 alpha)" "activating transcription factor 1" "Ras homolog enriched in brain" "forkhead box O3" ...
..@ cdesc :'data.frame': 25 obs. of 16 variables:
.. ..$ brew_prefix : chr [1:25] "CPC004_PC3_24H" "BRAF001_HEK293T_24H" "CPC006_HT29_24H" "CVD001_HEPG2_24H" ...
.. ..$ cell_id : chr [1:25] "PC3" "HEK293T" "HT29" "HEPG2" ...
.. ..$ distil_cc_q75 : num [1:25] 0.05 0.1 0.17 0.45 0.24 ...
.. ..$ distil_nsample : int [1:25] 5 9 4 3 4 5 2 3 2 2 ...
.. ..$ distil_ss : num [1:25] 2.9 1.88 2.71 4.06 3.83 ...
.. ..$ id : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
.. ..$ is_gold : int [1:25] 0 0 0 1 1 0 1 0 0 0 ...
.. ..$ ngenes_modulated_dn_lm: int [1:25] 11 3 8 38 36 23 12 11 33 13 ...
.. ..$ ngenes_modulated_up_lm: int [1:25] 10 7 25 40 16 17 23 14 37 22 ...
.. ..$ pct_self_rank_q25 : num [1:25] 26.904 17.125 7.06 0.229 4.686 ...
.. ..$ pert_id : chr [1:25] "BRD-A51714012" "BRD-U73308409" "BRD-U88459701" "BRD-U88459701" ...
.. ..$ pert_idose : chr [1:25] "10 µM" "500 nM" "10 µM" "10 µM" ...
.. ..$ pert_iname : chr [1:25] "venlafaxine" "vemurafenib" "atorvastatin" "atorvastatin" ...
.. ..$ pert_itime : chr [1:25] "24 h" "24 h" "24 h" "24 h" ...
.. ..$ pert_type : chr [1:25] "trt_cp" "trt_cp" "trt_cp" "trt_cp" ...
.. ..$ pool_id : chr [1:25] "epsilon" "epsilon" "epsilon" "epsilon" ...
..@ version: chr(0)
..@ src : chr "....../Documents/R/win-library/4.1/cmapR/extdata/modzs_n25x50.gctx"
GSE70138_Level5_ds
Formal class 'GCT' [package "cmapR"] with 7 slots
..@ mat : num [1:12328, 1:118050] 4.2641 0.0572 -1.0125 0.3089 -0.1041 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:12328] "780" "7849" "2978" "2049" ...
.. .. ..$ : chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ rid : chr [1:12328] "780" "7849" "2978" "2049" ...
..@ cid : chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ rdesc :'data.frame': 12328 obs. of 1 variable:
.. ..$ id: chr [1:12328] "780" "7849" "2978" "2049" ...
..@ cdesc :'data.frame': 118050 obs. of 1 variable:
.. ..$ id: chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ version: chr(0)
..@ src : chr "./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx"
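In the second object, cdesc contains only an id column, which suggests that the metadata has to come from a separate annotation table, such as the sig_info file used in the tutorial quoted elsewhere on this page. The base R sketch below shows the join on plain data frames. The sig_info rows and pert_iname values are made up, and the key column name `sig_id` is taken from the annotate.gct call shown in another issue.

```r
# Column ids as they appear in the parsed object's cdesc
cdesc <- data.frame(
  id = c("REP.A001_A375_24H:A03", "REP.A001_A375_24H:A04"),
  stringsAsFactors = FALSE
)

# Hypothetical annotation table standing in for the sig_info file
sig_info <- data.frame(
  sig_id = c("REP.A001_A375_24H:A04", "REP.A001_A375_24H:A03", "OTHER:X01"),
  pert_iname = c("cmpd_b", "cmpd_a", "cmpd_c"),
  stringsAsFactors = FALSE
)

# match() keeps the row order of cdesc; unmatched ids would become NA
hit <- match(cdesc$id, sig_info$sig_id)
extra <- sig_info[hit, setdiff(names(sig_info), "sig_id"), drop = FALSE]
annotated <- cbind(cdesc, extra)
rownames(annotated) <- NULL
print(annotated)
```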

sessionInfo()
R version 4.1.0 Patched (2021-05-29 r80415)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] cmapR_1.5.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 XVector_0.33.0
[3] GenomicRanges_1.45.0 BiocGenerics_0.39.0
[5] zlibbioc_1.39.0 IRanges_2.27.0
[7] flowCore_2.4.0 lattice_0.20-44
[9] GenomeInfoDb_1.29.0 tools_4.1.0
[11] SummarizedExperiment_1.23.0 parallel_4.1.0
[13] grid_4.1.0 rhdf5_2.36.0
[15] Biobase_2.53.0 matrixStats_0.59.0
[17] RcppParallel_5.1.4 Matrix_1.3-4
[19] GenomeInfoDbData_1.2.6 Rhdf5lib_1.14.0
[21] cytolib_2.4.0 RProtoBufLib_2.4.0
[23] rhdf5filters_1.4.0 S4Vectors_0.31.0
[25] bitops_1.0-7 RCurl_1.98-1.3
[27] DelayedArray_0.19.0 compiler_4.1.0
[29] MatrixGenerics_1.5.0 stats4_4.1.0

Installation fails

Hi,
Trying to install the package on R-3.6.3
Always the same error:
Error: Failed to install 'cmapR' from GitHub: (converted from warning) installation of package ..... had non-zero exit status
How should I solve this problem?

merge.gct error

Even when matrix_only = T, merge.gct adds the metadata of one GCT object to the output. This can be worked around by setting cdesc to a data frame containing only the ids, as follows, but it shouldn't be occurring in the first place:

overlap_tsne_modz <- merge.gct(
  merge.gct(xpr_b_for_tsne, xpr_lm_for_tsne, dimension = "column", matrix_only = T),
  xpr_c_for_tsne, dimension = "column", matrix_only = T)
overlap_tsne_modz@cdesc <- data.frame(id = overlap_tsne_modz@cid)
overlap_tsne_modz <- annotate.gct(overlap_tsne_modz, overlap_sigs, dimension = "column", keyfield = "sig_id")

Connectivity Score implementation

Hi Ted,

Not really an issue per se, but I was wondering if you have any plans to extend the package to also compute the Connectivity Score as implemented by Subramanian et al., 2017?

Alternatively, are you aware of any other R package or code snippets that implement it?

Thank you,
Enrico

Some questions about function parse_gctx

Hi Ted,

I ran into an error when reading GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx.
Have you seen this kind of error before, or do you have any advice about it?
Below is my code:

source("/tools/R_pakages/master/cmapR/R/GCT.R")
in method for "coerce" with signature "GCT","SummarizedExperiment": no definition for class
“SummarizedExperiment”
source("/tools/R_pakages/master/cmapR/R/io.R")
source("/tools/R_pakages/master/cmapR/R/utils.R")
source("/tools/R_pakages/master/cmapR/R/data.R")
coef_set <-parse_gctx("/GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx")
reading /GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx
Error: Unable to read dataset.
Not all required filters available.
Missing filters: deflate

Thank you,
Chen
