cmap / cmapR
Tools for manipulating annotated data matrices
License: BSD 3-Clause "New" or "Revised" License
According to the manual (https://github.com/cmap/cmapR/blob/master/cmapR_tutorial.ipynb), the following should create a GCT object:
mat <- matrix(stats::rnorm(100), ncol=10)
rownames(mat) <- letters[1:10]
colnames(mat) <- LETTERS[1:10]
(my_ds <- new("GCT", mat=mat))
new("GCT", ...) requires at minimum a matrix as input, but the current version fails with the error:
Error in grepl(".gct$", src) : argument "src" is missing, with no default
R version 3.4.1
cmapR version 1.0
Hello, the function subset.gct is not available in the most recent version of cmapR. Could you please suggest an alternative?
Hi,
I tried your example code to get column metadata from the sample .gctx file, and it gives me the following error:
Error in withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) :
invalid multibyte string at '<84>'
I am not sure what is happening.
I am trying to plot some genes using level 4 data for my compound (BRD-K55591206) in HepG2 cells.
There are two signatures with HepG2 cells at level 5:
LJP008_HEPG2_24H:J01
POL001_HEPG2_24H:J09
To make sure I was using the same data as these level 5 signatures, I checked the level 4 replicates of each signature above. The average of the two LJP008 experiments (distil_ids: LJP008_HEPG2_24H_X2_B20:J01|LJP008_HEPG2_24H_X3_B20:J01) matches the level 5 signature value for each gene. Perfect.
However, the level 4 data for signature POL001_HEPG2 (distil_ids: POL001_HEPG2_24H_X1.L2_B23:J07|POL001_HEPG2_24H_X2.L2_B23:J07|POL001_HEPG2_24H_X3.L2_B23:J07) does not match level 5.
Taking the NAT2 gene as an example, the level 5 value is 0.004413.
On the other hand, the values for the level 4 replicates are:
POL001_HEPG2_24H_X1.L2_B23:J07 = -0.386299998
POL001_HEPG2_24H_X2.L2_B23:J07 = 0.110600002
POL001_HEPG2_24H_X3.L2_B23:J07 = 0.38409999
Their average, 0.036133, does not match the level 5 value of 0.004413.
The compound is BRD-K55591206, 10 µM, 24 h.
Why don’t they match?
I am using cmapR to retrieve the data from these files:
https://clue.io/releases/data-dashboard
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/level5/level5_beta_trt_cp_n720216x12328.gctx
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/level4/level4_beta_all_n3026460x12328.gctx
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/siginfo_beta.txt
https://s3.amazonaws.com/macchiato.clue.io/builds/LINCS2020/instinfo_beta.txt
Thank you,
Alex
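A quick arithmetic check of the NAT2 numbers above. As far as we understand the LINCS documentation, level 5 signatures are moderated z-scores (MODZ). A MODZ is a weighted average of the level 4 replicates in which each replicate is weighted by its Spearman correlation with the other replicates across all genes. A plain mean of one gene's replicates is therefore not expected to reproduce the level 5 value, and the weights cannot be recovered from NAT2 alone. This would also explain why the two-replicate LJP008 signature matches a plain mean: with two replicates, both receive the same weight (their mutual correlation), so the weighted average reduces to the mean. This is a hedged reading, not an official answer.

```r
# NAT2 level 4 values for the three POL001_HEPG2_24H replicates quoted above
nat2_l4 <- c(X1 = -0.386299998, X2 = 0.110600002, X3 = 0.38409999)

# Plain (unweighted) mean of the replicates
nat2_mean <- mean(nat2_l4)
print(round(nat2_mean, 6))

# A MODZ-style value is sum(w * x) with correlation-derived weights w summing
# to 1. The real w needs all genes; equal weights are used here only to show
# that w = 1/3 for every replicate gives back the plain mean.
w <- rep(1 / 3, 3)
weighted_value <- sum(w * nat2_l4)
```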
I'm using the GSE92743_Broad_Affymetrix_training_Level3_Q2NORM_n100000x12320.gctx file. I can read the file successfully using RStudio on Linux. The problem occurs when I run a script that calls the parse.gctx function with the Rscript command in a Linux terminal. The error is shown below.
$ Rscript thecode.R
Error in initialize(value, ...) :
invalid names for slots of class “GCT”: set_annot_rownames, matrix_only
Calls: source ... -> -> initialize -> initialize
Execution halted
I call parse.gctx in the script as follows:
parse.gctx(fname = ds_file, rid = dataidx$testidx, matrix_only = T)
It runs without problems in RStudio, where I have used it several times, but in the Linux terminal I get that error. What is the reason?
In the tutorial, section "Parsing a subset of a GCTX file":
sig_ids <- col_meta$sig_id[idx]
This will not work because, two lines earlier, the metadata file was loaded with row.names=1, so there is no sig_id column anymore! Therefore sig_ids will be NULL (empty).
One solution would be to replace
sig_ids <- col_meta$sig_id[idx]
with
sig_ids <- row.names(col_meta[idx,])
so that sig_ids is now a character vector of row names.
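A small reproduction with toy data. The sig ids and cell lines are hypothetical stand-ins for the real sig_info file. Once the file is read with row.names = 1, sig_id becomes the row names. Indexing the row names directly, rownames(col_meta)[idx], is a slightly more robust variant of the fix above: col_meta[idx, ] silently drops to a plain vector (which has no row names) when only one column remains.

```r
# Toy stand-in for the signature metadata; the first column (sig_id) becomes
# the row names because the table is read with row.names = 1
col_meta <- read.delim(
  text = "sig_id\tpert_iname\tcell_id\nSIG_A\tvorinostat\tA375\nSIG_B\tatorvastatin\tHT29\nSIG_C\tvorinostat\tPC3",
  row.names = 1, stringsAsFactors = FALSE
)
idx <- which(col_meta$pert_iname == "vorinostat")

col_meta$sig_id[idx]               # NULL: sig_id is no longer a column
sig_ids <- rownames(col_meta)[idx] # take the ids from the row names instead
sig_ids
```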
Hi CMap team,
I am trying to access results from a CMap batch query. I downloaded them and read them into R using cmapR::parse.gctx( my_gct_path ), but parse.gctx errs, saying invalid class “GCT” object: cid must be unique. I don't want to upload the whole file, but I find I can reproduce the error with just the head:
Is the error because of all the ... in the metadata? Is this file (the start of) a valid GCT file?
Thanks for your help! I'm really excited to try using CMap on my data.
Hi,
I used to use the link below, but now I cannot access it:
https://github.com/cmap/cmapR/blob/master/cmapR_tutorial.ipynb
Could you help?
It described the GSE70138 data in more detail.
I am getting an error in parse.gctx() which I can't figure out.
I run parse.gctx("../fpkm_limit_baseline_cd138.gct")
and get this error:
parsing as GCT v1.2
../fpkm_limit_baseline_cd138.gct 55861 rows, 770 cols, 0 row descriptors, 0 col descriptors
Error in dimnames(mat) = list(rid, cid) :
length of 'dimnames' [2] not equal to array extent
In addition: Warning messages:
1: In matrix(mat, nrow = nrmat, ncol = ncmat + nrhd + col_offset, byrow = TRUE) :
data length [43068831] is not a sub-multiple or multiple of the number of columns [772]
2: In matrix(as.numeric(mat[, (1 + col_offset):ncol(mat)]), nrow = nrmat, :
NAs introduced by coercion
Any ideas how to get past this? My file is a GCT file, not a GCTX file; its extension is .gct. Here is a snapshot of the first few rows/columns:
I am using cmapR version 1.0.1, and here is my session info:
$platform
[1] "x86_64-apple-darwin15.6.0"
$arch
[1] "x86_64"
$os
[1] "darwin15.6.0"
$system
[1] "x86_64, darwin15.6.0"
$status
[1] ""
$major
[1] "3"
$minor
[1] "5.1"
$year
[1] "2018"
$month
[1] "07"
$day
[1] "02"
$`svn rev`
[1] "74947"
$language
[1] "R"
$version.string
[1] "R version 3.5.1 (2018-07-02)"
$nickname
[1] "Feather Spray"
Contrary to prior behaviour, where cmapR automatically imported rhdf5, the latest version fails to do so, and executing parse.gctx returns an error:
Error in h5closeAll() : could not find function "h5closeAll"
Newcomers may not realize that this is an rhdf5 function, which isn't great for first-time users.
This method should be able to append data to an existing GCTX file on disk.
Hi, I also got this installation error; I used to be able to install and use the package without problems. Thank you!
Error: Failed to install 'cmapR' from GitHub:
(converted from warning) unable to access index for repository https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6:
cannot open URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.6/PACKAGES'
Line 439 in b804d22
I'm using write_gct() to export a GCT object with 8 tracks of column metadata. When I open the exported file, only one metadata track is displayed (in addition to the column names), even though the dimensions in the file indicate 8.
Inspecting the GCT object itself shows that all 8 tracks are there in cdesc. Any idea why this might be happening? Thanks!
The documentation here doesn't include a link to a file or file repository.
If rows/columns don't align, merging should fill with NA instead of reducing to the row/column space of the first GCT object.
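A base-R sketch of the requested NA-filling behaviour for a column-wise merge. The output covers the union of row ids, and any cell absent from an input stays NA. The helper name cbind_fill_na and the assumption that the two inputs have distinct column ids are ours; this is not cmapR's merge implementation, and it handles only the matrices, not rdesc/cdesc.

```r
# Hypothetical helper (not cmapR's merge): column-bind two matrices over the
# union of their row ids; cells missing from either input are left as NA.
cbind_fill_na <- function(a, b) {
  stopifnot(!any(colnames(a) %in% colnames(b)))   # assume distinct column ids
  all_rows <- union(rownames(a), rownames(b))
  out <- matrix(NA_real_, nrow = length(all_rows), ncol = ncol(a) + ncol(b),
                dimnames = list(all_rows, c(colnames(a), colnames(b))))
  out[rownames(a), colnames(a)] <- a
  out[rownames(b), colnames(b)] <- b
  out
}

a <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
b <- matrix(5:6, nrow = 2, dimnames = list(c("r2", "r3"), "c3"))
merged <- cbind_fill_na(a, b)
merged
```

Row r3 exists only in b, so its c1/c2 cells are NA rather than the row being dropped.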
Hello,
I'm trying to install cmapR on R 3.6.2 and I get this error:
devtools::install_github("cmap/cmapR")
Downloading GitHub repo cmap/cmapR@master
✓ checking for file ‘/tmp/RtmpUSPwSG/remotes10de422e46bd/cmap-cmapR-f4d3879/DESCRIPTION’ ...
─ preparing ‘cmapR’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ looking to see if a ‘data/datalist’ file should be added
─ building ‘cmapR_0.99.18.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'
Warning: invalid gid value replaced by that for user 'nobody'
Installing package into ‘/home/ferreen2/R/x86_64-pc-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
ERROR: this R is version 3.6.2, package 'cmapR' requires R >= 4.0
Error: Failed to install 'cmapR' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpUSPwSG/file10de472bd25dd/cmapR_0.99.18.tar.gz’ had non-zero exit status
As of today, R 4.0 has not even been released yet.
Could you please let me know how I can install the package on R 3.6.2?
Thank you,
Enrico
I need to parse GCTX files in my own Bioconductor package, but Bioconductor guidelines only allow dependencies on packages available on CRAN and Bioconductor.
As a workaround, I directly include your code to parse those GCTX files, while crediting your GitHub repository for that part of the code in the documentation of the respective functions. However, I would much prefer if you could publish your package so I could simply use the functions I need.
Do you have any plans on distributing cmapR through either CRAN or Bioconductor? Thank you!
It may be even easier to install using devtools:
install.packages("devtools") ## if needed
devtools::install_github("cmap/cmapR")
For example, when updating cid, this method should also update the matrix column names and the id field in cdesc.
parse_gctx cannot read my GCT file with the command below. My cmapR version is cmapR_1.9.0.
parse_gctx("BLM_Aged_n83x18577.gct")
parsing as GCT v1.2
BLM_Aged_n83x18577.gct NA rows, NA cols, 0 row descriptors, 0 col descriptors
Error in matrix(m, nrow = nrmat, ncol = ncmat + nrhd + col_offset, byrow = TRUE) :
invalid 'nrow' value (too large or NA)
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] rhdf5_2.38.1 CePa_0.8.0 shiny_1.7.1 gridExtra_2.3 biomaRt_2.50.3
[6] scales_1.2.0 ggpubr_0.4.0.999 ggplot2_3.3.5 tidyr_1.2.0 dplyr_1.0.9
[11] GEOquery_2.62.2 cmapR_1.9.0 GSEABase_1.56.0 graph_1.72.0 annotate_1.72.0
[16] XML_3.99-0.8 AnnotationDbi_1.56.2 IRanges_2.28.0 S4Vectors_0.32.4 Biobase_2.54.0
[21] BiocGenerics_0.40.0 limma_3.50.0
loaded via a namespace (and not attached):
[1] colorspace_2.0-2 ggsignif_0.6.3 ellipsis_0.3.2 cytolib_2.6.2
[5] XVector_0.34.0 GenomicRanges_1.46.1 rstudioapi_0.13 ggrepel_0.9.1
[9] bit64_4.0.5 fansi_1.0.3 xml2_1.3.3 cachem_1.0.6
[13] jsonlite_1.8.0 broom_0.7.12 dbplyr_2.2.0 png_0.1-7
[17] BiocManager_1.30.18 readr_2.1.2 compiler_4.1.2 httr_1.4.3
[21] backports_1.4.1 assertthat_0.2.1 Matrix_1.3-4 fastmap_1.1.0
[25] cli_3.3.0 later_1.3.0 htmltools_0.5.2 prettyunits_1.1.1
[29] tools_4.1.2 igraph_1.2.11 gtable_0.3.0 glue_1.6.2
[33] GenomeInfoDbData_1.2.7 rappdirs_0.3.3 Rcpp_1.0.8.3 carData_3.0-5
[37] jquerylib_0.1.4 vctrs_0.4.1 Biostrings_2.62.0 rhdf5filters_1.6.0
[41] stringr_1.4.0 mime_0.12 lifecycle_1.0.1 rstatix_0.7.0
[45] zlibbioc_1.40.0 RProtoBufLib_2.6.0 hms_1.1.1 promises_1.2.0.1
[49] MatrixGenerics_1.6.0 parallel_4.1.2 SummarizedExperiment_1.24.0 curl_4.3.2
[53] memoise_2.0.1 sass_0.4.1 stringi_1.7.6 RSQLite_2.2.14
[57] flowCore_2.6.0 filelock_1.0.2 GenomeInfoDb_1.30.1 rlang_1.0.2
[61] pkgconfig_2.0.3 matrixStats_0.62.0 bitops_1.0-7 lattice_0.20-45
[65] purrr_0.3.4 Rhdf5lib_1.16.0 bit_4.0.4 tidyselect_1.1.2
[69] magrittr_2.0.3 R6_2.5.1 generics_0.1.2 DelayedArray_0.20.0
[73] DBI_1.1.2 pillar_1.7.0 withr_2.5.0 KEGGREST_1.34.0
[77] abind_1.4-5 RCurl_1.98-1.6 tibble_3.1.7 crayon_1.5.1
[81] car_3.0-12 utf8_1.2.2 BiocFileCache_2.2.1 tzdb_0.2.0
[85] progress_1.2.2 grid_4.1.2 data.table_1.14.2 blob_1.2.3
[89] Rgraphviz_2.38.0 digest_0.6.29 xtable_1.8-4 httpuv_1.6.5
[93] RcppParallel_5.1.5 munsell_0.5.0 bslib_0.3.1
Example:
build <- parse.gctx('/cmap/projects/PROS/builds/pros_a_pr500/PROS_A_PR500_LEVEL5_MODZ.ZSPC.COMBAT_n2308x489.gct')
ds_melted <- melt.gct(build)
This fails because rdesc and cdesc are empty, so merging the melted matrix with them produces an empty data.frame.
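A base-R sketch of a melt that tolerates missing annotations. It only merges rdesc/cdesc when they carry columns beyond id, so an unannotated GCT still yields one row per matrix cell. The helper name melt_matrix is hypothetical and this is not cmapR's melt function.

```r
# Hypothetical helper (not cmapR's melt function): matrix -> long data.frame,
# merging row/column annotations only when they carry more than the id column.
melt_matrix <- function(mat, rdesc = NULL, cdesc = NULL) {
  long <- data.frame(
    rid = rep(rownames(mat), times = ncol(mat)),
    cid = rep(colnames(mat), each = nrow(mat)),
    value = as.vector(mat),   # column-major order matches rid/cid above
    stringsAsFactors = FALSE
  )
  if (!is.null(rdesc) && ncol(rdesc) > 1) {
    long <- merge(long, rdesc, by.x = "rid", by.y = "id", all.x = TRUE)
  }
  if (!is.null(cdesc) && ncol(cdesc) > 1) {
    long <- merge(long, cdesc, by.x = "cid", by.y = "id", all.x = TRUE)
  }
  long
}

m <- matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
long <- melt_matrix(m, rdesc = data.frame(id = c("g1", "g2")),
                    cdesc = data.frame(id = c("s1", "s2", "s3")))
nrow(long)
```

The left joins (all.x = TRUE) keep every cell even when an annotation table lacks some ids.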
I run into a memory-exhausted error when trying to parse a level 5 GCTX file:
Error: vector memory exhausted (limit reached?)
Error: Error in h5checktype(). H5Identifier not valid.
And my session info is
> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5
I have found some answers to this error. Some suggest using 64-bit R, but I am already using it. Others say that raising the memory limit may help, but I have no idea how to do that. Does anyone know how to solve this problem? Thanks!
Hi,
I want to read the whole level 3 matrix of the LINCS dataset, but it's too large to fit in my RAM. Is there any trick (e.g., Spark) to handle such large data?
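One common workaround is to process the file in column chunks rather than loading it whole, relying on parse_gctx's cid argument (the tutorial subsets with cid = 1:10). The helper col_chunks is hypothetical. The commented loop is an untested usage sketch: the path is a placeholder, and read_gctx_ids is the id reader in recent cmapR releases as far as we know, so verify it against your installed version.

```r
# Hypothetical helper: split column indices 1..n into consecutive chunks
col_chunks <- function(n, chunk_size) {
  stopifnot(n >= 1, chunk_size >= 1)
  idx <- seq_len(n)
  split(idx, ceiling(idx / chunk_size))
}

chunks <- col_chunks(n = 10, chunk_size = 4)
lengths(chunks)

# Usage sketch (not run; path and per-chunk work are placeholders):
# library(cmapR)
# n_cols <- length(read_gctx_ids("level3.gctx", dim = "col"))  # check your version
# for (cols in col_chunks(n_cols, 10000)) {
#   ds <- parse_gctx("level3.gctx", cid = cols)
#   # ... reduce ds@mat to what you need, then drop it before the next chunk
# }
```

Only one chunk's matrix is held in memory at a time, so peak usage scales with chunk_size rather than with the full column count.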
Hi Ted,
I came here from https://clue.io/contest, which mentions that an implementation of the Inference Challenge is in cmapR.
However, I could not find the corresponding function in cmapR. Is it really implemented in cmapR, or do you know of any other R packages that implement it?
Thank you,
Chen
I identified an error in io.R that was preventing me from writing out a v1.2 GCT file:
Error in sprintf("#1.%d\n%d\t%", ver, nr, nc) : unrecognised format specification '%'
The code
sprintf("#1.%d\n%d\t%", ver, nr, nc)
is missing the d after the last %; it should be:
sprintf("#1.%d\n%d\t%d", ver, nr, nc)
It works for me now, I hope this helps!
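A standalone check of the fix. The values of ver, nr and nc are illustrative stand-ins for the variables in io.R. The broken string fails because the trailing % has no conversion character. The corrected one yields the two-line GCT v1.2 header: the version line, then the row and column counts separated by a tab.

```r
ver <- 2L; nr <- 10L; nc <- 5L   # illustrative values for the io.R variables

# Original format string: the trailing "%" has no conversion character
broken <- tryCatch(sprintf("#1.%d\n%d\t%", ver, nr, nc),
                   error = function(e) conditionMessage(e))

# Corrected format string: "#1.2", newline, then rows<TAB>cols
header <- sprintf("#1.%d\n%d\t%d", ver, nr, nc)
cat(header, "\n")
```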
Right now, the GCT initialize method doesn't do any checking to ensure this if the user passes a predefined rdesc or cdesc object.
Hi,
I'm not sure if I should ask this question here, but is there any way to calculate MODZ for a given expression matrix (e.g., from microarray or RNA-seq) if the expression values are already robust z-scores? I've been computing pairwise Spearman correlations with cor(), but I'm not sure whether there is R code similar to the MATLAB implementation.
I want to do this because I have some expression signatures that I want to correlate with L1000 profiles in the CMap database (i.e., to do PCA), and I guess my expression data should be converted to MODZ before the comparison.
Thank you very much in advance.
Best
Gema
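A sketch of correlation-weighted replicate averaging in the spirit of MODZ as described in Subramanian et al. 2017 (the paper cited elsewhere on this page). Each replicate is weighted by its mean Spearman correlation with the other replicates, weights below a floor are raised to that floor, and the weights are normalized to sum to 1. The helper name modz, the floor min_wt = 0.01, and the use of the mean (rather than the sum) of correlations are assumptions, not cmapR's or CMap's code. The input is a genes x replicates matrix, such as replicate-level robust z-scores.

```r
# Hypothetical helper (not a cmapR function): collapse replicate columns of a
# genes x replicates matrix into one signature by correlation-weighted averaging.
modz <- function(mat, min_wt = 0.01) {
  stopifnot(is.matrix(mat), ncol(mat) >= 1)
  if (ncol(mat) == 1) return(mat[, 1])
  # Pairwise Spearman correlation between replicates, computed across all genes
  cc <- stats::cor(mat, method = "spearman")
  diag(cc) <- NA                      # ignore each replicate's self-correlation
  # Raw weight: mean correlation with the other replicates, floored at min_wt
  raw_wt <- pmax(rowMeans(cc, na.rm = TRUE), min_wt)
  wt <- raw_wt / sum(raw_wt)          # normalize so the weights sum to 1
  drop(mat %*% wt)                    # weighted average for each gene
}

# Toy usage: three agreeing replicates and one anti-correlated replicate
reps <- cbind(r1 = 1:5, r2 = 1:5, r3 = 1:5, r4 = 5:1)
rownames(reps) <- paste0("gene", 1:5)
sig <- modz(reps)
round(sig, 3)
```

In the toy usage, the anti-correlated fourth replicate is floored to the minimum weight, so the signature stays close to the three agreeing replicates instead of being pulled toward their plain mean.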
This is a heads-up concerning runnability with R-devel.
Following the tutorial code:
> col_meta_path <- "GSE70138_Broad_LINCS_sig_info_2017-03-06.txt.gz"
> col_meta <- read.delim(col_meta_path, sep="\t", stringsAsFactors=F)
> # figure out which signatures correspond to vorinostat by searching the 'pert_iname' column
> idx <- which(col_meta$pert_iname=="vorinostat")
> # and get the corresponding sig_ids
> sig_ids <- col_meta$sig_id[idx]
> # read only those columns from the GCTX file by using the 'cid' parameter
>
>
> ds_path <- "GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-0 ..." ... [TRUNCATED]
> vorinostat_ds <- parse.gctx(ds_path, cid=sig_ids)
reading GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328_2017-03-06.gctx
done
All is well. Then:
> row_meta_from_gctx <- read.gctx.meta(ds_path, dim="row")
HDF5-DIAG: Error detected in HDF5 (1.8.19) thread 0:
#000: H5O.c line 249 in H5Oopen(): unable to open object
major: Symbol table
minor: Can't open object
#001: H5O.c line 1366 in H5O_open_name(): unable to open object
major: Symbol table
minor: Can't open object
#002: H5O.c line 1407 in H5O_open_by_loc(): unable to open object
major: Object header
minor: Can't open object
#003: H5Goh.c line 226 in H5O_group_open(): unable to register group
major: Object atom
minor: Unable to register new atom
...
An attempt to continue the session by repeating the previously successful read fails with similar errors.
This error does not occur with R 3.4, so it could be an issue with changes to the rhdf5 infrastructure. I'll pass this information on to the rhdf5 developer.
If you have fixes against R-devel please let me know.
The sessionInfo was
R Under development (unstable) (2017-11-08 r73690)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)
Matrix products: default
BLAS/LAPACK: /app/intelMKL-2017.0.098_i86-rhel6.0/intelMKL/compilers_and_libraries_2017.0.098/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cmapR_1.0 data.table_1.10.4-3 rhdf5_2.23.5
[4] rmarkdown_1.8
loaded via a namespace (and not attached):
[1] compiler_3.5.0 backports_1.1.2 magrittr_1.5 rprojroot_1.3-1
[5] tools_3.5.0 htmltools_0.3.6 Rcpp_0.12.14 stringi_1.1.6
[9] knitr_1.18 stringr_1.2.0 digest_0.6.13 evaluate_0.10.1
[13] Rhdf5lib_1.1.5
I recently spent 8+ hours trying to get your package functions to work. At first, the only function that worked was parse.gctx. After reinstalling your package and the other required packages, now parse.gctx doesn't work either.
library(cmapR)
library(rhdf5)
gct_file <- system.file("/Users/jamesordaya/Downloads/GinnysBioinformaticsFolder/GSE70138_Broad_LINCS_Level3_INF_mlr12k_n78980x22268_2015-06-30.gct", package="cmapR")
(ds <- parse.gctx(gct_file, matrix_only=TRUE))
print(my_ds) is my code. I also tried all the other functions, which give a similar error message:
"Error in parse.gctx(gct_file, matrix_only = FALSE) :
could not find function "parse.gctx""
I don't know what to do. I must be honest: I am still pretty new to R development packages from GitHub, so I may need a bit more direction than others. As I said, I was able to get the parse function to work before the reinstall, and now it has the same errors all the other functions had before (and still have).
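One thing worth checking in the code above is the gct_file line. system.file() only looks for files inside an installed package; this is how the tutorial locates the bundled extdata/modzs_n25x50.gctx. For any path outside the package it returns an empty string, so the parser receives "". The snippet demonstrates this with the always-installed stats package, using a placeholder path. Separately, the "could not find function" error is consistent with newer cmapR releases naming the parser parse_gctx, as in the cmapR 1.9.0 report on this page. That is an inference, so list the exported names to confirm.

```r
# system.file() resolves paths *inside* an installed package; an absolute
# path that is not part of the package yields "" rather than the file
p <- system.file("/Users/someone/Downloads/my_data.gct", package = "stats")
identical(p, "")

# For your own download, pass the path directly (placeholder path shown):
# library(cmapR)
# ls("package:cmapR")            # check whether your version exports parse_gctx
# gct_file <- "/path/to/your/file.gct"
# ds <- parse_gctx(gct_file, matrix_only = TRUE)
```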
Hi. I had a few questions about the Connectivity Map, so I contacted the Broad Institute several times through the Contact Us section of clue.io, but I couldn't get any answer.
So I'm opening an issue here. My questions are below.
There are only 8,969 perturbagens in Touchstone version 1.0, even though there are 51,219 perturbagens in the LINCS phase I (GSE92742) data. I think this is because those perturbagens are well annotated with MOAs and protein targets, so they're included in the Touchstone reference data. Is that right?
How did you collapse the replicates of a perturbagen into a signature if it was tested at several time points and several doses? To understand this, I've read your paper (Subramanian A, Narayan R, Corsello SM, et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell. 2017;171(6):1437-1452.e17. doi:10.1016/j.cell.2017.10.049) and the Connectopedia section of clue.io, but the only information I found describes how replicates tested under the same condition (dose and treatment duration) are collapsed into a signature.
Analysis does not work in the latest version of Touchstone, while version 1.0 of Touchstone works well with the same query. The same error message keeps popping up, saying the problem may be due to memory.
Due to the time difference, I cannot participate in the Zoom meetings held during your working hours. I hope you understand why I'm raising these questions in an issue.
Any information you could give me that sheds light on these important issues would be greatly appreciated.
Hello, I am getting the following warning when I load cmapR, either directly with library(cmapR) or by importing cmapR as a dependency for an R package:
Warning messages:
1: multiple methods tables found for ‘aperm’
2: replacing previous import ‘BiocGenerics::aperm’ by ‘DelayedArray::aperm’ when loading ‘SummarizedExperiment’
This is preventing me from getting a passing R-CMD-check for an R package I am developing with cmapR as a dependency.
sessionInfo:
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.0.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cmapR_1.8.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45 matrixStats_0.63.0 IRanges_2.30.1 RProtoBufLib_2.8.0
[6] bitops_1.0-7 grid_4.2.2 GenomeInfoDb_1.32.4 stats4_4.2.2 RcppParallel_5.1.6
[11] zlibbioc_1.42.0 XVector_0.36.0 flowCore_2.8.0 S4Vectors_0.34.0 Matrix_1.5-3
[16] tools_4.2.2 Biobase_2.58.0 RCurl_1.98-1.9 DelayedArray_0.22.0 MatrixGenerics_1.8.1
[21] compiler_4.2.2 BiocGenerics_0.44.0 cytolib_2.8.0 GenomicRanges_1.48.0 SummarizedExperiment_1.26.1
[26] GenomeInfoDbData_1.2.8
Hello.
I'm trying to follow the instructions in the README, and I get stuck at the check step.
This is the error I get:
'library' or 'require' call not declared from: ‘testthat’
* checking tests ...
Running ‘testthat.R’
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
Saving file to foo.gct
Dimensions of matrix: [10x5]
Setting precision to 4
Saved.
Error in parse(text = lines, n = -1, srcfile = srcfile) :
testthat/test_GCT_class.R:3:70: unexpected '{'
2:
3: test_that("GCT throws error if rid/cid not unique character vectors" {
^
Calls: test_check ... FUN -> with_reporter -> force -> source_file -> parse
testthat results ================================================================
OK: 87 SKIPPED: 0 FAILED: 0
Execution halted
* checking PDF version of manual ... OK
* DONE
I guess a comma is missing there, after the test description string. I added it, and the check finishes without errors, but I wanted to make sure that this is the right fix.
Dear developers,
Thanks for developing this software. I wonder whether the output function write.gctx supports incremental saving, so that there is no need to create a GCT object containing the full data set in memory. Thanks.
The below should be fixed:
# TODO: fill in once package has been accepted on bioconductor
on this page:
https://www.bioconductor.org/packages/release/bioc/vignettes/cmapR/inst/doc/tutorial.html
Hi there,
Thanks for sharing the code; it works perfectly for me for manipulating the level 5 data. When I tried to use the level 3 or level 4 data, however, I couldn't find a good source for the annotation files. Where can I find those annotation files so that I can use the other levels of data?
Thanks in advance!
Best,
Jerome
This link is dead for me https://clue.io/cmapR/index.html
I apologize if this has been addressed previously, but I couldn't find it mentioned in the closed or open issues.
BiocManager::install("cmapR")
returns
ERROR: dependency ‘prada’ is not available for package ‘cmapR’
Same outcome with devtools. I also tried installing prada directly.
Any suggestions?
Thanks!
Marc
Hi, I need help using the parsing function of cmapR.
I was testing the cmapR library by following the tutorial,
and I have problems with the function parse_gctx when I try to parse the small 77 KB "modzs_n25x50.gctx" file provided with the cmapR library.
ds_path <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR")
my_ds <- parse_gctx(ds_path)
reading /home/usr/R/x86_64-pc-linux-gnu-library/4.0/cmapR/extdata/modzs_n25x50.gctx
Error: segfault from C stack overflow
The same error occurs if I try to parse a subset of the file as described in the tutorial:
my_ds_10_columns <- parse_gctx(ds_path, cid=1:10)
I checked my resource limits, and most of them are set to infinity:
> library(unix)
> rlimit_all()
$cur
as core cpu data fsize memlock nofile nproc stack
Inf 0 Inf Inf Inf 67108864 8192 63355 8388608
$max
as core cpu data fsize memlock nofile nproc stack
Inf Inf Inf Inf Inf 67108864 1048576 63355 Inf
My operating system is Ubuntu 20.04.
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Thank you
I know that for a given L1000 perturbagen, I can get a list of all other perturbagens sorted by CMap connectivity score (tau) in clue.io.
However, that is interactive, and way too tedious for retrieving connectivity scores for more than a few queries. How can I programmatically (i.e., in batch) get the connectivity score list for a perturbagen of interest?
I want to generate a matrix of all possible connectivity scores between any given shRNA knockdowns. This would be a 3,799 x 3,799 matrix with tau for every combination.
This is not feasible with the interactive interface. I am trying to program this, starting from the level 5 files with MODZ scores. Do I have to reproduce the method from the original paper from scratch, or can the API or cmapR help me with this?
I have contacted the L1000 team via the web form and received no answer, so I am now reaching out here on GitHub.
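For the programmatic route, the core statistic behind CMap connectivity, as described in Subramanian et al. 2017, is a weighted Kolmogorov-Smirnov enrichment score (the GSEA running sum). It scores the query's up and down gene sets within each signature's ranked genes, and the two scores are combined into a weighted connectivity score (WTCS). The sketch below covers only that step. Tau additionally normalizes WTCS and compares it against the Touchstone reference distribution, which is not reproduced here. The function names, the weight exponent of 1, and the toy scores are assumptions, and this is not the clue.io implementation.

```r
# Weighted Kolmogorov-Smirnov enrichment score (GSEA-style) of a gene set
# within one signature; genes are ranked by score, highest first.
ks_es <- function(scores, gene_set, weight = 1) {
  s <- sort(scores, decreasing = TRUE)
  hit <- names(s) %in% gene_set
  n_miss <- sum(!hit)
  if (!any(hit) || n_miss == 0) return(NA_real_)
  w <- abs(s)^weight * hit                  # hit steps weighted by |score|^weight
  running <- cumsum(w / sum(w) - (!hit) / n_miss)
  unname(running[which.max(abs(running))])  # maximum deviation from zero
}

# Weighted connectivity score for an up/down query, in the form we understand
# from Subramanian et al. 2017: half the ES difference when signs disagree, else 0
wtcs <- function(scores, up, down) {
  es_up <- ks_es(scores, up)
  es_dn <- ks_es(scores, down)
  if (is.na(es_up) || is.na(es_dn)) return(NA_real_)
  if (sign(es_up) == sign(es_dn)) 0 else (es_up - es_dn) / 2
}

# Toy signature: the up set sits at the top and the down set at the bottom
scores <- c(a = 5, b = 4, c = 3, d = 2, e = 1, f = -1, g = -2, h = -3)
wtcs(scores, up = c("a", "b"), down = c("g", "h"))
```

For a 3,799 x 3,799 shRNA matrix, each knockdown's top and bottom genes would serve as the up/down query against every other signature; that loop is straightforward but is left out here.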
Hi, thank you for the package!
The structure of the object from the example differs from that of an object parsed from my file.
In the second, I can't retrieve the column metadata correctly, and all the metadata is in the format "REP.A001_A375_24H:A03".
What am I doing wrong?
library(cmapR)
Example_dsPath <- system.file("extdata", "modzs_n25x50.gctx", package="cmapR")
Example_ds <- parse_gctx(Example_dsPath)
reading ......Documents/R/win-library/4.1/cmapR/extdata/modzs_n25x50.gctx
done
GSE70138_Level5_Path <- "./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx"
GSE70138_Level5_ds <- parse_gctx(GSE70138_Level5_Path)
reading ./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx
done
Example_ds
Formal class 'GCT' [package "cmapR"] with 7 slots
..@ mat : num [1:50, 1:25] -1.145 -1.165 0.437 0.139 -0.673 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
.. .. ..$ : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
..@ rid : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
..@ cid : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
..@ rdesc :'data.frame': 50 obs. of 6 variables:
.. ..$ id : chr [1:50] "200814_at" "222103_at" "201453_x_at" "204131_s_at" ...
.. ..$ is_bing : int [1:50] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ is_lm : int [1:50] 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ pr_gene_id : int [1:50] 5720 466 6009 2309 387 3553 427 5898 23365 6657 ...
.. ..$ pr_gene_symbol: chr [1:50] "PSME1" "ATF1" "RHEB" "FOXO3" ...
.. ..$ pr_gene_title : chr [1:50] "proteasome (prosome, macropain) activator subunit 1 (PA28 alpha)" "activating transcription factor 1" "Ras homolog enriched in brain" "forkhead box O3" ...
..@ cdesc :'data.frame': 25 obs. of 16 variables:
.. ..$ brew_prefix : chr [1:25] "CPC004_PC3_24H" "BRAF001_HEK293T_24H" "CPC006_HT29_24H" "CVD001_HEPG2_24H" ...
.. ..$ cell_id : chr [1:25] "PC3" "HEK293T" "HT29" "HEPG2" ...
.. ..$ distil_cc_q75 : num [1:25] 0.05 0.1 0.17 0.45 0.24 ...
.. ..$ distil_nsample : int [1:25] 5 9 4 3 4 5 2 3 2 2 ...
.. ..$ distil_ss : num [1:25] 2.9 1.88 2.71 4.06 3.83 ...
.. ..$ id : chr [1:25] "CPC004_PC3_24H:BRD-A51714012-001-03-1:10" "BRAF001_HEK293T_24H:BRD-U73308409-000-01-9:0.625" "CPC006_HT29_24H:BRD-U88459701-000-01-8:10" "CVD001_HEPG2_24H:BRD-U88459701-000-01-8:10" ...
.. ..$ is_gold : int [1:25] 0 0 0 1 1 0 1 0 0 0 ...
.. ..$ ngenes_modulated_dn_lm: int [1:25] 11 3 8 38 36 23 12 11 33 13 ...
.. ..$ ngenes_modulated_up_lm: int [1:25] 10 7 25 40 16 17 23 14 37 22 ...
.. ..$ pct_self_rank_q25 : num [1:25] 26.904 17.125 7.06 0.229 4.686 ...
.. ..$ pert_id : chr [1:25] "BRD-A51714012" "BRD-U73308409" "BRD-U88459701" "BRD-U88459701" ...
.. ..$ pert_idose : chr [1:25] "10 µM" "500 nM" "10 µM" "10 µM" ...
.. ..$ pert_iname : chr [1:25] "venlafaxine" "vemurafenib" "atorvastatin" "atorvastatin" ...
.. ..$ pert_itime : chr [1:25] "24 h" "24 h" "24 h" "24 h" ...
.. ..$ pert_type : chr [1:25] "trt_cp" "trt_cp" "trt_cp" "trt_cp" ...
.. ..$ pool_id : chr [1:25] "epsilon" "epsilon" "epsilon" "epsilon" ...
..@ version: chr(0)
..@ src : chr "....../Documents/R/win-library/4.1/cmapR/extdata/modzs_n25x50.gctx"
GSE70138_Level5_ds
Formal class 'GCT' [package "cmapR"] with 7 slots
..@ mat : num [1:12328, 1:118050] 4.2641 0.0572 -1.0125 0.3089 -0.1041 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:12328] "780" "7849" "2978" "2049" ...
.. .. ..$ : chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ rid : chr [1:12328] "780" "7849" "2978" "2049" ...
..@ cid : chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ rdesc :'data.frame': 12328 obs. of 1 variable:
.. ..$ id: chr [1:12328] "780" "7849" "2978" "2049" ...
..@ cdesc :'data.frame': 118050 obs. of 1 variable:
.. ..$ id: chr [1:118050] "REP.A001_A375_24H:A03" "REP.A001_A375_24H:A04" "REP.A001_A375_24H:A05" "REP.A001_A375_24H:A06" ...
..@ version: chr(0)
..@ src : chr "./GSE70138_Broad_LINCS_Level5_COMPZ_n118050x12328.gctx"
sessionInfo()
R version 4.1.0 Patched (2021-05-29 r80415)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cmapR_1.5.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 XVector_0.33.0
[3] GenomicRanges_1.45.0 BiocGenerics_0.39.0
[5] zlibbioc_1.39.0 IRanges_2.27.0
[7] flowCore_2.4.0 lattice_0.20-44
[9] GenomeInfoDb_1.29.0 tools_4.1.0
[11] SummarizedExperiment_1.23.0 parallel_4.1.0
[13] grid_4.1.0 rhdf5_2.36.0
[15] Biobase_2.53.0 matrixStats_0.59.0
[17] RcppParallel_5.1.4 Matrix_1.3-4
[19] GenomeInfoDbData_1.2.6 Rhdf5lib_1.14.0
[21] cytolib_2.4.0 RProtoBufLib_2.4.0
[23] rhdf5filters_1.4.0 S4Vectors_0.31.0
[25] bitops_1.0-7 RCurl_1.98-1.3
[27] DelayedArray_0.19.0 compiler_4.1.0
[29] MatrixGenerics_1.5.0 stats4_4.1.0
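Judging from the str() output above, the Level 5 GCTX file stores only ids in its embedded row and column metadata (rdesc and cdesc each have a single variable, id). The bundled example file, by contrast, carries full annotations. The GSE70138 series ships signature annotations in a separate sig_info file; the tutorial reads GSE70138_Broad_LINCS_sig_info_2017-03-06.txt.gz. One way to annotate is therefore to join that table onto the column ids. This base-R sketch uses toy stand-in data with made-up pert_iname values; cmapR's annotate.gct (used in another report on this page) may do the same directly.

```r
# Toy stand-ins: column ids from the parsed GCTX and a sig_info-style table
cid <- c("REP.A001_A375_24H:A03", "REP.A001_A375_24H:A04")
sig_info <- data.frame(
  sig_id = c("REP.A001_A375_24H:A04", "REP.A001_A375_24H:A03"),
  pert_iname = c("drug_y", "drug_x"),   # hypothetical values
  stringsAsFactors = FALSE
)

# Reorder the annotation rows so they follow the matrix column order
cdesc <- sig_info[match(cid, sig_info$sig_id), , drop = FALSE]
rownames(cdesc) <- NULL
cdesc
```

match() returns NA for ids missing from sig_info, which yields all-NA annotation rows rather than silently dropping columns.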
Hi,
I am trying to install the package on R 3.6.3 and always get the same error:
Error: Failed to install 'cmapR' from GitHub: (converted from warning) installation of package ..... had non-zero exit status
How should I solve this problem?
This function should support specifying an arbitrary subset of columns to use as reference for computing the median and MAD statistics.
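A base-R sketch of the requested behaviour: row-wise robust z-scores where the median and MAD come only from a chosen subset of reference columns, while every column is scored. The helper name robust_z_ref, the min_mad floor, and the use of R's default MAD scaling constant (1.4826) are assumptions, not the package's implementation.

```r
# Hypothetical sketch: robust z-scores per row, with median and MAD taken only
# from the reference columns; all columns are then scaled by those statistics.
robust_z_ref <- function(mat, ref_cols, min_mad = 1e-6) {
  ref <- mat[, ref_cols, drop = FALSE]
  med <- apply(ref, 1, stats::median)
  # stats::mad() multiplies by 1.4826 by default (a normal-consistent SD estimate)
  mad_ref <- pmax(apply(ref, 1, stats::mad), min_mad)  # avoid dividing by ~0
  (mat - med) / mad_ref      # vectors of length nrow recycle down each column
}

m <- rbind(g1 = c(1, 2, 3, 10), g2 = c(4, 5, 6, 7))
z <- robust_z_ref(m, ref_cols = 1:3)   # column 4 is scored but not a reference
round(z, 3)
```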
Even when matrix_only = T, merge.gct adds the metadata of one GCT object to the output. This can be worked around by resetting cdesc to a minimal data frame containing only the ids, as follows, but it shouldn't happen in the first place:
overlap_tsne_modz <- merge.gct(merge.gct(xpr_b_for_tsne, xpr_lm_for_tsne,
dimension = "column", matrix_only = T),
xpr_c_for_tsne, dimension = "column", matrix_only = T)
overlap_tsne_modz@cdesc <- data.frame(id=overlap_tsne_modz@cid)
overlap_tsne_modz <- annotate.gct(overlap_tsne_modz, overlap_sigs, dimension="column", keyfield = "sig_id")
Hi Ted,
Not really an issue per se, but I was wondering if you have any plans to extend the package to also compute the Connectivity Score as implemented by Subramanian et al., 2017?
Alternatively, are you aware of any other R package or code snippets that implement it?
Thank you,
Enrico
Hi Ted,
I encountered an error when reading GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx.
Have you ever seen this kind of error, or do you have any advice?
Below is my code:
source("/tools/R_pakages/master/cmapR/R/GCT.R")
in method for "coerce" with signature "GCT","SummarizedExperiment": no definition for class
“SummarizedExperiment”
source("/tools/R_pakages/master/cmapR/R/io.R")
source("/tools/R_pakages/master/cmapR/R/utils.R")
source("/tools/R_pakages/master/cmapR/R/data.R")
coef_set <-parse_gctx("/GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx")
reading /GSE92743_Broad_OLS_WEIGHTS_n979x11350.gctx
Error: Unable to read dataset.
Not all required filters available.
Missing filters: deflate
Thank you,
Chen