ltla / archive-csaw

An archived version of the csaw repository; see https://github.com/LTLA/csaw for the active version.

R 71.70% Shell 1.72% C++ 26.58%

bioconductor chip-seq r

archive-csaw's Introduction

Bioconductor contributions

As maintainer

Package	BioC-devel	BioC-release
csaw
diffHic
InteractionSet
scran
beachmat
DropletUtils
cydar
batchelor
BiocNeighbors
BiocSingular
SingleR
simpleSingleCell
chipseqDB
csawUsersGuide
chipseqDBData
basilisk
basilisk.utils
scuttle
celldex
bluster
rebook
scRNAseq
DropletTestFiles
ResidualMatrix
TileDBArray
metapod
mumosa
BumpyMatrix
TrajectoryUtils
dir.expiry
DelayedRandomArray
alabaster.base
alabaster.matrix
alabaster.ranges
alabaster.se
alabaster.sce
alabaster.mae
alabaster.string
alabaster.spatial
alabaster.bumpy
alabaster.vcf
alabaster.files
alabaster
chihaya
screenCounter
gypsum

As co-maintainer

Package	BioC-devel	BioC-release
edgeR
scater
SingleCellExperiment
iSEE
iSEEu
TSCAN

Other packages I'm involved in

Package	BioC-devel	BioC-release
TENxBrainData
scDblFinder
zellkonverter
velociraptor
snifter
PCAtools
DelayedMatrixStats


archive-csaw's Issues

csaw user guide quick start script throws error

(I'm not sure whether this is related to the csaw code; if not, feel free to close.)

In the quickstart guide, you first load csaw, then edgeR to use the aveLogCPM function.

This is not working for me; I get the error below:

library(csaw);library(edgeR)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unlist, unsplit

Loading required package: S4Vectors
Loading required package: stats4
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: limma
Error in unloadNamespace(package) :
  namespace ‘limma’ is imported by ‘csaw’, ‘edgeR’ so cannot be unloaded
Error in library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc,  :
  Package ‘limma’ version 3.27.4 cannot be unloaded

Computing Simes gives FDR in ascending order, based on genome location

csaw version 1.4.1.

I have the following code:

library(csaw)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(magrittr)

tss = genes(TxDb.Hsapiens.UCSC.hg38.knownGene) %>% promoters(upstream=5000, downstream=5000)

load("/local/home/endrebak/code/programmable_epigenetics/data/snakemake/csaw/divide/PolII_50.Rdata") # provides data in namespace

keep = overlapsAny(rowRanges(data), tss)

results = read.table("/local/home/endrebak/code/programmable_epigenetics/data/snakemake/csaw/divide/PolII_50_input_merged.toptable", sep=" ", colClasses=c("character", rep("numeric",6)), nrows=953067)
merged = mergeWindows(rowRanges(data[keep]), tol=1000L)

tabcom = combineTests(merged$id, results[keep,], pval.col="P.Value", fc.col=c("Sine", "Cosine"))

regions = as.data.frame(merged$region)[c("seqnames", "start", "end")]
rownames = with(regions, paste(seqnames, start, end, sep="_"))

write.table(tabcom, "/local/home/endrebak/code/programmable_epigenetics/data/snakemake/csaw/divide/PolII_50_genes_only.pvals", sep=" ", row.names=rownames)

The data object is just a regular GRanges object, while the results data frame looks like this:

                               Sine      Cosine     AveExpr        F      P.Value adj.P.Val
chr2_43313701_43313750   -2.3167997 -0.02982553 -0.71816941 23.00540 4.928012e-06 0.5988389
chr11_18399751_18399800   2.2621293 -0.22226134  0.97321901 21.47272 8.077450e-06 0.5988389
chr17_2011501_2011550    -1.9665130 -0.04678999 -0.51708853 20.76365 1.023417e-05 0.5988389
chr5_139283801_139283850  0.5346595 -2.15734937 -0.05410586 20.75094 1.027817e-05 0.5988389
chr8_42497651_42497700   -1.7131037 -1.36821028 -0.63313392 19.63294 1.509591e-05 0.5988389
chr1_151945101_151945150 -2.0847241 -0.72914565 -0.69910641 19.21296 1.750542e-05 0.5988389

Running the code above gives a result like this:

"nWindows" "Sine.up" "Sine.down" "Cosine.up" "Cosine.down" "P.Value" "FDR"
"chr1_10101_10150" 1 0 1 0 0 4.92801174293831e-06 0.0548256926246022
"chr1_16251_16300" 1 1 0 0 0 8.07745011043863e-06 0.0548256926246022
"chr1_629851_630050" 4 0 3 1 3 2.14090895884271e-05 0.081581650856823
"chr1_633951_634100" 3 0 2 1 1 2.40387921493401e-05 0.081581650856823
"chr1_777651_780050" 30 1 25 11 14 5.49779891076081e-05 0.149265240427156
"chr1_826651_827800" 18 3 13 6 8 7.71570212938838e-05 0.158865110913651
"chr1_924701_925250" 6 1 5 0 4 8.19193942096173e-05 0.158865110913651
"chr1_958701_959450" 15 1 13 5 5 0.000101847090396946 0.160488522743873
"chr1_960501_960750" 5 1 4 1 4 0.000106401230548424 0.160488522743873
"chr1_965051_969700" 71 11 52 24 31 0.000152290752011783 0.206734695855995
"chr1_978251_978550" 5 2 2 0 5 0.000169395241959615 0.209049128145616
"chr1_995651_1002300" 109 14 82 34 47 0.00024273955102494 0.274599117096963
"chr1_1012951_1017650" 71 11 53 24 30 0.000290257012485047 0.303095303421886
"chr1_1019201_1025150" 104 19 70 35 41 0.000365066089742981 0.353983726304355
"chr1_1069301_1071100" 33 8 18 9 18 0.00042737107263999 0.374991919672754
"chr1_1072851_1074700" 30 5 20 4 14 0.00044232715868636 0.374991919672754
"chr1_1115801_1116650" 16 3 10 9 5 0.000469603140658329 0.374991919672754
"chr1_1162451_1165800" 67 7 45 25 29 0.000531889908982252 0.377697554761486
"chr1_1168301_1169100" 2 1 1 0 0 0.000532655731320748 0.377697554761486

As you can see, the combined p-values and the FDR simply increase with genomic position, which looks like a bug to me. Is it, or am I doing something wrong?

If you want to reproduce I can send you the data.

Posted it here since this is not a usage question (most likely).
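
A possible explanation, offered as a guess from the output above rather than a confirmed diagnosis: the combined p-values of the first two clusters (4.928e-06 and 8.077e-06) are exactly the p-values of the first two rows of the results table, which is sorted by P.Value. If the table was written in p-value order, results[keep,] pairs each window with the wrong row, and the cluster p-values rise with genomic position simply because the sorted p-values do. The base R sketch below uses toy data and hypothetical names (window_names, aligned) to show how the rows could be matched to the windows by name before filtering. In the real script the window names would presumably be built from rowRanges(data) in the same seqnames_start_end form as the row names of the table.

```r
# Toy stand-ins (hypothetical values): window names in genomic order,
# and a results table whose rows are sorted by p-value instead.
window_names <- c("chr1_1_50", "chr1_51_100", "chr2_1_50", "chr2_51_100")
results <- data.frame(
    P.Value = c(0.001, 0.02, 0.3, 0.9),
    row.names = c("chr2_51_100", "chr1_1_50", "chr2_1_50", "chr1_51_100")
)

# Reorder the rows so that row i of the table describes window i.
m <- match(window_names, rownames(results))
stopifnot(!anyNA(m))  # every window needs a matching row in the table
aligned <- results[m, , drop = FALSE]

# Only now is a logical 'keep' vector derived from the windows safe to apply.
keep <- c(TRUE, FALSE, TRUE, TRUE)
aligned_kept <- aligned[keep, , drop = FALSE]
```

With the rows aligned this way, aligned_kept could be passed to combineTests in place of results[keep,].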

Fix filter calculations in filterWindows

The current calculations use aveLogCPM, which adds the prior count to the library sizes as well as to the counts. This is slightly wrong when combined with the scaling in scaledAverage, because the modified library sizes are not comparable between the scaled and original abundances. The fix requires manual calculation of the average abundances, where the rescaled prior is added to the counts but not to the library sizes.

P.S. The same argument applies to filterDirect and friends in diffHic.
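
To illustrate the idea, here is a rough base R sketch. It is not the edgeR or csaw implementation: it assumes a simple pooled (Poisson-like) average with no dispersion, and the function name scaled_abundance and its arguments are made up for this example. The prior is rescaled by scale and added to the counts only, the library sizes are left untouched, and log2(scale) is subtracted at the end.

```r
# Simplified average abundance (log2 counts per million).
# Assumption: pooled average across libraries, no dispersion modelling.
scaled_abundance <- function(counts, lib.size, prior.count = 2, scale = 1) {
    counts <- as.matrix(counts)  # rows are windows, columns are libraries
    stopifnot(ncol(counts) == length(lib.size))

    # Rescaled prior, proportional to each library's size.
    prior <- prior.count * scale * lib.size / mean(lib.size)

    # Add the prior to the counts only; lib.size is used unmodified.
    adjusted <- sweep(counts, 2, prior, "+")
    log2(rowSums(adjusted) / sum(lib.size) * 1e6) - log2(scale)
}

lib.size <- c(1e6, 2e6)
narrow <- matrix(c(10, 20, 0, 0), nrow = 2, byrow = TRUE)
wide <- 2 * narrow  # e.g. regions twice as wide, collecting twice the counts

a_narrow <- scaled_abundance(narrow, lib.size)
a_wide <- scaled_abundance(wide, lib.size, scale = 2)
```

Under these assumptions a_narrow and a_wide agree, including for the all-zero row, which is the comparability that adding the prior to the library sizes would break.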

Add a wrapper function for post-hoc control

Add a wrapper function that combines mergeWindows with controlClusterFDR for easy use.

How can running windowCounts on two files together give more regions than the sum of individual runs?

In the code below I run windowCounts on two files, first individually, then together.

data <- windowCounts(file1, ext=110, width=10, param=param)
data1 <- windowCounts(file2, ext=110, width=10, param=param)
data2 <- windowCounts(c(file1, file2), ext=110, width=10, param=param)

The following are the (to me counterintuitive) results I get:

> rowRanges(data); rowRanges(data1)
GRanges object with 4942 ranges and 0 metadata columns:
...
GRanges object with 5438 ranges and 0 metadata columns:
...
> rowRanges(data2)
GRanges object with 12738 ranges and 0 metadata columns:
...

I could understand it if the joint windowCounts call returned at most 4942 + 5438 regions, but it returns more. Why does this happen? (Perhaps this is something to include in the manual/docs.)

Bonus question: is there a way to make csaw include every region, even where the read coverage is low? (If all rowRanges had the same number of regions, merging the data would be much easier.)
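
One likely cause is the filter argument of windowCounts: a window is kept only if its count summed across all supplied libraries reaches the threshold (10 by default, as far as I know). A window with a handful of reads in each file can therefore fail in both single-file runs yet pass in the joint run. The toy base R example below uses made-up counts to show the effect; it does not call csaw.

```r
filter <- 10  # assumed default threshold on the count sum across libraries

# Hypothetical read counts for the same six windows in two files.
file1 <- c(12, 6, 5, 0, 9, 3)
file2 <- c(0, 5, 7, 15, 4, 2)

kept1 <- sum(file1 >= filter)               # windows passing in file1 alone
kept2 <- sum(file2 >= filter)               # windows passing in file2 alone
kept_joint <- sum(file1 + file2 >= filter)  # windows passing when counted together

kept_joint > kept1 + kept2
```

Here one window passes in each single-file run, while five pass jointly. For the bonus question, lowering filter (possibly to 0) should retain the low-count windows so that every run reports the same set, at the cost of a much larger object.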
