ltla / archive-csaw

An archived version of the csaw repository, see https://github.com/LTLA/csaw for the active version.

R 71.70% Shell 1.72% C++ 26.58%
bioconductor chip-seq r

archive-csaw's Issues

csaw user guide quick start script throws error

(I don't know whether this is related to the csaw code; if not, just close.)

In the quickstart guide, you first load csaw, then edgeR to use the aveLogCPM function.

This is not working for me; I get the error below:

library(csaw);library(edgeR)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    do.call, duplicated, eval, evalq, Filter, Find, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unlist, unsplit

Loading required package: S4Vectors
Loading required package: stats4
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: limma
Error in unloadNamespace(package) :
  namespace ‘limma’ is imported by ‘csaw’, ‘edgeR’ so cannot be unloaded
Error in library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc,  :
  Package ‘limma’ version 3.27.4 cannot be unloaded

Computing Simes gives FDR in ascending order, based on genome location

csaw version 1.4.1.

I have the following code:

library(csaw)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(magrittr)

tss = genes(TxDb.Hsapiens.UCSC.hg38.knownGene) %>% promoters(upstream=5000, downstream=5000)

load("/local/home/endrebak/code/programmable_epigenetics/data/snakemake/csaw/divide/PolII_50.Rdata") # provides data in namespace

keep = overlapsAny(rowRanges(data), tss)

results = read.table("/local/home/endrebak/code/programmable_epigenetics/data/snakemake/csaw/divide/PolII_50_input_merged.toptable", sep=" ", colClasses=c("character", rep("numeric",6)), nrows=953067)
merged = mergeWindows(rowRanges(data[keep]), tol=1000L)

tabcom = combineTests(merged$id, results[keep,], pval.col="P.Value", fc.col=c("Sine", "Cosine"))

regions = as.data.frame(merged$region)[c("seqnames", "start", "end")]
rownames = with(regions, paste(seqnames, start, end, sep="_"))

write.table(tabcom, "/local/home/endrebak/code/programmable_epigenetics/data/snakemake/csaw/divide/PolII_50_genes_only.pvals", sep=" ", row.names=rownames)

The data object is just a regular GRanges object, while the results data frame looks like this:

                               Sine      Cosine     AveExpr        F      P.Value adj.P.Val
chr2_43313701_43313750   -2.3167997 -0.02982553 -0.71816941 23.00540 4.928012e-06 0.5988389
chr11_18399751_18399800   2.2621293 -0.22226134  0.97321901 21.47272 8.077450e-06 0.5988389
chr17_2011501_2011550    -1.9665130 -0.04678999 -0.51708853 20.76365 1.023417e-05 0.5988389
chr5_139283801_139283850  0.5346595 -2.15734937 -0.05410586 20.75094 1.027817e-05 0.5988389
chr8_42497651_42497700   -1.7131037 -1.36821028 -0.63313392 19.63294 1.509591e-05 0.5988389
chr1_151945101_151945150 -2.0847241 -0.72914565 -0.69910641 19.21296 1.750542e-05 0.5988389

Running the code above gives a result like this:

"nWindows" "Sine.up" "Sine.down" "Cosine.up" "Cosine.down" "P.Value" "FDR"
"chr1_10101_10150" 1 0 1 0 0 4.92801174293831e-06 0.0548256926246022
"chr1_16251_16300" 1 1 0 0 0 8.07745011043863e-06 0.0548256926246022
"chr1_629851_630050" 4 0 3 1 3 2.14090895884271e-05 0.081581650856823
"chr1_633951_634100" 3 0 2 1 1 2.40387921493401e-05 0.081581650856823
"chr1_777651_780050" 30 1 25 11 14 5.49779891076081e-05 0.149265240427156
"chr1_826651_827800" 18 3 13 6 8 7.71570212938838e-05 0.158865110913651
"chr1_924701_925250" 6 1 5 0 4 8.19193942096173e-05 0.158865110913651
"chr1_958701_959450" 15 1 13 5 5 0.000101847090396946 0.160488522743873
"chr1_960501_960750" 5 1 4 1 4 0.000106401230548424 0.160488522743873
"chr1_965051_969700" 71 11 52 24 31 0.000152290752011783 0.206734695855995
"chr1_978251_978550" 5 2 2 0 5 0.000169395241959615 0.209049128145616
"chr1_995651_1002300" 109 14 82 34 47 0.00024273955102494 0.274599117096963
"chr1_1012951_1017650" 71 11 53 24 30 0.000290257012485047 0.303095303421886
"chr1_1019201_1025150" 104 19 70 35 41 0.000365066089742981 0.353983726304355
"chr1_1069301_1071100" 33 8 18 9 18 0.00042737107263999 0.374991919672754
"chr1_1072851_1074700" 30 5 20 4 14 0.00044232715868636 0.374991919672754
"chr1_1115801_1116650" 16 3 10 9 5 0.000469603140658329 0.374991919672754
"chr1_1162451_1165800" 67 7 45 25 29 0.000531889908982252 0.377697554761486
"chr1_1168301_1169100" 2 1 1 0 0 0.000532655731320748 0.377697554761486

As you can see, the FDR is just naively increasing with chromosome location, which seems like a bug to me. Is it, or am I doing something wrong?

If you want to reproduce I can send you the data.

Posted it here since this is not a usage question (most likely).
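
One thing worth ruling out, offered as a guess rather than a confirmed diagnosis: combineTests pairs the ids with the table rows by position, so a table sorted by p-value (as a top table usually is) would hand the smallest p-values to the first clusters in genomic order. The sketch below reproduces that pattern in base R with made-up windows and p-values; simes() is a hypothetical helper written for illustration, not the csaw implementation.

```r
# Toy illustration in base R; none of these objects come from the issue.

# Simes' combined p-value for one cluster of windows (hypothetical helper).
simes <- function(p) {
  p <- sort(p)
  min(length(p) * p / seq_along(p))
}

# Windows in genomic order, with cluster ids of the kind mergeWindows returns.
windows <- c("chr1_1_50", "chr1_51_100", "chr1_5001_5050", "chr2_1_50")
ids <- c(1L, 1L, 2L, 3L)

# A results table sorted by p-value, with window names as row names.
results <- data.frame(
  P.Value = c(0.001, 0.01, 0.2, 0.5),
  row.names = c("chr2_1_50", "chr1_51_100", "chr1_5001_5050", "chr1_1_50")
)

# Positional pairing: p-values follow table order, not genomic order,
# so the combined p-values (and the BH-adjusted values) simply increase.
mismatched <- tapply(results$P.Value, ids, simes)
mismatched.fdr <- p.adjust(mismatched, method = "BH")

# Reordering the rows by window name before combining removes the pattern.
aligned <- results[match(windows, rownames(results)), , drop = FALSE]
matched <- tapply(aligned$P.Value, ids, simes)
```

If the row names of the real table do not line up with the windows in rowRanges(data), a match() on the names as above would be one way to align them before calling combineTests.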

Fix filter calculations in filterWindows

The current calculations use aveLogCPM, which adds the prior count to the library sizes as well as to the counts. This is slightly wrong when combined with the scaling in scaledAverage, because the modified library sizes are no longer comparable between the scaled and original abundances. The fix requires calculating the average abundances manually, adding the rescaled prior to the counts but not to the library sizes.
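
A rough sketch of the idea, under simplifying assumptions: a plain mean of CPMs stands in for the fitted average that aveLogCPM computes, and manual_abundance() with its arguments is a hypothetical name, not a csaw or edgeR function. It only illustrates why leaving the library sizes untouched keeps scaled and unscaled abundances comparable.

```r
# Hypothetical helper for illustration; not part of csaw or edgeR.
manual_abundance <- function(counts, lib.size, prior.count = 2, scale = 1) {
  counts <- as.matrix(counts)
  # Rescale the prior so that it equals 'prior.count' after dividing by 'scale'.
  adjusted <- (counts + prior.count * scale) / scale
  # Divide each column by its ORIGINAL library size (in millions);
  # the prior is deliberately not added to the library sizes.
  cpm <- sweep(adjusted, 2, lib.size / 1e6, "/")
  log2(rowMeans(cpm))
}

# Made-up counts: rows are windows, columns are libraries.
lib.size <- c(1e6, 2e6)
windows <- matrix(c(10, 20,
                     0,  5), nrow = 2, byrow = TRUE)
bins <- windows * 10  # bins ten times wider, with ten times the counts

ab.windows <- manual_abundance(windows, lib.size)
ab.bins <- manual_abundance(bins, lib.size, scale = 10)
```

With the prior handled this way, a bin that is ten times wider and holds ten times the counts gets the same abundance as the window, which is the comparability the filter relies on.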

P.S. The same argument applies to filterDirect and friends in diffHic.

How can running windowCounts on two files together give more regions than the sum of individual runs?

In the code below I run windowCounts on two files, first individually, then together.

data <- windowCounts(file1, ext=110, width=10, param=param)
data1 <- windowCounts(file2, ext=110, width=10, param=param)
data2 <- windowCounts(c(file1, file2), ext=110, width=10, param=param)

The following are the (to me counterintuitive) results I get:

> rowRanges(data); rowRanges(data1)
GRanges object with 4942 ranges and 0 metadata columns:
...
GRanges object with 5438 ranges and 0 metadata columns:
...
> rowRanges(data2)
GRanges object with 12738 ranges and 0 metadata columns:
...

I would understand it if the combined run gave at most 4942 + 5438 regions, but it gives more! Why does this happen? (Perhaps this is something to include in the manual/docs.)

Bonus question: is there a way to make csaw include every region, even where the expression is low? (If all rowRanges had the same number of regions, merging the data would be much easier.)
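
A possible explanation, assuming the filter argument of windowCounts is responsible: it keeps a window only when its count sum across all supplied libraries reaches a threshold, so a window can fail in each file alone and still pass when the two are pooled. The toy counts below are made up and use base R only; min.count merely stands in for that threshold.

```r
# Toy illustration in base R; the counts are invented, not taken from the issue.
min.count <- 10  # stands in for a count-sum threshold across libraries

# Read counts per window in each file.
file1 <- c(w1 = 12, w2 = 6, w3 = 0,  w4 = 4)
file2 <- c(w1 = 3,  w2 = 5, w3 = 11, w4 = 4)

# Each file on its own: only windows reaching the threshold survive.
kept1 <- names(file1)[file1 >= min.count]
kept2 <- names(file2)[file2 >= min.count]

# Both files together: the threshold applies to the summed counts,
# so w2 passes even though it failed in each file separately.
pooled <- file1 + file2
kept.both <- names(pooled)[pooled >= min.count]

n.separate <- length(kept1) + length(kept2)
n.together <- length(kept.both)
```

If that is the cause, lowering filter in the windowCounts call should bring the window sets closer together, though this has not been checked against the files in the question.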
