Comments (12)
When I tried mart.obj <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org", dataset="hsapiens_gene_ensembl"), the error went away.
I think the default version is now hg38 (useMart with host="www.ensembl.org"), but the HoneyBADGER demo dataset uses hg19 (host="grch37.ensembl.org"), and the "jul2015.archive.ensembl.org" archive may be accessible only from the author's account.
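A minimal sketch of the workaround described above, assuming biomaRt is installed and the host is reachable. The helper name `make_hg19_mart` is hypothetical and not part of HoneyBADGER; the `useMart` arguments are the ones quoted in this comment. Newer biomaRt releases may expect the host with an explicit `https://` prefix.

```r
# Sketch: build a biomaRt connection pinned to the GRCh37 (hg19) Ensembl host.
# `make_hg19_mart` is a hypothetical helper name used only for illustration.
make_hg19_mart <- function(host = "grch37.ensembl.org") {
  if (!requireNamespace("biomaRt", quietly = TRUE)) {
    stop("Package 'biomaRt' is required")
  }
  biomaRt::useMart(
    biomart = "ENSEMBL_MART_ENSEMBL",
    dataset = "hsapiens_gene_ensembl",
    host    = host
  )
}

# Usage (needs network access, so it is left commented out here):
# mart.obj <- make_hg19_mart()
# biomaRt::listEnsemblArchives()  # lists the archive hosts that currently exist
```

Pinning the host explicitly, rather than relying on the biomaRt default, keeps the annotation on the same assembly as the packaged demo data.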
from honeybadger.
I got exactly the same "NULL" result with the default Ensembl genes (along with all the warnings that follow), but the solution does not work for me: I still get "NULL" with hg19. Any other thoughts?
from honeybadger.
Have you tried running the later steps? Please have a look at the results of the calcGexpCnvBoundaries step.
from honeybadger.
If host="grch37.ensembl.org" is used for the "mart.obj" object, "calcGexpCnvBoundaries" gives a "NULL" result, and "regions.genes" is completely empty.
from honeybadger.
Using hg19:
hb$calcGexpCnvBoundaries(init=TRUE, verbose=FALSE)
ERROR: Error: subscript contains invalid names
ERROR: Error: subscript contains invalid names
NULL
It seems to hit the error again. However, it can be ignored, since the following steps still run and produce some results (though different from the demo's results...).
from honeybadger.
Is "regions.genes" empty, without any genomic intervals? Then "summarizeResults" would complain about empty results? In the end, this may not matter. Custom data may run through without problems. But, the fact that the tutorial example is not reproducible (even somewhat), is kinda disturbing.
from honeybadger.
The regions.genes is not empty, but the tutorial example is indeed not reproducible...
print(regions.genes)
GRanges object with 4 ranges and 0 metadata columns:
      seqnames               ranges strand
  [1]    chr11      167784-77185680      *
  [2]     chr7     855528-158749438      *
  [3]     chr9  134000948-141019076      *
  [4]    chr10     320130-135187193      *
  seqinfo: 52 sequences from an unspecified genome; no seqlengths
from honeybadger.
Now it seems weirder: I got "seqinfo: 28 sequences from an unspecified genome; no seqlengths" from this step with an empty "regions.genes". Clearly there are multiple versions of reference assemblies going around here. The question is where the root is. I'd assume this comes from the mart.obj, but shouldn't we get the same "hsapiens_gene_ensembl" if we use the same "host" and "dataset" at the "useMart" step?
from honeybadger.
Yes, I think we shouldn't get totally different output.
from honeybadger.
Dear Rongtingting,
Thank you for taking the initiative to address the issue you discovered and for sharing the solution. Indeed, the data included with the package was aligned to hg19. Back when this paper was originally published and this subsequent tutorial released, biomaRt's default version was hg19; it has indeed since been updated. The exact version used for both the paper and tutorial is the assembly from July 2015! The full set of changes to the human genome since hg19vJuly2015 can be found here: http://useast.ensembl.org/info/website/archives/index.html
There are at least a few reasons why using a different genome version may produce slightly different results. One, the gene symbols/names may have changed, so a gene that is included in the built-in data can no longer be found in the new assembly. Two, the gene coordinates may have changed. This will affect the exact genomic coordinates represented by the genes and subsequently the exact genomic coordinates of the CNVs identified. Three, newer genome assemblies may also have different alternative contig names (regions.genes@seqnames that are not chromosomes 1 through 22), though this should not impact the final results, which are limited to chromosomes 1 through 22 anyway (though you may find a different number of sequences from unspecified genomes from what is noted in the tutorial).
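The first and third points can be checked before running anything expensive. The sketch below uses base R only, with made-up gene symbols and contig names as stand-ins. In practice `data_genes` would come from the expression matrix row names and `annot_genes` from the chosen Ensembl release. `report_gene_overlap` is a hypothetical helper, not a HoneyBADGER function.

```r
# Toy stand-ins (assumptions): replace with real gene symbols and seqnames.
data_genes  <- c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
annot_genes <- c("GENE_A", "GENE_C", "GENE_E")

# Report which genes in the data are absent from the annotation.
report_gene_overlap <- function(data_genes, annot_genes) {
  shared  <- intersect(data_genes, annot_genes)
  missing <- setdiff(data_genes, annot_genes)
  list(
    shared         = shared,
    missing        = missing,
    fraction_found = length(shared) / length(data_genes)
  )
}

overlap <- report_gene_overlap(data_genes, annot_genes)
print(overlap$missing)         # genes that would silently drop out
print(overlap$fraction_found)  # share of data genes the annotation covers

# Restrict contig names to autosomes chr1..chr22, ignoring alternative contigs.
contigs   <- c("chr1", "chr7", "chrX", "chrUn_hypothetical_alt", "chr22")
autosomes <- contigs[contigs %in% paste0("chr", 1:22)]
print(autosomes)
```

A low `fraction_found` would suggest that the annotation release does not match the assembly the data was aligned to.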
The version of JAGS and the seed you use for its runs may also play a minor role, since HMMs are stochastic after all. You should also double-check that JAGS is installed and running correctly, since it is external to the R environment.
All of this may impact the exact coordinates of the CNVs identified, in particular before retestIdentifiedCnvs is used to filter out spurious/non-confident identified CNVs. However, the final set of identified CNVs on chromosomes 5, 7, 20, 10, 13, and 14 should be reproducible, especially if you are able to reproduce the figure from hb$plotGexpProfile().
The tutorial is compiled from the Rmarkdown under https://github.com/JEFworks-Lab/HoneyBADGER/blob/master/vignettes/Getting_Started.Rmd in case you would like to recompile it from there instead of copying and pasting from the tutorial.
Hope that clarifies some things.
Stay healthy and safe,
Prof. Jean Fan
from honeybadger.
Since "jul2015.archive.ensembl.org" is not available at this point (the archive used in the Rmd), the closest ones on ensembl archive list are "may2015" and "sep2015" archives. I tried both and "sep2015" is generating the closest results to the tutorial. With a few snags, the tutorial will run up to the "using allele information" part, which I haven't tested yet. Indeed, the "amps" on chr5, 7, 20, and "dels" on chr10, 13, 14 showed up (with some extra "dels" on chr6, 9, 11). One minor suggestion might be to update the documentation with some specific information about which ensembl archive(s) might generate similar results, because the original "jul2015" is not accessible now.
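Trying several archives in order of preference can be sketched as below. `first_working_host` is a hypothetical helper, and `fake_connect` is a stand-in that only simulates the unavailable "jul2015" host so the example runs without network access. With biomaRt installed, the connector could instead be a function that calls `biomaRt::useMart` with the given host.

```r
# Return the first host for which `connect` succeeds, together with its result.
first_working_host <- function(hosts, connect) {
  for (h in hosts) {
    result <- tryCatch(connect(h), error = function(e) NULL)
    if (!is.null(result)) {
      return(list(host = h, mart = result))
    }
  }
  stop("None of the candidate hosts could be reached")
}

# Stand-in connector (assumption): fails only for the archive reported as gone.
fake_connect <- function(host) {
  if (host == "jul2015.archive.ensembl.org") stop("host unavailable")
  paste("connected to", host)
}

candidates <- c(
  "jul2015.archive.ensembl.org",  # original archive used in the Rmd
  "sep2015.archive.ensembl.org",  # reported above as giving the closest results
  "may2015.archive.ensembl.org"
)
picked <- first_working_host(candidates, fake_connect)
print(picked$host)

# Real usage sketch (needs network access):
# real_connect <- function(h) biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
#   dataset = "hsapiens_gene_ensembl", host = h)
# picked <- first_working_host(candidates, real_connect)
```

Recording `picked$host` alongside the results makes it clear which annotation release a given set of CNV calls was produced with.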
from honeybadger.
Dear Prof. Fan,
Thank you for taking the time to help us with this issue. Yes, a different version of rjags might cause small differences during sampling.
With the demo data provided by the package, both the expression and the allele parts can be run by following the Getting Started tutorial.
However, I found that the last step, which combines the expression and allele information, does not produce results! Could you give me some guidance on how to figure this out? (The log of the last step is attached.)
Thanks a lot for your time!
hb$retestIdentifiedCnvs(retestBoundGenes=TRUE, retestBoundSnps=TRUE, verbose=FALSE)
WARNING! ONLY 9 SNPS IN REGION!
WARNING! ONLY 2 SNPS IN REGION!
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 30095
Unobserved stochastic nodes: 37372
Total graph size: 431029
Initializing model
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|| 100%
|| 100%
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 46958
Unobserved stochastic nodes: 64285
Total graph size: 754590
Initializing model
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|| 100%
|| 100%
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 3548
Unobserved stochastic nodes: 4150
Total graph size: 27465
Initializing model
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|| 100%
|| 100%
ERROR! ONLY 1 GENES IN REGION!
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 13842
Unobserved stochastic nodes: 10094
Total graph size: 69120
Initializing model
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|| 100%
|| 100%
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 20082
Unobserved stochastic nodes: 13934
Total graph size: 102837
Initializing model
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|| 100%
|| 100%
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 8578
Unobserved stochastic nodes: 7005
Total graph size: 40610
Initializing model
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
|| 100%
|| 100%
results <- hb$summarizeResults(geneBased=TRUE, alleleBased=TRUE)
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 7, 6
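The message itself is the generic base R error raised when `data.frame()` receives columns of unequal length. The sketch below reproduces it with made-up vectors. The names `gene_regions` and `allele_regions` are hypothetical, and the thread does not show which internal vectors of summarizeResults actually differ. One unconfirmed possibility is that a region skipped earlier (for example the one reported as "ONLY 1 GENES IN REGION") is counted on one side but not the other.

```r
# Hypothetical stand-ins: seven entries on one side, six on the other.
gene_regions   <- paste0("region", 1:7)
allele_regions <- paste0("region", 1:6)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    data.frame(gene = gene_regions, allele = allele_regions, check.names = FALSE)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparing lengths before combining shows which piece is short.
pieces <- list(gene = gene_regions, allele = allele_regions)
print(lengths(pieces))
print(setdiff(gene_regions, allele_regions))  # entries present on one side only
```

Checking the lengths of the per-region results after retestIdentifiedCnvs, in the same way, may help locate where the seventh entry goes missing.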
from honeybadger.