
bam.iobio.io's People

Contributors

adityaekawade, alistairnward, anderspitman, chmille4, dependabot[bot], mjbowler, stefinfection, tonydisera

bam.iobio.io's Issues

Defined range for insertion length

Is it worth allowing a defined range to be input by the user? Right now, if I look at my genome, the default range is 0-600, with the mean ~360. If I click outliers, it jumps to 0-200M, so even with zooming, I have very little control. I want to see if there is a spike from Alus which would be ~700 for this data, but can't expand the range of the default view to get out to ~800 which is what I would like. Would be nice if I could just specify that I want to look at the range 0-800.
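A minimal sketch of what a user-defined range could look like, assuming the fragment-length histogram is available as a mapping from length to read count. The function name and data layout are hypothetical and are not taken from bam.iobio's code.

```python
def clip_histogram(hist, lo, hi):
    """Keep only the bins whose fragment length lies in [lo, hi].

    `hist` maps fragment length -> read count (hypothetical layout).
    """
    if lo > hi:
        raise ValueError("lower bound must not exceed upper bound")
    return {length: count for length, count in hist.items() if lo <= length <= hi}


# Toy data: main peak near 360, a small bump near 700, one extreme outlier.
hist = {350: 900, 360: 1000, 370: 880, 700: 40, 150_000_000: 1}
print(clip_histogram(hist, 0, 800))
```

With a range of 0-800 the bump near 700 stays visible while the extreme outlier no longer stretches the axis.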

Allow sampling of multiple references

Data with many small references (e.g. microbiome) needs multiple references sampled at the same time to be useful. We need to figure out how to display multiple read coverages at once before this feature can be implemented.

Add region to bam.iobio

Discussion with Omicia led to the suggestion that users should be able to define a chromosome and region and include them in the URL.
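One way this could work, as a sketch only: read the region from a query parameter and validate it before use. The parameter name `region` and the `chrom:start-end` syntax are assumptions for illustration, not an existing bam.iobio URL scheme.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical region syntax: <chromosome>:<start>-<end>, e.g. chr1:1000-2000
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(url):
    """Return (chrom, start, end) from a `region` query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("region")
    if not values:
        return None
    raw = values[0].replace(",", "")  # tolerate thousands separators
    match = REGION_RE.match(raw)
    if match is None:
        raise ValueError(f"malformed region: {values[0]!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("region start must not exceed end")
    return match["chrom"], start, end


print(parse_region("http://example.org/?region=chr1:1000-2000"))
```

Returning `None` when the parameter is missing lets the app fall back to its default view.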

Incomplete coverage track

This is not always reproducible, which is obviously a pain, but I have seen it happen a number of times recently, including at BoG.
screen shot 2016-05-17 at 10 35 28 am

Need better error reporting when .bai file is not present

We need better error reporting when the .bai file is missing. Right now, when loading a BAM file from a URL, the failure is silent if the corresponding .bai file is not present. We should send an error message back to the web page indicating that the corresponding .bai file was not found. Down the road, it might be nice to generate a .bai file on the fly if it is not present.
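A rough sketch of the check, assuming the caller supplies an `exists` function (hypothetical, for example one that issues an HTTP HEAD request). It tries both index naming conventions mentioned elsewhere in these issues (xxx.bam.bai and xxx.bai) and raises an error whose message could be shown to the user.

```python
class MissingIndexError(Exception):
    """Raised when no .bai index can be found next to a BAM file."""


def index_candidates(bam_url):
    """Return the index URLs to try, covering both naming conventions."""
    candidates = [bam_url + ".bai"]               # sample.bam.bai
    if bam_url.endswith(".bam"):
        candidates.append(bam_url[:-4] + ".bai")  # sample.bai
    return candidates


def find_index(bam_url, exists):
    """Return the first index URL for which `exists(url)` is true.

    `exists` is a caller-supplied callable (an assumption of this sketch).
    """
    tried = index_candidates(bam_url)
    for url in tried:
        if exists(url):
            return url
    raise MissingIndexError(
        "No index (.bai) file found for " + bam_url + "; tried: " + ", ".join(tried)
    )


# Toy "server" holding a BAM and an index named sample.bai.
available = {"http://example.org/sample.bam", "http://example.org/sample.bai"}
print(find_index("http://example.org/sample.bam", available.__contains__))
```

Passing `exists` in keeps the lookup logic separate from however the server actually probes remote files.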

Coverage average is off for sparse data

For a BAM with sparse data, zoom in on a section where there are reads: the read coverage distribution looks good, but the average is off the chart to the left. I assume that this shows the average of all the data, which is close to zero because the data is sparse. Would it make more sense to show the average of the data in the selected region?
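To illustrate the difference, here is a small sketch that computes the mean depth over either all sampled points or only those inside a selection. The `(position, depth)` layout and the function name are assumptions for illustration.

```python
def mean_depth(points, start=None, end=None):
    """Mean depth of (position, depth) points, optionally limited to [start, end]."""
    selected = [
        depth
        for pos, depth in points
        if (start is None or pos >= start) and (end is None or pos <= end)
    ]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


# Sparse toy data: reads only between positions 200 and 300.
points = [(0, 0), (100, 0), (200, 30), (300, 34), (400, 0), (500, 0), (600, 0), (700, 0)]
print(mean_depth(points))            # average over everything
print(mean_depth(points, 200, 300))  # average over the selected region
```

On this toy data the global mean is 8.0 while the mean inside the selection is 32.0, which mirrors the behaviour described above.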

Please support sftp protocol for remote files

From Howard Sun:

I do not think that needs to be fixed first; a better solution is to load the BAM via sftp (ssh), because large BAM data is often not open to everyone.
There should be plenty of public code available. I hope you will consider sftp first. That would be great.

Loading bam files is slow!

Hi All,

Good day and hope all is well.

We have recently installed bam.iobio.io locally with the help of the installation guide at this link: https://github.com/chmille4/iobio/wiki/Local-iobio-setup

However, when loading the bam files by selecting the 'choose bam file' button it takes very long to load and visualize the data. Our bam file is indexed according to http://bam.iobio.io/help.html.

I am wondering if the bam file needs to be on the same virtual server where the web service is running. Currently the bam files are on a separate node connected through a high-speed data network.

Also, is there any recommended server specification (CPU, memory, etc.) for installing bam.iobio locally?

Best Regards,

Anwar

Missing file isn't handled

If I supply a file that doesn't exist, bam.iobio just sits there. The console logs show that the file doesn't exist, but the user has no idea.

add support for ga4gh data

Doing so would also let authorized users access data on Google Genomics, which will include the MSSNG autism database, TCGA, 1000 Genomes and more

Add range selector for fragment length

Add a range selector to fragment length so that a user can look at the outlier fragments without the chart being overwhelmed by the data in the correct fragment range

upgrade to samtools v1

Current version of samtools fails on remote files that have indices named xxx.bai instead of xxx.bam.bai. This is fixed in samtools v1.

Handle ftp data urls

Ftp doesn't support byte range requests so we can't support ftp urls.

Detect ftp data urls and do

  1. Check if url is also being served over http and if so use that url
  2. If the above doesn't work, give an error to the user.
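The two steps above can be sketched as follows. The `is_reachable` callable is a hypothetical helper supplied by the caller (for example an HTTP HEAD request), and the error text is illustrative only.

```python
from urllib.parse import urlparse, urlunparse


def resolve_data_url(url, is_reachable):
    """Swap an ftp:// URL for its http:// equivalent when that is served.

    Non-ftp URLs are returned unchanged. Raises ValueError if an ftp URL
    has no reachable http counterpart.
    """
    parts = urlparse(url)
    if parts.scheme != "ftp":
        return url
    http_url = urlunparse(parts._replace(scheme="http"))
    if is_reachable(http_url):
        return http_url
    raise ValueError(
        "ftp URLs are not supported because ftp lacks byte range requests; "
        "please provide an http(s) URL instead of " + url
    )


# Toy check: pretend only this http URL is being served.
served = {"http://ftp.example.org/data/sample.bam"}
print(resolve_data_url("ftp://ftp.example.org/data/sample.bam", served.__contains__))
```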

Test bam files

Add a set of test BAM files and a doc that provides bam.iobio links to all the test files. This can be used to check through a list of files and ensure that bam.iobio acts as expected on some constructed use cases.

Improve sampling algorithm

Sampling is currently based solely on regions. This means we can either over- or under-sample, depending on the read depth of the data. We need to improve sampling so that we keep sampling if we get too few reads and stop sampling once we have enough reads.
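One possible shape for this, as a sketch under stated assumptions: visit regions in random order and stop as soon as a target read count is reached. The `read_counter` callable and the `target_reads` parameter are hypothetical and do not describe the current bam.iobio sampler.

```python
import random


def sample_until_enough(regions, read_counter, target_reads, seed=0):
    """Sample regions in random order until `target_reads` reads are collected.

    `read_counter(region)` returns the number of reads found in a region
    (a caller-supplied callable, assumed for this sketch).
    Returns (sampled_regions, total_reads).
    """
    rng = random.Random(seed)
    order = list(regions)
    rng.shuffle(order)

    sampled, total = [], 0
    for region in order:
        if total >= target_reads:
            break  # deep data: stop early instead of over-sampling
        total += read_counter(region)
        sampled.append(region)
    return sampled, total  # shallow data: every region may end up sampled


regions = list(range(10))
deep = sample_until_enough(regions, lambda r: 100, target_reads=250)
sparse = sample_until_enough(regions, lambda r: 1, target_reads=250)
print(len(deep[0]), deep[1])
print(len(sparse[0]), sparse[1])
```

With deep data the loop stops after three regions; with sparse data it keeps going until the regions run out.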

Mapping rate

Start bam.iobio on the whole genome rather than on chromosome 1, and estimate the number of unmapped reads to get a proper mapping rate estimate.

Unmapped reads for BoG poster

Can we figure out a way to generate a bam.iobio screenshot showing the real stats for my genome (specifically, a mapping rate of ~70%) for the BoG poster? If the iobio suite workflow is also going to be demoed at BoG, we need bam.iobio to run and generate these stats. The real stats are:

Total reads: 671644470
Mapped reads: 479405211 (71.3778%)
Forward strand: 431035875 (64.1762%)
Reverse strand: 240608595 (35.8238%)
Failed QC: 0 (0%)
Duplicates: 40566531 (6.03988%)
Paired-end reads: 671644470 (100%)
'Proper-pairs': 472073744 (70.2862%)
Both pairs mapped: 477749930 (71.1314%)
Read 1: 335671024
Read 2: 335973446
Singletons: 1655281 (0.246452%)
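As a quick sanity check on the numbers above, the percentages can be recomputed from the raw counts. This is only arithmetic on the reported values; the variable names are chosen for illustration.

```python
total_reads = 671_644_470
mapped_reads = 479_405_211
forward, reverse = 431_035_875, 240_608_595
read1, read2 = 335_671_024, 335_973_446


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


mapping_rate = percent(mapped_reads, total_reads)
print(f"Mapping rate: {mapping_rate:.4f}%")
print("Strand counts add up:", forward + reverse == total_reads)
print("Read 1 + Read 2 add up:", read1 + read2 == total_reads)
```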

Gaps (0 depth) in regions of BAM loaded from file

When I compare the Read Depth line charts for the same BAM file, one loaded from a client-side file, the other from a URL, the line charts closely resemble each other; however, on some chromosomes, there are empty regions on the file-loaded BAM where the URL-loaded BAM has point (depth) data. See screen prints below on the default 1000Genomes dataset (HG04141).

Chromosome 13
screen shot 2014-09-11 at 6 03 24 pm
screen shot 2014-09-11 at 6 03 12 pm

Chromosome 9
screen shot 2014-09-11 at 6 02 22 pm

screen shot 2014-09-11 at 6 02 31 pm

Loading in Read Coverage information

When the index is parsed, the references are built up, but the read coverage across the chromosomes appears all at once after the sequences are loaded. Should it be drawn in along with the chromosome titles?

Files from samba mounted drive do not load in bam.iobio

Andrew is having trouble with bam.iobio for local files from a mounted drive. I am seeing the same problem from my local computer.

The density chart loads, but then the browser (Chrome in this case) crashes with an ‘out of memory’ error.

Multiple resolutions

We should test on multiple resolutions; in particular, when presenting, we want to be sure that the apps still look good. In the viz-dev branch, the scaling of the chromosome bars and of the read coverage (top image) is not the same, so in a presentation the coverage starts partway through the bars (no coverage above chr1, etc.).

Different number of Reads Sampled on same file

I am running into one behavior that seems odd: I get a different number of Reads Sampled each time I load the same bam file. Is this expected behavior?

At first, I thought that the stats were different when the .bai was produced from samtools vs bamtools. However, it may just be that the number of reads sampled varies considerably for each load.

On a related note, the line charts for read depth look slightly different for .bai files generated from the same bam with samtools vs bamtools. Specifically, I don’t see the line going from the high point back down to zero on the line chart for the samtools-generated .bai file.

I’ve attached 2 screen prints showing the read depth line charts from same bam, but from .bai files generated from samtools vs. bamtools.

Can't load BAM from url with https

There are no errors or warnings in the console, and the server appears to return data for the estimated read depth request; however, all the widgets stay in the 'sampling...' state.

Read coverage distribution histogram view gets strange (random?) brush selection each time Chromosome is selected from read depth panel.

Good news - This seems to be dependent on the file loaded. For example, I do not see this problem with our HG04141.mapped.ILLUMINA.bwa.BEB.low_coverage.20130415.bam file.

I am seeing it with a public bam file I downloaded from a Univ of Utah data sharing site.
(A1754/Bam/Hg19/SRR893054_STP1N240A.bam)

Here are screen prints for each time the Read Coverage Distribution histogram is refreshed for the times I clicked on ch20:
screen shot 2014-09-08 at 5 16 16 pm

2nd iteration

screen shot 2014-09-08 at 5 17 19 pm

3rd iteration

screen shot 2014-09-08 at 5 17 31 pm

Scaling issues and reference drop down

The drop down for reference sits on top of the wheel and the read coverage doesn't rescale with the window. The reference sequence markers (the coloured blocks) resize as the screen is resized, but the data stays put. This is an issue, because sometimes the scaling is wrong when you load and you have to resize the window to get them to line up.
screen shot 2016-08-15 at 8 55 27 am
