vgteam / sequencetubemap

displays multiple genomic sequences in the form of a tube map

License: MIT License

HTML 0.07% JavaScript 96.09% CSS 0.62% Shell 2.69% TypeScript 0.36% Dockerfile 0.17%

sequencetubemap's Issues

Reads going out of bounds on the left aren't visible, while on the right they are cut off

I'm looking for a read, so I enter the read's start node into the tube map in node mode and hit go.

I don't see the read. It's because (I think) the read maps to the reverse strand. The tube map starts at the node I entered and heads locally right, while the read wanders off locally left.

But I don't even see the part of the read that visits the node I entered. I think we might only be collecting reads that lie fully within the graph.

Starting at the read's ending node gets it to show up.

Checkbox states are retained across page refreshes by the browser and the frontend doesn't notice

In Firefox at least, when you refresh the page, the checkboxes and select dropdowns retain their previous states. For example, if you turned off "Remove redundant nodes", the checkbox will still be clear after the page refresh.

But the frontend doesn't know about this; it will render things with redundant nodes removed until you re-check and re-un-check the checkbox.

Either the checkboxes and dropdowns should be reset by the page to their default values on page load, or some logic should be added to see if the browser has adjusted the state of these controls, so their settings are respected.
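A minimal sketch of the second option, assuming the frontend can enumerate its controls on load. The `readControlState` helper and the control ids are hypothetical and are not names from the tube map code.

```javascript
// Hypothetical helper: derive the frontend settings from the controls'
// actual state instead of assuming the built-in defaults.
// Each control only needs `id`, `type`, and `checked` or `value`, so real
// DOM elements (input, select) work as well as plain objects.
function readControlState(controls) {
  const state = {};
  for (const control of controls) {
    state[control.id] =
      control.type === "checkbox" ? control.checked : control.value;
  }
  return state;
}

// In a browser this could run once on page load, for example:
//   readControlState(document.querySelectorAll("input, select"));
// (the selector is an assumption about the page's markup).
const restored = readControlState([
  { id: "removeRedundantNodes", type: "checkbox", checked: false },
  { id: "colorScheme", type: "select-one", value: "greys" },
]);
console.log(restored);
```

Reading the state back keeps whatever the browser restored, which avoids surprising a user who expects their previous settings to survive a refresh.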

Substitutions can render out of bounds of nodes when alignments aren't really to the right graph

See that floating G? It's a substitution that's supposed to be on that blue read, but it isn't drawn right; it's floating below the node it belongs inside, and is not overlaid on the read that has it. The problem may be that the read loops around, later in the graph, in a way that is possible to express but which the aligner generally does not generate.

I can send the files that cause this problem, or you should be able to reproduce it (while my server is running) by entering the settings shown at http://kolossus.sdsc.edu:9000/

US: Element selection

As a genomicist, I can select and mark a read, haplotype, or positional path, so I can keep track of it as I pan the view around.

prepare scripts aren't executable

The examples in the README like docker exec -it <container_id> ./prepare_vg.sh <vg_file> don't work because the scripts aren't runnable.

docker exec -it 50346e057492 ls -la | grep prepare
-rw-r--r--    1 root     root           207 Nov 19  2017 prepare_gam.sh
-rw-r--r--    1 root     root           270 Nov 19  2017 prepare_vg.sh

This can be worked around by adding sh to the command:

docker exec -it <container_id> sh ./prepare_vg.sh <vg_file>

Either the README or the scripts should be updated to patch this...

US: Node ID search

As a VG developer, I can call up graph displays given a node ID, so that I can find the parts of the graph I am interested in.

US: Multi-sample read pileups

As a genomicist, I can upload multiple .gam files where I can view multiple samples of read pileups within the same genomic site so that I can do mendelian or conserved variant analysis.

US: Link to current view

I want to be able to record a view I have brought up in the tube map (including the combination of files, track settings, and start and length settings, with the files and position being the most important).

The UCSC Genome Browser solves this problem with "sessions", but I think a permalink-based approach might be better. I should be able to copy a link that when I open it takes me back to where I was.
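One way the permalink idea could work is to serialize the view into URL query parameters and read them back on load. This is only a sketch: the field names (`xgFile`, `gamFile`, `start`, `length`), the file names, and the base URL are illustrative assumptions, not the tube map's actual settings.

```javascript
// Encode a view description into a shareable link.
function viewToLink(baseUrl, view) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(view)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Decode a link back into a view description (all values come back as strings).
function linkToView(link) {
  return Object.fromEntries(new URL(link).searchParams.entries());
}

// Hypothetical view: file names and position are placeholders.
const view = {
  xgFile: "graph.xg",
  gamFile: "reads.gam",
  start: "17:1",
  length: "100",
};
const link = viewToLink("http://localhost:3000/", view);
const roundTrip = linkToView(link);
console.log(link);
console.log(roundTrip);
```

Keeping the whole state in the URL means no server-side session storage is needed, at the cost of longer links when many track settings are included.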

Cross-talk between simultaneous queries

The backend gets information from vg by making vg dump files (like regions.tsv) in the current directory, and then reading them.

If two requests are being handled at once, files from different requests will overwrite one another.

The backend probably needs to use a separate temp directory or set of named pipes for each request.

While we're at it, vg chunk's -T option needs to be modified so that it takes a filename as input instead of just generating its own filename to write to.
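The per-request temp directory suggested above could look roughly like this in Node. The helper names and the `tubemap-` prefix are hypothetical; only the standard `fs`, `os`, and `path` modules are used.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a private scratch directory for one request, so that files such as
// regions.tsv written by concurrent requests cannot overwrite each other.
function makeRequestDir() {
  return fs.mkdtempSync(path.join(os.tmpdir(), "tubemap-"));
}

// Remove the scratch directory once the response has been sent.
function cleanUpRequestDir(dir) {
  fs.rmSync(dir, { recursive: true, force: true });
}

// Two simulated requests write the same file name without colliding.
const dirA = makeRequestDir();
const dirB = makeRequestDir();
fs.writeFileSync(path.join(dirA, "regions.tsv"), "request A\n");
fs.writeFileSync(path.join(dirB, "regions.tsv"), "request B\n");
const contentA = fs.readFileSync(path.join(dirA, "regions.tsv"), "utf8");
const contentB = fs.readFileSync(path.join(dirB, "regions.tsv"), "utf8");
cleanUpRequestDir(dirA);
cleanUpRequestDir(dirB);
```

In the backend, the vg child process would then be started with its working directory set to the request's directory, for example through the `cwd` option of `child_process.spawn`.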

It is not possible to supply custom data to the tube map

Clicking on "custom" and "reload" in the interface triggers

error:[vg chunk] unable to load xg index file ./mountedData/none

US: GRCh38 coordinate search

As a biologist, I can call up graph displays given a genomic coordinate on the GRCh38 reference, so that I can find the parts of the genome I study.

US: .gam/.vg upload

As a VG developer, I can upload a small raw .gam file and a small .vg graph, and have them be displayed together, so that I can see if they do what I want on a display that's more robust and intuitive than the GraphViz display.

Obtain visualization on the command line

Is it possible to generate the sequence tube map in a standalone way on the command line?

US: Common difference identification

As a genomicist, I can see when several reads differ from the graph in the same way, so that I can evaluate the correctness of a variant call at that position.

US: More zoom

As a genomicist, I can expand or contract a displayed region, so I can get a wider context or a narrower focus.

US: Custom read set display

As a genomicist, I can display custom read data against an existing indexed graph, so that I can analyze new samples.

TubeMap requires both RocksDB and .gai index

My understanding is that a .sorted.gam + .gai are supposed to provide all of the functionality of a .gam.index, but the software seems to be written to require both indexes. Is there a reason for this? If we could get rid of the RocksDB index, it could substantially reduce the preprocessing time.

Display read mapping quality in alpha or shade channel

@benedictpaten wants reads shown on the tube map to be able to be colored/shaded according to mapping quality, instead of just randomly as they are now.
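As a rough illustration of the shading idea, mapping quality could be scaled linearly into an opacity value. The cap of 60 and the minimum opacity of 0.2 are assumptions chosen for the sketch, and `mapqToOpacity` is a hypothetical name.

```javascript
// Map a mapping quality to an opacity in [minOpacity, 1].
// maxMapq = 60 is an assumed cap; minOpacity keeps low-quality reads visible.
function mapqToOpacity(mapq, maxMapq = 60, minOpacity = 0.2) {
  const clamped = Math.min(Math.max(mapq, 0), maxMapq);
  return minOpacity + (1 - minOpacity) * (clamped / maxMapq);
}

console.log(mapqToOpacity(0), mapqToOpacity(30), mapqToOpacity(60));
```

The result could be applied to a read's track as its fill opacity, so that confidently mapped reads appear solid and ambiguous ones appear faded.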

Add substitutions to display

Align read from text box and display it

I want to be able to paste a sequence into the tube map browser UI and have it automatically align it (with vg align) to the region I am looking at and display the alignment.

This would be very useful for working out why a read did not map to a certain graph region, because I would be able to visually see how good the optimal alignment there was, and I wouldn't have to mess around with the command line tools myself.

Out of bounds node ID crashes vg on the backend

If I enter a node ID that is not in the graph as the node to start at, I get a vg error on the backend and no real error message on the front end (just a blank tube map). There should be an error on the frontend that the node is not in the graph.
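A sketch of the kind of check that could produce such a message before vg is called. It assumes the backend has some way to test whether a node exists, passed in here as the hypothetical `hasNode` predicate; `checkNodeId` is likewise an illustrative name.

```javascript
// Return an error message for the frontend, or null if the node ID is usable.
// `hasNode` is an assumed lookup supplied by the caller.
function checkNodeId(input, hasNode) {
  const id = Number(input);
  if (!Number.isInteger(id) || id < 1) {
    return `"${input}" is not a valid node ID.`;
  }
  if (!hasNode(id)) {
    return `Node ${id} is not in the graph.`;
  }
  return null;
}

// Example with a toy graph containing nodes 1 to 5.
const toyNodes = new Set([1, 2, 3, 4, 5]);
const hasToyNode = (id) => toyNodes.has(id);
console.log(checkNodeId("3", hasToyNode));
console.log(checkNodeId("99", hasToyNode));
console.log(checkNodeId("abc", hasToyNode));
```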

US: Read/reference difference display

As a genomicist, I can see where my reads differ from the graph that they are aligned to and where they match it, so that I can get an idea of the quality of my reads and do visual variant calling.

Revise insertion display

Insert character should be a vertical line.

You should be able to mouse over it and see the inserted sequence and where it fits in the node's sequence (with an extended vertical line to the top).

wrong repo lol

fails on vg + unitig assembly fasta with error
error: failed to include path

https://drive.google.com/open?id=1jtMG9kWA9FYKxGJU_GhiKdMaAP9whkV1
contains the files and command

US: Display reads, reference, and haplotypes

As a genomicist, I can display sequencing reads in the context of both the primary reference and local unique haplotypes simultaneously, so that I can see how well my reads fit with both.

Make data preprocessing scripts match current vg indexing API

It seems like the work to make TubeMap use @adamnovak's fancy new indexing code is incomplete. I'm trying to work with the file-upload branch, which has the command line arguments fixed in vg chunk. However, the data preprocessing scripts (data/prepare.sh, data/prepare_dev.sh, backend/prepare_gam.sh, maybe more) still seem to be written to create the old RocksDB-based index. I understand that this branch is a work in progress, but I also can't use master, since it's still written for the old vg chunk API.

@adamnovak says he can update the scripts for me, since he understands what's going on in the TubeMap internals.

US: Copy Node Sequence

I want to be able to copy-paste the sequence from a node, so I can BLAT it on the genome browser. When looking around in node mode, there's no position legend, so it's hard to find out where in the linear reference I am.

We could also have the scale bar in node mode. And maybe a "go to UCSC genome browser" button or something?

GBWT as Haplotype Source

I just merged a PR into VG allowing vg chunk to source its haplotypes from a GBWT in a .gbwt file, instead of from a gPBWT embedded in a .xg file.

This is the setup I am using for all my haplotype-informed mapping experiments, so I need the tube map to be able to read this new format. I think it would just consist of knowing enough to pick up the new file when present and to pass it along to the vg calls.

Distinguish softclips

They should be toggleable with a checkbox somewhere (so you can just ignore them), and should be distinguished from normal inserts by a different color and/or letter.

Scaling Issue of Drawn Bases when Zooming.

Hi There :)
When I zoom (in or out), the scaling of the drawn bases usually does not fit anymore. They are drawn too wide.

Best,
Simon

Reload Data Files Button

It would be nice to be able to reload the lists of files in the dropdowns without reloading the page and messing up my visualization settings.

Can I arrange all the nodes on the selected path as horizontally as possible?

Hello, I would like to arrange all the nodes on the selected path horizontally.
I would also like to align all the other paths as horizontally as possible.

I am trying to do so in this branch, but it fails in some cases.

In this case, the blue path is selected and is aligned horizontally throughout. But the green path overlaps the blue path. Moreover, the purple path could be straightened further.

I would be grateful if you could check it.

Raise an error on selecting the path shown in the bottom

Hello, I am grateful for this library and I would like to use it to display genome graphs :)

I found that it causes an error when I click the path shown at the bottom. In this case, I received the error after unchecking ref.

And I have several questions:

Could it visualize not only paths but also nodes and edges without path information?
Is there an onClickNode or onClickPath callback in the external API?

Thank you,

README is out of date

The README references sequenceTubeMap.js, but this appears to have been moved/renamed (to app/main.js?).

vg chunk problem with newer vg docker image v1.5.0-2018-g71f96239-t119-run

Hi,

I just played around with the provided Dockerfile and, in the process, switched to a newer vg docker image, which results in the following error:

./vg/vg chunk -x ./internalData/snp1kg-BRAC1.vg.xg -a ./internalData/NA12878-BRCA1.gam.index -g -A -p 17:1-101 -T -E regions.tsv | ./vg/vg view -j - >c5717fe0-dc25-11e7-8804-b9ee87b476a9.json
received request for filenames
Error: ENOENT: no such file or directory, scandir './mountedData/'
    at Error (native)
    at Object.fs.readdirSync (fs.js:961:18)
    at app.post (/usr/src/app/app.js:209:6)
    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
    at next (/usr/src/app/node_modules/express/lib/router/route.js:137:13)
    at Route.dispatch (/usr/src/app/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/usr/src/app/node_modules/express/lib/router/layer.js:95:5)
    at /usr/src/app/node_modules/express/lib/router/index.js:281:22
    at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:335:12)
    at next (/usr/src/app/node_modules/express/lib/router/index.js:275:10)
err data: error:[vg chunk] context expansion steps must be specified with -c/--context when chunking on paths

My assumption is that there was a change in vg chunk's parameter handling logic. Unfortunately I am pretty new to vg and not sure how to set the correct value (in this case for -c). Could someone tell me what value I should provide?

Mouse-over node IDs are confusing with redundant nodes removed

Removing the redundant nodes gives it a cleaner look, but the way it presents the node IDs is confusing. I believe the mouse-over only shows the first node ID. I spent a while confused about this because I thought there were missing nodes in the visualization. Perhaps it would be better to make the mouse-over give a list of IDs, or maybe an ID range?
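The ID range suggestion could be sketched as a small formatter that collapses consecutive IDs. `formatNodeIds` is a hypothetical helper, and it assumes the frontend knows the numeric IDs of all nodes merged into the displayed one.

```javascript
// Collapse the IDs of merged nodes into a compact label:
// runs of consecutive IDs become "first-last", others are listed singly.
function formatNodeIds(ids) {
  const sorted = [...new Set(ids)].sort((a, b) => a - b);
  const parts = [];
  let start = null;
  let prev = null;
  const flush = () =>
    parts.push(start === prev ? `${start}` : `${start}-${prev}`);
  for (const id of sorted) {
    if (start === null) {
      start = prev = id;
    } else if (id === prev + 1) {
      prev = id;
    } else {
      flush();
      start = prev = id;
    }
  }
  if (start !== null) flush();
  return parts.join(", ");
}

console.log(formatNodeIds([12, 13, 14, 15, 18]));
```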

Make minor improvements to the tube map

Change the header name to a logo or something next to the controls
Default to gray haplotypes and red and blue reads
Hide the radius-based nature of the region extraction and support a start and length or start and end
Move download button to the top

mountedData is not created when following the README instructions

The backend code needs mountedData to exist, but there's no step to create it.

Ability to pull in haplotypes relevant to a context around a node

I've been looking at single reads. This involves throwing a node ID that the read touches into the tube map, and then manually panning left and right by adjusting the start node ID and distance until I can see the whole read.

If I set the start node to an ID that happens to be a SNP allele, instead of a fixed backbone node, I will only get the haplotypes that take that allele of the SNP, and the parts of the graph they touch. I may not see a particular allele at the next SNP, for example, because it is in perfect LD with the SNP I am visiting.

I want to see a bit more context; if I start on a SNP allele, I want to see the other allele of the SNP, and I definitely need to see all the alleles of downstream SNPs that my read might visit. If more haplotypes come into the view from other parts of the graph that aren't reachable going right from the node I started on, I want to see those haplotypes, too, even if I can't see their start nodes.

D3.js Version

Hi Wolfgang ;)

I hope you had a smooth transition into the new year :)

I was wondering if there is any particular reason why you are still using D3.js v3 instead of v4?

Best,
Simon

Handling of paths that leave and re-enter the displayed subgraph

Hello :)
I am very grateful for your tool, but I found some unexpected behavior.

trackDoubleClick on the last track sometimes fails like this.

Nodes which are not adjacent are shown as if they were concatenated on the path.

In this figure, the red and green lines seem to be concatenated, but the rank values in the input JSON show that they are not adjacent. The JSON was generated by vg find -x <xg> -P <path> -c <contexts> from our internal dataset.

{"name":"chr20","mapping":[{"position":{"node_id":22608},"rank":6042},{"position":{"node_id":22623},"rank":6057},{"position":{"node_id":22624},"rank":6058},{"position":{"node_id":22625},"rank":6059},{"position":{"node_id":22626},"rank":6060},{"position":{"node_id":22627},"rank":6061},{"position":{"node_id":22632},"rank":6066},{"position":{"node_id":22633},"rank":6067},{"position":{"node_id":22634},"rank":6068}]}

For the same reason, the scale shown on the path is not correct for the green and red paths.
In my opinion, a link between two nodes that are on the same path but not adjacent should be drawn as a dotted line.
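The non-adjacent links could be found by looking for jumps in rank between consecutive mappings. The sketch below uses the chr20 path from this report; `findRankGaps` is a hypothetical helper, not part of the tube map API.

```javascript
// Report every place where consecutive mappings in a path are not
// adjacent, judged by their rank values differing by more than one.
function findRankGaps(path) {
  const gaps = [];
  for (let i = 1; i < path.mapping.length; i++) {
    const prev = path.mapping[i - 1];
    const curr = path.mapping[i];
    if (curr.rank !== prev.rank + 1) {
      gaps.push({
        from: prev.position.node_id,
        to: curr.position.node_id,
        skipped: curr.rank - prev.rank - 1,
      });
    }
  }
  return gaps;
}

const chr20 = {
  name: "chr20",
  mapping: [
    { position: { node_id: 22608 }, rank: 6042 },
    { position: { node_id: 22623 }, rank: 6057 },
    { position: { node_id: 22624 }, rank: 6058 },
    { position: { node_id: 22625 }, rank: 6059 },
    { position: { node_id: 22626 }, rank: 6060 },
    { position: { node_id: 22627 }, rank: 6061 },
    { position: { node_id: 22632 }, rank: 6066 },
    { position: { node_id: 22633 }, rank: 6067 },
    { position: { node_id: 22634 }, rank: 6068 },
  ],
};
const gaps = findRankGaps(chr20);
console.log(gaps);
```

For this path the check flags the links 22608 to 22623 and 22627 to 22632, which are the ones that would be candidates for a dotted line.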

Moreover, I have several requests:

A callback that is called when the SVG is dragged left or right to move across genomic regions.
The ability to select an arbitrary color for each path.
A more general API with callbacks in place of nodeDoubleClick / trackDoubleClick.

Sincerely,

Pull in changes from Illumina fork

Some folks at Illumina have developed a tube map fork that understands Illumina's "paragraph" format, including node coloring for the reference path and support for transparency-based MAPQ visualization which would address #44.

We want to pull in their improvements, but since they forked off before we added some features (like softclip display and strand coloration), it might be a difficult merge to do. Also, it's not clear where exactly their code is located.

Reference paths sometimes extend one node further than haplotypes

When I look up a node by ID and draw so many nodes downstream of it, sometimes I will get this situation, where the reference path (top) extends one node further than the haplotypes (bottom). The haplotypes do extend into that next node in reality, but they are getting cut off one node before the reference path is by the chunking.

This is probably some kind of bug in the vg chunk code. @yoheirosen?

dist
├── apple-touch-icon.png
├── favicon.ico
├── fonts
│   ├── glyphicons-halflings-regular.eot
│   ├── glyphicons-halflings-regular.svg
│   ├── glyphicons-halflings-regular.ttf
│   ├── glyphicons-halflings-regular.woff
│   └── glyphicons-halflings-regular.woff2
├── images
│   └── logo.png
├── robots.txt
├── scripts
└── styles

I am not familiar with "gulp.js". Do you have any idea why it does not build the frontend?

command line version

I'm curious if we could have a version of this which would work on the command line, allowing us to pipe in the data and render out a (possibly static) visualization.

Also, the "oldindex" disappeared, https://vgteam.github.io/sequenceTubeMap/oldIndex.html. Is there any way to bring it back? It was very useful for rendering small graphs, much more legible than the vg view -d - | dot renderings.
