dariober / asciigenome
Text Only Genome Viewer!
Home Page: http://asciigenome.readthedocs.io/en/latest/description.html
License: MIT License
Setting the awk filter prevents next and find from moving to features outside the current window. This is because the awk filter stores a set of intervals (in field IntervalFeatureTrack.awkFiltered) which are considered valid.
Reproduce:
ASCIIGenome -x 'bookmark && +1k && bookmark && +1k && bookmark && 1'
awk '$4 > -1' <- do not filter anything, always true.
next <- does not move!
awk <- remove filter
next <- now it does move
Solutions? Maybe instead of storing the set of features passing the filter, store the awk script itself. Every time the method featureIsVisible is computed, re-run the awk script and filter accordingly (this is quite expensive of course).
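The idea of keeping the filter rather than its result could be sketched roughly as below. This is only an illustration, not ASCIIGenome code: the class LazyFilterTrack and its methods are hypothetical, and a java.util.function.Predicate stands in for the awk script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: keep the filter itself, not the set of features that passed it.
final class LazyFilterTrack {
    private final List<String> allFeatures;           // every raw record of the track
    private Predicate<String> filter = line -> true;  // stands in for the awk script

    LazyFilterTrack(List<String> allFeatures) {
        this.allFeatures = new ArrayList<>(allFeatures);
    }

    void setFilter(Predicate<String> filter) {
        this.filter = filter;
    }

    // Re-evaluated on every call, so features outside the current window are never stale.
    boolean featureIsVisible(String line) {
        return filter.test(line);
    }

    // First visible feature at or after index `from`, or null if there is none.
    String nextVisible(int from) {
        for (int i = Math.max(0, from); i < allFeatures.size(); i++) {
            if (featureIsVisible(allFeatures.get(i))) {
                return allFeatures.get(i);
            }
        }
        return null;
    }
}
```

The trade-off is the one already noted: the filter runs again for every feature inspected, which costs time but cannot go out of date when the window moves.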
If you try to load bam files aligned to different genomes you get a rather misleading message:
ASCIIGenome fk045_hyd1.Lmaj.bam fk068_Ldono_fU1.ldon.bam
Initializing coordinates...
Unable to classify fk068_Ldono_fU1.ldon.bam; skipping <---
Here fk045_hyd1.Lmaj.bam is aligned to the Leishmania major genome and fk068_Ldono_fU1.ldon.bam to L. donovani, so it's fine to drop fk068 as the chromosomes don't match. The message (Unable to classify), however, is misleading.
Searching for seqRegex ^ gives:
Invalid coordinates: 1 0
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: exceptions.InvalidGenomicCoordsException
at tracks.IntervalFeature.validateIntervalFeature(IntervalFeature.java:190)
at tracks.IntervalFeature.intervalFeatureFromBedLine(IntervalFeature.java:139)
at tracks.IntervalFeature.<init>(IntervalFeature.java:57)
at samTextViewer.GenomicCoords.findRegex(GenomicCoords.java:757)
at samTextViewer.Main.main(Main.java:299)
... 5 more
If an input bam file doesn't have an index, you get a horrible stack trace. Fix it by giving a gentler message.
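One way to get a gentler message is to look for the index before opening the file. The sketch below assumes the two conventional index names (aln.bam.bai and aln.bai) and uses a hypothetical helper class BamIndexCheck; it is not how ASCIIGenome currently does it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: detect a missing bam index up front instead of failing later.
final class BamIndexCheck {

    // True if either conventional index file sits next to the bam (assumed naming).
    static boolean hasIndex(Path bam) {
        String name = bam.getFileName().toString();
        Path withSuffix = bam.resolveSibling(name + ".bai");                            // aln.bam.bai
        Path replaced = bam.resolveSibling(name.replaceAll("\\.bam$", "") + ".bai");    // aln.bai
        return Files.exists(withSuffix) || Files.exists(replaced);
    }

    static String friendlyMessage(Path bam) {
        return "Alignment file " + bam + " has no index. "
                + "Index it (e.g. with samtools index) and try again.";
    }

    public static void main(String[] args) {
        Path bam = Paths.get(args.length > 0 ? args[0] : "aln.bam");
        if (!hasIndex(bam)) {
            System.err.println(friendlyMessage(bam));
        }
    }
}
```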
Files in plain wig format are not handled correctly (bigWig ok). To load wig files you probably need to have the correct genome set since this is necessary to convert wig to bigWig. In the meantime one can use wigToBigWig from UCSC utils:
wigToBigWig in.wig chrom.sizes out.bw
When printing more features than set by print -n xxx, the string [n/m features omitted] is printed above everything and it's not very useful.
trim doesn't play very well with terminal window resizing. Try:
ASCIIGenome hg19_genes_head.gtf.gz -r chr1:11583-12359
Then execute trim -> ok. Then resize the terminal window to make it smaller and trim again. You get:
java.lang.IndexOutOfBoundsException: Index: 85, Size: 85
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.set(ArrayList.java:444)
at tracks.TrackIntervalFeature.printToScreenOneLine(TrackIntervalFeature.java:361)
at tracks.TrackIntervalFeature.printToScreen(TrackIntervalFeature.java:312)
at samTextViewer.TrackProcessor.iterateTracks(TrackProcessor.java:66)
at samTextViewer.InteractiveInput.processInput(InteractiveInput.java:325)
at samTextViewer.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Then resize the window again to make it larger, zoom out and go to another feature, and trim again: the window has become smaller.
You probably need to play with getUserWindowSize() in the trim method.
There are other issues with trim. Make this method undocumented for now.
Methods ansiForegroundColourToGraphicsColor(int ansiColor) and Color ansiBackgroundColourToGraphicsColor(int ansiColor) in coloring.ColorChar.java do not list all the colours available on screen (e.g. magenta, cyan). Add them.
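For illustration, a complete mapping for the eight standard ANSI colours might look like the sketch below. It assumes the standard SGR codes (30-37 foreground, 40-47 background); the real methods in ColorChar may use a different code scheme, and the class and method names here are made up.

```java
import java.awt.Color;

// Hypothetical mapping of the eight standard ANSI SGR colours to AWT colours.
final class AnsiColours {

    // Foreground codes 30-37; returns null for anything else.
    static Color foreground(int sgrCode) {
        switch (sgrCode) {
            case 30: return Color.BLACK;
            case 31: return Color.RED;
            case 32: return Color.GREEN;
            case 33: return Color.YELLOW;
            case 34: return Color.BLUE;
            case 35: return Color.MAGENTA;
            case 36: return Color.CYAN;
            case 37: return Color.WHITE;
            default: return null;
        }
    }

    // Background codes 40-47 are the foreground codes shifted by 10.
    static Color background(int sgrCode) {
        return foreground(sgrCode - 10);
    }
}
```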
It would be good to have a brew formula to install ASCIIGenome more easily.
(I've already written it, I'll send a pull request to linuxbrew in a few minutes)
There is a bug in v0.2.0 where in some cases calling next incorrectly prints chunks of fasta sequence (if a fasta file is available). This is because the method GenomicCoords.centerAndExtendGenomicCoords() changes the coordinates of the GenomicCoords object but it doesn't update the underlying sequence.
This has been fixed in the forthcoming v0.3.0, where refseq is no longer an attribute of GenomicCoords. Now the sequence is extracted fresh from the fasta file using the current coordinates, which gives you an up-to-date sequence.
addTracks should expand wild card characters, e.g. addTracks peaks/hela*.narrowPeak
*.bed and .*.bed.tbi. Change it to be *.bed.gz. It makes no difference but better to be consistent.
This should be enabled:
extend 10k
extend 10m
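Parsing such suffixed sizes could be done along these lines. This is a minimal sketch with a hypothetical class SizeSuffix, assuming k means 1,000 and m means 1,000,000.

```java
// Hypothetical parser for sizes such as "10k" or "10m" (k = 1,000; m = 1,000,000 assumed).
final class SizeSuffix {

    // Returns the size in bases; throws NumberFormatException on malformed input
    // and ArithmeticException if the result does not fit in a long.
    static long parse(String token) {
        String t = token.trim().toLowerCase();
        long multiplier = 1L;
        if (t.endsWith("k")) {
            multiplier = 1_000L;
            t = t.substring(0, t.length() - 1);
        } else if (t.endsWith("m")) {
            multiplier = 1_000_000L;
            t = t.substring(0, t.length() - 1);
        }
        return Math.multiplyExact(Long.parseLong(t), multiplier);
    }

    public static void main(String[] args) {
        System.out.println(parse("10k"));
        System.out.println(parse("10m"));
    }
}
```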
Version 0.6.4: It appears remote tabix files are first downloaded locally. I.e. the remote tabix index is not used!
As shown by infoTracks, the working file is a local one:
ASCIIGenome
addTracks ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/input_call_sets/ALL.wex.union_illumina_wcmc_bcm_bc_bi.20110521.snps.exome.sites.vcf.gz
infoTracks
infoTracks
------
Track tag: ALL.wex.union_illumina_wcmc_bcm_bc_bi.20110521.snps.exome.sites.vcf.gz#1
Input source: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/input_call_sets/ALL.wex.union_illumina_wcmc_bcm_bc_bi.20110521.snps.exome.sites.vcf.gz
Working file: /tmp/asciigenome.5141873215838905009.ALL.wex.union_illumina_wcmc_bcm_bc_bi.20110521.snps.exome.sites.vcf.gz
Track type: VCF
[h] for help:
ASCIIGenome -b test.bed -x 'zo 3 && FOOBAR && save .png' test.bed
Should exit immediately after hitting the FOOBAR command. Instead it keeps looping.
hideTitle applied to annotation files leaves a blank line in place of the title line instead of removing it altogether:
ASCIIGenome -g hg19 -x 'bookmark && hideTitle'
*-------------------110M----------------220M--
<- This line shouldn't be here at all
||||||||||||||||||||||||||||||||||||||||||||||
58 84 110 135 161
chr1:58-174; 117 bp; 2.5 bp/char; Mem: 1011 MB
[h] for help:
Quantitative data files are ok.
Say genome.fa does not have a .fai index attached. Command setGenome genome.fa should create the appropriate index but it doesn't:
ASCIIGenome
[h] for help: setGenome fk_hmu_pos_01.fa <-- Should create fai index and load genome
Adding a track with a dictionary should automatically set the genome if none is available. This doesn't happen:
ASCIIGenome
addTracks test_data/ds051.actb.bam
showGenome <- Sequence dictionary not available
Also setGenome:
setGenome hg19 <- OK, hg19 set
setGenome mm9 <- OK, replace with mm9
setGenome aln.hg19.bam <- Should replace mm9, but it doesn't!
Generation of consensus sequence should be aware of BS mode so it's easier to spot variations and methylation calls. For example, you could simply use lower case c, t, y (T|C), r (A|G) for calls where ref is C and G and upper case elsewhere.
If the dataCol command gets an invalid column (e.g. containing non-numeric data), ASCIIGenome exits with an ugly message. Fix by recovering gracefully.
This way I could more easily share the output of ASCIIgenome to others without including a legend.
Perhaps better if the separator is not :, but + for forward strand and - for reverse strand?
It appears TabixReader cannot read from ftp, see samtools/htsjdk#797. This means tabix files on ftp cannot be read remotely.
Oops, duplicate of issue #2
It would be hugely helpful if there were a way to list all sequences in the reference genome, and use autocompletion on goto (or :) commands.
The following steps lead to an uncaught exception:
ASCIIGenome hg19.gencode_genes_v19.gtf.gz
goto FOOBAR
[h] for help: addTracks ear045.oxBS.actb.bam
Error processing input: [ear045.oxBS.actb.bam]
[h] for help:
That's ok-ish, error could be more informative though.
goto chr1 && addTracks ear045.oxBS.actb.bam && p
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NullPointerException
at samTextViewer.GenomicCoords.getChromIdeogram(GenomicCoords.java:396)
at samTextViewer.TrackProcessor.iterateTracks(TrackProcessor.java:41)
at samTextViewer.Main.main(Main.java:141)
... 5 more
This is no good, although going to non existing locations repeatedly is something you don't normally do.
If a feature starts right at the start of the chrom, next doesn't see it:
cat /tmp/tmp.bed
Undefined_contig 0 255
Undefined_contig 1000 1155
Undefined_contig 2000 2155
Undefined_contig 3000 3155
Flip through the four positions:
ASCIIGenome /tmp/tmp.bed
next
next
next
next <- Should return to interval 0-255; instead it goes to 1000-1155
This works fine if 0-255 is replaced by 1-255
I'm on it: bioconda/bioconda-recipes#2125, but I will need further help/more time. Just in case somebody was thinking of trying the same.
Start without any sequence dictionary:
ASCIIGenome
Now set genome, you get an error, even if the position is correctly set:
setGenome ds051.actb.bam
Error processing tracks with input [ds051.actb.bam]
This is more annoying as the track is not loaded (first you need to move to an existing location, then add the track):
ASCIIGenome
addTracks ds051.actb.bam
There is a bit of confusion about how ASCIIGenome recognises GTF/GFF2/GFF3 format extensions. This is how it should be:
GTF (extension .gtf) and GFF2 (ext .gff2) are the same format. So .gff2 should be processed as gtf.
GFF3 (ext .gff3) to be processed as gff.
Extension .gff should be processed as if gff3 (?? check me!)
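The rules above could be expressed as a small lookup, sketched below. The class AnnotationFormat is hypothetical, stripping a trailing .gz is an extra assumption, and the .gff case follows the unconfirmed rule marked "check me".

```java
import java.util.Locale;

// Hypothetical classifier implementing the extension rules listed above.
final class AnnotationFormat {

    enum Kind { GTF, GFF, UNKNOWN }

    static Kind fromFileName(String fileName) {
        String n = fileName.toLowerCase(Locale.ROOT);
        if (n.endsWith(".gz")) {                  // assumption: compressed files keep the inner extension
            n = n.substring(0, n.length() - 3);
        }
        if (n.endsWith(".gtf") || n.endsWith(".gff2")) {
            return Kind.GTF;                      // GTF and GFF2 are the same format
        }
        if (n.endsWith(".gff3") || n.endsWith(".gff")) {
            return Kind.GFF;                      // .gff treated as gff3 (still to be checked)
        }
        return Kind.UNKNOWN;
    }
}
```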
The mutations in the VCF files are coloured and the colour leaks into the lines printed by print:
ASCIIGenome -r 1:1082518-1128598 -x 'print' CHD.exon.2010_03.sites.vcf.gz
CHD.exon.2010_03.sites.vcf.gz#1; N: 3
A T A
1 1105468 . G A . PASS AA=g;AC=2;AN=174;DP=1238 <- These lines are blue like the last A
1 1108138 rs61733845 C T . PASS AA=c;AC=6;AN=184;DP=1496
1 1110294 rs1320571 G A . PASS AA=A;AC=26;AN=206;DP=4605;HM2;HM3
1100950 1103524 1106099 1108673 1111247 1113822 1116396
1:1100950-1147030; 46,081 bp; 256.0 bp/char; Mem: 129 MB
ASCIIGenome gencode.v25.annotation.gtf
Initializing coordinates... Reading file 'gencode.v25.annotation.gtf'...Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at java.lang.String.replace(String.java:2239)
at tracks.IntervalFeature.intervalFeatureFromGtfLine(IntervalFeature.java:146)
at tracks.IntervalFeature.<init>(IntervalFeature.java:60)
at tracks.IntervalFeatureSet.loadFileIntoIntervalMap(IntervalFeatureSet.java:217)
at tracks.IntervalFeatureSet.<init>(IntervalFeatureSet.java:58)
at tracks.TrackIntervalFeature.<init>(TrackIntervalFeature.java:25)
at samTextViewer.Main.main(Main.java:201)
... 5 more
Perhaps you should add something to the docs about how to increase the memory/GC size? I have no idea how to do it.
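For a Java program the heap ceiling is normally raised with the standard JVM option -Xmx (for example -Xmx4g). How that option should be passed through the ASCIIGenome launcher is not stated here, so treat any concrete invocation as an assumption. The small, hypothetical class below only shows how to check which limit a JVM is actually running with.

```java
// Hypothetical helper: report the heap limit the current JVM was started with.
final class HeapReport {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it with and without an -Xmx setting shows whether the option took effect.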
Tried on OS X and Ubuntu.
I've tried running ASCIIGenome on a downloaded genome fasta file, and it returned an error as it was not indexed.
However this is not described in the documentation. Should I index it with samtools?
ASCIIGenome -fa Human.hg38.ucsc.FullSet.Nmasked.fa example.bed
Exception in thread "main" java.io.FileNotFoundException: /home/cbapps/Human.hg38.ucsc.FullSet.Nmasked.fa.fai not found.
at htsjdk.samtools.reference.IndexedFastaSequenceFile.findRequiredFastaIndexFile(IndexedFastaSequenceFile.java:110)
at htsjdk.samtools.reference.IndexedFastaSequen
How about accepting both?
https://genome.ucsc.edu/goldenpath/help/bedgraph.html
And I did not understand what the error was because immediately after an error message is displayed, the program opens, so you never see it.
It seems that download from behind a proxy doesn't work.
For example, running the following command doesn't return any output:
ASCIIGenome -g hg19 $encode/wgEncodeSydhTfbsGm10847NfkbTnfaIggrabPk.narrowPeak.gz \ $encode/wgEncodeSydhTfbsGm10847NfkbTnfaIggrabSig.bigWig \ $encode/wgEncodeSydhTfbsGm12892Pol2IggmusPk.narrowPeak.gz \ $encode/wgEncodeSydhTfbsGm12892Pol2IggmusSig.bigWig
The command seems to be stuck, and no message is shown. I assume it is trying to download the files, but no verbose output is shown. Since it is taking a lot of time, I am also guessing that it is getting no connection.
GFFs take a while to load, which is a bother since I want to visualize hundreds of regions. Therefore I am wondering whether tabix-indexed GFFs/GTFs are accepted? (Haven't been able to index any yet, seems like a hassle.)
An alternative way to solve this would be to have ASCIIGenome accept a bed file of regions to create images from, so that I would only need to load the input files once.
Hi,
I'm using ASCIIGenome to visualise how reads are assigned to contigs.
NEWBLER (OLC assembler) has generated a set of contigs, and I've used BWA MEM to align the reads to the contigs. I'm able to visualise this using a sorted .bam file.
Now I want to visualise the variants, so I used bcftools call to generate a .vcf file. However, when I use the addTracks command, the track does not show.
If I open just the vcf file on its own, it prints:
Cannot add file.vcf; skipping
Hi @dariober ,
I am starting to use ASCIIGenome and would like to produce screenshots with BS-colouring. I couldn't find the notation in the documentation to do that. Can you post an example here?
Example command-line I am using:
ASCIIGenome -ni -r chr9:136946101-123456781 -x "save /data/group/test/TEST_Run001/TEST_Run001-12345678/TST59_70_1b-12345678/TST59-70-1b_S1_L001_R1_001.1.cutB_bismark_bt2_se.dedupled.sorted.bam.chr9:136946101-123456781.ascii.pdf" /data/group/test/TEST_Run001/TEST_Run001-12345678/TST59_70_1b-12345678/TST59-70-1b_S1_L001_R1_001.1.cutB_bismark_bt2_se.dedupled.sorted.bam 1>/dev/null
This is my command:
ASCIIGenome -nf -ni PolII.bed PolII_tabs.bedgraph -fa ~/genomes/hg38.fa \
-x 'goto chr10:69223041-69223072 && save .png && goto chr3:196318127-196318155 && save .png && goto chr8:1755859-1755886 && save .png'
This is the output:
endrebak@havpryd ~/c/asciigenome> ls -latrh *png
-rw-r--r-- 1 endrebak endrebak 53K Aug 3 10:17 chr8_1755859-1755886.png
Note that
ASCIIGenome -nf -ni PolII.bed PolII_tabs.bedgraph -fa ~/genomes/hg38.fa \
-x 'goto chr10:69223041-69223072 && save .png' \
-x 'goto chr3:196318127-196318155 && save .png'
did not work either.
Note that if you fix this you fix #16 (creating many pngs in one go).
Sorry for the issue storm. I'll calm down now, but this seems pretty essential :)
For now I will use a loop and an indexed gtf file.
Need to protect against invalid range when coordinates > Integer.MAX_VALUE are requested. Reproduce:
But if: 1-2000000000 followed by zo you get
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.RuntimeException: Invalid range: 1--1294967295
at tracks.IntervalFeatureSet.getFeaturesInInterval(IntervalFeatureSet.java:78)
at tracks.TrackIntervalFeature.update(TrackIntervalFeature.java:30)
at samTextViewer.Main.main(Main.java:231)
... 5 more
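The negative end coordinate in the message above looks like int overflow. A guard could do the arithmetic in long and clamp the result, as in this sketch. The class SafeZoom is hypothetical, and extending by half a window on each side is an assumption about what zo does.

```java
// Hypothetical overflow-safe zoom-out: compute in long, then clamp to the int range.
final class SafeZoom {

    // Returns {from, to} after extending the window by half its size on each side,
    // clamped to [1, Integer.MAX_VALUE].
    static int[] zoomOut(int from, int to) {
        long half = ((long) to - from + 1) / 2;          // long arithmetic cannot wrap here
        long newFrom = Math.max(1L, (long) from - half);
        long newTo = Math.min((long) Integer.MAX_VALUE, (long) to + half);
        return new int[] { (int) newFrom, (int) newTo };
    }
}
```

With this, zooming out from 1-2000000000 stops at Integer.MAX_VALUE instead of wrapping to a negative end.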
Reproduce bug:
echo -e "chr1\t1000\t10000" > test.bed
ASCIIGenome -g hg19 -r chr1:1 test.bed
Then issue the next command; you get:
Invalid coordinates: from: -39499;to: 50501Resetting to initial 1-2147483647
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: exceptions.InvalidGenomicCoordsException
at tracks.IntervalFeatureSet.getFeaturesInInterval(IntervalFeatureSet.java:97)
at tracks.TrackIntervalFeature.update(TrackIntervalFeature.java:38)
at tracks.TrackIntervalFeature.<init>(TrackIntervalFeature.java:32)
at samTextViewer.GenomicCoords.findRegex(GenomicCoords.java:701)
at samTextViewer.Main.main(Main.java:294)
... 5 more
Reason:
If next is executed from position 1 (or thereabouts) and the first feature is close to 1, after extension left and right you get a negative start position, which is reset to 1-2147483647. If a genome file is present you then get this exception.
It appears tabix files from ftp cannot be read. For example, read a vcf file from 1000genomes:
import java.io.IOException;
import htsjdk.samtools.seekablestream.SeekableStream;
import htsjdk.samtools.seekablestream.SeekableStreamFactory;
import htsjdk.tribble.readers.TabixReader;
// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class FtpTabixTest {
    public static void main(String[] args) throws IOException {
        String ftp = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/input_call_sets/ALL.wex.union_illumina_wcmc_bcm_bc_bi.20110521.snps.exome.sites.vcf.gz";
        // Here it hangs:
        TabixReader tabix = new TabixReader(ftp);
        // In particular, this seems to hang:
        SeekableStream stream = SeekableStreamFactory
                .getInstance()
                .getBufferedStream(SeekableStreamFactory.getInstance()
                        .getStreamFor(ftp));
    }
}
Would be good to post on biostars/htsjdk list
I tried goto 1:123-456 but the magic words were goto chr1:123-456. Instead of showing a blank region, it should tell me there is no such chromosome.
Would be neat if it understood 1 to mean chr1 and vice versa.
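A lookup that tries the name as given and then the variant with or without the chr prefix might look like the sketch below. The class ChromAlias is hypothetical; an empty result is where a "no such chromosome" message would go.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical resolver: accept "1" for "chr1" and vice versa, or report no match.
final class ChromAlias {

    static Optional<String> resolve(String query, Set<String> knownChroms) {
        if (knownChroms.contains(query)) {
            return Optional.of(query);
        }
        // Try the alternative spelling: strip or add the "chr" prefix.
        String alt = query.startsWith("chr") ? query.substring(3) : "chr" + query;
        if (!alt.isEmpty() && knownChroms.contains(alt)) {
            return Optional.of(alt);
        }
        return Optional.empty();   // caller can tell the user there is no such chromosome
    }
}
```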
If an invalid bedgraph is loaded, an empty track is generated. An exception should be thrown instead.
echo -e "chr1 0 10\nchr2 0 10" > test.bedgraph ## note space instead of tab
ASCIIGenome test.bedgraph ## Gives empty track
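A per-line check that fails loudly could be sketched as follows. The class BedGraphLine is hypothetical; it assumes the four tab-separated bedGraph columns (chrom, start, end, value), so the space-separated lines above would be rejected.

```java
// Hypothetical validator: reject bedgraph lines that are not four tab-separated fields.
final class BedGraphLine {

    // Throws IllegalArgumentException if the line is not chrom<TAB>start<TAB>end<TAB>value.
    static void validate(String line) {
        String[] f = line.split("\t", -1);
        if (f.length < 4) {
            throw new IllegalArgumentException(
                    "Expected at least 4 tab-separated fields but got " + f.length + ": " + line);
        }
        try {
            long start = Long.parseLong(f[1]);
            long end = Long.parseLong(f[2]);
            Double.parseDouble(f[3]);              // value column must be numeric
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("Invalid coordinates: " + line);
            }
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Non-numeric start, end or value: " + line, e);
        }
    }
}
```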
The tabix index created via htsjdk may skip the first interval of the first contig under certain cases. See samtools/htsjdk#393 for an example.
While we wait for htsjdk to fix this, a temporary hack may be to add a dummy first record to the files susceptible to this bug.
Possibly todo: Rename filter to grep for consistency with Linux. Command API could be:
grep exon my.*bed
-v as in grep?
grep -v'mRNA exon my*.bed
Dunno if this is hard to fix. You only have to press enter to have the screen reappear though.
Also, pressing ctrl-z/ctrl-c/ctrl-d without meaning to is all too easy for me. Perhaps adding a prompt (Do you really want to exit) would be good?
If you try to load a single invalid file, you get an ugly error stack. Fix it by issuing a friendlier message. E.g.
ASCIIGenome du.tmp.txt
Initializing coordinates...
Could not initilize from file du.tmp.txt
Reading file 'du.tmp.txt'...First line skipped. Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.RuntimeException: intervalFeatureFromBedLine: Invalid bed line:
[20, /nas/sblab_data1/berald01/repository/bwameth/ear020_OXBS.ecoli_fastqc/Icons]
at tracks.IntervalFeature.intervalFeatureFromBedLine(IntervalFeature.java:115)
at tracks.IntervalFeature.<init>(IntervalFeature.java:57)
at tracks.IntervalFeatureSet.loadFileIntoIntervalMap(IntervalFeatureSet.java:217)
at tracks.IntervalFeatureSet.<init>(IntervalFeatureSet.java:58)
at tracks.TrackIntervalFeature.<init>(TrackIntervalFeature.java:25)
at samTextViewer.Main.main(Main.java:208)
... 5 more
Currently a white background is used for all text, which looks weird on terminals that have a different background color such as black (used by the super-hacker-y types). It would be great if the user could change the background color or make it transparent.
Just an idea, dunno if it is hard to implement or not.