nlabrad / cims-cyanobacterial-its-motif-slicer
A tool to identify and extract the commonly used ITS folding motifs from a 16s-23s rRNA sequence.
License: GNU General Public License v3.0
In the organism with accession number KU574618.1, box B is being truncated. The program should return multiple possibilities, including the GCTG that is about 10 bp downstream of the one it currently selects.
I have found that these two genera (which are commonly examined) have "CCTCCTA" as the end of their 16S and "GACAA" as the beginning of their D1D1 region. Perhaps we can code these possibilities into the software, or even have a prompt that asks, "Is this a sequence from Synechococcus or Cyanobium?" when the software finds CCTCCTA instead of CCTCCTT.
In the most recent version, when the program runs into a sequence that doesn't contain the ITS region, it won't automatically move on to the next sequence when saving the output as a text file. This results in output files that are truncated and don't include possible ITS motifs that the program would otherwise have found.
I have tried with many fasta files and I am getting NO d1d1 output for any of them. It says "d1d1 not found in this sequence" every time.
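One way to accept both 16S endings is a single pattern with a character class at the last position. The sketch below is an assumption about how that could look in Python, not the tool's actual code; `END_OF_16S` and `find_16s_ends` are hypothetical names. It returns every match rather than only the first, so the caller can decide which one to use.

```python
import re

# Accept both reported 16S endings: CCTCCTT (the usual one) and CCTCCTA
# (reported above for Synechococcus and Cyanobium).
END_OF_16S = re.compile(r"CCTCCT[TA]")

def find_16s_ends(seq):
    """Return (index just past the motif, motif text) for every match."""
    return [(m.end(), m.group()) for m in END_OF_16S.finditer(seq.upper())]
```

The matched motif text is returned alongside the position so the caller can tell which variant was found, for example to decide whether to look for "GACAA" as the D1D1 start.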
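A minimal sketch of the behaviour being asked for, assuming the program loops over (name, sequence) pairs and writes to an open text file. `write_report` and the `find_its_motifs` callable are hypothetical stand-ins, not functions from the tool; the callable is assumed to return `None` when no ITS region is found.

```python
def write_report(records, find_its_motifs, out):
    """Write motifs for every record; skip records with no ITS region.

    records: iterable of (name, sequence) pairs.
    find_its_motifs: hypothetical callable returning a dict of
        label -> motif, or None when no ITS region is found.
    out: any writable text file object.
    Returns the names of the records that were skipped.
    """
    skipped = []
    for name, seq in records:
        out.write(f">{name}\n")
        motifs = find_its_motifs(seq)
        if motifs is None:
            out.write("ITS region not found\n")
            skipped.append(name)
            continue  # move on to the next sequence instead of stopping
        for label, motif in motifs.items():
            out.write(f"{label}: {motif}\n")
    return skipped
```

Returning the skipped names lets the caller print a summary at the end, so a missing ITS region is visible without cutting the output file short.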
Here are the two fasta files that were not working.
Program quit after first organism was processed.
Sometimes there are two D1D1 endings close together and the program takes the first one, but a few times (especially when the sequence was around 30-40 base pairs) it needed the second D1D1 ending. Sometimes the sequence will still fold nicely (but will only have one bubble), but usually the structure is broken. Some sequences I ran that had this issue were MT764787.1, MG255294.1, KF941246.1, and KF941239.1. I'm not sure whether the second one should always be chosen or only when the sequence is too short, but I saw it happen a few times.
The code worked and found boxB but couldn't find the leader or D1. D1 has the typical start and end sequences, so I think the cause may be an earlier "CCTCCTT" that occurs in the sequence.
accession number:
MT135015.1
Please set a limit on how long boxB can be. Currently there are times when 7 possible boxB sequences appear in the output, and some of them are way too long. Let's cap it at 80 for now. Thanks
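The cap could be applied as a simple filter over the candidate list before printing. This is only a sketch: `MAX_BOXB_LENGTH` and `cap_boxb_candidates` are hypothetical names, and it assumes candidates are plain strings.

```python
MAX_BOXB_LENGTH = 80  # cap requested above; easy to change later

def cap_boxb_candidates(candidates, max_length=MAX_BOXB_LENGTH):
    """Keep only candidate boxB sequences of at most max_length bases."""
    return [c for c in candidates if len(c) <= max_length]
```

Keeping the limit in one named constant means the "for now" value can be revised without touching the search code.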
'Tis all in the file below!
issue 1.docx
Sometimes it finds the beginning of boxB too early. There can be another true beginning for boxB a couple of base pairs away from the one it grabs. Can we return both possibilities to the user?
Can we change tRNA1 to tRNA_ile and change tRNA2 to tRNA_ala? Thanks!
Change layout of the project to be modular instead of a monolithic single file.
Ran this Rivularia ITS sequence (fasta file attached) through the script, and found that it correctly identified everything except boxB--which surprised me, because boxB in this sequence begins and ends with the familiar [CAGCA]...[TGCTG] pattern. But then I looked at the script and saw that it's looking for CAGCA(AorC) at the beginning of boxB, and wouldn't ya know it: this boxB begins with CAGCAT!
rivularia_input.txt
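A sketch of a more permissive boxB search, assuming the start pattern is widened from CAGCA(A or C) to also allow T, and that the end motif is TGCTG as described above. All names here are hypothetical and this is not the script's current code. Lookaheads are used so that overlapping motif positions are all reported, and every start/end pairing within the length limit is returned. The default limit of 80 mirrors the cap requested in another report.

```python
import re

# Assumption: start motif widened to CAGCA followed by A, C, or T.
BOXB_START = re.compile(r"(?=CAGCA[ACT])")
BOXB_END = re.compile(r"(?=TGCTG)")
MIN_BOXB_LENGTH = 11  # 6-base start motif plus 5-base end motif

def find_boxb_candidates(seq, max_length=80):
    """Return every start..end slice that fits the length limits."""
    seq = seq.upper()
    starts = [m.start() for m in BOXB_START.finditer(seq)]
    ends = [m.start() + 5 for m in BOXB_END.finditer(seq)]  # just past TGCTG
    return [
        seq[s:e]
        for s in starts
        for e in ends
        if MIN_BOXB_LENGTH <= e - s <= max_length
    ]
```

Returning all pairings leaves the choice to the user, which also covers the cases where the first start or end the program grabs is not the true one.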
Can't figure out why this boxB sequence couldn't be found in two Fischerella spp. (order Nostocales; both spp. have the same boxB sequence): TAGCATCTGAATGAAAATATTCAGGCTGCTG
Accession numbers for sequences: DQ786173.1 and DQ786171.1
Previous version(s?) of the code identified boxB correctly, but the current version identifies incorrect sequences as boxB; all of them begin with "TAGCA".
Accession numbers for sequences with this issue (all in order Nostocales):
KF417427.1
MK953008.1
MN15981.1
When using the -t flag, I found that the output says "tRNA1:" and "tRNA1:" instead of "tRNA1:" and "tRNA2:" or, better yet, "tRNAile:" and "tRNAala:".