Comments (7)
Hi @XiaoyuZhan520 ,
Can you confirm that in the collapsed.group.txt file the PBID PB.10113.15 has multiple members? If so, then this discrepancy comes from collapsing multiple redundant sequences that have the same exonic structure but may vary slightly in the exact 5'/3' ends. The GFF file will display the longest 5' and 3' sites (which may be contributed by different collapsed members), but rep.fa shows just one of the sequences from that collapsed set of members. Hence the small difference.
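A quick way to check the member count is sketched below. It assumes that each line of collapsed.group.txt holds a PBID, a tab, and a comma-separated list of member IDs; the member IDs in the example are made up, and `read_group_members` is a hypothetical helper, not part of cDNA_cupcake.

```python
import io


def read_group_members(handle):
    """Map each PBID to the list of member IDs collapsed into it.

    Assumed line layout: <PBID><TAB><member1>,<member2>,...
    """
    groups = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        pbid, members = line.split("\t", 1)
        groups[pbid] = members.split(",")
    return groups


# Made-up example content; real member IDs will look different.
example = io.StringIO(
    "PB.10113.14\tread_a\n"
    "PB.10113.15\tread_b,read_c,read_d\n"
)
groups = read_group_members(example)
n_members = len(groups["PB.10113.15"])
print("PB.10113.15 has", n_members, "member(s)")
```

For a real file, replace the `io.StringIO` object with `open("your_prefix.collapsed.group.txt")`.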
Another possible cause of the difference, especially for the sequence you posted, is a mismatch between the transcript sequence and its mapped alignment. The end of the sequence is all "A"s, which may be untrimmed polyA sequence. These bases would not be mapped to the genome, so the exonic length would be shorter than the sequence itself.
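To see how much of a length difference an untrimmed tail could explain, one can simply count the run of "A"s at the 3' end. This is a minimal sketch with a made-up sequence; `trailing_a_length` is a hypothetical helper and does not tolerate sequencing errors inside the tail.

```python
def trailing_a_length(seq):
    """Return the length of the uninterrupted run of A's at the 3' end."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip("A"))


# Made-up transcript: 12 nt of body followed by a 25 nt polyA tail.
transcript = "ATGGCCTTGACG" + "A" * 25
tail = trailing_a_length(transcript)
body = len(transcript) - tail
print("tail length:", tail, "remaining length:", body)
```

If the tail length is close to the gap between the rep.fa sequence length and the exonic length in the GFF, untrimmed polyA is a plausible explanation.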
--Liz
from cdna_cupcake.
Hello,
Thanks for the quick answer.
Yes, indeed, PB.10113.15 was collapsed from multiple transcripts (around 20), and I can find an exactly identical one among the non-collapsed transcripts.
According to the information above, can I make use of the information from the collapsed.gff file to capture the transcript sequence from the genome and get the matching sequence?
Besides, could you please tell me the difference between the collapsed.gff and collapsed.gff.unfuzzy files, as well as between the collapsed.group.txt and collapsed.group.txt.unfuzzy files?
Hi @XiaoyuZhan520 ,
"can I make use of the information from collapsed.gff file and capture transcripts sequence from genome and get the matching sequence?"
Do you mean: if you reconstitute the sequence from the genome according to the GFF, will it match the representative sequence exactly (minus the small difference in length)?
Yes and no. It will match exactly, except that the PacBio transcript may contain SNPs not present in the reference genome. There may also be a very small number of residual errors.
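Reconstituting a transcript from GFF exon coordinates can be sketched as follows. This is an illustration on a toy chromosome, not code from cDNA_cupcake: it assumes 1-based inclusive exon coordinates (the GFF convention), and the helper names are hypothetical. The mismatch count is naive and assumes both sequences start at the same transcript position.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_sequence(chrom_seq, exons, strand):
    """Concatenate exon sequences; exons are (start, end), 1-based inclusive."""
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(exons)]
    seq = "".join(pieces)
    return revcomp(seq) if strand == "-" else seq


def count_mismatches(a, b):
    """Count differing positions over the shared prefix length (no alignment)."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Toy chromosome and a two-exon transcript.
chrom = "AAACCCGGGTTTACGT"
exons = [(1, 3), (7, 9)]
plus_seq = spliced_sequence(chrom, exons, "+")
minus_seq = spliced_sequence(chrom, exons, "-")

# A made-up representative sequence carrying one SNP relative to the genome.
rep_seq = "AAAGTG"
snps = count_mismatches(plus_seq, rep_seq)
print(plus_seq, minus_seq, snps)
```

With real data, the representative sequence and the genome-derived sequence may also differ in length at the 5'/3' ends, so a proper pairwise alignment is safer than a position-by-position comparison.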
Re: what the ".unfuzzy" files are
The collapse script has a --fuzzy_junction parameter. With it, two transcripts whose mapped exonic structures are the same except that the junctions differ by a small amount (default: 5 bp) are still considered identical and collapsed. Most likely, small junction differences (5 bp or less) are caused by residual errors that GMAP or other aligners did not align well.
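The idea of a fuzzy junction comparison can be illustrated with the sketch below. It is not the collapse script's actual implementation: `junctions` and `same_structure` are hypothetical helpers, exons are assumed to be (start, end) pairs on the same chromosome and strand, and the outer 5'/3' ends are deliberately ignored.

```python
def junctions(exons):
    """Return (donor_end, acceptor_start) pairs between consecutive exons."""
    exons = sorted(exons)
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def same_structure(exons_a, exons_b, max_fuzzy=5):
    """True if both transcripts have the same number of junctions and every
    junction coordinate differs by at most max_fuzzy bp."""
    ja, jb = junctions(exons_a), junctions(exons_b)
    if len(ja) != len(jb):
        return False
    return all(
        abs(end_a - end_b) <= max_fuzzy and abs(start_a - start_b) <= max_fuzzy
        for (end_a, start_a), (end_b, start_b) in zip(ja, jb)
    )


# Made-up coordinates.
t1 = [(100, 200), (300, 400)]
t2 = [(90, 203), (298, 420)]   # junctions off by 3 bp and 2 bp
t3 = [(100, 210), (300, 400)]  # first junction off by 10 bp

fuzzy_match = same_structure(t1, t2)
exact_match = same_structure(t1, t2, max_fuzzy=0)
far_match = same_structure(t1, t3)
print(fuzzy_match, exact_match, far_match)
```

In this toy case, t1 and t2 would be merged with a 5 bp tolerance but kept apart with a tolerance of 0.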
--Liz
Thank you very much. I really appreciate your help.
Hello Liz,
I find some of the collapse results quite confusing, and I hope you can give me some ideas.
For example, transcript PB.6923.1 is 1324 nucleotides long. It was mapped to the genome with 124 nucleotides perfectly mapped and 1200 nucleotides mapped to N (gap). This transcript passed the -c 0.85 -i 0.85 thresholds and was collapsed into one final transcript.
Do you think this behavior is reasonable?
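Cases like this can be flagged independently of the collapse step by measuring how much of the aligned reference span is N. The sketch below uses a synthetic reference span built to match the numbers above; `n_fraction` is a hypothetical helper, and how the aligner or the collapse script counts N positions toward coverage and identity is not something this thread states.

```python
def n_fraction(ref_seq):
    """Fraction of reference bases that are N (gap) in an aligned span."""
    if not ref_seq:
        return 0.0
    return ref_seq.upper().count("N") / len(ref_seq)


# Synthetic reference span: 124 resolved bases followed by 1200 N's.
aligned_ref = "ACGT" * 31 + "N" * 1200
frac = n_fraction(aligned_ref)
print(len(aligned_ref), round(frac, 3))
```

A transcript whose aligned span is mostly N could then be reviewed by hand or filtered out before downstream analysis.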
Hi @XiaoyuZhan520 ,
This is an odd case. Can you please share the following information:
- the collapsed (.rep.fq) fasta/fastq sequence for PB.6923.1
- the GFF annotation for PB.6923.1
- the line that starts with PB.6923.1 in *.collapsed.group.txt
You may share this privately by sending it to [email protected].
Thanks,
--Liz
Closing unless there is further information.