
How to prepare the GTF? (rpvg issue, closed)

ld9866 commented on August 23, 2024
How do I prepare the GTF?

from rpvg.

Comments (7)

jonassibbesen commented on August 23, 2024

That depends on what you are interested in. If you want to create haplotype-specific transcripts from the haplotypes of those individuals, you would need to add them to the pangenome graph. However, their genotypes need to be phased for that.
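Whether a VCF is already phased can be checked before adding it to the graph: in the GT field, phased genotypes use "|" (e.g. 0|1) and unphased ones use "/" (e.g. 0/1). The sketch below is a minimal, hypothetical helper (the name count_phasing is illustrative, not part of rpvg or vg); it assumes a plain or gzip/bgzip-compressed VCF in which GT is the first FORMAT subfield, as the VCF specification requires when GT is present, and it only counts separators without validating the file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_phasing (hypothetical helper): report how many sample genotypes in a
# VCF are phased ("|") and how many are unphased ("/").
count_phasing() {
  local vcf="$1"
  # Decompress .gz input (bgzip output is gzip-compatible); read plain VCF as-is.
  case "$vcf" in
    *.gz) gzip -cd -- "$vcf" ;;
    *)    cat -- "$vcf" ;;
  esac | awk -F '\t' '
    /^#/ { next }                        # skip meta lines and the header line
    {
      for (i = 10; i <= NF; i++) {       # sample columns start at column 10
        split($i, f, ":")                # GT is assumed to be the first subfield
        if (index(f[1], "/")) unphased++
        else if (index(f[1], "|")) phased++
      }
    }
    END { printf "phased=%d unphased=%d\n", phased, unphased }
  '
}
```

For example, count_phasing cohort.vcf.gz (a hypothetical file name) printing a non-zero unphased count would indicate that the genotypes still need to be phased before haplotype-specific transcripts can be built from them.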


ld9866 commented on August 23, 2024

When we run the code, it shows "(standard_in) 1: syntax error", and we do not know how to solve the problem.

grep -P "\ttranscript\t" gencode.v29.primary_assembly.annotation_renamed_full.gtf | cut -f9 | cut -d ';' -f2 | cut -d '"' -f2 | uniq | shuf | head -n $(echo "172449 * ${1} / 100" | bc | cut -d '.' -f1) > transcripts_subset${1}.txt
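The message "(standard_in) 1: syntax error" is printed by bc. The command reads the percentage from ${1}, the first argument of the script it presumably comes from (subsample_transcripts.sh, which takes the percentage to keep). If the line is pasted into a terminal, or the script is run without an argument, ${1} is empty and bc receives "172449 *  / 100", which it cannot parse; passing the percentage, e.g. bash subsample_transcripts.sh 50, should avoid the error. The sketch below is a hypothetical guarded variant, not the repository's script: the function name subsample_transcripts and its optional second argument are illustrative, it counts the transcripts instead of hard-coding 172449, selects rows by column 3 with awk instead of grep -P, uses sort -u instead of uniq, and does the arithmetic in awk instead of bc (both truncate to an integer).

```shell
#!/usr/bin/env bash
set -euo pipefail

# subsample_transcripts (hypothetical): write a random subset of transcript IDs.
#   $1  percentage of transcripts to keep, e.g. 50 or 12.5 (required)
#   $2  GTF annotation (defaults to the file name in the original command)
subsample_transcripts() {
  # Stop with a usage message if $1 is missing; in the original one-liner an
  # empty ${1} turns the bc input into "172449 *  / 100" -> syntax error.
  local pct="${1:?usage: subsample_transcripts <percent_to_keep> [annotation.gtf]}"
  local gtf="${2:-gencode.v29.primary_assembly.annotation_renamed_full.gtf}"

  # Accept only plain non-negative numbers such as 50 or 12.5.
  if ! [[ "$pct" =~ ^[0-9]+([.][0-9]+)?$ ]]; then
    echo "error: percentage must be a number, got '$pct'" >&2
    return 1
  fi

  # Transcript IDs: the 2nd ';'-separated attribute of column 9 on rows whose
  # feature type (column 3) is "transcript", as in the original command.
  local ids
  ids=$(awk -F '\t' '$3 == "transcript" { print $9 }' "$gtf" \
        | cut -d ';' -f2 | cut -d '"' -f2 | sort -u)
  if [[ -z "$ids" ]]; then
    echo "error: no transcript rows found in $gtf" >&2
    return 1
  fi

  # Count IDs instead of hard-coding 172449; awk's %d truncates like bc | cut.
  local total keep
  total=$(printf '%s\n' "$ids" | wc -l)
  keep=$(awk -v t="$total" -v p="$pct" 'BEGIN { printf "%d\n", t * p / 100 }')

  # shuf -n draws $keep random lines, so no shuf | head pipe is needed.
  printf '%s\n' "$ids" | shuf -n "$keep" > "transcripts_subset${pct}.txt"
}
```

Calling subsample_transcripts 50 would write transcripts_subset50.txt in the current directory, while calling it with no argument stops with the usage message instead of a bc syntax error.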


jonassibbesen commented on August 23, 2024

Hi,

Are you interested in replicating the benchmark from the manuscript or do you have your own data that you want to run the pipeline on?

The scripts you are referring to are not really part of the standard pipeline. They were used specifically for the data and benchmarking presented in the manuscript:

  • preprocess.sh renames the contigs in the annotation (column 1) to match the genome used (removes the chr prefix) and filters out transcripts that are not full length.
  • exons.sh creates a BED file of exon coordinates from the annotation.
  • gene_transcripts.sh creates a table of gene and transcript names.
  • subsample_transcripts.sh creates a subset of the annotation by removing transcripts. It takes as input the percentage of transcripts that should be kept.
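For readers who want similar files for their own annotation, three of the steps listed above can be sketched in awk. These are hypothetical stand-ins, not the repository's preprocess.sh, exons.sh or gene_transcripts.sh: the function names are illustrative, the full-length transcript filter of preprocess.sh is left out because its exact criterion is not stated here, and the gene/transcript table assumes the gene_id and transcript_id attributes are the ones wanted. The BED conversion subtracts 1 from the start because GTF coordinates are 1-based and end-inclusive, while BED is 0-based and end-exclusive.

```shell
#!/usr/bin/env bash
set -euo pipefail

# awk helper shared by the functions below: value of attribute `key` in a GTF
# column 9 string such as: gene_id "G1"; transcript_id "T1";  ("" if absent).
ATTR_FN='function attr(s, key) {
  if (match(s, key " \"[^\"]+\""))
    return substr(s, RSTART + length(key) + 2, RLENGTH - length(key) - 3)
  return ""
}'

# Like the renaming part of preprocess.sh: drop a leading "chr" from column 1
# (chr1 -> 1). Comment lines are kept; note chrM becomes M, not MT.
strip_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }
       { sub(/^chr/, "", $1); print }' "$1"
}

# Like exons.sh: one BED6 line per exon (chrom, start, end, transcript, score, strand).
# GTF is 1-based and end-inclusive; BED is 0-based and end-exclusive -> start - 1.
gtf_exons_to_bed() {
  awk "$ATTR_FN"'
       BEGIN { FS = OFS = "\t" }
       $3 == "exon" { print $1, $4 - 1, $5, attr($9, "transcript_id"), 0, $7 }' "$1"
}

# Like gene_transcripts.sh: "gene_id<TAB>transcript_id" for each transcript row.
gene_transcript_table() {
  awk "$ATTR_FN"'
       BEGIN { FS = OFS = "\t" }
       $3 == "transcript" { print attr($9, "gene_id"), attr($9, "transcript_id") }' "$1"
}
```

Each function writes to standard output, so, for example, gtf_exons_to_bed annotation.gtf > exons.bed (hypothetical file names) would produce the exon BED file.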

Running these scripts is only needed if you want to exactly replicate the benchmark that was done in the manuscript. If you have your own data and just want to run the pipeline to get expression estimates, these scripts are not needed.


ld9866 commented on August 23, 2024

Hello developers!
Thank you for your patient and timely reply!
I built a 15-genome pan-genome using minigraph-cactus, and it came out well. However, I also want to add short-read sequencing data from 500 individuals to our pan-genome to make it more complete, in order to obtain more complete transcriptome information.
Is it necessary to add these individuals to our pan-genome?
Best regards,


ld9866 commented on August 23, 2024

Thank you for your reply!
We have one more small question.
Our genome is about the same size as the human genome. We started vg autoindex several days ago, and it still has not finished. How can we solve this problem? When we split up the chromosomes and analyzed them separately, we could not match them to the transcriptome, and that confused us.
Best regards,


jonassibbesen commented on August 23, 2024

I am unfortunately not able to answer that. You should ask the question on the vg GitHub.


ld9866 commented on August 23, 2024

Thank you for your reply.
Best wishes

