Add T2T positions? (about yleaf, OPEN)

genid avatar genid commented on August 29, 2024 1
Add T2T positions?

from yleaf.

Comments (4)

teepean avatar teepean commented on August 29, 2024 1

Snipsa scrapes snps from Yfull using function load_yfull_snp.

https://github.com/alinja/snipsa/blob/main/haploy.py

RandyHarr avatar RandyHarr commented on August 29, 2024

I had bounced this around with Thomas and Ted earlier; it is not specific to yLeaf. I was planning on maybe just using a liftover file (from UCSC) to get things back to Build 38 first. But then you miss out on the SNPs that are not in the Build 38 model. The other issue is finding them in the yFull tree. I have not seen them in the JSON tree file one can grab (yet, but maybe I missed them). But I thought they are using them there (unique T2T SNPs, not just ones that can be lifted over).
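To make the liftover concern concrete, here is a minimal sketch of how a UCSC-style chain maps positions, and why a position that falls in a gap has no Build 38 equivalent. It assumes a single forward-strand chain whose first coordinate set is the source assembly; the contig names, sizes and coordinates in the toy chain are invented for illustration and are not taken from any real liftover file.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    src_start: int  # 0-based start in the source assembly
    dst_start: int  # 0-based start in the destination assembly
    size: int       # length of the ungapped aligned block


def parse_chain(text: str) -> List[Block]:
    """Parse one forward-strand chain into its ungapped blocks."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0]
    if header[0] != "chain" or header[4] != "+" or header[9] != "+":
        raise ValueError("this sketch only handles a single +/+ chain")
    src, dst = int(header[5]), int(header[10])
    blocks = []
    for fields in lines[1:]:
        size = int(fields[0])
        blocks.append(Block(src, dst, size))
        if len(fields) == 3:  # "size dt dq": gaps before the next block
            src += size + int(fields[1])
            dst += size + int(fields[2])
    return blocks


def lift(blocks: List[Block], pos: int) -> Optional[int]:
    """Return the lifted 0-based position, or None if it is unmapped."""
    for b in blocks:
        if b.src_start <= pos < b.src_start + b.size:
            return b.dst_start + (pos - b.src_start)
    return None


# Toy chain (invented): two 80 bp blocks separated by a gap.
TOY_CHAIN = """
chain 1000 chrY_src 500 + 100 300 chrY_dst 450 + 50 230 1
80 40 20
80
"""
blocks = parse_chain(TOY_CHAIN)
lifted = {pos: lift(blocks, pos) for pos in (120, 200, 225)}
```

In this toy case, position 200 sits in the gap between the two blocks, so `lift` returns `None` for it. SNPs at such positions are the ones that would be lost by going back to Build 38 first.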

I rely on yBrowse for the definition of ySNPs. In general, there are more there than on the trees. But I am not sure how well Thomas has kept up to date with the T2T ones: not just the yFull-defined ones but also the FTDNA-defined SNPs from T2T. Hopefully the yFull tree does not have T2T SNPs that are not in yBrowse. The other hassle is that yBrowse only has HG002 v2 and not HG002 v2.7 SNPs / coordinates; the latter is what is in T2T v2 and what yFull uses. So it seems these issues need to be addressed somehow before expanding yLeaf's tables to handle T2T. Curious to hear more thoughts about this.
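One way to check whether a tree names SNPs that yBrowse lacks is a simple set difference. The sketch below is only an illustration: it assumes a nested JSON tree where each node has a comma-separated `snps` string (aliases joined by `/`) and a `children` list, which may not match the real file layout, and the SNP names and the `ybrowse_names` set are made up.

```python
import json


def tree_snps(node):
    """Collect every SNP name found in a nested haplogroup tree."""
    names = set()
    stack = [node]
    while stack:
        cur = stack.pop()
        # Assumed layout: "snps" is a string such as "M1/P1, M2".
        for field in (cur.get("snps") or "").split(","):
            for alias in field.split("/"):
                alias = alias.strip()
                if alias:
                    names.add(alias)
        stack.extend(cur.get("children") or [])
    return names


# Made-up tree and made-up SNP list, for illustration only.
tree = json.loads("""
{"id": "A", "snps": "M1/P1, M2", "children": [
    {"id": "A1", "snps": "Y100", "children": []},
    {"id": "A2", "snps": "", "children": [
        {"id": "A2a", "snps": "Z9"}
    ]}
]}
""")
ybrowse_names = {"M1", "P1", "M2", "Y100"}

# Names used on the tree but absent from the reference list.
missing = tree_snps(tree) - ybrowse_names
```

Here `missing` ends up holding only the invented name `Z9`. Against real data, a non-empty result would point to tree SNPs that still need definitions before the tables can be extended.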

For reference: https://github.com/marbl/CHM13 and more specifically https://www.ncbi.nlm.nih.gov/nuccore/CP086569.2/

HG002 v2 is CP086569.1, whose coordinates are used in yBrowse for SNPs. (I think Thomas created his own liftover file from Build 38.)

HG002 v2.7 and T2T v2 is CP086569.2, whose coordinates are the ones most are using.

Tree JSON files can be found via https://github.com/RandyHarr/JSON-Haplogroup-Tree-Parser (see the python code header)

And a reminder that T2T v2 only has the HG002 Y and not the HG002 X, so the PAR regions on Y do not directly relate to those in the T2T CHM13 X. Hence I wonder whether the HPP model, using HG002 X and Y and only the CHM13 autosomes, is a better one to align to. I have not seen any comparisons of this. Neither the Y PAR region nor any other region is masked out, as it would be in an analysis reference model like the HS / 1K Genome project models.

RandyHarr avatar RandyHarr commented on August 29, 2024

Snipsa scrapes snps from Yfull using function load_yfull_snp.

https://github.com/alinja/snipsa/blob/main/haploy.py

That appears to be using https://www.yfull.com/snp-list, which appears to contain only yFull identified / named SNPs. It is not clear how much overlap there is with the other lists. Their highest-numbered Y SNP is Y571495, which is about the length of the list, whereas there are over 1 million SNPs named in yBrowse. There is the separate YP names list (yfull.com/yp/snp-list), but that is very short: only about 7-8,000 names, it appears.
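The "highest number is about the list length" check can be sketched as follows. This is only an illustration with a made-up handful of names; it assumes yFull names have the form `Y` followed by digits, and it deliberately skips `YP` names and names from other labs.

```python
import re

# Assumed naming pattern: "Y" followed only by digits (e.g. Y571495).
Y_NAME = re.compile(r"^Y(\d+)$")


def highest_y_number(names):
    """Return the largest numeric suffix among Y-prefixed names."""
    numbers = []
    for name in names:
        match = Y_NAME.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=None)


# Made-up sample; a real run would read the downloaded SNP list.
sample = ["Y3", "Y571495", "YP4141", "M269", "Y12"]
top = highest_y_number(sample)
y_count = sum(1 for name in sample if Y_NAME.match(name))
```

If the numbering is sequential with few gaps, `top` and `y_count` should be close on the full list, which is the rough comparison made above.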

(On a side note, that code, like Hunter's (Cladefinder, etc.), is in Python 2, which requires a separate Python installation. The last patch update was in 2020, with no real development since 2011. It is difficult to find on all platforms anymore, I believe.)

stuartn60 avatar stuartn60 commented on August 29, 2024

It would be useful to have a new fit of the tree based on the three reference genomes, including in particular the recent DF27 CM034974.1. I've read that FTDNA and YFull have experimental T2T trees.
