
usegalaxy-eu / galaxy


This project forked from galaxyproject/galaxy


Data intensive science for everyone.

Home Page: https://galaxyproject.org/

License: Other



galaxy's Issues

Stats issue to be reconsidered for the beacon integration

Copied this over from #131.
Once the beacon integration goes live and sees some use, the limitations described here should be revisited.


The import method uses the following INFO fields:

  • AN maps to callCount in the beacon DB. Falls back to 2 * num_called (num_called is calculated from the VCF).
  • AF maps to frequency in the beacon DB. Falls back to AC / AN.
  • VT maps to varianttype in the beacon DB. The database field is nullable, so the import still works without it.
  • AC maps to alleleCount in the beacon DB. The import breaks when it is missing (for this dataset), so I added a line that makes AC required.
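
The mapping and fallbacks above can be sketched roughly as follows. This is a minimal illustration, not the beacon-python code: the function name `resolve_stats` and the flat `info` dict with scalar values are assumptions (real VCF INFO fields such as AC and AF can hold one value per alternate allele).

```python
from typing import Any, Dict


def resolve_stats(info: Dict[str, Any], num_called: int) -> Dict[str, Any]:
    """Map VCF INFO fields to beacon DB columns (hypothetical helper)."""
    # AC has no fallback, so it is treated as required.
    if "AC" not in info:
        raise ValueError("AC INFO field is required")
    allele_count = info["AC"]

    # AN falls back to 2 * num_called.
    call_count = info.get("AN", 2 * num_called)

    # AF falls back to AC / AN (None if there are no calls at all).
    if "AF" in info:
        frequency = info["AF"]
    else:
        frequency = allele_count / call_count if call_count else None

    return {
        "alleleCount": allele_count,
        "callCount": call_count,
        "frequency": frequency,
        # VT is optional because the DB column is nullable.
        "variantType": info.get("VT"),
    }
```

Called with only AC present, for example `resolve_stats({"AC": 2}, num_called=4)`, both fallbacks apply and the variant type stays empty.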

There is an option min_ac for filtering out variants that were seen fewer than a minimum number of times (1 by default). I currently set this to 0; setting it to anything higher than 0 also breaks the import for anything that does not contain VT (and maybe others too).

The import is a bit "python-esque" 😅

It has an _unpack method that reads the INFO fields into nested lists.
While inserting variants, list entries are accessed purely by index, which leads to "index out of bounds" exceptions whenever something is not set.
A try/catch block wraps the whole for each variant in variants loop and catches these exceptions, so each failure cancels 1000 variant imports at a time.
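
One way an importer could avoid losing a whole batch is to guard index access and catch errors per variant rather than around the loop. The sketch below only illustrates that idea; `get_or_none`, `import_variants` and the `insert_one` callback are hypothetical names, not part of beacon-python.

```python
from typing import Any, Callable, Iterable, List, Sequence, Tuple


def get_or_none(values: Sequence[Any], index: int) -> Any:
    """Return values[index], or None when the entry is not set."""
    return values[index] if 0 <= index < len(values) else None


def import_variants(
    variants: Iterable[Any],
    insert_one: Callable[[Any], None],
) -> Tuple[int, List[Tuple[Any, Exception]]]:
    """Insert variants one by one; a bad variant is skipped, not the batch."""
    imported = 0
    failed: List[Tuple[Any, Exception]] = []
    for variant in variants:
        try:
            insert_one(variant)
            imported += 1
        except (IndexError, KeyError) as exc:
            # Keep the offending variant so it can be reported afterwards.
            failed.append((variant, exc))
    return imported, failed
```

With this shape, a variant with a missing entry ends up in `failed` while the remaining variants of the batch are still imported.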


There is also a whole other block that looks for the SVTYPE and MATEID INFO fields. I just never had any data with variant.is_sv == true.


On another note, the same variant is never added twice due to ON CONFLICT (datasetId, chromosome, start, reference, alternate) DO NOTHING.
In an ideal world we would increment sample and allele counts and recalculate the allele frequency.

But I'd argue that it's not that big of a deal, since the datasets uploaded by users are arbitrary and allele frequency across this data therefore does not mean much anyway.
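
If the counts were ever to be merged instead, the DO NOTHING clause could become a DO UPDATE that adds the counts and recomputes the frequency. The sketch below uses Python's built-in sqlite3 module as a stand-in for the beacon DB; the table layout, the helper names `make_db` and `upsert_variant`, and the reduced column set are assumptions for illustration. Summing counts is only meaningful if the uploads contain different samples.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO variants
    (datasetId, chromosome, start, reference, alternate,
     alleleCount, callCount, frequency)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (datasetId, chromosome, start, reference, alternate) DO UPDATE SET
    alleleCount = variants.alleleCount + excluded.alleleCount,
    callCount = variants.callCount + excluded.callCount,
    frequency = CAST(variants.alleleCount + excluded.alleleCount AS REAL)
                / (variants.callCount + excluded.callCount)
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with only the columns named above (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE variants (
            datasetId TEXT, chromosome TEXT, start INTEGER,
            reference TEXT, alternate TEXT,
            alleleCount INTEGER, callCount INTEGER, frequency REAL,
            PRIMARY KEY (datasetId, chromosome, start, reference, alternate)
        )
        """
    )
    return conn


def upsert_variant(conn, dataset_id, chromosome, start, reference, alternate,
                   allele_count, call_count):
    """Insert a variant, or merge its counts into the existing row."""
    frequency = allele_count / call_count if call_count else None
    conn.execute(
        UPSERT_SQL,
        (dataset_id, chromosome, start, reference, alternate,
         allele_count, call_count, frequency),
    )
```

Inserting the same variant twice then leaves a single row whose counts are the sums of both uploads and whose frequency is recomputed from those sums.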


TL;DR

Had to add the AC INFO field as a requirement.

The import routine that comes with beacon-python was written for a specific kind of dataset. It does the job for now, but if the feature sees some use we will write our own importer to handle all kinds of data (as suggested in the docs).
