churchlab / millstone

Genome engineering and analysis software

Home Page: http://churchlab.github.io/millstone/

License: MIT License

Python 84.07% CoffeeScript 0.19% JavaScript 9.59% HTML 4.90% CSS 0.72% Shell 0.08% Jupyter Notebook 0.45%

millstone's People

Contributors

changpingc, dbgoodman, glebkuznetsov, kevinychen, tiffanyachen, woodymit

millstone's Issues

Support multiple ALTs

  • Add an is_primary key to the VariantAlternate
  • When the user is generating a ReferenceGenome from a VariantSet that contains Variants with multiple ALTs, the software must require the user to choose an ALT
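A minimal sketch of the "require a choice" rule, assuming Python 3. The VariantAlternate class below is a hypothetical stand-in for the real model (only alt_value and the proposed is_primary key are modeled), and choose_alt is an illustrative helper name, not existing code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VariantAlternate:
    # Hypothetical stand-in for the database model.
    alt_value: str
    is_primary: bool = False


def choose_alt(alternates: List[VariantAlternate],
               user_choice: Optional[str] = None) -> VariantAlternate:
    """Pick the ALT to apply when generating a new ReferenceGenome."""
    if not alternates:
        raise ValueError("Variant has no ALT alleles")
    if user_choice is not None:
        # An explicit user choice always wins, but it must be a real ALT.
        for alt in alternates:
            if alt.alt_value == user_choice:
                return alt
        raise ValueError("Chosen ALT %r is not an ALT of this Variant" % user_choice)
    if len(alternates) == 1:
        return alternates[0]
    # Several ALTs and no explicit choice: fall back to a unique primary ALT.
    primaries = [alt for alt in alternates if alt.is_primary]
    if len(primaries) == 1:
        return primaries[0]
    raise ValueError("Variant has multiple ALTs; the user must choose one")
```

With this shape, a Variant with multiple ALTs and no primary raises an error, which the UI could turn into a prompt.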

Merge Variants that affect the same codon

Each of the Variants might not be consequential by itself, or may not capture the full effect of having both.

However, this might be caught by the flow where we generate new reference genomes and do iterative analysis.
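As a rough illustration of finding candidates for merging, the sketch below groups variant positions by codon. It assumes a single forward-strand coding sequence with no introns and 1-based positions; group_by_codon and cds_start are hypothetical names.

```python
from collections import defaultdict


def group_by_codon(positions, cds_start):
    """Return {codon_index: [positions]} for codons hit by more than one variant.

    positions are 1-based genome coordinates; cds_start is the 1-based
    coordinate of the first base of the coding sequence.
    """
    groups = defaultdict(list)
    for pos in sorted(positions):
        if pos < cds_start:
            continue  # upstream of the coding sequence
        groups[(pos - cds_start) // 3].append(pos)
    # Only codons with two or more variants are candidates for merging.
    return {codon: hits for codon, hits in groups.items() if len(hits) > 1}
```

For example, with cds_start=100, positions 100 and 101 fall in the same codon (index 0) and would be reported together, while 104 and 110 would not.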

Structural variants

  • Identify a tool that works pretty well
  • Test it on test genomes with known deletions that we create
  • Integrate with our database representation

test_remove_variant_from_set fails sporadically

For some reason this test fails sometimes when running all the tests. It seems to always pass when run alone.

Perhaps there's a lapse in our understanding of how the Django test framework uses databases.

SnpEff field (INFO.EFF) is currently a long string, and needs to be split into multiple fields.

SnpEff VCF field currently looks like:
'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon | GenotypeNum [ | ERRORS | WARNINGS ] )'

as one long string. We need to split it up into many fields, and add them all as separate info fields in the VCF, by adding new methods to vcf_parser.py.

We could split them up in our data model only and leave the SnpEff VCF untouched, but that might be inconvenient, since other software might want a normalized VCF file.
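A sketch of the splitting step, using the field names from the format string above. It assumes multiple effects are comma-separated and that no field contains a comma or "|"; parse_eff and the "Extra" key (which collects any trailing ERRORS / WARNINGS values without guessing which is which) are hypothetical names, not part of vcf_parser.py.

```python
import re

# Mandatory sub-fields, in the order given by the SnpEff header string.
EFF_FIELDS = [
    "Effect_Impact", "Functional_Class", "Codon_Change", "Amino_Acid_change",
    "Amino_Acid_length", "Gene_Name", "Transcript_BioType", "Gene_Coding",
    "Transcript_ID", "Exon", "GenotypeNum",
]

_EFF_RE = re.compile(r"^(?P<effect>[^(]+)\((?P<body>.*)\)$")


def parse_eff_entry(entry):
    """Split one 'Effect(a|b|...)' entry into a dict of named fields."""
    match = _EFF_RE.match(entry.strip())
    if match is None:
        raise ValueError("Unrecognized EFF entry: %r" % entry)
    values = [value.strip() for value in match.group("body").split("|")]
    parsed = {"Effect": match.group("effect").strip()}
    parsed.update(zip(EFF_FIELDS, values))
    # Optional trailing ERRORS / WARNINGS values, kept unlabeled.
    parsed["Extra"] = values[len(EFF_FIELDS):]
    return parsed


def parse_eff(info_eff):
    """Parse a whole INFO.EFF value, which may hold several effects."""
    return [parse_eff_entry(e) for e in info_eff.split(",") if e.strip()]
```

Each resulting dict could then be written back as separate INFO fields or stored directly in the data model.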

Add putative variants for freebayes to look for when creating alignment

Freebayes can take in putative SNPs and look for them regardless of whether they are actually present. This will be useful with designed variants. We want to be able to take an existing variant set when performing a new alignment with samples, convert it to a VCF, and pass it to Freebayes with:

-@ --variant-input variant_set.vcf
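A sketch of how the call might be assembled once the variant set has been written out as a VCF. It only builds the argument list; build_freebayes_command and all file names are hypothetical. Depending on the freebayes version, the input VCF may need to be bgzip-compressed and tabix-indexed, which is not handled here.

```python
def build_freebayes_command(reference_fasta, bam_paths, putative_vcf, output_vcf):
    """Return the freebayes argument list, including the putative variants."""
    cmd = [
        "freebayes",
        "--fasta-reference", reference_fasta,
        "--variant-input", putative_vcf,  # long form of -@
        "--vcf", output_vcf,
    ]
    cmd.extend(bam_paths)  # alignments are passed as positional arguments
    return cmd
```

The returned list could be handed to subprocess.check_call(cmd) inside the alignment pipeline.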

Figure out how to handle the different reference genome IDs and name fields.

Each chromosome corresponds to a BioPython sequence record, which has many 'descriptive' fields, and different programs use different ones.

This has been causing problems. For instance, snpEff uses the GenBank LOCUS (not sure which SeqRecord field this corresponds to), while FASTA uses record.id (which comes from the GenBank ACCESSION), and our Django code previously used record.name internally.

For now, I'm just setting all of these to be equal. We need a unified scheme, with tests and assertions, that keeps them in agreement, so that the pipeline (GenBank -> FASTQ -> BAM -> VCF -> SnpEff) doesn't report missing chromosomes when, for instance, a GenBank LOCUS, a GenBank ACCESSION, and a FASTA record.id all differ.
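One possible shape for the assertion step is sketched below. Record is a hypothetical stand-in that mirrors only the id and name attributes of a BioPython SeqRecord, and treating record.id as the canonical identifier is an assumption for illustration, not a decision made in this issue.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring SeqRecord.id and SeqRecord.name.
Record = namedtuple("Record", ["id", "name"])


def find_id_mismatches(records):
    """Return (id, name) pairs for records whose identifiers disagree."""
    return [(rec.id, rec.name) for rec in records if rec.id != rec.name]


def unify_ids(records):
    """Return copies of the records with name forced to equal id."""
    return [rec._replace(name=rec.id) for rec in records]


def assert_ids_consistent(records):
    """Fail early, before the pipeline reports a missing chromosome."""
    mismatches = find_id_mismatches(records)
    assert not mismatches, "Inconsistent chromosome identifiers: %r" % mismatches
```

A check like assert_ids_consistent could run right after a genome is imported, so that every later stage sees one identifier per chromosome.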

Dynamic generate_filter_key_map() generation.

Different VCFs will have different fields, and so we will need to dynamically update the allowed VARIANT_CALLER_COMMON_MAPs and VARIANT_EVIDENCE_MAPs accordingly. We will add these as JSONFields to ReferenceGenomes, and update them whenever a new vcf is added, either by the user or by the pipeline.
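A sketch of deriving such maps from the VCF header itself. It assumes the usual "##INFO=<ID=...,Number=...,Type=...>" header layout; build_key_maps is a hypothetical name, and mapping INFO to the common-data map and FORMAT to the evidence map is an assumption about how the existing maps are split.

```python
import re

_HEADER_RE = re.compile(
    r"^##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),Type=([^,>]+)")


def build_key_maps(vcf_header_lines):
    """Collect INFO and FORMAT field definitions from VCF header lines.

    The result contains only dicts and strings, so it can be stored in a
    JSON field as-is.
    """
    key_maps = {"INFO": {}, "FORMAT": {}}
    for line in vcf_header_lines:
        match = _HEADER_RE.match(line)
        if match:
            section, key, number, value_type = match.groups()
            key_maps[section][key] = {"type": value_type, "num": number}
    return key_maps
```

Calling this each time a VCF is added, and merging the result into the maps stored on the ReferenceGenome, would keep the allowed filter keys in step with the data.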

Callable regions

Partition the genome into regions that can be called and regions that cannot, based on the following:

Raw notes

  • how the reads fall (read depth, uniqueness)
  • investigate whether GATK or something has a tool to do this
  • add feature annotations to JBrowse for such regions

Without the actual data

  • read len
  • distance between paired end reads
  • genome sequence

Empirical calling

  • Based on bam
  • Regions without reads (borders on structural variation (deletion) calling)
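For the empirical, BAM-based case, a minimal sketch is shown below. It assumes per-base read depths have already been extracted from the BAM into a list; uncallable_regions and min_depth are hypothetical names, and read uniqueness is not considered.

```python
def uncallable_regions(depths, min_depth=1):
    """Return 0-based, half-open (start, end) intervals with depth < min_depth."""
    regions = []
    start = None
    for pos, depth in enumerate(depths):
        if depth < min_depth:
            if start is None:
                start = pos  # a low-coverage run begins here
        elif start is not None:
            regions.append((start, pos))  # the run ended just before pos
            start = None
    if start is not None:
        regions.append((start, len(depths)))  # run extends to the end
    return regions
```

The resulting intervals could be written out as feature annotations for JBrowse, and the zero-depth ones double as deletion candidates.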

Allow merging AlignmentGroups

An AlignmentGroup is defined as a set of related alignments that can all have Variants called together. Specifically, all samples are aligned to the same ReferenceGenome.

It's possible that we run some alignments at different times, but we may want to be able to add subsequent alignment runs to an existing AlignmentGroup so that Variants can be called all together.
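A sketch of the merge rule this implies, assuming Python 3. The AlignmentGroup class here is a hypothetical stand-in that tracks only a reference genome identifier and a list of alignment identifiers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlignmentGroup:
    # Hypothetical stand-in for the database model.
    reference_genome: str
    alignments: List[str] = field(default_factory=list)


def merge_alignment_groups(target, other):
    """Return a new group holding the alignments of both inputs."""
    if target.reference_genome != other.reference_genome:
        raise ValueError("Cannot merge groups aligned to different ReferenceGenomes")
    merged = list(target.alignments)
    # Preserve order and skip alignments already present in the target.
    merged.extend(a for a in other.alignments if a not in merged)
    return AlignmentGroup(target.reference_genome, merged)
```

After a merge, variant calling would need to be re-run over the combined set of alignments.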

Analyze / act on variants in regions of the genome

Regions could be duplications, etc. We may want to revert all SNPs in a region, or ignore SNPs in a region, etc.

Regions where SNPs cannot be accurately called due to paralogous sequence (non-unique mapping) or no/low read depth should also be marked.
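A minimal sketch of acting on variants by region, assuming regions are 0-based, half-open (start, end) intervals on one chromosome and variants are represented by their positions only; both function names are hypothetical.

```python
def in_any_region(position, regions):
    """True if position falls inside at least one (start, end) interval."""
    return any(start <= position < end for start, end in regions)


def split_by_region(positions, regions):
    """Split variant positions into (inside, outside) the marked regions."""
    inside = [p for p in positions if in_any_region(p, regions)]
    outside = [p for p in positions if not in_any_region(p, regions)]
    return inside, outside
```

The "inside" list is what a revert or ignore action would operate on. For many regions, a sorted interval list with bisect, or an interval tree, would scale better than this linear scan.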

Support multiple VariantCallerCommonData objects in a view

Right now, when requesting the melted view with test data that includes a Variant that was both imported and called, I'm getting the error:

Traceback (most recent call last):
  File "/home/glebk/Projects/churchlab/genome-designer-v2/venv/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/glebk/Projects/churchlab/genome-designer-v2/venv/local/lib/python2.7/site-packages/django/contrib/auth/decorators.py", line 25, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/glebk/Projects/churchlab/genome-designer-v2/genome_designer/../genome_designer/main/xhr_handlers.py", line 93, in get_variant_list
    combined_filter_string, is_melted),
  File "/home/glebk/Projects/churchlab/genome-designer-v2/genome_designer/main/data_util.py", line 32, in lookup_variants
    variant_id_to_metadata_dict))
  File "/home/glebk/Projects/churchlab/genome-designer-v2/genome_designer/main/melt_util.py", line 29, in variant_as_melted_list
    "objects not implemented yet.")
AssertionError: Support for multiple VariantCallerCommonData objects not implemented yet.
