churchlab / millstone
Genome engineering and analysis software
Home Page: http://churchlab.github.io/millstone/
License: MIT License
Each of the Variants might not be consequential by itself, or may not capture the full effect of having both.
However, this might be caught by the flow where we generate new reference genomes and do iterative analysis.
Every row seems to have an empty K/V pair {None:''}, not sure why. Here I remove it by hand: https://github.com/churchlab/genome-designer-v2/blob/master/genome_designer/scripts/import_util.py#L102
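One possible explanation, assuming the rows are read with Python's csv.DictReader: when a data row has more fields than the header (for example because of a trailing delimiter), DictReader stores the extra fields under its restkey, which defaults to None. The sketch below reproduces that behavior and strips the key. The sample data and the helper name drop_restkey are made up for illustration, and the extra value shows up here as a list rather than a bare string.

```python
import csv
import io


def drop_restkey(row):
    """Return a copy of a DictReader row without the None (restkey) entry."""
    return {key: value for key, value in row.items() if key is not None}


# Hypothetical tab-delimited data: the data row ends with a stray tab,
# so it has one more field than the header does.
data = "sample_name\tread_1\ns1\treads.fq\t\n"

rows = list(csv.DictReader(io.StringIO(data), delimiter="\t"))
cleaned = [drop_restkey(row) for row in rows]
```

If this is the cause, fixing the trailing delimiter in the template would remove the need to strip the key by hand.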
This can be remedied by overriding the delete() method for models that own filesystem data so that the respective filesystem data is deleted along with the model.
Also, it might make sense to make a test filesystem location.
Giving this to you Dan since you probably know how to do this after mucking around with datatables more than I have.
For example, we want to trace the lineage of a particular SNP.
This should probably be some sort of spreadsheet template/upload flow like the sample upload.
Probably want to separate the view into 3 tabs:
Starting to sketch this in the Google Powerpoint Mocks:
https://docs.google.com/presentation/d/1xIEn89cz6Pw1r-Ap8Cn5Olr9Xz_0_S1cwYoSAR12myw/edit
For some reason this test fails sometimes when running all the tests. It seems to always pass when run alone.
Perhaps there's a lapse in our understanding of how the Django test framework uses databases.
SnpEff VCF field currently looks like:
'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon | GenotypeNum [ | ERRORS | WARNINGS ] )'
as one long string. We need to split it up into many fields, and add them all as separate info fields in the VCF, by adding new methods to vcf_parser.py.
We could just split them up in our data model only and not touch the snpeff VCF, but it might be annoying since other software might want a normalized VCF file.
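A minimal sketch of the splitting step, in plain Python rather than as actual vcf_parser.py methods. The sub-field names come from the header string quoted above. The function names, the example value, and the assumption that multiple effects are comma-separated with no commas inside the parentheses are illustrative only.

```python
import re

# Sub-field names, in order, taken from the SnpEff header string above.
EFF_SUBFIELDS = [
    "Effect_Impact", "Functional_Class", "Codon_Change", "Amino_Acid_change",
    "Amino_Acid_length", "Gene_Name", "Transcript_BioType", "Gene_Coding",
    "Transcript_ID", "Exon", "GenotypeNum", "ERRORS", "WARNINGS",
]

# Matches "Effect(sub|fields|...)".
EFF_PATTERN = re.compile(r"^(?P<effect>[^()]+)\((?P<body>.*)\)$")


def parse_eff_entry(entry):
    """Split one 'Effect(a|b|...)' entry into a dict of named fields."""
    match = EFF_PATTERN.match(entry.strip())
    if match is None:
        raise ValueError("Unrecognized EFF entry: %r" % entry)
    values = [value.strip() for value in match.group("body").split("|")]
    parsed = {"Effect": match.group("effect").strip()}
    # zip() stops at the shorter list, so the optional trailing
    # ERRORS / WARNINGS keys are simply absent when not reported.
    parsed.update(zip(EFF_SUBFIELDS, values))
    return parsed


def parse_eff_field(eff_value):
    """Parse a whole EFF value; assumes entries are comma-separated."""
    return [parse_eff_entry(entry) for entry in eff_value.split(",") if entry]


# Hypothetical example value, not taken from a real VCF.
example = ("NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|aTg/aCg|M1T|300|geneA|"
           "protein_coding|CODING|tx1|1|1)")
effects = parse_eff_field(example)
```

Each resulting key could then be written back as its own INFO field, or kept only in the data model, depending on which of the two options above we choose.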
Also add more tests
As of 8f99e2f the tests are broken, even after Dan added missing files.
For some reason, the tests pass when run as part of the suite, but fail when running only the test_variant_filter.py module.
I am debugging now.
Three levels of implementation:
This should be implemented by using our application-level (in python) database view objects.
Changping found out that his test was failing due to Rscript missing.
Freebayes can take in putative snps and look for them regardless of their actual presence/absence. This will be useful with designed variants. We want to be able to input an existing variant set when performing a new alignment with samples, turn that into a VCF, and send it to Freebayes with:
-@ --variant-input variant_set.vcf
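As a rough sketch, the command could be assembled like this before being handed to the pipeline. Only --variant-input comes from the note above; the other long options (--fasta-reference, --bam, --vcf) are the ones I would expect from freebayes and should be checked against `freebayes --help` for the installed version. The function name and file paths are hypothetical.

```python
def build_freebayes_command(ref_fasta, bam_paths, variant_input_vcf, output_vcf):
    """Build an argv list for freebayes; this does not run anything."""
    cmd = ["freebayes", "--fasta-reference", ref_fasta]
    for bam_path in bam_paths:
        cmd.extend(["--bam", bam_path])
    # Putative variants to evaluate regardless of whether they are observed.
    cmd.extend(["--variant-input", variant_input_vcf])
    cmd.extend(["--vcf", output_vcf])
    return cmd


# Hypothetical paths, for illustration only.
command = build_freebayes_command(
    "ref.fa", ["sample_1.bam", "sample_2.bam"], "variant_set.vcf", "calls.vcf")
```

The resulting list can be passed to subprocess.check_call once the VCF for the designed variant set has been written.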
Right now these filter operators are applied on the level of Variants without regard to the 'sample_variant_set_association' field of VariantToVariantSet.
Let's obliterate 2 birds with one stone here and get this to work straight on AWS.
Each chromosome corresponds to a BioPython sequence record, which has many 'descriptive' fields, and different programs use different ones.
This has been causing problems. For instance, snpEff uses the genbank LOCUS (not sure which SeqRecord field this corresponds to), while FASTA uses record.id (which comes from genbank ACCESSION), and internally our django code previously used record.name.
For now, I'm just setting all these to be equal, but we need to come up with a unified scheme, including tests and assertions, for making these all agree so the pipeline (genbank -> FastQ -> Bam -> VCF -> SnpEff) doesn't report missing chromosomes when, for instance, a genbank LOCUS, a genbank ACCESSION, and a FASTA record.id are all different.
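A minimal sketch of the kind of assertion this calls for, independent of BioPython: collect whatever identifier each stage sees for a chromosome and fail loudly if they differ. The function name and the dictionary keys are hypothetical.

```python
def assert_chromosome_ids_agree(ids_by_source):
    """Raise AssertionError unless every source reports the same identifier."""
    distinct = set(ids_by_source.values())
    assert len(distinct) == 1, (
        "Chromosome identifiers disagree: %s" % sorted(ids_by_source.items()))


# Hypothetical identifiers as seen by each part of the pipeline.
assert_chromosome_ids_agree({
    "genbank_locus": "chrom_1",
    "genbank_accession": "chrom_1",
    "fasta_record_id": "chrom_1",
})
```

Running a check like this at import time would surface a mismatch before snpEff reports a missing chromosome further down the pipeline.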
These files are created and not cleaned up:
genome_designer/snpEff_genes.txt
genome_designer/snpEff_summary.html
Thus you can save a link to a particular view and navigate there without having to go through the whole series of button clicks to recreate a filter.
Also, I realized that this tab would also control variant calling, so perhaps the tab should be called something other than 'Align'?
Different VCFs will have different fields, and so we will need to dynamically update the allowed VARIANT_CALLER_COMMON_MAPs and VARIANT_EVIDENCE_MAPs accordingly. We will add these as JSONFields to ReferenceGenomes, and update them whenever a new vcf is added, either by the user or by the pipeline.
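One way to discover the available fields is to read the ##INFO lines in the VCF header. The sketch below uses only the standard library. The function names and the shape of the resulting map are assumptions, not the actual format of VARIANT_CALLER_COMMON_MAP.

```python
import re

INFO_LINE = re.compile(r"^##INFO=<(?P<body>.*)>$")
# key=value pairs, where a value may be a double-quoted string with commas.
KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_info_header(vcf_lines):
    """Map each INFO ID in a VCF header to its declared Type and Number."""
    field_map = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = INFO_LINE.match(line)
        if match is None:
            continue
        attrs = {key: value.strip('"')
                 for key, value in KEY_VALUE.findall(match.group("body"))}
        field_map[attrs["ID"]] = {
            "type": attrs.get("Type"), "num": attrs.get("Number")}
    return field_map


def merge_field_maps(existing, new):
    """Return a new map containing the existing fields plus any new ones."""
    merged = dict(existing)
    merged.update(new)
    return merged


# Hypothetical header lines, for illustration only.
header = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth, all samples">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
fields = merge_field_maps({}, parse_info_header(header))
```

The merged dictionary is plain JSON-serializable data, so it could be stored directly in a JSONField on the ReferenceGenome.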
Partitioning the genome into regions that can be called and those that can't based on:
This happens only on the initial load of Variant data, but not on subsequent paginated server-side loads:
Uncaught TypeError: Cannot read property '_iDisplayStart' of null
An AlignmentGroup is defined as a set of related alignments that can all have Variants called together. And specifically, all samples are aligned to the same ReferenceGenome.
It's possible that we run some alignments at different times, but we may want to be able to add subsequent alignment runs to an existing AlignmentGroup so that Variants can be called all together.
Regions could be duplications, etc. We may want to revert all SNPs in a region, or ignore SNPs in a region, etc.
Regions where SNPs cannot be accurately called due to paralogous sequence (non-unique mapping) or no/low read depth should also be marked.
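For the read-depth criterion alone, marking regions could be sketched as below: given a per-position depth list, return the intervals that meet a minimum depth. The function name, the threshold, and the half-open interval convention are assumptions. Paralogous (non-unique mapping) regions would need a separate signal such as mapping quality.

```python
def callable_intervals(depths, min_depth):
    """Return half-open (start, end) intervals where depth >= min_depth."""
    intervals = []
    start = None
    for position, depth in enumerate(depths):
        if depth >= min_depth:
            if start is None:
                start = position  # open a new callable interval
        elif start is not None:
            intervals.append((start, position))  # close the current interval
            start = None
    if start is not None:
        intervals.append((start, len(depths)))
    return intervals


# Hypothetical per-position read depths.
regions = callable_intervals([0, 5, 12, 15, 3, 0, 20, 20], min_depth=10)
```

Everything outside the returned intervals would be marked as not callable.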
This is not trivial because, while Django has lookup suffixes for most comparison operators (e.g. <= is __lte), it does not have one for != (there is no __ne). Instead, clients should use Q objects.
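A minimal sketch of how the filter parser might handle this, kept free of Django imports so it runs on its own. The function and constant names are hypothetical. With Django available, the caller would build Q(**lookup_kwargs) and apply ~ to it when the negate flag is set.

```python
# Comparison operators that map directly onto Django lookup suffixes.
OPERATOR_TO_LOOKUP = {
    "=": "",
    "==": "",
    "<": "__lt",
    "<=": "__lte",
    ">": "__gt",
    ">=": "__gte",
}


def to_lookup(field, operator, value):
    """Return (lookup_kwargs, negate) for a single filter condition."""
    if operator == "!=":
        # No __ne suffix exists: express it as a negated equality instead.
        return {field: value}, True
    suffix = OPERATOR_TO_LOOKUP[operator]
    return {field + suffix: value}, False


lookup_kwargs, negate = to_lookup("position", "<=", 100)
```

Returning the negate flag keeps the operator handling in one place instead of special-casing != at every call site.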
Right now, when requesting the melted view with test data that includes a Variant that was both imported and called, I'm getting the error:
Traceback (most recent call last):
  File "/home/glebk/Projects/churchlab/genome-designer-v2/venv/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/glebk/Projects/churchlab/genome-designer-v2/venv/local/lib/python2.7/site-packages/django/contrib/auth/decorators.py", line 25, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/home/glebk/Projects/churchlab/genome-designer-v2/genome_designer/../genome_designer/main/xhr_handlers.py", line 93, in get_variant_list
    combined_filter_string, is_melted),
  File "/home/glebk/Projects/churchlab/genome-designer-v2/genome_designer/main/data_util.py", line 32, in lookup_variants
    variant_id_to_metadata_dict))
  File "/home/glebk/Projects/churchlab/genome-designer-v2/genome_designer/main/melt_util.py", line 29, in variant_as_melted_list
    "objects not implemented yet.")
AssertionError: Support for multiple VariantCallerCommonData objects not implemented yet.