Comments (8)
@gokeson this one is for you!
from ctgap.
- Create representative reference set to use as a whole genome backbone
Do you mean reference strains? If so, we have 20 of these, and they are the same as the individual ctReferences.
- Same for ompA gene only
We already have this sorted, right? I mean all the ompA sequences that we use for BLAST.
- Create bed file of recombination regions to mask
I think we should use ClonalFrameML after running kSNP. ClonalFrameML does not need a BED file; it identifies and filters recombinant sites itself. I favour this approach mainly because recombination in Ct is ongoing. That is actually how some new strains arise.
The alternative is Gubbins, which requires a prior alignment to a reference genome with SKA. I don't think this is a bad approach, as all 20 strains would be aligned to our chosen reference genome, most likely strain D since it is the best-annotated genome.
- Use kSNP for generating reference-free SNPs
Here are my notes for kSNP:
#Put all of your reference sequences in one folder (all must be FASTA)
#Create the list of input files for kSNP4:
MakeKSNP4infile -indir (folder) -outfile myInfile
#Find the optimal k-mer length for kSNP4 to use:
Kchooser4 -in myInfile
#Run kSNP4 (-k 13 here; use the value that Kchooser4 reports for your data):
kSNP4 -in myInfile -outdir name_of_output_directory -k 13 -core -ML -vcf
#-core reports the SNPs shared by all samples, in addition to all SNPs.
#-vcf writes a VCF of your SNPs: what is found where, and the reference base versus the observed base.
#-ML builds a maximum likelihood tree. This is what we want, since maximum parsimony only picks the tree that requires the minimum number of changes.
#The output contains many files, but we care about SNPs_all_matrix (the alignment file) and the ML .tre file (the tree file).
#After kSNP4, correct the tree with ClonalFrameML to account for recombinant regions:
#Why run ClonalFrameML? Masking recombinant regions is standard practice when building trees, because recombination biases the inferred tree. ClonalFrameML uses your tree and sequences to identify heavily recombinant regions and branches, estimate the rho/theta ratio, and re-estimate the branch lengths of your tree.
#Use the output of kSNP4 as the input to ClonalFrameML:
ClonalFrameML tree.SNPs_all.ML.tre SNPs_all_matrix.fasta final_clinical_clonalframe
#(the first argument is the tree, the second is the alignment file, and the third is the prefix for the output files)
#The output files are described here: https://github.com/xavierdidelot/clonalframeml/wiki but what we care about is the tree (.newick). Look at the em.txt file if you want the rho/theta values, branch lengths (based on recombination distance), etc.
#Visualization:
#Use FigTree on Windows to visualize the tree. It is not necessarily the best, but it's a very simple GUI.
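#Since the notes above require every input to be FASTA, a small pre-check before MakeKSNP4infile may save a failed run. This is only a sketch: check_fasta_dir is a hypothetical helper name (not part of kSNP4), and it only tests that each file starts with '>', which is a weak heuristic and does not handle gzipped files.

```shell
# Sketch: verify that every regular file in a folder looks like FASTA.
# check_fasta_dir is a hypothetical helper, not a kSNP4 command.
check_fasta_dir() {
    local dir="$1" bad=0 n=0 f first
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        n=$((n + 1))
        first=""
        # Read the first line; a FASTA file must start with '>'
        IFS= read -r first < "$f" || true
        case "$first" in
            '>'*) ;;
            *) echo "not FASTA: $f" >&2; bad=1 ;;
        esac
    done
    # An empty folder is also treated as a failure
    [ "$n" -gt 0 ] || { echo "no files in $dir" >&2; return 1; }
    return "$bad"
}

# Example use (refs/ is a placeholder folder name):
# check_fasta_dir refs/ && MakeKSNP4infile -indir refs/ -outfile myInfile
```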
from ctgap.
I think we should avoid computationally heavy tools like ClonalFrameML. How long does it take to run?
from ctgap.
> I think we should avoid computationally heavy tools like ClonalFrameML. How long does it take to run?
ClonalFrameML took 220 minutes for 125 samples, after kSNP had already run for 3 hours on the same samples.
This is where Gubbins may be advantageous. It requires a reference genome, but I think that is okay. Everyone uses strain D for Gubbins, so why not?
from ctgap.
> This is where Gubbins may be advantageous. It requires a reference genome, but I think that is okay. Everyone uses strain D for Gubbins, so why not?
And Gubbins is really fast.
If we stick with our one-sample, one-tree approach, Gubbins makes sense. Anyone needing to do comparative analyses across multiple samples can use kSNP + ClonalFrameML.
from ctgap.
> - Use kSNP for generating reference-free SNPs
#How I run Gubbins:
#More info in the Gubbins manual: https://github.com/nickjcroucher/gubbins/blob/master/docs/gubbins_manual.md
mamba create --name gubbins_env gubbins
mamba activate gubbins_env
generate_ska_alignment.py --reference seq_X.fa --input input.list --out test.aln
#input.list is a tab-delimited file with one row per isolate. The first column is the isolate name, and the subsequent entries on the same row contain the corresponding sequence data.
run_gubbins.py --prefix gubbins_out test.aln --tree-builder raxml
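#The tab-delimited input.list described above can be generated rather than typed by hand. A minimal sketch, assuming one assembly per isolate, files ending in .fa or .fasta, and the file name (minus extension) as the isolate name. make_ska_input_list is a hypothetical helper name, not part of Gubbins or SKA.

```shell
# Sketch: write "isolate_name<TAB>path" rows, one per FASTA file in a folder.
# make_ska_input_list is a hypothetical helper; extensions are assumptions.
make_ska_input_list() {
    local dir="$1" out="$2" f name
    : > "$out"                      # start with an empty output file
    for f in "$dir"/*.fa "$dir"/*.fasta; do
        [ -f "$f" ] || continue     # skip unmatched glob patterns
        name=$(basename "$f")
        name="${name%.*}"           # strip the extension to get the isolate name
        printf '%s\t%s\n' "$name" "$f" >> "$out"
    done
    [ -s "$out" ]                   # fail if no rows were written
}

# Example use (assemblies/ is a placeholder folder name):
# make_ska_input_list assemblies/ input.list
```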
from ctgap.
On your 125-sample tree with ClonalFrameML, can you extract the SNPs that are flagged as recombinant?
from ctgap.
> On your 125-sample tree with ClonalFrameML, can you extract the SNPs that are flagged as recombinant?
No. I tried but couldn't.
Do we need them, though?
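#One possible route, not confirmed in this thread: ClonalFrameML writes a <prefix>.importation_status.txt file listing inferred recombination events, and those intervals could be turned into a BED file for masking. The sketch below assumes a tab-delimited file with a header row and columns Node, Beg, End, and assumes Beg/End are 1-based inclusive. importation_to_bed and the chromosome label are hypothetical. Note that when the input alignment is the kSNP SNP matrix, the coordinates refer to columns of that matrix rather than genome positions, so the BED output is only meaningful for a reference-anchored alignment.

```shell
# Sketch: convert ClonalFrameML importation intervals to BED (0-based, half-open).
# importation_to_bed is a hypothetical helper; the column layout is an assumption.
importation_to_bed() {
    local status="$1" chrom="$2"
    awk -v chrom="$chrom" 'BEGIN { FS = OFS = "\t" }
        # Skip the header row; keep rows that have Node, Beg and End
        NR > 1 && NF >= 3 { print chrom, $2 - 1, $3, $1 }' "$status" |
        sort -t "$(printf '\t')" -k2,2n -k3,3n
}

# Example use (the chromosome label "ref" is a placeholder):
# importation_to_bed final_clinical_clonalframe.importation_status.txt ref > recombination.bed
```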
from ctgap.