zaneveld / q2-karenina
QIIME2 plugin for the karenina package for modeling stochastic microbiome dynamics
Qiime2 Visualizers must output a valid index.html file in the data directory.
To replicate:
qiime karenina fit-timeseries --p-timepoint-col DaysSinceExperimentStart --p-pcoa weighted_unifrac_pcoa_results.qza --p-metadata None --o-visualization ./moving_pictures_fit_timeseries_movie --p-individual-col 'Subject,BodySite' --p-treatment-col None --verbose --p-method basinhopping
Go to view.qiime2.org. Although the .qzv 'loads' (no errors), there is no way to actually get to the data (I know it's there in the zip file, but we want to support qiime tools view and view.qiime2.org).
Approach
We need our visualizer to output a q2-compatible index.html. Working by example, the emperor plugin uses the following steps to output its index.html (comments are my interpretation):
import os
# pkg_resources helps us find the plugin
import pkg_resources
import q2templates

# Find the filepath for the q2_emperor 'assets' folder and save it to TEMPLATES
# NOTE: see the Stack Overflow example here: https://stackoverflow.com/questions/39104/finding-a-file-in-a-python-module-distribution
TEMPLATES = pkg_resources.resource_filename('q2_emperor', 'assets')
# Get the path to the basic, unfilled index.html file (in the assets folder of q2_emperor)
index = os.path.join(TEMPLATES, 'index.html')
# Use q2templates.render to fill in data specific to our output in this visualization.
# Documentation for q2templates.render is available here: https://github.com/qiime2/q2templates/blob/master/q2templates/_templates.py
# (output_dir and plot_name are defined elsewhere in the emperor plugin code)
q2templates.render(index, output_dir, context={'plot_name': plot_name})
How to modify
I think we should be able to use this without too much modification, as long as we can find an index.html file that is a) compatible with q2templates.render and b) not specific to another package. We can then modify to add project specific info.
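To make the goal concrete, here is a minimal sketch of what "output an index.html that links to the results" could look like. It uses only the standard library rather than q2templates, so it is not the emperor approach, and the function name write_index_html and the file name fit_timeseries.txt are hypothetical placeholders, not names from q2-karenina:

```python
import html
import os


def write_index_html(output_dir, title, result_files):
    """Write a bare-bones index.html that links to files already in output_dir.

    Hypothetical helper for illustration only; a real visualizer would
    probably render a q2templates template instead.
    """
    # One list item per result file, with names escaped for safe HTML output
    links = "\n".join(
        '    <li><a href="{0}">{0}</a></li>'.format(html.escape(name, quote=True))
        for name in result_files
    )
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head><meta charset=\"utf-8\"><title>{title}</title></head>\n"
        "<body>\n"
        "  <h1>{title}</h1>\n"
        "  <ul>\n{links}\n  </ul>\n"
        "</body>\n"
        "</html>\n"
    ).format(title=html.escape(title), links=links)

    index_path = os.path.join(output_dir, "index.html")
    with open(index_path, "w", encoding="utf-8") as handle:
        handle.write(page)
    return index_path
```

Inside a visualizer this would be called after the results file has been written, e.g. write_index_html(output_dir, "fit-timeseries results", ["fit_timeseries.txt"]). Relative links are used so that they resolve inside the .qzv data directory.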
Hi!
I'm really excited to use q2-karenina, but for reasons
As I go through the code, there are a number of things that could be done to make it easier. Would you be open to a refactor?
To replicate
qiime karenina --help
To solve
This issue requires updating text only, not code. Currently the karenina help text gives help for what looks like spatial_ornstein_uhlenbeck.py rather than the whole module. This should be an easy fix, resolved by updating the info we provide QIIME2 on plugin registration.
The purpose of this issue is two-fold.
Pull the metadata out of the simulation.qzv file and reference it directly to show users the implementation. Add this and explain it in the tutorial.
Per e-mail discussion with Greg, it's totally fine to add our QIIME2 tutorial under version control in the q2-karenina project, and then just post a brief description of what the module does to the qiime community forum.
This is vastly preferable IMO because it allows us to update the tutorial to add new capabilities, etc. It is also more discoverable to users who come to the GitHub page from somewhere other than QIIME2.
Currently, the tutorial uses metadata that is already embedded in the .qza for the moving pictures tutorial. It would probably be clearer to have users separately download the moving pictures metadata and supply it to our script as a parameter, as that will be closer to the workflow we'd expect them to do. (Having the metadata file separate is also more convenient because these get updated frequently and you often want to run the same data with the latest iteration of the metadata).
For an example, see qiime diversity beta-group-significance in the moving pictures tutorial.
https://docs.qiime2.org/2018.8/tutorials/moving-pictures/
The code in _spatial_ornstein_uhlenbeck shares functionality with lines 289-303 in the main() function of spatial_ornstein_uhlenbeck.py in karenina.
These should be consolidated into a single set_up_experiment() function that returns an Experiment object. That function can then be imported into q2-karenina where needed.
To replicate
Currently, the user has to type a monster command to get basic output:
qiime karenina fit-timeseries --p-timepoint-col DaysSinceExperimentStart --p-pcoa weighted_unifrac_pcoa_results.qza --p-metadata None --o-visualization ./moving_pictures_fit_timeseries_movie --p-individual-col 'Subject,BodySite' --p-treatment-col None --p-method basinhopping
This level of complexity isn't necessary, and may make it less likely that new users complete the tutorial.
To solve
Of these parameters, several should be changed to take a default value.
Should definitely be optional parameters with a default:
--p-method (default --> basinhopping)
Should probably be optional parameters:
--p-metadata (default --> None)
--p-treatment-col (default --> 'None')
Arguable either way IMO:
--o-visualization (default --> './karenina_fit_timeseries_results/')
--p-individual-col (default --> 'Individual')
We should check whether resolving this for QIIME2 also requires changing the underlying scripts in karenina.
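As a rough sketch of the proposed defaults: as far as I understand it, QIIME2 treats a parameter as optional when the registered Python function gives it a default value, so the change may mostly be a matter of the function signature. The signature below is hypothetical (it is not copied from q2-karenina) and simply mirrors the defaults listed above. The optional_parameters helper is only there to show which parameters a user could then leave off:

```python
import inspect


def fit_timeseries(output_dir, pcoa, individual_col, timepoint_col,
                   metadata=None, treatment_col="None",
                   method="basinhopping"):
    """Hypothetical signature: the last three parameters carry defaults."""
    # Placeholder body: just echo the settings that would be used
    return {
        "metadata": metadata,
        "treatment_col": treatment_col,
        "method": method,
    }


def optional_parameters(func):
    """Return {name: default} for every parameter that has a default."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }
```

With defaults like these, the tutorial command would only need the PCoA file, the output location, and the individual and timepoint columns.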
The current tutorial should be restructured. This first issue is just a basic restructure without any underlying code changes. Some other updates may require updates to the code, but this is the low-hanging fruit.
Tutorial notes:
Notes that apply throughout the tutorial:
-- less is more
-- Can we make the images (esp the results image for fit timeseries) smaller? I found the large images a bit distracting.
-- Can we ask the users to do steps first, then show them the result? I think we want to 1) set up why they are doing a step first 2) give them the command to do it 3) show them the output they should expect (and if relevant how to look at the results) and 4) interpret the output. I think showing the results first can be a bit overwhelming.
-- Change “Usage: qiime karenina fit_timeseries” to “Fitting an Ornstein-Uhlenbeck model to time-series data”. More generally, let's avoid 'usage' since that's typical for function or script-level documentation. We can just tell them what each step does in a more conversational way.
Installation notes:
-- If users already have a qiime2 environment in conda and didn’t install Karenina directly into it, they may need to manually run setup.py build and setup.py install in q2-karenina (I ran this in Karenina and then q2-karenina and it fixed the issue), and then qiime dev refresh-cache.
-- We might want to ask users to check that they have their qiime2 environment activated, if relevant.
-- Check what conda envs are available:
conda info --envs
#Find your qiime2 environment in the list
-- Activate your qiime2 environment
source activate [name of your environment here]
For example, if your qiime2 environment is named qiime2-2018.6:
source activate qiime2-2018.6
Data download notes:
-- We should have a Tutorial Data section right after the instructions for installing the plugin. Ideally the workflow should be: a) install b) download all data to complete the overall tutorial c) do all tutorial steps. It's easy to miss a download when these steps are interspersed.
-- Move the instructions for downloading the Moving Pictures .qza file and all other data required for the full tutorial immediately after the installation instructions. Ideally users should be able to get everything set up in this section and then do the rest of the tutorial without having to go back to their browser.
-- I don’t think we want to add the full usage information for each script into the tutorial, as this makes it more difficult to follow. Or if we do, make it a tutorial step that we ask them to do: “Run the following command to list the available user options for karenina fit-timeseries.”
Down below I've outlined how I think the tutorial could be revised. The text will hopefully be useful but may need to be merged/integrated with existing text.
Overview of the parts of the tutorial
Section 1: Setup
Section 2: Fit timeseries
5. Discuss why someone would want to use the fit-timeseries script.
6. Run fit-timeseries --help to see user options.
7. Run fit-timeseries on the moving pictures data.
8. Tell the user how to get to the results.
9. Discuss how we interpret the underlying parameters.
Section 3: Simulation and benchmarking
10. [To do] Discuss how to simulate and benchmark data
1. Check that the q2-karenina module is properly installed and accessible from qiime2:
qiime --help
karenina should appear in the list of available commands. If it does not, you may need to make sure that your current environment has karenina installed (see Installation above), and/or refresh your qiime cache (using qiime dev refresh-cache).
qiime karenina --help
You should see three possible commands: fit-timeseries, spatial-ornstein-uhlenbeck, and visualization. fit-timeseries is used to fit an Ornstein-Uhlenbeck model to PCoA data. spatial-ornstein-uhlenbeck simulates qiime2 PCoA and metadata using OU models. visualization (which requires the ffmpeg software on the path) generates movie files from PCoA and metadata.
q2-karenina models microbial communities in PCoA space as if they were physical particles. Microbial community changes reflected by shifts in position in the first three axes of a PCoA plot (PC1,PC2,PC3) are separately fit using OU models.
If microbial communities were changing purely randomly, that could be described using Brownian motion: dx/dt = W*sigma, where W is the Wiener process (effectively a draw from a normal distribution), and sigma scales the velocity of the random displacement (higher sigma = more rapid movement). Ornstein-Uhlenbeck models are a simple extension of Brownian motion that introduce an additional concept: each particle has a home position (represented by the variable theta) to which it is attracted at each timestep. The strength of that attraction is controlled by lambda.
Mathematically this is represented as follows:
dx/dt = W*sigma + (theta - x)*lambda
Where
dx/dt = change in x with time
W = the random change in position from the Wiener process
sigma = a scaling factor representing intrinsic volatility
theta = the 'home' position to which a particle is attracted
lambda = how strongly the particle is attracted to its home position.
These models have three parameters:
sigma - this describes the intrinsic volatility of a community at each timepoint (higher numbers = more volatile)
theta - this describes an attractor or 'home location' that the process will tend to return to over time.
lambda - the strength with which the particle is attracted back to its home location.
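For intuition about how the three parameters interact, here is a small simulation sketch of a single axis. It uses a simple Euler-Maruyama discretization, with the attraction term written so that a positive lambda pulls x toward theta. This is an illustration only, not the code karenina uses, and the function name simulate_ou and its arguments are made up for the example:

```python
import math
import random


def simulate_ou(x0, theta, lam, sigma, n_steps, dt=1.0, seed=None):
    """Simulate one Ornstein-Uhlenbeck trajectory (illustrative sketch).

    x0     starting position on one PCoA axis
    theta  'home' position the particle is attracted to
    lam    strength of attraction (lambda is a reserved word in Python)
    sigma  intrinsic volatility
    """
    rng = random.Random(seed)
    positions = [x0]
    x = x0
    for _ in range(n_steps):
        # Random displacement: a normal draw scaled by sqrt(dt)
        dW = rng.gauss(0.0, math.sqrt(dt))
        # Attraction toward theta plus volatility-scaled noise
        x = x + lam * (theta - x) * dt + sigma * dW
        positions.append(x)
    return positions
```

Setting sigma to 0 removes the randomness, so the trajectory simply decays toward theta. Setting lam to 0 removes the attraction and leaves plain Brownian motion.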
Given a PCoA and metadata for individuals sampled over time, karenina will output a .qzv file at the specified location (NOTE: the .qzv extension will be appended).
qiime karenina fit-timeseries --p-timepoint-col DaysSinceExperimentStart --p-pcoa weighted_unifrac_pcoa_results.qza --p-metadata None --o-visualization ./moving_pictures_fit_timeseries_results --p-individual-col 'Subject,BodySite' --p-treatment-col None --verbose --p-method basinhopping
The resulting .qzv file can be viewed as with any other QIIME2 visualization.
One way to do this on the command line is to use the 'qiime tools view' command. The usage is:
qiime tools view [path to your file]
So to view results for the tutorial file, type:
qiime tools view ./moving_pictures_fit_timeseries_results.qzv
Alternatively, you can navigate a web browser to view.qiime2.org and drag the .qzv file where indicated to view the results.
Either way, you should get a single link to a text file containing the results of the model fit.
Finally, if you're having any trouble, it's worth noting that .qzv files are simply zipped directories. It is also possible to unzip them directly and navigate their internal file structure to find what you need (although this isn't necessary in most cases). The results will be in the /data/ folder.
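If you'd rather not unzip by hand, the same check can be sketched in a few lines of Python using the standard zipfile module. This assumes the usual QIIME2 archive layout, in which everything sits under a top-level folder and the results are in its data subfolder. The function name list_qzv_data_files is just for this example:

```python
import zipfile


def list_qzv_data_files(qzv_path):
    """Return the files stored under the data/ folder of a .qzv archive."""
    with zipfile.ZipFile(qzv_path) as archive:
        return sorted(
            name for name in archive.namelist()
            # Keep real files (not directory entries) inside .../data/
            if "/data/" in name and not name.endswith("/")
        )
```

For the tutorial output this would be called as list_qzv_data_files("./moving_pictures_fit_timeseries_results.qzv").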