q2-karenina's People

Contributors

slpeoples, zaneveld

q2-karenina's Issues

No index .html file is output in the qzv file

Qiime2 Visualizers must output a valid index.html file in the data directory.

To replicate:
qiime karenina fit-timeseries --p-timepoint-col DaysSinceExperimentStart --p-pcoa weighted_unifrac_pcoa_results.qza --p-metadata None --o-visualization ./moving_pictures_fit_timeseries_movie --p-individual-col 'Subject,BodySite' --p-treatment-col None --verbose --p-method basinhopping

Go to view.qiime2.org. Although the .qzv 'loads' (no errors), there is no way to actually get at the data (I know it's there in the zip file, but we want to support both qiime tools view and view.qiime2.org).

Approach

We need our visualizer to output a q2-compatible index.html. Working by example, the emperor plugin uses the following steps to output its index.html (comments are my interpretation):

# pkg_resources helps us locate files installed with the plugin
import os
import pkg_resources
import q2templates

# Find the filepath for the q2_emperor 'assets' folder and save it to TEMPLATES
# NOTE: see the stackoverflow example here: https://stackoverflow.com/questions/39104/finding-a-file-in-a-python-module-distribution
TEMPLATES = pkg_resources.resource_filename('q2_emperor', 'assets')

# Get the path to the basic, unfilled index.html file (in the assets folder of q2_emperor)
index = os.path.join(TEMPLATES, 'index.html')

# Use q2templates.render to fill in data specific to this visualization's output.
# Documentation for q2templates.render is available here: https://github.com/qiime2/q2templates/blob/master/q2templates/_templates.py
q2templates.render(index, output_dir, context={'plot_name': plot_name})

How to modify

I think we should be able to use this without too much modification, as long as we can find an index.html file that is a) compatible with q2templates.render and b) not specific to another package. We can then modify it to add project-specific info.

Modify the plugin to be used with all APIs

Hi!

I'm really excited to use q2-karenina, but for reasons™️, I'd like to be able to use the python API rather than going through the command line.
As I go through the code, there are a number of things that could be done to make this easier; would you be open to a refactor?

Add qiime2 karenina tutorial documentation to q2-karenina

Per e-mail discussion with Greg, it's totally fine to add our QIIME2 tutorial under version control in the q2-karenina project, and then just post a brief description of what the module does to the qiime community forum.

This is vastly preferable IMO because it allows us to update the tutorial to add new capabilities, etc. It is also more discoverable to users who come to the GitHub page from somewhere other than QIIME2.

Modify tutorial to use externally supplied metadata to define individuals, body sites

Currently, the tutorial uses metadata that is already embedded in the .qza for the moving pictures tutorial. It would probably be clearer to have users separately download the moving pictures metadata and supply it to our script as a parameter, as that will be closer to the workflow we'd expect them to do. (Having the metadata file separate is also more convenient because these get updated frequently and you often want to run the same data with the latest iteration of the metadata).

For an example see q2 diversity beta_group_significance in the moving pictures tutorial.
https://docs.qiime2.org/2018.8/tutorials/moving-pictures/

Consolidate code that's redundant between karenina and q2-karenina

_spatial_ornstein_uhlenbeck in q2-karenina shares functionality with lines 289-303 in the main() function of spatial_ornstein_uhlenbeck.py in karenina.

These should be consolidated into a single set_up_experiment() function that returns an Experiment object. That function can then be imported into q2-karenina where needed.
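A minimal sketch of what the proposed refactor could look like. The Experiment fields and parameter names below are placeholders for illustration, not karenina's actual API:

```python
from dataclasses import dataclass

@dataclass
class Experiment:  # stand-in for karenina's Experiment object
    treatment_names: list
    n_individuals: list
    n_timepoints: int

def set_up_experiment(treatment_names, n_individuals, n_timepoints):
    """Build an Experiment from user parameters.

    Both karenina's main() and q2-karenina's _spatial_ornstein_uhlenbeck
    could call this one function instead of duplicating the setup logic.
    """
    return Experiment(treatment_names, n_individuals, n_timepoints)
```

The point is just that the shared setup lives in one importable function; the real version would carry whatever parameters lines 289-303 currently handle.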

Too many required options in QIIME2 user interface for fit-timeseries.

To replicate

Currently, the user has to type a monster command to get basic output:
qiime karenina fit-timeseries --p-timepoint-col DaysSinceExperimentStart --p-pcoa weighted_unifrac_pcoa_results.qza --p-metadata None --o-visualization ./moving_pictures_fit_timeseries_movie --p-individual-col 'Subject,BodySite' --p-treatment-col None --p-method basinhopping

This level of complexity isn't necessary, and may make it less likely that new users complete the tutorial.

To solve
Of these parameters, several should be changed to take a default value.

Should definitely be optional parameters with a default:
--p-method (default --> basinhopping)

Should probably be optional parameters:
--p-metadata (default --> None)
--p-treatment-col (default --> 'None')

Arguable either way IMO:
--o-visualization (default --> './karenina_fit_timeseries_results/')
--p-individual-col (default --> 'Individual')

We should check whether resolving this for QIIME2 also requires changing the underlying scripts in karenina.

Restructure tutorial (see below for suggestions)

The current tutorial should be restructured. This first issue is just a basic restructure without any underlying code changes. Some other updates may require updates to the code, but this is the low-hanging fruit.

Tutorial notes:

Notes that apply throughout the tutorial:

-- less is more

-- Can we make the images (esp the results image for fit timeseries) smaller? I found the large images a bit distracting.

-- Can we ask the users to do steps first, then show them the result? I think we want to 1) set up why they are doing a step first 2) give them the command to do it 3) show them the output they should expect (and if relevant how to look at the results) and 4) interpret the output. I think showing the results first can be a bit overwhelming.

-- Change “Usage: qiime karenina fit_timeseries” to “Fitting an Ornstein-Uhlenbeck model to time-series data”. More generally, let's avoid 'usage' since that's typical for function or script-level documentation. We can just tell them what each step does in a more conversational way.

Installation notes:

-- If users already have a qiime2 environment in conda and didn't install karenina directly into it, they may need to manually run python setup.py build and python setup.py install in q2-karenina (I ran this in karenina and then q2-karenina and it fixed the issue), and then qiime dev refresh-cache.

-- We might want to ask users to check that they have their qiime2 environment activated, if relevant.

-- Check what conda envs are available:
conda info --envs

#Find your qiime2 environment in the list

-- Activate your qiime2 environment

source activate [name of your environment here]

for example, if your qiime2 environment was named qiime2-2018.6:

source activate qiime2-2018.6

Data download notes:

-- We should have a Tutorial Data section right after the instructions for installing the plugin. Ideally the workflow should be: a) install b) download all data to complete the overall tutorial c) do all tutorial steps. It's easy to miss a download when these steps are interspersed.

-- Move the instructions for downloading the Moving Pictures .qza file and all other data required for the full tutorial to immediately after the installation instructions. Ideally users should be able to set everything up in this section and then do the rest of the tutorial without having to go back to their browser.

-- I don’t think we want to add the full usage information for each script into the tutorial, as this makes it more difficult to follow. Or if we do, make it a tutorial step that we ask them to do: “Run the following command to list the available user options for karenina fit_timeseries.”

Down below I've outlined how I think the tutorial could be revised. The text will hopefully be useful but may need to be merged/integrated with existing text.

Overview of the parts of the tutorial

Section 1: Setup

  1. Ensure your qiime environment is active
  2. Install q2-karenina
  3. Download all data necessary for the tutorial
  4. Run qiime karenina --help to see available scripts.

Section 2: Fit timeseries
5. Discuss why someone would want to use the fit-timeseries script.
6. Run fit-timeseries --help to see user options.
7. Run fit-timeseries on the moving pictures data.
8. Tell the user how to get to the results.
9. Discuss how we interpret the underlying parameters.

Section 3: Simulation and benchmarking
10. [To do] Discuss how to simulate and benchmark data

  1. Check that the q2-karenina module is properly installed and accessible from qiime2:

qiime --help

karenina should appear in the list of available commands. If it does not, you may need to make sure that your current environment has karenina installed (see Installation above), and/or refresh your qiime cache (using qiime dev refresh-cache).

  2. Access help for the karenina module:

qiime karenina --help

You should see three possible commands: fit_timeseries, spatial-ornstein-uhlenbeck, and visualization. fit_timeseries is used to fit an Ornstein-Uhlenbeck model to PCoA data. spatial-ornstein-uhlenbeck simulates qiime2 PCoA and metadata using OU models. visualization (which requires the ffmpeg software on the path) generates movie files from PCoA and metadata.

  3. Fitting an Ornstein-Uhlenbeck model to microbial community data

q2-karenina models microbial communities in PCoA space as if they were physical particles. Microbial community changes reflected by shifts in position in the first three axes of a PCoA plot (PC1,PC2,PC3) are separately fit using OU models.

If microbial communities were changing purely randomly, that could be described using Brownian motion dx/dt = W*sigma, where W is the Wiener process (effectively a draw from a normal distribution), and sigma scales the velocity of the random displacement (higher sigma = more rapid movement). Ornstein-Uhlenbeck models are a simple extension of Brownian motion that introduces an additional concept: each particle has a home position (represented by the variable theta) to which it is attracted at each timestep. The strength of that attraction is controlled by lambda.

Mathematically this is represented as follows:

dx/dt = W*sigma + (theta - x)*lambda

Where
dx/dt = change in x with time
W = the random change in position from the Wiener process
sigma = a scaling factor representing intrinsic volatility
theta = the 'home' position to which a particle is attracted
lambda = how strongly the particle is attracted to its home position.

These models have three parameters:
sigma - this describes the intrinsic volatility of a community at each timepoint (higher numbers = more volatile)
theta - this describes an attractor or 'home location' that the process will tend to return to over time.
lambda - the strength with which the particle is attracted back to its home location.
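To build intuition for these parameters, here is a minimal pure-Python simulation of the discretized model above. This is an illustrative sketch only (karenina's own simulator is spatial-ornstein-uhlenbeck):

```python
import random

def simulate_ou(theta=0.0, sigma=0.1, lam=0.2, x0=1.0, n_steps=500, seed=42):
    """Simulate a discretized Ornstein-Uhlenbeck process.

    At each step the position takes a random Brownian kick (W * sigma)
    plus a pull back toward the home position theta, scaled by lam.
    """
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps - 1):
        W = rng.gauss(0, 1)  # Wiener increment: draw from a normal distribution
        xs.append(xs[-1] + W * sigma + (theta - xs[-1]) * lam)
    return xs
```

With lam > 0 the trajectory decays from x0 toward theta and then fluctuates around it; with lam = 0 it reduces to plain Brownian motion.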

Given a PCoA and metadata for individuals sampled over time, karenina will output a .qzv file at the specified location (NOTE: the .qzv extension will be appended).

qiime karenina fit-timeseries --p-timepoint-col DaysSinceExperimentStart --p-pcoa weighted_unifrac_pcoa_results.qza --p-metadata None --o-visualization ./moving_pictures_fit_timeseries_results --p-individual-col 'Subject,BodySite' --p-treatment-col None --verbose --p-method basinhopping

  4. Viewing the results.

The resulting .qzv file can be viewed as with any other QIIME2 visualization.

One way to do this on the commandline is to use the 'qiime tools view' command. The usage is:
qiime tools view [path to your file]

So to view results for the tutorial file, type:
qiime tools view ./moving_pictures_fit_timeseries_results.qzv

Alternatively, you can navigate a web browser to view.qiime2.org and drag the .qzv file where indicated to view the results.

Either way, you should get a single link to a text file containing the results of the model fit.

Finally, if you're having any trouble, it's worth noting that .qzv files are simply zipped directories. It is also possible to unzip them directly and navigate their internal file structure to find what you need (although this isn't necessary in most cases). The results will be in the /data/ folder.

  5. Interpreting the results

The results .csv file has columns describing the metadata parameter(s) on which the model was based. Each row in the results represents an Ornstein-Uhlenbeck model fit to one PC axis for a particular set of samples. Because the model is fit once per PC axis, the pc column says which PC axis each row describes. The sigma, theta, and lambda parameters are described above. The n parameters column reflects the number of parameters that were fit. The nLogLik column gives the negative log likelihood of the data given the Ornstein-Uhlenbeck model fit. The AIC column gives the Akaike Information Criterion score for the model, which accounts for both the negative log likelihood and the model complexity (more complex models are penalized because they will tend to obtain better nLogLik scores even on random data).
