qiime2 / q2-longitudinal

QIIME 2 plugin for paired sample comparisons

License: BSD 3-Clause "New" or "Revised" License

Python 95.36% HTML 3.15% Makefile 0.09% TeX 0.74% CSS 0.66%
hacktoberfest

q2-longitudinal's Introduction

q2-longitudinal

This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.

q2-longitudinal's People

Contributors

andrewsanchez, chriskeefe, david-rod, ebolyen, gregcaporaso, hagenjp, jairideout, lizgehret, nbokulich, oddant1, q2d2, sterrettjd, thermokarst, timyerg, turanoo

q2-longitudinal's Issues

`volatility`: make `group-column` optional

CLARIFICATION: without a group-column selected, mean lines should still be drawn, but calculated across all samples rather than aggregating by group. (edited 4/23/18)
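
A minimal sketch of the behavior described in the clarification, assuming samples arrive as plain dictionaries. The function name `mean_trajectories` and the pooled label "all samples" are hypothetical and do not come from the plugin.

```python
from collections import defaultdict
from statistics import mean


def mean_trajectories(rows, state_column, metric, group_column=None):
    """Return {group: {state: mean metric}}.

    When group_column is None, every sample is pooled under the single
    key "all samples" (a label chosen only for this sketch).
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        group = row[group_column] if group_column is not None else "all samples"
        values[group][row[state_column]].append(row[metric])
    return {
        group: {state: mean(vals) for state, vals in sorted(states.items())}
        for group, states in values.items()
    }


samples = [
    {"month": 0, "delivery": "vaginal", "shannon": 2.0},
    {"month": 0, "delivery": "cesarean", "shannon": 3.0},
    {"month": 12, "delivery": "vaginal", "shannon": 4.0},
    {"month": 12, "delivery": "cesarean", "shannon": 5.0},
]
pooled = mean_trajectories(samples, "month", "shannon")
grouped = mean_trajectories(samples, "month", "shannon", group_column="delivery")
```

With no group column the result holds one mean line computed across all samples; with a group column it holds one mean line per group.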

Standard way to import data in Python

I am exploring how to import taxa and mapping data in Python so that I can develop new functions.

Could you show me a quick example of how to import both the taxa and mapping files, and then write a function (following the _utility.py style) that calculates beta diversity at Week 0?

Below is my code for loading the taxa information. I am not sure what the standard way is to import mapping information with the QIIME 2 Artifact API.

import pandas as pd
from qiime2 import Artifact

# Load the feature table artifact and view it as a pandas DataFrame.
taxa = Artifact.load("../tutorial_data/ecam-table-taxa.qza")
taxa_df = taxa.view(pd.DataFrame)

Strange plots resulting from different usage of q2-longitudinal

Hello,
I am using q2-longitudinal a bit differently from how it was originally intended, and some strange behavior has resulted. The issue arose when creating a volatility plot with this command:
qiime longitudinal volatility \
  --m-metadata-file ../EXMP_Sample_metadata_3_17_2018.tsv \
  --m-metadata-file EXMP-200-4562-single-core-metrics-results/shannon_vector.qza \
  --p-metric shannon \
  --p-group-column activity \
  --p-state-column sample_number \
  --p-individual-id-column redcap_survey_identifier \
  --p-spaghetti yes \
  --o-visualization EXMP-200-4562-single-shannon-volatility.qzv
I am introducing an intervention period to be compared to a baseline period both prior to and after the intervention. As you can see this creates a volatility plot that is unusual:
image
image
image

I am not sure this strange behavior is a bug; it may simply arise from the different way I am trying to do things.
Thanks,
Arron Shiffer

volatility plots: sundry interactive features

  • zoom
  • toggle on/off 🍝
  • toggle on/off control limits
  • toggle on/off mean group trajectories
  • interactive color palette? Could be emperor style custom palette, or more likely just the selection of palettes currently exposed with the palette parameter
  • selection of different metrics and metadata grouping values (akin to alpha-rarefaction) could be useful, though this will require more structural changes to how these are handled at input

Actually, many of the parameters for this action could be useful as interactive features. E.g., interactively set x-tick intervals, yscale, xscale, but these are less important than those listed above.

outdated documentation in `paired-differences`

I think this must be left over from before we had optional artifact support (the feature table is now optional in this visualizer, this documentation is just out of date): A feature table artifact is required input, though whether "metric" is derived from the feature table or metadata is optional.

The meaning of Tutorial Data

How could I find the explanation of data in the folder "tutorial_data"?

I have a few questions.

  1. I assume ecam-table-taxa.qza contains species level taxa data
  2. I assume ecam_map_maturity.txt contains mapping information.

Question:

  1. What is ecam-table-maturity.qza?

spaghetti sample information mouse-over

spaghetti is great, but (in the words of @gregcaporaso ):

users are going to want to know which subjects the outlier lines (or any lines, for that matter) in these plots are... For example, you might be able to achieve this with mouse-overs that highlight a specific line and give more information about it including the subject id.

update `paired-differences` example in readme to use a .qza as input

You mention that this is possible, but it'd be better to just use that in your example since that's the preferred way to do this (since it retains provenance where exporting the alpha diversity data wouldn't).

--m-metadata-file ecam_map_maturity.txt

You could also link to the metadata tutorial, which has a good description of this.

add download links for tsv of raw difference/distance data

It might be really useful for users to be able to download the raw distances/differences. This could look like a sample metadata file where the rows are:

sample-id <tab> {metric} difference <tab> group

or

sample-id <tab> {metric} distance <tab> group
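
A rough sketch of how such a file could be produced, assuming the differences or distances are already available as (sample id, value, group) tuples. The function name `differences_to_tsv` and the `kind` parameter are hypothetical.

```python
import csv
import io


def differences_to_tsv(records, metric, kind="difference"):
    """Serialize (sample_id, value, group) records as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header follows the layout proposed above.
    writer.writerow(["sample-id", f"{metric} {kind}", "group"])
    for sample_id, value, group in records:
        writer.writerow([sample_id, value, group])
    return buffer.getvalue()


tsv_text = differences_to_tsv(
    [("s1", 0.8, "vaginal"), ("s2", -0.3, "cesarean")], metric="shannon"
)
```

Passing `kind="distance"` would give the second header layout instead.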

improve error messages on bad column names

$ qiime intervention paired-differences     --m-metadata-file ecam_map_maturity.txt     --m-metadata-file ecam_shannon.qza     --p-metric shannon     --p-group-column delivery     --p-state-column month     --p-state-1 12     --p-state-2 0     --p-individual-id-column not-a-column     --o-visualization ecam-delivery-alpha     --p-no-drop-duplicates --verbose
Traceback (most recent call last):
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2442, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'not-a-column'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/q2cli/commands.py", line 222, in __call__
    results = action(**arguments)
  File "<decorator-gen-251>", line 2, in paired_differences
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/qiime2/sdk/action.py", line 201, in callable_wrapper
    output_types, provenance)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/qiime2/sdk/action.py", line 392, in _callable_executor_
    ret_val = callable(output_dir=temp_dir, **view_args)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/q2_intervention/_intervention.py", line 38, in paired_differences
    drop_duplicates=drop_duplicates)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/q2_intervention/_utilities.py", line 36, in _get_group_pairs
    for individual_id in set(group_md[individual_id_column]):
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/Users/gregcaporaso/miniconda3/envs/qiime2-2017.7/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)
KeyError: 'not-a-column'

Plugin error from intervention:

  'not-a-column'

See above for debug info.

Should instead say something like: The individual column specified (not-a-column) is not a column name in the sample metadata. Available columns are: ...

Confirm that any time a category is passed as a type different than qiime2.MetadataCategory that you catch these issues.
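
One possible shape for that check, as a sketch only. `validate_column` is a hypothetical helper, not an existing function in the plugin, and it assumes the available column names can be passed in as a list.

```python
def validate_column(column, available, role):
    """Raise a descriptive ValueError if `column` is not in `available`."""
    if column not in available:
        raise ValueError(
            f"The {role} column specified ({column}) is not a column name "
            f"in the sample metadata. Available columns are: "
            f"{', '.join(sorted(available))}"
        )


columns = ["delivery", "month", "studyid"]
try:
    validate_column("not-a-column", columns, role="individual")
    message = None
except ValueError as error:
    message = str(error)
```

Running the check before any DataFrame indexing would replace the raw KeyError traceback with the single message above.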

technicolor spaghetti

Improvement Description
spaghetti color is defined by the group value at the initial state.

ideally, spaghetti color should change dynamically. E.g., some metadata categories (like antibiotic use or other exposures) may change longitudinally for a subject. It would be nice to capture those.

Comments
That probably cannot be done easily... but it would be pretty cool if it could.

BUG: volatility: label sorting on plots occasionally breaks

Most plots I've seen are in the correct order, but not this subplot:
image
Other subplots in the same plot work, including the following that contains these same data (but different labels), so this may be a labeling issue, not a sorting issue.
image

Drop README tutorial(s)

As with the other plugins in QIIME 2, we provide official tutorials as part of https://github.com/qiime2/docs, and unofficial tutorials on the forum. We should clear out this README of the existing tutorial content and ensure that things get moved to the appropriate location (docs or forum).

`volatility`: show/hide groups/individuals interactively

Improvement Behavior
click to show/hide individuals/groups.

Current Behavior
currently, can click on the group legend to show single groups, but this only does one at a time.

Proposed Behavior
Would be very useful to, e.g., drop one or more groups to focus on specific groups for comparison.

Comment
same with individuals (spaghetti) but no such feature exists. Would be very helpful to hide all spaghetti but one, for example, to compare an individual's trajectory vs. the group mean.

LME visualization suggestions

It would be helpful to link to a key that would help with interpreting the Model summary and Model results sections. We're going to get a lot of questions about interpreting these (I'm not exactly sure how to interpret them myself). I would recommend including links in those sections of the visualization if possible, and if not, expanding on the interpretation in the tutorial.

add raw data download for all plots

see @gregcaporaso 's comment in #36

One more thought: Would it be worth adding a download link for the data used to generate these plots? Since we're not including any statistics, it could be useful to allow the user to get that data to do statistics on their own. If we did that, I think you'd want a tsv file that looks something like:

delivery  month  studyid  shannon
vaginal  0  42  2.2
cesarian  0  43  3.0

(EDITED: to make the example file tab-separated text instead of comma-separated)

blank plots created if no samples found that correspond to a state value

This will be confusing for users if they accidentally specify a state value that doesn't correspond to anything in their data...

screenshot 2017-08-07 13 18 06

You should probably throw an error if there are no paired samples being evaluated.

This also suggests that it's going to be important to tell the user how many samples were included in each test. Could you include n (number of paired samples per group) in all of the tables? See the pairwise table here for one example of where we do this.
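
A sketch of both suggestions together: count the paired individuals per group, and raise an error when there are none. The function name `count_pairs` is hypothetical, and the sketch assumes each individual belongs to one group across both states.

```python
from collections import defaultdict


def count_pairs(rows, individual_column, state_column, group_column,
                state_1, state_2):
    """Return {group: n individuals sampled at both states}; raise if none."""
    states_seen = defaultdict(set)
    group_of = {}
    for row in rows:
        individual = row[individual_column]
        states_seen[individual].add(row[state_column])
        # Assumption: an individual's group does not change between states.
        group_of[individual] = row[group_column]

    counts = defaultdict(int)
    for individual, states in states_seen.items():
        if state_1 in states and state_2 in states:
            counts[group_of[individual]] += 1

    if not counts:
        raise ValueError(
            f"No individuals have samples at both {state_column} values "
            f"{state_1} and {state_2}; check the state values provided."
        )
    return dict(counts)


rows = [
    {"studyid": 42, "month": 0, "delivery": "vaginal"},
    {"studyid": 42, "month": 12, "delivery": "vaginal"},
    {"studyid": 43, "month": 0, "delivery": "cesarean"},
]
paired_n = count_pairs(rows, "studyid", "month", "delivery", 0, 12)
```

The returned counts are the per-group n that could be added to each results table.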

suggestions for visualizations

In the paired-differences boxplot, the y-axis label should be Difference in {metric} (state 2 - state 1), or you could get more fancy with it and actually use the state_column, state1 and state2 variables, in which case it could be: Difference in {metric} ({state_column} {state2} - {state_column} {state1}) (e.g., Difference in shannon (month 12 - month 0))
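
For illustration, the fancier label could be built like this. `difference_label` is a hypothetical helper name; the output matches the month 12 - month 0 example.

```python
def difference_label(metric, state_column, state_1, state_2):
    """Build a y-axis label of the form 'state 2 - state 1'."""
    return (
        f"Difference in {metric} "
        f"({state_column} {state_2} - {state_column} {state_1})"
    )


label = difference_label("shannon", "month", state_1=0, state_2=12)
```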

In the Paired difference tests table, can you include the test name and the test statistic name? Currently it just says stat, but you should be able to keep a dict mapping test name to test statistic name so that this label could be more informative. See here for an example. Also, please make the P column label say P value and FDR P -> FDR P-value.
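
A sketch of such a dict, using the conventional statistic symbols for the tests this plugin mentions. The key spellings and the `column_labels` helper are assumptions for illustration, not the plugin's actual names.

```python
# Conventional statistic symbols; the keys are illustrative spellings.
TEST_STATISTIC_NAMES = {
    "ANOVA": "F",
    "t-test": "t",
    "Kruskal-Wallis": "H",
    "Wilcoxon signed-rank": "W",
    "Mann-Whitney U": "U",
}


def column_labels(test_name):
    """Return table column labels for a given test."""
    # Fall back to the current generic label when the test is unknown.
    stat = TEST_STATISTIC_NAMES.get(test_name)
    stat_label = f"{stat} statistic" if stat else "stat"
    return [stat_label, "P value", "FDR P-value"]


labels = column_labels("Kruskal-Wallis")
```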

Can the Multiple group tests table be transposed so it matches the others?

I'm confused about what the difference is between the Multiple group tests and Pairwise comparison tests tables when there are only two groups. It might help to have a brief description of what each test is (and including the test name in each would help with this). When there are only two groups, should the results of these tests be different? They are in the README example, so I'm just confirming that this is expected.

These should also be applied to the pairwise-distances visualization.

`volatility`: plot subpanel with N per group

Proposed Behavior
show N per group per state as histogram or line plot sharing axis with main (volatility) plot.

and/or toggle sample size in x-axis label?

Comments
If that's difficult/ugly forget about it — but it might save folks from manually typing in this info for pub-ready figures.
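
The counts behind either option are cheap to compute. A sketch, assuming samples are plain dictionaries; both function names are hypothetical.

```python
from collections import Counter


def n_per_group_state(rows, group_column, state_column):
    """Count samples for every (group, state) combination."""
    return Counter((row[group_column], row[state_column]) for row in rows)


def state_tick_labels(rows, state_column):
    """Build x-axis tick labels that carry the sample size per state."""
    per_state = Counter(row[state_column] for row in rows)
    return [f"{state} (n={n})" for state, n in sorted(per_state.items())]


rows = [
    {"delivery": "vaginal", "month": 0},
    {"delivery": "vaginal", "month": 0},
    {"delivery": "cesarean", "month": 0},
    {"delivery": "vaginal", "month": 12},
]
counts = n_per_group_state(rows, "delivery", "month")
ticks = state_tick_labels(rows, "month")
```

The first result would feed a subpanel histogram or line plot; the second covers the x-axis label variant.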

BUG: LME plots fail to generate when single variable is provided

When a single independent variable is used for LME, plots fail to generate because a single AxesSubplot object is generated — the current code expects multiple variables/subplots, and the ability to index these.

The error is raised here:

File "/Users/nbokulich/miniconda3/envs/qiime2-2017.8/lib/python3.5/site-packages/q2_longitudinal/_utilities.py", line 351, in _regplot_subplots_from_dataframe
    ax=axes[num], lowess=lowess, ci=ci)
TypeError: 'AxesSubplot' object does not support indexing

This bug is noted in this forum post.
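
One way to make the indexing safe is to normalize whatever the subplot call returns into a list before indexing. The helper below is a dependency-free sketch with a hypothetical name, and it assumes the multi-subplot case is a flat (one-dimensional) sequence. With matplotlib itself, passing squeeze=False to plt.subplots achieves a similar effect by always returning a 2D array of axes.

```python
def as_axes_list(axes):
    """Return `axes` as a list, whether it is one object or a sequence."""
    try:
        return list(axes)
    except TypeError:
        # A single axes object is not iterable, so wrap it.
        return [axes]


single_axes = object()  # stand-in for a lone AxesSubplot
several_axes = [object(), object()]  # stand-in for an array of subplots

normalized_single = as_axes_list(single_axes)
normalized_several = as_axes_list(several_axes)
```

After normalization, `axes[num]` works for one variable as well as for many.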

LME singular matrix error should fail gracefully

Some inputs to LME will result in a singular matrix error:

File "/Users/nbokulich/miniconda3/envs/qiime2-2017.8/lib/python3.5/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.linalg.LinAlgError: Singular matrix

This is not a bug — it is due to improper inputs to LME — but should fail gracefully so users can respond appropriately.

The issue seems to be that when the independent variables passed to LME covary, a singular matrix error results, due either to the correlation between the covariates or to the lack of variance between group subcategories.

This issue was reported in this forum post.
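
As a partial illustration, a pre-check could flag the two simplest causes before the model is fit: columns with no variance and columns that duplicate one another. This is only a sketch with a hypothetical function name, and it would not detect every form of correlation between covariates.

```python
def find_degenerate_columns(table):
    """Inspect {column name: list of values}.

    Returns (constant, duplicated): names of columns with a single
    distinct value, and pairs of columns whose values are identical.
    """
    constant = sorted(
        name for name, values in table.items() if len(set(values)) <= 1
    )
    first_seen = {}
    duplicated = []
    for name, values in table.items():
        key = tuple(values)
        if key in first_seen:
            duplicated.append((first_seen[key], name))
        else:
            first_seen[key] = name
    return constant, duplicated


table = {
    "month": [0, 0, 12, 12],
    "diet": ["a", "a", "a", "a"],
    "age_months": [0, 0, 12, 12],
}
constant, duplicated = find_degenerate_columns(table)
```

A non-empty result could be turned into an error message naming the offending columns, so users know which inputs to change.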

parameter suggestions

There are a lot of default metadata column names. I would make these required parameters (without defaults) because it's unlikely that your defaults will be useful for most people, and if they're required it won't make users think that they need to rename the columns in their sample metadata.

Instead of using the term Metadata category, can you use Metadata column? We're switching our terminology since category doesn't necessarily make sense for continuous variables, so it'd be good to start making that change in documentation. For example: Metadata category on which to separate groups for comparison.

I recommend state-pre and state-post be renamed to state-1 and state-2, since pre/post aren't always relevant (e.g., you mention that "States" can also commonly be methodological).

I think non-parametric tests should always be the default:

  --p-parametric / --p-no-parametric
                                  [default: True]
                                  Perform parametric (ANOVA
                                  and t-tests) or non-parametric (Kruskal-
                                  Wallis, Wilcoxon, and Mann-Whitney U tests)
                                  statistical tests.
