JupyterBook for QIIME 2 FAES January 2022 workshop

License: Other


cancer-microbiome-intervention-tutorial's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

cancer-microbiome-intervention-tutorial's People

Contributors

Stargazers

Watchers

Forkers

jwdebelius lvelosuarez ebolyen keegan-evans bokulich-lab gregcaporaso tefer0 genostack cherman2 caio-andrey lizgehret colinvwood

cancer-microbiome-intervention-tutorial's Issues

importing.md chapter is incomplete

Either remove content to focus on the "why" part exclusively, or expand to include discussion of importing different types. (At the moment, importing different types is introduced but is incomplete.)

bug: rendered book link in the README is broken

https://github.com/qiime2/cancer-microbiome-intervention-tutorial/blame/main/README.md#L5

https://docs.qiime2.org/jupyterbook/cancer-microbiome-intervention-tutorial/

https://docs.qiime2.org/jupyterbooks/cancer-microbiome-intervention-tutorial/

split current filtering.md file into multiple files

This will require maintaining scope across Jupyterbook chapters, and may or may not be possible with the current tools. This is the closest that I've seen, but it doesn't get us what we need. I also checked to see if scope is shared across sections of a chapter (which are different files) and it is not. It's possible there is a config option to alter this behavior, but I haven't come across it yet.

If this turns out to be not possible, I think we should:

1. gauge interest in having us implement this and submit it as new functionality to JB
2. update files and TOC in this JB so that the full tutorial is presented in a single chapter (if we can't complete item 1 in time)

add classification with pre-trained, weighted classifier from ready-to-wear

To promote the idea of weighted classifiers and get improved taxonomic assignments, train a human stool GTDB classifier from the weights in ready-to-wear for use in this tutorial.

add install/build instructions to README.md

Issue on page /040-appendices/convert-and-import-artifacts.html

I think the raw data has been removed. I just want to see exactly what "Artifact.import_data" does. Could anyone help with this? Thanks.

add alpha and beta diversity values to sample metadata

Specific values to add are: Faith PD, umap axes for weighted and unweighted unifrac, Shannon diversity, evenness, observed features.

This should happen prior to generating the Emperor plots with custom axes, so we can color by these data in those plots.

Modify taxonomy filtering command in phylogeny tutorial

Given the filtering-step as outlined here, I'd recommend using the following command, or a variant of it, which I pulled from this post:

qiime taxa filter-table \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --p-mode 'contains'  \
    --p-include 'p__' \
    --p-exclude 'p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned,Unclassified' \
    --o-filtered-table ./table-no-ecmu.qza

Note that I set --p-exclude 'p__;,...'. This explicitly removes taxa that have only the p__ rank prefix, i.e. no accompanying taxonomic label. That is, --p-include 'p__' on its own will keep k__Bacteria; p__Proteobacteria; as well as any annotation with an empty phylum rank, such as k__Bacteria; p__;, which technically has no phylum classification.

Yes, the --p-include 'p__' in the command above might be redundant given the exclude terms. I only placed it there for completeness and to make the difference between p__ and p__; explicit for teaching. :-)

Alternatively, simply mention that it is recommended to remove plastid/organellar, and perhaps even host, sequences. This matters because mitochondria are annotated as a "family" within the class Alphaproteobacteria, and chloroplasts as a "class" within the phylum Cyanobacteria. So, if users do not look at the family or class level, they may inadvertently retain these sequences.
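To make the p__ versus p__; distinction concrete, here is a minimal Python sketch of substring-based include/exclude filtering. It is only a simplified model and not the q2-taxa implementation: the function name `filter_by_taxonomy`, the case-sensitive matching, and the toy taxonomy strings are all assumptions made for illustration.

```python
def filter_by_taxonomy(taxonomy, include=(), exclude=()):
    """Return the feature IDs kept by a simplified 'contains' filter.

    Hypothetical helper for illustration only; matching here is plain,
    case-sensitive substring search.
    """
    kept = []
    for feature_id, taxon in taxonomy.items():
        # Include step: keep only annotations containing at least one include term.
        if include and not any(term in taxon for term in include):
            continue
        # Exclude step: drop annotations containing any exclude term.
        if any(term in taxon for term in exclude):
            continue
        kept.append(feature_id)
    return kept


# Toy annotations (made up for this example).
taxonomy = {
    "f1": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "f2": "k__Bacteria; p__; c__",
    "f3": "k__Bacteria",
    "f4": "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    "f5": "Unassigned",
}

exclude_terms = ["p__;", "Eukaryota", "Chloroplast", "Mitochondria",
                 "Unassigned", "Unclassified"]

# Including 'p__' alone still keeps f2, whose phylum rank is empty.
include_only = filter_by_taxonomy(taxonomy, include=["p__"])

# Adding 'p__;' (and the organelle terms) to the exclude list drops f2 and f4.
include_and_exclude = filter_by_taxonomy(
    taxonomy, include=["p__"], exclude=exclude_terms
)

print(include_only)
print(include_and_exclude)
```

In this toy model the include term also removes f3, which has no phylum rank at all, so the include and exclude terms are not doing exactly the same job.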

NOTE: This is presented out of order in reference to the workshop schedule. That is, the material for taxonomic classification occurs after the phylogeny bit. So, perhaps this should be mentioned as something to consider later on to avoid user confusion? That is something like "If you already have taxonomy information you can also perform additional filtering like so..."

add workshop-relevant links to book/index.md

This will include the workshop server, the workshop schedule, Zulip, etc.

Modify artifact numbering in feature table filtering tutorial

The tutorial contains an extraneous filtering step after filtering for autoFMT study samples:

This step was removed in the video tutorial. Consequently, downstream artifact names are now off by 1, e.g. filtered-table-1.qza in the video corresponds to filtered-table-2.qza in the written tutorial. All references to these file names should be updated in the written tutorial.
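One possible way to apply the renumbering is a small script that shifts the numeric suffix in every matching file name. The following is a rough sketch, not a tested migration: the function name `shift_artifact_numbers`, the `filtered-table` stem, and the choice to shift every number from 2 upward down by one are assumptions based on the example above. Text describing the removed step itself would still need to be deleted by hand.

```python
import re


def shift_artifact_numbers(text, stem="filtered-table", start=2, offset=-1):
    """Shift the number in '<stem>-N.qza' by `offset` for every N >= start.

    Hypothetical helper; `start` and `offset` are assumed defaults.
    """
    pattern = re.compile(rf"\b{re.escape(stem)}-(\d+)\.qza\b")

    def renumber(match):
        number = int(match.group(1))
        if number < start:
            # Names below the removed step are left unchanged.
            return match.group(0)
        return f"{stem}-{number + offset}.qza"

    # A single substitution pass, so no name is shifted twice.
    return pattern.sub(renumber, text)


example = "Summarize filtered-table-2.qza, then filter it into filtered-table-3.qza."
updated = shift_artifact_numbers(example)
print(updated)
```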

transfer upstream tutorial data from Dropbox to AWS

The data in the upstream tutorial (currently in PR #34) is stored in Dropbox - this should be transferred to the same location as the downstream data on AWS, and the link should be updated in the upstream tutorial. There is only one relevant file this time (fastq-casava.zip).

remove jupytext frontmatter from all pages

example: https://github.com/gregcaporaso/2022.1-faes-tutorial/blame/main/book/tutorial-1/03-filtering.md#L1-L14

Mostly this issue is just to make it clear to future editors that we aren't using the embedded notebook cell functionality of jupyter book.

automate build of this JupyterBook using GH Actions

My gregcaporaso/q2book repository may be a useful example for this. This will be useful for automating testing of changes.

usage source is showing up in rendered book

@thermokarst, @ebolyen - am I doing something wrong here? This was built using the most recent commit of q2doc. The source for this content is here.

add custom `question` admonition

I'm adding some admonition blocks throughout that pose questions to the user. I'm defining these as class question, but since that class isn't defined they just show up as generic admonitions. We should define question as a custom admonition so we can style it differently, for example with a question mark on the left side of the admonition box.

I'm also using the dropdown class for these so the answer can be hidden from the user as they work through the tutorial. For example:

The source for this looks like the following:

````{admonition} Try summarizing the feature table that was created by this round of filtering. Expand this box if you need help. 
:class: question, dropdown

```{usage}
use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=filtered_table_4, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='filtered_table_4_summ'),
)
```
````

One issue is that I don't think I can split the first line (the question) across multiple lines in the source or it will put the second and all remaining lines in the body of the admonition block rather than the header. If we could handle that with the custom admonition (if there isn't another way) that'd be pretty handy too.

Wrong year in GH "About" box

references are not being formatted

Probably something simple. Here's how references are showing up right now (see the ([])):

add notebook (or section) on importing data through DADA2

This should focus on a small subset of the data - @ebolyen can help to identify which.

content updates

General content work:

add text to existing content
add days relative to FMT column (or something along those lines)
search text for "TODO"; address or create issues for each (#42)
identify missing citations, links to the library, etc and add

Sections that are not yet started:

all sections upstream of filtering - waiting to hear back on whether there is pre-SRA data accessible that can be used here
alpha diversity section: LME
~~differential abundance testing: pre/post FMT (note that this will be tricky b/c time series samples are not independent; consider visualizing aldex2 differentials with qurro)~~
"q2FMT ideas" notebook (i.e., add analyses that will integrate FMT timepoint data, possibly drawing from autism-fmt code)

copy select content from q2book

Some content from q2book will be copied over to this book. See the _toc.yml file in this repository and follow up with @gregcaporaso with any questions about what should be ported.

Any code examples should be converted to usage examples in the process of transferring.

add list of instructors to book/index.md

Also, their preferred contact information and relevant funding sources.

Dropbox link not working

Hi, first of all, thank you for making the very helpful tutorial!

However, this dropbox link:

https://www.dropbox.com/s/r5ag9d0lwlcg91n/tblcounts_asv_wide.csv?dl=1

in 04 - appendix convert-and-import-artifacts.ipynb is not working.

Would really appreciate it if you can fix it soon!

host data files used in these tutorials

These files are currently in Dropbox (Greg's account) - we probably want a better spot for this.

add galaxy-specific warning about waiting for download to complete before starting import

During the FAES 2022 workshop several users started their import command before the download command completed in 030-importing.md in the Importing section. This caused errors during import. We should add a galaxy-specific warning admonition telling users to wait for the download to complete before running the next command.

incorrect classifier used

We're using the 515F/806R classifier for these data, but the data was actually sequenced with 563F/926R. We should either train a different region-specific classifier, or use the full-length classifier.
