qiime2 / cancer-microbiome-intervention-tutorial

JupyterBook for QIIME 2 FAES January 2022 workshop

License: Other

Jupyter Notebook 73.08% TeX 26.37% Makefile 0.38% CSS 0.17%
hacktoberfest

cancer-microbiome-intervention-tutorial's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

cancer-microbiome-intervention-tutorial's People

Contributors

cherman2, ebolyen, gregcaporaso, keegan-evans, lizgehret, thermokarst

cancer-microbiome-intervention-tutorial's Issues

importing.md chapter is incomplete

Either remove content to focus on the "why" part exclusively, or expand to include discussion of importing different types. (At the moment, importing different types is introduced but is incomplete.)

split current filtering.md file into multiple files

This will require maintaining scope across Jupyterbook chapters, and may or may not be possible with the current tools. This is the closest that I've seen, but it doesn't get us what we need. I also checked to see if scope is shared across sections of a chapter (which are different files) and it is not. It's possible there is a config option to alter this behavior, but I haven't come across it yet.

If this turns out to be not possible, I think we should:

  1. gauge interest in having us implement this and submit it as new functionality to JB
  2. update files and TOC in this JB so that the full tutorial is presented in a single chapter (if we can't complete item 1 in time)

add alpha and beta diversity values to sample metadata

Specific values to add are: Faith PD, umap axes for weighted and unweighted unifrac, Shannon diversity, evenness, observed features.

This should happen prior to generating the Emperor plots with custom axes, so we can color by these data in those plots.
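As a rough illustration of the merge this issue asks for, the sketch below appends per-sample diversity columns to a tab-separated metadata table, keyed by sample ID. It uses only the Python standard library and is not QIIME 2 API; the column names, sample IDs, and numbers are placeholders invented for the example, not values from the study.

```python
import csv
import io

# Hypothetical sample metadata (TSV); the first column holds the sample ID.
metadata_tsv = "sample-id\tsubject\nS1\tA\nS2\tB\n"

# Placeholder per-sample values standing in for exported diversity vectors.
diversity = {
    "faith_pd": {"S1": 1.0, "S2": 2.0},
    "shannon": {"S1": 3.0},  # S2 intentionally missing
}


def add_columns(metadata_text, columns, missing=""):
    """Return metadata_text with one new column per entry in `columns`."""
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    rows = list(reader)
    id_column = reader.fieldnames[0]
    fieldnames = list(reader.fieldnames) + list(columns)
    for row in rows:
        for name, values in columns.items():
            # Samples without a value get the `missing` marker.
            row[name] = values.get(row[id_column], missing)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


merged = add_columns(metadata_tsv, diversity)
print(merged)
```

Samples that lack a value for a metric are left blank rather than dropped, so the merged table keeps every row of the original metadata.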

Modify taxonomy filtering command in phylogeny tutorial

Given the filtering-step as outlined here, I'd recommend using the following command, or a variant of it, which I pulled from this post:

qiime taxa filter-table \
    --i-table table.qza \
    --i-taxonomy taxonomy.qza \
    --p-mode 'contains'  \
    --p-include 'p__' \
    --p-exclude 'p__;,Eukaryota,Chloroplast,Mitochondria,Unassigned,Unclassified' \
    --o-filtered-table ./table-no-ecmu.qza

Note that I set --p-exclude 'p__;,...'. This is more explicit about removing taxa that have only the p__ rank prefix, i.e. no accompanying taxonomic label. That is, --p-include 'p__' on its own will keep k__Bacteria; p__Proteobacteria; as well as any annotation with an empty phylum rank, such as k__Bacteria; p__;, which technically has no phylum classification.

Yes, --p-include 'p__' in the command above might be redundant given the exclude terms. I only included it for the sake of completeness and explicitness, to teach the difference between p__ and p__;. :-)

Alternatively, simply mention that it is recommended to remove plastid / organellar sequences, and perhaps even host sequences. This matters because mitochondria are annotated as a "family" within the class Alphaproteobacteria (phylum Proteobacteria), and chloroplasts as a "class" within the phylum Cyanobacteria. So, if users do not look at the family or class level, they may inadvertently retain these sequences.

NOTE: This is presented out of order relative to the workshop schedule: the material on taxonomic classification comes after the phylogeny section. So perhaps this should be mentioned as something to consider later on, to avoid user confusion? For example: "If you already have taxonomy information, you can also perform additional filtering like so..."
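The difference between p__ and p__; can be made concrete with a small sketch. This is a simplified model of "contains" filtering written in plain Python, not the q2-taxa implementation: it assumes a feature is kept when its annotation contains any include term and none of the exclude terms, and it assumes case-insensitive matching. The taxonomy strings are made-up Greengenes-style examples.

```python
INCLUDE = ["p__"]
EXCLUDE = ["p__;", "Eukaryota", "Chloroplast", "Mitochondria",
           "Unassigned", "Unclassified"]


def keep_feature(taxon, include=INCLUDE, exclude=EXCLUDE):
    """Simplified 'contains' filter (assumed case-insensitive)."""
    text = taxon.lower()
    has_included = any(term.lower() in text for term in include)
    has_excluded = any(term.lower() in text for term in exclude)
    return has_included and not has_excluded


# Made-up annotations for illustration only.
taxa = [
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "k__Bacteria; p__; c__",  # empty phylum rank
    "k__Bacteria; p__Cyanobacteria; c__Chloroplast",
    "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; "
    "o__Rickettsiales; f__mitochondria",
    "Unassigned",
]

for taxon in taxa:
    print(keep_feature(taxon), taxon)
```

In this model, including only 'p__' would retain the second annotation, because the bare prefix is still a substring match. The 'p__;' exclude term is what removes it.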

Modify artifact numbering in feature table filtering tutorial

The tutorial contains an extraneous filtering step after filtering for autoFMT study samples:

[Screenshot of the extraneous filtering step]

This step was removed in the video tutorial. Consequently, downstream artifact names are now off by 1, e.g. filtered-table-1.qza in the video corresponds to filtered-table-2.qza in the written tutorial. All references to these file names should be updated in the written tutorial.
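One way to apply the renumbering is sketched below. It assumes artifact references in the tutorial source follow the pattern filtered-table-N and that every number from 2 upward should drop by one; the function name and the `first_shifted` parameter are hypothetical, and references to the removed step itself would still need to be deleted by hand.

```python
import re

ARTIFACT_PATTERN = re.compile(r"filtered-table-(\d+)")


def shift_artifact_numbers(text, first_shifted=2):
    """Decrement every filtered-table-N reference with N >= first_shifted."""

    def renumber(match):
        number = int(match.group(1))
        if number >= first_shifted:
            number -= 1
        return f"filtered-table-{number}"

    # A single substitution pass avoids cascading renames (2 -> 1 -> 0).
    return ARTIFACT_PATTERN.sub(renumber, text)


example = "Summarize filtered-table-2.qza, then filter to filtered-table-3.qza."
print(shift_artifact_numbers(example))
```

Doing the substitution in one pass with a callback, rather than as a series of find-and-replace steps, keeps an already renamed reference from being renamed again.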

transfer upstream tutorial data from Dropbox to AWS

The data in the upstream tutorial (currently in PR #34) is stored in Dropbox - this should be transferred to the same location as the downstream data on AWS, and the link should be updated in the upstream tutorial. There is only one relevant file this time (fastq-casava.zip).

add custom `question` admonition

I'm adding some admonition blocks throughout that pose questions to the user. I'm defining these as class question, but since that's not defined they just show up as generic admonitions. We should define question as a custom admonition so we can style it differently, for example with a question mark on the left side of the admonition box.

I'm also using the dropdown class for these so the answer can be hidden from the user as they work through the tutorial. For example:

[Screenshots: the question admonition rendered with the dropdown class]

The source for this looks like the following:

````{admonition} Try summarizing the feature table that was created by this round of filtering. Expand this box if you need help. 
:class: question, dropdown

```{usage}
use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=filtered_table_4, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='filtered_table_4_summ'),
)
```
````

One issue is that I don't think I can split the first line (the question) across multiple lines in the source: if I do, the second and all remaining lines end up in the body of the admonition block rather than in the header. If we could handle that with the custom admonition (if there isn't another way), that'd be pretty handy too.

content updates

General content work:

  • add text to existing content
  • add days relative to FMT column (or something along those lines)
  • search text for "TODO"; address or create issues for each (#42)
  • identify missing citations, links to the library, etc and add

Sections that are not yet started:

  • all sections upstream of filtering - waiting to hear back on whether there is pre-SRA data accessible that can be used here
  • alpha diversity section: LME
  • differential abundance testing: pre/post FMT (note that this will be tricky b/c time series samples are not independent; consider visualizing aldex2 differentials with qurro)
  • "q2FMT ideas" notebook (i.e., add analyses that will integrate FMT timepoint data, possibly drawing from autism-fmt code)

copy select content from q2book

Some content from q2book will be copied over to this book. See the _toc.yml file in this repository and follow up with @gregcaporaso with any questions about what should be ported.

Any code examples should be converted to usage examples in the process of transferring.

incorrect classifier used

We're using the 515F/806R classifier for these data, but the data were actually sequenced with 563F/926R. We should either train a classifier specific to that region, or use the full-length classifier.
