GithubHelp home page GithubHelp logo

naturalis / mycorrhiza Goto Github PK

View Code? Open in Web Editor NEW
2.0 9.0 2.0 2.52 GB

Evolutionary dynamics of mycorrhizal symbiosis in land plant diversification - trait data and analysis

Perl 43.53% Shell 40.02% HTML 14.02% R 2.44%
land-plants mycorrhiza comparative-analysis phylogenetics character-evolution symbiosis

mycorrhiza's Introduction

Evolutionary dynamics of mycorrhizal symbiosis in land plant diversification

sequoia

Trait data and analysis

This repository holds trait data data and scripts for part of the methods of the manuscript entitled Evolutionary dynamics of mycorrhizal symbiosis in land plant diversification by Frida A.A. Feijen, Rutger A. Vos, Jorinde Nuytinck & Vincent S.F.T. Merckx. The contents of this repository are made available under a CC-BY license.

Directory structure:

  • data: contains verbatim input data and pruned/converted versions
  • script: contains conversion scripts
  • doc: supporting documentation, manuscript files
  • results: final output files

Methods

Phylogenetic analysis

Plant systematists assign four different rootings of the land plants enough plausibility that we consider them here. These four rootings are intended to orient the following major groups relative to one another: liverworts (Marchantiophyta), mosses (Bryophyta), hornworts (Anthocerotophyta) and vascular plants (Tracheophyta).

  • MBasal - liverworts split off first, followed by mosses, and hornworts are sister to vascular plants. This yields the following topology: (((Anthocerotophyta,Tracheophyta),Bryophyta),Marchantiophyta);
  • ABasal - hornworts branch off first, yielding the following topology: (((Marchantiophyta,Bryophyta),Tracheophyta),Anthocerotophyta);
  • TBasal - vascular plants branch off first, yielding the following topology: (((Marchantiophyta,Bryophyta),Anthocerotophyta),Tracheophyta);
  • ATxMB - liverworts and mosses (M,B) form a monophyletic group, and so do hornworts and vascular plants (A,T), resulting in: ((Marchantiophyta,Bryophyta),(Anthocerotophyta,Tracheophyta)); In the manuscript, this is considered the preferred rooting, which we will explore in more depth than the others. Earlier versions of the analyses also explored MBasal, so there may be reference to that here and there.

For each of the rootings, we performed a BEAST dating analysis. The results of these analyses are part of a separate (too-large-for-github) submission available at 10.5281/zenodo.1037548.

Comparative analysis

Using consensus over the BEAST trees, we performed phylogenetic comparative multistate analyses with the program BayesTraits, in which each state change is a transition that is modeled in a Q matrix, which is amenable to parameter restriction so that various hypotheses can be tested using Bayes Factors.

Because different higher taxa among the mycorrhiza can associate with land plants in a variety of observed combinations there is potentially a very large number of states: if we treat every permutation of associations as a potential state we will end up with an explosion of parameters in the Q matrix such that the analysis becomes practically intractable. But, we can reduce the number of parameters in the following ways:

  • We only take the empirically observed combinations of observations, not all permutations (i.e. only these).
  • We disallow transitions where more than one association is gained or lost instantaneously.
  • We then do a Reversible Jump MCMC analysis to further reduce the Q matrix.

Workflow overview

Broadly speaking, the steps that are described in the manuscript are described in the following, numbered sections. These steps have been executed in the data section of this repository, with different iterations in date-stamped folders. The most relevant of these subfolders are the most recent ones, in which the workflow was debugged and all the data were double-checked.

Note that this repository also contains abandoned results, e.g. earlier attempts to develop the general workflow, older versions of the input files, project background documentation, and so on. Old files that are not discussed in any of the READMEs should be considered irrelevant for the substance of the methods and results discussed in the manuscript.

1. Preparing input data files

First we make the raw input for BayesTraits/MultiState using the script make_ms_input.pl. This assumes the following:

  1. The consensus trees are in Nexus format. Newick trees need to be converted to Nexus first, e.g. using FigTree, Mesquite, etc.
  2. The input data are a tabular file that must meet the following requirements: line breaks in UNIX format, a single header line that at least enumerates all state symbols as a space-separated list between parentheses, all subsequent lines start with the taxon name (spelled exactly the same as in the tree, including underscores for spaces), then one or more spaces (can be tabs), then the states, which can either be a single string or space (tab) separated. Verify that the data with associations is in the same format as HostFungusAssociations.txt or TableS1.txt.

For each line in the data file, the first word is matched against the tips in the tree, any lines that don't match anything are assumed to be headers or footers and are ignored. When this happens, a warning is emitted by make_ms_input.pl, as follows: putative taxon '$taxon' not in tree, ignoring (could also be table header) Note that you need to capture STDOUT to make a states coding file, so the full command is:

make_ms_input.pl -d <indata> -i <intree> -o <outtree> -t <outdata> > <states>
  • -d location of input data file
  • -i location of input tree file in Nexus format
  • -o output Nexus tree file, reconciled with data
  • -t output data in TSV format, reconciled with tree
  • > <states> location to redirect states table to file

2. Generate BayesTraits command files

Then we create the BayesTraits/MultiState commands for restricting the transitions as per the general idea described above. These commands will also ensure that the run is done using reversible jump MCMC. Whether or not you use a hyperprior depends on whether the script make_restrictions.pl was invoked with the --hyper flag. It is probably a good idea to do this. In addition, by default, the command file will configure a chain with 'infinite' iterations (i.e. -1) that needs to be interrupted manually. If you have a better idea about the number of iterations it is worth specifying that. A conservative estimate for the present project is 10 x 10^6 generations, of which we will want to discard up to 50% burnin (in one case this appeared to be necessary). Lastly, it might make sense to indicate how many cores you have available for the analysis, although this only works for multi-core (e.g. OpenMP) versions. Hence, the full command would be:

make_restrictions.pl -states <states.tsv> -tree <outtree> [-iterations <iterations>] 
[-cores <cores>] [-hyper <min,max>] [-fossil <tip1,tip2=value>] [-restrict <qAB=qBA>]
[-stones <min,max>]
  • -states the redirected states table from the previous step
  • -tree the Nexus tree file produced by the previous step
  • -iterations optional, number of iterations, otherwise Inf
  • -cores optional, number of CPU cores to use
  • -hyper optional, range for the hyperprior, comma separated
  • -fossil optional, fix a node (identified by tips) to a value
  • -restrict optional, fix a rate
  • -stones optional, configures the stepping stone sampler for marginal lnL

Once this is done we should have a tree file in Nexus format, a data file in tab-separated spreadsheet format, and a text file with the restriction commands. You can now run the analysis, as per the instructions below, or explore your data first in Mesquite to do a visual check to see if it looks sane (probably a good idea).

3. (Optional) exploring your data in Mesquite

If you want to explore your data in Mesquite you can run the make_nexus.pl script. This script expects your data to be formatted with a header that enumerates all state symbols between parentheses, like the first line in HostFungusAssociations.txt and TableS1.txt. The full command would be:

make_nexus.pl -d <indata> -t <outtree> > <outdata.nex>

You can then open this file in Mesquite and trace the character state changes (as reconstructed under maximum parsimony) on the tree topology.

4. Running a BayesTraits analysis

To run an analysis like this, we have to do the following steps:

  1. install a Quad version of BayesTraits. This has higher precision to prevent underflows, which are somewhat possible because of the relatively large Q matrix.

  2. open the program, i.e.:

    BayesTraitsV2_OpenMP_Quad <tree.nex> <data.tsv> < restrictions.txt

The general idea is that many of these analyses are run (to evaluate all the hypotheses, for all rootings, and replicated (we ran this triplicated)). Hence, the following step is to post process all the log files for hypothesis test results and visualizations.

5. Post-processing BayesTraits analysis results

  • The data directory data/2016-12-01 establishes a workflow for running various hypothesis tests applied to different rootings. The documentation there should also make clear how the rate constraint tests are applied to the others rootings, in directory data/2017-03-06. From the marginal likelihoods of the different constraints, the rate constraints table is populated so that the Bayes factors can be computed.
  • The iteration of the workflow that produced the figures presented in the manuscript was executed in data/2017-10-06. That directory's README contains more detailed information about the analysis steps.

Results

The general outcomes of this project are:

mycorrhiza's People

Contributors

rvosa avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

007v pupubear007

mycorrhiza's Issues

Which is the preferred rooting?

As per @vincentmerckx's email 29/9/2016 the preferred rooting is Mbasal2:

Even ingaan op die laatste vraag: Mbasal2 = (Marchantiophyta, (rest) waarbij de Levermossen
basaal staan heeft de voorkeur. Dit is de hypothese die in de meest recente mycorrhiza
literatuur naar voor wordt geschoven (e.g. Field et al.).

Roughly how many iterations do we need?

For now the make_restrictions.pl script generates command files that run for an infinite (-1) number of generations. In order to be able to automate running multiple analyses we will need to:

  1. get a reasonable estimate for how many generations we actually need. Presumably by plotting
    the Lh curve and see when it starts to stabilize.
  2. make it so that the script embeds this requested number of iterations in the command file.

Additionally, it would be nice to make it so that we can also embed our hypothesis testing in the command file. This means that before the command file issues the run command, any necessary restrictions (as per #1 and #2) are defined.

Hypothesis tests for gains of Glomeromycota

The hypothesis is that, once association with Glomeromycota is lost, it is very hard to regain it. The following transitions that constitute a gain of Glomeromycota should, therefore, have a rate indistinguishable from 0:

  • 0000 (D) => 0010 (C)
  • 0001 (B) => 0011 (A)
  • 0100 (E) => 0110 (I)
  • 1000 (F) => 1010 (K)
  • 1100 (G) => 1110 (H)

Hypothesis tests for the root state

The input data, e.g. in TableS1.txt, capture presence/absence for each of the four major groups of mycorrhiza (in order, A, B, G and M). One of the hypotheses is that associations with G and/or M were formed first and so should be reconstructed at the root. This means doing ratio tests where we fix the root at any of these:

  • 0000 (No association), recoded as D
  • 0001 (Mucoromycotina only), recoded as B
  • 0010 (Glomeromycota only), recoded as C
  • 0011 (Mucoromycotina and Glomeromycota), recoded as A

In addition, we would do a free analysis, then do a BF test.

Two tree visualizations

  • In manuscript: phylogram with higher taxa, no tiplabels, no pies
  • For supplementary, cladogram with tiplabels and pies

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.