jgeofil / mycorrhiza-algorithm

Combining phylogenetic networks and Random Forests for prediction of ancestry from multilocus genotype data

License: GNU General Public License v3.0

Python 100.00%
mycorrhiza splitstree ancestry population-genetics phylogenetics machine-learning structure admixture random-forests dimensionality-reduction

mycorrhiza-algorithm's Introduction

Mycorrhiza algorithm

Published as: Mycorrhiza: Combining phylogenetic networks and Random Forests for prediction of ancestry from multilocus genotype data

Installing Mycorrhiza on Ubuntu 16.04

  1. Make sure you have the latest version of Python 3.x

    python3 --version
  2. Install pip3, Java and the tkinter library

    sudo apt-get install python3-pip python3-tk default-jre
  3. Install Mycorrhiza

    pip3 install --upgrade mycorrhiza
  4. Install SplitsTree

    Download and run the GUI installer with the commands below, then follow its instructions, leaving all settings at their defaults.

    wget http://ab.inf.uni-tuebingen.de/data/software/splitstree4/download/splitstree4_unix_4_14_6.sh
    chmod +x splitstree4_unix_4_14_6.sh
    ./splitstree4_unix_4_14_6.sh

    If the link above is unavailable, find the most recent version of SplitsTree at: http://ab.inf.uni-tuebingen.de/data/software/splitstree4/download
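
The prerequisites from the steps above can be checked with a short script before running an analysis. This is a minimal sketch, not part of Mycorrhiza: the function name `check_prerequisites` is hypothetical, and the checks are rough (they only test that an interpreter, module or executable can be located, not that it works).

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report whether each prerequisite from the install steps can be located."""
    return {
        # Step 1: a Python 3 interpreter is running this script.
        "python3": sys.version_info.major >= 3,
        # Step 2: pip, Java and the tkinter library.
        "pip": importlib.util.find_spec("pip") is not None,
        "java": shutil.which("java") is not None,
        "tkinter": importlib.util.find_spec("tkinter") is not None,
        # Step 3: the mycorrhiza package itself.
        "mycorrhiza": importlib.util.find_spec("mycorrhiza") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

SplitsTree is deliberately left out of this check because its install location depends on the choices made in the GUI installer.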

Installing Mycorrhiza on Mac OS X Sierra 10.12

  1. If you don't already have the Homebrew package manager, install it before proceeding.

    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    
  2. Install Python 3.x

    brew install python
  3. Install Mycorrhiza

    sudo -H pip3 install --upgrade mycorrhiza
  4. Install SplitsTree

    The package can be found here. Follow the installer instructions, leaving all settings at their defaults.

    If the link above is unavailable, find the most recent version of SplitsTree at: http://ab.inf.uni-tuebingen.de/data/software/splitstree4/download

Running an analysis from command line

  1. Run an analysis.

    Run a 5-fold cross-validated analysis.

    crossvalidate -i gipsy.myc -o out/ -s 5

    Run an analysis with a training set and a prediction set. Samples with a learning flag = 1 are used for training, and predictions are made on samples with a learning flag = 0.

    supervised -i gipsy.myc -o out/

    To see all available parameters:

    crossvalidate -h
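
When the command-line tools are driven from another Python program, the same invocations can be assembled as argument lists. The sketch below uses only the tool names and flags shown above (`-i`, `-o`, `-s`); the helper names `build_command` and `run_command` are hypothetical, and `-s` is restricted to `crossvalidate` because that is the only place this text shows it.

```python
import subprocess


def build_command(tool, input_path, out_dir, n_splits=None):
    """Assemble the argument list for the `crossvalidate` or `supervised` tool."""
    if tool not in ("crossvalidate", "supervised"):
        raise ValueError(f"unknown tool: {tool}")
    args = [tool, "-i", input_path, "-o", out_dir]
    if n_splits is not None:
        if tool != "crossvalidate":
            raise ValueError("-s is only shown for crossvalidate in this README")
        args += ["-s", str(n_splits)]
    return args


def run_command(args):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(args, check=True)


# Equivalent to: crossvalidate -i gipsy.myc -o out/ -s 5
cv_args = build_command("crossvalidate", "gipsy.myc", "out/", n_splits=5)
# Equivalent to: supervised -i gipsy.myc -o out/
sup_args = build_command("supervised", "gipsy.myc", "out/")
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.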

Running an analysis in a script

  1. Import the necessary modules.

    from mycorrhiza.dataset import Myco
    from mycorrhiza.analysis import CrossValidate
    from mycorrhiza.plotting.plotting import mixture_plot
  2. (Optional) By default Mycorrhiza will look for SplitsTree in your home folder. If you wish to specify a different path for the SplitsTree executable, you can do so in the settings module.

    from mycorrhiza.settings import const
    const['__SPLITSTREE_PATH__'] = '~/splitstree4/SplitsTree'
  3. Load some data. Here data is loaded in the Mycorrhiza format from the Gipsy moth sample data file. Example data can be found here.

    myco = Myco(file_path='data/gipsy.myc')
    myco.load()
  4. Run an analysis. Here a simple 5-fold cross-validation analysis is executed on all available loci, without partitioning.

    cv = CrossValidate(dataset=myco, out_path='data/')
    cv.run(n_partitions=1, n_loci=0, n_splits=5, n_estimators=60, n_cores=1)
  5. Plot the results.

    mixture_plot(cv)
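
Before setting the SplitsTree path in step 2, it can help to confirm that the path points at a real executable. The sketch below is an illustration only: `resolve_executable` is a hypothetical helper, not part of Mycorrhiza. It expands a leading `~` first, because the operating system does not expand `~` when a program is launched without a shell; whether Mycorrhiza expands it internally is not stated here.

```python
import os
import shutil


def resolve_executable(path_or_name):
    """Return an absolute path to an executable, or None if it cannot be found."""
    expanded = os.path.expanduser(path_or_name)
    # Case 1: an explicit path to an executable file.
    if os.path.isfile(expanded) and os.access(expanded, os.X_OK):
        return os.path.abspath(expanded)
    # Case 2: a bare command name that is available on PATH.
    return shutil.which(path_or_name)


# Possible usage together with step 2 (left commented out so the sketch
# runs without Mycorrhiza installed):
#
# from mycorrhiza.settings import const
# path = resolve_executable('~/splitstree4/SplitsTree')
# if path is None:
#     raise FileNotFoundError('SplitsTree executable not found')
# const['__SPLITSTREE_PATH__'] = path
```

A `None` result here corresponds to the situation in which launching `SplitsTree` fails with `FileNotFoundError`.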

Documentation

https://jgeofil.github.io/mycorrhiza/

File formats

For microsatellite loci set the is_str flag to True.

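```python
from mycorrhiza.dataset import Myco, Structure  # assumption: Structure is exported alongside Myco
myco_data = Myco(file_path='data/myco.myc', is_str=True)  # Myco format
structure_data = Structure(file_path='data/myco.str', is_str=True)  # STRUCTURE format
```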

Myco

Diploid genotypes occupy 2 rows (the sample identifier must be identical).

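| Column(s) | Content           | Type              |
|-----------|-------------------|-------------------|
| 1         | Sample identifier | string            |
| 2         | Population        | string or integer |
| 3         | Learning flag     | {0,1}             |
| 4 to M+3  | SNP loci          | {A, T, G, C, N}   |
| 4 to M+3  | STR loci          | any or 000        |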
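
The layout above can be illustrated with a small reader that groups the two rows of each diploid sample and splits samples by learning flag. This is a sketch under stated assumptions, not Mycorrhiza's own loader: it assumes whitespace-delimited fields and no header row, and the function names and example rows are made up for illustration.

```python
def parse_myco_lines(lines):
    """Group Myco-style rows by sample identifier.

    Assumes whitespace-delimited fields and no header row (both are assumptions).
    """
    samples = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got {len(fields)}")
        identifier, population, flag = fields[0], fields[1], fields[2]
        if flag not in ("0", "1"):
            raise ValueError(f"learning flag must be 0 or 1, got {flag!r}")
        record = samples.setdefault(
            identifier,
            {"population": population, "learning_flag": int(flag), "rows": []},
        )
        # Columns 4 to M+3 hold the loci; a diploid sample contributes two rows.
        record["rows"].append(fields[3:])
    return samples


def split_by_learning_flag(samples):
    """Return (training ids, prediction ids): flag 1 trains, flag 0 is predicted."""
    training = [k for k, v in samples.items() if v["learning_flag"] == 1]
    prediction = [k for k, v in samples.items() if v["learning_flag"] == 0]
    return training, prediction


# Made-up example rows: two diploid samples with three SNP loci each.
example = [
    "s1 popA 1 A T G",
    "s1 popA 1 A C G",
    "s2 popB 0 N T G",
    "s2 popB 0 A T C",
]
parsed = parse_myco_lines(example)
train_ids, predict_ids = split_by_learning_flag(parsed)
```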

STRUCTURE

Diploid genotypes occupy 2 rows (the sample identifier must be identical).

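| Column(s)      | Content            | Type          |
|----------------|--------------------|---------------|
| 1              | Sample identifier  | string        |
| 2              | Population         | integer       |
| 3              | Learning flag      | {0,1}         |
| 4 to O+3       | Optional (Ignored) |               |
| O+4 to M+O+3   | SNP loci           | integer or -9 |
| O+4 to M+O+3   | STR loci           | any or -9     |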
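
The column arithmetic for the STRUCTURE layout can be sketched as follows. With three fixed columns followed by O optional columns in positions 4 to O+3, the M loci fall in columns O+4 to M+O+3. The helper names are hypothetical, fields are assumed to be whitespace-delimited, and mapping `-9` to `None` is a choice made for this illustration rather than documented Mycorrhiza behaviour.

```python
def structure_locus_columns(n_optional, n_loci):
    """Return the 1-based (first, last) column numbers of the locus block."""
    first = 3 + n_optional + 1  # three fixed columns, then the optional ones
    last = 3 + n_optional + n_loci
    return first, last


def parse_structure_row(line, n_optional=0):
    """Split one STRUCTURE-style row, skipping the ignored optional columns."""
    fields = line.split()
    identifier = fields[0]
    population = int(fields[1])
    learning_flag = int(fields[2])
    # Missing data is coded as -9 in this format; represent it as None here.
    alleles = [None if f == "-9" else f for f in fields[3 + n_optional:]]
    return identifier, population, learning_flag, alleles


# Made-up example: two optional columns ("x", "y") followed by three loci.
columns = structure_locus_columns(n_optional=2, n_loci=3)
row = parse_structure_row("s1 3 1 x y 120 -9 134", n_optional=2)
```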

mycorrhiza-algorithm's Issues

Mycorrhiza problem

I tried to run Mycorrhiza examples with Ubuntu 18.04.4 using the python script, but the following error occurs:

    Outputting data in Nexus format.: 100%|████████| 90/90 [00:00<00:00, 939.25it/s]
    Building network from file data/JN3L2X7MA3CPN57GYCSHLV21EUVZNONP.nex
    SplitsTree -g -v -i /home/rhaphael/Desktop/Mycorrhiza/mycorrhiza-master/examples/data/JN3L2X7MA3CPN57GYCSHLV21EUVZNONP.nex
    multiprocess.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/home/rhaphael/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "/home/rhaphael/.local/lib/python3.6/site-packages/multiprocess/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/home/rhaphael/.local/lib/python3.6/site-packages/mycorrhiza/analysis/analysis.py", line 141, in func
        return sn.execute_nexus_file(file)
      File "/home/rhaphael/.local/lib/python3.6/site-packages/mycorrhiza/network/network.py", line 25, in execute_nexus_file
        bash_nexus_file(filename)
      File "/home/rhaphael/.local/lib/python3.6/site-packages/mycorrhiza/network/network.py", line 40, in bash_nexus_file
        process = subprocess.Popen(bash_command.split(), stdout=subprocess.PIPE, shell=True if os_name == 'Windows' else False)
      File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
        restore_signals, start_new_session)
      File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'SplitsTree': 'SplitsTree'
    """
How do I solve this problem (error)?
