This framework was developed during the TASSEL hackathon at the Buckler Lab for building predictive models of site features. It reads input data from a TSV file and uses the Spark MLlib implementations of machine learning algorithms. Example use cases include predicting the conservation of a nucleotide from a set of annotations. Its applicability is not limited to computational genomics; the framework can be used for other machine learning tasks as well.
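The exact TSV layout the framework expects is not specified here. As a minimal sketch, assuming a header row, all-numeric columns, and a single named label column, a TSV can be turned into labeled feature vectors in plain Java before handing the data to Spark MLlib. `TsvLoader`, `LabeledRow`, and the column layout are hypothetical illustrations, not this project's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of this project): parses a TSV with a header
 * row into labeled rows. Assumes every column is numeric and that exactly one
 * named column holds the label.
 */
public class TsvLoader {

    /** One training example: a numeric label plus its feature values. */
    public static final class LabeledRow {
        public final double label;
        public final double[] features;

        LabeledRow(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Reads the whole file at {@code path} and delegates to {@link #parse}. */
    public static List<LabeledRow> load(Path path, String labelColumn) throws IOException {
        return parse(Files.readAllLines(path), labelColumn);
    }

    /** Parses a header line followed by data lines; blank lines are skipped. */
    public static List<LabeledRow> parse(List<String> lines, String labelColumn) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("TSV input has no header row");
        }
        String[] header = lines.get(0).split("\t", -1);
        int labelIdx = Arrays.asList(header).indexOf(labelColumn);
        if (labelIdx < 0) {
            throw new IllegalArgumentException("Label column not found: " + labelColumn);
        }
        List<LabeledRow> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // tolerate trailing or blank lines
            }
            String[] fields = line.split("\t", -1);
            if (fields.length != header.length) {
                throw new IllegalArgumentException("Line " + (i + 1) + " has "
                        + fields.length + " fields, expected " + header.length);
            }
            // Every non-label column becomes a feature, in header order.
            double[] features = new double[header.length - 1];
            int f = 0;
            for (int c = 0; c < fields.length; c++) {
                if (c == labelIdx) {
                    continue;
                }
                features[f++] = Double.parseDouble(fields[c].trim());
            }
            rows.add(new LabeledRow(Double.parseDouble(fields[labelIdx].trim()), features));
        }
        return rows;
    }
}
```

Within Spark itself, a comparable load could be done with `spark.read().option("sep", "\t").option("header", "true").csv(path)` and the feature columns combined with MLlib's `VectorAssembler`; the plain-Java version above only makes the assumed file layout explicit.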
Implemented algorithms include:
- Random Forests (classification/regression)
- Gradient Boosted Trees (classification/regression)
- Support Vector Machines (classification)
- Naive Bayes (classification)
- Linear Regression
Dependencies:
- Java (1.8 or later)
- Spark
- JavaFX
Authors:
- Janu Verma
- Zack Miller