jakebjorke / datumbox-framework

This project forked from datumbox/datumbox-framework

Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

Home Page: http://www.datumbox.com/

License: Apache License 2.0

Datumbox Machine Learning Framework

The Datumbox Machine Learning Framework is an open-source framework written in Java which allows the rapid development of Machine Learning and Statistical applications. The main focus of the framework is to include a large number of machine learning algorithms and statistical tests and to handle medium to large datasets.

Copyright & License

Copyright (c) 2013-2015 Vasilis Vryniotis.

The code is licensed under the Apache License, Version 2.0.

Version

The latest version is 0.6.0 (build 20150502).

The master branch is the latest stable version of the framework. The devel branch is the development branch. All the previous stable versions are marked with tags.

The releases of the framework follow the Semantic Versioning approach. For detailed information about the various releases check out the Changelog.
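As a rough illustration of what Semantic Versioning implies when comparing releases such as 0.6.0, here is a minimal sketch in plain Java. The `SemVer` class is hypothetical and is not part of the Datumbox Framework; it only handles the plain `MAJOR.MINOR.PATCH` form and ignores pre-release and build metadata.

```java
import java.util.Comparator;

/** Hypothetical helper (not a Datumbox class): a plain MAJOR.MINOR.PATCH version. */
final class SemVer implements Comparable<SemVer> {
    // Versions are ordered by major, then minor, then patch.
    private static final Comparator<SemVer> ORDER =
            Comparator.comparingInt((SemVer v) -> v.major)
                      .thenComparingInt(v -> v.minor)
                      .thenComparingInt(v -> v.patch);

    final int major;
    final int minor;
    final int patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    /** Parses strings such as "0.6.0"; anything else is rejected. */
    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR.PATCH but got: " + text);
        }
        return new SemVer(Integer.parseInt(parts[0]),
                          Integer.parseInt(parts[1]),
                          Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(SemVer other) {
        return ORDER.compare(this, other);
    }

    /** Under Semantic Versioning, 0.y.z releases may change their public API at any time. */
    boolean isInitialDevelopment() {
        return major == 0;
    }

    public static void main(String[] args) {
        SemVer latest = SemVer.parse("0.6.0");
        // "0.5.1" is only an example of an older version string.
        System.out.println(latest.compareTo(SemVer.parse("0.5.1")) > 0); // prints "true"
        System.out.println(latest.isInitialDevelopment());               // prints "true"
    }
}
```

Because the major version is still 0, it is safest to pin an exact version in your build rather than a version range.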

Installation

The Datumbox Framework is available on the Maven Central Repository.

Maven:

    <dependency>
        <groupId>com.datumbox</groupId>
        <artifactId>datumbox-framework</artifactId>
        <version>0.6.0</version>
    </dependency>

Note: A couple of classes which use Linear Programming require installing an external C library called lpsolve. Most users won't use these classes and thus installing the binary library can be considered optional; please check the Detailed Installation Guide for more info.

Documentation and Code Examples

All the public methods and classes of the Framework are documented with Javadoc comments. Moreover, for every model there is a JUnit Test which clearly shows how to train and use the models. Finally, for more examples on how to use the framework, check out the Code Examples or the official Blog.

Technical Details

The core part of the project is about 30,000 lines of code. It uses Java 8 features and a Maven project structure. If you find a bug or decide to document particular parts of the code, please consider contributing your changes by sending a pull request.

Which methods/algorithms are supported?

The Framework currently supports performing multiple parametric and non-parametric statistical tests, calculating descriptive statistics on censored and uncensored data, performing ANOVA, Cluster Analysis, Dimension Reduction, Regression Analysis, Time Series Analysis, Sampling and the calculation of probabilities from the most common discrete and continuous distributions. In addition, it provides implementations of several algorithms including Max Entropy, Naive Bayes, SVM, Bootstrap Aggregating, Adaboost, Kmeans, Hierarchical Clustering, Dirichlet Process Mixture Models, Softmax Regression, Ordinal Regression, Linear Regression, Stepwise Regression, PCA and several other techniques that can be used for feature selection, ensemble learning, linear programming solving and recommender systems.
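To give a feel for what one of the listed techniques computes, the following is a standalone sketch of simple (one-variable) Linear Regression fitted by ordinary least squares. It deliberately does not use any Datumbox class: the `SimpleLinearRegression` name, its methods and the sample data are all assumptions made for illustration, and the framework's own API and implementation may differ. For real usage, refer to the JUnit tests and Code Examples.

```java
/** Standalone illustration (not a Datumbox class): fits y = intercept + slope * x. */
final class SimpleLinearRegression {
    final double slope;
    final double intercept;

    private SimpleLinearRegression(double slope, double intercept) {
        this.slope = slope;
        this.intercept = intercept;
    }

    /** Ordinary least squares fit on paired observations (x[i], y[i]). */
    static SimpleLinearRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need at least two paired observations");
        }
        int n = x.length;

        // Sample means of x and y.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double slope = sxy / sxx;
        return new SimpleLinearRegression(slope, meanY - slope * meanX);
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        // Made-up data that lies exactly on y = 1 + 2x.
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        SimpleLinearRegression model = SimpleLinearRegression.fit(x, y);
        System.out.println(model.predict(5)); // prints "11.0"
    }
}
```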

Bug Reports

Although parts of the Framework have been used in commercial applications, not all classes are equally used or tested. The framework is currently in Beta, so you should expect some changes to the public APIs in future versions. If you spot a bug, please submit it as an Issue on the official GitHub repository.

Contributing

The Framework can be improved in many ways, so any contribution is welcome. By far the most important missing features are multithreading support and the ability to use the framework from the command line. Other important enhancements include improving the documentation, the test coverage and the examples, improving the architecture of the framework, and supporting more Machine Learning and Statistical models. Please consider contributing if you want to keep this project alive.

Acknowledgements

Many thanks to Eleftherios Bampaletakis for his invaluable input on improving the architecture of the Framework. Also many thanks to ej-technologies GmbH for providing a license for their Java Profiler.
