
gate-ml's Introduction

Why GATE-ML?

Training and applying a model with the GATE Batch Learning PR is already straightforward. You can use the GATE GUI to load a corpus, then train and test it with the Batch Learning PR. So what is the scope of GATE-ML?

  • Crazy? No.
  • Reinventing the wheel? No.

One and only one reason :)

  • When you load a very big corpus and try to preprocess and train on it using the GATE GUI, the GUI hangs like hell.

GATE-ML runs GATE machine learning in embedded mode. The package has three phases:

  • Preprocessing : Read input text files and create GATE XML files
  • Training : Train on the GATE XML files and create a model
  • Application : Classify an input text using the trained model

Property files

Remember to switch between forward slashes and backslashes in the paths below according to your operating system.
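If the property files are parsed with java.util.Properties (an assumption; this README does not say how they are read), a single backslash in a Windows path is treated as an escape character and silently dropped. The sketch below uses a hypothetical SlashDemo class and placeholder paths to show three ways of writing a path and what the parser returns for each.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; not part of GATE-ML.
final class SlashDemo {

    // Parses property-file text the way java.util.Properties does and returns one value.
    static String readValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File text: GATE_HOME=C:\gate   (single backslash is dropped) -> C:gate
        System.out.println(readValue("GATE_HOME=C:\\gate", "GATE_HOME"));
        // File text: GATE_HOME=C:\\gate  (doubled backslash survives as one) -> C:\gate
        System.out.println(readValue("GATE_HOME=C:\\\\gate", "GATE_HOME"));
        // File text: GATE_HOME=C:/gate   (forward slash needs no escaping) -> C:/gate
        System.out.println(readValue("GATE_HOME=C:/gate", "GATE_HOME"));
    }
}
```

Under that assumption, Windows paths would need either doubled backslashes or forward slashes in the property files.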

GATE_ML.properties

Initial properties needed for the system to run:

  • GATE_HOME : GATE home directory on your system
  • learningMode : One of the three modes: Preprocessing, Training or Application
  • sourceDirectory : Directory containing the property files for the three learning modes above
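As an illustration, the three keys above might be read and checked as in the following sketch. Only the key names come from this README; the GateMlConfig class, the sample values, and the fail-fast validation are assumptions made for the example and are not the project's actual code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the three GATE_ML.properties keys.
final class GateMlConfig {
    final String gateHome;
    final String learningMode;
    final String sourceDirectory;

    private GateMlConfig(String gateHome, String learningMode, String sourceDirectory) {
        this.gateHome = gateHome;
        this.learningMode = learningMode;
        this.sourceDirectory = sourceDirectory;
    }

    // Loads the properties and fails fast if any required key is missing or blank.
    static GateMlConfig load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new GateMlConfig(
                require(props, "GATE_HOME"),
                require(props, "learningMode"),
                require(props, "sourceDirectory"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with paths from your own system.
        String sample = "GATE_HOME=/opt/gate\n"
                + "learningMode=Training\n"
                + "sourceDirectory=/home/user/gate-ml/sources\n";
        GateMlConfig config = load(new StringReader(sample));
        System.out.println(config.learningMode + " using " + config.sourceDirectory);
    }
}
```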

Sources Directory

The source directory contains three subdirectories, one for each of the three learning modes.

preprocess

  • GAPPFile : GAPP file for preprocessing. A sample GAPP file can be found at gappFile/ml_data_preprocessing.gapp
  • AnnotationTypesRequired : Name of the annotation into which the class label is injected. The default is Sentence. You can add your own custom annotations here. If you use an annotation other than the GATE default annotations, make sure the GAPP file is built with the PRs that create that annotation.
  • CorpusName : Name of the corpus
  • inputDir : Directory containing the training files as .txt files. During preprocessing, the directory name is treated as the class label for all the .txt files in it. A simple directory hierarchy like that of 20news-group-data is expected.
  • outputDir : Directory where the output GATE XML files are stored
  • removeStopWords : Whether to remove stop words (true / false)
  • removePunctuation : Whether to remove punctuation (true / false)
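The inputDir convention, where each subdirectory name becomes the class label of the .txt files inside it, can be sketched as follows. This is a minimal illustration using only the Java standard library; the LabelFromDirectory class and its method are hypothetical and do not reproduce how GATE-ML itself reads the directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper; not part of GATE-ML.
final class LabelFromDirectory {

    // Maps every .txt file found at inputDir/<label>/<file>.txt to <label>.
    static Map<Path, String> collectLabels(Path inputDir) {
        Map<Path, String> labels = new TreeMap<>();
        // Depth 2 covers inputDir itself, the label directories, and the files inside them.
        try (Stream<Path> paths = Files.walk(inputDir, 2)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().endsWith(".txt"))
                 // Skip stray files that sit directly in inputDir and so have no label directory.
                 .filter(p -> !p.getParent().equals(inputDir))
                 .forEach(p -> labels.put(p, p.getParent().getFileName().toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return labels;
    }
}
```

For a layout such as inputDir/sport/a.txt and inputDir/politics/b.txt, the returned map would pair a.txt with "sport" and b.txt with "politics".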

training

  • GAPPFile : GAPP file for training. A sample GAPP file can be found at gappFile/ml_training.gapp
  • CorpusName : Name of the corpus
  • xmlCorpus : The outputDir of the preprocess mode

The ml-config.xml file is in this folder, so the trained model is stored here by default.

application

  • GAPPFile : GAPP file for application. A sample GAPP file can be found at gappFile/ml_application.gapp
  • CorpusName : Name of the corpus
  • removeStopWords : Whether to remove stop words (true / false)
  • removePunctuation : Whether to remove punctuation (true / false)
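To make the effect of the two boolean flags concrete, here is a rough plain-Java sketch of what removing stop words and punctuation from a text could look like. It is only an illustration: the TextCleaner class, the tiny stop word list, and the whitespace tokenisation are all assumptions, and this README does not describe how GATE-ML actually implements these options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not the GATE-ML implementation.
final class TextCleaner {

    // Tiny sample list for the example; a real stop word list would be much longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "and", "to"));

    static String clean(String text, boolean removeStopWords, boolean removePunctuation) {
        // Replace ASCII punctuation with spaces so adjacent words stay separated.
        String working = removePunctuation ? text.replaceAll("\\p{Punct}", " ") : text;
        List<String> kept = new ArrayList<>();
        for (String token : working.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (removeStopWords && STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                continue;
            }
            kept.add(token);
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Prints: price gold rising again
        System.out.println(clean("The price of gold is rising, again!", true, true));
    }
}
```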

GAPP Files

Sample GAPP files can be found in the gappFile directory.

  • ml_data_preprocessing.gapp : ANNIE with defaults (without NE Transducer and Ortho Matcher)
  • ml_training.gapp : Batch Learning PR with ml-config.xml from sources/training
  • ml_application.gapp : ANNIE with defaults (without NE Transducer and Ortho Matcher) and Batch Learning PR

GATE-ML Work Flow

Execution starts from GateLearning.java, which reads GATE_ML.properties and proceeds according to the learningMode.

If the learningMode is "Preprocess", the system uses the sources/preprocess folder as its configuration directory.

If the learningMode is "Training", the system uses the sources/training folder as its configuration directory.

If the learningMode is "Application", the system uses the sources/application folder as its configuration directory. This is just a demo mode: the input text is hard-coded in GateLearning's executeClassifier method.
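The mode-to-folder mapping described above can be sketched as a small dispatch routine. The ModeDispatch class and its enum are hypothetical and are not taken from GateLearning.java. Accepting both "Preprocess" and "Preprocessing" is an assumption made here because this README uses both spellings; the real code may accept only one.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical sketch of mode-based dispatch; not the actual GateLearning code.
final class ModeDispatch {

    enum LearningMode {
        PREPROCESS("preprocess"),
        TRAINING("training"),
        APPLICATION("application");

        final String folder; // subdirectory name under the source directory

        LearningMode(String folder) {
            this.folder = folder;
        }

        static LearningMode parse(String raw) {
            String value = raw.trim().toLowerCase(Locale.ROOT);
            switch (value) {
                case "preprocess":
                case "preprocessing": // assumption: accept both spellings used in this README
                    return PREPROCESS;
                case "training":
                    return TRAINING;
                case "application":
                    return APPLICATION;
                default:
                    throw new IllegalArgumentException("Unknown learningMode: " + raw);
            }
        }
    }

    // Resolves the configuration directory for a mode, e.g. sources/training.
    static Path configDirectory(String sourceDirectory, String learningMode) {
        return Paths.get(sourceDirectory, LearningMode.parse(learningMode).folder);
    }

    public static void main(String[] args) {
        System.out.println(configDirectory("sources", "Training"));
    }
}
```

Building the path with Paths.get rather than string concatenation lets the JVM pick the separator for the current operating system.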

Dependency Project

Build

Using Maven, run mvn clean install assembly:single or mvn clean package.

License

Apache License 2.0 - http://www.apache.org/licenses/LICENSE-2.0.html

Contributors

srijiths
