jtopia

A Java clone of the Python term extractor topia. It extracts important keywords / key phrases from text. https://github.com/turian/topia.termextract

jtopia is a lightweight, domain-independent term extractor. It combines a rule-based approach with POS (part-of-speech) tagging to find keywords / key phrases.

You can also tune jtopia's parameters to filter the extracted keywords / key phrases.

jtopia simply returns n keywords from the input text. It does not rank them as first, second, and so on.

The numbers in square brackets give information about the extracted term.

For example, in Hurricane Saturday Night=[1, 3], the 1 means that the extracted keyword "Hurricane Saturday Night" occurs once in the input text. The 3 means that the keyword is made up of 3 words: "Hurricane", "Saturday" and "Night".
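
The printed form above can be read back into its parts. The following is a minimal sketch, assuming only the textual form shown in the example (term, then frequency and word count in square brackets). TermInfoSketch is a hypothetical helper and is not part of jtopia; how jtopia exposes these values in its own classes is not covered here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of jtopia): one extracted term and its two numbers. */
final class TermInfoSketch {
    final String term;
    final int frequency;  // first number: occurrences in the input text
    final int wordCount;  // second number: how many words form the term

    // Matches "<term>=[<frequency>, <wordCount>]", the form used in the example above.
    private static final Pattern ENTRY =
            Pattern.compile("^(.+)=\\[(\\d+),\\s*(\\d+)\\]$");

    private TermInfoSketch(String term, int frequency, int wordCount) {
        this.term = term;
        this.frequency = frequency;
        this.wordCount = wordCount;
    }

    /** Parses one printed entry; rejects anything that does not match the assumed form. */
    static TermInfoSketch parse(String entry) {
        Matcher m = ENTRY.matcher(entry.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected entry: " + entry);
        }
        return new TermInfoSketch(
                m.group(1),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)));
    }

    public static void main(String[] args) {
        TermInfoSketch info = parse("Hurricane Saturday Night=[1, 3]");
        System.out.println(info.term + ": frequency " + info.frequency
                + ", words " + info.wordCount);
    }
}
```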

By default, jtopia uses a standard POS-tagged lexicon built from English text to demonstrate the keyword extraction strategy. So all the power of jtopia lies in the POS-tagged lexicon (model/english-lexicon.txt).

Dependency project

Expansion of jtopia in your domain

If you want to extend jtopia for your domain, consider the following points.

  • Add more POS-tagged words from your domain to model/english-lexicon.txt, preserving the file's current format.

  • Instead of using model/english-lexicon.txt, use another POS tagger (Stanford POS tagger / OpenNLP POS tagger) and make its POS output available to the TermExtractor class. You select the tagger by passing "taggerType" to the Configuration class. The accepted values are "default", "openNLP" and "stanford".
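
Since the tagger is selected by a plain string, a typo is easy to miss. The following is a minimal sketch of a guard you could run before handing the value to the Configuration class. TaggerTypeCheck is a hypothetical helper, not part of jtopia, and the exact (case-sensitive) match is an assumption; only the three values come from the text above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of jtopia): validates a taggerType string. */
final class TaggerTypeCheck {
    // The three values named in the text above.
    static final List<String> SUPPORTED =
            Arrays.asList("default", "openNLP", "stanford");

    /** Returns the value unchanged if it is one of the three, otherwise throws. */
    static String requireSupported(String taggerType) {
        // Assumption: values are compared exactly, so "opennlp" is rejected.
        if (!SUPPORTED.contains(taggerType)) {
            throw new IllegalArgumentException(
                    "Unsupported taggerType: " + taggerType
                            + " (expected one of " + SUPPORTED + ")");
        }
        return taggerType;
    }

    public static void main(String[] args) {
        System.out.println(requireSupported("stanford"));
    }
}
```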

Fine tuning jtopia

  • You can change which extracted terms are output using the TermsFilter class. Two parameters, singleStrengthMinOccur and noLimitStrength, control how the extracted jtopia output is filtered.

You can set these filters in the TermsExtractor class with TermsFilter termsFilter = new TermsFilter(3, 2); The values (3, 2) are set by default.

These act as fine-tuning parameters in jtopia's post-processing phase. Here jtopia applies the filter parameters to clean likely junk keywords out of the entire extracted keyword set. Nobody wants a keyword explosion from an input text; everyone wants a well-tuned, manageable set of keywords.
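
How the two parameters are applied is not spelled out here. The following is a minimal sketch, assuming jtopia mirrors the default filter rule of the Python topia.termextract package it is cloned from: a term is kept if it is a single word that occurs at least singleStrengthMinOccur times, or if it consists of at least noLimitStrength words. TermsFilterSketch is a hypothetical stand-in, not jtopia's TermsFilter, and the terms "storm" and "coast" with their counts are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in (not jtopia's TermsFilter) for the assumed filter rule. */
final class TermsFilterSketch {
    private final int singleStrengthMinOccur;
    private final int noLimitStrength;

    TermsFilterSketch(int singleStrengthMinOccur, int noLimitStrength) {
        this.singleStrengthMinOccur = singleStrengthMinOccur;
        this.noLimitStrength = noLimitStrength;
    }

    /** occur = frequency in the text; strength = number of words in the term. */
    boolean keep(int occur, int strength) {
        // Assumed rule, taken from topia.termextract's default filter.
        return (strength == 1 && occur >= singleStrengthMinOccur)
                || strength >= noLimitStrength;
    }

    /** Returns the entries of term -> [occur, strength] that pass the filter. */
    Map<String, int[]> apply(Map<String, int[]> terms) {
        Map<String, int[]> kept = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : terms.entrySet()) {
            int[] info = e.getValue();
            if (keep(info[0], info[1])) {
                kept.put(e.getKey(), info);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> terms = new LinkedHashMap<>();
        terms.put("Hurricane Saturday Night", new int[] {1, 3});
        terms.put("storm", new int[] {4, 1});  // illustrative data
        terms.put("coast", new int[] {1, 1});  // illustrative data

        // Default values (3, 2): prints [Hurricane Saturday Night, storm]
        System.out.println(new TermsFilterSketch(3, 2).apply(terms).keySet());
        // Minimum values (1, 1): prints [Hurricane Saturday Night, storm, coast]
        System.out.println(new TermsFilterSketch(1, 1).apply(terms).keySet());
    }
}
```

Under this assumed rule, lowering both values lets more terms through, and raising them removes single words that occur rarely as well as short phrases.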

jtopia Hints

  • If you are dealing with short text, set singleStrengthMinOccur and noLimitStrength to their minimum to get the maximum possible number of keywords from the text.
  • If you are dealing with large text, set singleStrengthMinOccur and noLimitStrength to a feasible maximum to chop out the unwanted junk keywords, leaving a small set of keywords that best suit the text.

How to Use

For more help on how to use jtopia, see the wiki: https://github.com/srijiths/jtopia/wiki

Or take a look at JtopiaUsage.java.

License

Apache License 2 - http://www.apache.org/licenses/LICENSE-2.0.html
