rlugojr / scalpel-1

This project forked from barmalei/scalpel

License: GNU Lesser General Public License v3.0

Python 22.58% Java 3.83% C 1.51% Groff 71.37% C++ 0.34% Ruby 0.37%

Scalpel: Text Analyzing Engine

Scalpel is a text analysis tool that implements and integrates various text analysis and processing algorithms and packages. Its approach and design aim to make the library as usable, clear, and understandable as possible for researchers and developers. One of the main goals is the seamless integration of various third-party text processing libraries (TNT, Lingpipe, Stanford, etc.) under a common roof with a unified interface. The following main text processing and analysis tasks are built into Scalpel:

  • Named entity recognition:
    • Stanford NER
    • Lingpipe NER
    • TNT Dutch NER
    • LBJ NER
  • Part of speech tagging:
    • Stanford POS
    • TNT POS
  • Stemmer:
    • Based on Snowball
  • Edit distance (Levenshtein) match and search
  • Bitap search
  • Episode distance
  • Corpora manipulation:
    • CoNLL 2000
    • CoNLL 2002
    • CoNLL 2003
  • Tokenization and tokens manipulation
  • etc
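
For readers unfamiliar with the edit distance mentioned in the list above, the following is a minimal, generic sketch of the Levenshtein distance in pure Python. It is not Scalpel's implementation, and the function name `levenshtein` is chosen here only for illustration.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # previous[j] is the distance between the already processed
    # prefix of a and the prefix b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # prints 3
```

The sketch keeps only two rows of the dynamic programming table, so memory use grows with the length of the second string only.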

To make life simpler, Scalpel itself is implemented in Python. Beneath the Python API, various third-party packages written in other programming languages are hidden. For instance, Stanford and Lingpipe are Java-based packages, the Snowball stemmer is a C-based package, TNT is a binary package, etc. Scalpel's abstraction level allows developers to access all these packages through a common interface/API. For instance, to recognize entities with Stanford NER, do the following:

	from gravity.tae.ner.stanford.ner import NER
	result = NER()("Text to where named entities have to be recognized")

With Lingpipe NER the code looks practically the same:

	from gravity.tae.ner.lingpipe.ner import NER
	result = NER()("Text to where named entities have to be recognized")
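
Because the two snippets above differ only in the module path, the backend can be selected at run time. The following is a minimal sketch of that idea, assuming Scalpel's `lib` folder is on the Python path. The helper names `ner_module_path` and `load_ner` are hypothetical and not part of Scalpel, and only the two module paths shown above are assumed to exist.

```python
import importlib

# Only the two backends shown in the examples above are listed here.
KNOWN_BACKENDS = ("stanford", "lingpipe")


def ner_module_path(backend):
    """Build the module path for a backend (hypothetical helper)."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError("unknown NER backend: %s" % backend)
    return "gravity.tae.ner.%s.ner" % backend


def load_ner(backend):
    """Import the backend module and return an NER instance (hypothetical helper)."""
    module = importlib.import_module(ner_module_path(backend))
    return module.NER()


# Usage, once Scalpel is deployed and its lib folder is importable:
#   ner = load_ner("lingpipe")
#   result = ner("Text to where named entities have to be recognized")
```

Calling code then depends only on the common interface, so switching from one NER package to another is a one-word change.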

Scalpel installation

Go to the Scalpel home folder and type the following command in a terminal:

$ python ./.primus/deploy.py

Deployment takes care of downloading all third-party packages, compiling the C and Java code, validating the environment, configuring, and testing.

Then add the "SCALPEL_HOME/lib" folder to your project's PYTHONPATH and use it. You can check whether the installation is in a workable state by typing the following commands:

$ export PYTHONPATH=lib
$ python lib/gravity/tae/ner/stanford/ner.py

Or, SURPRISE, run the Java-based NER under Jython, which can significantly speed up your code:

$ export JYTHONPATH=lib
$ lib/jython/jython lib/gravity/tae/ner/stanford/ner.py
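
If setting PYTHONPATH in the shell is inconvenient, the same effect can be achieved from inside a script. The sketch below assumes a `SCALPEL_HOME` environment variable pointing at the Scalpel home folder. That variable is an assumption made for this example (the text above uses the name only as a placeholder), and the current directory is used as a fallback.

```python
import os
import sys

# SCALPEL_HOME is assumed for illustration; fall back to the current folder.
scalpel_home = os.environ.get("SCALPEL_HOME", ".")
lib_dir = os.path.join(scalpel_home, "lib")

# Put the Scalpel lib folder first so its packages become importable.
if lib_dir not in sys.path:
    sys.path.insert(0, lib_dir)

# After this point an import such as
#   from gravity.tae.ner.stanford.ner import NER
# should resolve, provided deployment has completed successfully.
```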

License

LGPL

Note that Scalpel uses a number of third-party packages that are distributed under various licenses. Be careful if Scalpel is going to be distributed as part of a commercial project.

For research purposes the project can be used as is. No special restrictions have been found :)

More information

Take a look at Scalpel presentation:
