adi2412 / newsline

A Clojure application that takes an article and categorises it into predefined categories.

License: Eclipse Public License 1.0

Clojure 100.00%

newsline

This is a Clojure application that accepts a news article as input and categorises it into one of the given categories.

Installation

Usage

To run through Leiningen:

    $ lein run <filename.txt>

To run the standalone jar:

    $ java -jar newsline-0.1.0-standalone.jar [args]

Issues

  • Words like "going" are not recognised by WordNet at all, for reasons not yet understood. Need to find a way around this.

  • Stemming turned out to be a bad idea, because half the time it doesn't work correctly.

  • Looking up WordNet words without a POS tag seems to work better, because more words are recognised that way.

  • Dropping the stemming idea for now (the related synsets should give us stemmed words automatically). The open problem is how to get more words recognised by WordNet.

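  • Clojure seems to share variables across functions. This is probably because def always creates a namespace-level var, even when it is called inside a function; let gives bindings that are local to the function.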

  • Using .getOffset and .getSynsetID, I can find the synset ID for each instance of a word. I've noticed that these values are very close for words that are closely related.

  • I can write a function that calculates the difference between two synset IDs, and use that difference as the basis of the weight function.

  • We can also use a statistical method to boost the weight of recurring words so that they are given higher priority.
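
The last two ideas could be sketched roughly as follows. This is only a sketch under assumptions: synset offsets are taken to be plain integers (in the project they would come from .getOffset), the names `offset-distance`, `weight` and `boosted-weight` are hypothetical, and both the `1 / (1 + d)` form and the logarithmic frequency boost are just one possible choice, not something the project has settled on.

```clojure
;; Hypothetical helpers; none of these names exist in newsline itself.

(defn offset-distance
  "Absolute difference between two synset offsets (plain integers)."
  [a b]
  (let [d (- a b)]
    (if (neg? d) (- d) d)))

(defn weight
  "Maps the distance to (0, 1]: identical offsets give 1.0,
  distant offsets approach 0."
  [a b]
  (/ 1.0 (+ 1 (offset-distance a b))))

(defn boosted-weight
  "Scales a base weight by how often `word` recurs in the article.
  `freqs` is a word->count map; words seen once are left unchanged."
  [base-weight word freqs]
  (let [n (get freqs word 1)]
    (* base-weight (+ 1.0 (Math/log (double n))))))

;; Made-up article words, only to show the frequency boost.
(def words ["market" "bank" "market" "loan"])
(def freqs (frequencies words))
```

With these definitions, `(weight 100 101)` is larger than `(weight 100 200)`, and `(boosted-weight 0.5 "market" freqs)` is larger than `(boosted-weight 0.5 "loan" freqs)` because "market" occurs twice.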

Next Steps

  • Make a list of all the synset words gathered and find the synsets for each of them.
  • Figure out how to add activation functions for these words.
  • Make a Global Key List with all words, their synset IDs and their activation values. All words in the article get a default activation value of 1 (or some other number).
  • Suppress the noisy warnings from the stemmer module.
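
One possible shape for the Global Key List is an ordinary immutable map from each word to its synset IDs and activation value. The sketch below assumes the synset IDs have already been looked up (the numbers used in the test data are made up), and `make-key-list`, `activate` and `default-activation` are hypothetical names rather than existing project code.

```clojure
;; Hypothetical sketch; the real lookup of synset IDs is not shown.

(def default-activation 1)

(defn make-key-list
  "Builds {word {:synset-ids [...] :activation n}} from a
  word->synset-ids map. Every word starts at the default activation."
  [word->synset-ids]
  (into {}
        (map (fn [[word ids]]
               [word {:synset-ids (vec ids)
                      :activation default-activation}])
             word->synset-ids)))

(defn activate
  "Returns a new key list with the activation of `word` increased by
  `amount`. Words that are not in the list are left untouched."
  [key-list word amount]
  (if (contains? key-list word)
    (update-in key-list [word :activation] + amount)
    key-list))
```

Because `activate` returns a new map instead of changing a shared var, the key list can be passed between functions explicitly, which also sidesteps the surprise with def noted under Issues.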

License

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.
