mwpereira / vector-space-model

Using a vector space model to perform a keyword search.

JavaScript 8.16% TypeScript 71.68% Vue 20.16%
nuxt typescript buefy

vector-space-model's Introduction

📌 Group Members

  • Michael Pereira (500896409)
  • Hitarth Chudgar (500888845)

Vector Space Model

This project requires Node.js.

Build Setup

# install dependencies
$ yarn install 

# serve with hot reload at localhost:8080
$ yarn dev

# build for production and launch server
$ yarn build
$ yarn start

Alternatively, you can use npm instead of yarn.

Dictionary & Postings

Both files are written to the generated directory. The program creates them when a keyword is searched.

CACM Resources

All files required for the assignment are found under the static directory.

📚 Frameworks

  • Nuxt.js - for building user interfaces and connecting JavaScript/TypeScript code
  • Bulma - for UI components and styling

Back-End

Dependencies

  • express - for running a server locally to access local files
  • stopword - for removing stopwords from strings
  • natural - for stemming words in strings

🎨 Front-End

Dependencies

  • Buefy - for Vue.js UI components based on Bulma
  • axios - a promise-based HTTP client for handling requests

Program Details

Posting list order

The posting lists are sorted in ascending order of document ID.

Within the postings file, each entry has the following form:

term [documentId, TF [positions]]
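
As a rough illustration of that entry format, the sketch below builds positional posting lists from already-tokenized documents and prints them in the `term [documentId, TF [positions]]` shape. The names `Posting`, `buildPostings`, and `formatEntry` are hypothetical, zero-based positions are an assumption, and this is not necessarily how the project's own indexer is written.

```typescript
// Hypothetical types: field names mirror the entry format, not the project's code.
type Posting = { documentId: number; tf: number; positions: number[] };
type PostingsIndex = Map<string, Posting[]>;

// Builds a positional index from tokenized documents (documentId -> tokens).
function buildPostings(docs: Map<number, string[]>): PostingsIndex {
  const index: PostingsIndex = new Map();
  // Visit documents in ascending ID order so every posting list stays sorted.
  const ids = Array.from(docs.keys()).sort((a, b) => a - b);
  for (const documentId of ids) {
    const tokens = docs.get(documentId) ?? [];
    const seen = new Map<string, Posting>();
    tokens.forEach((term, position) => {
      let posting = seen.get(term);
      if (!posting) {
        // First occurrence of this term in this document: open a new posting.
        posting = { documentId, tf: 0, positions: [] };
        seen.set(term, posting);
        const list = index.get(term) ?? [];
        list.push(posting);
        index.set(term, list);
      }
      posting.tf += 1;
      posting.positions.push(position); // zero-based position (assumption)
    });
  }
  return index;
}

// Renders one line per posting in the "term [documentId, TF [positions]]" shape.
function formatEntry(term: string, postings: Posting[]): string {
  return postings
    .map(p => `${term} [${p.documentId}, ${p.tf} [${p.positions.join(", ")}]]`)
    .join("\n");
}
```

Because documents are visited in ascending ID order, no separate sort of each posting list is needed afterwards.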

top-K method and value

To choose our idf threshold, we followed the approach of finding a set A of contender documents, where K < |A| << N.

We used the index-elimination method, which only considers documents that contain terms whose idf exceeds a threshold and that contain many (or all) of the query terms.

Our threshold values were:

idfValues[i] > 1.60 && idfValues[i] < 3.51, i.e. the open interval (1.60, 3.51), based on the lower and upper limits of document matching.

Hence, only terms whose idf lies between 1.60 and 3.51 are used to select the top-K candidate documents.
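
The index-elimination idea above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `minMatches` parameter, and the simplified postings shape (term mapped to a list of document IDs) are all assumptions. Only the 1.60 and 3.51 bounds come from the text.

```typescript
// Bounds taken from the threshold values quoted above.
const IDF_MIN = 1.6;
const IDF_MAX = 3.51;

// Keep only query terms whose idf falls strictly inside the window.
function selectQueryTerms(queryTerms: string[], idf: Map<string, number>): string[] {
  return queryTerms.filter(term => {
    const value = idf.get(term);
    return value !== undefined && value > IDF_MIN && value < IDF_MAX;
  });
}

// Collect the contender set A: documents containing at least `minMatches`
// of the surviving terms. `postings` maps a term to its document IDs
// (a simplified, hypothetical shape).
function candidateDocuments(
  terms: string[],
  postings: Map<string, number[]>,
  minMatches = 1,
): number[] {
  const hits = new Map<number, number>();
  for (const term of terms) {
    for (const docId of postings.get(term) ?? []) {
      hits.set(docId, (hits.get(docId) ?? 0) + 1);
    }
  }
  const result: number[] = [];
  hits.forEach((count, docId) => {
    if (count >= minMatches) result.push(docId);
  });
  return result.sort((a, b) => a - b);
}
```

Raising `minMatches` shrinks the contender set toward documents that contain many (or all) of the query terms, which is the second half of the index-elimination criterion.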

tf-idf weighting scheme

We used the conventional tf-idf weighting scheme, computed in three steps:

Step 1: Computing the Term Frequency(tf)

(Screenshot: term frequency formula)

f_ij is the frequency of term k_i in document j.

Step 2: Computing the Inverse Document Frequency (idf)

idf_i = log(N / df_i), where N is the number of documents in the collection and df_i is the number of documents in which term k_i occurs.

Step 3: Calculating the weighting scheme

Combining IDF factors with TF

w_ij = tf_ij * idf_i
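
A minimal sketch of the three steps is shown below. Two things here are assumptions rather than facts from this README: the tf formula is only given as an image, so the raw count f_ij is used as tf_ij, and the logarithm is taken base 10. The function names are hypothetical.

```typescript
// Step 2: idf_i = log(N / df_i). Base 10 is an assumption.
function inverseDocumentFrequency(totalDocs: number, docFrequency: number): number {
  return Math.log(totalDocs / docFrequency) / Math.LN10;
}

// Step 3: w_ij = tf_ij * idf_i for every term of one document.
// `termCounts` holds f_ij for that document; the raw count stands in
// for tf_ij here (assumption, the original tf formula is not available).
function tfIdfWeights(
  termCounts: Map<string, number>,
  docFrequency: Map<string, number>,
  totalDocs: number,
): Map<string, number> {
  const weights = new Map<string, number>();
  termCounts.forEach((tf, term) => {
    const df = docFrequency.get(term);
    if (df === undefined || df === 0) return; // term not in the collection index
    weights.set(term, tf * inverseDocumentFrequency(totalDocs, df));
  });
  return weights;
}
```

A term that occurs in every document gets idf 0 and therefore weight 0, regardless of how often it appears in the document.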
