krishnadwypayan / searchengine


Java 100.00%
search-engine wikipedia wikipedia-corpus wikipedia-page java information-retrieval information-extraction

searchengine's Introduction

Search Engine for Wikipedia

A complete search engine for the Wikipedia corpus (62 GB) that returns the titles of the Wikipedia pages related to the given search words.

The search engine supports the following types of queries:

  • Normal queries: simple one-line queries that search for each of the words in the line.
  • Field queries: fields include Title, Infobox, Body, Category, Links, and References of a Wikipedia page.
  • Boolean queries: the supported Boolean operations are AND, OR, and NOT on the words of the query.
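The README does not state how the Boolean operators are evaluated or in what order they are applied. As a rough illustration only, each operator can be read as a set operation on the page titles matched by each word. The class `BooleanOps`, its method names, and the sample titles are hypothetical and are not taken from the repository.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: Boolean operators as set operations on matched page titles.
class BooleanOps {
    // AND: titles present in both result sets (intersection).
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR: titles present in either result set (union).
    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // NOT: titles in the first set that are absent from the second (difference).
    static Set<String> not(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Made-up result sets for two query words.
        Set<String> sachin = new TreeSet<>(Arrays.asList("Page A", "Page B", "Page C"));
        Set<String> dhoni = new TreeSet<>(Arrays.asList("Page B", "Page C", "Page D"));

        System.out.println(and(sachin, dhoni)); // [Page B, Page C]
        System.out.println(or(sachin, dhoni));  // [Page A, Page B, Page C, Page D]
        System.out.println(not(sachin, dhoni)); // [Page A]
    }
}
```

Each method copies its first argument, so the input sets are left unchanged.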

Index Creation

The index is about one quarter the size of the corpus (~16 GB). SearchEngineMain.java handles index creation. To create an index on the Wikipedia corpus, set the path to the corpus, then set the paths of the folders where the index files are to be placed. Each index creation step needs its own folder path, just like the paths already stated in the file. Then run the SearchEngineMain class. (Note: creating the index, merging it, and splitting it into multiple files might take a long time.)
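For readers new to the idea, an index of this kind can be thought of as a mapping from each word to the pages that contain it. The following is a minimal in-memory sketch of that idea, not the code in SearchEngineMain.java. The class `TinyInvertedIndex`, its methods, and the sample pages are hypothetical, and the sketch leaves out the on-disk writing, merging, and splitting steps mentioned above.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not the repository's implementation.
class TinyInvertedIndex {
    // word -> sorted set of page titles whose text contains that word
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Lower-case the text, split on non-alphanumeric characters,
    // and record the page title under every resulting word.
    void addPage(String title, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(title);
            }
        }
    }

    // Return the titles of pages containing the word (empty set if none).
    Set<String> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addPage("Superman", "Superman is a superhero, the man of steel.");
        index.addPage("Batman", "Batman is a superhero from Gotham.");

        System.out.println(index.lookup("superhero")); // [Batman, Superman]
        System.out.println(index.lookup("steel"));     // [Superman]
    }
}
```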

Search For Query

  • A normal query looks like "what is in a name?"
  • A field query looks like "b:superhero c:cartoon i:superman r:man of steel". (Note: each field must be one of "i" (infobox), "r" (references), "e" (external links), "c" (category), "t" (title), "b" (text body).)
  • A Boolean query looks like "sachin AND dhoni OR kohli NOT world cup". (Note: Boolean operators must be in upper case, as in the example query.)
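To make the field query format concrete, here is a minimal sketch of one way such a query string could be split into its fields. It is not the parsing code in QueryHandler. The class `FieldQueryParser` is hypothetical, and the sketch assumes that words without a prefix belong to the most recent field, which is how "r:man of steel" reads in the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of splitting a field query into field -> words.
class FieldQueryParser {
    // Field codes listed in the README: infobox, references, external links,
    // category, title, text body.
    private static final String FIELD_CODES = "irectb";

    static Map<String, String> parse(String query) {
        Map<String, String> fields = new LinkedHashMap<>();
        String current = null; // field that unprefixed words are attached to
        for (String token : query.trim().split("\\s+")) {
            boolean hasPrefix = token.length() >= 2
                    && token.charAt(1) == ':'
                    && FIELD_CODES.indexOf(token.charAt(0)) >= 0;
            if (hasPrefix) {
                current = token.substring(0, 1);
                token = token.substring(2);
            }
            // Assumption: words before the first field prefix are ignored.
            if (current == null || token.isEmpty()) {
                continue;
            }
            fields.merge(current, token, (oldWords, word) -> oldWords + " " + word);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("b:superhero c:cartoon i:superman r:man of steel"));
        // {b=superhero, c=cartoon, i=superman, r=man of steel}
    }
}
```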

Steps to run the Search Engine:

=> javac *.java
=> java QueryHandler

Enter the query when prompted.
