Implemented a mini web search engine that ranks documents based on both query-dependent and query-independent signals. Created an Inverted Occurrence index and an Inverted Compressed index to improve performance. Computed PageRank with the MapReduce framework.
web-search-engine's Introduction
Mini Search Engine
The project implements a mini web search engine that ranks documents based on both query-dependent and query-independent signals. For the query-dependent signals, we used Cosine Similarity and the Query Likelihood algorithm. For the query-independent signals, we used the PageRank value and the Number of Views.
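To make the two query-dependent signals concrete, here is a minimal sketch rather than the project's actual code. It assumes raw term-frequency vectors for the cosine score and Jelinek-Mercer smoothing for the query likelihood score; the function names and the `lam` parameter are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(query_terms, doc_terms):
    """Cosine of the angle between raw term-frequency vectors (assumed weighting)."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def query_log_likelihood(query_terms, doc_terms, corpus_counts, corpus_len, lam=0.5):
    """log P(query | document), mixing document and corpus models.

    lam is an assumed smoothing weight (Jelinek-Mercer); it keeps a query term
    that is missing from the document from zeroing out the whole score.
    """
    d = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_doc = d[t] / doc_len if doc_len else 0.0
        p_corpus = corpus_counts.get(t, 0) / corpus_len if corpus_len else 0.0
        p = (1 - lam) * p_doc + lam * p_corpus
        if p == 0.0:
            return float("-inf")  # term never seen anywhere in the corpus
        score += math.log(p)
    return score


# Usage: score two tiny documents against the query "web".
doc1 = ["web", "search"]
doc2 = ["cat", "dog"]
corpus = Counter(doc1 + doc2)
corpus_len = len(doc1) + len(doc2)
score1 = query_log_likelihood(["web"], doc1, corpus, corpus_len)
score2 = query_log_likelihood(["web"], doc2, corpus, corpus_len)
```

How these scores are combined with PageRank and the number of views is not specified here, so the sketch stops at the individual signals.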
We also created an Inverted Occurrence index and an Inverted Compressed index to improve performance.
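The sketch below illustrates one plausible reading of the two index types, not the project's real data structures. It assumes the occurrence index stores term positions per document, and that the compressed index stores document-id gaps with variable-byte encoding; all function names are hypothetical.

```python
from collections import defaultdict


def build_occurrence_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints; the high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:  # terminating byte of the current number
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


def compress_postings(doc_ids):
    """Store a sorted doc-id list as gaps, which are small and encode compactly."""
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild the doc-id list by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


# Usage: index two tiny documents and round-trip one postings list.
index = build_occurrence_index(["web search engine", "search the web"])
packed = compress_postings([3, 130, 1000])
restored = decompress_postings(packed)
```

In this sketch the three document ids take four bytes after compression, since the gaps 3 and 127 fit in one byte each and 870 needs two.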
For the PageRank computation, we used the Hadoop MapReduce framework.
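As a rough illustration of how PageRank fits the map and reduce pattern, the sketch below simulates both phases in memory. It does not use the Hadoop API and is not the project's job code; the damping factor of 0.85, the iteration count, the handling of dangling pages, and the function names are all assumptions.

```python
from collections import defaultdict


def map_phase(ranks, links):
    """Mapper: each page emits an equal share of its rank to every outlink."""
    for page, rank in ranks.items():
        outlinks = links.get(page, [])
        for target in outlinks:
            yield target, rank / len(outlinks)


def reduce_phase(pairs, pages, damping, dangling_mass):
    """Reducer: sum the contributions per page and apply the damping factor."""
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(pages)
    return {
        p: (1 - damping) / n + damping * (sums[p] + dangling_mass / n)
        for p in pages
    }


def pagerank(links, damping=0.85, iterations=50):
    """Run a fixed number of map/reduce rounds over an adjacency dict."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly (assumed policy).
        dangling = sum(r for p, r in ranks.items() if not links.get(p))
        ranks = reduce_phase(map_phase(ranks, links), pages, damping, dangling)
    return ranks


# Usage: a three-page graph where "c" receives links from both "a" and "b".
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

On a real cluster each round would be one MapReduce job, with the shuffle step grouping the emitted pairs by target page before the reducer runs.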