manel15279 / intelligent-information-retrieval-system

An implementation of an Information Retrieval System that indexes documents and matches queries using vector space models (Jaccard, Cosine, Scalar), boolean model, and probabilistic model (BM25).

Python 100.00%

Information Retrieval System: Indexing and Query Matching

Overview

This project implements an Information Retrieval System (IRS) that indexes documents and matches queries using various retrieval models. It aims to apply concepts learned in information retrieval courses, utilizing the LISA dataset for testing.

Features

Indexing: Extract terms, remove stopwords, and normalize terms in documents using NLTK. Build the descriptor and inverse (inverted index) files that retrieval relies on.
Query Matching: Implement retrieval models such as scalar product, cosine measure, Jaccard measure, boolean models (AND, OR, NOT), and BM25 probabilistic model.
Evaluation: Compare retrieval models based on average precision, P@5, P@10, recall, F-measure, and plot precision-recall curves.
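
The indexing and vector-space matching steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (extract_terms, build_index, rank), the tiny stopword set, the regex tokenizer (standing in for the NLTK-based processing), and the use of raw term frequencies as weights are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stopword list; the project uses NLTK instead.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for"}


def extract_terms(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Build a descriptor (doc -> term -> freq) and an inverted file (term -> doc -> freq)."""
    descriptor = {}
    inverted = defaultdict(dict)
    for doc_id, text in docs.items():
        freqs = Counter(extract_terms(text))
        descriptor[doc_id] = freqs
        for term, freq in freqs.items():
            inverted[term][doc_id] = freq
    return descriptor, dict(inverted)


def scalar_product(q, d):
    """Sum of products of query and document weights over shared terms."""
    return sum(w * d.get(t, 0) for t, w in q.items())


def cosine(q, d):
    """Scalar product normalized by the two vector lengths."""
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    denom = norm_q * norm_d
    return scalar_product(q, d) / denom if denom else 0.0


def jaccard(q, d):
    """Weighted Jaccard measure: dot / (|q|^2 + |d|^2 - dot)."""
    dot = scalar_product(q, d)
    denom = sum(w * w for w in q.values()) + sum(w * w for w in d.values()) - dot
    return dot / denom if denom else 0.0


def rank(query, descriptor, measure):
    """Score every document against the query and sort by decreasing score."""
    q = Counter(extract_terms(query))
    scores = {doc_id: measure(q, d) for doc_id, d in descriptor.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical three-document collection used only for this example.
docs = {
    1: "Indexing of library documents",
    2: "Retrieval of information in the library",
    3: "Cooking recipes",
}
descriptor, inverted = build_index(docs)
ranking = rank("library information retrieval", descriptor, cosine)
print(ranking)
```

Passing scalar_product or jaccard instead of cosine to rank switches the similarity measure without touching the index, which is one simple way to compare the vector-space models on the same data.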

Usage

Clone Repository: Clone the repository to your local machine.
Install Dependencies: Install required dependencies using pip install -r requirements.txt.
Prepare Data: Obtain the LISA dataset from the University of Glasgow website and concatenate the files.
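Execute Application: Launch the application by running its entry-point script, for example python main.py (these instructions also refer to the script as app.py; use whichever file is present in the repository).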
Interact with GUI: Use the graphical user interface to perform indexing, query search, query matching, and evaluation.
View Results: Evaluate the performance of different retrieval models and visualize precision-recall curves.
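
For reference, the evaluation measures named in this README (P@k, recall, F-measure, average precision) can be computed along these lines. This is a hedged sketch under assumed conventions: the function names are hypothetical, the ranking is a plain list of document IDs, and the relevance judgments are a set of IDs. The project's own evaluation code may differ.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def recall(ranked, relevant):
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked if doc_id in relevant)
    return hits / len(relevant)


def f_measure(precision, recall_value):
    """Harmonic mean of precision and recall."""
    total = precision + recall_value
    return 2 * precision * recall_value / total if total else 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


# Hypothetical ranking and relevance judgments for one query.
ranked = [3, 7, 1, 9, 4]
relevant = {3, 1, 8}

p5 = precision_at_k(ranked, relevant, 5)
r = recall(ranked, relevant)
print(p5, r, f_measure(p5, r), average_precision(ranked, relevant))
```

Here average precision divides by the total number of relevant documents, so relevant documents that were never retrieved (document 8 in the example) lower the score.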
