valinsogna / ir_bim_model

Python-based Information Retrieval system leveraging the BIM probabilistic model. Features include handling free-form text queries, relevance & pseudo-relevance feedback. Performance is rigorously evaluated using metrics like precision/recall, mean average precision, and R-precision. Utilizes the standard Cranfield dataset from aerodynamics.

License: MIT License

Python 37.55% Jupyter Notebook 62.26% Shell 0.19%
bim cranfield-collection information-retrieval

ir_bim_model's Introduction

IR system: BIM model

A Python implementation of the Binary Independence Model (BIM), a probabilistic model for information retrieval (IR). The system can:

  • answer free-form text queries.
  • allow relevance feedback.
  • allow use of pseudo-relevance feedback.
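The three capabilities above can be illustrated with a minimal sketch. It is not the code in bim.py: the function names (build_index, rsj_weight, bim_rank, pseudo_relevance_rank), the 0.5 smoothing, and the toy documents are all assumptions made for illustration. It uses the standard Robertson/Sparck Jones term weight, which reduces to an IDF-like weight when no relevant documents are known.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            index[term].add(doc_id)
    return index


def rsj_weight(n, N, r=0, R=0):
    """Robertson/Sparck Jones weight with 0.5 smoothing (an assumed choice).

    n: documents containing the term, N: collection size,
    r: known relevant documents containing the term,
    R: number of known relevant documents.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def bim_rank(query_tokens, index, N, relevant=frozenset()):
    """Rank documents by the summed weights of the query terms they contain."""
    scores = defaultdict(float)
    for term in set(query_tokens):
        postings = index.get(term, set())
        if not postings:
            continue  # term absent from the collection
        w = rsj_weight(len(postings), N, len(postings & relevant), len(relevant))
        for doc_id in postings:
            scores[doc_id] += w
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


def pseudo_relevance_rank(query_tokens, index, N, k=2):
    """Treat the top-k documents of a first pass as relevant, then re-rank."""
    first_pass = bim_rank(query_tokens, index, N)
    return bim_rank(query_tokens, index, N, relevant=frozenset(first_pass[:k]))


# Hypothetical toy collection of already tokenized documents.
docs = {
    1: ["wing", "flow"],
    2: ["wing", "boundary", "layer"],
    3: ["heat", "transfer"],
    4: ["boundary", "layer", "flow"],
}
index = build_index(docs)
print(bim_rank(["wing", "heat"], index, N=len(docs)))
print(bim_rank(["wing", "heat"], index, N=len(docs), relevant=frozenset({1})))
```

Passing a set of documents judged relevant by the user corresponds to relevance feedback; pseudo_relevance_rank does the same without user input by assuming the top k results of the first pass are relevant.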

The effectiveness of the system is evaluated on a set of test queries using:

  • precision/recall for a query.
  • mean average precision for a set of queries.
  • R-precision of the top R ranked documents for a query.
  • mean average R-precision for a set of queries.
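As a rough illustration of the measures listed above, the sketch below computes them from a ranked list of document ids and a set of relevant ids. The function names and the toy data are hypothetical and are not taken from the project's code; precision at a cutoff k is divided by k even when fewer than k documents are returned, which is one common convention.

```python
def precision_recall(ranking, relevant, k):
    """Precision and recall over the top-k ranked documents."""
    hits = sum(1 for d in ranking[:k] if d in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranking, relevant):
    """Precision over the top R documents, where R is the number of relevant documents."""
    R = len(relevant)
    if R == 0:
        return 0.0
    return sum(1 for d in ranking[:R] if d in relevant) / R


def mean_over_queries(metric, rankings, judgments):
    """Average a per-query metric over all queries (e.g. MAP, mean R-precision)."""
    values = [metric(rankings[q], judgments[q]) for q in rankings]
    return sum(values) / len(values) if values else 0.0


# Hypothetical rankings and relevance judgments for two queries.
rankings = {"q1": [3, 1, 2, 4], "q2": [1, 2]}
judgments = {"q1": {1, 4}, "q2": {1}}
print(mean_over_queries(average_precision, rankings, judgments))
print(mean_over_queries(r_precision, rankings, judgments))
```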

Dataset

The dataset is the Cranfield collection, a standard Information Retrieval text collection consisting of 1400 documents (1.6 MB) from the field of aerodynamics, in SGML format. The collection contains 225 queries with relevance judgments.

Structure

The project is structured as follows:

  • data folder contains the dataset in original and preprocessed form.
  • utils folder contains the file functions.py with the functions used in the project.
  • bim.py contains the implementation of the BIM model.
  • index.pkl is the index built by the model bim in pickle format.
  • run_all.sh is the bash script that runs the entire project.
  • test_model.ipynb is the Jupyter notebook used to test the model.
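Since the index is stored in pickle format, saving and reloading it can be sketched as follows. The helper names save_index and load_index and the dictionary layout are assumptions for illustration; the actual structure stored in index.pkl is defined by bim.py and may differ.

```python
import pickle
from pathlib import Path


def save_index(index, path):
    """Serialize an index object to disk with pickle."""
    with Path(path).open("wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(path):
    """Load a previously pickled index. Only unpickle files you trust."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical index layout: term -> set of document ids.
    toy_index = {"wing": {1, 2}, "heat": {3}}
    save_index(toy_index, "toy_index.pkl")
    print(load_index("toy_index.pkl") == toy_index)
```

Loading a prebuilt index this way avoids re-running preprocessing each time the notebook is opened.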

Run

To import the dataset, preprocess it, and build the index, execute the bash script run_all.sh:

bash run_all.sh

Results

To see how the built index is used and how it performs, run the Jupyter notebook test_model.ipynb.

Python Packages

  • NLTK for text processing.
  • NumPy for numerical operations.

Warning: to run the bash script run_all.sh, enable the download of the NLTK packages in utils/functions.py (lines 6 and 7):

nltk.download('stopwords')
nltk.download('punkt')
