transformer_based_ir's Introduction

Transformer Based Information Retrieval (IR) System

In this project, we prototype an IR product that receives an input string and returns the offers most relevant to that input. It uses a transformer model for text classification to detect the type of query; depending on the type, it then applies a series of procedures to find the best match: text similarity using TF-IDF and cosine similarity, stemming and lemmatization, "fuzzy" similarity, and data augmentation through synonym generation.

These are the steps taken to achieve this:

Load libraries and define functions.
Given an input text, get the category classified by our TypeDetector service which we consume through an API.
Depending on the response (0 = Retail, 1 = Brand, 2 = Category) we perform a specific treatment.
If the type of query is 0 or 1, we search for a match with our unique retailers or brands by performing TF-IDF and Cosine Similarity.
We also compute a 'fuzzy similarity score' in case there are typos, which can help when the cosine similarity doesn't bring relevant results.
Next, we fetch the offers that come from our matched retailers or brands, but only if the cosine similarity is > 0.4 or the fuzzy similarity is > 0.75.
If the type of query is 2, the term is compared to the offers directly; synonyms of the term are also generated and compared to the offers to cover a wider range of relevant results.
Finally, the relevant offers are returned in a JSON list along with a log of activity.
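
The retailer and brand matching steps (query types 0 and 1) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes character trigrams as TF-IDF terms and uses the standard library's `difflib.SequenceMatcher` as a stand-in for the fuzzy similarity score. The function names (`char_ngrams`, `tfidf`, `cosine`, `fuzzy`, `match_names`) and the sample retailer list are hypothetical; only the 0.4 and 0.75 thresholds come from the steps above.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

COSINE_THRESHOLD = 0.4   # threshold quoted in the steps above
FUZZY_THRESHOLD = 0.75   # threshold quoted in the steps above


def char_ngrams(text, n=3):
    """Split lowercased, space-padded text into overlapping character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf(text, doc_freq, n_docs):
    """Build a sparse TF-IDF vector (dict) using smoothed inverse document frequency."""
    counts = Counter(char_ngrams(text))
    return {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, tf in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def fuzzy(a, b):
    """Stand-in fuzzy score in [0, 1] (assumption: the project may use another library)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_names(query, names):
    """Return (name, cosine, fuzzy) for names passing either threshold, best first."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(char_ngrams(name)))  # count each term once per name
    query_vec = tfidf(query, doc_freq, len(names))
    matches = []
    for name in names:
        cos = cosine(query_vec, tfidf(name, doc_freq, len(names)))
        fuz = fuzzy(query, name)
        if cos > COSINE_THRESHOLD or fuz > FUZZY_THRESHOLD:
            matches.append((name, cos, fuz))
    return sorted(matches, key=lambda m: max(m[1], m[2]), reverse=True)


# Hypothetical retailer list; a misspelled query is still caught by the fuzzy score.
retailers = ["Walmart", "Target", "Costco"]
matched = match_names("walmrt", retailers)
```

In this sketch the misspelled query shares only one trigram with "Walmart", so the cosine score alone falls below 0.4, while the fuzzy score remains high. That is the situation the fuzzy fallback in the steps above is meant to handle. Offers would then be fetched only for the names returned by `match_names`.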