srishti-bhat / big-data-information-retrieval
A batch text search and filtering pipeline. It takes a large set of text documents and a set of user-defined queries and, for each query, ranks the documents by relevance to that query. The top 10 documents for each query are returned as output. Documents and queries are preprocessed by removing stopwords and applying stemming, and documents are scored using the DPH ranking model.
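The flow described above can be sketched in a few lines of single-machine Python. This is an illustration only, not the repository's code: the names `STOPWORDS`, `toy_stem`, `preprocess`, `dph_term_score` and `rank` are hypothetical, the stopword list is deliberately tiny, and the suffix stripper is a crude stand-in for a real stemmer. The scoring function follows the DPH formula as it is commonly stated; summing per-term scores over the query terms is an assumption about how a document's final score is combined.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer, not a Porter implementation."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOPWORDS]


def dph_term_score(tf, doc_len, avg_doc_len, num_docs, corpus_tf):
    """DPH score of one query term for one document.

    tf: term frequency in the document; corpus_tf: term frequency in the whole corpus.
    """
    if tf == 0 or doc_len == 0:
        return 0.0
    f = tf / doc_len
    if f >= 1.0:
        # The normalisation factor is zero here; return 0 instead of evaluating log2(0).
        return 0.0
    norm = (1.0 - f) ** 2 / (tf + 1.0)
    return norm * (
        tf * math.log2((tf * avg_doc_len / doc_len) * (num_docs / corpus_tf))
        + 0.5 * math.log2(2.0 * math.pi * tf * (1.0 - f))
    )


def rank(documents, queries, k=10):
    """Return, for each query, the k best (score, doc_id) pairs, highest score first."""
    if not documents:
        return {query: [] for query in queries}
    doc_terms = {doc_id: Counter(preprocess(text)) for doc_id, text in documents.items()}
    doc_lens = {doc_id: sum(counts.values()) for doc_id, counts in doc_terms.items()}
    num_docs = len(documents)
    avg_doc_len = sum(doc_lens.values()) / num_docs
    corpus_tf = Counter()
    for counts in doc_terms.values():
        corpus_tf.update(counts)

    results = {}
    for query in queries:
        query_terms = preprocess(query)
        scored = []
        for doc_id, counts in doc_terms.items():
            # Assumption: a document's score is the sum of its per-term DPH scores.
            score = sum(
                dph_term_score(counts[t], doc_lens[doc_id], avg_doc_len, num_docs, corpus_tf[t])
                for t in query_terms
            )
            scored.append((score, doc_id))
        results[query] = heapq.nlargest(k, scored)
    return results
```

Calling `rank({"d1": "cats chase mice", "d2": "stock markets fell"}, ["cat"])` returns a dictionary keyed by query, each value holding up to 10 `(score, doc_id)` pairs. A batch version over a large corpus would distribute the preprocessing and the corpus statistics across workers, but the per-document scoring step stays the same.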