

Tap-search

Tap-search is a word finder. Upload a file or paste in any number of paragraphs, separated by as many blank lines as you like, and get an API response with the numbers of the paragraphs where a word occurs and how often it occurs in each. Tap-search runs on the Gunicorn web server, so multiple users can access it at the same time.
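Because paragraphs may be separated by any number of blank lines, the text first has to be split on runs of blank lines rather than on single newlines. A minimal sketch, assuming a paragraph boundary is one or more empty or whitespace-only lines; split_paragraphs is a hypothetical name, not Tap-search's own code:

```python
from __future__ import annotations

import re


def split_paragraphs(text: str) -> list[str]:
    """Split text into paragraphs separated by one or more blank lines.

    Assumption: single newlines inside a paragraph are kept; any run of
    empty or whitespace-only lines ends the paragraph.
    """
    # A newline, optional whitespace (including further newlines), then a newline
    chunks = re.split(r"\n\s*\n", text)
    # Drop empty chunks produced by leading/trailing blank lines
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Paragraph numbers in the API would then be the 1-based positions in the returned list, so four blank lines between two paragraphs count the same as one.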

File Upload

Tap-search uses PyPDF2 to convert PDFs to text. This is wrapped in the project's features package, which converts a PDF to text regardless of how many pages it has.

API

Tap-search's API returns the numbers of the paragraphs in which a word occurs and its frequency in each of them, so you can tell whether the word is present or not.

If the word appears once in paragraph 1 and once in paragraph 2, the response is

{"Paragraph":[1,2],"Frequency":[1,1]}

Otherwise, the response is

{"Result":"Not found"}
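The two response shapes can be produced by a small formatting helper. This is a sketch, not Tap-search's own code: build_response and its input (a dict from 1-based paragraph number to occurrence count) are hypothetical.

```python
from __future__ import annotations

import json

# Compact separators so the output matches {"Paragraph":[1,2],...} exactly
_COMPACT = (",", ":")


def build_response(occurrences: dict[int, int]) -> str:
    """Return Tap-search-style JSON for one word.

    occurrences: hypothetical input mapping paragraph number -> count.
    """
    # Ignore paragraphs where the word does not occur
    found = {p: n for p, n in occurrences.items() if n > 0}
    if not found:
        return json.dumps({"Result": "Not found"}, separators=_COMPACT)
    paragraphs = sorted(found)  # report paragraphs in ascending order
    return json.dumps(
        {"Paragraph": paragraphs, "Frequency": [found[p] for p in paragraphs]},
        separators=_COMPACT,
    )
```

For example, build_response({1: 1, 2: 1}) returns '{"Paragraph":[1,2],"Frequency":[1,1]}', and build_response({}) returns '{"Result":"Not found"}'.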

Understanding the API

Tap-search returns results in the form

{"Paragraph":[],"Frequency":[]}

where the two lists are parallel: the frequency at index i of Frequency belongs to the paragraph at index i of Paragraph. This makes it easy to read off the number of occurrences in each paragraph.
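Because the lists are parallel, a client can zip them back into a paragraph-to-frequency mapping. A hedged client-side sketch; occurrences_from_response is a hypothetical helper, not part of Tap-search:

```python
from __future__ import annotations

import json


def occurrences_from_response(body: str) -> dict[int, int]:
    """Turn a Tap-search response body into {paragraph number: frequency}."""
    data = json.loads(body)
    if "Result" in data:  # the {"Result":"Not found"} case
        return {}
    paragraphs, frequencies = data["Paragraph"], data["Frequency"]
    # Index i of one list corresponds to index i of the other
    if len(paragraphs) != len(frequencies):
        raise ValueError("Paragraph and Frequency lists differ in length")
    return dict(zip(paragraphs, frequencies))
```

Applied to {"Paragraph":[1,2],"Frequency":[1,1]}, this gives {1: 1, 2: 1}.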

How does it work?

Tap-search's search is optimized with an inverted index: a mapping from each word to the paragraphs that contain it. The index is held in memory, so lookups are fast.
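An inverted index of this kind can be sketched as a dict from each word to {paragraph number: count}, built once so that each query is a single dictionary lookup instead of a rescan of the text. This is a sketch under stated assumptions (1-based paragraph numbers, case-insensitive matching, a "word" being a run of letters, digits or underscores); build_inverted_index and lookup are hypothetical names, and the project's actual index may differ.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict


def build_inverted_index(paragraphs: list[str]) -> dict[str, dict[int, int]]:
    """Map each word to {paragraph number: occurrences in that paragraph}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for number, text in enumerate(paragraphs, start=1):
        # Count every word in this paragraph once, then record it per word
        counts = Counter(re.findall(r"\w+", text.lower()))
        for word, count in counts.items():
            index[word][number] = count
    return dict(index)


def lookup(index: dict[str, dict[int, int]], word: str) -> dict[int, int]:
    """Return {paragraph: count} for a word, or {} if it never occurs."""
    return index.get(word.lower(), {})
```

With the paragraphs "The cat sat.", "A cat and another cat." and "No match here.", lookup(index, "cat") gives {1: 1, 2: 2}, which maps directly onto the Paragraph and Frequency lists.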

Updates

The next update will focus on image processing: a future version of Tap-search will let you upload files whose text is stored as images and expect the same fast search.

The code that extracts text from PDF

Tap-search uses its own package, features:

from features.extract_pdf import pdf2string 

pdf2string is a class with a convert_file() method, which extracts the text from every page of the uploaded PDF and concatenates it. It depends on PyPDF2, which you can install with pip:

pip install PyPDF2
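The page-by-page extraction described above can be sketched as follows. This is not the project's features.extract_pdf code: it assumes PyPDF2 2.x or later (which provides PdfReader, reader.pages and page.extract_text()), and the function names are hypothetical. Pages are joined with a newline so words at page boundaries are not glued together.

```python
from __future__ import annotations


def concatenate_pages(pages) -> str:
    """Join the text of every page; pages is any iterable of objects with extract_text()."""
    # extract_text() may yield an empty string (or None) for pages without a text layer
    return "\n".join(page.extract_text() or "" for page in pages)


def convert_file(path: str) -> str:
    """Extract all text from a PDF, however many pages it has.

    Assumes PyPDF2 >= 2.0 (PdfReader / extract_text); hypothetical sketch.
    """
    # Imported here so concatenate_pages works even without PyPDF2 installed
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return concatenate_pages(reader.pages)
```

Keeping the joining step separate from the PDF reading makes it easy to check without a real PDF file.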

Use cases

  • Check your resume for compatibility
  • Check whether a particular word is present in a file, and how often it occurs


Contributors

theyk98
