mogady / bertqa-htmlsnippets

Creating Google search like snippets using BERT-QA.

Python 100.00%
nlp nlp-machine-learning nlp-question-answering question-answering cortex html ai html-questions machine-learning artificial-intelligence

Introduction

This is a BERT question-answering project for HTML content. The model answers a question using the text of the HTML context. The answer is then post-processed, and the HTML snippet/card that contains the predicted answer is returned with unnecessary items such as styles removed, similar to Google featured snippets.

You can read my article about it here.

Request:

        {"html_url": "the url for the article",
         "question": "user question",
         "article": "HTML article to extract answer from"
        }

Output:

        {"html_url": "the url for the whole article",
         "question": "user question",
         "article": "HTML article to extract answer from",
         "html_snippet": "HTML chunk/section that holds the answer of the question",
         "text_snippet": "text chunk/section that holds the answer of the question",
         "images": "list of images that exist in the article",
         "reader": "indicates whether the model managed to answer or not"
        }
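As a rough client-side sketch of the request and output shapes above, the snippet below builds the JSON body and checks which documented output fields a response is missing. The helper names (`build_request`, `missing_fields`, `ask`) and the `endpoint` argument are assumptions made for illustration; they are not part of this project.

```python
import json
import urllib.request

# Field names taken from the Request and Output examples above.
REQUEST_FIELDS = ("html_url", "question", "article")
RESPONSE_FIELDS = REQUEST_FIELDS + ("html_snippet", "text_snippet", "images", "reader")


def build_request(html_url, question, article):
    """Serialize the three request fields into a JSON string."""
    return json.dumps(
        {"html_url": html_url, "question": question, "article": article}
    )


def missing_fields(response_body):
    """Return the documented output fields that are absent from a JSON body."""
    data = json.loads(response_body)
    return [name for name in RESPONSE_FIELDS if name not in data]


def ask(endpoint, html_url, question, article):
    """POST the request to `endpoint` (hypothetical URL) and decode the reply."""
    body = build_request(html_url, question, article).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

`ask` needs a running deployment, so it is only defined here and never called. The other two helpers work offline.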

How does it work:

I use DistilBERT, which is pre-trained on the QA task and works only with plain text, not HTML. However, I want the model to return the HTML version of the answer. To do that, I search the HTML content for the answer and find the container element that holds it.
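Because the model only sees text, the HTML has to be reduced to plain text before it is used as the QA context. The sketch below shows one way to do that with Python's standard `html.parser`. It is an assumption for illustration, not the project's actual preprocessing, and the names `TextExtractor` and `html_to_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text of `html` with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())
```

The resulting string would be passed to the QA model as the context, together with the user question.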

I parse the HTML as a tree and search each branch for the model's predicted answer.
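The tree search can be sketched as follows: build a small element tree, then descend into whichever child still contains the answer text until no child does. This is a minimal illustration using only the standard library, under the assumption that matching is done on whitespace-normalized, lowercased text. The project's real implementation may differ, and `Node`, `TreeBuilder`, and `find_container` are hypothetical names.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style"}  # their text is never shown to the reader
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # no closing tag


class Node:
    """One element of the parsed HTML tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Node objects, in document order
        self.parts = []     # text strings and child Nodes, interleaved

    def text(self):
        """Concatenated text of this element and all its descendants."""
        return "".join(
            part if isinstance(part, str) else part.text() for part in self.parts
        )


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current.parts.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if self.current.tag not in SKIPPED_TAGS:
            self.current.parts.append(data)


def normalize(text):
    return " ".join(text.split()).lower()


def find_container(node, answer):
    """Return the deepest element whose text contains `answer`, or None."""
    if normalize(answer) not in normalize(node.text()):
        return None
    for child in node.children:
        found = find_container(child, answer)
        if found is not None:
            return found
    return node  # no child holds the whole answer, so this is the container
```

An answer that spans inline markup, such as text partly inside a `<b>` tag, resolves to the enclosing element rather than to the inline tag, which is the behaviour a snippet card needs.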

You can use a different model by changing the model name in predictor.py.

How to run:

This is deployed using cortex-project.

Install the Cortex CLI.

$ bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/0.18/get-cli.sh)"

Inside the project folder, run:

cortex deploy
cortex get reader

To monitor the server, run:

cortex logs reader
