sra1nani0303 / legal-summary

This project forked from bernino/legal-summary

Legal document summarizer using BERT and transformers

License: Apache License 2.0

Python 100.00%

legal-summary

Legal document summarizer using BERT and transformers from huggingface.co. Mostly a proof of concept, but a fully working pipeline for OCR and text summaries.

INSTALL

  • on a Mac: brew install tesseract rust (pdf2image also needs poppler: brew install poppler)
  • pip3 install pytesseract pdf2image nltk transformers yake bert-extractive-summarizer
  • if using a conda env: conda install pytorch scikit-learn numpy. On an M1 Mac, it's easier to install scikit-learn, numpy etc. with conda's pre-compiled packages!
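A quick way to confirm the install worked is a small stdlib-only check like the one below. It is a sketch, not part of legal-summary: the `missing_dependencies` helper is hypothetical, and the package-to-module mapping (e.g. bert-extractive-summarizer importing as `summarizer`) is an assumption worth double-checking against each package's docs.

```python
import importlib.util
import shutil

# pip package name -> module name it is imported as (mapping is an assumption).
PYTHON_MODULES = {
    "pytesseract": "pytesseract",
    "pdf2image": "pdf2image",
    "nltk": "nltk",
    "transformers": "transformers",
    "yake": "yake",
    "bert-extractive-summarizer": "summarizer",
}

# Command-line tools the Python wrappers call out to:
# tesseract for pytesseract, pdftoppm (from poppler) for pdf2image.
EXTERNAL_TOOLS = ["tesseract", "pdftoppm"]


def missing_dependencies():
    """Hypothetical helper: return the packages/tools that cannot be found."""
    missing = [pkg for pkg, module in PYTHON_MODULES.items()
               if importlib.util.find_spec(module) is None]
    missing += [tool for tool in EXTERNAL_TOOLS if shutil.which(tool) is None]
    return missing


if __name__ == "__main__":
    gaps = missing_dependencies()
    print("All dependencies found." if not gaps else "Missing: " + ", ".join(gaps))
```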

RTFM

To summarise legal docs, or any other docs. Legal docs are simply an especially difficult case.

legal-summary traverses a folder for PDFs, OCRs them, extracts all paragraphs and summarises each one using e.g. Google's T5 transformer model, then summarises the concatenated paragraph summaries to give an overview (higher quality than summarising the entire document in one pass). Other models can be used; see the Hugging Face models hub.
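The two-stage flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function names are hypothetical, paragraphs are assumed to be separated by blank lines in the OCR output, and the summariser is passed in as a plain `str -> str` callable so any model can be plugged in.

```python
import re
from pathlib import Path
from typing import Callable, Iterator, List

# Any function that maps text to a shorter text (hypothetical interface).
Summariser = Callable[[str], str]


def find_pdfs(folder: str) -> Iterator[Path]:
    """Yield every PDF under `folder`, recursing into subfolders."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path


def ocr_pdf(path: Path) -> str:
    """OCR every page (needs pdf2image + poppler and pytesseract + tesseract)."""
    from pdf2image import convert_from_path  # imported lazily: optional deps
    import pytesseract

    pages = convert_from_path(str(path))  # one PIL image per page
    return "\n\n".join(pytesseract.image_to_string(page) for page in pages)


def split_paragraphs(text: str) -> List[str]:
    """Assume blank lines separate paragraphs; collapse whitespace inside each."""
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def summarise_document(text: str, summarise: Summariser) -> str:
    """Summarise each paragraph, then summarise the joined paragraph summaries."""
    paragraph_summaries = [summarise(p) for p in split_paragraphs(text)]
    return summarise(" ".join(paragraph_summaries))
```

With Hugging Face transformers installed, one way to supply `summarise` is `summariser = pipeline("summarization", model="t5-small")` followed by `lambda t: summariser(t)[0]["summary_text"]`; the model choice here is only an example.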

We cannot use the usual statistical summarisation methods like TextRank, because in a legal doc a word or sentence that occurs more often is not necessarily more important. Equally, a single occurrence of a word or sentence may be extremely important, so we must capture these as well.
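The toy example below illustrates the point. It is a hypothetical sketch, not the project's method: a simple word-frequency scorer (standing in for repetition-based statistical methods; TextRank itself is graph-based) ranks a one-off liability clause lowest, while a check for words that occur only once flags exactly that clause. The stopword list is a tiny assumption; nltk ships a fuller one.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "is", "be", "by", "shall"}

SENTENCES = [
    "The tenant pays rent.",
    "The tenant pays rent monthly.",
    "The tenant pays monthly rent.",
    "Liability is unlimited.",
]


def tokenise(sentence: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def frequency_scores(sentences: List[str]) -> List[float]:
    """Average document-wide count of each sentence's words: repetition = importance."""
    counts = Counter(w for s in sentences for w in tokenise(s))
    scores = []
    for s in sentences:
        words = tokenise(s)
        scores.append(sum(counts[w] for w in words) / len(words) if words else 0.0)
    return scores


def rare_term_sentences(sentences: List[str], max_count: int = 1) -> List[str]:
    """Sentences containing at least one word that occurs at most `max_count` times."""
    counts = Counter(w for s in sentences for w in tokenise(s))
    return [s for s in sentences if any(counts[w] <= max_count for w in tokenise(s))]


if __name__ == "__main__":
    print(frequency_scores(SENTENCES))     # the liability clause scores lowest
    print(rare_term_sentences(SENTENCES))  # ...but is the one flagged as rare
```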

We must rewrite each paragraph as a shorter paragraph with the same meaning, hence an abstractive sequence-to-sequence model (such as T5) rather than an extractive method.
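Rewriting a paragraph with a model such as T5 assumes the paragraph fits the model's input window (the original T5 checkpoints were pre-trained with 512-token inputs), so long legal paragraphs may need splitting first. The sketch below is hypothetical, not the project's code: it groups whole sentences into chunks, counts words as a rough stand-in for tokenizer tokens, and uses a naive regex sentence splitter (nltk's `sent_tokenize`, from the install list, would be more robust).

```python
import re
from typing import List


def sentence_split(paragraph: str) -> List[str]:
    """Naive splitter: break after . ! or ? followed by whitespace (assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def chunk_paragraph(paragraph: str, max_words: int = 300) -> List[str]:
    """Hypothetical helper: pack whole sentences into chunks of <= max_words words.

    Words approximate model tokens; a single sentence longer than the limit
    becomes its own chunk rather than being cut mid-sentence.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentence_split(paragraph):
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))  # close the full chunk
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be summarised on its own and the chunk summaries joined to form the paragraph summary.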

Contributors

bernino
