MARRO

MARRO: Multi-headed Attention for Rhetorical Role Classification in Legal Documents

This repository contains the code for MARRO, a multi-headed attention based model for rhetorical role classification. MARRO has two variants: one uses pretrained embeddings from the legal documents, and the other uses the LEGAL-BERT-SMALL model to generate embeddings from the sentences. LEGAL-BERT-SMALL can be swapped for other BERT-based models.

Every sentence in a court case document can be assigned a rhetorical (semantic) role, such as 'Arguments', 'Facts', 'Ruling by Present Court', etc. The task of assigning rhetorical roles to individual sentences in a document is known as semantic segmentation. We have developed MARRO for automatic segmentation of Indian court case documents. A single document is represented as a sequence of sentences. We have used 7 labels for this task: Arguments, Precedent, Statutes, Facts, Ratio Decidendi, Ruling of Lower Court, Ruling of Present Court.

TRAINING

To train a model on an annotated dataset:

Input Data format

For training and validation, data is placed inside the "data/text" folder. Each document is represented as an individual text file, with one sentence per line. The format is:

  text <TAB> label
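The format above can be read with a few lines of Python. This is only a minimal sketch, not the repository's loader: the function name `parse_text_document` is hypothetical, and the example label strings are assumed to match the role names listed earlier. The line is split on the last tab so that any tab inside the sentence text is preserved.

```python
from typing import List, Tuple


def parse_text_document(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split 'text<TAB>label' lines into parallel lists of sentences and labels.

    Hypothetical helper for illustration; not part of the MARRO code base.
    """
    sentences, labels = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the LAST tab so tabs inside the sentence are kept.
        # A line without a tab raises ValueError here.
        text, label = line.rsplit("\t", 1)
        sentences.append(text.strip())
        labels.append(label.strip())
    return sentences, labels


# Example lines (invented for illustration).
example = [
    "The appellant filed a suit.\tFacts\n",
    "The appeal is dismissed.\tRuling by Present Court\n",
]
sentences, labels = parse_text_document(example)
```

For a real file, the same function could be called as `parse_text_document(open(path, encoding="utf-8").readlines())`, assuming the files are UTF-8 encoded.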

To use the pretrained-embeddings variant of the model, place the data inside the "data/pretrained_embeddings" folder. Each document is represented as an individual text file, with one sentence per line. The format is:

emb_f1 <SPACE> emb_f2 <SPACE> ... <SPACE> emb_f200 <TAB> label  (For 200 dimensional sentence embeddings)
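A line in this format can be turned into a feature vector and a label as sketched below. The helper name `parse_embedding_line` and its `dim` parameter are assumptions made for illustration; the dimension check simply guards against malformed lines.

```python
from typing import List, Tuple


def parse_embedding_line(line: str, dim: int = 200) -> Tuple[List[float], str]:
    """Parse 'f1 f2 ... f_dim<TAB>label' into (vector, label).

    Hypothetical helper for illustration; not part of the MARRO code base.
    """
    features, label = line.rstrip("\n").rsplit("\t", 1)
    vector = [float(value) for value in features.split()]
    if len(vector) != dim:
        raise ValueError(f"expected {dim} features, found {len(vector)}")
    return vector, label.strip()


# A 3-dimensional toy line; real files would use dim=200.
vector, label = parse_embedding_line("0.5 -1.25 2.0\tFacts\n", dim=3)
```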

"in_categories.txt" and "uk_categories.txt" contain the category information of documents in the format:

category_name <TAB> doc <SPACE> doc <SPACE> ...
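As a rough sketch, a category file in this format can be loaded into a dictionary that maps each category name to its list of documents. The function name `parse_categories` and the example category and document names are hypothetical.

```python
from typing import Dict, List


def parse_categories(lines: List[str]) -> Dict[str, List[str]]:
    """Map each category name to the documents listed after the tab.

    Hypothetical helper for illustration; not part of the MARRO code base.
    """
    categories: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, docs = line.split("\t", 1)
        categories[name.strip()] = docs.split()  # documents are space-separated
    return categories


# Invented category and document names.
cats = parse_categories(["criminal\tdoc1 doc2\n", "tax\tdoc3\n"])
```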

Usage

To run experiments with default setup, use:

python run.py --data_path data/text/IN-train-set/ --use_marro True    (no pretrained variant)
python run.py --pretrained True --data_path data/pretrained_embeddings/ --use_marro True    (pretrained variant)

Other default values and hyperparameters are given in run.py.

By default, the model employs 5-fold cross-validation, where the folds are manually constructed so that each fold has a balanced category distribution.
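The repository's folds are built by hand, so the snippet below is not the authors' procedure. It is only an illustrative sketch of one way to get category-balanced folds: documents are dealt out round-robin, category by category, so each category is spread as evenly as possible across the folds. The function name `balanced_folds` and the toy category names are assumptions.

```python
from typing import Dict, List


def balanced_folds(categories: Dict[str, List[str]], k: int = 5) -> List[List[str]]:
    """Deal documents into k folds round-robin, one category at a time.

    Illustrative only; the MARRO folds are constructed manually.
    """
    folds: List[List[str]] = [[] for _ in range(k)]
    position = 0  # running counter keeps fold sizes even across categories
    for name in sorted(categories):
        for doc in sorted(categories[name]):
            folds[position % k].append(doc)
            position += 1
    return folds


# Toy input: two categories with five documents each.
toy = {
    "criminal": [f"c{i}" for i in range(5)],
    "tax": [f"t{i}" for i in range(5)],
}
folds = balanced_folds(toy, k=5)
```

With this toy input, every fold receives one document from each category.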

Contributors

purbid, abhijnanc
