projectdossier / essir-2023-legal-tutorial

Official repository for the Legal Tutorial at the 14th European Summer School on Information Retrieval

Home Page: https://2023.essir.eu/organizers/

Jupyter Notebook 100.00%
cross-encoder large-language-models llm-rankers llm-rerankers statute-ranking statute-reranking

Welcome to the repository of the legal tutorial at ESSIR'23!

In this tutorial, we will delve into the world of legal information retrieval, specifically focusing on the identification of relevant statutes given a brief description of a legal situation. This task is crucial for legal practitioners, as it enables them to access the written laws that may apply to their cases. The sections below describe the task, the dataset, and the tutorial plan in more detail.

Task definition

The primary objective of statute retrieval is to identify the relevant statutes (from the candidate documents) based on a concise description of a legal scenario (query).

Motivation

In countries that adhere to the Common Law system (e.g., India, UK, Canada, Australia, and many others), two main sources of law exist:

  1. Statutes, which are the written laws
  2. Precedents, or judgements of prior cases delivered by a court, which involve legal facts and issues similar to those of the current case but are not directly indicated in the written law

Legal practitioners frequently rely on statutes and precedents when working on new cases. These resources help them understand how the court has discussed, argued and decided similar scenarios. Our tutorial aims to provide preliminary information on developing retrieval systems that can address this critical need.

Dataset

For this tutorial, we leverage the Artificial Intelligence for Legal Assistance (AILA) dataset, specifically focusing on Task 1 - Precedent & Statute retrieval. AILA encompasses a series of shared tasks designed to create datasets and methods for solving various legal informatics challenges.

To be more precise, we concentrate on Task 1b, titled "Identifying relevant statutes," in a multi-stage setup. In the initial stage of retrieval, we explore BM25 and Splade. For reranking the top-k candidates retrieved by the first-stage retriever, we use two approaches: large language models (LLMs) prompted few-shot with only two training instances in context, and cross-encoders fine-tuned on 40 training queries. We evaluate the rerankers using ten queries from the validation set.
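To make the multi-stage setup concrete, here is a minimal sketch of a BM25 first stage followed by a pluggable reranker. It is not the code from the notebooks (the tutorial uses Elasticsearch for BM25): the `BM25` class, the `retrieve_then_rerank` helper, the whitespace tokenizer, the `k1` and `b` values, and the three toy statutes are all assumptions made for illustration, and the toy texts are invented rather than taken from AILA.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real analyzer)."""
    return text.lower().split()


class BM25:
    """Minimal in-memory Okapi BM25 index (illustrative, not Elasticsearch)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = dict(docs)
        self.k1, self.b = k1, b
        self.doc_ids = list(self.docs)
        self.tfs = [Counter(tokenize(self.docs[d])) for d in self.doc_ids]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lens) / len(self.lens)
        df = Counter()
        for tf in self.tfs:
            df.update(tf.keys())
        n = len(self.doc_ids)
        # IDF variant that stays non-negative even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        tf = self.tfs[i]
        # Length normalisation: longer-than-average documents are penalised.
        norm = 1 - self.b + self.b * self.lens[i] / self.avg_len
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + self.k1 * norm)
            for t in query_tokens
            if t in tf
        )

    def search(self, query, k=10):
        """Return the top-k (doc_id, score) pairs, best first."""
        q = tokenize(query)
        scored = [(d, self.score(q, i)) for i, d in enumerate(self.doc_ids)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def retrieve_then_rerank(index, query, rerank_fn, k=10):
    """Stage 1: BM25 top-k. Stage 2: reorder those k by rerank_fn(query, text)."""
    candidates = [doc_id for doc_id, _ in index.search(query, k)]
    return sorted(candidates, key=lambda d: rerank_fn(query, index.docs[d]), reverse=True)


# Invented toy statutes, for illustration only.
toy_statutes = {
    "S1": "punishment for murder",
    "S2": "punishment for theft of movable property",
    "S3": "procedure for grant of bail",
}
index = BM25(toy_statutes)
first_stage = index.search("theft of property", k=2)
```

The reranker only ever sees the k candidates returned by the first stage, so a relevant statute that BM25 misses cannot be recovered later. This is why recall of the first stage matters as much as its precision.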

It is important to note that while a separate test dataset would make our cross-encoder reranking setup more robust, our primary goal here is to teach students how to implement and train these methods effectively.

Tutorial plan

Our tutorial is divided into several informative sessions:

  1. Introduction to Legal Information Retrieval: Presented by Sophia Althammar and Alaa El-Ebshihy, this session provides an overview of legal information retrieval.
  2. First Stage Retrievers with BM25 and Splade: Taught by Tobias Fink, this session explores the implementation and usage of first-stage retrievers, with code available in the "first_stage_retrievers" folder.
  3. Reranking with BERT-based and Large Language Models: In the afternoon session, Arian Askari presents the process of fine-tuning and evaluating cross-encoder rerankers. This includes an investigation into how LLMs, particularly FLAN-T5, can effectively rerank statutes based on a legal question with minimal provided examples. The implementation of the reranking stage is available in the "llms_transformers_rerankers" folder.
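As a rough illustration of few-shot reranking with an LLM, the sketch below builds a prompt from two demonstrations and sorts candidate statutes by a relevance score. The prompt template, the function names, and the demonstrations are hypothetical and are not taken from the notebooks. `score_fn` is a placeholder for a model call; one common choice, assumed here, is the probability the model assigns to answering "yes".

```python
INSTRUCTION = (
    "Decide whether the statute is relevant to the legal situation. "
    "Answer yes or no."
)


def format_example(query, statute, label=None):
    """Render one (situation, statute) pair; leave the answer open if label is None."""
    answer = "" if label is None else (" yes" if label else " no")
    return f"Situation: {query}\nStatute: {statute}\nRelevant:{answer}"


def build_prompt(query, statute, demonstrations):
    """Instruction, then the labelled demonstrations, then the pair to judge."""
    blocks = [INSTRUCTION]
    blocks += [format_example(q, s, y) for q, s, y in demonstrations]
    blocks.append(format_example(query, statute))
    return "\n\n".join(blocks)


def rerank(query, candidates, demonstrations, score_fn):
    """Score each candidate statute with score_fn(prompt) and sort best first."""
    scored = [
        (sid, score_fn(build_prompt(query, text, demonstrations)))
        for sid, text in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two invented demonstrations, mirroring the two in-context training instances.
demonstrations = [
    ("The accused stabbed the victim, who died.", "punishment for murder", True),
    ("The accused stabbed the victim, who died.", "procedure for grant of bail", False),
]
candidates = {
    "S1": "punishment for murder",
    "S2": "punishment for theft of movable property",
}
prompt = build_prompt("The accused stole a necklace.", candidates["S2"], demonstrations)
```

Keeping prompt construction separate from scoring makes it easy to try different templates while leaving the model call untouched.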

Notes:

  • You can check out the presentation slides in the "presentation" folder.

  • The whole tutorial can be run on Google Colab without a premium account.

Evaluation table

Here is an overview of the evaluation metrics for the different retrieval methods we explore in this tutorial:

| Method | Backbone | P@1 | P@5 | P@10 | Recall@10 | Recall@100 | MAP@100 |
|---|---|---|---|---|---|---|---|
| BM25 | Elasticsearch | .1200 | .0480 | .0380 | .0860 | .4373 | .0605 |
| Splade | BERT | .1400 | .0880 | .0700 | .1667 | .7257 | .1060 |
| BM25 + Cross-encoder (fine-tuned) | LegalBERT | .5000 | .1800 | .1200 | .2733 | -- | -- |
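For readers unfamiliar with the metrics in the table, the following sketch shows one way to compute precision, recall, and average precision at a cutoff k for a single query, plus a mean over queries. It is not the evaluation code used to produce the numbers above. The function names are ours, and the average precision here divides by the total number of relevant statutes, which is one of several conventions in use.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (always divided by k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant statutes that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision values at each rank where a relevant statute occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_over_queries(metric, runs, qrels, k):
    """Average a per-query metric over every query that has judgements."""
    return sum(metric(runs.get(q, []), qrels[q], k) for q in qrels) / len(qrels)


# Invented example: one query, four ranked statute IDs, two of them relevant.
runs = {"Q1": ["S2", "S5", "S1", "S9"]}
qrels = {"Q1": {"S1", "S2"}}
map_at_4 = mean_over_queries(average_precision_at_k, runs, qrels, 4)
```

Because P@k always divides by k, a query with fewer than k relevant statutes can never reach a precision of 1.0 at that cutoff, which is one reason the P@5 and P@10 values in the table are low.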

BM25 + LLM few-shot reranker (Flan-T5): We encourage you to design and test your own prompt using the statute reranking notebook. Feel free to contact Arian Askari, [email protected], if you have any questions or are interested in further analyzing LLMs in the legal domain.

Organizers

Arian Askari, PhD candidate from Leiden University

Tobias Fink, PhD candidate from TU Wien

Sophia Althammar, PhD candidate from TU Wien

Amin Abolghasemi, PhD candidate from Leiden University

Alaa El-Ebshihy, PhD candidate from TU Wien
