646e62 / case-brief

License: GNU General Public License v3.0

Languages: HTML 93.02%, Python 3.65%, Jupyter Notebook 3.29%, CSS 0.04%

case-brief's Introduction

case-brief

Generates a FIRAC-style case brief from a reported decision

What is a case brief?

In countries like Canada that use the common law system, reported court cases are an important legal source, because common-law courts treat precedent as authoritative. The underlying idea is that judicial decisions should be consistent with one another, so that similarly situated litigants are treated similarly under the law. Because reported cases are records of how a court treated a particular litigant, they are a valuable source of insight for this purpose. Reported decisions are also records of how courts apply legal principles, and courts typically sit in a hierarchy that allows "higher" courts to overturn decisions made by "lower" courts.

Case briefs are short, high-level summaries that distill the precedential value of a reported decision. That precedential value can include the decision's factual, legal, and analytical components.

What is FIRAC?

FIRAC is an initialism that stands for Facts, Issues, Rules, Analysis/Application, and Conclusion. It is one of several common ways to break reported decisions into conceptual components, and in my experience, the most effective. I summarize these different conceptual components below. For a more detailed explanation of these components, see FIRAC: An ELEMENTary Approach § 1.1 - An overview of the FIRAC approach.

Facts

Because the goal is for parties in similar circumstances to be treated similarly, the facts underlying reported decisions are a very useful way to compare and contrast litigants and to identify the similarities and differences between them.

Issues

At the heart of every reported decision is a legal issue; absent one, there would be no legal case to write about. So-called "issue spotting" is the primary skill tested on most law school exams and a key part of real-world legal analysis. Legal arguments are built around issues, and any legal case analysis must be able to identify them in order to be useful.

Fortunately, most reported decisions make issue spotting a relatively straightforward enterprise, as issues are often clearly identified as such. Where they aren't clearly identified, they must be inferred by applying legal rules to the factual matrix.

Rules

Where facts help identify similarly-situated litigants, the rules inform courts how such litigants should be treated. Where facts and issues are case-specific, legal rules are theoretically universal. Identifying a case's legal rule or rules is one of a case brief's primary goals.

Analysis/application

The analysis portion of a case applies rules to facts in order to address the issues the parties raised. Broken down this way, the analysis supplies the logic that ties the elements together: it demonstrates how the rules apply to the facts and provides the rationale for resolving the legal issues. Where cases with similar facts and similar issues end with different results, the difference is often in the analysis.

Conclusion

The conclusion briefly identifies the successful party and any judicial findings made or remedies awarded. It summarizes a case's outcome.

Why is this useful?

In practice, lawyers are very frequently required to read and stay familiar with large amounts of written material. Case briefs are useful in this respect: they give the reader almost all of a case's precedential value without making them wade through dozens of pages of dense text, procedural history, and explication, providing the requisite familiarity more efficiently. To the extent that case briefs cut down on the amount of actual reading a lawyer has to do, they save time and money, and may reduce human error.

In law school, briefing cases is an essential activity that occupies an inordinate amount of time. A program capable of reliably separating the FIRAC elements from one another would both speed up brief generation and help the user recognize these components the next time they appear in a case. As with many things, sometimes seeing the answer helps a student "reverse engineer" the correct method. Properly sorted FIRAC elements and appropriately summarized case briefs may assist with this.

How does this program create case briefs?

This project operates on the hypothesis that these five elements can reliably be found and identified in most reported decisions using NLP techniques. It adopts David Guenther's premise, outlined in FIRAC: An ELEMENTary Approach § 2.1 - An introduction to understanding judicial opinions, that different linguistic components in reported cases (sentences, paragraphs, key words, etc) can be thought of as FIRAC elements, and proposes to apply this approach programmatically.

To do this, I first run a written decision through a text classification model I've trained to identify the five FIRAC elements (as well as document headings and procedural history, the latter of which is a somewhat special subset of the analysis). The model is under development but already showing promise. Once a decision's sentences have been thus classified, they get sorted into the FIRAC categories. Local summarization functions extract key information which then gets fed to GPT-3 for further summary and analysis. The end result is the case brief.
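A minimal sketch of the classification-and-sorting step, assuming a local spaCy pipeline; the model path, label names, and the presence of a sentencizer are illustrative, not the repo's actual artifacts:

```python
import spacy

# Assumption: "firac_model" is a local spaCy pipeline with a text
# classifier trained on these labels plus a sentencizer; the path and
# label names are illustrative, not the repo's actual artifacts.
LABELS = ["facts", "issues", "rules", "analysis",
          "conclusion", "heading", "procedural_history"]

nlp = spacy.load("firac_model")

def sort_sentences(decision_text: str) -> dict[str, list[str]]:
    """Classify each sentence and bucket it under its FIRAC element."""
    buckets = {label: [] for label in LABELS}
    for sent in nlp(decision_text).sents:
        scores = nlp(sent.text).cats          # label -> probability
        best = max(scores, key=scores.get)    # top-scoring label wins
        buckets.setdefault(best, []).append(sent.text)
    return buckets
```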

case-brief's People

Contributors: 646e62, bbelderbos

case-brief's Issues

GPT-3 integration

OpenAI provides access to GPT-3 through an API. The models currently offered aren't at ChatGPT levels and aren't free, but they are (hopefully!) good enough to do some basic text summarization and analysis without relying on custom models and rules.
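For reference, a call against the completions API of that era might look like the sketch below; the model choice, prompt wording, and parameters are assumptions to tune, not the program's actual settings:

```python
import openai  # pre-1.0 openai library API style

openai.api_key = "sk-..."  # placeholder; set via environment in practice

def summarize(text: str) -> str:
    """Send one passage out for summarization."""
    # Assumption: text-davinci-003 and these settings are illustrative.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Summarize the following passage from a court decision:\n\n{text}",
        max_tokens=256,
        temperature=0.2,  # keep the summary close to the source text
    )
    return response["choices"][0]["text"].strip()
```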

Large-file queries exceed token limits

The program currently uses spaCy tokens to measure a prompt's size. This sometimes causes problems because there is no one-to-one correspondence between spaCy tokens and GPT-2 tokens. Switching to GPT-2 token lengths will resolve this.
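A sketch of the fix, using Hugging Face's GPT-2 tokenizer to count tokens; GPT-3 uses the same byte-pair encoding as GPT-2, so these counts track the API's accounting far more closely than spaCy tokens do:

```python
from transformers import GPT2TokenizerFast

# GPT-3 shares GPT-2's byte-pair encoding, so this tokenizer gives a
# faithful estimate of what the API will count.
gpt2_tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def gpt2_token_length(text: str) -> int:
    """Measure prompt size in GPT-2 tokens rather than spaCy tokens."""
    return len(gpt2_tokenizer(text)["input_ids"])
```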

Text classification items are too variegated

The corpus used for citation labelling and NER worked well across various examples, but text classification poses a few unique problems when dealing with different types of decisions across different court levels. One of those problems is how clearly the FIRAC elements come out in different types of decisions.

Working on the assumption that appellate decisions exemplify these elements better than trial decisions, a new corpus containing only SCC cases should work better for text classification.

Pronoun confusion

When dealing with small text blocks, the local summarizer, and subsequently GPT-3, can get confused about what words like "they" and "it" refer to. GPT-3 tends to fill in the blanks that the local summarizer creates. Specific prompts could address this, but it may also be possible to handle it locally with a rules-based approach.
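One possible rules-based starting point, sketched below: replace a pronoun with the most recently mentioned PERSON/ORG entity. The heuristic is deliberately naive (a real coreference component would do better) and the pronoun list is illustrative:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

PRONOUNS = {"he", "she", "they", "it", "him", "her", "them"}

def resolve_pronouns(text: str) -> str:
    """Naive pass: swap each pronoun for the last-seen named entity."""
    doc = nlp(text)
    last_entity = None
    out = []
    for token in doc:
        if token.ent_type_ in ("PERSON", "ORG"):
            last_entity = token.text
        if token.lower_ in PRONOUNS and last_entity:
            out.append(last_entity + token.whitespace_)
        else:
            out.append(token.text_with_ws)
    return "".join(out)
```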

Decision plurality management

Majority and minority opinions only come up in panel decisions. In almost all (non-administrative) cases, only courts of appeal and the Supreme Court of Canada decide cases as panels. If the program determines that the case is an appellate case from these courts, it should run a check to see if there's an opinion plurality. If there is, the program should identify how many opinions there are and complete separate analyses for each opinion.
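A rough first pass at the plurality check might simply look for markers of separate reasons; the marker words below are assumptions, and real decisions vary by court and era:

```python
import re

# Assumption: these marker words are illustrative only.
SEPARATE_REASONS = re.compile(r"\b(dissent(ing)?|concurring)\b", re.IGNORECASE)

def has_opinion_plurality(decision_text: str) -> bool:
    """Quick check for separate sets of reasons in a panel decision."""
    return SEPARATE_REASONS.search(decision_text) is not None
```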

Add case citations and related inferential data

Two-part resolution to this issue:

  1. Redesign the output so the full CanLII citation (style of cause + neutral/CanLII citation) becomes the page title;
  2. Program a function that adds inferential data about the court level and jurisdiction to the prompts.

NL analytical patterns for legal tests

Certain legal issues come up all the time, so consistent ways of addressing them have developed. In law these are called legal tests, and they are usually tied to and named after the case that first implemented the test (or made it authoritative). Legal tests are usually expressed as uniform if-then style lists, and gpt-3.5 seems able to spot them in text prompts and work out whether or not the test was satisfied.

There should be a way to scan a document for its citations and to search for the legal test pattern whenever the decision cites the originating case. This will allow gpt-3.5 to better understand the analysis and generate meaningful summaries.
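One way to sketch this: a small hand-built table keyed by the originating case's citation, checked against the citations a regex finds in the document. Both the table entries and the pattern below are illustrative, not exhaustive:

```python
import re

# Assumption: a hand-built lookup from an originating case's citation
# to the test it established; entries and regex are illustrative.
LEGAL_TESTS = {
    "2009 SCC 32": "Grant test (s. 24(2) Charter exclusion)",
    "[1991] 3 SCR 326": "Stinchcombe disclosure obligation",
    "[1991] 1 SCR 742": "WD credibility analysis",
}

CITATION = re.compile(r"\[\d{4}\]\s+\d+\s+SCR\s+\d+|\b\d{4}\s+SCC\s+\d+\b")

def tests_cited(decision_text: str) -> list[str]:
    """Flag known legal tests whose originating case is cited."""
    cited = {re.sub(r"\s+", " ", c) for c in CITATION.findall(decision_text)}
    return [test for cite, test in LEGAL_TESTS.items() if cite in cited]
```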

Document GPT-3 test output

Design a function to store GPT-3 queries, their parameters, and their results to file. This should be helpful when trying to find the best-fitting settings.
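A minimal sketch, assuming an append-only JSON Lines log (the file name is hypothetical):

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("gpt_queries.jsonl")  # hypothetical location

def log_query(prompt: str, params: dict, result: str) -> None:
    """Append one GPT call (prompt, parameters, output) as a JSON line."""
    record = {"timestamp": time.time(), "prompt": prompt,
              "params": params, "result": result}
    with LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```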

Analytic tools running slowly

Tests during the PDM code meeting this morning showed that one of the nlp() calls in analytic_tools.retrieve_citations() was slowing down function calls. Specifically, the "minimal" dictionary trained to detect neutral/CanLII citations, statutes, and section numbers accounted for 75%+ of the function's computing time. The text classification function was running surprisingly fast.

This problem may be fixable in one or more ways:

  1. Reducing the amount of data that gets loaded into the function;
  2. Using a NER model rather than a span model (NER seems to train and respond faster than spans); or
  3. Substituting some ML tasks with regex (see the sketch below).
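For option 3, the regex stand-in for the slow nlp() pass could look roughly like this; the patterns cover only the common Canadian citation forms and are illustrative rather than exhaustive:

```python
import re

# Assumption: illustrative patterns for the common Canadian forms only.
NEUTRAL = re.compile(r"\b\d{4}\s+[A-Z]{2,6}\s+\d+\b")   # e.g., 2015 SCC 27
CANLII = re.compile(r"\b\d{4}\s+CanLII\s+\d+\b")        # e.g., 2019 CanLII 123
SECTION = re.compile(r"\bss?\.\s*\d+(?:\(\d+\))*")      # e.g., s. 24(2)

def retrieve_citations_fast(text: str) -> dict[str, list[str]]:
    """Regex stand-in for the slow nlp() pass on the easy patterns."""
    return {
        "neutral": NEUTRAL.findall(text),
        "canlii": CANLII.findall(text),
        "sections": SECTION.findall(text),
    }
```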

Process FIRAC elements into an Argdown file

Argdown is a markup language for complex arguments. Although not directly related to the case brief generator, a function capable of translating a natural-language analysis into Argdown would likely lead to significant improvements in future rules-based legal text processing.
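As a rough illustration of the idea, a hypothetical function like the one below could render summarized FIRAC elements as a very simplified Argdown map (statements in square brackets, arguments in angle brackets, '+' marking support); a real analysis would need a much richer translation:

```python
def firac_to_argdown(firac: dict[str, str]) -> str:
    """Render summarized FIRAC elements as a minimal Argdown map.

    Assumption: a very simplified use of Argdown syntax, for
    illustration only.
    """
    return "\n".join([
        f"[Conclusion]: {firac['conclusion']}",
        f"  + <Analysis>: {firac['analysis']}",
        f"    + <Rules>: {firac['rules']}",
        f"    + <Facts>: {firac['facts']}",
    ])
```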

GPT-3 fills in blanks without enough information

GPT-3 does well when given a fair bit of information and asked to summarize or analyze it. It does not do as well when asked to summarize a single sentence, and tends to make up answers to fill in perceived blanks. This may be resolvable with alternative prompts, more restrained settings for small text samples, or some other means.
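One "more restrained settings" approach, sketched with assumed values: drop the temperature and the completion budget when the input is short, so the model has less room to invent detail. The word-count threshold and parameter values below are guesses to tune:

```python
def completion_params(prompt_text: str) -> dict:
    """Tighten sampling for short inputs to discourage invention.

    The 40-word cutoff and all values are assumptions, not tested
    settings.
    """
    short = len(prompt_text.split()) < 40  # hypothetical threshold
    return {
        "temperature": 0.0 if short else 0.3,
        "max_tokens": 64 if short else 256,
        "top_p": 1.0,
    }
```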

Remove citations or replace with short forms

Citations are generally not semantically important for text summarization, though they may be useful for weight analysis, relational values, identifying legal tests, etc. Citations matter most when they are used as labels for legal tests (e.g., "Grant test", "Stinchcombe obligation", "WD analysis", and so forth).

This issue will be fully resolved once I've trained a model that can distinguish these citation references from their comparatively superfluous counterparts. An interim (and possibly good enough) solution will simply detect citations inside texts and exclude them from the text that ultimately gets submitted to GPT for summarization.
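The interim solution could be as simple as a substitution pass like the sketch below; the pattern is illustrative, matches only common Canadian citation forms, and leaves short-form labels like "Grant test" untouched:

```python
import re

# Interim fix: delete full citations before text goes out to GPT.
# The pattern is illustrative and covers only common Canadian forms;
# named-test labels like "Grant test" contain no citation and survive.
FULL_CITATION = re.compile(
    r",?\s*(?:\[\d{4}\]\s+\d+\s+SCR\s+\d+|\b\d{4}\s+[A-Z][A-Za-z]{1,5}\s+\d+\b)"
)

def strip_citations(text: str) -> str:
    """Remove full citations from text bound for summarization."""
    return FULL_CITATION.sub("", text)
```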

Design output file

Create a sample text classification output file that can then be fed into the summarization and analysis functions.

LaTeX output

One possibility for report generation is to use LaTeX. It's relatively versatile as a format, and translating headings, bullets, and numbered lists from plain text/HTML to LaTeX should be relatively straightforward.
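A toy version of that translation, assuming the brief arrives as a dict of section name to summarized text; the template and helper are hypothetical:

```python
# Hypothetical template; the real report layout may differ.
LATEX_TEMPLATE = r"""\documentclass{article}
\begin{document}
\section*{%(title)s}
%(body)s
\end{document}
"""

def brief_to_latex(title: str, sections: dict[str, str]) -> str:
    """Render a brief's sections as starred subsections in one document."""
    body = "\n".join(
        "\\subsection*{%s}\n%s" % (name.title(), text)
        for name, text in sections.items()
    )
    return LATEX_TEMPLATE % {"title": title, "body": body}
```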

Define summarization parameters

Some parameters will be needed to determine how text is summarized locally and with GPT-3. The samples sent out will need to be small enough to be economical and not overwhelming (i.e., to be a true summary), but large enough to give GPT-3 sufficient information to summarize and analyze on its own.

Resolving this issue will require trial and error with the spaCy and GPT-3 summarization and analyses.
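Whatever values the trial and error settles on, it may help to gather them in one place; the knobs and defaults below are placeholders, not tested settings:

```python
from dataclasses import dataclass

@dataclass
class SummaryParams:
    """Placeholder knobs to settle by trial and error."""
    max_input_tokens: int = 2800   # cap on text sent out per call
    min_input_sentences: int = 3   # below this, skip the GPT-3 pass
    target_ratio: float = 0.2      # rough summary-to-source length target
```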

Economize GPT-3 reliance

Because GPT-3 isn't free, rules and custom models should be used when possible to limit how heavily the program has to rely on a paid service.

Additional human-assisted view for automatic html/txt uploads

Add another view between uploading an HTML file and sending it to GPT. This view should show the sorted text in editable text boxes, as in the manual view. It will help troubleshoot sorting errors and give the user a chance to adjust the input when the results aren't what they hoped for.

spaCy sentencizer incorrectly breaks paragraphs, sentences

spaCy treats some commonly abbreviated terms in written decisions (e.g., "para." for "paragraph", "Cst." for "Constable") as sentence breaks. This adds time to manual data cleaning and makes automated input less reliable. The data cleaning functions need to be updated to correct these problems so that text files can be loaded into the program automatically and reliably.
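One likely fix is to register these abbreviations as tokenizer special cases, so their trailing periods stop looking like sentence-final punctuation; the abbreviation list below is illustrative and would grow with the corpus:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Keep each abbreviation as a single token so the sentencizer no longer
# sees its period as a standalone sentence break. Illustrative list only.
for abbrev in ("para.", "paras.", "Cst.", "s.", "ss."):
    nlp.tokenizer.add_special_case(abbrev, [{"ORTH": abbrev}])
```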

Redesign GPT-2 token limits, prompts

The program currently limits GPT-2 tokens to 2800 per call. Higher values tended to exceed the maximum length of 4096 when dealing with longer files; smaller files had no difficulty with the default values.

Several OpenAI calls combine results from previous prompts with data from new prompts. These results can vary in length, and the maximum token length in the summarization functions should automatically adjust to account for this.
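A sketch of that adjustment, assuming the GPT-2 tokenizer approximates the API's accounting: measure each prompt's token count and budget the completion from what remains, instead of using a fixed cap. The floor value is an assumption:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
MODEL_MAX = 4096  # combined prompt + completion limit

def max_completion_tokens(prompt: str, floor: int = 64) -> int:
    """Budget the completion dynamically from the remaining headroom."""
    used = len(tokenizer(prompt)["input_ids"])
    return max(floor, MODEL_MAX - used)
```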
