
freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.

Home Page: https://www.courtlistener.com

License: Other

Languages: Python 59.90%, CSS 0.76%, JavaScript 15.43%, HTML 14.80%, Shell 0.19%, Dockerfile 0.03%, Makefile 0.01%, TypeScript 0.57%, PLpgSQL 8.30%
Topics: courts, government, government-data, legaltech

courtlistener's Introduction

CourtListener

Started in 2009, CourtListener.com is the main initiative of Free Law Project. Its goal is to provide high-quality legal data and services.

What's Here

This repository is organized in the following way:

  • cl: the Django code for this project. 99% of everything is in this directory.
  • docker: Where to find compose files and docker files for various components.
  • scripts: logrotate, systemd, etc., plus init scripts for our various configurations and daemons.

Getting Involved

If you want to get involved, send us an email with your contact info or take a look through the issues list. There are innumerable things we need help with, but we are especially looking for help with:

  • legal research to fix data errors or other problems (check out the data-quality label for some starting points)
  • fixing bugs and building features (most things are written in Python)
  • machine learning or natural language processing problems
  • test writing -- we always need more and better tests

In general, we're looking for all kinds of help. Get in touch if you think you have skills we could use or if you have skills you want to learn by improving CourtListener.

Contributing code

See the developer guide to get started.

Copyright

All materials in this repository are copyright Free Law Project under the Affero GPL. See LICENSE.txt for details.

Contact

To contact Free Law Project, see here:

https://free.law/contact/

                                   g@@D
                                  "l@@B!
                                   "@@"
                                    @@
                                    @@
                            _P '@.  @@
                            71__@   @@
                              @@    @@    __
                              @@    @@  ;F  @
                              @@    @@  'h__@
                              @@    @@    @g
                              @@    @@    @g
                              @@    @@    @g                     _~~_
                              @@    @@    @g   @@@@@@@@@@@@@@@@@@F  |!
                              @@    @@    @g   @@         @T     TmmP
   _gg_                       @@    @@    @g   @@         @'
   @   @gggggggggggggggggg    @@    @@    @g   @@         @\
   '@WP      .@         @@    @@    @@    @g   @@        J "_
             !@         @@    @@    @@    @g   @@       ,'  T
             ;@         @@    @@    @@    @g   @@       8    %
             W @        @@    @@    @@    @g   @@   ___d______@_-_
            @   q       @@    @@    @@    @g   @@   ______________
           ;"    g      @@    @@    @@    @g   @@   0@@@@@@@@@@@@"
       ____E_____]L___  @@    @@    @@    @g   @@
       ,_____________   @@    @@    @@    @g   @@
       '@@@@@@@@@@@@D   @@    @@    @@    @g   @@
                        @@    @@    @@    @g   @@
                        @@    @@    @@    @g   @@
                  _~ggg~_.       __g@g~_.      gg        ggg   gggggggg_,   ;gggggggggggg
                g@@P"""<@@g    _@@P"""<B@g     @@        @@@   @@@"""""Q@g  """""9@g"""""
              .@@F       "    @@F       "@@,   @@        @@@   @@@      @@g      [@g
              g@@            |@@         (@@   @@        @@@   @@@     ,@@/      [@g
              [@@            [@@         j@@   @@        @@g   @@@@@@@@@B        [@g
               @@L       ,    @@L       ,@@'   @@1       @@'   @@@   '@@L        [@g
                T@@_____g@@    T@@_____g@@      @@g____+@@?    @@@     @@a       [@g
                  '4B@BP"        '=B@BP"          <8B@B+"      BBB      0BB      "BN
              g        ;;   _~mma_  mmmmqmmmmm  mmmmmmms  _        ;   gmmmmmmm  gmmmmm__
              g        [|  F            |]      |         g\_      [   @         @       q
              g        [|  1.           |]      |         g  q     [   @         @       [
              g        [|    "+m__      |]      P""""""   g   "_   [   @""""""   @_______'
              g        [|         \,    |]      |         g     \, [   @         @    `a
              g        [| ,       /'    |]      |         g       q[   @         @      0
              """""""" ''   ""==""      '"      """"""""  "        "   """"""""  "       "
                        @@    @@    @@    @g   @@
                        @@    @@    @@    @g   @@
                        @@    @@    @@    @g   @@

courtlistener's People

Contributors

albertisfu, colinstarger, cweider, davidxia, dependabot[bot], drewsilcock, dschnelldavis, elliottash, erosendo, flooie, grossir, ikeboy, jeffgortmaker, johnhawkinson, johnludwigm, jon-ashley, jraller, krist-jin, litewarp, malteos, mattdahl, mlissner, pre-commit-ci[bot], probablyfaiz, quevon24, rowyn, ss108, troglodite2, ttys0dev, voutilad


courtlistener's Issues

Atom feeds should be advertised on the alerts page

This will help promote the use of the atom feeds over the emailer, which I'm guessing will place less demand on the server (though I haven't entirely thought it out yet).

This ought to be easy once the feeds work properly, but not a priority right now.

One consideration is that people subscribed this way will eventually want to turn off (or delete) a feed, but there's no way to turn off a feed at present.


Sitemap.xml lacks priority for items created through the flatpages framework

If you pull up sitemap.xml, you will see that it has entries for the pages that were created via the flatpages module, but that they lack priority settings.

Since they are more important to the meaning of the site than many of the other items in the sitemap, this needs to be set.
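The intended weighting can be sketched independent of the flatpages framework (in Django this would become a Sitemap subclass with a raised priority attribute; the URLs and values below are hypothetical):

```python
# Hypothetical priorities: flatpages (About, Coverage, etc.) matter more
# to the meaning of the site than most other sitemap entries.
FLATPAGE_PRIORITY = 0.8
DEFAULT_PRIORITY = 0.5  # the sitemap protocol's default

def sitemap_priority(url, flatpage_urls):
    """Return the <priority> value to advertise for a sitemap entry."""
    return FLATPAGE_PRIORITY if url in flatpage_urls else DEFAULT_PRIORITY
```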


Give Users the Ability to Create TopicTags

Users should be able to tag a document with a keyword or "TopicTag". By default such tags should be public, but optionally, users could create private tags. Users should be able to select whether (and which) TopicTags are part of their bulk downloads. Users should be able to form groups that work together to create TopicTags of mutual interest, and then they need the ability to see/download 1) No TopicTags, 2) Just Their Own TopicTags, 3) Just TopicTags from selected groups or 4) All public TopicTags.

This is a major enhancement, quite desirable, but somewhat complex to implement.


Search and alert creation need to be combined

After using the site for a while now, I've noticed a few times when I wished a search were an alert, and I couldn't figure out how to make it happen. I've never had the opposite feeling of wishing an alert were a search, and I can't think of a reason why I would.

Since the search system is simply powering the alert system, we should just show the alert creation form for every search. It'll simplify things significantly.

Making the change is fairly simple, so I plan to do it before beta release.


Spaces in URLs again

All of the April 2, 2010 cases in the 11th Circuit have URLs that include a space at the beginning and a space at the end. I didn't check other circuits.

Also, I tried to erase the spaces and see if the case showed up there too and got the error page. (So, I don't think there are dupes.) However, I then found that the "file a bug" link does not work.


Need to do some Email Anti-spam work... ugh.

Atwood has a very good post on this today, and I was noticing some strange fields in the emails yesterday.

http://www.codinghorror.com/blog/2010/04/so-youd-like-to-send-some-email-through-code.html

This looks like a pain, though probably one we need to endure to make the alerts go through consistently.


Conversion to HTML can include lots of whitespace

I've only seen this once, so far, here:

http://courtlistener.com/ca4/Riley%20v.%20Dozier%20Internet%20Law,%20PC/

but the HTML version has A LOT of extra spaces sprinkled throughout.

Perhaps post-processing could be done that would essentially amount to a looping find and replace (find two spaces and replace them with one space until no more instances of two spaces together are found.)
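That looping find-and-replace collapses to a single regex substitution; a minimal sketch:

```python
import re

def collapse_spaces(html_text):
    """Collapse runs of two or more spaces into a single space."""
    # One substitution has the same effect as repeating a
    # "two spaces -> one space" find-and-replace until none remain.
    return re.sub(r" {2,}", " ", html_text)
```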

To which "component" does this apply: "backend" or "web"?


Case Name capitalization issues

See: http://courtlistener.com/ca9/Crs%20Recovery,%20Inc.%20V.%20John%20Laxton/

The plaintiff's name is CRS Recovery (all-caps CRS), but the database contains "Crs". The database also has a capital "V." rather than a lowercase "v." between the parties. I know the 9th Cir. provides the names in ALLCAPS, so I assume we're doing some case conversion that capitalizes just the first letter of each word -- usually a good assumption -- which means solving the first problem may be harder than solving the second.
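One way to handle both problems is an acronym-aware conversion; a sketch (the whitelist is hypothetical, and hyphenated names like "Mid-Continent" would need further care):

```python
ACRONYMS = {"CRS", "LLC", "USA"}  # hypothetical whitelist of known acronyms

def fix_case_name(all_caps_name):
    """Convert an ALLCAPS case name, keeping acronyms and lowercasing 'v.'."""
    words = []
    for word in all_caps_name.split():
        if word.rstrip(".,").upper() in ACRONYMS:
            words.append(word.upper())          # keep known acronyms as-is
        elif word.lower() in ("v.", "vs."):
            words.append(word.lower())          # the party separator is lowercase
        else:
            words.append(word.capitalize())     # default: first letter only
    return " ".join(words)
```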


Updating Sphinx indexes should be done with a delta index

As our index grows (it's at 300MB now), we're going to need to start using the delta index system that Sphinx allows.

This will allow us to update the Sphinx index with much less downtime.

It's complicated and not necessary yet, so I haven't set it up, but we're going to need to eventually.


Add Links to Alternative Document Sources

In addition to providing the hyperlink to the Court website where we retrieved a document, a drop-down box should provide alternative sources for the same document, such as resource.org, Justia, Google Scholar, Findlaw, Cornell LII, Fastcase, LexisNexis, and Westlaw, etc. This option should also appear on any result page created for documents we are missing, as in the case of "red links" citations that we might ask people to sponsor for scanning. Sites would preferably have a consistent/predictable URL structure to make this possible.


Add Fed. Cir. Motion Orders

At this page:

http://www.cafc.uscourts.gov/motions/search.asp

The Fed. Cir. provides no-cost access to orders resolving certain precalendar motions that are acted on by the clerk of the court from this page. These are not listed on the usual Opinions & Orders page. These are not of major importance right away, but since they are available it would be nice to add them.

However, they should not be added until the case-number duplicate vs SHA-1 duplicate issue is resolved, because adding these under the current dupe-checker would mean that we only get the first pre-calendar motion in the database, and not the ultimate opinion (or any subsequent motion orders). That would be bad. That these are available gives us another reason to want to check dupes by comparing SHA-1 of documents.


Fed Cir scraper stopped working

There have been three Fed. Cir. opinions released over the last few days, but none of them have made it onto the site. I ran the scraper manually with scrape/13 and it said "It worked. Duplicate found at 4." and so in some sense it even knows that the first duplicate is the fourth one down, but it doesn't seem to be putting the prior three into the database.


Format when browsing the case lists

Currently when one browses all the cases, either from all the courts or a specific Circuit, a given entry looks like this:

Mid-Continent Casualty Co. v. American Pride Bldg., 09-11238
Monday, March 29th, 2010
Status: Precedential/Published.
Download PDF: From the court | Our backup

The first line is all italicized and is a link to the site's text/html version of the opinion, and Status and Download PDF are bold.

When I'm browsing opinions/all I find myself really wishing that it indicated which Circuit each case was coming from, but I have to mouse-over the links to get that info. I think even on the Circuit-specific lists, it would be fine to list it. People used to looking at court citations would not be surprised by that and might even expect it.

I'd suggest: italicize only the case name and make it alone the link to the opinion page. Then, after the case number, add:

(1st Cir.)
(2d Cir.)
(3rd Cir.)
(4th Cir.)
(5th Cir.)
(6th Cir.)
(7th Cir.)
(8th Cir.)
(9th Cir.)
(10th Cir.)
(11th Cir.)
(D.C. Cir.)
(Fed. Cir.)

Note that the Second Circuit really is abbreviated (2d Cir.), without the 'n'. There seems to be no good reason that I know of for why the folks who decide this stuff decided it, but it is the convention people will expect.

By not making the case number and the Circuit be linked-text it leaves open the possibility that when we get multiple documents for the same case number, we might enable people to click on the case number and go to an overview page for all documents related to that case.
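The parenthetical could come from a simple lookup keyed on the site's court codes (codes like ca2 appear in search queries; cadc and cafc are my guesses for the last two):

```python
# Court codes to conventional citation abbreviations.
CIRCUIT_ABBREVIATIONS = {
    "ca1": "1st Cir.",  "ca2": "2d Cir.",   "ca3": "3rd Cir.",
    "ca4": "4th Cir.",  "ca5": "5th Cir.",  "ca6": "6th Cir.",
    "ca7": "7th Cir.",  "ca8": "8th Cir.",  "ca9": "9th Cir.",
    "ca10": "10th Cir.", "ca11": "11th Cir.",
    "cadc": "D.C. Cir.", "cafc": "Fed. Cir.",
}

def citation_suffix(court_code):
    """Format the parenthetical, e.g. '(2d Cir.)', for a case-list entry."""
    return "(%s)" % CIRCUIT_ABBREVIATIONS[court_code]
```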


Once per day scraping is too slow

We should move towards real-time scraping as much as possible.

This could be done pretty easily with a daemon that checks each site every 20 minutes, comparing a SHA1 of the site's HTML against the SHA1 from the previous visit.

If different, run the full scraper. If same, wait 20 minutes, repeat.

The catch here is that issue #29 is likely a blocker for this, since updating the search index is rather compute intensive. This would also enable real-time email alerts (which would be awesome).
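The polling loop described above might look like this (run_full_scraper is a hypothetical hook into the existing scraper, not real project code):

```python
import hashlib
import time
import urllib.request

def page_hash(html):
    """SHA1 of a court site's listing page, used to detect changes cheaply."""
    return hashlib.sha1(html).hexdigest()

def watch(url, run_full_scraper, interval=20 * 60):
    """Poll `url` every `interval` seconds; run the scraper only on change."""
    last = None
    while True:
        current = page_hash(urllib.request.urlopen(url).read())
        if current != last:
            run_full_scraper()
            last = current
        time.sleep(interval)
```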


Duplicate case in database

I did a search for "copyright" today to see if it would list the 1st Circuit's Raytheon opinion. (It did--nice!) but then noticed that
Mid-Continent Casualty Co. v. American Pride Bldg., 09-11238 (11th Cir.) shows up twice, once with a SPACE before " Mid-Continent" in the URL.

Not sure what's going on there.


Add older data to corpus

I think we have opinions starting on March 13, 2010. That's a weird start date. Is it possible/easy to have the scraper do a one-time run to retrieve all the older opinions that happen to still exist on the various Circuit sites? This would add a couple of years of older opinions for most circuits. Alternatively, is it possible for the scraper to just do a one-time run where it goes back to Jan. 1, 2010 and then be able to say that our coverage begins with 2010 and if you want older stuff, tough luck?


Multiple-field search operator not working?

@(caseName,docText) Strickland

according to our Advanced search page should give results that contain Strickland in BOTH the caseName and the document text. Sometimes it appears to give results that contain EITHER Strickland in the caseName or the docText and with some other queries it doesn't seem to work at all. In either case, it also throws up this yellow error:

We completed your search, but @ is not a valid attribute.
Valid attributes are @court, @casename, @docStatus and @doctext.

and at a minimum that error can't be right, because the advanced search page suggests such searches are valid. (Or our instructions on the advanced search page are wrong -- but then how is one supposed to do a multiple-field search, if not like that?)


Duplicate alerts need to be avoided if they are common

Currently, when an alert is created by a user, it can be an exact duplicate of an alert that is already in the system. This is problematic because it takes up space in our database and because the emailer will have to check both alerts each day/week/month.

If there are many duplicate alert queries in the system, we could optimize things by checking for dups at creation or editing time. Then, at deletion time, if only one user is associated with the alert, we can delete the alert entirely; if more than one, we simply delete the association between the user and the alert, leaving the alert in the DB for the other users associated with it.

If a user is editing an alert that is shared by another user, the association would need to be torn down, and the new alert created.

Whether to do this will be a balance between the added complication in the codebase and the needs of our server.
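The creation/deletion logic above reduces to reference counting on shared queries; a minimal in-memory sketch (the real version would live in the Django models, and all names here are made up):

```python
class AlertPool:
    """Reference-counted sharing of identical alert queries (sketch)."""

    def __init__(self):
        self._users_by_query = {}  # query string -> set of user ids

    def subscribe(self, user_id, query):
        # A duplicate alert becomes one more association, not a new row.
        self._users_by_query.setdefault(query, set()).add(user_id)

    def unsubscribe(self, user_id, query):
        users = self._users_by_query.get(query)
        if users is None:
            return
        users.discard(user_id)
        if not users:
            # Last associated user is gone: delete the alert itself.
            del self._users_by_query[query]
```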


Scraper doesn't get Amended opinions from 1st Circuit

Here is an amendment to an opinion (hence the "A" at the end of the pdf) that was released on Feb. 28, 2010 by the 1st Cir:

http://www.ca1.uscourts.gov/pdf.opinions/09-1020E-01A.pdf

The site has opinions released just before and after this date, but not this one.

From views.py

        # next: docType
        docType = docTypes[i].text.strip()
        if "unpublished" in docType.lower():
            doc.documentType = "U"
        elif "published" in docType.lower():
            doc.documentType = "P"
        else:
            # it's an errata, or something else we don't care about
            i += 1
            continue

Is that the code that is making us skip this document or just fail to classify it?

The above pdf is an example of "Errata" and they are sometimes very important. If the scraper is currently discarding them, then that's not ideal.

It goes back to the issue of how to check for duplicate documents. If the scraper relies on case name and number, then these amendments will also be missed, but if the scraper relies on SHA1 comparisons, then one has to download a document before one knows if it is a duplicate, probably resulting in lots of unnecessary downloads.

The ideal solution will likely be to configure each Circuit so that it downloads documents and runs SHA1 on them UNTIL it finds a duplicate and then stops downloading docs from that Circuit. If done right, this should only result in one unnecessary download per Circuit per day and would hopefully guarantee that no documents are missed.
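That stop-at-first-duplicate strategy is simple to express (fetch and known_hashes are hypothetical hooks into the scraper and database):

```python
import hashlib

def fetch_until_duplicate(candidate_urls, fetch, known_hashes):
    """Download documents newest-first, stopping at the first known SHA1.

    Costs at most one redundant download per court per run, while catching
    everything released since the last run -- amendments included.
    """
    fresh = []
    for url in candidate_urls:
        content = fetch(url)
        if hashlib.sha1(content).hexdigest() in known_hashes:
            break  # everything older is assumed to be in the database
        fresh.append((url, content))
    return fresh
```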


No Unpublished/Non-Precedential opinions from 2nd, 5th, or 11th Circuits.


Try the following searches:

@docStatus U @court ca2

@docStatus U @court ca5

@docStatus U @court ca11

None yield any results.

2nd Circuit opinions whose file name ends in "so" are "Summary Orders" and should be classified as Unpublished/Non-Precedential. I'm not sure if we're not scraping them or not classifying them right.

5th Circuit opinions that are Unpublished/Non-Precedential are listed on the right-hand side of their opinions web page, and so again, I don't know if we're not scraping them or mis-classifying them.

11th Circuit opinions that are Unpublished/Non-Precedential come from a separate page on their site: http://www.ca11.uscourts.gov/opinions/indexunpub.php and I'm also unsure if we're not scraping that page or misclassifying what we gather there.
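If classification is the problem for the 2nd Circuit, the rule is nearly a one-liner (the file names below are made up for illustration):

```python
def ca2_status(file_name):
    """Classify a 2nd Circuit opinion by file name: summary orders end in 'so'."""
    stem = file_name.rsplit(".", 1)[0]  # drop the .pdf extension
    return "U" if stem.endswith("so") else "P"
```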


Erroneously using file name as case number for 6th Cir.

Right now it looks like we're using the file name to deduce the case number, but the 6th Circuit has some goofy file-naming convention that makes this produce the wrong result.

If we instead pulled the case numbers from the tables produced on this page:

http://www.ca6.uscourts.gov/cgi-bin/newopn.pl

instead of from this one, which views.py uses right now:

http://www.ca6.uscourts.gov/cgi-bin/opinions.pl

then I think the scraper could still retrieve the proper case numbers without parsing the PDFs.


Add Oral Argument Audio

Oral Arguments Audio:

Seven of the thirteen Circuit courts currently provide oral argument audio online as follows:

1st Circuit:
http://www.ca1.uscourts.gov/files/audio/audiorss.php (RSS)
files are in form: http://www.ca1.uscourts.gov/files/audio/##-####.mp3

2d Circuit:
None that I can find.

3rd Circuit:
Last 7 days listed here:
http://www.ca3.uscourts.gov/oralargument/ListArguments7.aspx
files are in form: http://www.ca3.uscourts.gov/oralargument/audio/##-####PlaintiffvDefendant.wma

Entire archive listed here:
http://www.ca3.uscourts.gov/oralargument/ListArgumentsAll.aspx

4th Circuit:
None that I can find.

5th Circuit:
http://www.ca5.uscourts.gov/OralArgumentRecordings.aspx
files are in form: http://www.ca5.uscourts.gov/OralArgRecordings/09/##-#####_M-D-YYYY.wma

6th Circuit:
None that I can find.

7th Circuit:
http://www.ca7.uscourts.gov/fdocs/docs.fwx (past week)
files are in form: http://www.ca7.uscourts.gov/fdocs/docs.fwx?submit=showbr&shofile=##-####_001.mp3

8th Circuit:
http://8cc-www.ca8.uscourts.gov/circ8rss.xml
files are in form: http://8cc-www.ca8.uscourts.gov/OAaudio/2010/2/######.MP3 (case number w/o hyphen)

9th Circuit:
http://www.ca9.uscourts.gov/media/
files are linked to in form: http://www.ca9.uscourts.gov/media/view_subpage.php?pk_id=0000005305 (random #?)
and then on a subsequent page: http://www.ca9.uscourts.gov/datastore/media/2010/04/09/##-#####.wma

10th Circuit:
None that I can find.

11th Circuit:
None that I can find.

D.C. Circuit:
Policy against providing the tapes to the public until after the case has been completely closed, and even then not online.

Federal Circuit:
http://oralarguments.cafc.uscourts.gov/ but this only provides a search box in which you must enter the date.
THEN files are in form: http://oralarguments.cafc.uscourts.gov/mp3/####-####.mp3 (case # uses full yr: 2009-####)


Need to do some URL shortening...

I am working on getting the emails going out, and I've noticed that there is a real problem with the length of the URLs for cases.

Currently, URLs are of the form:

  • courtlistener.com/court/caseNameShort
  • courtlistener.com/court/caseNumber

I'm thinking it would be pretty cool, and rather easy, to add a URL shortening service for the purpose of emails and unique document locations.

I poked around, and found that .li ccTLDs can be purchased here:
http://www.switch.ch/

And I found that crt.li is available for 17 CHF (15 USD). I'd prefer ctl.nr, but the .nr ending costs $500 via wire transfer.

With that and the SHA1 sum (or the case number), pretty short URLs could be made:

  • .crt.li/e370a40765e5e6705b8578787b70dd20ed69cdf1
  • .crt.li/caseNumber

Something to think about.
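With a domain like crt.li, the shortener could be as simple as taking a prefix of the document's SHA1 (the prefix length and URL scheme are arbitrary choices here, not a settled design):

```python
def short_url(sha1_hex, length=8, domain="crt.li"):
    """Build a short, stable URL from a document's SHA1 sum (sketch)."""
    # A short prefix of the SHA1 is almost always enough to be unique,
    # and collisions could fall back to a longer prefix.
    return "http://%s/%s" % (domain, sha1_hex[:length])
```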

