eellak / gsoc2018-3gm

💫 Automated codification of Greek Legislation with NLP

Home Page: https://openlaws.ellak.gr/

License: GNU General Public License v3.0

Languages: Python 39.73%, Jupyter Notebook 12.21%, TeX 3.59%, Shell 0.53%, CSS 11.46%, HTML 10.37%, Makefile 0.61%, TypeScript 21.22%, JavaScript 0.28%
Topics: government-documents, legal-texts, text-mining, codification, government-gazette, nlp, automation, python3, gsoc-2018, natural-language-processing

gsoc2018-3gm's Issues

Domain certificate expired

Hi, whenever I try to open your project's link I get this error in my browser (Firefox):

Did Not Connect: Potential Security Issue

Firefox detected an issue and did not continue to openlaws.ellak.gr. The website is either misconfigured or your computer clock is set to the wrong time.

It’s likely the website’s certificate is expired, which prevents Firefox from connecting securely.

What can you do about it?

openlaws.ellak.gr has a security policy called HTTP Strict Transport Security (HSTS), which means that Firefox can only connect to it securely. You can’t add an exception to visit this site.

The issue is most likely with the website, and there is nothing you can do to resolve it. You can notify the website’s administrator about the problem.

Learn more…

I hope this can be fixed soon, thanks.

3gm.ellak.gr

Please configure 3gm.ellak.gr to work with the VM at 83.212.109.156.

Full history of a codified version of a law

Add the ability to see the full history of a codified version of a law.
This requires a page where the user can see, at the same time, all changes made to a legal text by all of its amending laws.
This is not difficult in terms of the algorithm or code. The required code is almost complete.
The difficulty lies in developing the right user interface, one that improves the user experience and helps the end user understand the applied changes.
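
As a rough illustration of the data such a page could display, the sketch below computes a line-level diff between each pair of consecutive versions using the standard library's `difflib`. The function name `history_diffs`, the `(label, text)` tuple layout and the sample texts are assumptions made for this example, not the project's actual code.

```python
import difflib


def history_diffs(versions):
    """Yield (label, diff_text) for each consecutive pair of versions.

    `versions` is a list of (label, text) tuples ordered from the
    original text to the latest codified version (hypothetical layout).
    """
    for (_, old), (label, new) in zip(versions, versions[1:]):
        diff = difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="before",
            tofile=label,
            lineterm="",
        )
        yield label, "\n".join(diff)


# Invented sample data: an original text and one amended version.
versions = [
    ("original", "Article 1\nThe fee is 10 euros."),
    ("l_0001_2000", "Article 1\nThe fee is 20 euros."),
]
diffs = dict(history_diffs(versions))
```

Each diff could then be rendered as one section of the history page, keyed by the amending law.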

Docker Support

Docker support would make installation much easier and more platform agnostic.

The current installation documentation not only has its own platform-specific instructions but also links to several other projects' installation documentation, each of which has its own platform-specific instructions.

Responsive Web Application

Description

Make the web application responsive (using Bootstrap)

Deliverables

  • Responsive layout for small and medium screen sizes for all pages of the web application
  • Specification for route templates (e.g. see EUR-lex standards)

Estimated Time: 4 days

Add ground truth evaluation tool

A tool for comparison with ground truth is needed. The tool will include:

  1. Downloading the ground truth (e.g. Raptarchis' Codification from the Ministry of Interior)
  2. Parsing the ground truth to create comparable extracts of legislation
  3. Using word error rate (WER) to measure the accuracy of our method with respect to the ground truth
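
For step 3, WER is commonly defined as the word-level edit distance between the reference and the hypothesis, divided by the number of words in the reference. The sketch below is a minimal implementation of that definition; the function name and the whitespace tokenization are assumptions, and a real evaluation would likely need to normalize the Greek text first.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis is much longer than the reference.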

Create legal dictionaries in Greek

Description

Many applications of natural language processing and machine learning to text can benefit from a controlled lexicon of expert-selected terms (i.e., a dictionary). This is especially true of highly technical language, such as legal text. However, no open-source, freely available dictionaries of this nature have been available in Greek. Creating new legal dictionaries would greatly benefit 3gm and the automatic codification of Greek legal text in general.

Deliverables

  • dictionary of geopolitical entities, actors and divisions (e.g., countries, states, provinces)
  • legal dictionary containing common terms, courts and acts
  • financial dictionary
  • dictionary of public administration offices and public administrations
  • dictionary of naval terms and flags

Total time: 3 weeks

Law Summarization

Developments in Greek politics over the last few years have required large bodies of legislative text to be published in the Greek Government Gazette (GGG) at once, the most recent being the fourth memorandum. Effective summarization of laws is therefore becoming increasingly useful.

I propose training a machine learning algorithm that can provide a comprehensive summary of each law or act in relation to its size.

Show Articles Titles

It would be good to add functionality to extract and display the titles of the articles.

Train NER with spaCy and integrate it into the project

Description

Train a Named Entity Recognizer (NER) specialized for Government Gazette texts using the NLP library spaCy. The existing NER should be extended: do not override the pre-existing labels, and do not add new labels unless they are needed, since that will lower the model's accuracy.

The pre-existing Greek NER can be found here

Deliverables

  • Tag Map
  • Annotated dataset (> 3000 sentences)
  • spaCy's trained model
  • Integration into the project (module & web application)
  • API Endpoints

Estimated Time: 1.5 months

Greek Government Gazette Corpus Analysis

Description

Analyzing a corpus allows us to draw conclusions concerning its contents. Another way to understand how legislation is organized is to study closely the legislative graphs produced by the codifier.

Deliverables

  • Report of corpus analysis on the GGG corpus as a whole, containing information about specific metrics such as frequency distributions, collocations, lexical diversity (the percentage of distinct words in a document) and most frequent words, as well as dynamic and structural data from the legislative graphs.

Total time: 10 days
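
Some of the metrics named in the deliverable can be sketched with the standard library alone. The example below counts word frequencies, uses raw adjacent-pair counts as a crude stand-in for collocations, and reports the ratio of distinct words to total words. The function name, the `\w+` tokenizer and the English sample sentence are assumptions for illustration; a real analysis would need proper Greek tokenization and stop-word handling.

```python
from collections import Counter
import re


def corpus_stats(text, top_n=3):
    """Return simple frequency and diversity metrics for a text."""
    tokens = re.findall(r"\w+", text.lower())
    word_freq = Counter(tokens)
    # Adjacent word pairs; raw counts are only a rough proxy for collocations.
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    return {
        "tokens": len(tokens),
        "distinct": len(word_freq),
        "type_token_ratio": len(word_freq) / len(tokens) if tokens else 0.0,
        "most_common_words": word_freq.most_common(top_n),
        "most_common_bigrams": bigram_freq.most_common(top_n),
    }


stats = corpus_stats("the law amends the law of the state")
```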

Improve RESTful API

Description & Deliverables

Related blog post

  • API Endpoints
  • Token based authentication
  • Token issuing functionality
  • Limits on requests
  • Provable testing with locust.io

Estimated Time: 1 week
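
To make the token issuing and request limit items concrete, here is a minimal in-memory sketch. The class name `TokenRegistry`, its parameters and the sliding-window policy are assumptions for illustration and not the project's API; a deployed service would persist tokens and enforce limits in the web framework or a shared store.

```python
import secrets
import time


class TokenRegistry:
    """Issue API tokens and enforce a per-token request limit (sketch)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._requests = {}  # token -> timestamps of recent requests

    def issue(self):
        """Create and register a new random token."""
        token = secrets.token_urlsafe(32)
        self._requests[token] = []
        return token

    def allow(self, token):
        """Return True if the token is known and under its limit."""
        if token not in self._requests:
            return False
        now = self.clock()
        # Keep only the requests that fall inside the sliding window.
        recent = [t for t in self._requests[token]
                  if now - t < self.window_seconds]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self._requests[token] = recent
        return allowed
```

The `clock` parameter exists only so that the time source can be replaced in tests.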

Ability for interactive feedback / amendments on the algorithmically generated codified text

Add the ability for interactive feedback / amendments on the algorithmically generated codified text by the end users.
In some cases, the resulting codified text contains erroneous references or is missing some references (links). For those cases, it would be nice to have a procedure that allows:

  • Regular users to describe a problem in words
  • Advanced users to interactively process / delete / modify / insert the correct references between two legal texts.

Bug in numbering of articles

When more than one article is inserted into another law, the system keeps the numbering of only the first inserted article, omits the others, and treats the last inserted article as Article 0.
For example, Article 353 of Law 4512/2018 introduces new Articles 71A, 71B and 71C into another law, and the codified result is shown at https://3gm.ellak.gr/statute/l_4512_2018/codified
That is, the text of 71C was inserted as C.
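
One possible way to avoid losing part of a label such as 71C is to parse article labels as a number plus an optional letter suffix. The sketch below illustrates that idea only; how the codifier actually represents article numbers is not stated here, so the pattern and the function names are assumptions.

```python
import re

# Digits followed by an optional capital letter (Latin or Greek),
# e.g. "71", "71A", "71Γ". This label format is an assumption.
ARTICLE_LABEL = re.compile(r"^(\d+)([A-ZΑ-Ω]?)$")


def parse_article_label(label):
    """Split a label such as '71C' into (71, 'C')."""
    match = ARTICLE_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized article label: {label!r}")
    number, suffix = match.groups()
    return int(number), suffix


def sort_articles(labels):
    """Order labels by number, then by letter suffix."""
    return sorted(labels, key=parse_article_label)
```

Sorting on the parsed pair places 71A, 71B and 71C directly after 71 and before 72.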

UI Refactoring

Change UI as follows:

codify_law.html

Current form of Law 1234/4325

  • Link to the version history
  • Which laws amend it + hyperlinks
  • (Optionally) Which laws it amends.
  • Tags and similar topics

history.html
Accordion elements

Law 4009/2011 as in force today

History

Index

  • Law 4485/2017
  • Law 4405/2016
  • Law 4310/2014
  • ...
  • Original form of Law 4009/2011

Law 4485/2017

... (text of the changes)
(Link:) [Show Law 4009/2011 up to and including the changes of Law 4485/2017]
Excerpt of links + application status.

Law 4405/2016

... (text of the changes)
(Link:) [Show Law 4009/2011 up to and including the changes of Law 4405/2016]

Law 4310/2014

... (text of the changes)
(Link:) [Show Law 4009/2011 up to and including the changes of Law 4310/2014]

Original form of Law 4009/2011

Broaden legislative acts extraction

The codifier module currently detects, codifies and stores all laws and presidential decrees found in the Greek Government Gazette (GGG) issues. Even though these types of acts are the most important in Greek legislation, they do not account for the largest part of GGG issues.

We want to broaden the extraction capabilities, using regular expressions, to include parliamentary regulations (Κανονισμός της Βουλής), treaties (Συνθήκες), Prime Minister, Minister and Deputy Minister decisions, as well as acts of appointment, dismissal and transfer of public officials. An extensive account of the types of legislative acts per GGG issue can be found on the Ethnikon Typografeion website (http://www.et.gr/index.php/f-e-k/teyxi).

As mentioned, legislative act extraction is currently performed using regular expressions in the entities module. An alternative would be to train a neural network that detects legislative acts and determines their type, although this would be a complex solution whose difficulty depends on the number of issue types.

Since the GGG corpus contains a great number of types of legislative acts, we should prioritize those found in the main issue types such as 'Α', 'Β', 'Γ', 'Δ'.
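
A regular-expression approach along these lines could map heading keywords to act types. The sketch below is illustrative only: the keyword patterns, the type names and the assumption that an act's heading contains such a keyword are hypothetical and do not reflect the patterns in the entities module.

```python
import re
import unicodedata

# Hypothetical keyword patterns, matched against upper-cased,
# accent-stripped headings.
ACT_PATTERNS = {
    "law": re.compile(r"\bΝΟΜΟΣ\b"),
    "presidential_decree": re.compile(r"\bΠΡΟΕΔΡΙΚΟ\s+ΔΙΑΤΑΓΜΑ\b"),
    "parliamentary_regulation": re.compile(r"\bΚΑΝΟΝΙΣΜΟΣ\s+ΤΗΣ\s+ΒΟΥΛΗΣ\b"),
    "decision": re.compile(r"\bΑΠΟΦΑΣ(?:Η|ΕΙΣ)\b"),
}


def normalize(text):
    """Strip accents and upper-case, so 'Νόμος' becomes 'ΝΟΜΟΣ'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.upper()


def classify_act(heading):
    """Return the first matching act type for a heading, or None."""
    text = normalize(heading)
    for act_type, pattern in ACT_PATTERNS.items():
        if pattern.search(text):
            return act_type
    return None
```

New act types can be added by extending the dictionary, which keeps the prioritization of issue types a matter of configuration.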

Broaden Fact Extraction

Description

Implement a broad range of fact extraction functionality, such as:

  1. Monetary amounts, non-monetary amounts, percentages, ratios
  2. Conditional statements and constraints, like "λιγότερο από" ("less than") or "μετά από" ("after")
  3. Dates, recurring dates, and durations
  4. Courts, regulations, and citations

Work here should be done primarily with Python's built-in re module.
You are also free to use any other tool (e.g. machine learning) to improve the results.
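
A minimal sketch of what such `re`-based extractors might look like follows. The patterns are deliberately simple and are assumptions for illustration (Greek number formatting with "." for thousands and "," for decimals, numeric dates only); they are not the patterns in entities.py, and the sample sentence is invented.

```python
import re

# Hypothetical patterns, one per fact type.
FACT_PATTERNS = {
    "money": re.compile(r"\d+(?:\.\d{3})*(?:,\d+)?\s*(?:ευρώ|€)"),
    "percentage": re.compile(r"\d+(?:,\d+)?\s*%"),
    "date": re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{4}\b"),
    "condition": re.compile(r"λιγότερο από|μετά από"),
}


def extract_facts(text):
    """Return every match of each pattern, keyed by fact type."""
    return {name: pattern.findall(text)
            for name, pattern in FACT_PATTERNS.items()}


sample = "Πρόστιμο 1.500,50 ευρώ ή 5% μετά από τις 31.12.2018."
facts = extract_facts(sample)
```

Recurring dates, durations, courts and citations would need additional patterns in the same dictionary.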

Deliverables

The code should be committed to the entities.py module:

  • Monetary amounts, non-monetary amounts, percentages, ratios
  • Conditional statements and constraints, like "λιγότερο από" ("less than") or "μετά από" ("after")
  • Dates, recurring dates, and durations
  • Courts, regulations, and citations

Estimated Time: 2 weeks

CLI Interface for codifier tool

  1. source at sys.argv[1]
  2. target at sys.stdin
  3. output at sys.stdout

Example usage:

codifier.py amendment-1.txt <initial-version.txt >amended-version.txt

Optionally --input and --output flags

Extension

<initial-version.txt codifier.py amendment-1.txt |
codifier.py amendment-2.txt |
codifier.py amendment-3.txt >final-version.txt
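
The interface described above could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: `apply_amendment` is a hypothetical placeholder for the codifier's real logic, and the flag handling is one possible reading of the optional --input and --output flags.

```python
import argparse
import sys


def apply_amendment(amendment_text, target_text):
    """Hypothetical placeholder: a real codifier would rewrite the target."""
    return target_text


def build_parser():
    parser = argparse.ArgumentParser(
        description="Apply an amending text to a law.")
    parser.add_argument("source", help="file containing the amendment")
    parser.add_argument("--input", help="target law file (default: stdin)")
    parser.add_argument("--output", help="result file (default: stdout)")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    with open(args.source, encoding="utf-8") as handle:
        amendment = handle.read()
    if args.input:
        with open(args.input, encoding="utf-8") as handle:
            target = handle.read()
    else:
        target = sys.stdin.read()
    result = apply_amendment(amendment, target)
    if args.output:
        with open(args.output, "w", encoding="utf-8") as handle:
            handle.write(result)
    else:
        sys.stdout.write(result)
    return 0

# In a script, finish with: raise SystemExit(main())
```

Because the target defaults to standard input and the result to standard output, several invocations can be chained with pipes as in the extension example.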

Help Page

Implement Help Page for explaining tags and functionality.

NER Annotator

Description

Develop an NER annotator module (in the form of a web app built with Flask) for annotating named entities. One excellent (though sadly closed-source) example is prodigy.ai.

Deliverables

  • NER annotator module

Estimated Time: 3 weeks

Train and Develop tools for Classification

Description & Deliverables

  • Train Segmentation models for legal concepts such as pages or sections.
  • Pre-trained classifiers for document type and clause type
  • Develop Tools for building new clustering and classification methods
  • API Endpoints for the above

Estimated Time: 2-3 months

Improve word and document embeddings

Description

Improve the document and word embeddings. Refer to the Wiki Page for more information. We are using the gensim library.

Deliverables

  • Improved model
  • Demonstrable similarity analyzer

Estimated Time: 1 week
