dreamproit / bill-similarity
Calculate similarity of bill documents using a variety of NLP approaches
This issue is to estimate the task to:
Related to dreamproit/BillMap#13
This is the equivalent of the pipeline that is already working to populate the database for BillMap, described here:
https://github.com/dreamproit/bill-similarity/blob/investigate_simhashes/docs/SQL_APPROACH.adoc#current-data-pipeline-and-storage
The repository initially included several bills in XML format, such as samples/congress/116/uslm/BILLS-116hconres9enr.xml.
It also included a parsing script that extracts the sections from each bill.
However, this script does not seem to work with the other bills in the set we download via the congress tool.
So the main question is:
How (and where) can I get more bills, preferably the whole set, that I can split into sections for further work?
Maybe (this is just a suggestion) we should adapt the parsing script so that it can parse that set?
Or is there a transformation step that I haven't found yet?
Either way, the main point is to get more bills so that we can extract more sections from them.
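To make the suggestion about adapting the parsing script concrete, here is a minimal sketch of a section extractor that ignores XML namespaces. It is not the repository's script. It assumes that sections are marked by elements whose local name is `section`, which may not hold for every file in the downloaded set, and the helper names `local_name` and `extract_sections` are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_sections(xml_text: str) -> list[str]:
    """Return the whitespace-normalized text of every <section> element.

    Assumption: sections are elements whose local name is 'section',
    with or without a namespace. Nested sections are returned both as
    part of their parent and on their own.
    """
    root = ET.fromstring(xml_text)
    sections = []
    for element in root.iter():
        if isinstance(element.tag, str) and local_name(element.tag) == "section":
            # Join text fragments with spaces, then collapse runs of whitespace.
            text = " ".join(" ".join(element.itertext()).split())
            sections.append(text)
    return sections
```

Matching on the local name rather than the fully qualified tag means the same function can be tried on both namespaced and non-namespaced files. Whether that is actually why the existing script fails on the downloaded bills would still need to be checked.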
@dmytro-ustynov please pay attention to this section when you file an issue:
While you are creating an issue, you can link it to the GitHub Project.
And then:
Once the issue has been created, you can add it to the sprint, estimate it, and change its status.
This is a reminder for me (TODO)
Just a starter issue for project onboarding.
Some issues to explain and investigate:
Related issues that are also helpful to work through in the context of searching for similarities.
The current simhash README says how to set up the system (good); and run all bills (good). What is missing is to know how to
Also, we should combine this information with details of how the simhash + SQL query (by @alexbojko) works and what the differences are from pure simhash.
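As a point of comparison for that write-up, here is a minimal sketch of pure simhash over word tokens. It is not the repository's implementation and does not cover the simhash + SQL approach. The tokenization, the use of MD5 as the per-token hash, and the function names are assumptions chosen for illustration.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Compute a simhash fingerprint of `text` from its word tokens.

    Assumption: `bits` is a multiple of 8 and at most 128, because the
    per-token hash is taken from the leading bytes of an MD5 digest.
    """
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two texts are treated as near-duplicates when the Hamming distance between their fingerprints is small. The threshold is a tuning choice that this sketch does not fix.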