Legal Taxonomy (https://taxonomy.legal/) Classifier on Reddit /r/legaladvice
legal_advice's Introduction
This program classifies legal issues by assigning a binary value for each class of the National Subject Matter Index (NSMI) (https://nsmi.lsntap.org/browse-v2).
"Category" means one of the 20 indexes.
"Class" means a subcategory under a category.
Each article has a binary value (0 or 1) per class that indicates whether the article is related to that specific legal class.
We ignore unlabeled entries when constructing a model.
Among the 100+ classes in NSMI-v2, we extracted the 36 classes that have more than 10 positive submissions.
After training the classifier, we chose the 16 classes that have reasonable recall.
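The class filter described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `select_classes`, the use of `None` for unlabeled entries, and the toy class names are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Threshold from the text: keep classes with MORE than 10 positive submissions.
MIN_POSITIVES = 10


def select_classes(
    labels: Dict[str, List[Optional[int]]],
    min_positives: int = MIN_POSITIVES,
) -> List[str]:
    """Return the class names that have more than `min_positives` positives.

    Each value list holds 1 (positive), 0 (negative), or None (unlabeled).
    Unlabeled entries are ignored, as the text describes.
    """
    kept = []
    for name, values in labels.items():
        positives = sum(1 for v in values if v == 1)
        if positives > min_positives:
            kept.append(name)
    return kept


# Hypothetical toy data; the class names are placeholders, not NSMI codes.
toy_labels = {
    "housing": [1] * 12 + [0] * 5 + [None] * 3,   # 12 positives: kept
    "family": [1] * 10 + [0] * 8,                 # exactly 10: dropped
    "immigration": [1] * 3 + [None] * 20,         # 3 positives: dropped
}
selected = select_classes(toy_labels)
```

Note that the comparison is strict, so a class with exactly 10 positives is excluded.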
First, the program converts each text into a tf-idf vector.
After getting the vectors, we apply the sklearn LogisticRegression model with an L1 penalty.
We evaluate the model with 10-fold cross-validation.
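The three steps above (tf-idf, L1-regularized logistic regression, 10-fold cross-validation) might look something like the sketch below for a single class. The toy texts, the `liblinear` solver, the default `C`, and the stratified folds are assumptions for illustration; the project's real settings are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus for one class (1 = related to the class, 0 = not).
texts = (
    [f"my landlord wants to evict me from apartment {i}" for i in range(10, 30)]
    + [f"I got a speeding ticket on highway {i}" for i in range(10, 30)]
)
labels = [1] * 20 + [0] * 20

# The vectorizer sits inside the pipeline so that it is refit on the
# training part of every fold and never sees the held-out texts.
model = make_pipeline(
    TfidfVectorizer(),
    # Assumption: liblinear is one of the sklearn solvers that supports L1.
    LogisticRegression(penalty="l1", solver="liblinear"),
)

# 10-fold cross-validation: each text is predicted by a model that was
# trained on the other nine folds.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
predictions = cross_val_predict(model, texts, labels, cv=folds)

# Recall is the metric the text uses to choose which classes to keep.
recall = recall_score(labels, predictions)
```

In a multi-class setup like the one described, this would be repeated once per class, each with its own binary label vector.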
We apply the classifier to the 900,000 Reddit submissions.
The classifier produces hard and soft labels.
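As a rough illustration of the difference, assuming "hard" means a 0/1 decision and "soft" means the predicted probability of the positive class, both can be read from a fitted sklearn model. The training and new texts below are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled texts for one class.
train_texts = (
    [f"my landlord wants to evict me from apartment {i}" for i in range(10, 30)]
    + [f"I got a speeding ticket on highway {i}" for i in range(10, 30)]
)
train_labels = [1] * 20 + [0] * 20

model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(penalty="l1", solver="liblinear"),
)
model.fit(train_texts, train_labels)

# Hypothetical unlabeled submissions to score.
new_texts = [
    "the landlord is trying to evict my family",
    "can I contest a speeding ticket",
]

# Soft labels: probability of the positive class, one float per text.
soft_labels = model.predict_proba(new_texts)[:, 1]

# Hard labels: the 0/1 decision for each text.
hard_labels = model.predict(new_texts)
```

Soft labels keep the classifier's uncertainty, which is what downstream prevalence estimation typically works from.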
We estimate the final prevalence using freq-e (https://github.com/slanglab/freq-e).
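For intuition only, the sketch below shows two naive prevalence baselines computed from hard and soft labels. It does not use freq-e and does not reproduce its estimator or API; the function names and the numbers are hypothetical.

```python
from typing import Sequence


def classify_and_count(hard_labels: Sequence[int]) -> float:
    """Naive baseline: fraction of items whose hard label is 1."""
    return sum(hard_labels) / len(hard_labels)


def probabilistic_classify_and_count(soft_labels: Sequence[float]) -> float:
    """Naive baseline: mean of the predicted positive-class probabilities."""
    return sum(soft_labels) / len(soft_labels)


# Hypothetical classifier outputs for five submissions.
hard = [1, 0, 0, 1, 0]
soft = [0.9, 0.4, 0.1, 0.6, 0.3]

cc_estimate = classify_and_count(hard)                  # 2 of 5 are positive
pcc_estimate = probabilistic_classify_and_count(soft)   # mean probability
```

The two baselines can disagree, as they do here, because thresholding discards how confident each prediction was. That gap is one reason to use a dedicated prevalence estimator instead of counting labels.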