- Use Python and BeautifulSoup to build a lawsuit database from the site here.
- Each court has subcategories.
- Each subcategory contains lawsuits with different outcomes, and each case is documented by its date and case number.
- Record every category together with its contents.
- Save the output as CSV or SQLite.
- Use named-entity recognition (NER) to extract the names of people and companies in each document.
- Based on the summary results, what insights can be drawn?
- Ignoring deduplication, which people or companies are most frequently involved in lawsuits?
The notebook file here has two parts: web crawling and data analysis.
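A minimal parsing sketch for the crawling step, assuming each subcategory page lists its cases in an HTML table. The `table.cases` selector, the three-column layout, and the names `parse_case_table` and `COLUMNS` are hypothetical; in practice the HTML would come from fetching each court's subcategory page (for example with `requests.get(url).text`), and the selectors must be adapted to the real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical column order of the results table; the real page may differ.
COLUMNS = ("date", "case_number", "result")


def parse_case_table(html: str, court: str, category: str) -> list[dict]:
    """Turn one subcategory page into a list of case records.

    Assumes cases sit in <table class="cases"> and that each case row has
    exactly three <td> cells: date, case number, result.
    """
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.select("table.cases tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != len(COLUMNS):
            continue  # skip header rows (which use <th>) and malformed rows
        record = {"court": court, "category": category}
        record.update(zip(COLUMNS, cells))
        records.append(record)
    return records


sample = """
<table class="cases">
  <tr><th>Date</th><th>Case No.</th><th>Result</th></tr>
  <tr><td>2020-01-02</td><td>A-001</td><td>Dismissed</td></tr>
</table>
"""
print(parse_case_table(sample, "Court X", "Civil"))
```

Attaching the court and category to every record keeps the hierarchy (court, subcategory, case) in a flat shape that maps directly onto a CSV row or a database table.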
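For the recording and output steps, a sketch using only the standard library's `csv` and `sqlite3` modules. It assumes each case is a dict with the keys listed in `FIELDS`; the table name `cases`, the column list, and the helper names `save_csv` and `save_sqlite` are illustrative choices, not taken from the notebook.

```python
import csv
import sqlite3

# Hypothetical schema: one flat row per case.
FIELDS = ["court", "category", "date", "case_number", "result"]


def save_csv(records: list[dict], path: str) -> None:
    """Write one CSV row per case, preceded by a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


def save_sqlite(records: list[dict], path: str) -> int:
    """Insert cases into a `cases` table and return the table's row count."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cases ("
            "court TEXT, category TEXT, date TEXT, case_number TEXT, result TEXT)"
        )
        # Named placeholders are filled from each record dict.
        conn.executemany(
            "INSERT INTO cases VALUES (:court, :category, :date, :case_number, :result)",
            records,
        )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM cases").fetchone()[0]
    finally:
        conn.close()
```

A single table with `court` and `category` as columns keeps later summaries to a plain `GROUP BY`, rather than one table per category.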
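A sketch of the NER step, assuming spaCy as the NER library (the task does not name one). `PERSON` and `ORG` are the entity labels used by spaCy's English and Chinese pipelines; the default model name `en_core_web_sm` is an assumption and should match the language of the documents. `extract_entities` takes any callable that behaves like a spaCy pipeline, so it can be swapped for another NER backend.

```python
# Labels for people and organizations in spaCy's OntoNotes-style pipelines.
TARGET_LABELS = {"PERSON", "ORG"}


def load_default_nlp(model: str = "en_core_web_sm"):
    """Load a spaCy pipeline; the model name is an assumption."""
    import spacy  # imported lazily so extract_entities works with any callable

    return spacy.load(model)


def extract_entities(text: str, nlp) -> list[tuple[str, str]]:
    """Return (name, label) for every person/company mention, in order.

    Repeated mentions are kept on purpose, because the frequency
    question ignores deduplication.
    """
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in TARGET_LABELS]
```

Typical usage would be `extract_entities(text, load_default_nlp())`, where `text` is the body of one judgment.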
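One possible starting point for the summary used to draw insights: count cases per court, category, and result with a single SQL query. This assumes a SQLite table named `cases` with the columns `court`, `category`, `date`, `case_number`, `result`; that schema and the name `summarize_results` are hypothetical.

```python
import sqlite3


def summarize_results(conn: sqlite3.Connection) -> list[tuple]:
    """Count cases per (court, category, result), most frequent first.

    Assumes a `cases` table with court, category, date, case_number, result.
    """
    query = """
        SELECT court, category, result, COUNT(*) AS n_cases
        FROM cases
        GROUP BY court, category, result
        ORDER BY n_cases DESC
    """
    return conn.execute(query).fetchall()
```

Comparing the outcome distribution across courts or categories (for example, the share of dismissals per category) is one kind of insight such a table supports.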
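For the frequency question, a sketch that counts raw name mentions with `collections.Counter`. "Ignoring deduplication" is read here as counting every mention: a name repeated within one document counts each time, and spelling variants of the same party are not merged. The input shape (one list of names per document) and the name `most_frequent_parties` are assumptions.

```python
from collections import Counter


def most_frequent_parties(entities_per_doc, top_n: int = 10) -> list[tuple[str, int]]:
    """Count every name mention across all documents and return the top_n.

    No deduplication: repeated mentions and name variants are counted as-is.
    """
    counts = Counter()
    for names in entities_per_doc:
        counts.update(names)  # adds 1 per occurrence in this document
    return counts.most_common(top_n)


docs = [["Acme Corp", "John Doe", "Acme Corp"], ["Acme Corp"], ["John Doe"]]
print(most_frequent_parties(docs, top_n=2))  # [('Acme Corp', 3), ('John Doe', 2)]
```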