Integrates spatial and textual data processing tools into a modular software package that features preprocessing, geocoding, disambiguation, and visualization

german-nlp

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

german-reddit

Extraction of a German Reddit Corpus

gps-corpus-builder

Automatically exported from code.google.com/p/gps-corpus-builder

htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

jlcl-style

Experiments to modernize the LaTeX class of the JLCL

jparser

A readability parser which can extract title, content, images from html pages

justext

Heuristic-based boilerplate removal tool

laclos

LAnguage-CLassified OpenSubtitles

microblog-explorer

Perform crawls of social networks (identi.ca, reddit, friendfeed) to gather internal and external links and identify their language

py3langid

Faster, modernized fork of the language identification tool langid.py

python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js

shoten

simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

toponyms

Old prototype for toponym extraction in historical texts written in German

trafilatura

Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

trafilatura_gui

tweets-tools

Diverse tools used with Twitter data

Adrien Barbaresi's Projects
