Pedro Ortiz Suarez's Projects
Source code for the paper "Neural Architectures for Nested NER through Linearization"
My bad solutions to Advent of Code 2023
The Alephn Site
TensorFlow implementation of contextualized word representations from bi-directional language models
A notebook with CamemBERT experiments.
The website of CamemBERT
A polite and user-friendly downloader for Common Crawl data
Tools to download and clean up Common Crawl data
A simple way to extract data from Common Crawl
A collection of utilities related to CTC
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools
A deep learning framework for text
Easily apply transformer models to downstream NLP tasks
An extremely fast entity-fishing client
An extremely simple and naïve program to deduplicate huge plain text files.
Terminal tool that converts file encodings to UTF-8
HPLT to WET conversion
ISO 639 and IETF Language Code Lookup Tool
A minimal & modern LaTeX template for your (bachelor's | master's | doctoral) thesis
Data and models for lemmatising and POS-tagging Modern French (16th-18th c.)
A new set of utilities to work with the OSCAR Corpus
Converts OSCAR's jsonl files into parquet
Parquet2text
My personal website
Pedro's Personal Website in German
Pedro's Personal Website in English