tajakuzman

followers: 10 following: 3 repos: 20 gists: 0

Name: Taja Kuzman

Type: User

Company: Jožef Stefan Institute

Bio: PhD student in Computational Linguistics with an MA in Translation (FR, EN & SI). Main interests: large language models, language technologies and resources

Twitter: TajaKuzman

Location: Ljubljana, Slovenia

Taja Kuzman's Projects

achademio

AI assistant, based on the GPT-3.5 model by OpenAI, designed to enhance your proficiency in writing research papers. Allows you to adapt your content to academic standards, transform bullet points into eloquent text, or enhance the quality of your writing through error detection.

agile-automatic-genre-identification-benchmark

A benchmark for evaluating robustness of automatic genre identification models to test their usability for the automatic enrichment of large text collections with genre information.

applying-genre-on-macocu-bilingual

cross-lingual-and-cross-dataset-experiments-with-genre-datasets

crosslingual-genre-bias-analysis

genre-datasets-comparison

ginco-genre-annotation-guidelines

Genre Annotation Guidelines for GINCO corpora

hate-speech-classification

Classification of hate speech and implicitness of hate speech, using Transformer language models (BERT). This repository can be used as an introduction to text classification with BERT-like models.

machinetranslate.org

Open resources and community for machine translation

ner-recognition

An evaluation of various encoder Transformer-based large language models on the named entity recognition task. The models are compared on 6 datasets, manually annotated with named entities.

notion_widgets

A set of HTML widgets that can be embedded into Notion.so (https://www.notion.so/) pages. For more, see https://blog.shorouk.dev/notion-widgets-gallery/

objectivity_prediction_web_app

An ML web app that detects the objectivity of a text

parlamint-translation

A pipeline for machine translation (using OPUS-MT models) of parliamentary text collections in 30+ languages (ParlaMint corpora). The pipeline includes parsing TEI XML and CoNLL-U files, linguistic processing with the Stanza pipeline, machine translation, and word alignment with the Eflomal tool.

semshift_esslli2023

Hands-on sessions for ESSLLI course "Computational approaches to semantic change detection"

taja-kuzman-home-page

Home page for Taja Kuzman's GitHub repository.

task7

Variety identification

tdm-notebooks

Example notebooks and tutorials from Constellate, the text analysis service from ITHAKA.

text-representations-in-fasttext

Analysing different text representations for genre identification. I parse CoNLL-U files and extract various representations of a text (running text, lemmas, part-of-speech tags), then train a fastText model on each to see which representation is the most beneficial for the genre identification task.

topic-classification-fasttext-transformers

Training and evaluating topic classification models (fastText and Transformer-based language models) on Slovenian news texts. The repository can be used as a tutorial for learning topic classification.
