
Linguistic Engineering and Text Analysis Department

Welcome to the official GitHub repository of the Linguistic Engineering and Text Analysis Department at NASK (National Research Institute)! 🌐 This repository houses our projects, research papers, and tools related to linguistic engineering, natural language processing, and text analysis. We aim to advance the field of linguistics and language technology through our collaborative efforts. 🚀

About Us

The Linguistic Engineering and Text Analysis Department is dedicated to exploring and harnessing the power of language in various applications. Our team of linguists, data scientists, and software engineers works together to develop solutions for text analysis, information extraction, summarization, text classification, and more. 📚🧠💻

Projects

  1. BAN-PL is a Polish dataset of banned harmful and offensive content from the Wykop.pl web service.

  2. StyloMetrix is a tool for creating text representations in the form of StyloMetrix vectors. Each metric in the vector quantifies a specific linguistic feature, allowing for a detailed analysis of the text's style through numeric values. Because the metrics can be customized, StyloMetrix suits tasks such as stylometric analysis, training machine learning classifiers, statistical analysis, and linguistic reference. Available for Polish, English, and Ukrainian.

  3. Summarizer is a tool for generating concise and informative summaries of text documents. Using natural language processing techniques, Summarizer distills the key points and main ideas of lengthy texts into coherent summaries.

  4. PrivMasker is a tool for anonymizing personal and sensitive data in documents. Depending on the text type and user preferences, users can choose which components to mask, including names, contact details (phone numbers, email addresses), physical addresses, dates, identification numbers, and monetary amounts.
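To make the idea of selectable masking categories concrete, here is a minimal sketch in Python. It is not PrivMasker's implementation or API: the `PATTERNS` table, the `mask` function, and the regular expressions are all hypothetical and cover only a few simple formats (email addresses, nine-digit phone numbers, ISO dates). A real anonymizer would likely need much more than regular expressions, for example named entity recognition for names and addresses.

```python
import re

# Hypothetical patterns for illustration only; these are NOT PrivMasker's rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Nine digits in groups of three, with an optional +48 country prefix.
    "PHONE": re.compile(r"(?<!\d)(?:\+48[ -]?)?\d{3}[ -]?\d{3}[ -]?\d{3}(?!\d)"),
    # ISO-style dates such as 2023-10-01.
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def mask(text, categories=None):
    """Replace each match of the selected categories with a [CATEGORY] tag.

    If `categories` is None, every category in PATTERNS is masked.
    """
    selected = PATTERNS.keys() if categories is None else categories
    for name in selected:
        text = PATTERNS[name].sub(f"[{name}]", text)
    return text


sample = "Write to jan.kowalski@example.com or call 123 456 789 on 2023-10-01."
print(mask(sample))                        # mask every category
print(mask(sample, categories=["EMAIL"]))  # mask email addresses only
```

Passing a list of category names mirrors the optional selection described above: only the chosen components are replaced, and the rest of the text is left untouched.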

Research Papers

Our department actively contributes to the scientific community through research papers published in top-tier conferences and journals. Some of our recent papers include:

  • "Styles with Benefits. The StyloMetrix Vectors for Stylistic and Semantic Text Classification of Small-Scale Datasets and Different Sample Length" - Published at PPRAI 2022. You can find the paper here. 📝🔬🌐
  • "The Grammar and Syntax Based Corpus Analysis Tool For The Ukrainian Language". Access the paper here. 📝🔎🔄
  • "Team Up! Cohesive Text Summarization Scoring Sentence Coalitions" - Published at ICAISC 2020. You can find the paper here. 📝🔬🌐
  • "BAN-PL: a Novel Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service". 2023. arXiv:2308.1059. Access the paper here. 📝
  • "Towards Harmful Erotic Content Detection through Coreference-Driven Contextual Analysis". 2023. arXiv:2310.14325. Access the paper here. 📝
  • "StyloMetrix: An Open-Source Multilingual Tool for Representing Stylometric Vectors". 2023. arXiv:2309.12810. Access the paper here. 📝

Contribution Guidelines

We welcome contributions from the open-source community to enhance our projects and advance the field of linguistic engineering. If you are interested in contributing, please follow our guidelines outlined in the CONTRIBUTING.md file of each project repository. 🙌🔧📝

Contact Us

For any inquiries, collaborations, or questions, feel free to reach out to us:

Email: [email protected] ✉️

Website: https://www.science.nask.pl

ziliat-nask's Projects

ban-pl

Polish Dataset of Banned Harmful and Offensive Content from Wykop.pl web service

datasets

Reference repository for data used in articles

reading-club

A repository for our Reading Club, where we discuss and analyze academic papers on various topics.

summattack

SummAttack is an open-source framework for conducting adversarial attacks on large language models, tailored specifically to the summarization task.
