mreyesc / lemmatization-lists

This project forked from michmech/lemmatization-lists

Machine-readable lists of lemma-token pairs in 24 languages.

License: Open Data Commons Open Database License v1.0

Lemmatization Lists

These are large-coverage, machine-readable lists of lemma/token pairs in several languages, which I have collected (legally) from various sources, mostly as part of my work on the Global Glossary project. I use them for query expansion in full-text search: if a user searches for the lemma walk, the query is expanded to also match the tokens walking, walked, etc.
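The query expansion described above can be sketched roughly as follows. This is only an illustration, not the code used in the Global Glossary project: the function names `build_index` and `expand_query` and the sample pairs are hypothetical.

```python
from collections import defaultdict

def build_index(pairs):
    """Map each lemma to the set of tokens listed for it."""
    index = defaultdict(set)
    for lemma, token in pairs:
        index[lemma].add(token)
    return index

def expand_query(term, index):
    """Return the search term plus every token listed under it as a lemma."""
    return sorted({term} | index.get(term, set()))

# Hypothetical sample data, in (lemma, token) order.
sample_pairs = [("walk", "walking"), ("walk", "walked"), ("walk", "walks")]
index = build_index(sample_pairs)

print(expand_query("walk", index))  # ['walk', 'walked', 'walking', 'walks']
print(expand_query("run", index))   # ['run'] (unknown terms are left unchanged)
```

The expanded list can then be passed to whatever search backend is in use, for example as an OR query over all returned forms.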

Each list is a zipped plain-text file. Each line contains one lemma/token pair in this order: lemma, tab character, token. The files are encoded in UTF-8 with Windows-style (CRLF) line breaks.
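A minimal sketch of reading one of these files in Python, based only on the format described above. The helper names are hypothetical, and the sketch assumes that each archive holds a single text file. Lines without a tab are skipped rather than treated as errors, which is a design choice made here, not something the lists require.

```python
import io
import zipfile

def parse_pairs(text):
    """Yield (lemma, token) tuples from tab-separated text."""
    for line in text.splitlines():  # splitlines() also handles \r\n line breaks
        if "\t" not in line:
            continue  # skip blank or malformed lines
        lemma, token = line.split("\t", 1)
        yield lemma, token

def read_pairs_from_zip(zip_source, member=None):
    """Read pairs from one member of a zip archive (path or file-like object)."""
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: the archive holds a single text file.
        name = member or archive.namelist()[0]
        # utf-8-sig decodes UTF-8 and drops a leading byte-order mark if present.
        text = archive.read(name).decode("utf-8-sig")
    return list(parse_pairs(text))

# Demonstration with an in-memory archive in place of a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.txt", "walk\twalking\r\nwalk\twalked\r\n".encode("utf-8"))
buffer.seek(0)

print(read_pairs_from_zip(buffer))  # [('walk', 'walking'), ('walk', 'walked')]
```

For a downloaded list, pass the path of the zip file to `read_pairs_from_zip` instead of the in-memory buffer.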

  • Asturian (ast) (108,792 pairs)
  • Bulgarian (bg) (30,323 pairs)
  • Catalan (ca) (591,534 pairs)
  • Czech (cs) (36,400 pairs)
  • English (en) (41,760 pairs)
  • Estonian (et) (80,536 pairs)
  • French (fr) (224,002 pairs)
  • Galician (gl) (392,856 pairs)
  • German (de) (358,473 pairs)
  • Hungarian (hu) (39,898 pairs)
  • Irish (ga) (415,502 pairs)
  • Manx Gaelic (gv) (67,177 pairs)
  • Italian (it) (341,074 pairs)
  • Persian/Farsi (fa) (6,273 pairs)
  • Polish (pl) (3,296,232 pairs)
  • Portuguese (pt) (850,264 pairs)
  • Romanian (ro) (314,810 pairs)
  • Scottish Gaelic (gd) (51,624 pairs)
  • Slovak (sk) (858,414 pairs)
  • Slovene (sl) (99,063 pairs)
  • Spanish (es) (497,560 pairs)
  • Swedish (sv) (675,137 pairs)
  • Ukrainian (uk) (193,703 pairs)
  • Welsh (cy) (359,224 pairs)

Licence

Sources
