NLP tools

This repository records the tools we like to use in Natural Language Processing (NLP).

Websites

NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

http://nlpprogress.com/

Papers with code ★

Machine learning papers linked to their code implementations, plus leaderboards tracking results on common benchmarks.

https://paperswithcode.com/

Preprocessing

NLTK ★

Interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and wrappers for industrial-strength NLP libraries.

https://www.nltk.org/

Stanford CoreNLP

POS tagger, named entity recognizer (NER), parser (including basic dependencies), coreference resolution, sentiment analysis, bootstrapped pattern learning, and information extraction

https://stanfordnlp.github.io/CoreNLP/

Gensim ★

tf-idf, LSA, LDA, word2vec

https://radimrehurek.com/gensim/
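To make the tf-idf entry concrete, here is a plain-Python sketch of the weighting (not Gensim's API), assuming raw term counts multiplied by log(N / document frequency). Gensim's own `TfidfModel` has its own defaults for the local and global weights and for normalization, so its numbers may differ.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by its raw count times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran", "cat"],
]
for w in tfidf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

Because "the" occurs in every document, its weight is log(3/3) = 0, while rarer terms such as "dog" and "ran" get the highest weights.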

GloVe

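Pre-trained word vectors learned from global word-word co-occurrence statistics (an alternative to word2vec)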

https://nlp.stanford.edu/projects/glove/

ELMo

Deep contextualized word representations

https://allennlp.org/elmo

BERT (bert-as-service)

Encode variable-length sentences into fixed-length vectors with a pre-trained BERT model

https://bert-as-service.readthedocs.io/en/latest/index.html

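sentencepiece

Unsupervised text tokenizer that learns subword units (BPE or unigram language model) directly from raw sentences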

https://github.com/google/sentencepiece
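SentencePiece can learn its subword vocabulary with byte-pair encoding (BPE). The sketch below shows only the core BPE training loop on a hypothetical toy word list: repeatedly merge the most frequent adjacent symbol pair. It is not SentencePiece's implementation, which also treats whitespace as an ordinary symbol and trains on raw sentences.

```python
from collections import Counter


def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges by repeatedly fusing the most frequent adjacent symbol pair."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair with one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe(["low", "low", "lower", "lowest"], 3))
```

Frequent character sequences such as "low" quickly become single units, while rare endings stay split into smaller pieces.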

jieba

Chinese text segmentation, POS tagging, and keyword extraction (TF-IDF and TextRank)

https://github.com/fxsjy/jieba
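jieba segments text with a prefix dictionary, a DAG of candidate words, and dynamic programming over word frequencies, plus an HMM for unknown words. As a simpler illustration of dictionary-based Chinese segmentation, the sketch below uses the classic forward maximum matching baseline instead; it is not jieba's algorithm, and the toy dictionary is hypothetical.

```python
def forward_max_match(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
vocab = {"我", "来到", "北京", "清华", "清华大学", "大学"}
print(forward_max_match("我来到北京清华大学", vocab))  # ['我', '来到', '北京', '清华大学']
```

Greedy matching prefers "清华大学" over "清华" + "大学" because it always takes the longest match; that greediness is also why it can fail on ambiguous strings that frequency-based methods handle better.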

pyltp

Chinese text segmentation, POS tagging, NER, dependency parsing, etc.

https://github.com/HIT-SCIR/pyltp

HanLP

Chinese text segmentation, POS tagging, NER, dependency parsing, etc.

https://github.com/hankcs/HanLP

Chinese-Word-Vectors

Pre-trained Chinese word vectors: 100+ embeddings trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora.

https://github.com/Embedding/Chinese-Word-Vectors
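A common way to use pre-trained vectors is to load them and compare words by cosine similarity. The sketch below assumes the widely used word2vec text format (a header line with the word count and dimension, then one `word v1 v2 ...` line per word); check the repository's README for the exact format of each file. The 3-dimensional vectors are hypothetical toy values, not real embeddings.

```python
import io
import math


def load_word_vectors(stream) -> dict[str, list[float]]:
    """Parse word2vec-style text: a 'count dim' header, then 'word v1 ... vdim' per line."""
    _, dim = map(int, stream.readline().split())
    vectors = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical toy vectors in the assumed text format.
sample = io.StringIO("3 3\n北京 1.0 0.0 1.0\n上海 1.0 0.1 0.9\n苹果 0.0 1.0 0.0\n")
vecs = load_word_vectors(sample)
print(round(cosine(vecs["北京"], vecs["上海"]), 3))
print(round(cosine(vecs["北京"], vecs["苹果"]), 3))
```

For real files with hundreds of thousands of words, a library loader (for example Gensim's `KeyedVectors`) and NumPy arrays are far more practical than Python lists.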

Algorithm

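scikit-learn (sklearn)

General-purpose machine learning: classification, regression, clustering, dimensionality reduction, and model selection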

https://scikit-learn.org/stable/index.html

PuLP

A linear programming (LP) modeler for Python

https://pythonhosted.org/PuLP/

scipy

numerical integration, interpolation, optimization, linear algebra, and statistics, etc.

https://www.scipy.org/scipylib/index.html

CRF++

An implementation of conditional random fields (CRFs) for segmenting and labeling sequential data

https://taku910.github.io/crfpp/
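At prediction time, a linear-chain CRF such as CRF++ picks the highest-scoring label sequence with the Viterbi algorithm. The sketch below shows only that decoding step, assuming per-position emission scores and label-to-label transition scores are already given (in CRF++ they come from the learned weights of the feature templates). The B/I/O labels and all scores are hypothetical.

```python
def viterbi(emissions: list[dict[str, float]],
            transitions: dict[tuple[str, str], float],
            labels: list[str]) -> list[str]:
    """Return the label sequence maximizing the sum of emission and transition scores."""
    if not emissions:
        return []
    # best[i][y]: best score of any path ending in label y at position i.
    best = [{y: emissions[0][y] for y in labels}]
    back = [{}]
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p] + transitions[(p, y)]) for p in labels),
                key=lambda pair: pair[1],
            )
            best[i][y] = score + emissions[i][y]
            back[i][y] = prev
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


labels = ["B", "I", "O"]
transitions = {(p, y): 0.0 for p in labels for y in labels}
transitions[("O", "I")] = -10.0  # an "inside" tag should not follow "outside"
emissions = [
    {"B": 1.0, "I": 0.0, "O": 2.0},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 1.0},
]
print(viterbi(emissions, transitions, labels))  # ['B', 'I', 'O']
```

The emission scores alone favor O, I, O; the large negative O→I transition score is what steers the decoder to the valid sequence B, I, O.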

Library

Transformers (Hugging Face)

State-of-the-art NLP for TensorFlow 2.0 and PyTorch, with pre-trained models such as BERT, GPT, GPT-2, and XLNet.

https://github.com/huggingface/transformers

allennlp

implementations of high quality models for almost any NLP problem

https://allennlp.org/

ignite

a high-level library to help with training neural networks in PyTorch

https://pytorch.org/ignite/index.html

OpenNMT (onmt)

an open source ecosystem for neural machine translation and neural sequence learning

https://opennmt.net/

torchtext

Generic data loaders, abstractions, and iterators for text

https://github.com/pytorch/text

Googletrans

An unofficial Python client for Google Translate

Elasticsearch

A distributed search and analytics engine (the link below is the official Python client)

https://elasticsearch-py.readthedocs.io/en/master/

Chinese analyzer: the IK Analysis plugin for Elasticsearch
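Elasticsearch answers full-text queries from an inverted index (built on Lucene) that maps each analyzed term to the documents containing it; the analyzer, such as IK for Chinese, decides what the terms are. The toy sketch below shows the idea in plain Python with a naive lowercase-and-whitespace analyzer. It is not the Elasticsearch client API, and real indexes also store positions, term statistics for scoring, and more.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased whitespace token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents that contain every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "MongoDB is a document database",
    3: "a search engine for documents",
}
index = build_index(docs)
print(sorted(search_all(index, "search engine")))  # [1, 3]
```

Whitespace splitting does not work for Chinese text, which has no spaces between words; that is exactly the gap a Chinese analyzer such as IK fills.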

Other

scrapy

An open source and collaborative framework for extracting the data you need from websites.

https://scrapy.org/

MongoDB

a general purpose, document-based, distributed database

https://www.mongodb.com/
