Tutorial: Natural Language Processing in Python =====

This repo contains material for a workshop on Natural Language Processing with Python.

Audience -----

The target audience of this workshop is students, researchers, developers, hobbyists and anyone interested in knowing more about Natural Language Processing and Text Analytics.

Some very basic knowledge of Python is assumed (e.g. if you have seen a Python script before, you're good to go), but no previous NLP knowledge is required.

Presentations -----

Different versions of this workshop have been delivered at different events:

  • PyCon UK 2016: 3h session (slides: presentations/2016-pyconuk-slides.pdf)
  • PyCon Ireland 2016: 1.5h session (slides: presentations/2016-pyconie-slides.pdf)
  • PyCon Italy 2017: 3.5h session (slides: presentations/2017-pyconitaly-slides.pdf)

Environment Setup -----

The code has been tested with Python 3.4 and 3.5. Support for Python 2.7 is best-effort; if you find an issue, please report it.

This section describes how to set up your environment locally.

Step 1 - clone this repo:

git clone https://github.com/bonzanini/nlp-tutorial
cd nlp-tutorial

Step 2 - create and activate a Python virtual environment:

virtualenv nlp-venv
source nlp-venv/bin/activate

Step 2 (alternative) - create a Conda environment:

conda create --name nlp-venv python=3.5
source activate nlp-venv

Step 3 - install libraries:

pip install -r requirements.txt

This will download and install NLTK, scikit-learn and jupyter (plus dependencies).

NLTK requires some data to be installed separately (more details on the NLTK website).

From the command line, you can download the required packages:

python -m nltk.downloader punkt stopwords reuters

Alternatively, from a Python interactive shell:

>>> import nltk
>>> nltk.download()

Then use the GUI to select the required packages (punkt, stopwords, reuters).
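If you prefer to skip the GUI, the same three packages can be requested by name from Python. The sketch below is only an illustration: `download_required` and its `download` parameter are hypothetical helpers, not part of this repo, and it assumes `nltk.download` returns a true value when a package is fetched successfully.

```python
REQUIRED_PACKAGES = ("punkt", "stopwords", "reuters")


def download_required(packages=REQUIRED_PACKAGES, download=None):
    """Request each NLTK data package by name; return the names that failed."""
    if download is None:
        # Imported lazily so the helper can be defined without NLTK installed.
        import nltk
        download = nltk.download
    failed = []
    for name in packages:
        # Assumption: the downloader returns a falsy value on failure.
        if not download(name):
            failed.append(name)
    return failed
```

In an environment where NLTK is installed, calling `download_required()` should fetch punkt, stopwords and reuters and return an empty list if all three succeed.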

Tip: although you can use "all" as the package name to install all the NLTK data, it's not a great idea over flaky conference wi-fi. It downloads approx. 2 GB, and if we all do it at the same time we'll kill the conference wi-fi :)

Finally - run Jupyter:

jupyter notebook

To test that your environment is set up correctly, open the notebook "00 Environment Test" and follow the instructions.
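For a quick sanity check outside Jupyter, something along these lines can report missing libraries and an untested Python version. This is a minimal sketch and not the content of the test notebook: the function names are made up for illustration, and the import names (`sklearn` for scikit-learn, `notebook` for the Jupyter notebook server) are assumptions about how the installed packages are exposed.

```python
import importlib.util
import sys

# Assumed import names for the libraries installed from requirements.txt.
REQUIRED_MODULES = ("nltk", "sklearn", "notebook")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_version_tested(version=None):
    """True if the interpreter matches a version this README lists as tested."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor in {(3, 4), (3, 5)}


if __name__ == "__main__":
    print("Missing modules:", missing_modules() or "none")
    print("Tested Python version:", python_version_tested())
```

A non-empty list from `missing_modules()` suggests that Step 3 did not complete in the active environment.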

matplotlib backend issues -----

There might be a few issues related to matplotlib backends, as described in the matplotlib documentation, especially on macOS.

Editing or creating the file ~/.matplotlib/matplotlibrc so that it contains the following line should fix the issue:

backend: TkAgg

If not, please refer to the matplotlib docs.
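The edit to matplotlibrc can also be scripted. The helper below is a hypothetical sketch (`set_backend` is not part of this repo or of matplotlib): it rewrites the file so that it holds exactly one `backend:` line and leaves other settings untouched.

```python
from pathlib import Path


def set_backend(rc_path, backend="TkAgg"):
    """Create or update a matplotlibrc file so it has a single 'backend:' line."""
    rc_path = Path(rc_path).expanduser()
    lines = rc_path.read_text().splitlines() if rc_path.exists() else []
    # Drop any existing 'backend' key; keep every other line as it is.
    kept = [ln for ln in lines if ln.split(":", 1)[0].strip() != "backend"]
    kept.append("backend: {}".format(backend))
    # Create the parent directory if it does not exist yet.
    rc_path.parent.mkdir(parents=True, exist_ok=True)
    rc_path.write_text("\n".join(kept) + "\n")
    return rc_path
```

For the location mentioned above, the call would be `set_backend("~/.matplotlib/matplotlibrc")`.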

Authors -----

Main authors:

License -----

Code (mainly in notebooks folder) under MIT license.

Documentation and slides under CC-BY license.

Data
