todd-cook / ml-you-can-use

Practical ML and NLP with examples.

License: Other

Python 2.28% Jupyter Notebook 97.64% Shell 0.06% Dockerfile 0.02%
wikipedia-corpus probablistic-language computer-vision latin nlp frequency-distribution classics keras tensorflow text-classification

ml-you-can-use's Introduction

ML-You-Can-Use

Practical Machine Learning and Natural Language Processing with examples.

Featuring

  • Interesting applications of ML, NLP, and Computer Vision
  • Practical demonstration notebooks
  • Reproducible experiments
  • Illustrated best practices:
    • Code extracted from notebooks for:
      • Automatic formatting with Black
      • Type checking via mypy annotations
      • Linting via Pylint
      • Doctests whenever possible
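
As a minimal sketch of the practices listed above (not code taken from this repo), the hypothetical function `word_frequencies` below carries type annotations that mypy can check and a doctest that doubles as documentation:

```python
from collections import Counter
from typing import Dict, List


def word_frequencies(tokens: List[str]) -> Dict[str, int]:
    """Count how often each token occurs.

    >>> word_frequencies(["arma", "virumque", "arma"])
    {'arma': 2, 'virumque': 1}
    """
    # Counter keeps first-seen order, so the resulting dict is stable.
    return dict(Counter(tokens))


if __name__ == "__main__":
    import doctest

    # Runs every docstring example in this module; silent when all pass.
    doctest.testmod()
```

Running the file directly executes the doctest; Black, mypy, and Pylint can each be pointed at the same file.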

Setup

Download this repo using git with the submodule command, e.g.:

git pull --recurse-submodules

Submodules pull in some of the data, along with external data-processing utilities used to preprocess it.

Install Python 3

Create Virtual Environment

mkdir p3
`which python3` -m venv ./p3
source setPythonHashSeed.sh
source p3/bin/activate
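
To confirm the steps above took effect, a short check like the following may help. It is only a sketch: the helper names are hypothetical, and it assumes, based purely on the script's name, that `setPythonHashSeed.sh` exports the `PYTHONHASHSEED` environment variable.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def hash_seed() -> str:
    """Return the PYTHONHASHSEED value, or '<unset>' if it is not exported."""
    # Assumption: setPythonHashSeed.sh sets this variable; its contents are not shown here.
    return os.environ.get("PYTHONHASHSEED", "<unset>")


print("virtual environment active:", in_virtualenv())
print("PYTHONHASHSEED:", hash_seed())
```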

Install Requirements

pip install -r requirements.txt

To run all notebook examples:

pip install -r requirements-dev.txt

Note: some examples will have a conda environment.yaml file that you will want to use.

Installing Test Corpora

Many notebooks use data that must be installed first. To install it, run the install script:

install_corpora.sh

  • installs Python SSL certificates
  • installs CLTK data for Latin and Greek
  • installs NLTK data

Testing

./runUnitTests.sh

Interactivity

jupyter notebook

Notebooks

Getting data

Labeling Data

Modeling Language

Detecting Duplicate Documents

Classifying Texts

Detecting Loanwords

Wikipedia Corpus Processing

Quality Embeddings

Computer Vision - Object Detection

Summarizing Texts

Searching and Search Relevance

References and Acknowledgements

ml-you-can-use's Issues

Training dataset

Hi @todd-cook!

I'm attempting to reproduce your results using BERT on the Crowdflower Search Results Relevance dataset, but I'm facing some issues. Your notebook says that the training set contains 20,571 labeled samples, but when I download the dataset from Kaggle it contains only 10,159 samples.

Am I missing something?

Why Sum of log probabilities

Why is the output of sum_log_probabilities sometimes positive and sometimes negative?
Logs of probabilities are never positive, because probabilities lie between 0 and 1, so their sum should never be positive either. Why is this value positive in some examples?
in this link
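
The arithmetic behind the question can be checked with a small sketch. The helper `sum_log_values` is hypothetical and is not the notebook's `sum_log_probabilities`. It shows that a sum of logs is positive only when the product of the inputs exceeds 1, which cannot happen with true probabilities. Whether the notebook's inputs can exceed 1 (as unnormalized scores or density values can) is an assumption to verify, not a diagnosis.

```python
import math
from typing import Iterable


def sum_log_values(values: Iterable[float]) -> float:
    """Sum the natural logs of the given positive values."""
    return sum(math.log(v) for v in values)


# True probabilities: every log is <= 0, so the sum is <= 0.
probabilities = [0.5, 0.25, 0.125]

# Illustrative values that are NOT probabilities (e.g. densities or raw scores);
# any value above 1 contributes a positive log.
densities = [2.0, 4.0, 0.5]

print(sum_log_values(probabilities))  # negative: log(1/64)
print(sum_log_values(densities))      # positive: log(4)
```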
