lxmls / lxmls-toolkit

Machine Learning applied to Natural Language Processing Toolkit used in the Lisbon Machine Learning Summer School

License: Other


lxmls-toolkit's Introduction


LxMLS 2023

Machine learning toolkit for natural language processing. Written for Lisbon Machine Learning Summer School (lxmls.it.pt). This covers

  • Scientific Python and Mathematical background
  • Linear Classifiers
  • Sequence Models
  • Structured Prediction
  • Syntax and Parsing
  • Feed-forward models in deep learning
  • Sequence models in deep learning
  • Reinforcement Learning


Instructions for Students

Install with Anaconda or pip

If you are new to Python, the simplest method is to use Anaconda to handle your packages; just go to

https://www.anaconda.com/download/

and follow the instructions. We strongly recommend using at least Python 3.

If you prefer pip to Anaconda, you can install the toolkit in a way that does not interfere with your existing installation. For this you can use a virtual environment as follows:

virtualenv venv
source venv/bin/activate (on Windows: .\venv\Scripts\activate)
pip install pip setuptools --upgrade
pip install --editable . 

This will install the toolkit in a modifiable (editable) way. If you also want to virtualize your Python version (e.g. you are stuck with Python 2 on your system), have a look at pyenv.

Bear in mind that the main purpose of the toolkit is educational. You may resort to other toolboxes if you are looking for efficient implementations of the algorithms described.

Running

  • Run from the project root directory. If an import error occurs, try first adding the current path to the PYTHONPATH environment variable, e.g.:
    • export PYTHONPATH=.

Development

To run all the tests, install tox and pytest

pip install tox pytest

and run

tox

Note: to combine the coverage data from all the tox environments, run:

  • Windows
    set PYTEST_ADDOPTS=--cov-append
    tox
    
  • Other
    PYTEST_ADDOPTS=--cov-append tox
    

lxmls-toolkit's People

Contributors

andre-martins, antoniogois, askinkaty, christopherbrix, davidbp, dcferreira, e-bug, filippoc, gonmelo, gracaninja, hershaw, ibenes, israfelsr, joaolages, kelina, kepler, luispedro, madrugado, marianaalmeida, miguelbalmeida, pedrobalage, pschydlo, q0o0p, ramon-astudillo, robertodessi, tnunes, venelink, zmarinho


lxmls-toolkit's Issues

Change to Python3

Aside from the obvious reasons, the Jupyter notebook in Anaconda is giving more and more problems.

emission scores outside the logsum in the backward algorithm?

I find the run_forward and run_backward algorithms a bit messy, since the pseudocode provided uses probabilities while the code uses scores.

Shouldn't the emission scores be outside the logsum?

In sequence_classification_decoder.py, in run_backward, we have:

backward[pos, current_state] = logsum(backward[pos+1, :]
                                      + transition_scores[pos, :, current_state]
                                      + emission_scores[pos+1, :])

In the run_forward algorithm, the emission_scores are OUTSIDE the logsum function:

forward[pos, current_state] = logsum(forward[pos-1, :] + transition_scores[pos-1, current_state, :])
forward[pos, current_state] += emission_scores[pos, current_state]

My intuition suggests that the forward algorithm is correct. I have derived the following formula (which we might put in the document to facilitate the comprehension of the algorithm):

forward[pos, current_state] = logsum(forward[pos-1, :] + transition_scores[pos-1, current_state, :]) + emission_scores[pos, current_state]

that is, logsum(forward + transition_scores) + emission_scores, with the emission score outside the logsum.
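
For reference, a minimal sketch of the log-space forward recursion with the emission scores outside the logsum. This is a sketch only: it mirrors the code above but is not the toolkit's actual implementation, and it uses scipy's logsumexp in place of the toolkit's logsum:

import numpy as np
from scipy.special import logsumexp

def run_forward(initial_scores, transition_scores, emission_scores, final_scores):
    # initial_scores: (S,), transition_scores: (N-1, S, S),
    # emission_scores: (N, S), final_scores: (S,)
    length, num_states = emission_scores.shape
    forward = np.zeros((length, num_states))
    forward[0, :] = emission_scores[0, :] + initial_scores
    for pos in range(1, length):
        for current_state in range(num_states):
            # logsum over the previous states...
            forward[pos, current_state] = logsumexp(
                forward[pos - 1, :] + transition_scores[pos - 1, current_state, :])
            # ...and the emission score added OUTSIDE the logsum
            forward[pos, current_state] += emission_scores[pos, current_state]
    # return the trellis and the sequence log-likelihood
    return forward, logsumexp(forward[-1, :] + final_scores)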

Unclear state of `develop` branch

We are piling up some last-minute changes in develop. At the same time, there have been some direct fixes in student/master.

It is unclear if all changes in develop should be propagated to master, for example these files:

labs/images_for_notebooks/parsing/eisner_comp_left_2.svg
labs/images_for_notebooks/parsing/eisner_comp_right_2.svg
labs/images_for_notebooks/parsing/eisner_inc_left.svg
labs/images_for_notebooks/parsing/eisner_inc_left_2.svg
labs/images_for_notebooks/parsing/eisner_inc_right.svg
labs/images_for_notebooks/parsing/eisner_inc_right_2.svg
labs/images_for_notebooks/parsing/eisner_init.svg
labs/images_for_notebooks/parsing/eisner_pseudocode.png

@dcferreira, are these needed? I see no reference to them.

Mira algorithm step size

MIRA with regularizer=1 gives no update to the parameters for the Amazon dataset. It looked better when we changed the floor division '//' to '/' in the step size formula. Was there a reason behind using floor division?
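
A small illustration of the suspected problem, assuming the usual MIRA step size min(regularizer, loss / ||x||²) (the numbers are made up):

loss = 0.7        # hinge-style loss for the current example
sq_norm = 25.0    # squared norm of the feature vector
regularizer = 1.0

stepsize_floor = min(regularizer, loss // sq_norm)  # floor division: 0.0, so no update
stepsize_plain = min(regularizer, loss / sq_norm)   # true division: 0.028, a small update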

Feature: Unit testing derived from notebooks

Current unit tests need a number of changes:

  • Since we now have notebooks, a big advantage would be to create the tests from these notebooks, similarly to what labs/convert_notebooks.sh does for scripts

  • This will also partially solve the upcoming #71, since tests will have content-driven names instead of day order (e.g. day1)

solve.py is not compatible with Python3

The solve.py script in the student branch is not compatible with Python 3 in several respects (see the sketch after the list):

  • raising exceptions (Python 2 raise syntax)
  • print called as a statement rather than a function
  • urllib2.urlopen needs to be replaced with urllib.request.urlopen
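
A sketch of what the three fixes look like in Python 3 (the fetch function is hypothetical; the actual lines in solve.py differ):

from urllib.request import urlopen  # replaces urllib2.urlopen

def fetch(url):
    # print is a function in Python 3, not a statement
    print("downloading", url)
    try:
        return urlopen(url).read()
    except OSError as err:
        # Python 3 raise syntax: raise X(...) [from err], not raise X, "..."
        raise RuntimeError("download failed: " + url) from err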

On the other hand, the lab guide expects students to use Python 3, which is why I propose fixing solve.py to make it Python 3 compatible. I've already fixed the above issues locally, but I don't feel confident enough to make a pull request.

Thanks!

NOTEBOOK: Basic Tutorials

Write the exercise text and code for the Basic Tutorials day into a notebook under labs/notebooks/basic_tutorials/, using labs/notebooks/non_linear_classifiers/ as a reference.

Windows users have codec problems when invoking pdb

At least one Windows user reported problems when adding the line "import pdb; pdb.set_trace()" to a file on Day 1. This would yield a codecs-related error message.

Changing that to "import ipdb; ipdb.set_trace()" fixed the issue. This requires installing ipdb beforehand with "pip install ipdb", run from the command line.

We should test this on a Windows machine and, if confirmed, change the guide to add this remark for Windows users.

Update Student Branch

After this year's updates are finished, we will need to create the student branch.

  • Update README.md to tell students to install the toolkit symbolically à la python setup.py develop, see #73 (comment)
  • Check that the code for the exercises is removed and up to date
  • Fix solve.py to use long names following #71 instead of e.g. day1, and update the code accordingly

(Day 5) No en_perline001.txt in the student branch.

The Day 5 guide asks students to run

python wordcount.py en_perline001.txt > results.txt

but there is no such file inside the big_data folder. Also, the guide says to navigate to the "wordcount" folder while the script is actually inside the "big_data" folder, but this can easily be fixed in the guide.

Exercise 2.7 in notebooks is not clear

In Exercise 2.7 the students are supposed to add smoothing by changing the argument to train_supervised. But the way the notebook is phrased makes it seem like students are supposed to implement train_supervised themselves, or add smoothing to the implementation.

MrJob word count example doesn't work on Windows

For some reason, MrJob doesn't run the reducer step correctly on Windows. This has been verified on several Windows laptops during the labs. A quick fix is adding a combiner implementation to the MRJob subclass that does the same thing as the reducer:

def combiner(self, word, counts):
    yield (word, sum(counts))

I'm not sure why this works. Nevertheless, it has been verified with both mrjob v0.4.4 and v0.5.0-dev, on Anaconda Python 2.7.9, 64-bit.
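
For context, a self-contained word count with the combiner workaround applied (a sketch; the lab's actual wordcount.py may differ):

from mrjob.job import MRJob

class MRWordCount(MRJob):

    def mapper(self, _, line):
        for word in line.split():
            yield word, 1

    def combiner(self, word, counts):
        # Same aggregation as the reducer; works around the Windows issue.
        yield word, sum(counts)

    def reducer(self, word, counts):
        yield word, sum(counts)

if __name__ == '__main__':
    MRWordCount.run()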

NOTEBOOK: Parsing

Write the exercise text and code for the Parsing day into a notebook under labs/notebooks/parsing/, using labs/notebooks/non_linear_classifiers/ as a reference.

Encoding needs to be specified everywhere a file is read

The encoding used by open() in Python is platform dependent, so we should specify it everywhere.

@pedrobalage reported this in:

conll_file = open(path.join(base_deppars_dir, language + "_train.conll"))

But I guess this happens in more places.
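
The fix is to pass the encoding explicitly, assuming the data files are UTF-8 (base_deppars_dir and language come from the surrounding code):

# same call, with an explicit encoding instead of the platform default
conll_file = open(path.join(base_deppars_dir, language + "_train.conll"),
                  encoding="utf-8")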

Find the randomness in non-linear classifiers

The accuracies for the non-linear classifier exercises (numpy and pytorch) vary in a range of ±2 across executions.
For this reason, the unit tests for these days use a high tolerance factor (2).

You should check why the results are not the same across different executions of these classifiers. After finding the problem (possibly a random initialization in some function), fix it (possibly by defining a seed) in order to allow better unit tests.

You may then change the unit tests to check the correct accuracy results with a lower tolerance (1e-2).
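
A minimal sketch of the likely fix: fixing the seeds before training (42 is an arbitrary choice, and where exactly to set them in the toolkit is to be determined):

import numpy as np
import torch

np.random.seed(42)     # fixes numpy-based random initializations
torch.manual_seed(42)  # fixes pytorch-based random initializations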

Small problems with python basics notebook

Hi,

I have noticed a few small problems with the python basics notebook:

0.2

There is a difference between the code in the lab guide and the code in the notebook. The "a += 1" line should NOT be indented, according to the guide, and should thus lead to an infinite loop. In the notebook the code is the same as the example right above it.

0.3 Exceptions

More of a usage comment than an actual error: in order to actually get a ValueError, I need to insert something like "a" (the string, with the quotes). If I just insert the character a (without quotes), the notebook interprets it as a variable and I get a NameError for an undefined variable rather than the ValueError and the exception handling.

0.6 and 0.7

The code for 0.6 and 0.7 is duplicated (there are two 0.6 sections, the first one having the code for both 0.6 and 0.7).

Feature: Non-ambiguous names for days

Using day0, day1, etc. is ambiguous and error-prone since we change the order often. It is also difficult to relate exercises to tests. I propose the following:

  • Use a clearer naming scheme for each chapter, e.g. linear_classifiers
  • Keep the name in the references in the guide
  • Use a clearer naming scheme for each exercise, e.g. backpropagation_numpy
  • Keep the name in the references in the guide

Data reader fails when installing with pip

What happens:

When installing the toolkit with pip, the absolute paths are different and lxmls.readers does not find the data.

How to solve this:

Dependencies on local files inside the code should be avoided. One pythonic solution would be to force the user to give the data path when instantiating the reader (see the sketch below).
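
For example, a sketch of that solution with a hypothetical data_path argument (SentimentCorpus does not currently accept it; this only illustrates the proposed API):

import lxmls.readers.sentiment_reader as srs

# data_path is a hypothetical argument pointing the reader at the data explicitly
corpus = srs.SentimentCorpus("books", data_path="/path/to/lxmls-toolkit/data")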

Way to reproduce:

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
pip install .    # As opposed to python setup.py develop

then run the code

import lxmls.readers.sentiment_reader as srs
corpus = srs.SentimentCorpus("books")

Note that python setup.py develop will work.

Fix build failure due to deprecated code in parsing day

Build fails in travis-ci, see

https://travis-ci.org/LxMLS/lxmls-toolkit/jobs/509740861

PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra

We should change numpy.matrix to numpy arrays. If this is not done, we will have to disable the test for the parsing day (which is left out this year).
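
A sketch of the kind of change needed (illustrative values; the parsing code's actual matrices differ):

import numpy as np

M = np.matrix([[1.0, 2.0], [3.0, 4.0]])  # deprecated matrix subclass
A = np.asarray(M)                        # plain ndarray instead
B = A @ A                                # '@' replaces the matrix class's '*'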

Fewer than 2000 documents read in day 1

Some students, instead of reading 2000 total documents and 1600 training documents, reported seeing the following numbers:
840
672

At least two operating systems had this problem:
Win 7 64-bit
Ubuntu 14.04 64-bit

MIRA and SVM are implemented in student branch

According to the guide, the students should implement MIRA and SVM (Exercises 1.3 and 1.5, respectively), but these algorithms are already implemented in the student branch. In fact, they seem to be the same implementations as in the master branch, since "git diff master student" outputs nothing for both mira.py and svm.py.

[Hugo] lxmls.readers.sentiment_reader - encoding problem

Attached is the corrected lxmls/readers/sentiment_reader.py file.
On "windoze" Python uses a different encoding from Mac and Linux (UTF-8) and cannot read the sentiment/books files correctly.
The solution is to indicate the desired encoding in the call to "open".

NOTEBOOK: Structured Predictors

Write the exercise text and code for the Structured Predictors day into a notebook under labs/notebooks/structured_predictors/, using labs/notebooks/non_linear_classifiers/ as a reference.

small error on the perceptron code

Small "bug" in the perceptron and the Mira algorithms.

The permutation of the data is done at each epoch according to the documentation. Nevertheless the code performs a permutation to the data that reamins the same across all epochs.

The code is currently:

    # # Randomize the examples
    perm = np.random.permutation(nr_x)
    for epoch_nr in xrange(self.nr_epochs):
        for nr in xrange(nr_x):
            ...

but it should be

    for epoch_nr in xrange(self.nr_epochs):
        # # Randomize the examples
        perm = np.random.permutation(nr_x)
        for nr in xrange(nr_x):
            ...

The documentation:

[screenshot of the lab guide, 2015-05-29, showing that the permutation should be redone at each epoch]

Same permutation error in MIRA (as in the perceptron)

I have changed this in the master and student branches.
I'll do a pull request. Waiting for the guide itself to be updated: instead of "Implement the MIRA algorithm", the title should be "Use the MIRA implementation and do parts 1 to 4 as in the previous exercise".

toolkit compiling error

Hi all,

I recently tried to install the dependencies for the LxMLS toolkit and ran into the following problems. Thank you in advance.

~$ git clone https://github.com/LxMLS/lxmls-toolkit.git
Cloning into 'lxmls-toolkit'...
remote: Counting objects: 1800, done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 1800 (delta 6), reused 0 (delta 0), pack-reused 1784
Receiving objects: 100% (1800/1800), 22.31 MiB | 57.00 KiB/s, done.
Resolving deltas: 100% (1091/1091), done.
Checking connectivity... done.
~$ cd lxmls-toolkit
~/lxmls-toolkit$ pip install -r pip-requirements.txt
Downloading/unpacking configparser==3.2.0r3 (from -r pip-requirements.txt (line 1))
  Downloading configparser-3.2.0r3.tar.gz
  Running setup.py (path:/tmp/pip_build_iarroyof/configparser/setup.py) egg_info for package configparser

Downloading/unpacking pyyaml (from -r pip-requirements.txt (line 2))
  Downloading PyYAML-3.11.tar.gz (248kB): 248kB downloaded
  Running setup.py (path:/tmp/pip_build_iarroyof/pyyaml/setup.py) egg_info for package pyyaml

Downloading/unpacking nltk (from -r pip-requirements.txt (line 3))
  Downloading nltk-3.0.3.tar.gz (1.0MB): 1.0MB downloaded
  Running setup.py (path:/tmp/pip_build_iarroyof/nltk/setup.py) egg_info for package nltk

    warning: no files found matching 'Makefile' under directory '*.txt'
    warning: no previously-included files matching '*~' found anywhere in distribution
Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/lib/python2.7/dist-packages (from -r pip-requirements.txt (line 4))
Requirement already satisfied (use --upgrade to upgrade): scipy in /usr/lib/python2.7/dist-packages (from -r pip-requirements.txt (line 5))
Requirement already satisfied (use --upgrade to upgrade): matplotlib in /usr/lib/pymodules/python2.7 (from -r pip-requirements.txt (line 6))
Downloading/unpacking mrjob (from -r pip-requirements.txt (line 7))
  Downloading mrjob-0.4.4.tar.gz (186kB): 186kB downloaded
  Running setup.py (path:/tmp/pip_build_iarroyof/mrjob/setup.py) egg_info for package mrjob

    no previously-included directories found matching 'docs'
    warning: no files found matching '*.sh' under directory 'bootstrap'
Downloading/unpacking theano (from -r pip-requirements.txt (line 8))
  Downloading Theano-0.7.0.tar.gz (2.0MB): 2.0MB downloaded
  Running setup.py (path:/tmp/pip_build_iarroyof/theano/setup.py) egg_info for package theano

    warning: manifest_maker: MANIFEST.in, line 8: 'recursive-include' expects <dir> <pattern1> <pattern2> ...

Downloading/unpacking ordereddict (from configparser==3.2.0r3->-r pip-requirements.txt (line 1))
  Downloading ordereddict-1.1.tar.gz
  Running setup.py (path:/tmp/pip_build_iarroyof/ordereddict/setup.py) egg_info for package ordereddict

Downloading/unpacking unittest2 (from configparser==3.2.0r3->-r pip-requirements.txt (line 1))
  Downloading unittest2-1.1.0-py2.py3-none-any.whl (96kB): 96kB downloaded
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /usr/lib/python2.7/dist-packages (from matplotlib->-r pip-requirements.txt (line 6))
Requirement already satisfied (use --upgrade to upgrade): tornado in /usr/lib/python2.7/dist-packages (from matplotlib->-r pip-requirements.txt (line 6))
Requirement already satisfied (use --upgrade to upgrade): pyparsing>=1.5.6 in /usr/lib/python2.7/dist-packages (from matplotlib->-r pip-requirements.txt (line 6))
Requirement already satisfied (use --upgrade to upgrade): nose in /usr/lib/python2.7/dist-packages (from matplotlib->-r pip-requirements.txt (line 6))
Downloading/unpacking boto>=2.2.0 (from mrjob->-r pip-requirements.txt (line 7))
  Downloading boto-2.38.0-py2.py3-none-any.whl (1.3MB): 1.3MB downloaded
Downloading/unpacking filechunkio (from mrjob->-r pip-requirements.txt (line 7))
  Downloading filechunkio-1.6.tar.gz
  Running setup.py (path:/tmp/pip_build_iarroyof/filechunkio/setup.py) egg_info for package filechunkio

Requirement already satisfied (use --upgrade to upgrade): simplejson>=2.0.9 in /usr/lib/python2.7/dist-packages (from mrjob->-r pip-requirements.txt (line 7))
Requirement already satisfied (use --upgrade to upgrade): six>=1.4 in /usr/lib/python2.7/dist-packages (from unittest2->configparser==3.2.0r3->-r pip-requirements.txt (line 1))
Downloading/unpacking traceback2 (from unittest2->configparser==3.2.0r3->-r pip-requirements.txt (line 1))
  Downloading traceback2-1.4.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): argparse in /usr/lib/python2.7 (from unittest2->configparser==3.2.0r3->-r pip-requirements.txt (line 1))
Downloading/unpacking linecache2 (from traceback2->unittest2->configparser==3.2.0r3->-r pip-requirements.txt (line 1))
  Downloading linecache2-1.0.0-py2.py3-none-any.whl
Installing collected packages: configparser, pyyaml, nltk, mrjob, theano, ordereddict, unittest2, boto, filechunkio, traceback2, linecache2
  Running setup.py install for configparser
    error: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/configparser_helpers.py'
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_iarroyof/configparser/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-PSZTAo-record/install-record.txt --single-version-externally-managed --compile:
    running install

running build

running build_py

creating build

creating build/lib.linux-x86_64-2.7

copying configparser.py -> build/lib.linux-x86_64-2.7

copying configparser_helpers.py -> build/lib.linux-x86_64-2.7

running install_lib

copying build/lib.linux-x86_64-2.7/configparser_helpers.py -> /usr/local/lib/python2.7/dist-packages

error: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/configparser_helpers.py'

----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_iarroyof/configparser/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-PSZTAo-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_iarroyof/configparser
Storing debug log for failure in /home/iarroyof/.pip/pip.log

After that, I supposed su privileges were needed, but that didn't work either:

~/lxmls-toolkit$ sudo pip install -r pip-requirements.txt
[sudo] password for iarroyof: 
Downloading/unpacking configparser==3.2.0r3 (from -r pip-requirements.txt (line 1))
  Downloading configparser-3.2.0r3.tar.gz
  Running setup.py (path:/tmp/pip_build_root/configparser/setup.py) egg_info for package configparser

Downloading/unpacking pyyaml (from -r pip-requirements.txt (line 2))
  Downloading PyYAML-3.11.tar.gz (248kB): 248kB downloaded
  Running setup.py (path:/tmp/pip_build_root/pyyaml/setup.py) egg_info for package pyyaml

Downloading/unpacking nltk (from -r pip-requirements.txt (line 3))
  Downloading nltk-3.0.3.tar.gz (1.0MB): 1.0MB downloaded
  Running setup.py (path:/tmp/pip_build_root/nltk/setup.py) egg_info for package nltk

    warning: no files found matching 'Makefile' under directory '*.txt'
    warning: no previously-included files matching '*~' found anywhere in distribution
Requirement already satisfied (use --upgrade to upgrade): numpy in /usr/lib/python2.7/dist-packages (from -r pip-requirements.txt (line 4))
Requirement already satisfied (use --upgrade to upgrade): scipy in /usr/lib/python2.7/dist-packages (from -r pip-requirements.txt (line 5))
Requirement already satisfied (use --upgrade to upgrade): matplotlib in /usr/lib/pymodules/python2.7 (from -r pip-requirements.txt (line 6))
Downloading/unpacking mrjob (from -r pip-requirements.txt (line 7))
  Downloading mrjob-0.4.4.tar.gz (186kB): 186kB downloaded
  Running setup.py (path:/tmp/pip_build_root/mrjob/setup.py) egg_info for package mrjob

    no previously-included directories found matching 'docs'
    warning: no files found matching '*.sh' under directory 'bootstrap'
Downloading/unpacking theano (from -r pip-requirements.txt (line 8))
  Downloading Theano-0.7.0.tar.gz (2.0MB): 1.3MB downloaded
Cleaning up...
Exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/usr/lib/python2.7/dist-packages/pip/commands/install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1197, in prepare_files
    do_download,
  File "/usr/lib/python2.7/dist-packages/pip/req.py", line 1375, in unpack_url
    self.session,
  File "/usr/lib/python2.7/dist-packages/pip/download.py", line 572, in unpack_http_url
    download_hash = _download_url(resp, link, temp_location)
  File "/usr/lib/python2.7/dist-packages/pip/download.py", line 433, in _download_url
    for chunk in resp_read(4096):
  File "/usr/lib/python2.7/dist-packages/pip/download.py", line 421, in resp_read
    chunk_size, decode_content=False):
  File "/usr/lib/python2.7/dist-packages/urllib3/response.py", line 225, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/lib/python2.7/dist-packages/urllib3/response.py", line 174, in read
    data = self._fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 567, in read
    s = self.fp.read(amt)
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/ssl.py", line 341, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 260, in read
    return self._sslobj.read(len)
SSLError: The read operation timed out

Storing debug log for failure in /home/iarroyof/.pip/pip.log
~/lxmls-toolkit$ 

Any help would be much appreciated.

NOTEBOOK: Sequence Models

Write the exercise text and code for the Sequence Models day into a notebook under labs/notebooks/sequence_models/, using labs/notebooks/non_linear_classifiers/ as a reference.

Notebooks for every day

As @davidbp has been suggesting for a while, we should have all days in Jupyter notebooks.

I did this for the new pytorch deep learning days, see labs/notebooks/non_linear_classifiers/ and labs/notebooks/non_linear_sequence_classifiers/.

There is also a script, labs/convert_notebooks.sh, to automatically create the *.py versions for those who want to work remotely.

It seems a good idea to try to derive the unit tests for each day in a similar way, see #72

svm is provided in student branch

(I'm making an issue so we don't forget for next year.)

The lab guide asks students to implement the SVM primal, but it is provided in the student branch in classifiers/svm.py.

If we remove it, solve.py should download the master version; right now it does not do that.
solve.py also downloads perceptron.py, but that is unnecessary since it is already provided.

lxmls.classifiers.mira

In lxmls.classifiers.mira there are several references to 'y[inst:inst+1,0]', which is equivalent to 'y[inst,0]'.
If the 1 is constant, it would be better to simplify.
If we want it to vary, it would be better to use a variable.
Or add a comment explaining which variations would be possible...
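
A small illustration: the two forms pick out the same value but differ in shape, which may matter for broadcasting:

import numpy as np

y = np.arange(6).reshape(3, 2)
print(y[1:2, 0])  # array([2]) -- a length-1 array
print(y[1, 0])    # 2          -- a scalar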

NOTEBOOK: Linear Classifiers

Write the exercise text and code for the Linear Classifiers day into a notebook under labs/notebooks/linear_classifiers/, using labs/notebooks/non_linear_classifiers/ as a reference.

Ex. 5.5 breaks with theano.config.floatX=float32

Currently Exercise 5.5 (Theano MLP with batch) only works if floatX is set to "float64", even though floatX is usually float32.

It breaks because the data set is loaded as float64:

train_x = scr.train_X.T
train_y = scr.train_y[:, 0]
test_x = scr.test_X.T
test_y = scr.test_y[:, 0]

This fixes it:

train_x = train_x.astype(theano.config.floatX)
train_y = train_y.astype("int32")
test_x = test_x.astype(theano.config.floatX)
test_y = test_y.astype("int32")
