This is a simple text classification blog post - quick and easy!


Text Classification with NLTK and Scikit-Learn about bbengfort.github.io HOT 15 CLOSED

bbengfort commented on July 20, 2024

Text Classification with NLTK and Scikit-Learn

from bbengfort.github.io.

Comments (15)

bbengfort commented on July 20, 2024

@drorata that's correct on all counts! lambda x: x is a good example of an identity function. You need to specify it because your text is already preprocessed: Scikit-Learn calls the identity function in place of its own tokenizer, so the tokens pass straight through to be vectorized.
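
A minimal sketch of that pass-through, assuming documents that are already lists of tokens (the two token lists below are made up for illustration). The vectorizer settings mirror the ones used elsewhere in this thread; token_pattern=None is an addition that only silences a warning about the unused default pattern.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    """Return the input unchanged, so pre-tokenized documents pass through."""
    return tokens


# Hypothetical, already-tokenized documents (e.g. the output of a preprocessor).
docs = [
    ["show", "asset", "info"],
    ["show", "asset", "history"],
]

vectorizer = TfidfVectorizer(
    tokenizer=identity,   # use the tokens as given instead of re-tokenizing
    preprocessor=None,    # no custom string preprocessing
    lowercase=False,      # documents are lists, so .lower() must not be called
    token_pattern=None,   # unused when a tokenizer is supplied
)
X = vectorizer.fit_transform(docs)

vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(X.shape)
```

A named function is used here rather than a lambda because the standard pickle module cannot serialize lambdas, which matters once the fitted pipeline is saved to disk.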


AtomicSpider commented on July 20, 2024

Hi,
First of all, thanks for the code. It's really helpful.

I'm getting an error while training on a custom dataset.

Error:

vectorizer = model.named_steps['vectorizer']
AttributeError: 'tuple' object has no attribute 'named_steps'

Dataset:
X:

['What is the name of this asset?',
 'What is this asset called?',
 'Is the asset healthy?',
 "How is the asset's health?",
 'Show me asset info.',
 'Show me asset information.',
 'Tell me about the asset.',
 'Show me asset history.']

y:

['asset_name',
 'asset_name',
 'asset_health',
 'asset_health',
 'asset_info',
 'asset_info',
 'asset_info',
 'asset_history']


bbengfort commented on July 20, 2024

Well, that error is telling you that your model is a tuple, not a Pipeline object. Make sure you instantiate a pipeline, something to the effect of:

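# Imports for the pipeline and its two steps
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# model must be the Pipeline itself, not a tuple that contains it
model = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', MultinomialNB()),
])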
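
One way such a tuple can arise, sketched under the assumption that the pipeline is created by a helper that returns both the fitted model and the elapsed time (the model, secs = build(classifier, X, y) line quoted later in this thread suggests that shape). The build function below is a simplified stand-in, not the post's code. If its return value is not unpacked, the variable holds the whole tuple.

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build(X, y):
    """Simplified stand-in: fit a pipeline and return (model, seconds)."""
    start = time.perf_counter()
    model = Pipeline([
        ('vectorizer', TfidfVectorizer()),
        ('classifier', MultinomialNB()),
    ])
    model.fit(X, y)
    return model, time.perf_counter() - start


# The dataset posted above.
X = [
    'What is the name of this asset?',
    'What is this asset called?',
    'Is the asset healthy?',
    "How is the asset's health?",
    'Show me asset info.',
    'Show me asset information.',
    'Tell me about the asset.',
    'Show me asset history.',
]
y = [
    'asset_name', 'asset_name', 'asset_health', 'asset_health',
    'asset_info', 'asset_info', 'asset_info', 'asset_history',
]

result = build(X, y)        # not unpacked: result is a (model, secs) tuple
model, secs = build(X, y)   # unpacked: model is the Pipeline

print(type(result).__name__)
vectorizer = model.named_steps['vectorizer']  # works on the Pipeline
print(type(vectorizer).__name__)
```

A stray trailing comma after the Pipeline(...) call has the same effect, since it also turns the assignment into a one-element tuple.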


mjahanshahi commented on July 20, 2024

Hi Ben!

I'm having trouble unpickling a model that I tried with this classifier. I am a Python newbie, but I suspect I need to pickle the class as well. Can you suggest how I could do that?


bbengfort commented on July 20, 2024

@mjahanshahi just pickle the entire object:

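import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# NLTKPreprocessor and identity are defined in the blog post's code
model = Pipeline([
    ('preprocessor', NLTKPreprocessor()),
    ('vectorizer', TfidfVectorizer(
        tokenizer=identity, preprocessor=None, lowercase=False
    )),
    ('classifier', MultinomialNB()),
])

# docs and labels are your training documents and their labels
model.fit(docs, labels)

# Serialize the whole fitted pipeline in one call
with open('bayes_model.pkl', 'wb') as f:
    pickle.dump(model, f)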


mjahanshahi commented on July 20, 2024

Thanks @bbengfort.
I keep getting the following error message when I unpickle using your directions:
AttributeError: Can't get attribute 'NLTKPreprocessor' on <module '__main__' >
Does the class need to be pickled somehow?


mjahanshahi commented on July 20, 2024

I just realized what I was doing wrong. I was trying to load the model in a different environment (a different Python script), so none of the components were available there? I was doing this because I wanted to run some side-by-side tests of different classifiers, but I think I know better now.

Thanks for your reply though!
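
The behavior described above follows from how pickle stores instances: it records the class by module and name rather than copying the class definition, so the loading script must be able to import the same class. A minimal standard-library sketch, in which demo_components and Preprocessor are hypothetical stand-ins for the module and class that define a pipeline's custom steps:

```python
import pickle
import sys
import types


class Preprocessor:
    """Hypothetical stand-in for a custom pipeline component."""

    def __init__(self, lower=True):
        self.lower = lower


# Register the class under a synthetic module name, as if it had been
# defined in demo_components.py and imported from there.
components = types.ModuleType("demo_components")
Preprocessor.__module__ = "demo_components"
Preprocessor.__qualname__ = "Preprocessor"
components.Preprocessor = Preprocessor
sys.modules["demo_components"] = components

# The payload names the module and class; it does not contain the class body.
payload = pickle.dumps(Preprocessor(lower=False))
print(b"demo_components" in payload, b"Preprocessor" in payload)

# Simulate a script in which the class is not defined.
del components.Preprocessor
try:
    pickle.loads(payload)
    load_error = None
except (AttributeError, pickle.UnpicklingError) as exc:
    load_error = str(exc)
print(load_error)

# Once the class can be found again, the same payload loads.
components.Preprocessor = Preprocessor
restored = pickle.loads(payload)
print(type(restored).__name__, restored.lower)
```

In practice this means keeping custom classes such as NLTKPreprocessor in a module that both the training script and the loading script import, rather than defining them only in the script that is run directly.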


bbengfort commented on July 20, 2024

Ok, glad you figured it out!


shadowcode92 commented on July 20, 2024

Hi,
Thank you for this code, sir. I got an error during prediction:

print(model.named_steps['classifier'].labels_.inverse_transform(yhat))
AttributeError: 'SGDClassifier' object has no attribute 'labels_'

Could you please help me find a solution?


bbengfort commented on July 20, 2024

When you look at the build_and_evaluate function, there is a line of code:

model, secs = build(classifier, X, y)
model.labels_ = labels

That happens after the model build is complete. The labels_ attribute holds the LabelEncoder object, which provides the inverse transform. If there is no labels_ attribute, this step is probably missing from your code.
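
For readers unfamiliar with LabelEncoder, a small sketch of the round trip it provides, using the label names from the dataset posted earlier in this thread. The variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

names = ['asset_name', 'asset_health', 'asset_info', 'asset_history']

labels = LabelEncoder()
y = labels.fit_transform(names)        # strings -> integer codes
decoded = labels.inverse_transform(y)  # integer codes -> original strings

print(list(labels.classes_))  # the encoder keeps its classes in sorted order
print(list(y))
print(list(decoded))
```

A classifier trained on the integer codes predicts integer codes, which is why the encoder has to be kept around to turn predictions back into label names.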


drorata commented on July 20, 2024

@bbengfort Very nice post! One remark/question: what do you mean by an identity function? Would it be something like lambda x: x? Correct me if I'm wrong, but you need this because the documents in the corpus are already preprocessed by NLTKPreprocessor, right?


njanmo commented on July 20, 2024

Hi, thanks for this awesome intro to NLTK. I was wondering if there is a simple way to adapt this to process tweets from the twitter_samples corpus or other corpora? (Apologies for the basic question, I am a Python newbie.)


bbengfort commented on July 20, 2024

Yes. In the step that says from nltk.corpus import movie_reviews as reviews, simply import the corpus you'd like to use instead. Make sure it has a .raw() method, which returns a string containing the text, and you should be good to go.
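
A rough sketch of what swapping the corpus involves, assuming the loading step reads each document with raw(fileid) and takes the first entry of categories(fileid) as its label. That loop is an assumption about the post's code, and InMemoryCorpus is a made-up stand-in so the sketch runs without downloading any NLTK data.

```python
class InMemoryCorpus:
    """Made-up stand-in exposing the reader methods the loader relies on."""

    def __init__(self, documents):
        # documents maps fileid -> (category, text)
        self._documents = documents

    def fileids(self):
        return list(self._documents)

    def categories(self, fileid):
        return [self._documents[fileid][0]]

    def raw(self, fileid):
        return self._documents[fileid][1]


def load_corpus(corpus):
    """Collect raw texts and labels from any corpus with the methods above."""
    X = [corpus.raw(fileid) for fileid in corpus.fileids()]
    y = [corpus.categories(fileid)[0] for fileid in corpus.fileids()]
    return X, y


reviews = InMemoryCorpus({
    'pos/1.txt': ('pos', 'A warm, funny film.'),
    'neg/1.txt': ('neg', 'Two hours I will not get back.'),
})
X, y = load_corpus(reviews)
print(X)
print(y)
```

A corpus without categories, or one whose raw() output is not plain text, would need its own loading code in place of load_corpus.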


roomm commented on July 20, 2024

Same error as @shadowcode92:

print(model.named_steps['classifier'].labels_.inverse_transform(yhat))
AttributeError: 'SGDClassifier' object has no attribute 'labels_'

and the line model.labels_ = labels is present.


bbengfort commented on July 20, 2024

@roomm please see my response to @shadowcode92. There is a line of code in the build_and_evaluate function that assigns the labels_ attribute to keep track of the LabelEncoder. This is just a handy shortcut; you can use labels.inverse_transform(yhat) directly if you'd prefer.
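
One detail worth checking in both tracebacks: the attribute is read from the classifier step (model.named_steps['classifier'].labels_), while the assignment model.labels_ = labels attaches it to the pipeline object. A minimal sketch of the difference, assuming a plain TfidfVectorizer and SGDClassifier pipeline trained on the small dataset posted earlier in the thread (this is not the post's build function):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

X = [
    'What is the name of this asset?',
    'What is this asset called?',
    'Is the asset healthy?',
    "How is the asset's health?",
    'Show me asset info.',
    'Show me asset information.',
    'Tell me about the asset.',
    'Show me asset history.',
]
y = [
    'asset_name', 'asset_name', 'asset_health', 'asset_health',
    'asset_info', 'asset_info', 'asset_info', 'asset_history',
]

labels = LabelEncoder()
y_encoded = labels.fit_transform(y)

model = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', SGDClassifier(random_state=0)),
])
model.fit(X, y_encoded)
model.labels_ = labels  # the attribute is set on the Pipeline object

yhat = model.predict(['Is the asset healthy?'])

# The pipeline has labels_; the classifier step inside it does not.
on_pipeline = hasattr(model, 'labels_')
on_classifier = hasattr(model.named_steps['classifier'], 'labels_')
print(on_pipeline, on_classifier)

# Either of these decodes the prediction back to a label name.
print(model.labels_.inverse_transform(yhat))
print(labels.inverse_transform(yhat))
```

Under these assumptions, reading model.labels_ or using the encoder directly avoids the AttributeError, whereas going through named_steps['classifier'] reproduces it.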
