Comments (15)

bbengfort avatar bbengfort commented on July 20, 2024 1

@drorata that's correct on all counts! lambda x: x is a good example of an identity function. You need to specify it because your text is already preprocessed, so Scikit-Learn calls the identity function and passes the tokens straight through to be vectorized.
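
A minimal sketch of what that looks like, assuming the documents have already been split into lists of tokens. The sample documents are made up for illustration; the vectorizer arguments mirror the ones used elsewhere in this thread.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    """Return the input unchanged; the documents are already tokenized."""
    return tokens


# Illustrative, already-preprocessed documents (lists of tokens, not strings).
docs = [["good", "movie"], ["bad", "movie"]]

# With tokenizer=identity, preprocessor=None, and lowercase=False, the
# vectorizer uses the token lists as they are instead of splitting strings.
# Some scikit-learn versions warn that token_pattern is ignored here.
vectorizer = TfidfVectorizer(
    tokenizer=identity, preprocessor=None, lowercase=False
)
X = vectorizer.fit_transform(docs)

print(sorted(vectorizer.vocabulary_))  # ['bad', 'good', 'movie']
```

A named function is used here rather than a lambda because a lambda cannot be pickled with the standard pickle module, which matters if the fitted pipeline is saved later.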

from bbengfort.github.io.

AtomicSpider avatar AtomicSpider commented on July 20, 2024

Hi,
First of all, thanks for the code. It's really helpful.

I'm getting an error while training on a custom dataset.

Error:

vectorizer = model.named_steps['vectorizer']
AttributeError: 'tuple' object has no attribute 'named_steps'

Dataset:
X:

['What is the name of this asset?',
 'What is this asset called?',
 'Is the asset healthy?',
 "How is the asset's health?",
 'Show me asset info.',
 'Show me asset information.',
 'Tell me about the asset.',
 'Show me asset history.']

Y:

['asset_name',
 'asset_name',
 'asset_health',
 'asset_health',
 'asset_info',
 'asset_info',
 'asset_info',
 'asset_history']

bbengfort avatar bbengfort commented on July 20, 2024

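Well, that error is telling you that your model is a tuple, not a Pipeline object. Make sure you instantiate a pipeline, something to the effect of:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Build the model as a Pipeline so that model.named_steps is available.
model = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('classifier', MultinomialNB()),
])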
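
One way a model can end up as a tuple, offered as a guess rather than a diagnosis: a helper that returns several values, like the build call quoted later in this thread (model, secs = build(classifier, X, y)), has its result assigned to a single name. The build function below is a hypothetical stand-in, not the post's code.

```python
import time


def build(factory):
    """Hypothetical helper: return the built object and the elapsed seconds."""
    start = time.time()
    model = factory()
    return model, time.time() - start


# Assigning the result to one name keeps the whole (model, secs) tuple.
result = build(dict)
print(type(result).__name__)  # tuple

# Unpacking the pair gives the model itself.
model, secs = build(dict)
print(type(model).__name__)  # dict
```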

mjahanshahi avatar mjahanshahi commented on July 20, 2024

Hi Ben!

I'm having trouble unpickling a model that I tried with this classifier. I am a python newbie but I suspect I need to pickle the class as well? Can you suggest how I could do that?

bbengfort avatar bbengfort commented on July 20, 2024

@mjahanshahi just pickle the entire object:

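import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# NLTKPreprocessor and identity are the class and function defined in the post.
model = Pipeline([
    ('preprocessor', NLTKPreprocessor()),
    ('vectorizer', TfidfVectorizer(
        tokenizer=identity, preprocessor=None, lowercase=False
    )),
    ('classifier', MultinomialNB()),
])

model.fit(docs, labels)

# Serialize the whole fitted pipeline to disk.
with open('bayes_model.pkl', 'wb') as f:
    pickle.dump(model, f)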

mjahanshahi avatar mjahanshahi commented on July 20, 2024

Thanks @bbengfort
I keep getting the following error message when I unpickle using your directions:
AttributeError: Can't get attribute 'NLTKPreprocessor' on <module '__main__'>
Does the class need to be pickled somehow?

mjahanshahi avatar mjahanshahi commented on July 20, 2024

I just realized what I was doing wrong. I was trying to load the model in a different environment (different python script) so none of the components were being inherited? I was doing this because I wanted to run some side by side tests of different classifiers but I think I know better now.

Thanks for your reply though!
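
For anyone hitting the same error: pickle stores a reference to the class (its module and name), not the class definition, so the definition has to be importable wherever the model is loaded. A small standard-library sketch of that behavior follows; the module name demo_preprocessing and the empty class are made up for illustration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module that defines NLTKPreprocessor.
demo = types.ModuleType("demo_preprocessing")
exec("class NLTKPreprocessor:\n    pass\n", demo.__dict__)
sys.modules["demo_preprocessing"] = demo

saved_class = demo.NLTKPreprocessor
payload = pickle.dumps(saved_class())

# Simulate a script in which the class was never defined.
del demo.NLTKPreprocessor
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc
print(type(load_error).__name__)  # AttributeError

# Once the definition is available again, the same bytes load fine.
demo.NLTKPreprocessor = saved_class
restored = pickle.loads(payload)
print(type(restored).__name__)  # NLTKPreprocessor
```

In practice this usually means keeping the class in its own module and importing that module in every script that loads the pickle.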

shadowcode92 avatar shadowcode92 commented on July 20, 2024

Hi,
Thank you for this code. I got an error during prediction:
"print(model.named_steps['classifier'].labels_.inverse_transform(yhat))
AttributeError: 'SGDClassifier' object has no attribute 'labels_' "
Could you please help me find a solution for this?

bbengfort avatar bbengfort commented on July 20, 2024

When you look at the build_and_evaluate function, there is a line of code:

model, secs = build(classifier, X, y)
model.labels_ = labels

That happens after the model build is complete. This attribute is the LabelEncoder object, which allows for the inverse transform; it seems like this step is missing from your code if there is no attribute labels_.

drorata avatar drorata commented on July 20, 2024

@bbengfort Very nice post! One remark/question: what do you mean by an identity function? Would it be something like lambda x: x? Correct me if I'm wrong, but you need this because the documents in the corpus are already preprocessed using NLTKPreprocessor, right?

njanmo avatar njanmo commented on July 20, 2024

Hi, thanks for this awesome intro to NLTK. I was wondering if there is a simple way to adapt this to process tweets from the twitter_samples corpus or other corpora? (Apologies for the basic question, I am a Python newbie.)

bbengfort avatar bbengfort commented on July 20, 2024

Yes. In the step that says from nltk.corpus import movie_reviews as reviews, simply import the corpus you'd like to use instead. Make sure it has a .raw() method, which returns a string containing the text, and you should be good to go.
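
A rough sketch of the idea, assuming the corpus object exposes fileids() and raw(fileid) the way NLTK's plaintext corpus readers do. The load_documents function and the FakeCorpus class are hypothetical; the stand-in class only exists so the example runs without downloading any data.

```python
def load_documents(corpus):
    """Return one raw text string per file in an NLTK-style corpus reader."""
    return [corpus.raw(fileid) for fileid in corpus.fileids()]


class FakeCorpus:
    """Hypothetical stand-in for a corpus reader (no download needed)."""

    def __init__(self, files):
        self._files = files

    def fileids(self):
        return list(self._files)

    def raw(self, fileid):
        return self._files[fileid]


corpus = FakeCorpus({"a.txt": "A fine film.", "b.txt": "A dull film."})
docs = load_documents(corpus)
print(docs)  # ['A fine film.', 'A dull film.']
```

With NLTK installed and the data downloaded, the same function could be called on the movie_reviews corpus imported in the post. Labels would still need to come from somewhere, since not every corpus is categorized.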

roomm avatar roomm commented on July 20, 2024

Same error as @shadowcode92 :
"print(model.named_steps['classifier'].labels_.inverse_transform(yhat))
AttributeError: 'SGDClassifier' object has no attribute 'labels_' "

and the line model.labels_ = labels is present.

bbengfort avatar bbengfort commented on July 20, 2024

@roomm please see my response to @shadowcode92: there is a line of code in the build_and_evaluate function that assigns the labels_ attribute to keep track of the LabelEncoder. This is just a handy shortcut; you can directly use labels.inverse_transform(yhat) if you'd prefer.
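
A possible reading of the error, sketched under assumptions: the quoted line sets labels_ on the pipeline (model), while the failing call looks for it on the classifier step, which never receives that attribute. The toy documents and targets below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder

# Illustrative toy data, not from the post.
docs = ["good great fun", "bad awful dull", "great fun film", "awful dull film"]
targets = ["pos", "neg", "pos", "neg"]

# Encode string labels as integers; keep the encoder for decoding later.
labels = LabelEncoder()
y = labels.fit_transform(targets)

model = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("classifier", SGDClassifier(random_state=0)),
])
model.fit(docs, y)
model.labels_ = labels  # attached to the pipeline, not to the classifier

yhat = model.predict(["great fun"])

# Works: the encoder lives on the pipeline object.
decoded = model.labels_.inverse_transform(yhat)

# The classifier step itself has no labels_ attribute.
has_labels = hasattr(model.named_steps["classifier"], "labels_")
print(has_labels)  # False
```

Under that reading, model.labels_.inverse_transform(yhat) or labels.inverse_transform(yhat) would be the calls to use.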
