
Comments (5)

SimonBlanke commented on May 22, 2024

I just found the "error". You create the parameters in the search space, but you never use any of them in the objective function, so the model does not change during the optimization run.

Your objective function should look like this:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_score

# freq_df (tf-idf features) and y_labels are defined elsewhere in the script

def model(opt):
    # the hyperparameters now come from the search space via `opt`,
    # so the model actually changes from iteration to iteration
    clf_xgb = xgb.XGBClassifier(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        learning_rate=opt["learning_rate"],
        objective="binary:logistic",
        # eta=0.4,
        # max_depth=8,
        subsample=0.5,
        base_score=np.mean(y_labels),
        eval_metric="logloss",
        missing=None,
        use_label_encoder=False,
        seed=42,
    )

    scores = cross_val_score(
        clf_xgb, freq_df, y_labels, cv=5
    )  # cv default is 5; the hyperactive example uses 3

    return scores.mean()


# Configure the range of hyperparameters we want to test out
search_space = {
    "n_estimators": list(range(500, 5000, 100)),
    "max_depth": list(range(6, 12)),
    "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7],
}
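For completeness, a minimal sketch of how this objective function and search space can be plugged into a Hyperactive run (the n_iter value here is only illustrative):

from hyperactive import Hyperactive

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)  # n_iter chosen arbitrarily for the sketch
hyper.run()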

It is funny how I missed this in my answers above. When you are convinced something is wrong, it is sometimes hard to see the obvious.

I will close this issue now, but if you have further questions about this you can ask them here.


SimonBlanke commented on May 22, 2024

Could you provide the entire script? How do I get "jc"? I would like to reproduce this bug. Could you also provide a random_state that shows the bug?
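For concreteness, a random_state here would just be an integer seed that makes the optimizer's moves reproducible, so a run that shows the bug can be replayed exactly; a minimal sketch, assuming add_search accepts a random_state argument as in recent Hyperactive versions:

hyper = Hyperactive()
# fixing the seed reproduces the same sequence of parameter suggestions on every run
hyper.add_search(model, search_space, n_iter=50, random_state=42)
hyper.run()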


SimonBlanke commented on May 22, 2024

My suspicion is that the SimulatedAnnealingOptimizer "sticks" to the edge of the search space. This can happen with these kinds of local optimizers when n_iter is very small.
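If that is the case, explicitly selecting the optimizer and giving it more iterations should behave differently; a minimal sketch, assuming the import path used by recent Hyperactive versions and an arbitrary n_iter:

from hyperactive import Hyperactive
from hyperactive.optimizers import SimulatedAnnealingOptimizer

optimizer = SimulatedAnnealingOptimizer()
hyper = Hyperactive()
# a larger n_iter gives the local optimizer a chance to move away from the search-space edge
hyper.add_search(model, search_space, optimizer=optimizer, n_iter=200)
hyper.run()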


suciokhan commented on May 22, 2024

jc is a script of helper functions I have for taking a dictionary of social media posts (with their authors, dates, and recipients), pulling out the post texts, cleaning and tokenizing them, converting them into tf-idf values, and generating labels for them. I also get the exact same score when I do not use SimulatedAnnealing and instead use the default random optimizer. freq_df is a dataframe of the tf-idf values for each token in the corpus, where each row is a separate document and each column header is a token.
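For readers without the jc helpers, a rough hypothetical sketch of how a freq_df and y_labels of that shape could be built with scikit-learn (the texts and labels below are placeholders, not the actual data):

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# placeholder inputs; the real pipeline extracts and cleans social media post texts
texts = ["first cleaned post", "second cleaned post"]
y_labels = [0, 1]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(texts)

# one row per document, one column per token, matching the freq_df described above
freq_df = pd.DataFrame(tfidf.toarray(), columns=vectorizer.get_feature_names_out())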


suciokhan commented on May 22, 2024

Apologies, but I'm not sure what you mean by providing a random_state; this is admittedly my first rodeo :)

