
Comments (5)

SimonBlanke commented on May 22, 2024

I just found the "error". You create the parameters in the search space, but you never use any of them in the objective function, so the model does not change during the optimization run.

Your objective function should look like this:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import cross_val_score

# freq_df (tf-idf features) and y_labels are defined elsewhere in the script

def model(opt):
    # the hyperparameters now come from the search space via `opt`,
    # so the model actually changes from iteration to iteration
    clf_xgb = xgb.XGBClassifier(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        learning_rate=opt["learning_rate"],
        objective="binary:logistic",
        # eta=0.4,
        # max_depth=8,
        subsample=0.5,
        base_score=np.mean(y_labels),
        eval_metric="logloss",
        missing=None,
        use_label_encoder=False,
        seed=42,
    )

    scores = cross_val_score(
        clf_xgb, freq_df, y_labels, cv=5
    )  # cv default is 5; the hyperactive example uses 3

    return scores.mean()


# Configure the range of hyperparameters we want to test out
search_space = {
    "n_estimators": list(range(500, 5000, 100)),
    "max_depth": list(range(6, 12)),
    "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7],
}
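For completeness, a minimal sketch of how this objective function and search space can be plugged into a Hyperactive run (the n_iter value here is only illustrative):

from hyperactive import Hyperactive

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)  # n_iter chosen arbitrarily for the sketch
hyper.run()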

It is funny how I missed this in my answers above. When you are convinced something is wrong, it is sometimes hard to see the obvious.

I will close this issue now, but if you have further questions about this you can ask them here.


SimonBlanke commented on May 22, 2024

Could you provide the entire script? How do I get "jc"? I would like to reproduce this bug. Could you also provide a random_state that shows the bug?
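For concreteness, a random_state here would just be an integer seed that makes the optimizer's moves reproducible, so a run that shows the bug can be replayed exactly; a minimal sketch, assuming add_search accepts a random_state argument as in recent Hyperactive versions:

hyper = Hyperactive()
# fixing the seed reproduces the same sequence of parameter suggestions on every run
hyper.add_search(model, search_space, n_iter=50, random_state=42)
hyper.run()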


SimonBlanke commented on May 22, 2024

My suspicion is that the SimulatedAnnealingOptimizer "sticks" to the edge of the search space. This can happen with these kinds of local optimizers when n_iter is very small.
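If that is the case, explicitly selecting the optimizer and giving it more iterations should behave differently; a minimal sketch, assuming the import path used by recent Hyperactive versions and an arbitrary n_iter:

from hyperactive import Hyperactive
from hyperactive.optimizers import SimulatedAnnealingOptimizer

optimizer = SimulatedAnnealingOptimizer()
hyper = Hyperactive()
# a larger n_iter gives the local optimizer a chance to move away from the search-space edge
hyper.add_search(model, search_space, optimizer=optimizer, n_iter=200)
hyper.run()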


suciokhan commented on May 22, 2024

jc is a script of helper functions I have for taking a dictionary of social media posts (with their authors, dates, and recipients), pulling out the post texts, cleaning and tokenizing them, converting them into tf-idf values, and generating labels for them. I also get the exact same score when I do not use SimulatedAnnealing and instead use the default random optimizer. freq_df is a dataframe of the tf-idf values for each token in the corpus, where each row is a separate document and each column header is a token.
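For readers without the jc helpers, a rough hypothetical sketch of how a freq_df and y_labels of that shape could be built with scikit-learn (the texts and labels below are placeholders, not the actual data):

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# placeholder inputs; the real pipeline extracts and cleans social media post texts
texts = ["first cleaned post", "second cleaned post"]
y_labels = [0, 1]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(texts)

# one row per document, one column per token, matching the freq_df described above
freq_df = pd.DataFrame(tfidf.toarray(), columns=vectorizer.get_feature_names_out())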


suciokhan commented on May 22, 2024

Apologies, but I'm not sure what you mean by providing a random_state; this is admittedly my first rodeo :)

