Comments (10)

owen-t avatar owen-t commented on May 18, 2024

Hi,

The evaluation steps may have finished before you were able to see any updates in TensorBoard. Have you tried running with more evaluation steps? Do you see updates during training?

from sagemaker-python-sdk.

winstonaws avatar winstonaws commented on May 18, 2024

Hi,

We pushed a new image which uses a different parameter to control the frequency of evaluations. You can specify it as follows:

hyperparameters={'throttle_secs': 30}

Here, throttle_secs is the minimum amount of elapsed time, in seconds, between evaluations. By default this value is 600, so TensorBoard will only update once every 10 minutes.

We'll update our example notebooks to document this.
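
To illustrate the effect of throttle_secs described above, here is a minimal sketch in plain Python. It is not the SDK's or TensorFlow's implementation: the function name evaluation_times is hypothetical, and the sketch assumes that an evaluation is attempted at each new checkpoint and skipped when fewer than throttle_secs seconds have passed since the previous one.

```python
def evaluation_times(checkpoint_times, throttle_secs=600):
    """Return the checkpoint times (in seconds) at which an evaluation runs.

    Simplified model (an assumption, not the real training loop): each new
    checkpoint is a chance to evaluate, but the evaluation is skipped if
    fewer than throttle_secs seconds have elapsed since the last one.
    """
    evaluated = []
    last_eval = None
    for t in checkpoint_times:
        if last_eval is None or t - last_eval >= throttle_secs:
            evaluated.append(t)
            last_eval = t
    return evaluated


# Hypothetical job: one checkpoint every 60 seconds for 30 minutes.
checkpoints = list(range(60, 1801, 60))

default_evals = evaluation_times(checkpoints)                     # default of 600
frequent_evals = evaluation_times(checkpoints, throttle_secs=30)  # as in the hyperparameters above

print(len(default_evals), "evaluations with the default throttle")
print(len(frequent_evals), "evaluations with throttle_secs=30")
```

Under these assumptions, the default leaves long gaps between TensorBoard updates, while a throttle shorter than the checkpoint interval lets every checkpoint trigger an evaluation.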

samuelhkahn avatar samuelhkahn commented on May 18, 2024

Thanks for the response @winstonaws! Appreciate you guys being so active in helping out the new SageMaker community!

chang2394 avatar chang2394 commented on May 18, 2024

@winstonaws: I tried setting throttle_secs through hyperparameters as described above, but the value is not reflected when the job runs.
Please let me know if I am doing something wrong.

from sagemaker.tensorflow import TensorFlow

hyperparams = {
    'learning_rate': 0.001,
    'dropout_rate': 0.2,
    'save_checkpoints_steps': 100,
    'save_checkpoints_secs': None,
    'keep_checkpoint_max': None,
    'min_eval_frequency': 100,
    'throttle_secs': 10,
    'eval_throttle_secs': 10
}

# role, model_artifacts_location, custom_code_upload_location,
# train_instance and job_name are defined earlier in the notebook.
estimator = TensorFlow(
    entry_point='xxx.py',
    source_dir='xxx',
    role=role,
    output_path=model_artifacts_location,
    code_location=custom_code_upload_location,
    hyperparameters=hyperparams,
    train_instance_count=1,
    train_instance_type=train_instance,
    training_steps=10000,
    evaluation_steps=100,
    base_job_name=job_name)

samuelhkahn avatar samuelhkahn commented on May 18, 2024

I am not seeing TensorBoard working with the suggested changes either. I thought they might have fixed it, but it doesn't look like it.

chang2394 avatar chang2394 commented on May 18, 2024

@samuelhkahn: were you able to find a workaround for this?

@winstonaws: please suggest what should be done to update the evaluation throttle duration.

winstonaws avatar winstonaws commented on May 18, 2024

@samuelhkahn @chang2394 What version of the Python SDK are you using? Unfortunately, fixes don't go out automatically to the notebook instances at the moment. The fix needed is in this change: #105. Can you try updating to the latest version and rerunning it? You can do this by running the following in your notebook:

!pip install --upgrade sagemaker

If it's still not working correctly, what behavior are you seeing? Does it differ from your experience when trying out https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_resnet_cifar10_with_tensorboard/tensorflow_resnet_cifar10_with_tensorboard.ipynb ? When I run that example I can see the TensorBoard UI as soon as I call fit, and every time the training job evaluates (which happens at the throttle_secs frequency), I see the scalars update.

@chang2394
Can you confirm you are running TensorFlow 1.6? You can do this by viewing the image used by the training job in the SageMaker console, or you can set the version explicitly using the framework_version constructor argument.
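
To answer the version questions above from inside a notebook, a minimal sketch using only the Python standard library (Python 3.8 or later) could look like the following. The helper name installed_version is hypothetical. Note that this reports what is installed in the notebook environment only; the TensorFlow version used by the training job is determined by the training image, as described above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions visible to this notebook kernel.
for name in ("sagemaker", "tensorflow"):
    print(name, installed_version(name) or "not installed")
```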

chang2394 avatar chang2394 commented on May 18, 2024

@winstonaws: I am now able to see data in the TensorBoard UI; I am not sure what caused the issue. Right now, I am using sagemaker 1.2.4 and TensorFlow 1.6.

winstonaws avatar winstonaws commented on May 18, 2024

@chang2394 Great! Are you still having any other problems with TensorBoard?

chang2394 avatar chang2394 commented on May 18, 2024

@winstonaws No, it seems to be working fine as of now. Thanks a lot :)
