Comments (10)
Hi,
The evaluation steps may have finished before you were able to see any updates in TensorBoard. Have you tried running with more evaluation steps? Do you see updates during training?
from sagemaker-python-sdk.
Hi,
We pushed a new image which uses a different parameter to control the frequency of evaluations. You can specify it as follows:
hyperparameters={'throttle_secs': 30}
Here throttle_secs is the minimum time, in seconds, that must elapse between evaluations. The default is 600, so evaluation results update at most once every 10 minutes.
We'll update our example notebooks to document this.
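To illustrate the effect of a minimum interval between evaluations, here is a minimal sketch. It is a simplified model written for this thread, not the actual TensorFlow or SageMaker implementation: the function name evaluation_times and the assumption that each new checkpoint is a candidate for evaluation are hypothetical.

```python
def evaluation_times(checkpoint_times, throttle_secs=600):
    """Return the checkpoint times at which an evaluation would run.

    Hypothetical model: every checkpoint is a candidate for evaluation, but
    one only runs if at least throttle_secs have elapsed since the last one.
    """
    evaluated = []
    last_eval = None
    for t in sorted(checkpoint_times):
        if last_eval is None or t - last_eval >= throttle_secs:
            evaluated.append(t)
            last_eval = t
    return evaluated


# Assume checkpoints are written every 60 seconds over 30 minutes.
checkpoints = list(range(0, 1801, 60))

default_evals = evaluation_times(checkpoints)                     # 600-second throttle
frequent_evals = evaluation_times(checkpoints, throttle_secs=30)  # 30-second throttle

print(len(default_evals), len(frequent_evals))
```

Under these assumptions the default throttle yields only a handful of evaluations over half an hour, while a 30-second throttle evaluates at every checkpoint, which is why TensorBoard can appear idle with the default setting.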
Thanks for the response @winstonaws! Appreciate you guys being so active in helping out the new SageMaker community!
@winstonaws: I have tried setting throttle_secs through hyperparameters as described above, but the value is not reflected when the job runs.
Please let me know if I am doing something wrong.
hyperparams = {
    'learning_rate': 0.001,
    'dropout_rate': 0.2,
    'save_checkpoints_steps': 100,
    'save_checkpoints_secs': None,
    'keep_checkpoint_max': None,
    'min_eval_frequency': 100,
    'throttle_secs': 10,
    'eval_throttle_secs': 10
}
estimator = TensorFlow(
    entry_point='xxx.py',
    source_dir='xxx',
    role=role,
    output_path=model_artifacts_location,
    code_location=custom_code_upload_location,
    hyperparameters=hyperparams,
    train_instance_count=1,
    train_instance_type=train_instance,
    training_steps=10000,
    evaluation_steps=100,
    base_job_name=job_name)
I am not seeing TensorBoard work with the suggested changes either. I thought they might have fixed it, but it doesn't look like it.
@samuelhkahn: Were you able to find a workaround for this?
@winstonaws: Please suggest what should be done to update the evaluation throttle duration.
@samuelhkahn @chang2394 What version of the Python SDK are you using? Unfortunately, fixes are not pushed automatically to notebook instances at the moment. The fix you need is in this change: #105. Can you try updating to the latest version and rerunning? You can do this by running the following in your notebook:
!pip install --upgrade sagemaker
If it's still not working correctly, what behavior are you seeing? Does it differ from your experience when trying out https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_resnet_cifar10_with_tensorboard/tensorflow_resnet_cifar10_with_tensorboard.ipynb ? When I run that example I can see the TensorBoard UI as soon as I call fit, and every time the training job evaluates (which happens at the throttle_secs frequency), I see the scalars update.
@chang2394
Can you confirm you are running TensorFlow 1.6? You can do this by viewing the image used by the training job in the SageMaker console, or you can set the version explicitly using the framework_version constructor argument.
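To answer the SDK version question from the notebook itself, a minimal sketch using only the Python standard library might look like the following. The helper name installed_version is hypothetical, and importlib.metadata requires Python 3.8 or later. Note that this only reports what is installed in the notebook environment, not the image used by the training job.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string for package_name, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the locally installed versions of the packages discussed in this thread.
for name in ("sagemaker", "tensorflow"):
    print(name, installed_version(name) or "not installed")
```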
@winstonaws: I am now able to see data in the TensorBoard UI; I'm not sure what caused the issue. I am currently using sagemaker 1.2.4 and TensorFlow 1.6.
@chang2394 Great! Are you still having any other problems with tensorboard?
@winstonaws No, it seems to be working fine as of now. Thanks a lot :)