
quiltdata / quilt-sagemaker-demo

36 stars, 8 watchers, 35 forks, 2.85 MB

Example custom model image trainable and distributable via AWS SageMaker

Languages: Dockerfile 0.11%, Python 0.43%, Jupyter Notebook 99.37%, Shell 0.10%

quilt-sagemaker-demo's Introduction

quilt-sagemaker-demo

The code and environment samples in this repository comprise an MVP for running custom machine learning training and deployment on Amazon SageMaker.

Amazon SageMaker is a machine learning service and SDK that allows you to build and deploy machine learning algorithms on AWS. The SageMaker documentation focuses on building and deploying models using prebuilt Docker images. However, it is also possible (and extremely useful!) to build and deploy your own custom Docker images.

This repository provides a simple recipe for doing so. To get started, read the article (link forthcoming).
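
For orientation, here is a minimal sketch of what training and deploying such a custom image looks like with the SageMaker Python SDK (v2 naming). The ECR image URI, IAM role, and S3 paths are placeholders, not values taken from this repository:

    import sagemaker
    from sagemaker.estimator import Estimator

    # Placeholders: substitute your own ECR image, IAM role, and S3 training data.
    image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/quilt-sagemaker-demo:latest"
    role = sagemaker.get_execution_role()

    estimator = Estimator(
        image_uri=image_uri,           # the custom Docker image pushed to ECR
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    # Train against data staged in S3, then stand the model up behind an endpoint.
    estimator.fit({"train": "s3://your-bucket/fashion-mnist/train"})
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

The custom image is expected to honor SageMaker's bring-your-own-container contract: the container is invoked with a train argument during training and serve during hosting.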

quilt-sagemaker-demo's People

Contributors

residentmario


quilt-sagemaker-demo's Issues

What is iam:GetRole?

"If you are running locally, make sure that the account you are running this notebook under has all of the necessary permissions: S3ReadOnlyAccess, SagemakerFullAccess, iam:GetRole, and ECRFullAccess."

What is "iam:GetRole" and how do you add it to a role?

Kernel not found.

Hi! Thank you for the Dockerfile!
I am facing an error while running this. I believe it happens when
jupyter nbconvert --execute --ExecutePreprocessor.timeout=-1 --to notebook --inplace build.ipynb
is executed; I get the following error:

[NbConvertApp] Converting notebook build.ipynb to notebook
Traceback (most recent call last):
  File "/usr/local/bin/jupyter-nbconvert", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/site-packages/jupyter_core/application.py", line 267, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 338, in start
    self.convert_notebooks()
  File "/usr/local/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 508, in convert_notebooks
    self.convert_single_notebook(notebook_filename)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 479, in convert_single_notebook
    output, resources = self.export_single_notebook(notebook_filename, resources, input_buffer=input_buffer)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/nbconvertapp.py", line 408, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 179, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 197, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/exporters/notebook.py", line 32, in from_notebook_node
    nb_copy, resources = super(NotebookExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 139, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/exporters/exporter.py", line 316, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 379, in preprocess
    with self.setup_preprocessor(nb, resources, km=km):
  File "/usr/local/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 324, in setup_preprocessor
    self.km, self.kc = self.start_new_kernel(cwd=path)
  File "/usr/local/lib/python3.5/site-packages/nbconvert/preprocessors/execute.py", line 271, in start_new_kernel
    km.start_kernel(extra_arguments=self.extra_arguments, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/jupyter_client/manager.py", line 246, in start_kernel
    kernel_cmd = self.format_kernel_cmd(extra_arguments=extra_arguments)
  File "/usr/local/lib/python3.5/site-packages/jupyter_client/manager.py", line 170, in format_kernel_cmd
    cmd = self.kernel_spec.argv + extra_arguments
  File "/usr/local/lib/python3.5/site-packages/jupyter_client/manager.py", line 82, in kernel_spec
    self._kernel_spec = self.kernel_spec_manager.get_kernel_spec(self.kernel_name)
  File "/usr/local/lib/python3.5/site-packages/jupyter_client/kernelspec.py", line 236, in get_kernel_spec
    raise NoSuchKernel(kernel_name)
jupyter_client.kernelspec.NoSuchKernel: No such kernel named conda_python3

Could you help me with what I am doing wrong?
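
Not a confirmed fix, but the last line of the traceback (NoSuchKernel: No such kernel named conda_python3) suggests the notebook's saved kernelspec points at a conda_python3 kernel that does not exist inside the Docker image. One workaround is to force nbconvert to use the image's default python3 kernel, either by adding --ExecutePreprocessor.kernel_name=python3 to the command above or programmatically, as in this sketch:

    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    # Execute build.ipynb with the kernel that actually exists in the image
    # ("python3"), ignoring the "conda_python3" spec saved in its metadata.
    nb = nbformat.read("build.ipynb", as_version=4)
    ep = ExecutePreprocessor(timeout=-1, kernel_name="python3")
    ep.preprocess(nb, {"metadata": {"path": "."}})
    nbformat.write(nb, "build.ipynb")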

UnexpectedStatusException: Error for Training job

[NbConvertApp] Executing notebook with kernel: python3
2020-02-12 13:39:23.305764: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-02-12 13:39:23.305848: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2020-02-12 13:39:23.305883: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-10-0-107-85.ec2.internal): /proc/driver/nvidia/version does not exist
2020-02-12 13:39:23.306646: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-12 13:39:23.322750: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300040000 Hz
2020-02-12 13:39:23.323857: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a100eba270 executing computations on platform Host. Devices:
2020-02-12 13:39:23.323885: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
[NbConvertApp] ERROR | Error while converting 'build.ipynb'
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/nbconvert/nbconvertapp.py", line 410, in export_single_notebook
    output, resources = self.exporter.from_filename(notebook_filename, resources=resources)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 179, in from_filename
    return self.from_file(f, resources=resources, **kw)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 197, in from_file
    return self.from_notebook_node(nbformat.read(file_stream, as_version=4), resources=resources, **kw)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/exporters/notebook.py", line 32, in from_notebook_node
    nb_copy, resources = super(NotebookExporter, self).from_notebook_node(nb, resources, **kw)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 139, in from_notebook_node
    nb_copy, resources = self._preprocess(nb_copy, resources)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/exporters/exporter.py", line 316, in _preprocess
    nbc, resc = preprocessor(nbc, resc)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/preprocessors/base.py", line 47, in __call__
    return self.preprocess(nb, resources)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 405, in preprocess
    nb, resources = super(ExecutePreprocessor, self).preprocess(nb, resources)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/preprocessors/base.py", line 69, in preprocess
    nb.cells[index], resources = self.preprocess_cell(cell, resources, index)
  File "/usr/local/lib/python3.6/site-packages/nbconvert/preprocessors/execute.py", line 448, in preprocess_cell
    raise CellExecutionError.from_cell_and_msg(cell, out)
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:

epochs = 3
history = model.fit_generator(train_generator,
                              epochs=epochs,
                              validation_data=validation_generator,
                              validation_steps=total_validate//batch_size,
                              steps_per_epoch=total_train//batch_size,
                              callbacks=callbacks)

---------------------------------------------------------------------------
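
The notebook output above cuts off before the underlying exception, so the cause is not visible here. When a training job fails with UnexpectedStatusException, the failure reason recorded by the SageMaker service is usually more informative; a minimal sketch for retrieving it with boto3, using a placeholder job name:

    import boto3

    # Hypothetical job name -- use the name the SageMaker SDK printed when the
    # training job was created.
    job_name = "quilt-sagemaker-demo-training-job"

    sm = boto3.client("sagemaker")
    desc = sm.describe_training_job(TrainingJobName=job_name)

    # FailureReason carries the service-side explanation for a failed job.
    print(desc["TrainingJobStatus"])
    print(desc.get("FailureReason", "<no failure reason recorded>"))

The full container logs are also available in the CloudWatch log group /aws/sagemaker/TrainingJobs.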

"No Such Key" error when running the t4.Package.install command

Hi,

I am trying to follow along with your instructions from the Medium post. I get to this point:

    import t4
    t4.Package.install("quilt/fashion_mnist", registry="s3://quilt-example", dest=".")

and then I get the following error:

NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

I tried several different S3 buckets, but I get the exact same error. Could you please help?
Thanks so much!
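
Not something the repository confirms, but one possibility worth checking: the t4 client was later renamed quilt3, and the contents of the example registry may have shifted since the article was written. Here is a sketch that lists what the registry actually contains before installing, using the newer client (whether quilt/fashion_mnist is still published under that name is an assumption):

    import quilt3

    registry = "s3://quilt-example"

    # See which packages the registry actually serves before trying to install one.
    for name in quilt3.list_packages(registry):
        print(name)

    # If the expected name appears, this mirrors the t4 call above.
    quilt3.Package.install("quilt/fashion_mnist", registry=registry, dest=".")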
