erikaduan / aws_notes

A repository of AWS command line code snippets and AWS service usage tips for data scientists, platform administrators and product managers.

License: Creative Commons Zero v1.0 Universal

aws data-science

aws_notes's Introduction

AWS usage notes - a comprehensive guide for AWS newbies

This repository contains a series of guides on how to set up and use AWS services required for data analysis and data science. The style conventions are:

  • Code, including AWS command line interface code, is indented in code blocks.
  • AWS console options are styled in bold italic text.
  • Resource names are styled in bold text.

| AWS resource | Topic | Why this is important |
| --- | --- | --- |
| 🤠 Identity & Access Management (IAM) | Create new AWS user groups, users and access policies | IAM is needed to create and manage users and user groups, who often require different access permissions to different AWS resources. Platform governance and security policies tend to be managed via IAM, to ensure that different users have the appropriate level of access to cloud resources for their work requirements. |
| 🪣 S3 bucket | Manage S3 bucket permissions | Data sets and data objects must be stored in a central location. S3 is the central data storage service in AWS and object storage permissions can be further fine-tuned using S3 bucket permissions. |
| 📔 SageMaker | Enable SageMaker IAM roles | SageMaker supports data science work by providing a user-friendly integrated development environment (IDE) connected to EC2 instances and Docker images, so that users can program in languages like Python and R. This is where data analysts and data scientists clean and analyse data and build statistical or machine learning models. To enable SageMaker functionality, SageMaker service permissions must be managed so that SageMaker can interact with all other required AWS services, e.g. S3, Lambda and Glue. |
| 📔 SageMaker | Introduction to SageMaker | SageMaker provides users with at least two different ways of accessing a Linux virtual environment for data science work: through Jupyter notebook instances, or through data science Docker images via the SageMaker Studio IDE. Notebook instances are useful for individual exploratory data science work, whereas SageMaker Studio is more useful for production environment ML models requiring MLOps support. It is important to understand these differences to make an informed decision about where to host different types of data science projects. |
| 📔 SageMaker | Manage R and Python environments | |
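
As a rough illustration of the S3 bucket permissions mentioned in the table above, a bucket policy is a JSON document that lists who may perform which actions on which resources. The sketch below only builds and prints such a document; it does not call AWS. The function name, the bucket name `example-data-bucket`, the account ID and the role name are all hypothetical placeholders and do not come from this repository.

```python
import json


def build_read_only_bucket_policy(bucket_name, principal_arn):
    """Build a read-only S3 bucket policy as a Python dictionary.

    Both arguments are placeholders supplied by the caller; nothing here
    is checked against a real AWS account.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadOnlyAccess",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                # s3:ListBucket applies to the bucket itself,
                # s3:GetObject applies to the objects inside it.
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# Hypothetical bucket and IAM role, used for illustration only.
policy = build_read_only_bucket_policy(
    "example-data-bucket",
    "arn:aws:iam::123456789012:role/example-analyst-role",
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON could then be reviewed and attached to a bucket through the console or the command line; the exact permissions needed will depend on the project.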

Tips on learning to use AWS

  • AWS provides a management console (i.e. GUI) and command line options to perform operations. From the console, a browser-based command line called CloudShell can be opened at the top right panel via the >_ icon; it comes with the AWS command line interface pre-installed.
  • Create AWS services using shell scripts, as this is the most reproducible deployment method. There's nothing wrong with clicking a lot of console buttons, but deploying and documenting your actions with shell scripts or code templates makes them easier to reproduce.
  • AWS resources can also be accessed using the Python software development kit (SDK) boto3. For data transformations, use the awswrangler Python library (AWS SDK for pandas).
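
As a minimal sketch of the boto3 route, the snippet below lists the S3 buckets visible to the current credentials. It assumes boto3 is installed and AWS credentials are already configured. The helper names and the default region `ap-southeast-2` are illustrative assumptions, not taken from this repository.

```python
def make_s3_client(region_name="ap-southeast-2"):
    """Create a boto3 S3 client.

    Assumes boto3 is installed and credentials are configured;
    the default region is a placeholder.
    """
    import boto3  # imported lazily so the helper below works without boto3

    return boto3.client("s3", region_name=region_name)


def list_bucket_names(s3_client):
    """Return the names of all S3 buckets visible to the client's credentials."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]


# Example usage (requires real AWS credentials):
# for name in list_bucket_names(make_s3_client()):
#     print(name)
```

Passing the client in as an argument keeps the listing logic separate from credential handling, which makes it easier to try out with a stand-in client before pointing it at a real account.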

Other resources
