yebuahk / healthcare-fraud-identification-using-pca-anomaly-detection

This project forked from himasagaratluri/healthcare-fraud-identification-using-pca-anomaly-detection

License: MIT License

Introduction

An IDC study states that 40% of enterprises in 2019 will be working to include AI/ML as part of their transformative strategy. Today, AI/ML is beyond the hype cycle, and there are use cases that provide real business value. Customers looking to start their AI/ML journey understand that AI/ML is hard and are looking to partner with cloud providers for support. They are not making this choice based on very specific capabilities or solutions. Customers understand that AI/ML projects are exploratory, need multiple iterations to get right, and require broader capabilities from partners. More than 10,000 customers use AWS today for AI/ML services because they understand that AWS provides the deepest and broadest set of services for AI/ML workloads.

In today's workshop, we discuss the capabilities of Amazon SageMaker, a machine learning platform for developers and data scientists. We will define a problem statement and solve it by applying machine learning using Amazon SageMaker. Amazon SageMaker takes on the undifferentiated heavy lifting involved in the machine learning process and allows developers and data scientists to focus on solving the business problem by following a build, train, and deploy pattern.

Let's work on a healthcare fraud identification use case and apply machine learning to identify anomalous claims that require further investigation. You may find the concepts used in the workshop a bit mathematical, but I would ask you to develop an intuition for the potential applications of the techniques rather than focusing on the maths involved. The technique used in the workshop is broadly applicable to many outlier detection problems on multivariate data.

Learning Objectives

  1. Develop an intuition for the steps involved in the machine learning process.
  2. Understand and implement end-to-end machine learning on Amazon SageMaker to build, train, and deploy a model.
  3. Clone a public Git repo automatically in an Amazon SageMaker notebook during launch.
  4. Perform feature engineering on categorical data using word embeddings with the CBOW (Continuous Bag of Words) technique.
  5. Train and use the PCA algorithm for feature extraction.
  6. Understand how to calculate an anomaly score from the principal components of a PCA model.
  7. Perform visualization to understand anomalous claims.
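
To build some intuition for objectives 5 and 6 before opening the notebook, here is a minimal sketch of one common way to turn principal components into an anomaly score: project each record onto the retained components and sum the squared projections, each divided by the variance of its component. This is an illustration only, not the workshop's own code. It swaps in NumPy's SVD for a trained PCA model, and the function name `pca_anomaly_scores`, the synthetic `claims` data, and the choice of three components are all assumptions made for the example.

```python
import numpy as np


def pca_anomaly_scores(X, n_components):
    """Score each row by its variance-scaled distance in principal-component space."""
    X = np.asarray(X, dtype=float)
    # Center each feature so the components describe variance around the mean.
    centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Variance explained by each retained component.
    variances = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    # Project every row onto the retained components.
    projected = centered @ components.T
    # Sum of squared projections, each divided by its component's variance.
    return np.sum(projected ** 2 / variances, axis=1)


# Synthetic stand-in for engineered claim features (hypothetical data).
rng = np.random.default_rng(0)
claims = rng.normal(size=(200, 5))
claims[0] = [8.0, -8.0, 8.0, -8.0, 8.0]  # one planted outlier

scores = pca_anomaly_scores(claims, n_components=3)
top = int(np.argmax(scores))
print("most anomalous row:", top)
```

Rows with the largest scores are the ones that sit far from the bulk of the data along the main directions of variation, which is what makes them candidates for further investigation.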

Machine Learning Process

Lab 0 - Launch an Amazon SageMaker Jupyter Notebook

Prerequisites and assumptions

  1. To complete this lab, you need a personal laptop and an AWS account that provides access to AWS services.

Steps

  1. Sign in to the AWS Console.
  2. Click Services, search for Amazon SageMaker, and click Amazon SageMaker in the dropdown.
  3. After you land on the Amazon SageMaker console, click Notebook Instances.
  4. Click Create Notebook.
  5. Give the notebook a name you can remember and fill out the configuration details as suggested in the screenshots.
  6. Select an IAM role.
  7. Create a new role if one doesn't exist.
  8. Provide a path to clone the public Git repo that we will use in today's workshop to download the data dictionary and the Jupyter (IPython) notebook.
  9. Provide the path of the Git repo.
  10. Click Create Notebook Instance.
  11. In the Amazon SageMaker console, under Notebook Instances, wait for your notebook instance to start. Observe the status change from Pending to In Service.
  12. Remember the name of your notebook instance and click Open Jupyter for your notebook.
  13. Validate that your data and notebook were cloned from the Git repo.
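
If you prefer scripting to clicking, the console steps above can be approximated with the AWS SDK for Python. The sketch below is an assumption-laden illustration, not part of the workshop: the helper names `build_notebook_request` and `launch_notebook` are hypothetical, the instance type is a guess, and the name, role ARN, and repo URL are placeholders you would replace with your own values. The functions take the SageMaker client as a parameter, so the block itself makes no AWS calls.

```python
def build_notebook_request(name, role_arn, repo_url, instance_type="ml.t2.medium"):
    """Collect the settings entered in the console form into one request."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,      # assumed size, not taken from the workshop
        "RoleArn": role_arn,                # IAM role from steps 6 and 7
        "DefaultCodeRepository": repo_url,  # public Git repo cloned at launch (steps 8 and 9)
    }


def launch_notebook(sagemaker_client, request):
    """Create the notebook instance and block until it is in service."""
    response = sagemaker_client.create_notebook_instance(**request)
    # Waits for the status change from Pending to InService (step 11).
    waiter = sagemaker_client.get_waiter("notebook_instance_in_service")
    waiter.wait(NotebookInstanceName=request["NotebookInstanceName"])
    return response["NotebookInstanceArn"]


# Placeholder values for illustration only; substitute your own.
request = build_notebook_request(
    name="fraud-workshop-notebook",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    repo_url="https://github.com/example-owner/example-repo.git",
)
print(request)
```

To actually launch the instance, you would pass a real client, for example `launch_notebook(boto3.client("sagemaker"), request)`, from an environment with AWS credentials configured.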

Lab 1 - Finish your Lab in Jupyter Notebook

  1. Click on healthcare-fraud-identification-using-PCA-anomaly-detection.ipynb and start working on your lab. From here onwards, all the instructions will be in the Jupyter Notebook. After you have completed all the steps in the Jupyter Notebook, come back and finish the rest of the steps below.

Finish the Lab

  1. Congratulations! You have finished all the labs. Please make sure to delete all resources as described in the section below.

Cleanup Resources

  1. Go to the Amazon SageMaker console to shut down your notebook instance. Select your instance from the list, then select Stop from the Actions drop-down menu.
  2. After your notebook instance is completely stopped, select Delete from the Actions drop-down menu to delete your notebook instance.
  3. Navigate to the S3 console.
  4. Find the bucket created in Lab 1 and click it to list the objects in the bucket.
  5. Navigate to model-tar.gz and delete it by using the Actions menu.
  6. Navigate to the training data file healthcare_fraud_identification_feature_store and delete it by using the Actions menu.
  7. After all the objects in the bucket are deleted, delete the bucket using the Actions menu.
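
The cleanup steps above can also be sketched with the AWS SDK for Python. Treat this as a hedged illustration rather than the workshop's procedure: the helper names `delete_notebook` and `empty_and_delete_bucket` are hypothetical, and unlike the console steps, which delete two named objects, this sketch removes every object it finds in the bucket before deleting the bucket. It does not handle versioned buckets. Both functions take the client as a parameter, so nothing is deleted until you call them with real clients.

```python
def delete_notebook(sagemaker_client, name):
    """Stop the notebook instance, wait until it is stopped, then delete it."""
    sagemaker_client.stop_notebook_instance(NotebookInstanceName=name)
    # Deletion is only allowed once the instance has fully stopped.
    waiter = sagemaker_client.get_waiter("notebook_instance_stopped")
    waiter.wait(NotebookInstanceName=name)
    sagemaker_client.delete_notebook_instance(NotebookInstanceName=name)


def empty_and_delete_bucket(s3_client, bucket):
    """Delete every object in the bucket, then the bucket itself."""
    deleted = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # A page has no "Contents" entry when the bucket is empty.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted.extend(item["Key"] for item in keys)
    # A bucket can only be deleted once it holds no objects.
    s3_client.delete_bucket(Bucket=bucket)
    return deleted
```

With credentials configured, usage would look like `delete_notebook(boto3.client("sagemaker"), "your-notebook-name")` followed by `empty_and_delete_bucket(boto3.client("s3"), "your-bucket-name")`, where both names are the ones you chose during the lab.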
