data-stuff's Introduction

data-stuff

Data Science Stuff/Portfolio

Prophet Anomaly Detection

A locally hosted application that identifies anomalous trends in key raw and transformed data sources. The application uses Python machine learning packages and database connections to query data, predict upper and lower bounds, and highlight where data falls above or below expected thresholds. It triggers alerts that are sent directly to me through a Zapier <> Mattermost integration, which significantly reduces downtime and the time taken to diagnose the root cause of anomalous data.
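The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the application's actual code: it assumes the forecast has already been produced and that each observation carries an interval named after Prophet's `yhat_lower` and `yhat_upper` output columns. The `Point` class, the `flag_anomalies` function, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    ds: str            # timestamp label
    y: float           # observed value
    yhat_lower: float  # lower bound of the forecast interval
    yhat_upper: float  # upper bound of the forecast interval


def flag_anomalies(points: List[Point]) -> List[dict]:
    """Return one record per observation that falls outside its forecast interval."""
    anomalies = []
    for p in points:
        if p.y > p.yhat_upper:
            anomalies.append(
                {"ds": p.ds, "direction": "above", "deviation": p.y - p.yhat_upper}
            )
        elif p.y < p.yhat_lower:
            anomalies.append(
                {"ds": p.ds, "direction": "below", "deviation": p.yhat_lower - p.y}
            )
    return anomalies


# Made-up sample data, for illustration only.
sample = [
    Point("2024-01-01", 100.0, 90.0, 110.0),  # inside the interval
    Point("2024-01-02", 130.0, 92.0, 112.0),  # above the upper bound
    Point("2024-01-03", 70.0, 91.0, 111.0),   # below the lower bound
]
alerts = flag_anomalies(sample)
for alert in alerts:
    print(alert)
```

Each returned record could then be forwarded to whatever alerting channel is in use. Recording the size of the deviation alongside the direction makes it easier to prioritize which alerts to investigate first.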

Snowflake Spend Optimization & Forecasting

As your analytics infrastructure grows, so does your database spend. The cost of a database can exceed the value of the analytics infrastructure if the database and its accompanying data models are not optimized. This occurred at my company: our costs began to exceed the value provided, and I identified the need to optimize key data models. To do this, I needed to focus on improving clustering, incremental build logic, and the data stored in cache during each job.

To find the right targets, I developed a custom SQL script that aggregates usage and calculates cost at the level of an individual data model, using custom calculations based on our contract rates as they pertain to specific warehouse types. Snowflake only aggregates cost at the level of an individual warehouse, which limits our ability to identify more granular costs. This script allowed me to identify high-spend and high-usage data models to target for optimization.

Once the data models were identified and optimized, and enough time had passed to establish new daily spend baselines, I performed time series forecasting with Facebook's Prophet package to calculate expected spend for the next fiscal year. The forecast gave our finance team guidance on the upper and lower bounds of database spend, so they could negotiate the minimum contract price that would still allow our analytics infrastructure to grow at the rate our company required.
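The per-model cost attribution described above was written in SQL; the same arithmetic is sketched here in Python so the calculation is easy to follow. Everything in this sketch is an assumption for illustration: the `query_log` rows, the model names, and `PRICE_PER_CREDIT` are hypothetical, and attributing cost by query execution time is only an approximation, since Snowflake bills for warehouse uptime rather than for individual queries. The credits-per-hour values are the standard ones for these warehouse sizes.

```python
from collections import defaultdict

# Credits consumed per hour by standard Snowflake warehouse sizes.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8}

# Hypothetical contract price per credit, in dollars.
PRICE_PER_CREDIT = 3.00


def cost_by_model(query_log):
    """Approximate spend per data model from query execution time.

    Each row needs a model name, the size of the warehouse that ran the
    query, and the execution time in seconds.
    """
    totals = defaultdict(float)
    for row in query_log:
        hours = row["execution_seconds"] / 3600
        credits = hours * CREDITS_PER_HOUR[row["warehouse_size"]]
        totals[row["model"]] += credits * PRICE_PER_CREDIT
    # Highest-spend models first, since those are the optimization targets.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Made-up query history, for illustration only.
query_log = [
    {"model": "fct_events", "warehouse_size": "LARGE", "execution_seconds": 1800},
    {"model": "dim_users", "warehouse_size": "SMALL", "execution_seconds": 1800},
    {"model": "fct_events", "warehouse_size": "LARGE", "execution_seconds": 900},
]
costs = cost_by_model(query_log)
for model, dollars in costs.items():
    print(f"{model}: ${dollars:.2f}")
```

Sorting by spend puts the most expensive models at the top of the list, which matches the goal of picking optimization targets. Idle warehouse time and concurrent queries are not accounted for in this simplified version.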

Customer Retention

Our team released a cloud offering of our previously on-prem solution. Revenue was based on a freemium model that allowed free usage up to a certain number of registered users. We already knew that keeping a cloud workspace and its user base engaged was key to converting free workspaces to paid workspaces. What we wanted to identify were the features with the greatest impact on keeping a workspace engaged, so that product design decisions could encourage workspaces and their users to engage with those features earlier in their lifecycle. To accomplish this, I wrote a SQL script that produced a large set of features and loaded its results into a dataframe. I then created a Python method that looped through several machine learning models, trained each one on the dataframe, and output key performance metrics. Finally, I selected the most performant model based on these metrics, evaluated feature importance, and advised the product team on where to focus their efforts.
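The model comparison loop described above might look roughly like the following. This is a sketch under stated assumptions rather than the original method: the two toy classifiers, the feature columns, and the data are all hypothetical stand-ins, written in plain Python so the example runs without extra packages. In practice the `models` dictionary would hold real estimators, for example from scikit-learn, which expose the same `fit` and `predict` methods. Feature importance is not shown.

```python
from typing import Dict


class MajorityClass:
    """Toy baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdRule:
    """Toy model: predicts 1 when one feature is at or above its training mean."""

    def __init__(self, feature_index: int):
        self.feature_index = feature_index

    def fit(self, X, y):
        values = [row[self.feature_index] for row in X]
        self.threshold_ = sum(values) / len(values)
        return self

    def predict(self, X):
        return [1 if row[self.feature_index] >= self.threshold_ else 0 for row in X]


def score(y_true, y_pred) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def compare_models(models, X_train, y_train, X_test, y_test):
    """Train every model on the same split and collect its metrics."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = score(y_test, model.predict(X_test))
    return results


# Made-up data: columns are [messages_per_user, integrations_enabled];
# label 1 means the workspace stayed engaged.
X_train = [[1, 0], [2, 0], [8, 1], [9, 1], [3, 1], [10, 0], [2, 1]]
y_train = [0, 0, 1, 1, 0, 1, 0]
X_test = [[2, 1], [9, 0], [7, 1], [1, 0]]
y_test = [0, 1, 1, 0]

models = {
    "majority": MajorityClass(),
    "messages_rule": ThresholdRule(feature_index=0),
    "integrations_rule": ThresholdRule(feature_index=1),
}
results = compare_models(models, X_train, y_train, X_test, y_test)
best = max(results, key=lambda name: results[name]["accuracy"])
print(best, results[best])
```

Keeping the train/test split fixed across all models makes the metrics directly comparable. Accuracy is used to pick the winner here only for simplicity; a different metric may be more appropriate when the classes are imbalanced.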

data-stuff's People

Contributors

enelson720
