rknutalapati / databrickscontent

This project forked from ryanchynoweth44/databrickscontent

Examples surrounding Databricks.

Introduction

This repository aims to provide various Databricks tutorials and demos.

If you would like to follow along, check out the Databricks Community Cloud.

Demos

Stream Databricks Example

The demo is broken into logical sections using the New York City Taxi Tips dataset. Please complete them in the following order:

  1. Send Data to Azure Event Hub (python)
  2. Read Data from Azure Event Hub (scala)
  3. Train a Basic Machine Learning Model on Databricks (scala)
  4. Create a New Send Data Notebook
  5. Make Streaming Predictions
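The first step sends taxi records to Azure Event Hub from Python. The following is a minimal sketch that assumes the `azure-eventhub` (v5) SDK and hypothetical column names such as `tip_amount` and `fare_amount`. The demo notebook may send its data differently.

```python
import json
from typing import Iterable, List, Mapping


def to_event_payloads(rows: Iterable[Mapping]) -> List[str]:
    """Serialize each row (a dict of column -> value) to a compact JSON string."""
    return [json.dumps(dict(row), separators=(",", ":"), sort_keys=True) for row in rows]


def send_rows(connection_str: str, eventhub_name: str, rows: Iterable[Mapping]) -> None:
    """Send rows to Azure Event Hubs in size-limited batches (needs azure-eventhub installed)."""
    # Imported here so the serialization helper above works without the SDK installed.
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        connection_str, eventhub_name=eventhub_name
    )
    with producer:
        batch = producer.create_batch()
        pending = 0  # number of events in the current, unsent batch
        for payload in to_event_payloads(rows):
            try:
                batch.add(EventData(payload))
            except ValueError:
                # The current batch reached its size limit: send it and start a new one.
                producer.send_batch(batch)
                batch = producer.create_batch()
                batch.add(EventData(payload))
                pending = 0
            pending += 1
        if pending:
            producer.send_batch(batch)
```

Serializing with sorted keys keeps payloads deterministic, which makes the JSON schema easy to declare when the Scala notebook parses the stream.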

Databricks Delta

The demo is broken into logical sections. Please complete them in the following order:

  1. Setup Environment
  2. Data Ingestion
  3. Bronze Data to Silver Data
  4. A quick ML Model
  5. Silver Data To Gold Data
  6. A Few Cool Features of Delta
  7. Summary
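Steps 3 and 5 follow the bronze (raw), silver (cleaned), gold (aggregated) layering. The pure-Python sketch below models that flow on lists of dicts for illustration only. The column names (`trip_id`, `zone`, `fare`) and the cleaning rules are hypothetical. In the demo, each layer is a Delta table processed with Spark.

```python
from collections import defaultdict
from typing import Dict, List


def bronze_to_silver(bronze: List[dict]) -> List[dict]:
    """Clean raw records: drop incomplete rows, cast types, and deduplicate on trip_id."""
    seen = set()
    silver = []
    for raw in bronze:
        if raw.get("trip_id") is None or raw.get("fare") in (None, ""):
            continue  # reject incomplete raw records
        if raw["trip_id"] in seen:
            continue  # keep only the first copy of a duplicated record
        seen.add(raw["trip_id"])
        silver.append({
            "trip_id": raw["trip_id"],
            "zone": raw.get("zone") or "unknown",
            "fare": float(raw["fare"]),  # raw values arrive as strings
        })
    return silver


def silver_to_gold(silver: List[dict]) -> Dict[str, float]:
    """Aggregate cleaned records into a reporting table: total fare per zone."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver:
        totals[row["zone"]] += row["fare"]
    return dict(totals)
```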

Programmatically Generate a Databricks Access Token

Using service principals to automate the creation of a Databricks access token.

  1. README
  2. Reference Blog
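The flow has two requests. First, request an Azure AD token for the service principal using the client-credentials grant. Second, present that token to the Databricks Token API (`/api/2.0/token/create`), which returns a `token_value`. This is a minimal sketch using only the standard library. It assumes the Azure AD v1 token endpoint and the commonly documented Azure Databricks resource ID, so verify both against current docs. It also assumes the service principal is already a workspace user; otherwise, additional Azure management headers are required. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Application ID commonly documented for the Azure Databricks resource in Azure AD.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_aad_token_request(tenant_id: str, client_id: str, client_secret: str,
                            resource: str = DATABRICKS_RESOURCE_ID) -> urllib.request.Request:
    """Client-credentials request for an Azure AD access token (v1 endpoint)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=body, method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_token_create_request(workspace_url: str, aad_token: str,
                               lifetime_seconds: int = 3600,
                               comment: str = "created by service principal") -> urllib.request.Request:
    """Request to the Databricks Token API that creates a personal access token."""
    body = json.dumps({"lifetime_seconds": lifetime_seconds, "comment": comment}).encode("utf-8")
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/token/create",
        data=body, method="POST",
        headers={"Authorization": f"Bearer {aad_token}", "Content-Type": "application/json"},
    )


def send_json(request: urllib.request.Request) -> dict:
    """Send a request and decode its JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, against a real tenant and workspace: `aad = send_json(build_aad_token_request(tenant, app_id, secret))["access_token"]`, then `pat = send_json(build_token_create_request(workspace_url, aad))["token_value"]`. Building the requests separately from sending them keeps the logic testable without network access.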

Delta Lake Views

This is a lie. Delta Lake does not actually support views, but they are a common ask from many clients. Whether views are desired to help enforce row-level security or to provide different views of the data, here are a few ways to get it done.

  1. README
  2. Hive Views with Delta Lake
  3. Delta Lake "Views"
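One common approach is a metastore (Hive) view defined over a Delta table. The following is a minimal sketch that assumes a Databricks environment where `spark.sql` can run the statement. The helper name, view name, path, and filter column are hypothetical. The helper only builds the SQL text, and it rejects input containing quote characters rather than attempting to escape it.

```python
def build_row_filter_view_sql(view_name: str, delta_path: str, column: str, value: str) -> str:
    """Build a CREATE VIEW statement exposing only the rows where `column` equals `value`."""
    if "'" in value or "`" in delta_path:
        # Keep the sketch simple: refuse input that would need SQL escaping.
        raise ValueError("value or path contains a quote character; sanitize it first")
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT * FROM delta.`{delta_path}` "
        f"WHERE {column} = '{value}'"
    )
```

On a cluster this would be run as `spark.sql(build_row_filter_view_sql("sales_east", "/mnt/delta/sales", "region", "east"))`. The view only filters rows; restricting who may query the underlying table is a separate access-control step.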

Delta Lake CDC Operations

Batch processing changes within a Delta Lake is common practice and easy to do. We provide a few examples of how to use the Delta Lake time travel capabilities to get different views of how a table has changed between two versions.

  1. README
  2. Python Script
  3. Scala Script
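Comparing two versions conceptually involves two steps. First, read both snapshots, for example with `spark.read.format("delta").option("versionAsOf", n).load(path)`. Second, classify rows by primary key. The pure-Python model below shows that classification on lists of dicts. The `id` key and the sample columns are hypothetical, and the repository's scripts may compute the diff differently, for example with joins or `exceptAll`.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def diff_versions(old: List[Row], new: List[Row], key: str) -> Tuple[List[Row], List[Row], List[Tuple[Row, Row]]]:
    """Compare two table snapshots on a primary key.

    Returns (inserted rows, deleted rows, (before, after) pairs for updated rows).
    """
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    inserted = [row for k, row in new_by_key.items() if k not in old_by_key]
    deleted = [row for k, row in old_by_key.items() if k not in new_by_key]
    updated = [(old_by_key[k], row) for k, row in new_by_key.items()
               if k in old_by_key and old_by_key[k] != row]
    return inserted, deleted, updated
```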

Databricks Autoloader

An example of using the Autoloader capabilities for file-based processing, which ensures that each file is processed exactly once.

  1. README
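In Spark, Auto Loader is used through the `cloudFiles` source, for example `spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").load(path)`. The stream's checkpoint location records which files have already been ingested. The pure-Python model below illustrates only that bookkeeping idea, using an in-memory set and hypothetical file names. It is not durable and is not how Auto Loader is implemented.

```python
from typing import Callable, Iterable, List, Set


class FileCheckpoint:
    """Remembers processed file names so a re-run only handles files it has not seen."""

    def __init__(self) -> None:
        self.processed: Set[str] = set()

    def process_new_files(self, listing: Iterable[str], handle: Callable[[str], None]) -> List[str]:
        """Handle every file in `listing` not yet recorded; return the names handled this run."""
        new_files = sorted(set(listing) - self.processed)
        for name in new_files:
            handle(name)
            self.processed.add(name)  # record only after the handler returns
        return new_files
```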

Resources

In this directory I keep a central repository of articles and helpful resource links, each with a short description.

Below are a number of links with quick descriptions of what they cover.

  • Upsert Databricks Blog

    • This blog provides a number of very helpful use cases that can be solved using an upsert operation. The part I found most interesting was the range of actions available when rows are matched or not matched: users can delete rows, update specific values, insert rows, or update entire rows. The foreachBatch function is crucial for streaming CDC operations.
  • Upsert Notebook Example:

    • Python and Scala example completing an upsert with the foreachBatch function.
  • Delta Table Updates

    • Shows various scenarios for updating delta tables via updates, inserts, and deletes.
    • There is specific information on schema evolution with upsert operations: the schema can evolve when using insertAll or updateAll, but it will not work if you try to insert a row with a column that does not exist yet.
    • There can be 1, 2, or 3 whenMatched or whenNotMatched clauses. Of these, at most 2 can be whenMatched clauses, and at most 1 can be a whenNotMatched clause.
  • Z-ordering Databricks Blog

  • Optimize and Partition Columns

  • Dynamic Partition Pruning
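The matched and not-matched actions described above map roughly onto the Delta Lake Python API as `DeltaTable.forPath(spark, path).alias("t").merge(updates.alias("s"), "t.id = s.id").whenMatchedDelete(condition=...).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()`. In streaming CDC, that call typically sits inside a function passed to `writeStream.foreachBatch`. The following pure-Python model makes the semantics concrete without a cluster. The `id` key and the delete condition are hypothetical. This sketch rejects duplicate source keys outright, whereas Delta raises an error only when several source rows match the same target row.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def merge(target: List[Row], source: List[Row], key: str,
          delete_when: Optional[Callable[[Row, Row], bool]] = None) -> List[Row]:
    """Model MERGE INTO target USING source ON target.key = source.key.

    whenMatched and delete_when(target_row, source_row) -> delete the target row
    whenMatched otherwise                                -> updateAll (replace with source row)
    whenNotMatched                                       -> insertAll (append source row)
    """
    source_by_key = {row[key]: row for row in source}
    if len(source_by_key) != len(source):
        raise ValueError("duplicate source keys: each target row may match at most one source row")
    result: List[Row] = []
    for row in target:
        match = source_by_key.get(row[key])
        if match is None:
            result.append(row)  # no matching source row: target row is unchanged
        elif delete_when is not None and delete_when(row, match):
            continue  # matched and the delete condition holds: drop the row
        else:
            result.append(dict(match))  # matched: update the entire row
    target_keys = {row[key] for row in target}
    result.extend(dict(row) for row in source if row[key] not in target_keys)
    return result
```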

Contact

Please feel free to recommend demos or contact me if any steps are confusing or broken. For any additional comments or questions, email me at [email protected].

Disclaimer

These examples are not affiliated with Databricks and are not intended to be official documentation. For official documentation and tutorials, please go to the Databricks Academy or the Databricks blog.

Contributors

ryanchynoweth44