DP-203-Notes

1) Distributing Data in Synapse Analytics

Round-Robin Distributed

Data is distributed evenly across all distributions, and the assignment of rows to distributions is random

Best for:

  1. Tables with no clear distribution key
  2. Tables that are not frequently joined with other tables
  3. Cases where a uniform distribution is wanted
  4. Temporary staging tables
  5. A simple starting point
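
A minimal sketch of a round-robin staging table, written in the same style as the hash example later in these notes. The table and column names are hypothetical, and the HEAP option is an assumption: it is a common choice for staging tables that are loaded once and then transformed, not something these notes prescribe.

```sql
-- Hypothetical staging table; names are illustrative only.
CREATE TABLE STAGE_YOUR_TABLE
(
    COLUMN1 INT NOT NULL,
    COLUMN2 VARCHAR(20)
)
WITH
(
    HEAP,                        -- assumption: no index, suited to load-then-transform staging
    DISTRIBUTION = ROUND_ROBIN   -- rows are spread evenly, no distribution key needed
);
```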

Hash Distributed

Data is distributed deterministically by applying a hash function to the distribution column, so rows with the same value always land in the same distribution

Best for:

  1. Large tables like fact tables or historical transaction tables
  2. Tables with frequent inserts, updates and deletes

Example of DDL


CREATE TABLE YOUR_TABLE 
(
    COLUMN1 INT NOT NULL, 
    COLUMN2 INT NOT NULL, 
    COLUMN3 INT NOT NULL, 
    COLUMN4 INT NOT NULL, 
    COLUMN5 VARCHAR(20) 
)
WITH 
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH(COLUMN1)
);


Replicated

A full copy of the table is replicated to every compute node

Best for:

  1. Small lookup or dimension tables that are frequently joined with larger tables
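
For comparison with the hash example above, a replicated table could be declared roughly as follows. This is a sketch only: the table and column names are hypothetical and stand in for any small dimension table.

```sql
-- Hypothetical small dimension table; names are illustrative only.
CREATE TABLE DIM_YOUR_LOOKUP
(
    LOOKUP_KEY  INT NOT NULL,
    LOOKUP_NAME VARCHAR(50)
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = REPLICATE   -- full copy on every compute node, so joins need no data movement
);
```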

2) Pruning Data

Data Skipping

Data skipping information is collected automatically when data is written into a Delta table

Delta Lake on Databricks uses this information (the minimum and maximum values of each column) at query time to skip files that cannot match, which speeds up queries


Z-Ordering

Z-ordering is a technique for colocating related information in the same set of files. The data-skipping algorithms of Delta Lake on Databricks use this co-locality automatically, which substantially reduces the amount of data to be read
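
A short sketch of how Z-ordering is typically applied on Databricks, using Spark SQL. The table `events` and the column `event_type` are hypothetical; in practice the Z-order column would be one that queries filter on often.

```sql
-- Hypothetical Delta table and column; names are illustrative only.
-- Rewrites the table's files so that rows with similar event_type values sit together.
OPTIMIZE events
ZORDER BY (event_type);

-- A selective filter on the Z-ordered column lets data skipping rule out
-- files whose min/max range cannot contain the requested value.
SELECT *
FROM events
WHERE event_type = 'purchase';
```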


Dynamic File Pruning (DFP)

Allows files to be skipped within partitions

It works well for non-partitioned tables, or for joins on non-partitioned columns

spark.databricks.optimizer.dynamicFilePruning
spark.databricks.optimizer.deltaTableSizeThreshold
spark.databricks.optimizer.deltaTableFilesThreshold
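
A rough illustration of the kind of query that can benefit from dynamic file pruning, in Spark SQL. The tables `sales` and `items` and their columns are hypothetical. The option name `spark.databricks.optimizer.dynamicFilePruning` is taken from the Databricks documentation as the switch for this feature and is assumed to be on by default, so the SET statement is shown only to make the dependency explicit.

```sql
-- Assumed option name from the Databricks documentation; normally already enabled.
SET spark.databricks.optimizer.dynamicFilePruning = true;

-- Hypothetical tables: `sales` is a large Delta fact table, `items` a small dimension.
-- The filter on the dimension side is only known at run time; dynamic file pruning
-- can use the matching item_id values to skip files of `sales` during the join.
SELECT s.*
FROM sales AS s
JOIN items AS i
  ON s.item_id = i.item_id
WHERE i.category = 'Books';
```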

3) Compressing Data


Compression Options


Rowstore

  • Row Compression
  • Page Compression

Columnstore

  • Columnstore compression by default
  • Columnstore archival compression

Compression of Rowstore objects

  • Available in Azure SQL Database
  • Row or page compression can be enabled or disabled either online or offline
  • Disk space requirements when enabling or disabling are the same as when you are creating or rebuilding an index

Example

ALTER TABLE TABLE1 REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = ROW)
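
The same statement can be sketched for page compression with an online rebuild. This assumes the hypothetical TABLE1 from the example above and a platform or edition that supports online rebuilds; it is an illustration, not a recommendation for a particular setting.

```sql
-- Switch TABLE1 to page compression while keeping it available (online rebuild).
ALTER TABLE TABLE1 REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);

-- Disable compression again, also online.
ALTER TABLE TABLE1 REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = NONE, ONLINE = ON);
```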

Compression of Columnstore objects

  • Is enabled by default
  • Used by clustered and nonclustered columnstore indexes
  • Indexes with columnstore archival compression are slower than those without it
  • Use archival compression only to reduce the storage size of data that is not accessed frequently
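
Example

A sketch of switching between the two columnstore compression levels. It assumes the hypothetical TABLE1 has a clustered columnstore index and runs on a platform that supports archival compression, such as SQL Server or Azure SQL Database.

```sql
-- Apply archival compression to rarely accessed data (smaller on disk, slower to query).
ALTER TABLE TABLE1 REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

-- Return to the default columnstore compression.
ALTER TABLE TABLE1 REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE);
```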
