dennyglee / dennyglee.github.io

about me! data dork, scribe, geek, ultimate frisbee fan, mountain climber (barely!), wannabe cyclist... occasionally awake

Home Page: https://dennyglee.github.io/

License: Apache License 2.0

dennyglee.github.io's Introduction

Hi there, my name is Denny Lee, and I am a developer advocate at Databricks, a long-time contributor to Apache Spark™ and MLflow, a Delta Lake maintainer, the LLM Avalanche creator, a Data Brew by Databricks caster (Spotify, Apple Podcasts, YouTube), and a long-time Seattle-ite. In my past life, I was part of Microsoft Engineering for SQL Server, Cosmos DB, and Bing, and part of the Project Isotope incubation team that brought Apache Hadoop into Microsoft.

Here are some of my presentations and videos on YouTube @dennyglee; I'm also the (co-)author of books including Delta Lake: The Definitive Guide (Early Release) and Learning Spark, 2nd Edition. If you're interested in posts on coffee, food, travel, cycling, and data, check out my personal blog.

I am the (co-)author of the following posts, links, and assets, listed in reverse chronological order. Some posts were originally on my WordPress site but have been moved over to GitHub, blemishes and all, for posterity.

This blog is inspired by Frank McSherry's musings, which I highly recommend you follow, especially if you're a fan of Rust like I am.


| date | topics | source | description |
| --- | --- | --- | --- |
| 2023/08/31 | Delta | | What is the Delta Lake Transaction Log? |
| 2023/08/25 | Spark, Delta | | Why Structured Streaming and Delta Lake for Batch ETL? |
| 2023/07/27 | LLMs | | Quick Start with llama.cpp with Llama 2 and Macbook M2 Air |
| 2023/06/29 | LLMs, Spark | Databricks | Introducing English as the New Programming Language for Apache Spark |
| 2023/06/29 | Delta Lake | Databricks | Announcing Delta Lake 3.0 with New Universal Format and Liquid Clustering |
| 2023/06/26 | LLMs | site | LLM Avalanche: Over 40 speakers and 900 people attended this LLM conference-within-a-conference to kick start Data + AI Summit 2023 |
| 2023/03/20 | Spark, Delta | | Why does altering a Delta Lake table schema not show up in the Spark DataFrame? |
| 2022/12/13 | Delta Lake | delta.io | Building a more efficient data infrastructure for machine learning with Open Source using Delta Lake, Amazon SageMaker, and EMR |
| 2022/11/10 | community | Integration Developer News | How Developers Can Manage and Contribute to Successful Open-Source Projects |
| 2022/08/11 | Delta Lake | delta.io | Apache Flink Source Connector for Delta Lake tables |
| 2022/08/02 | Delta Lake | delta.io | Delta 2.0 - The Foundation of your Data Lakehouse is Open |
| 2022/06/15 | Databricks | Databricks | Defining the Future of Data & AI: Announcing the Finalists for the 2022 Databricks Data Team OSS Award |
| 2022/05/18 | Delta Lake | delta.io | Multi-cluster writes to Delta Lake Storage in S3 |
| 2022/05/05 | Delta Lake | delta.io | Delta Lake 1.2 - More Speed, Efficiency and Extensibility Than Ever |
| 2022/04/27 | Delta Lake | delta.io | Writing to Delta Lake from Apache Flink |
| 2022/03/24 | Delta Lake, Trino | Starburst | Starburst and Databricks Collaborate on the Trino Delta Lake Connector |
| 2022/03/16 | Delta Lake | Databricks | Extending Delta Sharing to Google Cloud Storage |
| 2022/03/12 | Delta Lake, PrestoDB | PrestoDB | Native Delta Lake Connector for Presto |
| 2022/01/31 | Delta Lake | Databricks | Make Your Data Lakehouse Run, Faster With Delta Lake 1.1 |
| 2022/01/28 | Delta Lake | Databricks | The Ubiquity of Delta Standalone: Java, Scala, Hive, Presto, Trino, Power BI, and More! |
| 2022/01/21 | Delta Lake | Databricks | Extending Delta Sharing for Azure |
| 2021/12/01 | Delta Lake | Databricks | The Foundation of Your Lakehouse Starts With Delta Lake |
| 2021/04/23 | podcasts | Databricks | How We Launched a Podcast: Lessons, (Minor) Mishaps & Key Takeaways |
| 2021/04/21 | Delta Lake | Databricks | Attack of the Delta Clones (Against Disaster Recovery Availability Complexity) |
| 2021/02/10 | Delta Lake | Databricks | Automatically Evolve Your Nested Column Schema, Stream From a Delta Table Version, and Check Your Constraints |
| 2020/12/22 | Delta Lake | Databricks | Natively Query Your Delta Lake With Scala, Java, and Python |
| 2020/11/20 | Delta Lake | Databricks | How Scribd Uses Delta Lake to Enable the World's Largest Digital Library |
| 2020/09/29 | Delta Lake | Databricks | Diving Into Delta Lake: DML Internals (Update, Delete, Merge) |
| 2020/08/27 | Delta Lake | Databricks | Enabling Spark SQL DDL and DML in Delta Lake on Apache Spark 3.0 |
| 2020/06/18 | Delta Lake | Databricks | Time Traveling with Delta Lake: A Retrospective of the Last Year |
| 2020/05/19 | Delta Lake | Databricks | Schema Evolution in Merge Operations and Operational Metrics in Delta Lake |
| 2020/04/14 | health | Databricks | COVID-19 Datasets Now Available on Databricks: How the Data Community Can Help |
| 2020/01/29 | Delta Lake | Databricks | Query Delta Lake Tables from Presto and Athena, Improved Operations Concurrency, and Merge performance |
| 2019/11/05 | ML | Databricks | Using AutoML Toolkit's FamilyRunner Pipeline APIs to Simplify and Automate Loan Default Predictions |
| 2019/10/03 | Delta Lake | Databricks | Simple, Reliable Upserts and Deletes on Delta Lake Tables using Python APIs |
| 2019/09/24 | Delta Lake | Databricks | Diving Into Delta Lake: Schema Enforcement & Evolution |
| 2019/09/10 | ML | Databricks | Using AutoML Toolkit to Automate Loan Default Predictions |
| 2019/08/21 | Delta Lake | Databricks | Diving Into Delta Lake: Unpacking The Transaction Log |
| 2019/08/14 | Delta Lake, ML | Databricks | Productionizing Machine Learning with Delta Lake |
| 2019/06/18 | Delta Lake, Streaming | Databricks | Simplifying Streaming Stock Analysis using Delta Lake and Apache Spark: On-Demand Webinar and FAQ Now Available! |
| 2019/05/02 | ML | Databricks | Detecting Financial Fraud at Scale with Decision Trees and MLflow on Databricks |
| 2019/04/30 | ML, MLflow | Databricks | Using Dynamic Time Warping and MLflow to Detect Sales Trends |
| 2019/04/30 | ML, MLflow | Databricks | Understanding Dynamic Time Warping |
| 2018/11/13 | ML | Databricks | Applying your Convolutional Neural Network: On-Demand Webinar and FAQ Now Available! |
| 2018/10/29 | Delta | Databricks | Simplifying Change Data Capture with Databricks Delta |
| 2018/10/22 | ML | Databricks | Training your Neural Network: On-Demand Webinar and FAQ Now Available! |
| 2018/10/03 | ML, MLflow | Databricks | MLflow v0.7.0 Features New R API by RStudio |
| 2018/10/01 | ML | Databricks | Introduction to Neural Networks: On-Demand Webinar and FAQ Now Available! |
| 2018/09/18 | ML, Spark | Databricks | Simplify Market Basket Analysis using FP-growth on Databricks |
| 2018/09/13 | ML | Databricks | Identify Suspicious Behavior in Video with Databricks Runtime for Machine Learning |
| 2018/09/12 | ML, MLflow | Databricks | MLflow On-Demand Webinar and FAQ Now Available! |
| 2018/09/09 | Delta Lake | Databricks | Building a Real-Time Attribution Pipeline with Databricks Delta |
| 2018/09/09 | ML | Databricks | Loan Risk Analysis with XGBoost and Databricks Runtime for Machine Learning |
| 2018/08/08 | MLflow | Databricks | MLflow 0.4.2 Released |
| 2018/07/19 | Spark | Databricks | Simplify Advertising Analytics Click Prediction with Databricks Unified Analytics Platform |
| 2018/07/19 | Spark, Delta | Databricks | Simplify Streaming Stock Data Analysis Using Databricks Delta |
| 2018/07/19 | Streaming, Spark, Delta | Databricks | Make Your Oil and Gas Assets Smarter by Implementing Predictive Maintenance with Databricks |
| 2018/07/09 | Spark | Databricks | Analyze Games from European Soccer Leagues with Apache Spark and Databricks |
| 2018/07/02 | Spark, Streaming | Databricks | Build a Mobile Gaming Events Data Pipeline with Databricks Delta |
| 2018/06/27 | R | Databricks | Announcing RStudio and Databricks Integration |
| 2017/11/07 | CosmosDB | github | Lambda Architecture with Azure Cosmos DB and HDInsight (Apache Spark) |
| 2017/07/01 | Spark | O'Reilly | Introduction to Apache Spark 2.0 |
| 2017/02/18 | Spark | book | Learning PySpark: Build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0 |
| 2016/12/02 | Spark | github | How Apache Spark performs a fast count using the parquet metadata |
| 2016/06/30 | Spark | Databricks | Introducing Getting Started with Apache Spark on Databricks |
| 2016/06/22 | Spark | Databricks, KDNuggets | Apache Spark Key Terms, Explained |
| 2016/06/08 | Spark | Databricks | Another Record-Setting Spark Summit |
| 2016/05/28 | Spark | | On-Time Flight Performance with GraphFrames for Apache Spark |
| 2016/05/24 | Spark, Genomics | Databricks | Predicting Geographic Population using Genome Variants and K-Means |
| 2016/05/24 | Spark, Genomics | Databricks | Parallelizing Genome Variant Analysis |
| 2016/05/24 | Spark, Genomics | Databricks | Genome Sequencing in a Nutshell |
| 2016/03/16 | Spark, Graph | Databricks | On-Time Flight Performance with GraphFrames for Apache Spark |
| 2016/02/11 | Spark, ML | InfoWorld | Why you should use Spark for machine learning |
| 2016/02/11 | Spark | | Presentation: Jump Start into Apache® Spark™ 2.0 |
| 2016/02/02 | Spark | Databricks | An Illustrated Guide to Advertising Analytics |
| 2015/12/19 | community | Databricks | Databricks launches Meetup-in-a-box for Apache Spark Meetup Organizers |
| 2015/11/09 | Spark | insideBIGDATA | Apache Spark is the Smartphone of Big Data |
| 2015/09/24 | Spark | Databricks | Spark Survey 2015 Results are now available |
| 2015/08/31 | Spark | Databricks | Data Exploration with Databricks |
| 2015/06/09 | Spark | Databricks | Introduction to Databricks |
| 2015/06/04 | Spark, ML | Databricks | Simplify Machine Learning on Apache Spark with Databricks |
| 2014/01/06 | HDFS, pig | | Quick Tip for Compressing Many Small Text Files within HDFS via Pig |
| 2013/09/30 | SSAS | | Analysis Services Multidimensional: It is the Order of Things |
| 2013/05/14 | random | | In the context of quantum entanglement and time travel – Stargate may be more correct than Star Trek |
| 2013/04/26 | Hive | | Optimizing Joins running on HDInsight Hive on Azure at GFS |
| 2013/03/18 | blob | | Why use Blob Storage with HDInsight on Azure |
| 2013/03/12 | Avro, Hadoop | | Using Avro with HDInsight on Azure at 343 Industries |
| 2013/02/04 | Spark | | Installing Spark 0.6.1 Standalone on OSX Mountain Lion (10.8) |
| 2012/12/03 | Hadoop, pig | | Getting your Pig to eat ASV blobs in Windows Azure HDInsight |
| 2012/09/26 | SSAS, Hive | Microsoft | SQL Server Analysis Services to Hive (backup) |
| 2012/09/03 | random | | In the context of quantum entanglement and teleportation – Stargate may be more correct than Star Trek |
| 2012/06/28 | SSAS | Microsoft | Microsoft SQL Server Analysis Services Multidimensional Performance and Operations Guide |
| 2012/05/08 | Hadoop | | Installing Hadoop on OSX Lion (10.7) |
| 2012/03/01 | Hadoop, BI | | BI and Big Data–the best of both worlds! |
| 2012/02/17 | Hadoop, JS | | Hadoop JavaScript– Microsoft’s VB shift for Big Data |
| 2012/01/31 | big data | | Moving data to compute or compute to data? That is the Big Data question |
| 2012/01/24 | big data | | Scale Up or Scale Out your Data Problems? A Space Analogy |
| 2012/01/21 | PowerPivot, Hadoop | | Connecting PowerPivot to Hadoop on Azure – Self Service BI to Big Data in the Cloud |
| 2012/01/12 | Hadoop, Azure | | A funky way to do Hive and Hadoop … on Azure |
| 2011/12/15 | Hadoop, Azure | | An Azure Elephant Never Forgets… |
| 2011/10/01 | MS-SQL | Microsoft | SQL Server 2008 R2: Analysis Services Performance Guide (backup) |
| 2010/12/10 | MS-SQL | Microsoft | Measuring and Understanding the Performance of Your SSIS Packages in the Enterprise (SQL Server Video) |
| 2010/07/01 | MS-SQL | Microsoft | Analysis Services ROLAP for SQL Server Data Warehouses (backup) |
| 2010/06/01 | MS-SQL | Microsoft | Scale-Out Querying for Analysis Services with Read-Only Databases (backup) |
| 2009/12/22 | Healthcare | book | Transforming Health Care Through Information: Case Studies (Health Informatics) |
| 2009/12/16 | MS-SQL | book | Professional Microsoft SQL Server Analysis Services 2008 with MDX |
| 2009/05/12 | MS-SQL | Microsoft | Disk Partition Alignment Best Practices for SQL Server |
| 2008/11/05 | MS-SQL | Microsoft | Reaching Compliance: SQL Server 2008 Compliance Guide (backup) |
| 2008/04/17 | MS-SQL | Microsoft | Analysis Services Distinct Count Optimization (backup) |
| 2007/09/24 | Privacy | | Analyzing Data while Protecting Privacy – A Differential Privacy Case Study |
| 2007/09/01 | MS-SQL | Microsoft | SQL Server 2005: Precision Considerations for Analysis Services Users (backup) |
| 2006/03/02 | Research | paper (acknowledgement) | Early establishment of a pool of latently infected, resting CD4+ T cells during primary HIV-1 infection |
| 2001/10/01 | MS-SQL | book | Professional SQL Server 2000 Data Warehousing with Analysis Services |
