
Take the plunge into distributed machine learning training with Spark, PyTorch, and TensorFlow

Hi there 👋

A wee bit about me: I am an experienced Software Engineer and people manager with technical expertise in Apache Spark, HDFS, AWS, Azure, machine learning, and distributed large-scale systems.

I'm highly motivated, always excited about solving problems and learning, and I bring a curious, positive, can-do attitude. I have driven success and improvement in both distributed machine systems and people systems. I optimized Spark clusters to increase throughput by 350% at Akamai scale (billions of events a day, processing 1.3 PB), saving the company money on compute. I also cut complex ML model deployment from months to a 2–3 day cycle by aligning people-based systems, influencing strategic software integrations, and adopting software best practices.

I have been honored with the Beacon award in the Databricks Ambassadors Program, a testament to my commitment to contributing to data and AI technologies and sharing my expertise with others.


🔭 Industry Contributions

virtual-kubelet-kotlin-spring - how to leverage Virtual Kubelet to manage serverless services from your Kubernetes cluster.

build-e2e-ml-bigdata - a full end-to-end application for building machine learning pipelines on top of compressed Parquet data, leveraging cloud services.

Author of the O’Reilly book Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch. Adi Polak, 2023.

“Small Files in a Big Data World.” Chapter by Adi Polak, in 97 Things Every Data Engineer Should Know. Edited by Tobias Macey. O’Reilly, 2021: 131-133.

“Three Important Distributed Programming Concepts.” Chapter by Adi Polak, in 97 Things Every Data Engineer Should Know. Edited by Tobias Macey. O’Reilly, 2021: 175-176.

“Deploying Kubernetes in an Enterprise Environment.” in Kubernetes in the Enterprise Trends Report. DZone, 2020.

“Big Data Building Blocks: Selecting Architectures and Open-Source Frameworks.” In DZone 2019 Guide to Big Data. DZone, 2019.

Technical reviewer for Delta Lake: The Definitive Guide. O’Reilly Media and Databricks, upcoming book, 2024.

Technical reviewer for Fundamentals of Data Observability. O’Reilly Media and Andy Petrella, 2023.

Technical reviewer for Introducing MLOps: How to Scale Machine Learning in the Enterprise. O’Reilly Media and Dataiku, 2020.

Conference committee member: Scale By the Bay 2021 & 2023; Data & AI/Spark Summit 2021, 2022 & 2023; Voxxed Days Australia 2021.

🌱 Teaching Experience

“Apache Spark ML First Steps. How to Build Your Own Machine Learning Model at Scale.” Presentation for O'Reilly Media, Inc., July 15, 2020.

“Demystifying Scalable Machine Learning with the Spark Ecosystem.” “AI Superstream Series: Scaling AI” course for O'Reilly Media, Inc., September 2021.

“CI/CD for Data Lakes: Managing Your Data Like Code.” Presentation for O'Reilly Media, Inc., December 7, 2022.

“Scaling Machine Learning in 3 Weeks.” Three-week course for O'Reilly Media, Inc., February 10, 17 & 24, 2023.

👯 More Volunteering activities

FlipCon – co-organizer of a functional programming conference, 2018.

KotlinTLV – co-leader of the KotlinTLV meetup group, 2019.

She Codes – Nationwide Director of Coding Skills, March 2017 to October 2018.

BIPA – Team Lead at the Bavaria Israel Partnership Accelerator (Germany), driving innovative solutions in traditional markets, 2016 to 2017.

📝 Articles

“Unlock The Full Business Value Of Data With A Better Engineering Process.” Forbes, May 26, 2022.

“COVID-19 and Mining Social Media - Enabling Machine Learning Workloads with Big Data.” InfoQ, October 2, 2022.

“What is Serverless SQL? And How to Use it for Data Exploration.” Towards Data Science, December 1, 2020.

“What is TensorFrames? TensorFlow + Apache Spark.” Microsoft Azure, March 25, 2019.

“Data at Scale: Learn How Predicate Pushdown Will Save You Money.” Microsoft Azure, December 18, 2018.

“Apache Spark — Catalyst Deep Dive.” Microsoft Azure, November 13, 2018.

Adi Polak's Projects

dapr

Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.

data-engineering

A comprehensive collection of educational content for aspiring Data Engineers

datahub

A Generalized Metadata Search & Discovery Tool

delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

dspy

DSPy: The framework for programming with foundation models

eventhubs-producer-python

This project demonstrates how to create an Azure Event Hubs producer with Python and Key Vault, with a step-by-step tutorial.

graphql

GraphQL is a query language and execution engine tied to any backend service.

handson-ml

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn and TensorFlow.

infographics

Infographics and presentations I made to share knowledge.
