GithubHelp home page GithubHelp logo

isabella232 / agiledataplatform Goto Github PK

View Code? Open in Web Editor NEW

This project forked from microsoftdocs/agiledataplatform

0.0 0.0 0.0 29.17 MB

Content and documentation around agility for the data platform including engineering testing, agile architectures and operational processes using DevOps techniques

License: Creative Commons Attribution 4.0 International

agiledataplatform's Introduction

Agile Data Platform

Architecture and documentation for a truly agile data platform that is operationally scalable, not just architecturally.

Introduction

This repository is intended to provide examples and explanations of how to architect a modern cloud data platform which is driven by DevOps processes and allows enterprise scale in terms of both capability and work throughput. It is designed to remove unnecessary blockers and issues which arise with more traditional approaches by following the example of application development. Concepts such as "Cattle vs Pets" and loose coupling will be central to this architecture, alongside a modular approach with strong, contract based interactions between components. The documents here describe the platform as a whole and should not be confused with data engineering or data modelling documentation both of which still have their place inside of data products alongside aspects such as data quality and compliance.

This information has been created because we see customers with very specific problems arising from traditional approaches to their platforms. As a general rule, data platform evolution follows the stages set out below. While it might be possible to jump straight to the "mature" end of the spectrum the skills required to do so are not usually present within an organisation. For this reason, we recommend progressing naturally through the stages as the organisation matures according to requirements. Always remember that the purpose is to process and present data, and delivering this goal by whatever means you can as early as you can will get buy in from the business while spending a year trying to build an advanced platform may not deliver the data for the business to use.

Cloud Data Platform Maturity Journey

Stage 1 Stage 2 Stage 3 Stage 4
Initial Cloud Data Platform Enterprise Scale Agile Platform Code First Platform
  • Initial move to the cloud for data.
  • Standard architecture
  • Ad hoc deployment
  • Ad hoc reporting
  • Scaling up
  • Automated deployment of templates
  • Mass onboarding of data
  • Centralised data team
  • Standardisation
  • Centralisation
  • Policy driven architecture
  • Segregated staging environments
  • Single version of the truth
  • Distributed Teams
  • Strong Engineering Approach
  • Automated Testing
  • Product driven development
  • Decentralised for operational scale
  • Advanced automation
  • Ephemeral staging environments
  • Competing products for evolution and progress
  • Fully code driven
  • Slow deployment
  • Hard to change
  • Slow to scale
  • Teams cannot scale
  • Deployments are high risk
  • Team throughput slows
  • Architecture is inflexible
  • Products interfaces become bottlenecks
  • Repeatability is hard
  • Copying solutions is difficult
  • Advanced skills required

The agile data platform can be broken down into several topics, each of which can be considered independently if retrofitting into an existing practice, or together if starting from scratch.

Agile Data Platform Introduction

Structured Testing

This is an often misunderstood subject within the data practice since it does not focus on testing the data but rather the implementation. Structured testing allows you to perform unit testing and integration testing against your data platform and pipeline components to ensure quality and consistency.

Structured Testing

Presentation - Introduction to Testing

DataOps

DataOps are a set of processes and methods for managing a project and product development. While this may include some automation it is not simply scripted deployment. DataOps is the bringing together of the skills and people needed to successfully build a data product. These include data modelling, data engineering, testing, infrastructure, security, networking, disaster recovery and backup, reporting, monitoring and of course support. Each of these skills and more must be represented within the team to allow frictionless progress to be made, removing the need for change controls between departments and placing the responsibility directly with the project team itself. The processes here allow for quality checks and tests to be made during each development cycle, giving confidence that the next product release will do everything expected, with any issues and feedback being dealt with by the team itself.

DataOps

Presentation - Introduction to DataOps

Agile Platform Architecture

Processes such as automation can make delivery more agile, but beyond a certain point team size becomes an issue and complexity starts to overtake agility and slows progress. To prevent this, a different architecture is needed which can break the problem down into smaller tasks and projects and make them more manageable. Not only to make the teams more manageable but to reduce operational complexity. Traditionally data platforms have been architected as end to end processes from source to destination with long interlinked ETL or ELT pipelines each interdependent to the point that one failure will break the whole system. With an agile data approach our aim is to follow the loosely coupled, highly focused "micro-service" approach from application development. Centralisation is often used in data platforms, and while this works initially it also adds unnecessary work and slows progress. The agile data architecture centralises only where necessary and justified.

Architecture

Presentation - Agile Data Platform Introduction

Supporting Information

Architectural Principles

Glossary of terms

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Legal Notices

Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.

agiledataplatform's People

Contributors

dalusty avatar microsoft-github-operations[bot] avatar microsoftopensource avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.