azure / azure-tdsp-projecttemplate

TDSP: Data science project template repository with standardized directory structure and document templates to support efficient project execution and collaboration.

Home Page: https://github.com/Azure/Azure-TDSP-ProjectTemplate

License: Creative Commons Attribution 4.0 International

azure-tdsp-projecttemplate's Introduction

TDSP Project Structure, and Documents and Artifact Templates

This is a general project directory structure for the Team Data Science Process developed by Microsoft. It also contains templates for the documents that are recommended when executing a data science project with TDSP.

Team Data Science Process (TDSP) is an agile, iterative, data science methodology to improve collaboration and team learning. It is supported through a lifecycle definition, standard project structure, artifact templates, and tools for productive data science.

NOTE: In this directory structure, the Sample_Data folder is NOT supposed to contain LARGE raw or processed data. It is only supposed to contain small and sample data sets, which could be used to test the code.

The two documents under Docs/Project, namely the Charter and the Exit Report, are particularly important to consider. They help to define the project at the start of an engagement, and provide a final report to the customer or client.

NOTE: In some projects, e.g. short-term proof of concept (PoC) or proof of value (PoV) engagements, it can be relatively time-consuming to create all the recommended documents and artifacts. In that case, at least the Charter and Exit Report should be created and delivered to the customer or client. As necessary, organizations may modify certain sections of the documents. But it is strongly recommended that the content of the documents be maintained, as they provide important information about the project and deliverables.

azure-tdsp-projecttemplate's People

Contributors

alphagit, danielleodean, deguhath, gopitk, hangzh-msft, josephhaaga

azure-tdsp-projecttemplate's Issues

Metrics in Project Charter

Hi Azure Team,

I have some questions about the following items in the Metrics section of the Project Charter:
https://github.com/Azure/Azure-TDSP-ProjectTemplate/blob/master/Docs/Project/Charter.md

1. What are the qualitative objectives? (e.g. reduce user churn)
2. What is a quantifiable metric (e.g. reduce the fraction of users with 4-week inactivity)
3. Quantify what improvement in the values of the metrics are useful for the customer scenario (e.g. reduce the fraction of users with 4-week inactivity by 20%)
4. What is the baseline (current) value of the metric? (e.g. current fraction of users with 4-week inactivity = 60%)
5. How will we measure the metric? (e.g. A/B test on a specified subset for a specified period; or comparison of performance after implementation to baseline)
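
The Charter example leaves one point open: "reduce by 20%" from a 60% baseline can be read as a relative reduction or as a reduction in percentage points. The sketch below only works through that arithmetic under both readings. The function names are hypothetical, and the Charter does not say which reading is intended.

```python
def target_relative(baseline, reduction):
    """Target when 'reduce by 20%' means 20% of the baseline value."""
    return baseline * (1 - reduction)


def target_absolute(baseline, reduction_points):
    """Target when 'reduce by 20%' means 20 percentage points."""
    return baseline - reduction_points


baseline = 0.60  # current fraction of users with 4-week inactivity (from the Charter example)

print(f"relative reading: {target_relative(baseline, 0.20):.2f}")  # 0.48
print(f"absolute reading: {target_absolute(baseline, 0.20):.2f}")  # 0.40
```

Stating the intended reading next to the baseline in the Charter removes the ambiguity when the metric is later measured.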

In the example, is it about a classification model for customer churn? Assuming it is, the target variable of the example would be something like customer churn. In item 2, "reduce the fraction of users with 4-week inactivity", how can we decide on a quantifiable metric in relation to the target variable (4-week inactivity would be an independent variable while the target is customer churn)?

At which stage in the lifecycle can metric values be measured?

Thank you.

C. D.

File names are not portable

When using this folder structure in an R package, I get this warning, indicating that the file names are potentially not portable to other operating systems:

* checking for portable file names ... WARNING
Found the following files with non-portable file names:
  inst/Azure-TDSP-ProjectTemplate/Docs/Data_Report/Data Defintion.md
  inst/Azure-TDSP-ProjectTemplate/Docs/Model/Baseline/Baseline Models.md
  inst/Azure-TDSP-ProjectTemplate/Docs/Model/Model 1/Model Report.md
  inst/Azure-TDSP-ProjectTemplate/Docs/Project/Exit Report.md
  inst/Azure-TDSP-ProjectTemplate/Docs/Project/System Architecture.docx
  inst/Azure-TDSP-ProjectTemplate/Docs/Model/Model 1
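
All of the flagged paths contain spaces. A minimal sketch of how such names could be detected and replacement names suggested is shown below. It assumes a conservative whitelist (ASCII letters, digits, `.`, `_`, `-`), which is stricter than and not identical to the rule R's check applies. The function names are hypothetical, and the sketch expects relative paths with forward slashes.

```python
import re
from pathlib import PurePosixPath

# Assumption: a conservative whitelist, not R's exact portability rule.
_PORTABLE = re.compile(r"^[A-Za-z0-9._-]+$")


def non_portable_components(path):
    """Return the path components that fall outside the whitelist."""
    return [part for part in PurePosixPath(path).parts if not _PORTABLE.match(part)]


def suggest_portable(path):
    """Replace each run of non-whitelisted characters with one underscore."""
    parts = [re.sub(r"[^A-Za-z0-9._-]+", "_", part) for part in PurePosixPath(path).parts]
    return str(PurePosixPath(*parts))


paths = [
    "Docs/Project/Exit Report.md",
    "Docs/Model/Model 1/Model Report.md",
    "Docs/Project/Charter.md",
]
for p in paths:
    if non_portable_components(p):
        print(f"{p} -> {suggest_portable(p)}")
```

Renaming files this way would also require updating any links that point to the old names.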

Make TDSP more closely match with AML implementation

The AML implementation of TDSP is nice in that it adds numbering to the folders under code. It makes it easy for anyone, even those that aren't as familiar with the process, to understand ordering. Additionally, the use of all lower case in the AML template seems easier to type and deal with.
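
To illustrate the naming convention described here, the sketch below turns an ordered list of folder names into numbered, lowercase names. The input names are the existing sub-folders of Code mentioned elsewhere on this page; the resulting names are illustrative only and are not the actual folder names of the AML template.

```python
def numbered_lowercase(names, width=2):
    """Prefix each name with its 1-based position and lowercase it."""
    return [f"{i:0{width}d}_{name.lower()}" for i, name in enumerate(names, start=1)]


print(numbered_lowercase(["DataPrep", "Model", "Operationalization"]))
# ['01_dataprep', '02_model', '03_operationalization']
```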

Outdated structure folder "Code"

Hi all,

The "Code" folder has not been updated to match the TDSP documentation (https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview).

My personal suggestion for the structure is as follows:

  • DataPrep: rename it to Pipeline. The folder will contain all the DS scripts: data wrangling, train/test split, preprocessing and feature engineering, modelling, and scoring.

  • Model: good. It is used to store models locally.

  • Operationalization: contains the script to build the infrastructure and also the orchestrator that runs the pipeline

  • Notebooks: missing folder. It's useful to keep traces of notebooks either locally or from remote tools/platforms like Databricks
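
As a rough sketch of the layout suggested above, the following creates the four proposed sub-folders under Code. The function name and the `.gitkeep` placeholder files are assumptions for illustration; this is the issue author's proposal, not the template's current structure.

```python
import tempfile
from pathlib import Path

# Folder names taken from the suggestion above.
SUGGESTED = ["Pipeline", "Model", "Operationalization", "Notebooks"]


def create_code_folders(root, names=SUGGESTED):
    """Create Code/<name> under root, each with a placeholder file so Git tracks it."""
    created = []
    for name in names:
        folder = Path(root) / "Code" / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / ".gitkeep").touch()
        created.append(folder)
    return created


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_code_folders(tmp):
        print(folder.relative_to(tmp))
```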
