pjarnhus / datasciencegym

A series of little puzzles and training exercises aimed at improving Python knowledge for data science

License: MIT License

Jupyter Notebook 98.93% Python 1.07%
python pandas pandas-tutorial pandas-python python3 tutorial tutorial-exercises

datasciencegym's Introduction

Data Science Gym - Gotta Solve Them All!

A series of little puzzles and training exercises aimed at improving Python knowledge for data science

What is it?

This repository is a collection of small training exercises aimed at keeping relevant data science skills honed. The focus is mainly on data manipulation and visualisation, but it may branch into other topics as well.

Each Jupyter Notebook in the main folder contains five assignments. The first four assignments are centred around a topic, while the fifth assignment is purposefully off-topic. Usually the fifth is some nifty little trick that may come in handy in a data science workflow.

Getting started

To get started you need a Python installation and Jupyter Notebook. It is assumed that you have both and know how to work with them. After that you can simply take the notebooks one at a time. If you do not have Python installed, the Anaconda distribution is recommended.

Should you get stuck, there is a corresponding notebook in the solutions folder with a worked-through example.

You are of course always welcome to open an issue here on GitHub if any of the exercises do not make sense. Chances are you are not the only one facing the problem.

List of topics

Pandas exercises

  • pandas_1: DataFrame shape, columns and data types
  • pandas_2: Group By
  • pandas_3: Merge and Concatenation
  • pandas_4: Missing values
  • pandas_5: I/O
  • pandas_6: Melt and Pivot
  • pandas_7: Time series - Part I: Creating and filtering on DatetimeIndex
  • pandas_8: Time series - Part II: Custom frequency on DatetimeIndex
  • pandas_9: Time series - Part III: Offsetting DatetimeIndex

Getting involved

The more exercises there are in a repo like this, the better it is. Everyone is welcome to contribute, as long as they follow the structure mentioned above and in the contribution guidelines.

Even if you are not up for contributing exercises, you can always add a topic to the wishlist in the wiki!

Enjoy the exercises, and let me know if anything is left wanting.

/Philip

datasciencegym's People

Contributors

kplauritzen, philipjarnhus, pjarnhus

datasciencegym's Issues

Missing ToC in README

Can I get an overview of what content to expect in each assignment?

e.g.:

  • pandas_1.ipynb - Loading data etc
  • pandas_2.ipynb - other stuff

How to contribute changes to jupyter notebooks?

Is there a clear process?
What should I do about the metadata that is automatically generated when I run cells?
I can manually strip the output, but that is not the only metadata Jupyter adds automatically. See e.g. #4.

Maybe we should have a guide for setting up a git pre-commit hook?
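
One possible starting point for such a hook is sketched below. It is not a process adopted by this repository. It assumes notebooks are stored in the standard nbformat 4 JSON layout, and the function names `strip_notebook` and `strip_file` are hypothetical. Clearing all per-cell metadata is also an assumption: it would drop any cell tags that are wanted.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with run-time residue cleared.

    Assumes the nbformat 4 layout: a top-level "cells" list whose code
    cells carry "outputs" and "execution_count" keys.
    """
    cleaned = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered output
            cell["execution_count"] = None  # drop the In[n] counter
        # Assumption: per-cell metadata (scroll state, timing, ...) is unwanted.
        cell["metadata"] = {}
    return cleaned


def strip_file(path: str) -> None:
    """Rewrite the notebook at `path` in place with residue removed."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

A pre-commit hook could call `strip_file` on each staged `.ipynb` file and then re-stage it. Existing tools such as nbstripout cover the same ground and may be a better fit than a hand-rolled script.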
