srinivas365 / pyspark_exercises

This project forked from areibman/pyspark_exercises

Practice your Pyspark skills!

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 100.00%

Pyspark Exercises

We created this repository to help data scientists learning Pyspark become familiar with the tools and functionality available in the API. It contains 11 lessons covering core concepts in data manipulation. It was forked from Guipsamora's Pandas Exercises project and repurposed to solve the same exercises using the Pyspark API instead of Pandas.

Tutorials are great resources, but to learn is to do: unless you practice, you won't learn. Pyspark is no exception!

There will be three different types of files:
      1. Exercise instructions
      2. Solutions without code
      3. Solutions with code and comments

My suggestion is that you learn a topic from a tutorial, video, or documentation and then do the first exercises. Learn one more topic and do more exercises. If you get stuck, don't go straight to the files with solution code. Check the solutions without code first and try to reach the correct answer yourself.

Suggestions and collaborations are more than welcome. 🙂 Please open an issue or make a PR indicating the exercise and your problem/solution.

Contributing

As a community project, we're seeking help converting this repo into a complete resource for mastering Pyspark.

We need assistance with the following:

Convert existing .ipynb files with Pandas solutions to Pyspark solutions.

Select an issue in the Issues tab corresponding to one of the tutorial directories. In your pull request, rewrite the notebooks in that directory using Pyspark instead of Pandas. So far, we've listed issues for every exercise in the repo.
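As a rough illustration of what such a conversion can look like, the sketch below answers one made-up question (total quantity ordered per item) twice: once in plain Python as a reference, and once with the Pyspark DataFrame API. The sample rows, the column names, and the function names are assumptions invented for this example; they are not taken from the notebooks.

```python
from collections import defaultdict

# Hypothetical sample rows (item_name, quantity); not the real exercise data.
ROWS = [
    ("Chicken Bowl", 2),
    ("Chips", 1),
    ("Chicken Bowl", 1),
    ("Steak Burrito", 3),
]


def totals_reference(rows):
    """Plain-Python reference: total quantity per item."""
    totals = defaultdict(int)
    for item, quantity in rows:
        totals[item] += quantity
    return dict(totals)


def totals_pyspark(spark, rows):
    """Same aggregation with the Pyspark DataFrame API.

    Roughly the counterpart of df.groupby("item_name")["quantity"].sum() in Pandas.
    """
    # Imported lazily so the reference helper above works without Spark installed.
    from pyspark.sql import functions as F

    df = spark.createDataFrame(rows, ["item_name", "quantity"])
    result = df.groupBy("item_name").agg(F.sum("quantity").alias("total"))
    # collect() brings the (small) aggregated result back to the driver.
    return {row["item_name"]: row["total"] for row in result.collect()}
```

With Pyspark installed and a SparkSession named `spark` available, `totals_pyspark(spark, ROWS)` should return the same dictionary as `totals_reference(ROWS)`. Comparing a converted solution against the original answer in this way is one possible check before opening a pull request.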

Create new issues

We have a lot of refactoring to do outside of the lessons. If you see something that needs to change, please raise an issue in the Issues tab, or open a pull request for an existing issue.

Readmes

Our readme section could use some work. For instance, we should list ways to run Pyspark on local machines (Windows, macOS, Linux).
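Until that list exists, one common option that works on all three systems is Spark's local mode. The sketch below is a minimal example under stated assumptions: Pyspark has been installed (for example with `pip install pyspark`) and a compatible Java runtime is available. The helper names `local_master` and `make_local_session` and the default application name are hypothetical and are not part of this repository.

```python
def local_master(cores=None):
    """Build a Spark master URL for local mode.

    None means "use all available cores" (local[*]);
    an integer pins the number of worker threads (local[N]).
    """
    if cores is None:
        return "local[*]"
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return f"local[{cores}]"


def make_local_session(app_name="pyspark-exercises", cores=None):
    """Create (or reuse) a SparkSession that runs entirely on this machine."""
    # Imported lazily: this line requires Pyspark to be installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(local_master(cores))
        .appName(app_name)
        .getOrCreate()
    )
```

In a notebook, `spark = make_local_session()` followed by `spark.range(5).show()` is a quick way to confirm that the installation works.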

Lessons

Topics: Getting and knowing, Filtering and Sorting, Grouping, Apply, Merge, Stats, Visualization, Creating Series and DataFrames, Time Series, Deleting, Indexing, Exporting

Getting and knowing

Chipotle
Occupation
World Food Facts

Filtering and Sorting

Chipotle
Euro12
Fictional Army

Grouping

Alcohol Consumption
Occupation
Regiment

Apply

Students Alcohol Consumption
US_Crime_Rates

Merge

Auto_MPG
Fictitious Names
House Market

Stats

US_Baby_Names
Wind_Stats

Visualization

Chipotle
Titanic Disaster
Scores
Online Retail
Tips

Creating Series and DataFrames

Pokemon

Time Series

Apple_Stock
Getting_Financial_Data
Investor_Flow_of_Funds_US

Deleting

Iris
Wine
Video Solutions

Video tutorials of data scientists working through the above exercises:

Data Talks - Pandas Learning By Doing

Contributors

guipsamora, takaakifuruse, areibman, max-alletsee, gaurangtandon, pkro, romansnsk, doganck, mcgradymvp, skgurura, manjunath24, freddie71010, germavinsmoke, jeffcarey, aquaraga, mukultaneja, zee2413, cconw, njutn95, oleg104, zaheer031
