
manchas2k4 / datapy_cadi


This project forked from chipdelmal/datapy_cadi


Materials for the "Data Wrangling" CADi workshop @ "Tecnológico de Monterrey"

License: GNU General Public License v3.0

Python 39.59% Jupyter Notebook 60.41%


dataPy CADi

This repository contains the materials for "Data acquisition, wrangling and exploratory analysis in Python", a three-day intensive CADi ("Cursos de Actualización en las Disciplinas") for faculty members at the "Tecnológico de Monterrey" institute.

The course covers subjects that include the parsing and handling of data from different social sources, as well as the use of current frameworks for data-driven analyses.

For other data-analysis related topics, please take a look at the dataViz_CADi repository, which contains exercises on data visualization in R, Python and Mathematica.


Contents

This workshop was created with flexibility in mind. As such, modules are fairly independent and can be followed in a different order than the one suggested here. For a topic-oriented breakdown of the contents, please have a look at the sitemap.

Day 01 (8h)

  1. Introduction: Objectives, scope, requirements and expectations.
  2. Python 101: Introduction to the programming language: description, core types, collections (data structures) and functions.
  3. Python Environments: Using anaconda and virtualenv for development.
  4. IDEs: Using Jupyter, Spyder, nteract, and Atom to write and launch our code.
  5. Git: Version control using GitHub for code development, sharing and collaboration.
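
As a rough taste of the "Python 101" module (core types, collections and functions), here is a minimal sketch. It is not taken from the course notebooks; the sample `tags` list and the `word_lengths` helper are hypothetical names used only for illustration.

```python
from collections import Counter


def word_lengths(words):
    """Return a dict mapping each word to its number of characters."""
    return {word: len(word) for word in words}


# Hypothetical sample data, used only for illustration
tags = ["data", "python", "data", "pandas"]  # list: ordered, allows repeats
unique_tags = set(tags)                      # set: unordered, no repeats
tag_counts = Counter(tags)                   # dict-like: tag -> occurrences
lengths = word_lengths(sorted(unique_tags))  # dict built by a function

print(tag_counts["data"])
print(lengths)
```

The same three collection types (list, set, dict) come up again in the data wrangling modules, so it may help to get comfortable with them first.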

Day 02 (8h)

  1. Data Primer: Data science and how data wrangling fits into it.
  2. Twitter: Interfacing with the API to get trends, tweets, tags, etc.
  3. Data Wrangling (part 1): Using pandas and matplotlib.
  4. Intermediate Python: Dealing with files, serialization and simple cases of parallel computing.
  5. Google Trends: Retrieving trends from Google searches.
  6. A Story to Tell: Data-driven storytelling.
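
The "Intermediate Python" topics (files, serialization and simple parallel computing) can be sketched with the standard library alone. This is an illustrative example under stated assumptions, not the workshop's own code: the `posts` data and the `count_words`, `save_json` and `load_json` helpers are hypothetical, and a thread pool is used here as one simple way to run tasks concurrently.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_words(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())


def save_json(obj, path):
    """Serialize a Python object to a JSON file."""
    Path(path).write_text(json.dumps(obj, indent=2), encoding="utf-8")


def load_json(path):
    """Read a JSON file back into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical sample data, used only for illustration
posts = ["data wrangling in python", "hello world", "one"]

# Apply count_words to every post using a small pool of worker threads;
# pool.map returns results in the same order as the inputs
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = list(pool.map(count_words, posts))

# Round-trip the results through a temporary JSON file
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "counts.json"
    save_json({"counts": counts}, out_file)
    restored = load_json(out_file)

print(restored)
```

For CPU-heavy work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface with separate processes instead of threads.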

Day 03 (8h)

  1. Advanced Python: Advanced topics (garbage collection, lambda functions).
  2. PyPI: Installing, browsing, and handling Python packages.
  3. Data Wrangling (part 2): Using scikit-learn to parse, manipulate, and pre-analyze data.
  4. Python pkg: Creating and installing a custom Python package.
  5. GeoData: How to work with geographic datasets.
  6. Plotting: matplotlib, seaborn and plotly descriptions and exercises from the dataViz repo.
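
The two "Advanced Python" topics named above, lambda functions and garbage collection, can be illustrated in a few lines. This is a minimal sketch, not material from the workshop; the `Record` class and its sample values are hypothetical.

```python
import gc
import weakref


class Record:
    """Hypothetical container used only for this illustration."""

    def __init__(self, name, score):
        self.name = name
        self.score = score


records = [Record("a", 3), Record("b", 1), Record("c", 2)]

# A lambda is a small anonymous function; here it serves as the sort key
by_score = sorted(records, key=lambda record: record.score)
names = [record.name for record in by_score]

# A weak reference lets us observe an object without keeping it alive
temporary = Record("tmp", 0)
observer = weakref.ref(temporary)
del temporary   # drop the only strong reference
gc.collect()    # ask the collector to run (helps on non-refcounting runtimes)
is_collected = observer() is None

print(names, is_collected)
```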

Resources

Tools and Packages

  • anaconda: Data science and package management platform for Python and R.
  • atom: Versatile IDE for R, Python, Markdown, Javascript, amongst others.
  • matplotlib: Python's most popular package to plot data.
  • numpy: Highly efficient array manipulation in Python.
  • pandas: Popular dataframe manipulation in Python.
  • plotly: A good alternative for interactive plots in Python (similar to Shiny in R).
  • onlinegdb: Online Python interpreter (originally developed for C and C++).
  • repl.it: Online Python IDE and interpreter (also supports many other languages).
  • scikit-learn: Data analysis and machine learning library for Python.
  • sympy: Symbolic calculus in Python.
  • Google Earth Studio: Useful to create geographic visualizations (currently under beta program).
  • Scrapy: Web-scraping framework for Python.
  • BeautifulSoup: An approachable library for parsing HTML and XML when scraping the web.
  • spaCy: Advanced natural language processing library.
  • NLTK: Natural Language Toolkit for Python.
  • Seaborn: Documentation for the seaborn statistical visualization package.
  • xlrd: Excel data reader.


Contact: [ [email protected] | [email protected] ]
My main projects: [ MGDrivE & MoNeT ]
My personal website: [ chipdelmal.github.io ]

