
nyudatabootcamp / materials


Materials used in class when teaching Data Bootcamp at NYU Stern.

License: MIT License

Jupyter Notebook 96.77% Python 1.49% TeX 1.74%
material data-bootcamp python pandas-tutorial

materials's Introduction

Data Bootcamp


A course exploring economic and financial data with Python at NYU's Stern School of Business. For current information and links to course material, try the Data Bootcamp website.

To see an interactive preview of some of the contents here, click the my binder button below:

Binder


November 2015. We are offering two sections of Data Bootcamp at NYU Stern in Spring 2016, one for undergrads (ECON-UB.0232, Tuesday and Thursday, 2-3:15, January 26 to May 10) and one for MBA students (ECON-GB.2313, Wednesday nights, 6-9pm, February 8 to May 9). They are standard for-credit courses at NYU.

The course was developed by Stern faculty and students with the assistance and support of executives at Amazon. The immediate goal is to train students to succeed as summer interns and full-time employees of technology companies, but the same skills are valued in finance, marketing, consulting, media, and other areas. We think of it as literacy for the modern age and a selling point in finding a job.

More concretely, the course is designed to (i) introduce students to sources of economic, financial, and business data and (ii) give programming newbies a sense of how modern software -- in this case Python -- makes life easier and more interesting. We'll let the data speak for itself. But coding is an essential skill in the modern world. You can do lots of things in Excel, but if you value your time -- and you should -- you'll find you can use it more efficiently with a modern programming language. We like to say we do it because we're lazy, laziness being a synonym here for efficiency. Former students tell us it's become a key to success in the business world. An alum with strong programming skills worries that this course will make him obsolete.

The Data Bootcamp course will NOT cover SQL databases. SQL Bootcamp is a separate non-credit course. Same team, same attitude, different content. For more information, scroll down.

You can find more information about the course on the course website. If you have questions, email Dave Backus ([email protected]) or track him down in his office (KMC 7-68) or at the Malt House.

Related information

Related courses. Several students have asked how Data Bootcamp compares to other courses with significant programming content. I've put together a list, but I recommend you ask around. Data Bootcamp does two things that are unusual: it's newbie-friendly, and there's a strong emphasis on data. The latter strikes me as important, in the sense that when you learn to program, it's helpful to have a purpose in mind. Ours is to collect and manage data.

Suggestions welcome. Post them at the "Issues" link at the top (look for the exclamation point in a circle) or email us at [email protected]. Thanks in advance. (all open issues)

Licensing. We encourage others to use this material and to acknowledge such use. Here's the boilerplate. On the off chance this crossed your mind, here's Richard Stallman's take on the license, which allows commercial development.

Part of the #nyuecon collection at NYU's Stern School of Business.

Another product of the #nyuecon Python factory @ NYU Stern.

Contributor information

We run a special pre-commit git hook to clean up the files before we let git commit them. To use it, the hook files need to be moved into place, and we have a Python script that does this. Immediately after cloning the repository, run the setup_hooks.py file from this directory (e.g. by calling python setup_hooks.py from the command line).

materials's People

Contributors

cc7768, danielcsaba, manojwajekar, mwaugh0328, sglyon, szokeb87

materials's Issues

2016 election

The election will likely be a hot topic this fall. We should make an example/lab for that data.

ref #2

value_counts and groupby

We should really cover this method in the summarize notebook -- The groupby example that uses count could be done with this.

Using value_counts might make a good precursor to groupby? Would have to think more about how to draw connection, but definitely should cover value_counts.
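One possible way to draw the connection, as a minimal sketch: value_counts gives the same numbers as a groupby followed by size, and groupby then generalizes to other aggregations. The orders table below is made up for illustration and is not taken from the course notebooks.

```python
import pandas as pd

# Made-up data for illustration only (not from the course notebooks)
orders = pd.DataFrame({
    "item": ["burrito", "bowl", "burrito", "tacos", "bowl", "burrito"],
    "quantity": [1, 2, 1, 3, 1, 2],
})

# value_counts: number of rows per distinct item, sorted by count (largest first)
counts_vc = orders["item"].value_counts()

# groupby + size: the same counts, but sorted by the group key instead
counts_gb = orders.groupby("item").size()

# The numbers agree once both results are put in the same order
same = counts_vc.sort_index().tolist() == counts_gb.sort_index().tolist()

# groupby goes further: any aggregation per group, not just counting rows
total_quantity = orders.groupby("item")["quantity"].sum()

print(counts_vc)
print(counts_gb)
print(total_quantity)
```

Introducing value_counts first keeps the "split into groups, then summarize" idea concrete before students meet the more general groupby syntax.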

Ideas for plots

Create a map of earth with a plot of all Nike factories -- Size of dot represents number of workers and color of dot represents percent workers female.

From enigma.io -- dataset

statsmodels

I got a request from a student to cover basic NumPy and statsmodels during the last third of the class.

Just making note of it here

Pandas 0.19 breaks the old DataReader

Some people in class ended up on pandas 0.19 somehow. This is problematic because pandas has gone beyond deprecation and actually removed the old pandas.io.data package (the remote data reader, which now lives in the separate pandas-datareader package).

Need to update notebooks to reflect this.
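A minimal sketch of how a notebook could cope with both situations, assuming the replacement module is pandas_datareader.data and the pre-0.19 location was pandas.io.data. The helper name load_data_reader is hypothetical and not part of either package.

```python
import importlib


def load_data_reader():
    """Return the first importable remote-data module, or None.

    Hypothetical helper: tries the separate pandas-datareader package
    first, then the old location that pandas removed in 0.19.
    """
    for name in ("pandas_datareader.data", "pandas.io.data"):
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed, or removed in this pandas version: try the next one
            continue
    return None


web = load_data_reader()
if web is None:
    print("No data reader found; try installing pandas-datareader")
else:
    print("Using", web.__name__)
```

In updated notebooks it is probably simpler to import pandas_datareader directly and ask students to install it; the fallback only matters while some machines still run older pandas.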

nike factories

Create a map of earth with a plot of all Nike factories -- Size of dot represents number of workers and color of dot represents percent workers female.

From enigma.io -- dataset

ref #2

Add notes about "who" can be successful in this course

The majority of people who didn't do as well lost a lot of points because they didn't complete/turn in their homework assignments. We should emphasize this and encourage them to do it.

Related comment: previous programming experience was uncorrelated with the quality of the final project. A bit of thought (and creativity) goes a long way in developing a nice project.

Add this type of thought to syllabus.

comtrade exports

GIF of a world map that shades countries over time according to their net exports, using UN Comtrade data

Can get from enigma.io -- dataset

ref #2

Error on the practice exam

Hi Guys!
I hope I'm doing this issue submission right. I'm getting an error accessing the practice exam. The answer key works.
Thanks!


Pandas Notebooks

Some useful thoughts from @mwaugh0328.

  • Start the notebooks with a conceptual and an applied question we're going to answer. For example, for pandas cleaning the conceptual question could be, "What happens when the data you read in isn't in the format you want?" The applied version of that question would be, "Your boss wants you to do X with a certain dataset, but the dataset is all screwed up. He only cares about the answer and you have three hours to give him one. How do you clean up the data so that you can give him an answer?"
  • Fewer datasets per notebook. It gets really confusing to go back and forth between different datasets. If we want to highlight different things (that might not all appear in an original dataset), then take the dataset and add them. For example, if we wanted to talk about missing values and only wanted to use the Chipotle data, then we could just break that dataset by adding missing values.
  • More exercises. The per-student realization of attention paid follows a random walk, so its standard deviation grows at root t. The exercises play the part of an "ss rule" and draw everyone back to the center (except for students who have landed in the absorbing boundary of Facebook).
  • Related to the previous two points: the first exercise of a class might be to explore a dataset. Have the students figure out what it contains, what they might want to do to it, and what questions we might want to ask of the data. This gives them an idea of what is in the dataset and helps students who had drifted off start paying attention again, because they will always know what data we are working with.
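The "break the dataset by adding missing values" suggestion above could look something like the sketch below. The helper add_missing and the small orders table are hypothetical stand-ins, not the actual Chipotle data used in class.

```python
import numpy as np
import pandas as pd


def add_missing(df, column, frac, seed=0):
    """Return a copy of df with roughly `frac` of `column` set to NaN."""
    rng = np.random.default_rng(seed)          # seeded so the class sees the same gaps
    out = df.copy()                            # leave the original dataset untouched
    n_missing = int(round(frac * len(out)))
    rows = rng.choice(len(out), size=n_missing, replace=False)
    out[column] = out[column].astype(float)    # NaN needs a float column
    out.iloc[rows, out.columns.get_loc(column)] = np.nan
    return out


# Made-up stand-in for the real data
orders = pd.DataFrame({
    "item": ["burrito", "bowl", "tacos", "chips", "salad"],
    "price": [8.5, 9.0, 7.25, 2.5, 8.0],
})

broken = add_missing(orders, "price", frac=0.4)

# Now the same dataset can be used to demonstrate the cleaning tools
print(broken["price"].isna().sum())    # how many values are missing
print(broken.dropna(subset=["price"])) # drop the incomplete rows
print(broken["price"].fillna(broken["price"].mean()))  # or fill with the mean
```

Because the damage is introduced on purpose, the instructor knows the right answer and students keep working with a dataset they already understand.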

OECD/EuroStat/Enigma/... now in pandas-datareader

We have at least a few examples where we use pd.read_csv and grab some files using a url where we could use pandas-datareader instead -- For example, there is OECD health data in the cleaning notebook.

My inclination is to move as much of the data reading as possible to pandas-datareader, because it is less likely to break than relying on a specific URL.

Thoughts? I'm excited about some of their new additions!

subway deserts

Make our own "subway deserts" map. Could also do a "closest station" Voronoi map.

Can get from enigma.io -- dataset

Could also grab house prices and use the Voronoi diagram to see whether houses closest to a station are more expensive.

Also have data on how much each station gets used -- dataset
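The "closest station" idea amounts to assigning each location to its nearest station, which is the same thing as finding the Voronoi cell it falls in. A minimal NumPy sketch, assuming made-up planar coordinates (real latitude/longitude data would need a projection or a great-circle distance instead):

```python
import numpy as np

# Hypothetical (x, y) coordinates, for illustration only
stations = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
houses = np.array([[1.0, 1.0], [3.5, 0.5], [0.5, 2.5], [2.5, 0.0]])

# Pairwise differences via broadcasting: shape (n_houses, n_stations, 2)
diff = houses[:, None, :] - stations[None, :, :]

# Euclidean distance from every house to every station
dist = np.sqrt((diff ** 2).sum(axis=2))

# Index of the closest station for each house (its Voronoi cell)
nearest = dist.argmin(axis=1)

# Distance to that closest station; large values would mark "subway deserts"
nearest_dist = dist.min(axis=1)

print(nearest)
print(nearest_dist.round(2))
```

With house prices attached to each row, nearest_dist could then be compared against price to check whether houses closer to a station are more expensive.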

my binder

If we end up merging #21 and no longer have static previews of our notebooks, it would be great to add a my binder button so interested parties can get a live preview!
