nyudatabootcamp / materials

Materials used in class when teaching Data Bootcamp at NYU Stern.

License: MIT License

materials's Introduction

Data Bootcamp

A course exploring economic and financial data with Python at NYU's Stern School of Business. For current information and links to course material, try the Data Bootcamp website.

To see an interactive preview of some of the contents here, click the my binder button below:

November 2015. We are offering two sections of Data Bootcamp at NYU Stern in Spring 2016, one for undergrads (ECON-UB.0232, Tuesday and Thursday, 2-3:15, January 26 to May 10) and one for MBA students (ECON-GB.2313, Wednesday nights, 6-9pm, February 8 to May 9). They are standard for-credit courses at NYU.

The course was developed by Stern faculty and students with the assistance and support of executives at Amazon. The immediate goal is to train students to succeed as summer interns and full-time employees of technology companies, but the same skills are valued in finance, marketing, consulting, media, and other areas. We think of it as literacy for the modern age and a selling point in finding a job.

More concretely, the course is designed to (i) introduce students to sources of economic, financial, and business data and (ii) give programming newbies a sense of how modern software -- in this case Python -- makes life easier and more interesting. We'll let the data speak for itself. But coding is an essential skill in the modern world. You can do lots of things in Excel, but if you value your time -- and you should -- you'll find you can use it more efficiently with a modern programming language. We like to say we do it because we're lazy, laziness being a synonym here for efficiency. Former students tell us it's become a key to success in the business world. An alum with strong programming skills worries that this course will make him obsolete.

The Data Bootcamp course will NOT cover SQL databases. SQL Bootcamp is a separate non-credit course. Same team, same attitude, different content. For more information, scroll down.

You can find more information about the course on the course website. If you have questions, email Dave Backus ([email protected]) or track him down in his office (KMC 7-68) or at the Malt House.

Related information

Related courses. Several students have asked how Data Bootcamp compares to other courses with significant programming content. I've put together a list, but I recommend you ask around. Data Bootcamp does two things that are unusual: it's newbie-friendly, and there's a strong emphasis on data. The latter strikes me as important, in the sense that when you learn to program, it's helpful to have a purpose in mind. Ours is to collect and manage data.

Suggestions welcome. Post them at the "Issues" link at the top (look for the exclamation point in a circle) or email us at [email protected]. Thanks in advance. (all open issues)

Licensing. We encourage others to use this material and to acknowledge such use. Here's the boilerplate. On the off chance this crossed your mind, here's Richard Stallman's take on the license, which allows commercial development.

Part of the #nyuecon collection at NYU's Stern School of Business.

Another product of the #nyuecon Python factory @ NYU Stern.

Contributor information

We run a special pre-commit git hook to clean up the files before git commits them. To use it, the hook files need to be moved into place, and we have a Python script that does this. Immediately after cloning the repository, run setup_hooks.py from this directory (e.g. by calling python setup_hooks.py from the command line).

materials's Issues

Compelling reason to use fig, ax api.

Fig, ax to plot data from multiple dataframes
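
One case where the fig, ax API pays off is drawing series from more than one DataFrame on the same axes. The sketch below is only an illustration: the two DataFrames and their values are made up, and the column names are assumptions rather than anything from the course notebooks.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Two small made-up DataFrames (illustrative values only, not course data)
gdp = pd.DataFrame({"year": [2013, 2014, 2015], "growth": [1.8, 2.5, 2.9]})
cpi = pd.DataFrame({"year": [2013, 2014, 2015], "inflation": [1.5, 1.6, 0.1]})

# Create one figure and one axes object, then pass the same axes to each plot call
fig, ax = plt.subplots()
gdp.plot(x="year", y="growth", ax=ax)
cpi.plot(x="year", y="inflation", ax=ax)

# Because we hold a reference to ax, all labeling happens in one place
ax.set_xlabel("Year")
ax.set_ylabel("Percent")
ax.set_title("Two DataFrames on one axes")
ax.legend()
```

Passing ax explicitly is what keeps both lines on one plot; without it, each DataFrame.plot call would open its own figure.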

2016 election

The election will likely be a hot topic this fall. We should make an example/lab for that data.

ref #2

Kill off "Method 2" from bootcamp_graphics notebook

This is not a particularly useful way to make plots in my opinion. I'm happy to hear otherwise, but I vote for killing off the section about using plt.plot as a way to generate plots.

Fix link to code practice a.

value_counts and groupby

We should really cover this method in the summarize notebook -- The groupby example that uses count could be done with this.

Using value_counts might make a good precursor to groupby? Would have to think more about how to draw connection, but definitely should cover value_counts.
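
A minimal sketch of the connection, assuming a made-up orders table (the column names and values are hypothetical, not the data used in the summarize notebook):

```python
import pandas as pd

# Made-up orders table (illustrative only)
orders = pd.DataFrame({
    "item": ["burrito", "bowl", "burrito", "tacos", "bowl", "burrito"],
    "price": [8.5, 9.0, 8.5, 7.0, 9.0, 8.5],
})

# value_counts: number of rows per item, sorted from most to least frequent
counts_vc = orders["item"].value_counts()

# groupby + count: the same numbers, ordered by the group key instead
counts_gb = orders.groupby("item")["price"].count()

# groupby generalizes: swap the counting step for any other aggregation
mean_price = orders.groupby("item")["price"].mean()
```

Seen this way, value_counts is the special case of "split by a column, then count", and groupby is what you reach for once you want something other than a count.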

Ideas for plots

Create a map of earth with a plot of all Nike factories -- Size of dot represents number of workers and color of dot represents percent workers female.

From enigma.io -- dataset

Create a new exam

We posted the answers to the exam from spring in this repo...

statsmodels

I got a request from a student to cover basic numpy and statsmodels during the last third of the class.

Just making note of it here

NYC elevators

https://github.com/datanews/elevators

ref #2

Pandas 0.19 breaks the old DataReader

Some people in class ended up on pandas 0.19 somehow. This is problematic because they have gone beyond deprecation and actually broken the old pandas.web.io (or whatever it is called) package.

Need to update notebooks to reflect this.
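
A sketch of what the updated import might look like, assuming the separately installed pandas-datareader package is the replacement. The alias web and the commented-out FRED call are illustrative choices, not taken from the notebooks.

```python
# Old style, which no longer works once the module is removed from pandas:
#   import pandas.io.data as web
# New style: the reader lives in the separate pandas-datareader package.
try:
    from pandas_datareader import data as web
except ImportError:
    web = None
    print("pandas-datareader is not available; try: pip install pandas-datareader")

# Example call (needs network access, so it is left commented out here):
# import datetime as dt
# gdp = web.DataReader("GDP", "fred", dt.datetime(2010, 1, 1), dt.datetime(2015, 1, 1))
```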

nike factories

Create a map of earth with a plot of all Nike factories -- Size of dot represents number of workers and color of dot represents percent workers female.

From enigma.io -- dataset

ref #2

Add notes about "who" can be successful in this course

The majority of people who didn't do as well lost a lot of points because they didn't complete or turn in their homework assignments. We should emphasize this and encourage them to do it.

Related comment: previous programming experience was uncorrelated with the quality of the final project. A bit of thought (and creativity) goes a long way in developing a nice project.

Add this type of thought to syllabus.

comtrade exports

GIF of the world that shades countries over time according to their net exports, using UN Comtrade data

Can get from enigma.io -- dataset

ref #2

bootcamp_graphics notebook

Still has pandas.io.data (instead of datareader)

Error on the practice exam

Hi Guys!
I hope I'm doing this issue submission right. I'm getting an error accessing the practice exam. The answer key works.
Thanks!

Pandas Notebooks

Some useful thoughts from @mwaugh0328.

Start the notebooks with a conceptual and an applied question we're going to answer. For example, for pandas cleaning the conceptual question could be, "What happens when the data you read in isn't in the format you want?" The applied version of that question would be, "Your boss wants you to do X with a certain dataset, but the dataset is all screwed up. He only cares about the answer and you have three hours to give him one... How do you clean up the data so that you can give him an answer?"
Fewer datasets per notebook. It gets really confusing to go back and forth between different datasets. If we want to highlight different things (that might not all appear in an original dataset), then take the dataset and add them. For example, if we wanted to talk about missing values and only wanted to use the Chipotle data, we could just break that dataset by adding missing values.
More exercises. The per-student realization of attention paid follows a random walk, so its standard deviation grows at root t. The exercises play the part of an "ss rule" and draw everyone back to the center (except for students who have landed in the absorbing boundary of Facebook).
Related to the previous two boxes: The first exercise of a class might be to explore a dataset. Have the students figure out what it contains, what they might want to do to it, and what questions we might want to ask of the data. This gives them an idea of what is in the dataset and helps students start paying attention again after drifting off, because they will always know what data we are working with.
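
The "break the dataset by adding missing values" idea could be sketched as follows. The small table stands in for a clean dataset and is made up; add_missing is a hypothetical helper, not something that exists in the repo.

```python
import numpy as np
import pandas as pd

# Small made-up stand-in for a clean dataset (illustrative values only)
clean = pd.DataFrame({
    "item": ["burrito", "bowl", "tacos", "salad", "chips"],
    "quantity": [1, 2, 1, 3, 2],
    "price": [8.5, 9.0, 7.0, 8.0, 2.5],
})


def add_missing(df, column, n, seed=0):
    """Return a copy of df with n randomly chosen entries of `column` set to NaN."""
    rng = np.random.default_rng(seed)
    broken = df.copy()
    rows = rng.choice(broken.index.to_numpy(), size=n, replace=False)
    broken.loc[rows, column] = np.nan
    return broken


# The original stays intact; the broken copy is what students would clean
broken = add_missing(clean, "price", n=2)
```

Fixing the seed means every student sees the same holes, which keeps an in-class exercise on one dataset reproducible.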

"bar" to "hist" in bootcamp_graphics exercise

Under 3 ways to build graphs

OECD/EuroStat/Enigma/... now in pandas-datareader

We have at least a few examples where we use pd.read_csv and grab some files using a url where we could use pandas-datareader instead -- For example, there is OECD health data in the cleaning notebook.

My inclination is to move as much of the data reading as possible to pandas-datareader because it is less likely to go down than a specific url.

Thoughts? I'm excited about some of their new additions!

subway deserts

Make our own "subway deserts" map. Could also do a "closest station" Voronoi map.

Can get from enigma.io -- dataset

Could also grab house prices and see on the Voronoi diagram whether houses closest to a station were more expensive.

Also have data on how much each station gets used -- dataset
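
A rough sketch of the "closest station" step, which is the same assignment a Voronoi diagram encodes. The coordinates are made up on an arbitrary flat grid, and the cutoff is a hypothetical number; real station data would need proper geographic distances.

```python
import numpy as np

# Made-up coordinates on an arbitrary grid (not real station or address data)
stations = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
homes = np.array([[0.5, 0.2], [3.5, 0.5], [2.0, 2.0], [2.0, 9.0]])

# Pairwise Euclidean distances: one row per home, one column per station
diffs = homes[:, None, :] - stations[None, :, :]
dist = np.sqrt((diffs ** 2).sum(axis=2))

# The index of the closest station is the Voronoi cell the home falls in
closest = dist.argmin(axis=1)
nearest_dist = dist.min(axis=1)

# Flag homes farther than a (hypothetical) cutoff from every station as "desert"
cutoff = 5.0
in_desert = nearest_dist > cutoff
```

With house prices attached to each home, nearest_dist would be the variable to compare against price.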

my binder

If we end up merging #21 and no longer have static previews of our notebooks, it would be great to add a my binder button so interested parties can get a live preview!

GDP typo in bootcamp_web_apis.ipynb

There is an equation that says Y = C+I +H + NX

I think the H should be G!

Also this is a test for me...

uber demand curve + elasticities

From Bosco Ballé (UG Fall 2016 student)

http://blogs.wsj.com/economics/2016/09/19/ubers-pricing-formula-has-allowed-economists-to-map-out-a-real-demand-curve/
