
ireapps / pycar


This project forked from tommeagher/pycar14

101.0 23.0 36.0 6.89 MB

NICAR Python mini boot camp

Home Page: https://ireapps.github.io/pycar/pycar_intro.html

License: MIT License

HTML 44.18% Jupyter Notebook 55.73% Dockerfile 0.09%

pycar's Introduction

.-,--.      ,--.     ,.  .-,--.
 '|__/ . . | `-'    / |   `|__/
 ,|    | | |   .   /~~|-. )| \ 
 `'    `-| `--'  ,'   `-' `'  `
        /|
       `-'

Python mini bootcamp

In this two-day workshop, we'll learn the basics of the Python programming language and how to begin analyzing data in a Jupyter Notebook. What's a Notebook? It's an interactive coding environment that lets you blend words and code.

Confused? Of course.

Bear with us. It will all make sense soon.

Instructors, check out the Teacher's Guide.

Day 1

Intros

  • Who are you, what do you do, what do you want to learn?
  • What will we learn?
  • What can I do with it?
  • GOAL: Learn how to solve problems with code.

Key concepts of programming in Python:

  • Basic data types - strings, integers, lists
  • Lists are your friend!
  • etc
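As a quick illustration of the types listed above, here is a minimal sketch. The variable names and values are invented for this example and are not taken from the workshop notebooks.

```python
# Hypothetical values, chosen only to illustrate the basic types.
city = "Denver"                             # a string
attendees = 40                              # an integer
topics = ["strings", "integers", "lists"]   # a list of strings

# Lists keep their order and can grow.
topics.append("dictionaries")

# Indexing starts at 0; len() counts the items.
first_topic = topics[0]
topic_count = len(topics)

print(city, attendees, first_topic, topic_count)
```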

Bonus: a discussion on debugging and a handy cheatsheet

Day 2

As with many data analyses, it all starts with a CSV. After a white board exercise, we'll start with a file of pseudocode, and we'll walk through writing the program in Python code, running each line in the Jupyter interpreter. We'll hold your hand through each step of the process.

This project is out-of-date. We'll try to update it in the near future.

This section covers gathering data from the web in two common formats.

In the first part, we'll scrape structured data from an HTML page using a GET request and write the data to a CSV. In the second part, we'll request data from an API and use it to build a spreadsheet programmatically. This data comes in a new format: JSON. We'll go back to the white board to show that it's basically a combination of data structures we already know about: lists and dictionaries (arrays and objects, in JSON terms).
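To make the "lists and dictionaries" point concrete, here is a minimal sketch using the standard library's json module. The response text and its field names are hypothetical; they do not come from the API used in the workshop.

```python
import json

# Hypothetical API response; the field names are invented for illustration.
raw = (
    '{"results": ['
    '{"title": "Bill A", "sponsors": ["Smith", "Jones"]}, '
    '{"title": "Bill B", "sponsors": []}'
    ']}'
)

data = json.loads(raw)     # a JSON object becomes a Python dict
bills = data["results"]    # a JSON array becomes a Python list

# Once parsed, ordinary list and dict operations apply.
titles = [bill["title"] for bill in bills]
sponsor_counts = {bill["title"]: len(bill["sponsors"]) for bill in bills}

print(type(data).__name__, type(bills).__name__)
print(titles)
print(sponsor_counts)
```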

Now we get to the heart of data analysis with an introduction to the powerful pandas library. Building on the basics we've already learned, and a little knowledge of SQL, we'll clean two related tables of data, join and filter them.

At the end of the day, we'll send you home with:

Help!

If you're working through this code at home and have trouble, please let us know.

The best way to reach us is by submitting an Issue on GitHub.

pycar's People

Contributors

aboutaaron, chrislkeller, elainewong, esagara, hbillings, katiepark, kevinschaul, meli-lewis, rdmurphy, richardsalex, robroc, scott2b, smbsimon, thejqs, tommeagher, zstumgoren, zufanka

pycar's Issues

Handout(s)?

One piece of early feedback we got last year was that some students really love paper and wanted handouts.
We provide a lot of links in the repo. Would any of these work as a good one or two page handout? Or do you know of another that would be better?

PyCAR 15 feedback thread: Official session feedback or anecdotes...

Dunno if there's a better way to catalog this - or if we even should - so figured an issue would work?

Data reporter with experience in R and SQL but not Python

My main thing is that even these basics are a lot to pack into a single day. Anything that helps me 1) commit syntax to memory or 2) gives me incentive to use python once I go back to my job is a good thing.

  • Ideally I’d use my own machine. I know this makes no sense from POV of instructors, but still.
  • A handout with some of the basics would be nice. The code is on Github, yeah, but having that piece of paper makes a difference.
  • I want to figure some of it out for myself, even if they’re just small things. A few moments where you’re like "now take a minute and import this library by yourself" will help me commit to memory the syntax.
  • We’re at such a basic level that it’s hard to figure out how I’m going to do amazing data work using python. The presenters could share projects they’ve worked on that python made possible. Not getting into the code, but just showing us why we should go back and keep using it.
  • Tell me why python is so awesome. What is easier in python than in R? Or excel even. We did go over some of this, but I couldn’t repeat most of it back to you.

project2: What does "**" mean?

Missed the conference and following the tutorial at home.

95: merged_data.append(dict(dict1[key], **dict2[key])): what does the "**" mean? The comment in the file notes that "**" expands the values - not sure what 'expand' means here.

StackOverflow notes that "**" indicates extra arguments may or may not be used. Not sure if that note applies to this context and if so, what extra arguments we may be referring to.
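For readers with the same question, here is a minimal sketch of what `**` does at a call site. In a function definition, `**kwargs` collects extra keyword arguments, which is likely what the StackOverflow note describes. In a call, `**some_dict` does the reverse and unpacks a dictionary into keyword arguments. The two dictionaries below are hypothetical stand-ins for dict1[key] and dict2[key].

```python
# Hypothetical records, standing in for dict1[key] and dict2[key].
salary_row = {"player_id": "p01", "salary": "500000"}
bio_row = {"player_id": "p01", "name": "Pat Example"}

# dict(mapping, **other) copies `mapping`, then applies each key/value pair
# in `other` as a keyword argument, as if we had written
# dict(salary_row, player_id="p01", name="Pat Example").
# This form requires the keys of `other` to be strings.
merged = dict(salary_row, **bio_row)

# The same merge using the unpacking syntax available since Python 3.5.
merged_alt = {**salary_row, **bio_row}

print(merged)
```

Neither form changes the original dictionaries; both build a new one, with values from the second dictionary winning when a key appears in both.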

Add instructions for getting Jupyter running

I don't think we ever really told people how to run the exercises at home (I've been trying to get it to run locally and not having a ton of luck, though I think it's because my personal virtualenv setup is foobarred at the moment). That would probably be helpful info!

Variables, methods, objects

A small piece of feedback on PyCAR16 from @mjwebster:

I think it would help beginners a lot if you named variables in a way that makes it much easier to see what's a variable and what's not. For example, something like "vrow" or "vfilename" or "varfilename" or "var_row". Anything that sets them apart from everything else. It gets confusing when you have a variable named "writer" and then a command that includes "writer" and stuff like that.

I agree this is very confusing for newcomers to programming and to Python. Is there a way to simplify this, or to explain why we name the variable after the object created by the same-named method?
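One way to show the distinction is a minimal sketch in which the variable deliberately does not share a name with the function that creates it. The name salary_writer and the sample rows are hypothetical, and io.StringIO stands in for an open file so the example runs without touching disk.

```python
import csv
import io

# io.StringIO behaves like an open text file but keeps everything in memory.
outfile = io.StringIO()

# csv.writer is a function in the csv module; calling it returns a writer object.
# The name on the left is only a label we choose, so it could be anything.
salary_writer = csv.writer(outfile)

# writerow is a method that belongs to the object, not to our variable name.
salary_writer.writerow(["name", "salary"])
salary_writer.writerow(["Pat Example", 500000])

written = outfile.getvalue()
print(written)
```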

project 3 workflow

Ok, reiterating my question about project 3, hopefully in the right repository this time.

Is it the case that the file scraper.py is meant to be the working file for the narrative outlined in the README? So the idea would be to work through the Readme in that file? Or is the Readme meant to be explained via the shell and then get_json and json_to_csv (and scraper?) done in files?

How would we introduce a few levels of basic instruction that have a related and linear curriculum?

Based on suggestion at PyCAR 15 feedback thread

@mikejcorey writes:

There were Pythons part 1-3, but some other sessions also used terms like "intermediate" and "advanced", however, which made it somewhat confusing about where they fit in the progression, and as far as I know they weren't actually related to each other. I would suggest rebranding those as specialized (and highly worthwhile) topics: "Refactoring your code" or "object-oriented Python" and "Python for data analysis."

@esagara writes:

I feel we need to present python in a way that is easily accessible to new users - i.e. an environment they will be able to easily recognize when launching up python for the first time when they return home. The course should be about learning the language, not setting up a work environment - though I would like to see a separate session on that.
...
I like the Python 1, 2, 3, etc. labeling. It makes progression clear and concise. Python for Data Analysis or Object Oriented Python tend to be confusing to people new to programming.
...
Eight hours is a lot to tackle in a day. One hour is too short. I would like to see what people think about splitting up the course in chunks of two to four hours over multiple days. This allows both students and instructors to take a break. It would also give us a chance to assign homework where we ask students to look at other sessions - be it in Python or something unrelated. Then we can talk briefly about how Python applies to those other sessions. This may make it easier to "flip the light switch" so to speak.

BeautifulSoup4 may need to be upgraded

I just tried bootstrapping the pycar repo using pip install -r requirements.txt. When importing BeautifulSoup it throws an error - ImportError: cannot import name 'HTMLParseError'. This is fixed by running pip install --upgrade beautifulsoup4. Apparently this has been an issue since Python 3.5. Do we need to update requirements.txt?

Software requirements

We need to give IRE a list this week of software we want pre-loaded on the machines for us. The good news is we'll be on Macs this year.

So I'm pretty sure we'll want:

  • Python 3
  • pip
  • virtualenv
  • virtualenvwrapper
  • requests
  • bs4
  • jupyter

Anything else I'm not thinking of?

project4/step_3: "bill" not defined

Line 28: for bill in objects:...
The error message "bill not define" results after I put in the command.
We do have the bills.json and bills.csv files, but no "bill" variable. Thoughts?

Can we standardize and optimize the tools and libraries that are used in teaching Python at NICAR?

Based on suggestion at PyCAR 15 feedback thread

@mikejcorey writes:

I sat in on at least one Python session where at least the first half-hour was spent trying to get everyone's IPython Notebook up and running. Partly this was because the instructor was trying to accommodate people using their own laptops, but people taking the class had to learn how IPython Notebook acts before they could start learning Python. IPython Notebook is a great tool, but that doesn't strike me as the most efficient or necessary use of time.

There are other, lighter-weight options, like using the shell in the command line directly, IPython without the notebook (which would give learners more feedback than Python by itself on the command line), or using Sublime Text's built-in Python interpreter.

I don't want to be needlessly restrictive in how people teach, but it would be great if students could sit down in any of those classes and be able to use a familiar environment in any of them once they've taken any of the others.

@rdmurphy writes:

Agreed. Even in PyCAR we clash a bit – the first lesson uses urllib.urlretrieve, then the third lesson uses requests. I was kinda surprised no one in our course asked why it suddenly changed.

Some of the Python courses used anaconda, others didn't. I feel like if we could standardize on a suite of tools (BeautifulSoup vs. PyQuery) that all the courses would build from, we could ensure consistency and that "drop in anywhere" ability.

@aboutaaron writes:

Sublime Text 3 saved our asses. Many folks found IPython confusing and while it was great for data types, it was a bit more complex down the road (to us). We ended up using the Build tool in Sublime Text to print the script output directly in the editor. This was really successful when we worked with and manipulated CSVs.
....
Oh, and it looks like there were several pycar folders on the desktop so we ran into some issues where folks were in the wrong directories. For example, one computer had around four pycar directories and it took us a minute to figure out exactly which one was which.

I think if we fork the project, we'll definitely need to scope the projects to their days and instructors unless it's entirely the same class being taught.

2 vs. 3

I feel like there should be some early coverage on Python 2 vs. Python 3. This is generally a source of confusion for newcomers. People tend to interpret higher numbers to mean latest version, and when it comes to major versions in particular, they tend to want to grab the latest and greatest.

I think there is a tendency by outsiders to perceive the ongoing run of parallel development to be some kind of debacle or rift in the community. I like to underscore the fact that it is quite the opposite, and that the conservativeness by which Python has made this transition is really one of the great things about the Python community. It seems like every time I try to do something with Ruby or Node, to name a couple of examples from personal experience, I have to have the latest and greatest of everything. Some development communities seem to have this driving need to continuously push the technology forward. I find I experience much less of this with Python, while it still manages to stay vibrant and growing and current.

Also, I have not scrutinized the code enough to know for sure, but I wonder if it would be a stretch for the code examples to all be both 2 and 3 compatible. I think this would mostly depend on external dependencies. If it can be done, it might prevent some headache as there are bound to be attendees who have already installed 3 before the workshop. Of course there is a teaching opportunity in running different versions side by side, but I feel like that falls a bit out of scope for what we are doing here.

The batting order

How do we want to do this next week? Here's a proposed lineup, but please make suggestions if you have other ways you want to do it.

Room 1
Intro - Tom
Project 1 - Tom/Heather
Project 2 - @hbillings
Lunch
Debugging chat - Heather
Project 3 - @esagara
Project 4 - Eric/Tom

Coach - @richardsalex

Room 2
Intro - @chrislkeller
Project 1 - @chrislkeller
Project 2 - @rdmurphy
Lunch
Debugging chat - ?
Project 3 - @katiepark
Project 4 - @kevinschaul

I'm not sure when everybody gets in, but maybe we can meet up Wednesday evening for a drink just to make sure we've dotted all our i's.

What do you think?

Change basics to include if statements

For Pycar 2018, we ended up finishing the basics exercise with an introduction to if statements and nested if statements instead of quickly introducing for loops. It would be good to refactor the ending of the basics lesson to match this.

fix mistakes in scraper README

@thejqs noticed the slicing notation is wrong. It says rows[3:4] but should be rows[3:5]

And we accidentally call it "unemployment.csv" in one spot rather than movies.csv
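The slicing fix comes down to the end index of a slice being excluded. A minimal sketch, using a hypothetical list in place of the scraped table rows:

```python
# Hypothetical rows, numbered so the slice results are easy to read.
rows = ["row0", "row1", "row2", "row3", "row4", "row5"]

# The end index is excluded, so [3:4] yields a single item.
one_row = rows[3:4]

# [3:5] yields the items at index 3 and index 4.
two_rows = rows[3:5]

print(one_row)   # ['row3']
print(two_rows)  # ['row3', 'row4']
```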

Use Jupyter notebooks to teach all projects

This was mentioned in #11. Rather than using the interactive interpreter or the IPython interpreter, for a class of beginning coders, it will likely be a lot easier to use Jupyter notebooks to teach these principles.
We can still break out comments and fill-in-the-blank code blocks throughout, and it will let you run each line or block of code interactively, to get the sense of how each one works.

This is certainly open to discussion, but I think this is a great (and easily doable) idea to help people get up to speed more quickly. Thanks @cezary4 and @scott2b for the suggestion.

How can we make this better?

Friends, thanks for joining the team for this year's PyCAR bootcamps.

To start this off, would you each take a few minutes to review the repo, which the folks at IRE kindly forked from last year's lessons, and offer your thoughts on how we should update it?

After that discussion, we can divvy up the batting lineup for the day, but I wanted to start with some more open-ended thoughts.

What do you like that's in here (or you remember from last year) and what would you like to change?

As a reminder, we ended last year ( tommeagher#19 ) by pondering whether to deal with encoding and agreeing to jettison wakari. What else?

@hbillings, @kevinschaul, @katiepark, @chrislkeller, @rdmurphy, what do you think? Feel free to respond to this thread or open new issues as needed.

Oh, and @esagara, accept the invite to the team already.

Batting order for PyCAR16

Now that we've settled on the code and agenda for the day, we need to agree on who will teach which section. I'd like to do the early introductions of the day, but I'm flexible to pick up whatever project or section that isn't filled by one of you.

  • Introduction (@tommeagher)
  • The Basics (@tommeagher) - Starting with this slide, we'll introduce some key Python types like the interactive interpreter, strings, integers, lists, slicing and loops.
  • Project #1 (@richardsalex) - After a white board exercise, we'll start with a file of pseudocode, and we'll walk through writing the program in Python code, running the file at the command line.
  • A discussion on debugging (@zufanka)
  • Project #2 (@zufanka) - We have a CSV of baseball player salaries. Let's figure out who makes the most money and examine some other biographical information about them, using dictionaries.
  • Project #3 (@scott2b) - Scraping from html and ingesting JSON from an API

So, @richardsalex, @zufanka, @scott2b, which section of the day would you like to teach? If you each pick one project (or the basics), we should be set. The rest of the day, you'd be helping teach and coaching students, along with @cathydeng, who I believe is going to help coach in the morning.

Rework basics exercise

The solitaire example we used to introduce the class seemed to work pretty well as an intro to thinking about programming. I think it would be a good idea to rework the basics exercise to include/build on the solitaire example, so that people can have some context for what the different data types can represent. I've got some notes for running through that card game example, so I can try combining them with the info in the basics notebook if folks give me the thumbs-up!

Add newline='' argument to open?

When using Python 3 in Windows with csv, sometimes an extra carriage return is added after each row. Result: a blank row after each row of data.

I propose adding the newline = '' argument to open, as such:

with open(FILENAME, 'w', newline='') as outfile:
    writer = csv.writer(outfile)

SO question that addresses this.
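A minimal sketch of the proposal, writing to a temporary file and reading the raw bytes back to check the line endings. The file name and rows are hypothetical. The csv module writes '\r\n' itself by default, and newline='' keeps Python from translating the '\n' in that pair a second time on Windows.

```python
import csv
import os
import tempfile

# Hypothetical rows; the file lives in a temporary directory.
rows = [["name", "salary"], ["Pat Example", "500000"]]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "salaries.csv")

    # newline='' passes the csv module's own line endings through untouched.
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerows(rows)

    # Read the raw bytes back to see the exact line endings on disk.
    with open(path, "rb") as infile:
        raw = infile.read()

print(raw)
```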

Round 3!

It's time to start preparing for NICAR16 and our third iteration of PyCAR. This year, there will be a few changes.

First, our teaching squad will be evolving. Sadly @chrislkeller, @kevinschaul and @katiepark will not be able to join us in Denver. And @esagara and @rdmurphy will be teaching other classes. We wish all of them the best and salute them for their service in years past.

So this time we'll have myself, @hbillings, @zufanka and @richardsalex teaching this day-long class. Also this year, we'll be teaching one session, rather than two, in a slightly larger room, and we expect a full house.

Over the next few weeks, we'll restart some of the conversations on the open issues from last year, and we'll tweak the lesson plans a bit. @chrislkeller and @esagara have kindly agreed to help advise in that process.

I'll bug you all more soon, but wanted to get this started. I'm psyched to be doing this again and look forward to working with you.

Refactor project 2

The class this year was almost completely overwhelmed by project 2. It started to go astray somewhere around here.

There are many new concepts being introduced here:

  • adding and unpacking dicts
  • functions and return values
  • try/except
  • type conversion (from string to int)
  • nested for loops
  • this one-liner can be particularly difficult for newcomers to parse.
  • joining, which is more easily and commonly done in agate or pandas.
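Three of the concepts in the list above (functions with return values, try/except and string-to-int conversion) can be sketched together in a few lines. This is not the project 2 code; the helper name to_int and the sample values are hypothetical.

```python
def to_int(value, default=None):
    """Convert a string such as '500000' to an int, or return `default`."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # int() raises ValueError for text like 'N/A' and TypeError for None.
        return default


# Hypothetical salary values as they might arrive from a CSV (always strings).
raw_salaries = ["500000", "1200000", "N/A", ""]

salaries = [to_int(value) for value in raw_salaries]
valid = [salary for salary in salaries if salary is not None]

print(salaries)
print(max(valid))
```

Keeping the conversion in one small function means the loop that uses it stays flat, which may be easier for newcomers to read than a try/except nested inside a for loop.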

I've always liked that this project seems more explicitly about data cleaning and analysis than any of the others, but we need to think about how we can reorganize and simplify this exercise.

Balance the breakdown of the teachers...

@hbillings and I talked on Saturday night about a great many things & this came up: we need to involve more women to help teach this class.

I'm sure any of us will step aside if it means increasing the gender ratio.

Who's on first?

@eklucas and our friends at IRE need to know which room each of us will be in this year.

What do you think about this lineup, to start the discussion?

Room 0

Room 1

This is not the order of teaching for the day, just to get us each in a room. If you want to switch rooms, that's no problem, just let me know, on here or on email.

If I don't hear from any of you in the next 36 hours, I'll assume we're good, and ship this list to Liz.

On a related note, we could probably use some extra coaches to walk the room and help folks who get stuck. We'll particularly need help in Room 1, and when some of us have to step out to speak at other sessions.
Who do you think we should recruit to lend a hand?

Thanks,
Tom

Tweak ending to solitaire example

Everything went great until we got to the last command

for column in columns:
    if card_to_play == column["last_card"]["value"] - 1:
        break

There's a big leap there, including a for loop for a list (columns) that we haven't created yet.
We called an audible and instead added a conditional to check if the card is one less than the previous, and if the suit is the same.

if play_card["value"] == first_column_card["value"] - 1:
    if play_card["suit"] != first_column_card["suit"]:
        if play_card["color"] != first_column_card["color"]:  # or create another dict with suits and colors to lookup
            # run a function moving cards
            pass
        else:
            print("You can't play on the same color")
    else:
        print("you can't play on the same suit")
else:
    print("You can't play this card here")
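For anyone who wants to run the idea end to end, here is a minimal sketch that wraps the check in a function. The card values are hypothetical, and the rule is an assumption made for this example: the played card must be exactly one lower in value and a different color. In a standard deck, two cards of different colors always have different suits, so the separate suit check is dropped.

```python
def can_play(play_card, column_card):
    """Return True if play_card may be placed on column_card.

    Assumed rule for this sketch: the value must be exactly one lower
    and the colors must differ.
    """
    if play_card["value"] != column_card["value"] - 1:
        return False
    return play_card["color"] != column_card["color"]


# Hypothetical cards, using the dictionary keys from the snippet above.
first_column_card = {"value": 8, "suit": "spades", "color": "black"}
red_seven = {"value": 7, "suit": "hearts", "color": "red"}
black_seven = {"value": 7, "suit": "clubs", "color": "black"}

print(can_play(red_seven, first_column_card))    # True
print(can_play(black_seven, first_column_card))  # False
```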

Travel safely to Atlanta...

Thanks to all of you--@hbillings, @esagara, @chrislkeller, @rdmurphy, @katiepark, @kevinschaul, @richardsalex--for the time, energy and enthusiasm you've put into getting ready for this year's PyCAR.
By my rough count, there have been more than 30 commits to this repo since Jan. 1. I think Thursday's classes are going to be awesome, and it's entirely due to your hard work. So thank you.

Now, get to Atlanta safely and painlessly. For those who are in town, we can huddle up tomorrow evening in the bar to discuss any last-minute details. Otherwise, we'll get rolling Thursday morning, and the first round of drinks Thursday afternoon is on me.

Is it possible to create a structure to create related lessons without losing the advantages of having many voices involved in the process?

Based on suggestion at PyCAR 15 feedback thread

@mikejcorey writes:

Could we organize a (small) committee of Python folks to facilitate this? The group could take suggestions and feedback such as what's happening on this thread, post a recommendation, get some more feedback, and take a yea-nay vote?

I hate committees as much as the next journalist, but Python has made a big impact on the NICAR community, and I don't want to ruin it by big-footing ourselves into the rest of the conference.

@esagara writes:

Can we standardize curriculum throughout the Python track (standardize may not be the right word)? Can we make it so concepts learnt in an intro course at the beginning of the day/conference are reinforced/expanded upon in more advanced classes further on in the day/conference? It would also help if we split up the main PyCAR course to sessions throughout the week. We could fill in gaps where needed from the more advanced classes.

@chrislkeller writes:

I think my own stumbling point in thinking through this is mentioned in your last heading: without losing the advantages of having many voices involved in the process. The gorgeous thing about NICAR is someone can obtain knowledge and share it with others in a really direct way. No one wants to lose that. No one wants a curriculum committee to thumbs up or thumbs down sessions.

But I too felt those who want to learn would benefit from core concepts being taught, explained and demonstrated over and over - repetition, practice and muscle memory if you will. Landmarks, touchstones, repeated use of terms, doing and seeing are all valuable when it comes to learning.

And this speaks to the scope of what we did with PyCAR right? Devil's advocate: If there are two sessions in which participants will be scraping websites, did we need to do that during PyCAR?

Refactor project 2 (as needed)

Heather, let us know if you need a hand with testing and tweaking the project. Do you want to work on a short debugging interlude for after lunch too?

Why is interacting with files using `rb` and `wb` safer?

We had this come up in our session – people wanted to know why using the b was safer. It'd be nice for us to actually be able to explain that beyond just saying, "Everybody does it! Trust us." 😄

(Which is basically what we did.)
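The thread does not record an answer, so the following is a sketch of one commonly cited reason rather than the instructors' explanation. Text mode can translate line endings, while binary mode passes bytes through unchanged. The Python 2 csv documentation asked for the 'b' flag on platforms where that makes a difference; the Python 3 csv documentation instead recommends text mode with newline=''. The file name and contents below are hypothetical.

```python
import os
import tempfile

# Bytes as a CSV writer would typically produce them, with '\r\n' line endings.
original = b"name,salary\r\nPat Example,500000\r\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.csv")

    # 'wb' writes the bytes exactly as given.
    with open(path, "wb") as outfile:
        outfile.write(original)

    # 'rb' returns the bytes untouched.
    with open(path, "rb") as infile:
        as_bytes = infile.read()

    # Default text mode translates '\r\n' to '\n' while reading.
    with open(path, "r") as infile:
        as_text = infile.read()

print(as_bytes == original)
print(repr(as_text))
```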
