This project forked from tanyaschlusser/15-years-pycon

Code to reproduce our PyCon 2018 poster "15 years of PyCon".

License: BSD 3-Clause "New" or "Revised" License

15 years of PyCon

Thank you for your interest in our poster! This repo contains code to reproduce the analysis and visualization.

License notice

Some content in this repo is not ours, and this repo's license does not apply to that content. Please see the LICENSED_CONTENT directory, which identifies the licensed content and contains the respective licenses.

Setup

If all you want to do is see how to make the poster, skip to the visualization section.

Otherwise, for the analysis, you will need Python 3.5+ because I used print as a function and probably other Python 3 features. The code itself ran fine on Python 3.4, but on 3.4 you can't call help() on some things in SQLAlchemy because of an inspect.py issue that's gone in 3.5+. I only noticed that while cleaning up this repo for sharing.

pipenv --three
pipenv install --skip-lock
pipenv shell
# and `exit` to exit...

Enter each directory to do the relevant work for each step.

# data

This directory will contain the database (it's 30 MB, so it's on Dropbox, not GitHub), plus the SQLAlchemy ORM. You don't need to run anything in here directly; the path to database.py is prepended to the Python path in both acquisition and analysis.
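As a rough illustration of what "prepended to the Python path" means, here is a minimal sketch. The helper name add_data_dir_to_path is hypothetical and not taken from the repo; it only assumes that data/ sits next to the acquisition and analysis directories, as described above.

```python
import sys
from pathlib import Path


def add_data_dir_to_path(script_dir, data_dirname="data"):
    """Prepend the sibling data directory to sys.path and return it.

    Hypothetical helper: assumes script_dir is a directory such as
    acquisition/ or analysis/ that lives next to data/.
    """
    data_dir = str(Path(script_dir).resolve().parent / data_dirname)
    if data_dir not in sys.path:
        # Position 0 so that `import database` finds data/database.py first.
        sys.path.insert(0, data_dir)
    return data_dir


# From a script inside acquisition/ or analysis/, one might then write:
#   add_data_dir_to_path(Path(__file__).parent)
#   import database
```

Putting the directory at the front of sys.path means a plain `import database` resolves to the file in data/ without installing the project as a package.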

# acquisition

This directory contains a script, run_all_acquisitions.py, that either runs the data acquisition or downloads the database from Dropbox; it prompts you to choose. Either way, the database ends up in data/PyCons.db.

(The interactive choice is just to run all the scraping code or to curl from here: https://www.dropbox.com/s/3muutb5uw15g5tp/PyCons.db?dl=1 if you'd rather do that manually.)
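If you would rather do the manual download from Python than from curl, a minimal sketch using only the standard library could look like this. The function name download_database is hypothetical, and this is not the code in run_all_acquisitions.py; the URL and the destination path are the ones given above, with the path taken relative to the repo root.

```python
import shutil
import urllib.request
from pathlib import Path

# URL quoted in this README; dl=1 asks Dropbox for the file itself.
DB_URL = "https://www.dropbox.com/s/3muutb5uw15g5tp/PyCons.db?dl=1"


def download_database(url=DB_URL, dest="data/PyCons.db"):
    """Stream the file at `url` to `dest`, creating the directory if needed."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(str(dest_path), "wb") as out:
        # Copy in chunks so the 30 MB file is never held in memory at once.
        shutil.copyfileobj(response, out)
    return dest_path


# Run from the repo root:
#   download_database()
```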

Scraping is partly manual to deal with different spellings of names, so expect to spend an hour or two answering 'Y' or 'n' to questions like: is 'Enthought' the same as 'Enthought, LLC.'?
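To give a feel for that kind of prompt, here is a minimal sketch of interactive name matching. It is not the repo's scraping code: the function names, the use of difflib for similarity, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.6):
    """Return True if two names look alike (case-insensitive ratio test)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_names(names, ask=input, threshold=0.6):
    """Map each name to a canonical spelling, asking about near matches.

    `ask` takes the question text and returns the answer; it defaults to
    input() and can be swapped out for non-interactive use.
    """
    canonical = []
    mapping = {}
    for name in names:
        for known in canonical:
            if name == known:
                mapping[name] = known
                break
            if similar(name, known, threshold):
                question = "is '{}' the same as '{}'? [Y/n] ".format(name, known)
                if ask(question).strip().lower() in ("", "y", "yes"):
                    mapping[name] = known
                    break
        else:
            # No confirmed match: treat this spelling as a new canonical name.
            canonical.append(name)
            mapping[name] = name
    return mapping


# Example (interactive):
#   merge_names(["Enthought", "Enthought, LLC.", "Google"])
```

Only pairs that pass the similarity test trigger a question, which keeps the number of prompts down, but a real run over 15 years of names still means a lot of answering.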

# analysis

The analysis is done in a Jupyter notebook, and shows attempts at simple word frequency, clustering, and Latent Dirichlet Allocation. In the end, it was clear that manual labeling would be the best option. The Excel file in data/all_talks_byhand.xlsx contains the manual labels. It was converted to JSON, then annotated to add the captions in visualization/data/topic_graph_byhand.json.
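For readers unfamiliar with the first of those attempts, a simple word-frequency count can be sketched as follows. This is not the notebook's code: the function name, the tiny stopword list, and the sample titles are made up for illustration.

```python
import re
from collections import Counter

# Deliberately small, hypothetical stopword list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def word_frequencies(titles, stopwords=STOPWORDS):
    """Count lowercase words across talk titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Made-up titles, not taken from the database:
sample_titles = [
    "Python for Data Science",
    "Testing in Python",
    "Data pipelines with Python",
]
top_words = word_frequencies(sample_titles).most_common(2)
# top_words is [("python", 3), ("data", 2)]
```

Counts like these mostly surface very generic words, which is one way to see why a simple frequency approach would not be enough to label topics on its own.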

# visualization

This directory is independent of the rest of the project. If all you want to do is reproduce the poster, go there and follow the instructions. You do not need to pipenv install anything.

Contributors

tanyaschlusser
