erictleung / 2017-new-coder-survey

🔰 Code to help clean and format the 2017 New Coder Survey by freeCodeCamp

R 99.31% Makefile 0.69%

coder-survey data data-cleaning dplyr freecodecamp

2017-new-coder-survey's Introduction

Hi there, I'm Eric! 👋

Instructions for living a life. Pay attention. Be astonished. Tell about it.

— Mary Oliver, American poet

🧑‍💻 Currently: Data Scientist (full-time), Engineer (at heart), and Educator (in practice)
🔭 I’m currently working on writing math articles for freeCodeCamp, and fun data side projects.
🌱 I’m currently learning about Emacs Lisp (to build my own packages), marketing, Econometrics with R, Bayesian statistics, and causal inference as personal and professional endeavors for the future.
📚 Currently reading: The Diamond Age: Or, a Young Lady's Illustrated Primer by Neal Stephenson.
👯 I’m looking to collaborate on open source tools that empower individuals to solve problems and learn.
💬 Ask me about open science, open-source culture, data science, healthcare reform, and education reform.
😄 Pronouns: he/him
⚡ Fun fact: I like writing with fountain pens on good stationery and notebooks.
📫 How to reach me: Twitter or LinkedIn

Last updated: 2024-01-30

Twitter • Website • Email • Wikipedia • Mastodon • LinkedIn • StackOverflow • Threads • Quora • Tableau • DEV

2017-new-coder-survey's Issues

Clean up "Other" answer for "If you have listened to coding-related podcasts before, which ones have you found helpful?"

Find relatively most common answers given
Create new column for popular answers
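A minimal sketch of the "find the most common answers" step, assuming that a single "Other" cell can hold several comma-separated answers. `count_other_answers()` is a hypothetical helper, and the podcast names are placeholders rather than real survey responses:

```r
# Hypothetical helper: tally free-text "Other" answers to find the most
# common ones. Assumes multiple answers in one cell are comma-separated.
count_other_answers <- function(x) {
  x <- x[!is.na(x)]
  # Split each response on commas and flatten into one vector
  parts <- unlist(strsplit(x, ",", fixed = TRUE))
  # Normalize: trim surrounding whitespace and ignore case
  parts <- tolower(trimws(parts))
  parts <- parts[parts != ""]
  # Most common answers first
  sort(table(parts), decreasing = TRUE)
}

# Placeholder responses for illustration only
other <- c("Podcast A, Podcast B", "podcast b", NA, "  Podcast A ", "Podcast C")
counts <- count_other_answers(other)
print(counts)
```

Lower-casing and trimming before counting keeps "Podcast B" and "podcast b " from being tallied as separate answers.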

Clean up "Other" answer for "If you have watched coding-related YouTube videos before, which channels have you found helpful?"

Find relatively most common answers given
Create new column for popular answers
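For the "create new column" step, one option is a 0/1 indicator column for each popular answer. This is a sketch under assumptions: `add_popular_columns()`, the `YouTubeOther` column, and the channel names are all hypothetical, and matching is a simple case-insensitive substring search:

```r
# Hypothetical helper: for each popular answer, add a 0/1 column that is 1
# when the free-text "Other" response mentions it (case-insensitive).
add_popular_columns <- function(df, col, popular, prefix = "Other") {
  text <- tolower(df[[col]])
  for (p in popular) {
    # Build a syntactic column name such as "OtherChannelA"
    new_col <- paste0(prefix, gsub("[^A-Za-z0-9]", "", p))
    hit <- grepl(tolower(p), text, fixed = TRUE)
    # Keep NA for respondents who gave no "Other" answer
    df[[new_col]] <- ifelse(is.na(text), NA_integer_, as.integer(hit))
  }
  df
}

survey <- data.frame(
  YouTubeOther = c("Channel A and Channel B", "channel b", NA),
  stringsAsFactors = FALSE
)
survey <- add_popular_columns(survey, "YouTubeOther", c("Channel A", "Channel B"))
```

Substring matching is crude (a short name can match inside a longer one), so the popular names are worth checking against the tallied answers first.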

Clean job interests

This year, respondents could select multiple answers for job interests. To make the answers easier to interpret, convert the text into binary (boolean) columns.
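A sketch of that conversion, assuming each respondent's selections are stored as comma-separated text in a single column. `split_to_booleans()` and the example job titles are illustrative, not taken from the repository:

```r
# Hypothetical sketch: turn a multi-select text column into one logical
# column per distinct answer. Assumes selections are separated by commas.
split_to_booleans <- function(x, sep = ",") {
  # One character vector of trimmed selections per respondent
  choices <- lapply(strsplit(x, sep, fixed = TRUE), trimws)
  answers <- sort(unique(unlist(choices)))  # sort() also drops NA
  answers <- answers[answers != ""]
  out <- data.frame(row.names = seq_along(x))
  for (a in answers) {
    has <- vapply(choices, function(ch) a %in% ch, logical(1))
    has[is.na(x)] <- NA  # no answer given, so the interest is unknown
    out[[a]] <- has
  }
  out
}

interests <- c("Front-End Web Developer, Data Scientist", "Data Scientist", NA)
jobs <- split_to_booleans(interests)
```

Keeping `NA` for blank responses, instead of `FALSE`, avoids treating "did not answer" as "not interested".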

Fix inconsistent response formatting in columns 96--115

See freeCodeCamp/2017-new-coder-survey#3

Generalize removal of obvious outliers

I've removed a couple of outliers already, but here I want to add a function to remove other obvious outliers.

A list of some things to look for:

Part 2
- ExpectedEarning == "xxxxx"
- MoneyForLearning has more than three zeros, look at two or more
- Age == 120
- Gender Other has some irrelevant answers
- ChildrenNumber > 50
- Income > 100,000,000
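One way to generalize the rules above is a single function with the cut-offs as parameters. This is a hypothetical sketch: the column names come from the list, the defaults mirror its examples (ages of 120 or more, more than 50 children, income above 100,000,000), and it assumes all four columns exist. The MoneyForLearning and Gender rules are left out because the list does not pin down an exact criterion.

```r
# Hypothetical sketch of a generalized outlier cleaner. Thresholds are
# assumptions taken from the examples in the list above.
remove_obvious_outliers <- function(df,
                                    max_age = 119,
                                    max_children = 50,
                                    max_income = 1e8) {
  # Replace implausible values with NA rather than dropping whole rows
  df$ExpectedEarning[df$ExpectedEarning %in% "xxxxx"] <- NA
  df$Age[!is.na(df$Age) & df$Age > max_age] <- NA
  df$ChildrenNumber[!is.na(df$ChildrenNumber) &
                      df$ChildrenNumber > max_children] <- NA
  df$Income[!is.na(df$Income) & df$Income > max_income] <- NA
  df
}

example_part2 <- data.frame(
  Age = c(25, 120, NA),
  ChildrenNumber = c(2, 60, 0),
  Income = c(50000, 2e8, NA),
  ExpectedEarning = c("60000", "xxxxx", NA),
  stringsAsFactors = FALSE
)
cleaned <- remove_obvious_outliers(example_part2)
```

Setting single values to `NA` keeps the rest of a respondent's answers; dropping whole rows would be a one-line change if that is preferred.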

Update renaming survey part 2 function

Clean number of months programming

Change values like 04 to 4
Remove non-numeric characters
Remove the unit "months", e.g. "4 months"
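A minimal sketch covering all three steps, assuming the answers are character strings. `clean_months()` is hypothetical, and it extracts the first number in each answer instead of deleting every non-digit, so an answer like "6-8" does not collapse into 68:

```r
# Hypothetical sketch: normalize free-text "months programming" answers by
# extracting the first number (integer or decimal) in each answer.
clean_months <- function(x) {
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  out <- rep(NA_real_, length(x))
  hit <- !is.na(m) & m > 0
  # as.numeric() also turns "04" into 4; unmatched answers stay NA
  out[hit] <- as.numeric(regmatches(x, m))
  out
}

months <- clean_months(c("04", "4 months", "1.5", "about 6", "none", NA))
```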

Make column data types uniform between parts before joining

Check whether columns in the second part contain "undefined"
Check if columns in the second part need to be changed from yes/no to 1/0
Check for truncated answers
Standardize data types between data sets to allow joining in next step
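The first two checks can be sketched as one pass over the columns. This assumes "undefined" should become `NA` and that a character column holding only yes/no values (in any case) should become 1/0; `standardize_columns()` and the example columns are hypothetical:

```r
# Hypothetical sketch: standardize character columns before joining.
standardize_columns <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.character(x)) {
      x[x %in% "undefined"] <- NA  # treat "undefined" as a missing answer
      vals <- unique(tolower(x[!is.na(x)]))
      if (length(vals) > 0 && all(vals %in% c("yes", "no"))) {
        x <- as.integer(tolower(x) == "yes")  # yes/no -> 1/0, NA stays NA
      }
    }
    df[[col]] <- x
  }
  df
}

example <- data.frame(
  IsEmployed = c("yes", "No", "undefined"),
  City = c("Paris", "undefined", "Lima"),
  stringsAsFactors = FALSE
)
standardized <- standardize_columns(example)
```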

Create separate clean data set with threshold of answered questions

Some individuals left many answers blank, so it might be convenient to have a separate data set with only the individuals who answered more than xx% of the questions.

From the plot, a general cutoff might be to keep people who answered at least 50% of the survey. This plot only covers the second part, so the threshold may change after the data sets have been merged.

Code to generate the plot:

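library(magrittr)  # provides %>% (dplyr re-exports it too)

# Share of non-missing answers per respondent (row), as a percentage
part2 %>% apply(MARGIN = 1, FUN = function(x) 100 * sum(!is.na(x)) / length(x)) %>%
    hist(main = "Distribution of Percent of Questions Answered in Part 2",
         xlab = "Percent Answered (%)")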
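Building the separate data set could then be a one-line filter on the same per-row share. `keep_answered()` is a hypothetical helper; the default of 0.5 follows the 50% cutoff suggested for part 2 and may need revisiting after the merge:

```r
# Hypothetical helper: keep respondents who answered at least `threshold`
# (a proportion) of the columns.
keep_answered <- function(df, threshold = 0.5) {
  answered <- rowMeans(!is.na(df))  # share of non-missing answers per row
  df[answered >= threshold, , drop = FALSE]
}

toy <- data.frame(
  a = c(1, NA, NA),
  b = c("x", "y", NA),
  c = c(TRUE, NA, NA),
  d = c(1, 2, NA)
)
kept <- keep_answered(toy)
```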

Update datapackage.json to final data specification

The datapackage.json file should be in line with the final cleaned and merged dataset for documentation purposes.

Clean up "Other" answer for "Which one of these careers are you interested in?"

Find relatively most common answers given
Create new column for popular answers
Make sure "Other" answers that repeat an original option are matched to that option (e.g. "Full-Stack Web Developer" given as an "Other" answer when "Full-Stack Web Developer" is already one of the original answers)
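A sketch of that last check, assuming an exact match after trimming and ignoring case is good enough. `reconcile_other()` is hypothetical; it returns the canonical option where one matches and leaves everything else as free text:

```r
# Hypothetical sketch: if an "Other" answer just repeats one of the listed
# options (ignoring case and surrounding spaces), move it to that option.
reconcile_other <- function(other, options) {
  idx <- match(tolower(trimws(other)), tolower(options))
  list(
    chosen = options[idx],                             # canonical option, or NA
    other  = ifelse(is.na(idx), other, NA_character_)  # leftover free text
  )
}

opts <- c("Full-Stack Web Developer", "Data Scientist")
res <- reconcile_other(c(" full-stack web developer", "Game Developer", NA), opts)
```

Near-misses such as "Full Stack Developer" would still need a manual mapping.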

Update renaming survey part 1 function

Clean up "Other" answer for "Which online learning resources have you found helpful?"

Find the most common answers given (total = 1,107, so keep answers given roughly 10 or more times, i.e. about 1%)
- YouTube
- SoloLearn
- Microsoft Virtual Academy
- Books
- CodeCombat
- Codefights
- Code School
- DataCamp
- Exercism
- Frontend Masters
- GitHub
- Google
- Laracasts
- Launch School
- OpenClassroom
- reddit
- scotch.io
- TheNewBoston
- tutorialspoint
- Watch and Code
- Wes Bos
Create new column for popular answers
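The cutoff can be expressed as a share of all "Other" answers. A hypothetical sketch: `popular_answers()` assumes one answer per cell, returns lower-cased names, and uses `min_share = 0.01` to mirror the rough 1% (about 10 of 1,107) threshold; the example uses a larger share so the toy data is not all kept:

```r
# Hypothetical sketch: keep "Other" answers mentioned by at least
# `min_share` of the non-missing responses.
popular_answers <- function(answers, min_share = 0.01) {
  answers <- tolower(trimws(answers[!is.na(answers)]))
  counts <- table(answers)
  cutoff <- min_share * length(answers)  # e.g. 0.01 * 1107 is about 11
  names(counts)[as.vector(counts >= cutoff)]
}

resp <- c(rep("YouTube", 60), rep("Books", 35), rep("Exercism", 5))
top <- popular_answers(resp, min_share = 0.1)
```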

Clean up "Other" answer for "If you have attended in-person coding-related events before, which ones have you found helpful?"

Find relatively most common answers given
Create new column for popular answers

Clean expected earnings

Remove dollar signs
If a value is in the form xx.000, convert it to xx000 (i.e. remove the decimal point)
Deal with ranges of numbers e.g. "80k-100k"
Remove unusable answers, e.g. "over $200 monthly"
No need to normalize values (e.g. low values to thousands); let the end user decide
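A sketch of these steps under stated assumptions: `clean_earnings()` is hypothetical, a trailing "k" is read as thousands, a range is replaced by its midpoint (an arbitrary choice), and anything that is not a plain amount becomes `NA`. Small values are left as they are, in line with the last point:

```r
# Hypothetical sketch of the expected-earnings cleaning steps.
clean_earnings <- function(x) {
  x <- tolower(x)
  # Remove dollar signs, commas, and spaces
  x <- gsub("[$, ]", "", x)
  # "60.000" style: the ".000" marks thousands, so drop the decimal point
  x <- sub("^([0-9]+)\\.(000)$", "\\1\\2", x)
  vapply(strsplit(x, "-", fixed = TRUE), function(parts) {
    # A trailing "k" means thousands, e.g. "80k" -> 80000
    multiplier <- ifelse(grepl("k$", parts), 1000, 1)
    nums <- suppressWarnings(as.numeric(sub("k$", "", parts))) * multiplier
    # Assumed choice: use the midpoint of a range; non-amounts become NA
    if (length(nums) == 0 || anyNA(nums)) NA_real_ else mean(nums)
  }, numeric(1))
}

earnings <- clean_earnings(c("$60.000", "75,000", "80k-100k", "over $200 monthly", NA))
```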

Join together parts 1 and 2 of the survey data
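A minimal sketch of the join with base R's `merge()`, assuming both parts share a respondent identifier column; the column name `ID` and the toy data are hypothetical. A full outer join keeps respondents who only completed one part (`dplyr::full_join(part1, part2, by = "ID")` is the dplyr equivalent); use `all = FALSE` for an inner join instead.

```r
toy_part1 <- data.frame(ID = c("a1", "b2", "c3"), Age = c(25, 31, 40))
toy_part2 <- data.frame(ID = c("a1", "c3"), Income = c(50000, 70000))

# all = TRUE is a full outer join: respondents missing from one part get NA
survey <- merge(toy_part1, toy_part2, by = "ID", all = TRUE)
```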
