@erictleung, I'm reopening this issue because the PR was already merged.
Everything looks good except for three variables: CommuteTime, HomeMortgageOwe, and StudentDebtOwe.
I investigated CommuteTime a bit.
```r
> NROW(data.Learn[!is.na(data.Learn$CommuteTime)&data.Learn$CommuteTime>300,])
[1] 84
> NROW(data.Learn[data.Learn$LanguageAtHome=="English"&!is.na(data.Learn$CommuteTime)&data.Learn$CommuteTime>300,])
[1] 13
> NROW(data.Learn[data.Learn$LanguageAtHome!="English"&!is.na(data.Learn$CommuteTime)&data.Learn$CommuteTime>300,])
[1] 71
```
There are 84 answers greater than 5 hours (300 minutes), and 71 of them come from non-native English speakers (LanguageAtHome is not "English").
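The same breakdown can be sketched outside R. This is a minimal plain-Python illustration, assuming each response is a dict with `CommuteTime` (minutes, `None` standing in for R's `NA`) and `LanguageAtHome`; the rows below are hypothetical toy values, not survey data, and the helper name `count_long_commutes` is made up for illustration.

```python
from collections import Counter

# Hypothetical toy rows; the real data would come from the survey file.
# None plays the role of R's NA.
rows = [
    {"CommuteTime": 480, "LanguageAtHome": "Spanish"},
    {"CommuteTime": 30, "LanguageAtHome": "English"},
    {"CommuteTime": None, "LanguageAtHome": "Russian"},
    {"CommuteTime": 600, "LanguageAtHome": "English"},
    {"CommuteTime": 480, "LanguageAtHome": "Hindi"},
]

def count_long_commutes(rows, threshold=300):
    """Count non-missing CommuteTime answers strictly above `threshold`
    minutes, split into English vs. any other LanguageAtHome."""
    counts = Counter()
    for row in rows:
        t = row["CommuteTime"]
        if t is not None and t > threshold:  # mirrors !is.na(x) & x > 300
            key = "English" if row["LanguageAtHome"] == "English" else "non-English"
            counts[key] += 1
    return counts

counts = count_long_commutes(rows)
total = sum(counts.values())
print(total, counts["English"], counts["non-English"])
```

On the toy rows this prints `3 1 2`; on the real data the three numbers should line up with the 84 / 13 / 71 counts above.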
I added the language filter because:
- "commute" is not a common word; I remember having to google its meaning because I had never seen it before;
- more people answered 8 hours than 5, 6, or 7 hours, which is very odd;
- 95% of those who typed 8 hours don't speak English with their families;
- 8 hours is a typical working day;
- "commute" looks similar to the word "commit".
So my guess is that many non-native speakers misread the question:
About how many minutes total do you spend commuting to and from work each day?
They thought we were asking how long their working day is.
So for CommuteTime I suggest setting all answers greater than 300 (5 hours) to NA rather than capping them at 300, because we have no idea what their real commute time is: they either misread the question or entered a completely unrealistic number, such as 600 or 1000.
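A minimal sketch of the proposed CommuteTime rule, written in plain Python rather than the thread's R, assuming commute times are a list of minutes with `None` standing in for `NA`. The function name `clean_commute` is hypothetical; the 300-minute cutoff is the one proposed above. Values above the cutoff become missing instead of being capped, because the true commute is unknown.

```python
from typing import List, Optional

COMMUTE_MAX_MINUTES = 300  # 5 hours, the cutoff proposed in this issue

def clean_commute(
    times: List[Optional[float]], max_minutes: float = COMMUTE_MAX_MINUTES
) -> List[Optional[float]]:
    """Replace commute times above `max_minutes` with None (NA).

    Values at or below the cutoff, and values that are already missing,
    are passed through unchanged.
    """
    return [None if t is not None and t > max_minutes else t for t in times]

print(clean_commute([45, 300, 480, None, 1000]))
# → [45, 300, None, None, None]
```

Note that exactly 300 is kept, matching the `> 300` filter used in the counts above.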
For HomeMortgageOwe and StudentDebtOwe we just need minimum and maximum values, because mortgages like "35" or "10 000 000" don't look trustworthy.
I think a minimum of $1,000 for both HomeMortgageOwe and StudentDebtOwe should be good enough.
It makes sense to set answers below the minimum to NA.
As for the maximum, I'm not sure; you probably know better. But I doubt it's more than $1M for a mortgage or $500K for education.
It makes sense to cap answers above the maximum at the maximum value.
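The min/max rule for the two debt columns can be sketched as follows. This is a plain-Python illustration, not the R code used for the survey cleanup: `clean_amounts` is a hypothetical helper, amounts are assumed to already be numbers with `None` for `NA`, and the sample inputs are toy values. The bounds are the ones suggested above ($1,000 minimum for both; $1,000,000 and $500,000 maximums).

```python
from typing import List, Optional

def clean_amounts(
    values: List[Optional[float]], min_value: float, max_value: float
) -> List[Optional[float]]:
    """Apply the proposed bounds to a money column.

    - missing or below `min_value`  -> None (NA), since the answer is not trustworthy
    - above `max_value`             -> capped at `max_value`
    - otherwise                     -> unchanged
    """
    cleaned: List[Optional[float]] = []
    for v in values:
        if v is None or v < min_value:
            cleaned.append(None)
        elif v > max_value:
            cleaned.append(max_value)
        else:
            cleaned.append(v)
    return cleaned

mortgage = clean_amounts([35, 250_000, 10_000_000, None], min_value=1_000, max_value=1_000_000)
student = clean_amounts([500, 30_000, 750_000], min_value=1_000, max_value=500_000)
print(mortgage)  # [None, 250000, 1000000, None]
print(student)   # [None, 30000, 500000]
```

Unlike CommuteTime, large debts are capped rather than discarded, since a very large amount is still informative even if its exact value is doubtful.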
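Summary:

| Variable | Condition | Action |
|---|---|---|
| CommuteTime | > 300 | set to NA |
| StudentDebtOwe | < $1,000 | set to NA |
| StudentDebtOwe | > $500,000 | set to $500,000 |
| HomeMortgageOwe | < $1,000 | set to NA |
| HomeMortgageOwe | > $1,000,000 | set to $1,000,000 |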
from 2016-new-coder-survey.
Hi @SamAI-Software
We've done some parsing, but could you please QA the recent changes by downloading the resulting file and pointing out any discrepancies that may still exist?
Please contact @erictleung to confirm a copy of the file is available.
cc @erictleung
@SamAI-Software thanks for looking at the data. I had noticed this too and started cleaning the salaries, removing dollar signs and averaging any ranges I found. I see you've already found my PR on this in #29. I think we can close this issue and carry the conversation over there.
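For illustration, the salary cleanup described here (stripping dollar signs and averaging ranges) might look roughly like this. This is a hypothetical plain-Python sketch, not the code from #29: the input formats are assumptions (e.g. "$50,000", "40000-50000", "$40,000 - $50,000"), thousands separators are also stripped, and answers with no number or more than two numbers are treated as missing.

```python
import re
from typing import Optional

def parse_salary(raw: Optional[str]) -> Optional[float]:
    """Convert a free-text salary answer to a number (hypothetical helper).

    A single number is returned as-is; two numbers are treated as a range
    and replaced by their midpoint. Anything else returns None (NA).
    """
    if raw is None:
        return None
    # Remove dollar signs and thousands separators before extracting numbers.
    cleaned = raw.replace("$", "").replace(",", "")
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", cleaned)]
    if len(numbers) == 1:
        return numbers[0]
    if len(numbers) == 2:
        # Average the endpoints of a range like "40000-50000".
        return (numbers[0] + numbers[1]) / 2
    return None  # no number found, or too ambiguous to interpret

print(parse_salary("$40,000 - $50,000"))
# → 45000.0
```

Shorthand such as "50k" is not handled by this sketch and would need its own rule.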
@erictleung cool, closing this issue.
Sorry, this has been long overdue. I've got most of the code ready. I was going to try to get an updated data dictionary done at the same time, but maybe I should settle on the data first and do the data dictionary later.
I'm away from my primary development environment until next week, so I'll try to get a PR in by the end of next week dealing with these normalization issues. I also found and fixed some spelling mistakes.
@erictleung cool, but keep in mind that @M0nica already submitted a PR adding the missing variables to the data dictionary:
0789b4f