
Comments (6)

SamAI-Software commented on July 18, 2024

@erictleung, I'm reopening this issue because the PR has already been merged.

Everything looks good except for 3 variables: CommuteTime, HomeMortgageOwe and StudentDebtOwe.

I investigated CommuteTime a bit.

> NROW(data.Learn[!is.na(data.Learn$CommuteTime)&data.Learn$CommuteTime>300,])
[1] 84
> NROW(data.Learn[data.Learn$LanguageAtHome=="English"&!is.na(data.Learn$CommuteTime)&data.Learn$CommuteTime>300,])
[1] 13
> NROW(data.Learn[data.Learn$LanguageAtHome!="English"&!is.na(data.Learn$CommuteTime)&data.Learn$CommuteTime>300,])
[1] 71
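The three filtered counts above can also be produced with a single cross-tabulation. This is a minimal sketch, assuming a data frame with the same CommuteTime and LanguageAtHome columns as data.Learn; the toy data frame is hypothetical and made up only so the snippet runs on its own.

```r
# Hypothetical toy data standing in for data.Learn (not real survey answers)
toy <- data.frame(
  CommuteTime    = c(20, 480, 600, NA, 480, 45, 1000, 360),
  LanguageAtHome = c("English", "Spanish", "English", "Hindi",
                     "Russian", "English", "Portuguese", "English"),
  stringsAsFactors = FALSE
)

# Flag answers above 300 minutes (5 hours); missing commute times are not flagged
long_commute <- !is.na(toy$CommuteTime) & toy$CommuteTime > 300

# Split the long answers by whether English is spoken at home
# (table() drops NA languages by default)
counts <- table(english_at_home = toy$LanguageAtHome[long_commute] == "English")
print(counts)
```

On the real data, sum(counts) would correspond to the first count and the TRUE/FALSE cells to the second and third.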

There are 84 answers of more than 5 hours (300 minutes), and 71 of them come from non-native English speakers.
I added the language filter because:

  • "commute" is not a common word; I remember having to google its meaning myself, because I had never seen it before;
  • more people answered 8 hours than 5, 6 or 7 hours, which is very weird;
  • 95% of those who entered 8 hours don't use English to talk with their families;
  • 8 hours is an average working day;
  • "commute" is similar to the word "commit".
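To check the 8-hour spike mentioned in the list above, the long answers can be binned into whole hours and tabulated. A sketch on made-up values (hypothetical, not survey data); on the real data you would pass data.Learn$CommuteTime instead.

```r
# Hypothetical commute answers in minutes (illustrative only)
commute <- c(310, 360, 420, 480, 480, 480, 600, 1000, 25, NA)

# Keep only answers above 5 hours, then bin them into whole hours
long  <- commute[!is.na(commute) & commute > 300]
hours <- floor(long / 60)

# Number of answers in each hour bucket; a spike at 8 would stand out here
by_hour <- table(hours)
print(by_hour)
```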

So my bet is that many non-native speakers misunderstood the question:

About how many minutes total do you spend commuting to and from work each day?

And they thought we were asking how long their working day is.

So for CommuteTime I suggest converting all answers greater than 300 (5 hours) to NA rather than capping them at 300, because we have no idea what their real commute time is: they either misread the question or entered a completely unrealistic number, like 600 or 1000.
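A minimal sketch of the proposed CommuteTime rule. The helper name clean_commute is hypothetical; the 300-minute limit is the one proposed here, and values of exactly 300 are kept, matching the ">300" wording.

```r
# Hypothetical helper: replace implausible commute times (> limit minutes)
# with NA instead of capping them at the limit
clean_commute <- function(minutes, limit = 300) {
  minutes[!is.na(minutes) & minutes > limit] <- NA
  minutes
}

res <- clean_commute(c(30, 300, 480, NA, 1000))
print(res)
```

Applied to the survey data, this would be data.Learn$CommuteTime <- clean_commute(data.Learn$CommuteTime).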


For HomeMortgageOwe and StudentDebtOwe we just need minimum and maximum values, because mortgages like "35" or "10 000 000" don't look trustworthy.
I guess a minimum of $1000 for both HomeMortgageOwe and StudentDebtOwe should be fine.
Answers below the minimum should be converted to NA.
As for the maximum, I have no idea; you probably know better. But I doubt it's more than $1M for a mortgage or $500K for education.
Answers above the maximum should be set to the maximum.
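The two debt rules (drop answers below the minimum, cap answers above the maximum) can share one function. A sketch only: clean_debt is a hypothetical name, and the thresholds are the tentative ones suggested above ($1000 minimum, $1M for mortgages, $500K for student debt).

```r
# Hypothetical helper: NA for answers below min_value, cap answers above max_value
clean_debt <- function(amount, min_value = 1000, max_value) {
  amount[!is.na(amount) & amount < min_value] <- NA         # too small -> NA
  amount[!is.na(amount) & amount > max_value] <- max_value  # too large -> capped
  amount
}

# Made-up example values, not survey answers
mortgage <- clean_debt(c(35, 250000, 1e7, NA), max_value = 1e6)
student  <- clean_debt(c(500, 30000, 9e5), max_value = 5e5)
```

On the real data, the same calls would be made on data.Learn$HomeMortgageOwe and data.Learn$StudentDebtOwe.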

Summary:

  • CommuteTime: values >300 converted to NA
  • StudentDebtOwe: values <$1000 converted to NA; values >$500 000 set to $500 000
  • HomeMortgageOwe: values <$1000 converted to NA; values >$1 000 000 set to $1 000 000

from 2016-new-coder-survey.

evaristoc commented on July 18, 2024

Hi @SamAI-Software
We have done some parsing, but could you please QA the recent changes by downloading the resulting file and pointing out any discrepancies that may still exist?

Please contact @erictleung to confirm that a copy of the file is available.

cc. @erictleung


erictleung commented on July 18, 2024

@SamAI-Software thanks for looking at the data. I had noticed this too and started cleaning up the salaries, removing dollar signs and averaging the ranges I found. I see you've already found my PR on this over in #29. I think we can close this issue and carry the conversation over there.


SamAI-Software commented on July 18, 2024

@erictleung cool, closing this issue.


erictleung commented on July 18, 2024

Sorry, this has been long overdue. I've got most of the code ready. I was going to try to get an updated data dictionary done at the same time, but maybe I should settle on the data first and do the data dictionary later.

I'm away from my primary development environment until next week, so I'll try to get a PR in by the end of next week dealing with these normalization issues. I also found and fixed some spelling mistakes.


SamAI-Software commented on July 18, 2024

@erictleung cool, but keep in mind that the data dictionary with the missing variables was already submitted in a PR by @M0nica:
0789b4f
