In New York City, parents were asked to report their kids’ scores on the gifted and talented exam, as well as their school priority rankings. The questionnaire was not designed to enforce clean data collection. As a result, there are inconsistencies in reporting, skipped questions, no standard formats, and so on.
The relatively small dataset is here.
Your task is to organize and clean up this dataset and, preferably, to represent all of the data in some sort of numerical format.
Constraints: You must do this programmatically, i.e. using Python by itself or with pandas, NumPy, or whatever other libraries you can find to get it done. Manually editing the data file is not acceptable. :)
As you go, please explain what you are doing and why. If your actions raise certain issues or risks, mentioning these would be great.
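As a starting point, here is a minimal sketch of one cleaning step: turning free-text score answers into numbers while keeping track of what could not be parsed. The sample values and the `parse_score` helper are hypothetical, since the dataset's actual columns and formats are not shown here, so treat this as an illustration of the approach rather than a finished solution.

```python
import re

# Hypothetical free-text answers. The real dataset's columns and formats
# are not shown in this brief, so these values are illustrative assumptions.
RAW_SCORES = ["99", " 97% ", "98.5", "n/a", "", "ninety"]

# Matches an integer or a decimal number such as "99" or "98.5".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_score(raw):
    """Return the first number found in `raw` as a float, or None if there is none."""
    if raw is None:
        return None
    match = _NUMBER.search(str(raw))
    return float(match.group()) if match else None


cleaned = [parse_score(value) for value in RAW_SCORES]

# Count unparseable answers so the amount of lost data is visible, not hidden.
missing = sum(1 for value in cleaned if value is None)

print(cleaned)
print(f"{missing} of {len(cleaned)} answers could not be parsed")
```

One risk worth calling out with this approach: taking the first number in a string can silently misread answers that contain several numbers or a range, so it is worth inspecting the unparsed and ambiguous rows before trusting the result.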