
wisconsin_teachers's Introduction

This repository provides reproduction code for the paper Teacher Turnover in Wisconsin, which can be downloaded here. The paper was compiled with rmarkdown through knitr; the .Rmd document for the paper is this one.

Raw Data

There are three public sources of data for this paper:

  1. Wisconsin Department of Public Instruction (DPI) collects and releases annual WISEstaff PI-1202 reports, which give teacher- (more specifically, assignment-) level snapshots of the full panoply of school employees in the state. These are available here. The R script raw_data_cleaner.R downloads these files and does some baseline touch-up of the raw files (which are mostly in fixed-width format) before producing easy-to-use .csv versions of the raw data. The script can be run at the command line with Rscript raw_data_cleaner.R; be sure to set the wd.data variable to the local path where the data should be downloaded and the .csv files written.

  2. Wisconsin's WKCE test score data is also released by DPI at the district and school level. As per here, the WKCE is part of the WSAS battery of tests; the full set of these results can be downloaded through the WINSS historical data file repository here.

  3. The NCES Common Core of Data District- and School-level data files.

Pre-Paper Data Cleaning

turnover_paper.Rmd runs the turnover_data_cleaner.R script internally (which does some final data wrangling and runs the COBS routine), but relies on two scripts having been run beforehand:

  1. background_data_cleaner.R assembles school- and district-level files from the DPI and NCES; be sure to customize wds here as well, which tell the script where to find these raw data files and where to write the output.

  2. teacher_match_and_clean.R runs the teacher matching algorithm described in the paper's Appendix in order to create a panel of data from the DPI's cross-sections.

Feel free to file an issue or e-mail me for any further clarification/concerns.

wisconsin_teachers's People

Contributors

michaelchirico

wisconsin_teachers's Issues

Use ethnicity to improve match fidelity

(From data_cleaning_tasks.txt, 2015-07-25):

Any way to deal with the clear ethnicity mismatch?

        teacher_id year first_name last_name_clean ethnicity district school position_code
    53:     104802 1996  dorothy a         johnson         B     3619   0498            52
    54:     104802 1997  dorothy a         johnson         B     3619   0006            52
    55:     104802 2004  dorothy a         johnson         W     4620                   43
    56:     104802 2005    dorothy         johnson         W     4620                   43
    57:     104802 2006    dorothy         johnson         W     4620                   43
    58:     104802 2007    dorothy         johnson         W     4620                   43
    59:     104802 2008    dorothy         johnson         W     4620                   43
    60:     104802 2009    dorothy         johnson         W     4620                   43
    61:     104802 2010    dorothy         johnson         W     4620                   43
    62:     104802 2011    dorothy         johnson         W     4620                   43
    63:     104802 2012    dorothy         johnson         W     4620                   43
    64:     104802 2013    dorothy         johnson         W     4620                   43
    65:     104802 2014    dorothy         johnson         W     4620                   43
    66:     104802 2015    dorothy         johnson         W     4620                   43
  • PROBLEM: 7-year gap between matches, different location, different ethnicity, different position, etc.
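One way to surface such cases for review is to scan each matched teacher_id for consecutive observations that are far apart in time and also disagree on ethnicity. The following is a minimal sketch in Python (the repository's own scripts are in R); the function name and the max_gap threshold are hypothetical, and the sample values are taken from the rows above.

```python
def suspect_links(observations, max_gap=1):
    """Find questionable links within one matched teacher_id.

    observations: iterable of (year, ethnicity) pairs.
    Returns (earlier_year, later_year) pairs where consecutive observations
    are more than max_gap years apart AND report different ethnicities.
    """
    obs = sorted(set(observations))
    return [(y0, y1)
            for (y0, e0), (y1, e1) in zip(obs, obs[1:])
            if e0 != e1 and y1 - y0 > max_gap]

# (year, ethnicity) for teacher_id 104802, from the rows above.
johnson = [(1996, "B"), (1997, "B"), (2004, "W"), (2005, "W"), (2006, "W")]
links = suspect_links(johnson)  # [(1997, 2004)]
```

A flagged pair is only a candidate for splitting the ID; position and district changes would still need to be weighed before deciding.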

Data analysis tasks

  • Is there much variation in pay across subjects? (especially in core subjects)

Robustness: Real wage methods

For robustness, results should be tested for sensitivity to alternative ways of measuring real values, e.g.,

  • Calculate monthly income as average over 12 months, then deflate/inflate wages by month
  • Calculate monthly income as average over 9 months (September - May), then deflate/inflate wages by month
  • Calculate yearly CPI index as 12-month average (January - December) of CPI & deflate annual wages with rolling average
  • Calculate yearly CPI index as 12-month average (September - August) of CPI & deflate annual wages with rolling average
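The last two alternatives differ only in which 12 months enter the annual index. Here is a minimal sketch in Python (the repository's own scripts are in R), assuming a hypothetical dictionary keyed by (year, month); the CPI values are made up for illustration and are not real CPI figures.

```python
from statistics import mean

def annual_cpi(cpi, year, start_month=1):
    """Average CPI over the 12 months beginning at (year, start_month)."""
    months = [(year + (start_month - 1 + k) // 12, (start_month - 1 + k) % 12 + 1)
              for k in range(12)]
    return mean(cpi[m] for m in months)

def deflate(nominal, index, base_index):
    """Express a nominal amount in base-period dollars."""
    return nominal * base_index / index

# Made-up series: 200.0 in January 2000, rising 0.5 points per month.
cpi = {(2000 + k // 12, k % 12 + 1): 200.0 + 0.5 * k for k in range(36)}

calendar_2001 = annual_cpi(cpi, 2001, start_month=1)  # January - December 2001
school_2001 = annual_cpi(cpi, 2001, start_month=9)    # September 2001 - August 2002
```

Labeling the September - August window by its starting year is an assumption of this sketch. Comparing wages deflated with calendar_2001 and with school_2001 shows how sensitive a result is to the choice of window.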

Dealing with noise in certification

(From data_cleaning_tasks.txt, 2015-07-26):

What to do when certification noise makes it seem like a person certifies twice?

               id year first_name last_name highest_degree certified district school position_code
     1: 000101691 1996       bryn     perry             NA        NA     2460     NA            43
     2: 000101691 1996       bryn     perry             NA        NA     6300     NA            43
     3: 000130797 1997       bryn     perry             NA        NA     6300     NA            43
     4: 000255951 1998       bryn     perry              4        NA     6300   0280            53
     5: 000346264 1999       bryn     perry              4     FALSE     6300   0280            53
     6: 000489141 2000       bryn     perry              4     FALSE     6300   0280            53
     7: 000669151 2001       bryn     perry              4     FALSE     6300   0140            53
     8: 000669151 2001       bryn     perry              4     FALSE     6300   0280            53
     9: 000782105 2002       bryn     perry              4     FALSE     6300   0140            53
    10: 000782105 2002       bryn     perry              4     FALSE     6300   0280            53
    11: 000035017 2004       bryn     perry              4     FALSE     6174   0999            53
    12: 000374937 2005       bryn     perry              4     FALSE     6174   0020            53
    13: 000374937 2005       bryn     perry              4     FALSE     6174   0260            53
    14: 000494663 2006       bryn     perry              4     FALSE     6174   0260            53
    15: 000066581 2007       bryn     perry              4     FALSE     6174   0260            53
    16: 000067172 2008       bryn     perry              4     FALSE     6174   0260            53
    17: 000054863 2009       bryn     perry              4     FALSE     6174   0260            53
    18: 000067833 2010       bryn     perry              5      TRUE     6174   0260            53
    19: 000068873 2011       bryn     perry              4     FALSE     6174   0260            53
    20:     68264 2012       bryn     perry              5      TRUE     6174   0260            53
    21: 000066211 2013       bryn     perry              5     FALSE     6174   0260            53
    22: 000066125 2014       bryn     perry              5     FALSE     6174   0260            53
    23: 000065632 2015       bryn     perry              5     FALSE     6174   0260            53

Related to #5 - 1.
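A first step could be to count how often a person appears to become certified. This is a minimal sketch in Python (the repository's own scripts are in R); the function name is hypothetical, missing values are simply skipped, and the sample values are taken from the 2008-2013 rows above.

```python
def certification_events(certified_by_year):
    """Return the years in which `certified` becomes True after a non-True value.

    certified_by_year: {year: True, False, or None}. None (missing) is skipped,
    so the comparison is against the previous non-missing observation.
    """
    years = sorted(y for y, c in certified_by_year.items() if c is not None)
    events = []
    previous = False
    for year in years:
        current = certified_by_year[year]
        if current and not previous:
            events.append(year)
        previous = current
    return events

# `certified` for the individual above, 2008-2013.
certified = {2008: False, 2009: False, 2010: True, 2011: False, 2012: True, 2013: False}
events = certification_events(certified)  # [2010, 2012]
```

More than one event for the same person marks the record as noisy. How to resolve it (first event, last event, or a smoothing rule) remains an open choice.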

Deal with `school` `"0999"`

(From data_cleaning_tasks.txt, 2015-07-17):

School codes 0999 and 0000 are used (inconsistently) as dummies for "multiple schools" / "the district as a whole". Should be dealt with somehow...
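One possible treatment is to recode both placeholder values to missing while keeping a flag that records the original meaning. This is a minimal sketch in Python (the repository's own scripts are in R); the names below are hypothetical, and treating a blank code as missing is an additional assumption.

```python
PLACEHOLDER_CODES = {"0999", "0000"}  # dummy codes named in this issue

def clean_school(code):
    """Return (school, district_wide) for a raw school code.

    Placeholder codes become (None, True); blank or missing codes become
    (None, False); any other code is returned unchanged with a False flag.
    """
    if code is None or code.strip() == "":
        return None, False
    code = code.strip()
    if code in PLACEHOLDER_CODES:
        return None, True
    return code, False
```

For example, clean_school("0999") gives (None, True), while clean_school("0260") gives ("0260", False).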

Questions for DPI, Aug 24, 2015

Questions about data constructs, etc. for DPI.

  • [From data_cleaning_tasks.txt, 2015-07-23]: Ask about tendency of highest_degree to decrease
  • What is the meaning of the variable assigment_seq (only non-missing in 1996)?
  • Are staff_type (1996), staff_category (1997-99), and category (2000-2015) all the same? category value 4 appears from 2005.
  • What is the meaning of the variables program_sped and program_seq (only 1996 through 1999)?
  • What is the meaning of contracted_employee (present 2000-2004 & 2009-2012)?
  • What is the meaning of long_term_sub (present 2000-2004 & 2009-2015)?

Leverage consistent difference between `local_exp` and `total_exp`

(From data_cleaning_tasks.txt, 2015-07-17):

Use consistent difference between local_exp and total_exp as evidence of a censored switch -OR- of spurious ID loss. May also be an indication that we've missed earlier observations (alternatively, evidence of out-of-state experience).

Sample individuals:

               id year first_name last_name birth_year total_exp local_exp agency
    22: 000002223 1996   sandra l   wittwer       1953      12.0         5   0896 
    23: 000001845 1997   sandra l   wittwer       1953      13.0         6   0896 
    24: 000151645 1998   sandra l   wittwer       1953      14.0         7   0896 
    25: 000276915 1999   sandra l   wittwer       1953      15.0         8   0896 
    26: 000276915 1999   sandra l   wittwer       1953      15.0         8   0896
    27: 000440856 2000   sandra l   wittwer       1953      16.0         9   0896
    28: 000552100 2001   sandra l   wittwer       1953      17.0        10   0896
    29: 000681506 2002   sandra l   wittwer       1953      18.0        11   0896
    30: 000815623 2003   sandra l   wittwer       1953      19.0        12   0896
    31: 000949100 2004   sandra l   wittwer       1953      21.0        14   0896
    32: 000373396 2005     sandra   wittwer       1953      22.0        15   0896
  • suggests 7 years experience elsewhere, but would need pre-1995 data to see the switch
  • CAVEAT: EXPERIENCE NOISY
              id year first_name last_name      nee birth_year local_exp total_exp agency
    1:     99261 2012       kari  debruine                1976         1        10   4620 
    2: 000096498 2013       kari  debruine                1976         2        11   4620
    3: 000096571 2014       kari  schaefer debruine       1976         3        12   4620
    4: 000095966 2015       kari  schaefer                1976         4        13   4620
  • Note that 4620, Racine, WI, is very close to IL.
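The check itself is simple: within one matched person, see whether total_exp - local_exp is the same in every year. This is a minimal sketch in Python (the repository's own scripts are in R); the function name is hypothetical, and the sample values come from the first individual above.

```python
def experience_gap(rows):
    """Return the total_exp - local_exp gap if it is constant across rows, else None.

    rows: iterable of (year, total_exp, local_exp) for one matched teacher.
    """
    gaps = {total - local for _, total, local in rows}
    return gaps.pop() if len(gaps) == 1 else None

# (year, total_exp, local_exp) for the first sample individual above.
wittwer = [(1996, 12.0, 5), (1997, 13.0, 6), (1998, 14.0, 7), (1999, 15.0, 8),
           (2000, 16.0, 9), (2001, 17.0, 10), (2002, 18.0, 11), (2003, 19.0, 12),
           (2004, 21.0, 14), (2005, 22.0, 15)]
gap = experience_gap(wittwer)  # 7.0
```

The gap stays constant here even though both columns jump by two between 2003 and 2004, which is why the difference may be a steadier signal than either experience column alone.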

Data cleaning/fidelity tasks

Some avenues to explore for understanding the data & its idiosyncrasies

  • Check in-school versus in-district variance to be as sure as possible that pay is indeed determined at the district level.
  • Evaluate the noisiness of total_exp -- how frequently does it decrease/increase by more than one from year to year? (examine abs(diff(total_exp)) by teacher_id)
  • What exactly is local_exp? Can it be used as leverage in interpolation? How often is it the case that move_district is TRUE but local_exp > 1, i.e., that local_exp doesn't seem to have reset upon a district switch?
  • Why does Whitnall School District only have 45 teachers listed in 1998, and why are none of them paid? Whitnall phone number : 414 525 8400
  • What is the usage of district code 6799 "Multiple-District Agency"? Why did it cease in 1999?
  • Why is move_school missing for a non-negligible subset of teachers?
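The total_exp check in the second bullet can be sketched as follows, in Python rather than the repository's R. The function name and the sample IDs "a" and "b" are hypothetical. A change is flagged when total_exp falls, or rises by more than the number of years elapsed between consecutive observations.

```python
from collections import defaultdict

def noisy_experience(records):
    """Return {teacher_id: [(year, change), ...]} for suspicious total_exp changes.

    records: iterable of (teacher_id, year, total_exp).
    """
    by_teacher = defaultdict(list)
    for teacher_id, year, total_exp in records:
        by_teacher[teacher_id].append((year, total_exp))

    flagged = {}
    for teacher_id, obs in by_teacher.items():
        obs.sort()
        jumps = [(y1, e1 - e0)
                 for (y0, e0), (y1, e1) in zip(obs, obs[1:])
                 if e1 < e0 or e1 - e0 > y1 - y0]
        if jumps:
            flagged[teacher_id] = jumps
    return flagged

# Hypothetical teachers: "a" jumps by two in one year, "b" is regular.
records = [("a", 2003, 19.0), ("a", 2004, 21.0), ("a", 2005, 22.0),
           ("b", 2003, 4.0), ("b", 2004, 5.0)]
flagged = noisy_experience(records)  # {"a": [(2004, 2.0)]}
```

Dividing the number of flagged teachers by the total gives a rough noise rate for total_exp.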

Increase robustness of school switch identifier

(From data_cleaning_tasks.txt, 2015-07-23):

Add a more robust school/position switch identifier: flag a switch if any position is new, rather than comparing only the highest-FTE position

  • Currently, identifying switches by first eliminating all but the highest-FTE position
  • Can be done in a "memoryless" and/or a full-historic way:
    1. memoryless: move_* is T if any * is new in one year compared to the prior year
    2. historic: move_* is T if any current position has never been held in any prior year
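The two definitions can be compared side by side. This is a minimal sketch in Python (the repository's own scripts are in R), assuming each teacher's positions are summarized as a set of school codes per year. "Prior year" is read as the prior observed year, and the function name and the sample codes are hypothetical.

```python
def move_flags(schools_by_year):
    """Compute both switch flags for one teacher.

    schools_by_year: {year: set of school codes held in that year}.
    Returns {year: (memoryless, historic)}. The first observed year has
    nothing to compare against and gets (None, None).
    """
    flags = {}
    seen = set()   # every school held in any earlier year
    prev = None    # schools held in the previous observed year
    for year in sorted(schools_by_year):
        current = schools_by_year[year]
        if prev is None:
            flags[year] = (None, None)
        else:
            memoryless = bool(current - prev)  # any school not held last year
            historic = bool(current - seen)    # any school never held before
            flags[year] = (memoryless, historic)
        seen |= current
        prev = current
    return flags

history = {2000: {"0280"}, 2001: {"0140", "0280"}, 2002: {"0280"}, 2003: {"0140"}}
flags = move_flags(history)
```

In this example 2003 is a move under the memoryless rule (0140 was not held in 2002) but not under the historic rule (0140 was already held in 2001).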

Add flag for divorce

(From data_cleaning_tasks.txt, 2015-07-20):

Check for possibility of divorce--last name->maiden name->last name again, e.g.:

               id year first_name         last_name     nee district position_code birth_year
     1: 000012866 2007       joan bacchus-swamidass bacchus     5656            53       1955
     2: 000012781 2008       joan bacchus-swamidass bacchus     5656            53       1955
     3: 000010079 2009       joan         swamidass      NA     5656            53       1955 
     4: 000012920 2010       joan         swamidass      NA     5656            53       1955 
     5: 000013089 2011       joan         swamidass      NA     5656            98       1955 
     6: 000013089 2011       joan         swamidass      NA     5656            53       1955 
     7:     13015 2012       joan         swamidass      NA     5656            98       1955 
     8:     13015 2012       joan         swamidass      NA     5656            53       1955 
     9: 000012568 2013       joan         swamidass      NA     5656            53       1955
    10: 000012661 2014       joan         swamidass      NA     5656            53       1955
    11: 000012688 2015       joan         swamidass      NA     5656            53       1955
  • BEWARE: could easily just be that they elected to stop using their maiden name at some point; either way, it may be useful to keep track of such individuals after they lose their second name
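A flag for this pattern could compare the components of the last name from one observed year to the next. This is a minimal sketch in Python (the repository's own scripts are in R); the function names are hypothetical, and splitting names on hyphens and whitespace is an assumption.

```python
import re

def name_parts(last_name):
    """Split a last name on hyphens and whitespace into a set of components."""
    return {p for p in re.split(r"[-\s]+", last_name.strip().lower()) if p}

def dropped_name_years(names_by_year):
    """Return the years in which the last name loses one or more components.

    names_by_year: {year: last_name} for one matched teacher. A year is
    flagged when its (non-empty) name components are a strict subset of
    the previous observed year's components.
    """
    years = sorted(names_by_year)
    flagged = []
    for y0, y1 in zip(years, years[1:]):
        before, after = name_parts(names_by_year[y0]), name_parts(names_by_year[y1])
        if after and after < before:
            flagged.append(y1)
    return flagged

# Last names from the example above.
names = {2007: "bacchus-swamidass", 2008: "bacchus-swamidass",
         2009: "swamidass", 2010: "swamidass"}
dropped = dropped_name_years(names)  # [2009]
```

As the caveat above notes, a flagged year says only that a name component was dropped, not why.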

Fuzzy string matching could improve match fidelity

(From data_cleaning_tasks.txt, 2015-07-17):

Typos/misspellings/changes to nicknames causing missed matches, e.g.:

               id year first_name     last_name birth_year agency school position_code area
     1: 000475820 2000    jenifer lemke-bublitz       1959   1945                   67 0000
     2: 000325187 2005    jenifer lemke-bublitz       1959   4515   0140            53 0830
     3: 000451787 2006    jenifer lemke-bublitz       1959   4515   0140            53 0830
     4: 000017457 2007    jenifer lemke-bublitz       1959   4515   0140            53 0830
     5: 000017289 2008    jenifer lemke bublitz       1959   4515   0140            53 0811
     6: 000013811 2009    jenifer lemke bublitz       1959   4515   0140            53 0811
     7: 000017260 2010    jenifer lemke bublitz       1959   4515   0140            53 0811
     8: 000017460 2011    jenifer lemke bublitz       1959   4515   0140            53 0811
     9:     17287 2012    jenifer lemke bublitz       1959   4515   0140            53 0830
    10: 000016629 2013    jenifer lemke bublitz       1959   4515   0140            53 0830
    11: 000016695 2014    jenifer lemke bublitz       1959   4515   0140            53 0830
    12: 000016657 2015    jenifer lemke bublitz       1959   4515   0140            53 0830
    13: 000577332 2001   jennifer lemke-bublitz       1959   1945   0080            97 0883 
    14: 000769347 2002   jennifer lemke-bublitz       1959   1945   0080            97 0883 
    15: 000848947 2003   jennifer lemke-bublitz       1959   1945   0080            97 0883
    16: 000011642 2004   jennifer lemke-bublitz       1959   1945   0080            97 0883

And

              id year first_name last_name birth_year agency school position_code area
    1: 000218099 1998   patricia      haas       1959   4473   0060            53 0050
    2: 000352882 1999   patricia      haas       1959   4473   0050            53 0050
    3: 000352882 1999   patricia      haas       1959   4473   0060            53 0050
    4: 000482863 2000   patricia      haas       1959   4473   0050            53 0050
    5: 000482863 2000   patricia      haas       1959   4473   0060            53 0050
    6: 000610451 2001   patricia      haas       1959   4473   0060            53 0050
    7: 000682423 2002    patti j      haas       1959   1015   0070            97 0907 
    8: 000839439 2003   patricia      haas       1959   2058   0110            53 0620
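A similarity score on normalized names would catch the first kind of miss. This is a minimal sketch using difflib from the Python standard library (the repository's own scripts are in R); the function names and the 0.8 threshold are hypothetical, and requiring an exact birth_year match is an assumption.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(name.lower().replace("-", " ").split())

def similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def likely_same(rec_a, rec_b, threshold=0.8):
    """Compare two (first_name, last_name, birth_year) records.

    Requires identical birth_year and both name similarities >= threshold.
    """
    return (rec_a[2] == rec_b[2]
            and similarity(rec_a[0], rec_b[0]) >= threshold
            and similarity(rec_a[1], rec_b[1]) >= threshold)

a = ("jenifer", "lemke-bublitz", 1959)
b = ("jennifer", "lemke bublitz", 1959)
same = likely_same(a, b)  # True
```

The same rule does not link "patti j" to "patricia" in the second example, so nicknames would probably need a separate lookup table rather than a lower threshold.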

Organizing background data

Lots of stuff to clean up in background data. Be sure to incorporate as much WINSS data as possible.

  • ACT data
  • AP data
  • Attendance data
  • Census data
  • Community activities data
  • Enrollment data
  • Extracurricular data
  • Graduation requirements data
  • HS completion data
  • Post-graduation data
  • Salaries data
  • Truancy data
  • WRCT data
  • WSAS data
  • Spatial data
  • ETF retirement data for large districts
  • Charter school data
  • Supply & Demand for Education Professionals in Wisconsin
  • UW state school system -- patterns in ed. degrees
