
mentorship-survey-analysis's People

Contributors

andrewsu, rjawesome


mentorship-survey-analysis's Issues

graceful handling of the "no numeric data to plot" error

Right now, the script produces many errors like this:

Q64:Please reflect on your mentoring experience. How can the mentorship climate at Scripps Research be improved for mentors?
Traceback (most recent call last):
  File "/home/asu/Science/mentorship-survey-analysis/main.py", line 156, in plot_bar_charts
    plottable_df.plot.barh(stacked=True, ax=axes[i])
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_core.py", line 1222, in barh
    return self(kind="barh", x=x, y=y, **kwargs)
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_core.py", line 975, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_matplotlib/__init__.py", line 71, in plot
    plot_obj.generate()
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py", line 446, in generate
    self._compute_plot_data()
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py", line 632, in _compute_plot_data
    raise TypeError("no numeric data to plot")
TypeError: no numeric data to plot

no numeric data to plot

These are expected errors given the sample data file and they do not indicate anything is actually going wrong, but their appearance in the console output may confuse users who are not familiar with the reason. It would be nice to catch these errors more gracefully.
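One way to do this, sketched here under assumptions, is to wrap the plotting call and treat this one `TypeError` as an expected skip: log a single line and move on, while re-raising any other `TypeError`. The helper `plot_or_skip` and the logger name are hypothetical and not part of the existing `main.py`.

```python
import logging

logger = logging.getLogger("mentorship_survey_analysis")  # hypothetical logger name


def plot_or_skip(plot_fn, question, *args, **kwargs):
    """Call plot_fn(*args, **kwargs); return True if a chart was drawn.

    pandas raises TypeError("no numeric data to plot") when the DataFrame
    being plotted has no numeric columns. That case is expected here, so it
    is logged as one line instead of a full traceback. Any other TypeError
    is re-raised unchanged.
    """
    try:
        plot_fn(*args, **kwargs)
    except TypeError as err:
        # Matching on the message text is brittle across pandas versions.
        if "no numeric data to plot" not in str(err):
            raise
        # Logged at INFO, so it is hidden unless logging is configured to show INFO.
        logger.info("Skipping chart for %s: no numeric data to plot", question)
        return False
    return True
```

In `plot_bar_charts`, the call at line 156 could then become `plot_or_skip(plottable_df.plot.barh, question, stacked=True, ax=axes[i])`, where `question` stands for whatever variable holds the question text. An alternative is to check `plottable_df.select_dtypes(include="number").empty` before plotting, which avoids the exception altogether, though it may not mirror pandas' internal check in every case.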

change overflow behavior for free text answers

Currently, long free-text answers are truncated using ... (see red arrows below). Also, the overall number of free-text answers is truncated with ... (see green arrow below). We want to change this so that all answers are shown in their entirety.
image
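A minimal sketch of the text side of this change, assuming the answers arrive as a list of strings and that each returned chunk is rendered in its own text box or page by the existing layout code (not shown here). Each answer is wrapped with the standard library's `textwrap` instead of being cut off, and the wrapped lines are split into box-sized chunks so that no answers are dropped. `wrap_answers`, `width`, and `max_lines_per_box` are hypothetical names, and the default values would need tuning to the actual figure size.

```python
import textwrap


def wrap_answers(answers, width=90, max_lines_per_box=40):
    """Wrap every free-text answer in full and split the result into chunks.

    Nothing is truncated: long answers wrap onto extra lines, and extra
    answers spill into additional chunks (one chunk per text box or page).
    """
    lines = []
    for answer in answers:
        wrapped = textwrap.wrap(str(answer), width=width) or [""]
        # Mark the start of each answer; indent its continuation lines.
        lines.append("- " + wrapped[0])
        lines.extend("  " + line for line in wrapped[1:])
    return [
        "\n".join(lines[i:i + max_lines_per_box])
        for i in range(0, len(lines), max_lines_per_box)
    ]
```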

remove Report Score when equal to `nan`

When we don't map answers to a numeric scale, the calculated "Report Score" appears as nan. In those cases, we should completely remove the Report Score line.

image
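A minimal sketch, assuming the lines shown under each chart are assembled as a list of strings before rendering; `summary_lines` and the exact "Report Score" label format are hypothetical. `math.isnan` also accepts NumPy floats, so the check should work whether the score comes from pandas or plain Python.

```python
import math


def summary_lines(question, report_score=None):
    """Text lines shown with a question's chart.

    The "Report Score" line is omitted entirely when the score is missing
    or NaN (i.e. the answers were not mapped to a numeric scale).
    """
    lines = [question]
    if report_score is not None and not math.isnan(report_score):
        lines.append(f"Report Score: {report_score:.2f}")
    return lines
```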

initial analysis notes

Notes

  • Sample data were originally provided to us as 2023 Scripps DEI Survey_dataset to share with Scripps 7.10.23.xlsx, which has been converted to a plain TSV file in /data/sample_data.txt
  • The three columns Department/Org Level 1, Division/Org Level 2, and Strategic Unit/Org Level 3 describe the levels of aggregation as we move up the org chart
  • The columns between Q1:Gender Identity - Selected Choice and Q5:Citizenship status define the demographic groups. A report for each of these groups will be created only when the group meets the threshold needed to ensure anonymity.
  • The columns starting at Q6:How long have you been with Scripps Research? represent the survey data to be summarized in a report
  • Columns with discrete, ordered values (e.g., Strongly agree, Agree, ... , Strongly Disagree) should be shown as stacked bars. The vast majority of data should be in this format.
  • Some columns contain a comma-delimited list of discrete values (e.g., Q13B:What methods do you use to communicate with your mentor? (Check all that apply) - Selected Choice, where an example value is In-person, one-on-one meetings,Group meetings,Email). These data should be shown as a bar chart showing the percentage of respondents who selected each response.
  • Some columns contain free-text answers (e.g., Q54_1_TEXT:Is there something that you experienced working with previous mentors that you wish was also done with your current mentor? - Yes (please explain): - Text). All answers should be presented in a simple text box.
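Splitting the multi-select values needs care because a single choice can itself contain a comma: in the Q13B example, "In-person, one-on-one meetings" is one choice. The sketch below assumes, based only on that one example, that choices are joined by a bare comma while commas inside a choice are followed by a space; if that does not hold across the real data, matching against the known list of choices would be safer. It counts each choice at most once per respondent and uses the respondents who answered the question as the denominator, which is one reasonable reading of "percentage of respondents". The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption (based on the Q13B example only): choices are joined by a bare ",",
# while a comma inside a choice is followed by a space.
_CHOICE_DELIMITER = re.compile(r",(?!\s)")


def split_choices(value):
    """Split one multi-select answer into its individual choices."""
    return [c.strip() for c in _CHOICE_DELIMITER.split(value) if c.strip()]


def percent_selecting(values):
    """Percentage of respondents who selected each choice.

    The denominator is the number of respondents with a non-empty answer;
    missing values (None, NaN, "") are skipped.
    """
    answered = [v for v in values if isinstance(v, str) and v.strip()]
    counts = Counter()
    for value in answered:
        # A set, so a choice is counted at most once per respondent.
        counts.update(set(split_choices(value)))
    if not answered:
        return {}
    return {choice: 100 * n / len(answered) for choice, n in counts.items()}
```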

To-Do

This section will be broken out into individual tickets.

  • Ask The Mark to adjust the Org Chart columns for GRAD PROG - STUDENTS
  • Confirm that The Mark will pre-screen all free-text responses to remove identifying information
  • Andrew to adjust sample_data.txt to include more realistic counts for demographic groups to test inclusion/exclusion of reports as outlined in this analysis spreadsheet

display mean values for cohort and for higher organizational levels

Suppose we have a report generated for a specific lab, say "NEURO LAB 1". For many questions, we will have responses that correspond to a scale, e.g., 'Strongly agree', 'Somewhat agree', 'Neither agree nor disagree', 'Somewhat disagree', 'Strongly Disagree'. These responses are currently ordered and displayed as a bar chart.

In this issue, I propose also calculating a numeric score from the responses to a given question. We might do this by assigning a score for each answer, e.g.,

  • 'Strongly agree' = +2
  • 'Somewhat agree' = +1
  • 'Neither agree nor disagree' = 0
  • 'Somewhat disagree' = -1
  • 'Strongly Disagree' = -2

The responses could then be averaged, and that average could then be shown on the PDF report.
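The proposed scoring could be sketched as follows. Answers missing from the mapping (for example, a hypothetical "Prefer not to answer" option) are excluded from the average rather than scored as 0, and the result is NaN when nothing can be mapped. The mapping keys must match the survey's exact spellings, including the capital "D" in 'Strongly Disagree' as written above. `report_score` is a hypothetical name.

```python
import math
from statistics import mean

# Scale proposed in this issue. Keys must match the survey's exact spellings.
SCORES = {
    "Strongly agree": 2,
    "Somewhat agree": 1,
    "Neither agree nor disagree": 0,
    "Somewhat disagree": -1,
    "Strongly Disagree": -2,
}


def report_score(answers, scores=SCORES):
    """Average score of the answers that appear in `scores`.

    Unmapped answers (blanks, or a hypothetical "Prefer not to answer") are
    left out of the average instead of counting as 0. Returns NaN if no
    answer could be mapped.
    """
    values = [scores[a] for a in answers if a in scores]
    return mean(values) if values else math.nan
```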

The average for a given question in the report could then be compared to the average for the same question at higher organizational levels. For example, if "NEURO LAB 1" is the "Department/Org Level 1", then the "Division/Org Level 2" corresponds to "NEUROSCIENCE - CA", and the "Strategic Unit/Org Level 3" corresponds to "ACADEMIC RESEARCH". For a given question, the report could include the average of responses for each of those three levels, as well as the Institute-wide average.

Similarly, suppose we are generating a report for the "NEUROSCIENCE - CA" level specifically for respondents who provided a "gender identity" answer of "Female". In addition to computing the average of responses for all Female respondents in "NEUROSCIENCE - CA", we would also show the average for all Female respondents in "ACADEMIC RESEARCH", and all Female respondents Institute-wide.
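A sketch of both comparisons, assuming each respondent is a dict keyed by the column names from the sample data and that the caller supplies the report's lineage from its own level upward. Passing a demographic `(column, value)` pair restricts every average to that group, which gives the second scenario: Female respondents in "NEUROSCIENCE - CA", then in "ACADEMIC RESEARCH", then Institute-wide. `comparison_means` and its parameters are hypothetical.

```python
import math
from statistics import mean


def comparison_means(rows, question, lineage, scores, demographic=None):
    """Mean score for `question` in each org unit of `lineage`, plus Institute-wide.

    rows:        one dict per respondent, keyed by column name.
    lineage:     (column, unit) pairs from the report's own level upward, e.g.
                 [("Division/Org Level 2", "NEUROSCIENCE - CA"),
                  ("Strategic Unit/Org Level 3", "ACADEMIC RESEARCH")].
    scores:      mapping from answer text to numeric score.
    demographic: optional (column, value) pair; if given, every average is
                 restricted to respondents in that group.
    """
    if demographic is not None:
        group_column, group_value = demographic
        rows = [r for r in rows if r.get(group_column) == group_value]

    def average(subset):
        values = [scores[r[question]] for r in subset if r.get(question) in scores]
        return mean(values) if values else math.nan

    means = {unit: average([r for r in rows if r.get(column) == unit])
             for column, unit in lineage}
    means["Institute"] = average(rows)
    return means
```

For the "NEURO LAB 1" report, the lineage would list all three org columns starting at "Department/Org Level 1"; for the "NEUROSCIENCE - CA" report it would start at "Division/Org Level 2".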

threshold count should be based on consent:yes >= 5

Currently, reports are generated if the number of respondents is >= 5. Let's modify this so that the threshold is based on the number of respondents who answered "yes" to Q0:Do you consent to taking this survey?, i.e., a report is generated only if that number is >= 5.
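A minimal sketch of the new check, assuming rows are dicts keyed by column name. Because the exact stored value ("yes", "Yes", with or without stray whitespace) is not confirmed here, the comparison is case- and whitespace-insensitive. `consenting_count` and `should_generate_report` are hypothetical names.

```python
CONSENT_COLUMN = "Q0:Do you consent to taking this survey?"
MIN_CONSENTING = 5


def consenting_count(rows):
    """Number of respondents who answered "yes" to the consent question.

    Assumption: the stored value may vary in case or surrounding whitespace,
    so the comparison normalizes it.
    """
    return sum(
        1 for r in rows
        if str(r.get(CONSENT_COLUMN, "")).strip().lower() == "yes"
    )


def should_generate_report(rows, minimum=MIN_CONSENTING):
    """Generate a report only if at least `minimum` respondents consented."""
    return consenting_count(rows) >= minimum
```

With a pandas DataFrame `df`, the same count could be written as `(df[CONSENT_COLUMN].str.strip().str.lower() == "yes").sum()`.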

suppress the generation of multiple identical reports

In many cases, there is a 1-1 relationship between the Supervisor for Reporting column and the Department/Org Level 1 column. For example, all the people listing a supervisor of "Su, Andrew I." also list "ISCB - SU" for the department. In cases like this, the reports for "Su, Andrew I." and "ISCB - SU" would be exactly the same, so only the report for "ISCB - SU" should be created.
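Assuming a report's content depends only on which respondents it covers, two reports are identical exactly when they cover the same set of respondents. One way to detect this case is therefore to compare respondent sets rather than names. The sketch assumes rows are dicts keyed by the column names mentioned above; it returns every supervisor whose respondents exactly match one department's respondents, so that only the department report is generated. `redundant_supervisors` is a hypothetical name.

```python
from collections import defaultdict


def redundant_supervisors(rows, supervisor_col="Supervisor for Reporting",
                          department_col="Department/Org Level 1"):
    """Supervisors whose report would duplicate a department's report.

    A supervisor is redundant when the set of respondents listing that
    supervisor is exactly the set of respondents in some department.
    """
    by_supervisor = defaultdict(set)
    by_department = defaultdict(set)
    for i, row in enumerate(rows):
        # Track respondents by row index so the sets can be compared directly.
        by_supervisor[row[supervisor_col]].add(i)
        by_department[row[department_col]].add(i)
    department_sets = {frozenset(ids) for ids in by_department.values()}
    return {sup for sup, ids in by_supervisor.items()
            if frozenset(ids) in department_sets}
```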

combine demographic reports into a single PDF

Currently we generate separate PDF reports for demographic splits. For example, based on the sample data, we generate separate PDFs for 'NEURO LAB 2.pdf', 'NEURO LAB 2+Male.pdf', and 'NEURO LAB 2+Female.pdf'. On reviewing these reports with test users, we realized that it would be easier to use if all the demographic splits (gender and race/ethnicity) were included in a single PDF. So, there would only be one 'NEURO LAB 2.pdf', and the summary for one question might look like this:

The only thing we'd lose in this version is the actual counts, but I think that is an acceptable trade-off.

(of all the issues, this is probably the most substantial change, so let's discuss feasibility...)
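A sketch of the data side of the combined report: for one question, compute the percentage distribution of answers for all respondents and for each demographic group, one entry per group, ready to be drawn as the bars of a single stacked horizontal bar chart. Counts are deliberately not returned, in line with the trade-off above. It assumes dict rows with string demographic values, and it does not apply the anonymity threshold, which would still need to remove small groups before plotting. `split_percentages` is a hypothetical name.

```python
from collections import Counter


def split_percentages(rows, question, choices, split_columns):
    """Percent of answers to `question` on each choice, for all respondents
    and for each demographic group, one entry per bar.

    rows:          one dict per respondent, keyed by column name.
    choices:       answer options in display order.
    split_columns: demographic columns to split on (e.g. gender, race/ethnicity).
    Returns {bar label: [percent for each choice]}; counts are not returned.
    Groups with no answer among `choices` are left out.
    """
    groups = {"All respondents": rows}
    for column in split_columns:
        values = {r[column] for r in rows
                  if isinstance(r.get(column), str) and r[column]}
        for value in sorted(values):
            groups[value] = [r for r in rows if r.get(column) == value]

    table = {}
    for label, members in groups.items():
        counts = Counter(r[question] for r in members if r.get(question) in choices)
        total = sum(counts.values())
        if total:
            table[label] = [100 * counts[c] / total for c in choices]
    return table
```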
