andrewsu / mentorship-survey-analysis

License: MIT License

Python 100.00%

mentorship-survey-analysis's Issues

graceful handling of the "no numeric data to plot" error

Right now, the script produces many errors like this:

Q64:Please reflect on your mentoring experience. How can the mentorship climate at Scripps Research be improved for mentors?
Traceback (most recent call last):
  File "/home/asu/Science/mentorship-survey-analysis/main.py", line 156, in plot_bar_charts
    plottable_df.plot.barh(stacked=True, ax=axes[i])
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_core.py", line 1222, in barh
    return self(kind="barh", x=x, y=y, **kwargs)
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_core.py", line 975, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_matplotlib/__init__.py", line 71, in plot
    plot_obj.generate()
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py", line 446, in generate
    self._compute_plot_data()
  File "/home/asu/env/mentorship-survey-analysis/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py", line 632, in _compute_plot_data
    raise TypeError("no numeric data to plot")
TypeError: no numeric data to plot

no numeric data to plot

These are expected errors given the sample data file and they do not indicate anything is actually going wrong, but their appearance in the console output may confuse users who are not familiar with the reason. It would be nice to catch these errors more gracefully.
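One possible way to quiet these expected errors is to wrap the plotting call and log a short note instead of printing the traceback. This is only a sketch: the helper name `plot_or_skip` and the logger name are made up for illustration, and it assumes the only error we want to swallow is the pandas `TypeError` whose message is "no numeric data to plot" (as seen in the traceback above). Any other exception is re-raised.

```python
import logging

logger = logging.getLogger("mentorship_survey")  # hypothetical logger name

# Message text taken from the traceback in this issue.
NO_NUMERIC_MSG = "no numeric data to plot"


def plot_or_skip(plot_fn, question):
    """Run plot_fn(); return True if it plotted, False if it was skipped.

    Only the expected "no numeric data to plot" TypeError is swallowed.
    Every other exception propagates so real bugs stay visible.
    """
    try:
        plot_fn()
    except TypeError as exc:
        if NO_NUMERIC_MSG not in str(exc):
            raise
        # One quiet log line instead of a full traceback on the console.
        logger.info("Skipping chart for %s: %s", question, exc)
        return False
    return True
```

In `plot_bar_charts`, the call from the traceback could then be wrapped as `plot_or_skip(lambda: plottable_df.plot.barh(stacked=True, ax=axes[i]), column_name)`, where `column_name` stands in for whatever variable holds the question label.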

change overflow behavior for free text answers

Currently, long free-text answers are truncated with an ellipsis ("...") (see red arrows below). The list of free-text answers is itself also cut short with "..." (see green arrow below). We want to change this so that all answers are shown in their entirety.
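As a rough illustration of the desired behavior, long answers could be wrapped onto multiple lines instead of being cut off. This sketch uses only the standard-library `textwrap` module; the function name `wrap_answers` and the default width of 80 characters are assumptions, and how the wrapped text is placed on the PDF page is left out.

```python
import textwrap


def wrap_answers(answers, width=80):
    """Return all answers as one text block, wrapped but never truncated."""
    blocks = []
    for answer in answers:
        # Collapse internal whitespace/newlines so wrapping is predictable.
        text = " ".join(str(answer).split())
        if text:
            blocks.append(textwrap.fill(text, width=width))
    # A blank line between answers keeps them visually separate.
    return "\n\n".join(blocks)
```

Because every answer is kept, the text box would need to grow (or continue onto a new page) with the number of lines returned.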

remove Report Score when equal to `nan`

When we don't map answers to a numeric scale, the calculated "Report Score" appears as nan. In those cases, we should completely remove the Report Score line.
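A minimal sketch of the check, assuming the score arrives as a float (or `None`) and that the line is rendered as text. The function name and the exact label format are hypothetical.

```python
import math


def report_score_line(score):
    """Return the "Report Score" line, or None when there is no score."""
    # nan is what we get when answers were not mapped to a numeric scale.
    if score is None or math.isnan(score):
        return None
    return f"Report Score: {score:.2f}"


# Callers can drop the line entirely when None comes back.
lines = [report_score_line(s) for s in (1.25, float("nan"))]
visible = [line for line in lines if line is not None]
```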

initial analysis notes

Notes

Sample data were originally provided to us as 2023 Scripps DEI Survey_dataset to share with Scripps 7.10.23.xlsx, which has been converted to a plain TSV file in /data/sample_data.txt
The three columns Department/Org Level 1, Division/Org Level 2, and Strategic Unit/Org Level 3 describe the levels of aggregation as we move up the org chart
The columns between Q1:Gender Identity - Selected Choice and Q5:Citizenship status define the demographic groups. A report for each of these groups will be created only when the group meets the threshold needed to ensure anonymity.
The columns starting at Q6:How long have you been with Scripps Research? represent the survey data to be summarized in a report
Columns with discrete, ordered values (e.g., Strongly agree, Agree, ... , Strongly Disagree) should be shown as stacked bars. The vast majority of data should be in this format.
Some columns contain a comma-delimited list of discrete values (e.g., for Q13B:What methods do you use to communicate with your mentor? (Check all that apply) - Selected Choice, example value is In-person, one-on-one meetings,Group meetings,Email.) These data should be shown as a bar chart showing the percentage of respondents who selected each response.
Some columns contain free-text answers (e.g., Q54_1_TEXT:Is there something that you experienced working with previous mentors that you wish was also done with your current mentor? - Yes (please explain): - Text). All answers should be presented in a simple text box.
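For the "check all that apply" columns, note that the example value above contains a comma inside one of the options ("In-person, one-on-one meetings"), so a plain split on "," would break it apart. The sketch below assumes that options are separated by a comma with no following space, as in that example. That assumption should be checked against the real export. The function name is hypothetical.

```python
import re
from collections import Counter


def multi_select_percentages(answers):
    """Percentage of respondents who selected each option.

    Blank answers are treated as non-respondents and excluded
    from the denominator.
    """
    respondents = [a for a in answers if a and a.strip()]
    counts = Counter()
    for answer in respondents:
        # Split on commas NOT followed by whitespace, so an option such as
        # "In-person, one-on-one meetings" stays in one piece.
        options = {opt.strip() for opt in re.split(r",(?!\s)", answer)}
        options.discard("")
        counts.update(options)  # a set: each option counted once per person
    total = len(respondents)
    return {opt: 100.0 * n / total for opt, n in counts.items()} if total else {}
```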

To-Do

This section will be broken out into individual tickets.

Ask The Mark to adjust the Org Chart columns for GRAD PROG - STUDENTS
Confirm that The Mark will pre-screen all free-text responses to remove identifying information
Andrew to adjust sample_data.txt to include more realistic counts for demographic groups to test inclusion/exclusion of reports as outlined in this analysis spreadsheet

display mean values for cohort and for higher organizational levels

Suppose we have a report generated for a specific lab, say "NEURO LAB 1". For many questions, we will have responses that correspond to a scale, e.g., 'Strongly agree', 'Somewhat agree', 'Neither agree nor disagree', 'Somewhat disagree', 'Strongly Disagree'. Those are currently ordered and displayed in a bar chart.

In this issue, I propose also calculating a numeric score from the responses to a given question. We might do this by assigning a score for each answer, e.g.,

'Strongly agree' = +2
'Somewhat agree' = +1
'Neither agree nor disagree' = 0
'Somewhat disagree' = -1
'Strongly Disagree' = -2

The responses could then be averaged, and that average could then be shown on the PDF report.

The average for a given question in the report could then be compared to the average for the same question at higher organizational levels. For example, if "NEURO LAB 1" is the "Department/Org Level 1", then the "Division/Org Level 2" is "NEUROSCIENCE - CA" and the "Strategic Unit/Org Level 3" is "ACADEMIC RESEARCH". For a given question, the report could include the average of responses for each of those three levels, as well as the Institute average.

Similarly, suppose we are generating a report for the "NEUROSCIENCE - CA" level specifically for respondents who provided a "gender identity" answer of "Female". In addition to computing the average of responses for all Female respondents in "NEUROSCIENCE - CA", we would also show the average for all Female respondents in "ACADEMIC RESEARCH", and all Female respondents Institute-wide.
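To make the proposal concrete, here is a small sketch of the calculation. It assumes each response is a dict keyed by column name, uses the +2 to -2 mapping proposed above, and ignores answers that are not on the scale. The function names `mean_score` and `level_means` are made up for illustration.

```python
from statistics import mean

# Score mapping as proposed in this issue.
SCALE = {
    "Strongly agree": 2,
    "Somewhat agree": 1,
    "Neither agree nor disagree": 0,
    "Somewhat disagree": -1,
    "Strongly Disagree": -2,
}

# Org chart columns, from lowest to highest level of aggregation.
LEVELS = [
    "Department/Org Level 1",
    "Division/Org Level 2",
    "Strategic Unit/Org Level 3",
]


def mean_score(rows, question):
    """Average score for one question, or None if nothing is on the scale."""
    scores = [SCALE[r[question]] for r in rows if r.get(question) in SCALE]
    return mean(scores) if scores else None


def level_means(rows, question, department):
    """Means for a department, each of its parent levels, and the Institute."""
    member = next((r for r in rows if r[LEVELS[0]] == department), None)
    if member is None:
        return {}
    result = {}
    for level in LEVELS:
        # Everyone who shares this respondent's value at the given level.
        subset = [r for r in rows if r[level] == member[level]]
        result[member[level]] = mean_score(subset, question)
    result["Institute"] = mean_score(rows, question)
    return result
```

For the demographic case, the same function can be called on a pre-filtered list, e.g. only the rows where the gender identity answer is "Female".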

threshold count should be based on consent:yes >= 5

Currently, reports are generated if the number of respondents is >= 5. Let's change this so that a report is generated only when the number of respondents answering "yes" to Q0:Do you consent to taking this survey? is >= 5.
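A sketch of the revised check, assuming rows are dicts keyed by column name and that consent is recorded as the text "yes" (compared case-insensitively here, which is an assumption about the data). The function and constant names are hypothetical.

```python
CONSENT_COLUMN = "Q0:Do you consent to taking this survey?"
MIN_CONSENTING = 5


def meets_threshold(rows, minimum=MIN_CONSENTING):
    """True when at least `minimum` respondents in `rows` consented."""
    consenting = sum(
        1
        for row in rows
        # Missing or blank consent counts as "not yes".
        if str(row.get(CONSENT_COLUMN, "")).strip().lower() == "yes"
    )
    return consenting >= minimum
```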

suppress the generation of multiple identical reports

In many cases, there is a 1-1 relationship between the Supervisor for Reporting column and the Department/Org Level 1 column. For example, all the people listing a supervisor of "Su, Andrew I." also list "ISCB - SU" for the department. In cases like this, the reports for "Su, Andrew I." and "ISCB - SU" would be exactly the same. In that case, only create the report for "ISCB - SU".
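One way to detect this is to compare the set of respondents behind each supervisor report with the set behind each department report, and skip the supervisor report when the two sets are identical. The sketch below assumes rows are dicts keyed by column name; the function names are hypothetical, and the supervisor and department values other than "Su, Andrew I." and "ISCB - SU" would come from the data.

```python
from collections import defaultdict

SUPERVISOR = "Supervisor for Reporting"
DEPARTMENT = "Department/Org Level 1"


def members_by(rows, column):
    """Map each value in `column` to the set of row indices that have it."""
    groups = defaultdict(set)
    for index, row in enumerate(rows):
        groups[row[column]].add(index)
    return {name: frozenset(members) for name, members in groups.items()}


def reports_to_generate(rows):
    """List (column, value) pairs, dropping supervisor reports that
    would cover exactly the same respondents as a department report."""
    departments = members_by(rows, DEPARTMENT)
    supervisors = members_by(rows, SUPERVISOR)
    department_sets = set(departments.values())
    reports = [(DEPARTMENT, name) for name in departments]
    reports += [
        (SUPERVISOR, name)
        for name, members in supervisors.items()
        if members not in department_sets  # identical cohort: keep dept only
    ]
    return reports
```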

combine demographic reports into a single PDF

Currently we generate separate PDF reports for demographic splits. For example, based on the sample data, we generate separate PDFs for 'NEURO LAB 2.pdf', 'NEURO LAB 2+Male.pdf', and 'NEURO LAB 2+Female.pdf'. On reviewing these reports with test users, we realized that it would be easier to use if all the demographic splits (gender and race/ethnicity) were included in a single PDF. So, there would only be one 'NEURO LAB 2.pdf', and the summary for one question might look like this:

The only thing we'd lose in this version is the actual counts, but I think that is an acceptable trade-off.

(of all the issues, this is probably the most substantial change, so let's discuss feasibility...)
