Under Q2: How well do the different metrics agree on which participants they reject under different thresholds? Look into generalizations of the Dice coefficient? (See the first sketch after this list.)
Under Q3: Is there a correlation between asymmetric learning rates, questionnaire sum scores, and low-effort responding?
Under Q3: Dig into hypomania correlations?
Under Q3: Is there some combination of metrics that could predict low-effort responding (as flagged by infrequency items) in datasets where infrequency items were not collected? Look into the decision tree literature? (See the second sketch after this list.)
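A minimal sketch of the Dice idea, on hypothetical metric names and rejection sets; `generalized_dice` is one possible k-set extension (k * |intersection| / sum of set sizes), not a specific formula from the literature:

```python
# Pairwise (and multi-set) Dice overlap between the participants each
# metric rejects. Metric names, thresholds, and sets are placeholders.
from itertools import combinations

def dice(a, b):
    """Classic Dice coefficient: 2|A n B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def generalized_dice(*sets):
    """One k-set generalization: k * |intersection| / sum of set sizes."""
    sets = [set(s) for s in sets]
    total = sum(len(s) for s in sets)
    if total == 0:
        return 1.0
    return len(sets) * len(set.intersection(*sets)) / total

# Hypothetical rejection sets produced by three metrics at some threshold.
rejected = {
    "accuracy": {2, 5, 9, 14},
    "rt_entropy": {2, 5, 17},
    "infrequency": {2, 9, 14, 17},
}

for (m1, s1), (m2, s2) in combinations(rejected.items(), 2):
    print(f"{m1} vs {m2}: Dice = {dice(s1, s2):.2f}")
print(f"all three: generalized Dice = {generalized_dice(*rejected.values()):.2f}")
```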
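And a rough sketch of the decision-tree idea, on simulated data with placeholder features -- a shallow tree keeps the learned rule interpretable:

```python
# Train a shallow tree to predict the infrequency-based low-effort flag
# from the other metrics, then inspect the learned splits. The feature
# names and the simulated data below are purely illustrative.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(0.3, 1.0, n),   # task accuracy
    rng.uniform(60, 900, n),    # total experiment duration (s)
    rng.uniform(0.0, 2.0, n),   # response entropy
])
# Simulated flag loosely tied to accuracy, purely for illustration.
y = (X[:, 0] + rng.normal(0, 0.15, n) < 0.5).astype(int)

tree = DecisionTreeClassifier(max_depth=2, class_weight="balanced", random_state=0)
print("CV balanced accuracy:",
      cross_val_score(tree, X, y, cv=5, scoring="balanced_accuracy").mean())
tree.fit(X, y)
print(export_text(tree, feature_names=["accuracy", "duration_s", "entropy"]))
```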
Some points for discussion after looking through the data today:
Infrequency thresholds: we may want to think about the consistency between the different infrequency items, which is lower than what would be expected under pure random responding. Obviously pure random responding is an unrealistic assumption, but I'm wondering if there's anything else to say about these items (e.g., are all-endorse items somehow less discriminative?). A rough version of the baseline comparison is sketched below.
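For concreteness, a toy version of that baseline: assuming each item is scored pass/fail and a uniform random responder fails a k-option item with m failing options at rate m/k (so joint failure rates are products of the marginals), we can compare observed joint failures against that expectation. Item specs and data here are made up:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 400
# (observed failure vector, n_failing_options, n_options) per item -- made up.
items = {
    "infreq_1": (rng.random(n) < 0.10, 2, 5),
    "infreq_2": (rng.random(n) < 0.08, 1, 5),
    "infreq_3": (rng.random(n) < 0.12, 2, 5),
}

for (name1, (f1, m1, k1)), (name2, (f2, m2, k2)) in combinations(items.items(), 2):
    observed = np.mean(f1 & f2)          # observed joint failure rate
    expected = (m1 / k1) * (m2 / k2)     # pure-random-responding expectation
    print(f"{name1} & {name2}: observed joint fail {observed:.3f}, "
          f"random-responding expectation {expected:.3f}")
```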
Additional survey metrics: there are some recommended survey quality metrics I have not yet implemented, as they are somewhat challenging for our dataset. A metric like internal (split-half) consistency is possibly less robust in our case, where we have few items per subscale. Similarly, it's not clear whether we have enough items to compute consistency via "psychometric synonyms/antonyms". Computing all of these survey metrics doesn't seem crucial to me, since they're not the crux of the paper -- that said, if there's an easy way to compute them, it'd be interesting to compare them to the behavioral metrics (re: Major Point #2, behavior =/= survey thresholding). One possible person-level implementation is sketched below.
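If we do want a cheap version, here's a sketch of person-level even-odd consistency (one common careless-responding index), with a hypothetical layout of six 4-item subscales; with halves this short, the per-person correlation will be noisy, which is exactly the robustness concern above:

```python
# Per participant: correlate odd- and even-item subscale means across
# subscales, then apply the Spearman-Brown step-up correction.
import numpy as np

rng = np.random.default_rng(2)
n_participants, n_items = 200, 24
responses = rng.integers(1, 6, size=(n_participants, n_items))  # 1-5 Likert
# Hypothetical assignment of item columns to six 4-item subscales.
subscales = [list(range(s, s + 4)) for s in range(0, n_items, 4)]

def even_odd_consistency(row):
    odd = [row[idx[0::2]].mean() for idx in subscales]
    even = [row[idx[1::2]].mean() for idx in subscales]
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)  # Spearman-Brown step-up

scores = np.array([even_odd_consistency(row) for row in responses])
print("median even-odd consistency:", np.median(scores))
```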
Thresholding non-behavior metrics: it is relatively clear what the anchor points are for thresholding behavior (i.e., chance). It's less clear for other metrics (total experiment duration, entropy, Mahalanobis D). The literature may have some recommendations; short of that, we'll want to settle on a sensible rule. One candidate for Mahalanobis D is sketched below.
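For Mahalanobis D specifically, one candidate anchor -- assuming the (here simulated) metric vector is roughly multivariate normal, so that squared distances are approximately chi-square with df = number of variables -- is a high quantile of that chi-square distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 500, 4
X = rng.normal(size=(n, p))  # stand-in for per-participant survey scores

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
# Squared Mahalanobis distance of each participant from the sample mean.
d2 = np.einsum("ij,jk,ik->i", X - mu, cov_inv, X - mu)

cutoff = stats.chi2.ppf(0.999, df=p)  # e.g. flag the most extreme 0.1%
print(f"cutoff = {cutoff:.2f}, flagged = {(d2 > cutoff).sum()} participants")
```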