nzbri / pd-apathy

License: Apache License 2.0
Taken from #8:
```r
> npi %>%
+   filter(abs(as.numeric(npi_date - session_date, units = "days")) > 120) %>%
+   select(session_id, session_date, npi_date)
# A tibble: 19 x 3
   session_id          session_date npi_date
   <chr>               <date>       <date>
 1 006BIO_2016-02-05   2016-02-05   2015-03-09
 2 027LPR-C_2009-01-06 2009-01-06   2018-01-31
 3 ...
```
As per e.g. the "Cognitive tests that identify high risk of conversion to dementia in Parkinson's disease" preprint, is there a way of drilling down to the individual tests (i.e. rather than the five subdomains)? Presumably this would have to go via REDCap?
Thank you!
b09419b visualises medication as `sqrt(LED)`, which transforms the distribution to (approximately) a delta + Gaussian mixture. Within the regression model, this is probably best modelled as a binary `taking_medication` variable, and then a `dose = sqrt(LED) * taking_medication` (z-scored within the taking-medication group).
See nzbri/chchpd#14.
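A minimal sketch of that encoding, assuming a `sessions` data frame with an `LED` column (the data frame and column names here are assumptions):

```r
library(dplyr)

sessions <- sessions %>%
  mutate(
    taking_medication = LED > 0,
    dose_raw = sqrt(LED) * taking_medication,
    # z-score within the taking-medication group; zero otherwise
    dose = if_else(taking_medication,
                   (dose_raw - mean(dose_raw[taking_medication])) /
                     sd(dose_raw[taking_medication]),
                   0)
  )
```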
See e.g.:
Would probably want to split into:

- `x` years / deceased

As suggested by @zenourn, this issue is a dynamic list of quick questions about the different study protocols and the database itself. Some previous questions can be found in #1 and #2.
`full_assessment`). More detail on individual items in the comments.
If we are interested in whether apathy becomes more prevalent as Parkinson's progresses, then we have (at least) three highly correlated proxies for progression: years since diagnosis, decline in motor scores, decline in cognitive scores. How best to think about evaluating the strength of these effects?
We can test each of the three individual models relative to a simpler null (cf. Kyla's paper), or we could simply test the joint model (perhaps with a post-hoc test to see if any individual factor has driven any changes), but I don't have a good feel for how the LOOIC behaves in this context.
```r
> model1$formula
global_z ~ 1
> model2$formula
global_z ~ 1 + ethnicity
> loo_compare(model1, model2, criterion = "loo")
       elpd_diff se_diff
model2  0.0       0.0
model1 -8.5       6.0
> model3$formula
global_z ~ 1 + education
> loo_compare(model1, model3, criterion = "loo")
       elpd_diff se_diff
model3  0.0       0.0
model1 -8.7       4.5
> model4$formula
global_z ~ 1 + ethnicity + education
> loo_compare(model1, model4, criterion = "loo")
        elpd_diff se_diff
model4   0.0       0.0
model1 -18.0       7.6
```
I.e. the origin of the `global_z` data. It would also be useful to define the various subdomains.
Chatting with Marie, the enrichment study (`study == "Cognition and Exercise in PD"`) included an 'apathy scale' in the initial and final assessments. However, we tend to only see these subjects if they were subsequently enrolled in the main PD study, as the assessments were not the standard neuropsych sessions, though they did include a variety of measures and the UPDRS.
To do:

- `chchpd` interface?
- Finalise procedures for missing data -- key questions:
E.g. a scatter / correlation plot of the five subdomains of `global_z`. Should probably also visualise the relationship between e.g. `MoCA` and `global_z`.
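For example, a minimal sketch using GGally (the subdomain column names are assumptions):

```r
library(GGally)

# Pairwise scatter / correlation plot of the subdomain z-scores,
# plus MoCA and the composite global_z (column names are assumptions).
ggpairs(sessions,
        columns = c("attention_z", "executive_z", "visuospatial_z",
                    "memory_z", "language_z", "MoCA", "global_z"))
```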
Over the years I've come to appreciate Andrew Gelman's view on why the terms fixed and random effect aren't ideal: https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/
In https://github.com/nzbri/pd-apathy/blob/master/AnalysisPlan.md I'd suggest:

- Fixed effects (cross sectional) -> Subject-level predictors
- Fixed effects (within subject) -> Measurement-level predictors [not strictly within-subject; both within and between subjects]
- Fixed effects (interactions) -> Measurement-level predictors (interactions)
- Random effects -> Varying intercepts and slopes
- Subject-specific intercept (i.e. baseline propensity) -> Intercept allowed to vary by subject
- Subject-specific slope with time since diagnosis (i.e. rate of progression) -> Effect of time since diagnosis allowed to vary by subject
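For concreteness, a minimal sketch of how the varying intercepts and slopes map onto a brms formula (the variable names are assumptions):

```r
library(brms)

# Intercept and the effect of time since diagnosis are allowed to vary
# by subject; subject- and measurement-level predictors enter as usual.
fit <- brm(
  NPI_apathy_present ~ 1 + sex + education + years_since_diagnosis +
    (1 + years_since_diagnosis | subject_id),
  family = bernoulli(),
  data = sessions
)
```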
Happy to discuss this morning.
DataOverview.md. Potentially useful to keep either the `study` or `session_suffix` variables present in some form when exporting the raw data (e.g. for keeping track of potential protocol differences across studies).
3de5e1a added support for multiple imputation during data processing and model fitting. However, there is still no obvious way to use these multiply imputed data sets for model comparison via the LOOIC (see e.g. paul-buerkner/brms#997). Currently this outputs the warning:

> Using only the first imputed data set. Please interpret the results with caution until a more principled approach has been implemented.

The approach suggested in the above issue does not seem to work in this instance (i.e. there is no change to the output of `brms::loo` when using `newdata`, other than the warning disappearing). Need to look into whether e.g. `MAMI` could help in this instance.
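For reference, a minimal sketch of an imputation-aware fit and the `loo` call that triggers the warning (using `mice` as the imputation backend, and these variable names, are assumptions):

```r
library(brms)
library(mice)

# Fit the same model to each of the m imputed data sets and pool the draws.
imputed <- mice(sessions, m = 5, printFlag = FALSE)
fit <- brm_multiple(NPI_apathy_present ~ 1 + sex + age_at_diagnosis,
                    data = imputed, family = bernoulli())

# Currently this only uses the first imputed data set (hence the warning).
loo(fit)
```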
4aedc7c includes the output from a run-through of the initial modelling code in `Results/core-analyses_2020-12-21.Rout` (see line 1640 at that commit).
By some distance the largest effect is that of `sex`, with some weak session-level progression-like predictors too. Before looking at analyses of temporal predictability (#7), we thought it was probably worth checking everything looks sensible here! A few quick questions that spring to mind:

- `ethnicity`, given that the large majority (≈93%) of participants are pākehā.
- I presume LOOIC is doing leave-one-session-out, but I don't know whether leave-one-subject-out would be more appropriate in some instances (one possibility is sketched below)?

Thanks! 😄
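On the leave-one-subject-out point, one option would be grouped cross-validation in brms, e.g. (a sketch; `fit` and `subject_id` are assumed names):

```r
library(brms)

# Hold out all sessions from a subject together; with K equal to the
# number of subjects this is exact leave-one-subject-out (can be slow).
n_subjects <- length(unique(fit$data$subject_id))
loso <- kfold(fit, K = n_subjects, group = "subject_id")
```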
Currently, the proposal is to look at apathy status over time via a logistic regression (where 'over time' means that temporally varying measurement-level variables are included in the model, and the prediction is per session), with testing via LOOIC.
Two other approaches we could take are:

- Directly modelling the development of apathy: i.e. predicting whether a patient develops apathy within the next `∆t` years.
- A more flexible classification approach.

For the former, the simpler model is (arguably) a bit more intuitive: predicting what happens to a patient given their current status and age / cognitive score / etc. However, it also feels like the models are closely related: for a given time since diagnosis (`t`) our model would give `p(apathetic | t, ...)`, and we could get something like Kyla's metric via e.g.

`p(developing apathy | ∆t, ...) = (p(apathetic | t + ∆t, ...) - p(apathetic | t, ...)) / p(!apathetic | t, ...)`

(this is an oversimplification: it assumes, for the sake of argument, a positive `β` on `t`, and that other metrics get worse over time, so that remission is negligible). In other words, the 'development' model can be thought of as a (normalised) slice through the full model at a specific timepoint. In that context, we would then talk about e.g. the interactions between subject-level variables and years since diagnosis as being the 'risk factors' for developing apathy.

One subtlety is the interpretation of `p(apathetic | t + ∆t, ...)` in the presence of other measurement-level variables: it is really something more like `p(apathetic | t + ∆t, motor_scores(t + ∆t), ...)`. That then has a non-trivial dependence on changes in other metrics, in a way that means it's not such a pure predictor (we could presumably marginalise over future unseen motor scores etc., but that introduces more complexity). A sketch of the slice-through-the-model calculation is included below.

For the classification approach, we're basically trading off interpretability for flexibility (if we went for, say, a GP / kernel regression / kernel SVM / etc. approach). Are there any obvious disadvantages, and is it redundant to have a look at that approach? (This would be more of a potential side project, and wouldn't change the core analysis.)
Thanks!!
The screening sessions (see #8) have a very stripped-down assessment and don't include any apathy measures. These should be excluded by default.
No. of patients / sessions / diagnoses / etc.
Going over exclusion criteria and redefining baselines (see e.g. #10, #12) means I was thinking about the PDD exclusions again. Currently, we exclude at baseline, but given the screening sessions we no longer have a clear definition of a baseline, and actually don't need one yet for the proposed analysis (i.e. things are either as measured, or time since diagnosis). Do we want to revise the PDD exclusion to e.g.:
Key information required (to be added to the analysis plan):

`full_assessment`)?

We are in what is a somewhat tricky regime for LOOIC-based model selection: not enormous numbers of subjects (though by no means small either), and a binary outcome variable. This means the LOOIC can have a high variance:
We can see a related issue in the initial model outputs, where there are many small differences and we could essentially pick any model as long as it includes the effect of `sex`:
```
Model comparison:
                                                                         elpd_diff se_diff
NPI_apathy_present ~ 1 + sex + ethnicity + age_at_diagnosis                  0.0     0.0
NPI_apathy_present ~ 1 + sex + ethnicity + education + age_at_diagnosis     -0.3     1.3
NPI_apathy_present ~ 1 + sex + ethnicity                                    -0.6     1.8
NPI_apathy_present ~ 1 + sex + age_at_diagnosis                             -0.7     0.5
NPI_apathy_present ~ 1 + sex + ethnicity + education                        -0.9     2.2
NPI_apathy_present ~ 1 + sex + education + age_at_diagnosis                 -1.1     1.3
NPI_apathy_present ~ 1 + sex                                                -1.5     1.9
NPI_apathy_present ~ 1 + sex + education                                    -1.7     2.3
NPI_apathy_present ~ 1 + ethnicity + age_at_diagnosis                       -9.2     4.3
NPI_apathy_present ~ 1 + age_at_diagnosis                                   -9.6     4.3
NPI_apathy_present ~ 1 + ethnicity                                          -9.9     4.7
NPI_apathy_present ~ 1 + ethnicity + education + age_at_diagnosis           -9.9     4.4
NPI_apathy_present ~ 1 + education + age_at_diagnosis                      -10.2     4.4
NPI_apathy_present ~ 1 + ethnicity + education                             -10.4     4.8
NPI_apathy_present ~ 1                                                     -10.5     4.7
NPI_apathy_present ~ 1 + education                                         -11.1     4.9

Winning formula: ‘NPI_apathy_present ~ 1 + sex + ethnicity + age_at_diagnosis’
```
What Piironen & Vehtari recommend is:
See the following for example code:
As suggested by Leslie et al. at the NZBRI Seminar 2021-05-23. Will need to parse `other_medications` from `chchpd::import_medications()`.
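A minimal sketch of what that parsing might look like (the separator pattern, and that the field is comma/semicolon-delimited free text, are assumptions):

```r
library(dplyr)
library(stringr)
library(tidyr)

# Split the free-text other_medications field into one row per medication
# for downstream coding (separator pattern is an assumption).
meds <- chchpd::import_medications() %>%
  mutate(other_medications = str_to_lower(other_medications)) %>%
  separate_rows(other_medications, sep = "\\s*[,;]\\s*")
```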