
ManyBabies Analysis

Analysis plan for the ManyBabies 1 study. Contains initial simulations and subsequent analyses of pilot data, fixing analytic decisions for the confirmatory analyses.

Paper draft can be found here.

TODOs

  • Implement further analyses so that the pilot analysis has the structure of the real analysis
  • Design appropriate templates for labs to use for data entry
  • Discuss decisions about plotting and analysis conventions

The biggest issues I see right now are:

  • What's the DV? Is it looking time? Log looking time (generally better, but our measure is bounded at 20s)? IDS preference? If so, calculated how? (See the sketch after this list.)

  • What's the general model spec? I wrote a bit of text assuming mixed-effects models (rather than meta-analytic models). I would argue for this because it allows a greater degree of modeling of random effects (e.g., at the child and item levels, not just the lab level) and more modeling of trial-by-trial dynamics, but I am open to other options.
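To make these two open decisions concrete, here is a minimal R sketch, assuming a trial-level data frame `trials` with hypothetical columns `lab`, `subid`, `item`, `trial_type` (coded IDS/ADS), and `looking_time` in seconds; none of these names are fixed by the repo yet:

```r
library(dplyr)
library(tidyr)
library(lme4)

# One candidate DV: log looking time (note the 20s ceiling discussed above).
trials <- trials %>%
  mutate(log_lt = log(looking_time))

# One candidate IDS preference score: per-child difference in mean LT
# between IDS and ADS trials.
prefs <- trials %>%
  group_by(lab, subid, trial_type) %>%
  summarise(mean_lt = mean(looking_time, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = trial_type, values_from = mean_lt) %>%
  mutate(ids_pref = IDS - ADS)

# A mixed-effects spec with random effects at the lab, child, and item
# levels, rather than a lab-level meta-analytic model:
mod <- lmer(log_lt ~ trial_type +
              (trial_type | lab) + (1 | subid) + (1 | item),
            data = trials)
```

The `(trial_type | lab)` term lets the IDS effect vary across labs, which is roughly the quantity a lab-level meta-analytic model would estimate directly.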

Discussions pasted from emails

So it sounds like LT (looking time), rather than total trial length, is the consensus from the human coder side :)

In terms of providing data templates/information requested, I think we should put total trial length down as requested data anyway. 'Total LT' seems to be (fortunately) defined in a pretty standard way, but it's a summary measure that collapses a richer kind of data: looks on/off every 0.1 second (or whatever) -> sum(looks on until a look off for n seconds).
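As a concrete version of that collapse, here is a rough base-R sketch that sums looks from a hypothetical 0.1-second on/off timecourse until the first lookaway of at least n seconds; the function name and arguments are illustrative, not part of the repo:

```r
# Collapse a timecourse (one logical `looking` value per 0.1s sample)
# into total LT: sum time looking until the first lookaway >= n seconds.
total_lt <- function(looking, sample_dur = 0.1, n = 2) {
  away_run <- 0
  lt <- 0
  for (l in looking) {
    if (l) {
      lt <- lt + sample_dur
      away_run <- 0
    } else {
      away_run <- away_run + sample_dur
      if (away_run >= n) break  # trial ends at the lookaway criterion
    }
  }
  lt
}
```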

I guess it matters whether we want to just document that distributional differences in LT measures occur across methods, or give a finer-grained account of why these differences happen (e.g., because eyetracking trials are generally longer with shorter looks, while headturn measures shorter, focused attention, or whatever).

Hugh, I think your point about 2-second LTs is going to be a problem in either case: a trial could be 4s long but have only 1s of LT total (1s inattention, 1s look, 2s lookaway), 4s long with 2s LT, 2s long and 2s of LT (Right? from track loss), 2.1s long and .1s of LT ('real' data with a very short look), etc.

Mike - I'm having trouble running the code right now (I just need to learn to use the Rmd format; I'll figure it out), so I can't see the grouped dataframe: does the track-loss condition just mean scenarios where the tracker never picks up a look at all?

What's the standard for human coding if you never pick up an initial look?

I think this could work across methods: all trials should be a minimum of 2s, because that's the lookaway criterion (right?). Drop trials that terminate before 2s (for whatever reason). Drop trials where you never got an initial look (human coded, or extracted from eyetracking data); otherwise include. To do this, we'd need everyone reporting at least LT and total trial length, but nothing more fine-grained (I would love to give people the option to give us timecourse data if they have it).
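A sketch of those inclusion rules as a dplyr filter, assuming hypothetical per-trial columns `trial_len` (total trial length in seconds) and a logical `got_initial_look`:

```r
library(dplyr)

# Keep only trials meeting the proposed criteria; the column names
# are assumptions, not the agreed data template.
included <- trials %>%
  filter(trial_len >= 2,     # drop trials that terminate before 2s
         got_initial_look)   # drop trials with no initial look
```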

  1. Why does Mike's setup produce so many 2s responses?

I assume that this is because of track-loss trials? But I like your suggestion to only include trials with LT > 2s, assuming that is the lookaway criterion for all setups (though, thinking out loud, how would this work with, e.g., single screen, for which there will be some trials on which the child looks away instantly, but where the LT > 2s because of the reaction time of the coder?).

I was referring to just the eyetracking data, actually, since I have a mental model that says that headturn, single-screen, and eyetracker numbers of seconds of LT are not directly comparable anyway. If nothing else, we have timecourse data for the eyetracker, whereas it looks like for the other two datasets we are just getting the total summed time (right?). However, actually adjusting LTs by 2 seconds (I was proposing subtracting 2s in addition to dropping shorter trials) might stretch this beyond reasonableness; or maybe I should be more sanguine about the number of seconds converging across methods. Mike?

  2. We shouldn't do a between-subjects test.

Yes, I don't see a compelling motivation for this either.

You mean between conditions? Anyway, if we proceed with #3 below, it would seem like we should include it iff it's a common analysis.

  3. Is the goal of the analyses to (1) present what we believe to be the most appropriate analysis, (2) reflect current practices, (3) compare the two, or (4) something else?

I would go with (1), but am happy to be corrected. It would also be helpful to do (3).

  4. Why not mixed-effects structures (esp. for Subjects and Items) in the other models, too? You're right that I should have included random Subjects effects in these models. They weren't in the original planned analysis, presumably because we were thinking of using difference scores, which would be calculated over all trials and so would remove all that structure from the data. (ItemIdentity is not in the datafiles right now.)

  5. Can't we measure item effects on preference scores too, since the items are paired across conditions? Yes, we could do this, but wouldn't it have the (bad) effect that if we have an NA for one of the trials, then we have to exclude the data from its paired trial as well? That seems wasteful. (See the sketch after this exchange.)

Not sure if we're talking about the same thing: If we are doing by-(mega)trial differences in LT for each item, we'd have to drop pairs where one of the LTs can't be calculated, right?
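For concreteness, a sketch of the paired-trials issue, assuming a hypothetical `item_pair` column (linking the IDS and ADS versions of an item) plus the same `trial_type`/`looking_time` names as above; the `drop_na()` line is where a single missing LT costs us the whole pair:

```r
library(dplyr)
library(tidyr)

item_prefs <- trials %>%
  select(subid, item_pair, trial_type, looking_time) %>%
  # one column per trial type within each item pair
  pivot_wider(names_from = trial_type, values_from = looking_time) %>%
  drop_na(IDS, ADS) %>%   # an NA on either trial drops the whole pair
  mutate(pref = IDS - ADS)
```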
