neuroscout / neuroscout

NeuroScout web app and API

Home Page: https://neuroscout.org

License: BSD 3-Clause "New" or "Revised" License

Python 29.87% HTML 19.54% Shell 0.19% CSS 0.93% TypeScript 35.01% Dockerfile 0.29% Mako 0.06% JavaScript 14.11%
brain-imaging fmri open-source naturalistic statmaps

neuroscout's Introduction

neuroscout


This is the repository for the neuroscout server.

Requirements: Docker and docker-compose.

Configuration

First, set up the main environment variables in .env (see: .env.example). Set DATASET_DIR, KEY_DIR, and FILE_DATA to folders on the host machine.
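For example, a minimal .env might look like this (the paths below are illustrative; see .env.example for the full list of variables):

DATASET_DIR=/home/user/neuroscout/datasets
KEY_DIR=/home/user/neuroscout/keys
FILE_DATA=/home/user/neuroscout/file-data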

Optionally, set up pliers API keys for feature extraction in .pliersenv (see: .pliersenv.example). More information on pliers API keys can be found in the pliers documentation.
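As a sketch, a .pliersenv might contain entries like the following (the exact variable names depend on which extractors you use; GOOGLE_APPLICATION_CREDENTIALS is the standard variable for Google Cloud clients, and other key names should be taken from the pliers documentation):

GOOGLE_APPLICATION_CREDENTIALS=/keys/google-credentials.json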

Next, set up the Flask server's environment variables by modifying neuroscout/config/example_app.py and saving as neuroscout/config/app.py.

Finally, set up the frontend's env variables by modifying neuroscout/frontend/src/config.ts.example and saving as neuroscout/frontend/src/config.ts.

For single sign-on using Google, a sign-in project is needed.

Initializing backend

Build the containers and start services using the development configuration:

docker-compose -f docker-compose.yml -f docker-compose.dev.yml build
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d

The server should now be running at http://localhost/

Next, initialize, migrate, and upgrade the database. If you have a database dump, load it using pg_restore. Otherwise, delete the migrations folder, initialize the database, and add a test user.

docker-compose exec neuroscout bash
rm -rf /migrations/migrations
python manage.py db init
python manage.py db migrate
python manage.py db upgrade
python manage.py add_user useremail password

Staging & production server

For the staging server, you can trigger a manual build as follows:

docker-compose -f docker-compose.yml -f docker-compose.build.yml build
docker-compose -f docker-compose.yml -f docker-compose.build.yml up -d

For the staging or production server, you can instead pull a pre-built image from GHCR. First, set the variable IMAGE_TAG to the appropriate image tag.
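For example (the tag value here is illustrative; use whichever tag has been published to GHCR), either set it in .env or export it in your shell:

export IMAGE_TAG=latest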

docker-compose -f docker-compose.yml -f docker-compose.image.yml build
docker-compose -f docker-compose.yml -f docker-compose.image.yml up -d

Setting up front end

The frontend dependencies are managed using yarn.

Enter the neuroscout container, and install all the necessary libraries like so:

docker-compose exec neuroscout bash
cd frontend
yarn

You can then start a development server:

yarn start

Or make a production build:

yarn build

Ingesting datasets and extracting features

You can use manage.py commands to ingest data into the database. Run the following commands inside the container: docker-compose exec neuroscout bash

To add BIDS datasets:

python manage.py add_task bids_directory_path task_name

For example, for dataset ds009:

python manage.py add_task /datasets/ds009 emotionalregulation

Finally, once a dataset has been added to the database, you can extract features using pliers and insert them into the database as follows:

python manage.py extract_features bids_directory_path task_name graph_json

For example:

python manage.py extract_features /datasets/ds009 emotionalregulation graph.json

Even easier is to use a preconfigured dataset config file, such as:

docker-compose exec neuroscout python manage.py ingest_from_json /neuroscout/config/ds009.json

Maintaining docker image and db

If you make a change to /neuroscout, you should be able to simply restart the server.

docker-compose restart neuroscout

If you need to upgrade the db after changing any models:

docker-compose exec neuroscout python manage.py db migrate
docker-compose exec neuroscout python manage.py db upgrade

To inspect the database using psql:

docker-compose run postgres psql -U postgres -h postgres
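Once at the psql prompt, you can, for example, list all tables:

\dt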

API

Once the server is up and running, you can access the API however you'd like.

The API is documented using Swagger UI at:

http://localhost/swagger-ui

Authorization

To authorize API requests, we use JSON Web Tokens via Flask-JWT. Simply POST the following to localhost:5000/auth:

{
    "username": "[email protected]",
    "password": "string"
}
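For example, with curl (a sketch; adjust the host and port to your setup):

curl -X POST http://localhost:5000/auth -H "Content-Type: application/json" -d '{"username": "[email protected]", "password": "string"}'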

You will receive an authorization token in return, such as:

{
    "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZGVudGl0eSI6MSwiaWF0IjoxNDQ0OTE3NjQwLCJuYmYiOjE0NDQ5MTc2NDAsImV4cCI6MTQ0NDkxNzk0MH0.KPmI6WSjRjlpzecPvs3q_T3cJQvAgJvaQAPtk1abC_E"
}

You can then insert this token into the header to authorize API requests:

GET /protected HTTP/1.1
Authorization: JWT eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZGVudGl0eSI6MSwiaWF0IjoxNDQ0OTE3NjQwLCJuYmYiOjE0NDQ5MTc2NDAsImV4cCI6MTQ0NDkxNzk0MH0.KPmI6WSjRjlpzecPvs3q_T3cJQvAgJvaQAPtk1abC_E
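The same request with curl might look like this (a sketch; substitute your own token, and /protected stands in for any protected route):

curl http://localhost/protected -H "Authorization: JWT <access_token>"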

Note that in order to use any protected routes, you must confirm the email on your account. Confusingly, you can get a valid token without confirming your account, but protected routes will not function until confirmation.

Running backend tests

To run tests, after starting services, create a test database:

docker-compose exec postgres psql -h postgres -U postgres -c "create database scout_test"

and execute:

docker-compose run -e "APP_SETTINGS=neuroscout.config.app.DockerTestConfig" --rm -w /neuroscout neuroscout python -m pytest neuroscout/tests

or run them interactively: first enter the container with docker-compose exec neuroscout bash, then run APP_SETTINGS=neuroscout.config.app.DockerTestConfig python -m pytest neuroscout/tests/ --pdb

To run frontend tests, run:

docker-compose run --rm -w /neuroscout/neuroscout/frontend neuroscout npm test

Running frontend tests

To run frontend tests, you need Cypress 6.0 or greater installed locally. First, ensure neuroscout is running:

docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d

Next, set up the test environment:

docker-compose exec neuroscout bash
export APP_SETTINGS=neuroscout.config.app.DockerTestConfig
bash setup_frontend_tests.sh

In a separate window, you can run cypress:

cd neuroscout/frontend
cypress open

Once done, kill the first command and run the following to tear down the test database:

docker-compose exec -e APP_SETTINGS=neuroscout.config.app.DockerTestConfig neuroscout python manage.py teardown_test_db

neuroscout's People

Contributors

adelavega, bpinsard, dependabot[bot], jdkent, jsmentch, rbroc, rwblair, skept


neuroscout's Issues

Refresh cron job

@tyarkoni : "Yes, refreshing should be handled separate. We probably don't want to do the standard memoization thing of refreshing the resource in real time as needed (i.e., every time a request comes in) if the existing one has expired, because in our case these extractions can take a long time. Probably best to have a nightly cronjob run that inspects everything in the DB and regenerates it after a month or something."

Add new status field to analysis model

As we discussed in the specs doc, we'll need a new status field with possible values of DRAFT, PENDING and GENERATED. I think the existing boolean locked field will technically be redundant since it can be derived as locked = status != 'DRAFT'.

Add 'predictions' field to analysis on backend

Description from the specs:

Predictions [textarea]: a free-form text field where the user can enter their predictions about what they expect the results to look like ahead of time.

Frontend code already has it. Just need to add it to the backend and update the related API routes.

Configuration file

Currently, deploying the server requires several commands. It would be nice to have a JSON configuration file that would handle all the setup (with varying thoroughness, e.g. a test vs. a full deployment config), such as downloading datasets via datalad or finding them on the local file system.

Analysis status

In order to allow for incomplete but persistent analyses, very few fields should actually be required to make a POST request to /api/analyses. There should also be a status field (should we have status codes as integers or strings?) which indicates if an analysis is:

  • In progress
  • Finished and valid, but locked since it has been run
  • Finished and invalid (and thus probably editable again)

Changing the status should be done by a separate AJAX call, not by editing the status field; thus this field should be read-only.

Such as:
/api/analyses/1/validate ?

Once an analysis is validated, a nipype workflow is generated and available at:
/api/analyses/1/workflow

and a graph of the workflow at:
/api/analyses/1/graph

Analysis specification resource - POST

In order to specify an analysis, a REST resource should allow users to POST an analysis JSON that specifies which Predictors, ExtractedFeatures, and transformations are in the models.

Some server side validation would be useful here to ensure analyses are valid (although incomplete designs should also be allowed).

Once an analysis is posted, the complete analysis (with analysis ID) should be returned.

Subscribe to Travis

I think I originally set up Travis as a trial, since it's a private repo. It just told me I only have 10 builds left.

Ingest extractors

Using JSON, define extractor names (a curated set of extractors), along with a description and other information. Use Marshmallow for ingestion.

Hash stimuli

In order to prevent representing stimuli multiple times— which may occur at different time points in different runs— use MD5 hashing to generate a unique id of each stimulus and only store it once.
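As a sketch of the idea (the file paths are hypothetical), hashing two copies of the same stimulus yields the same value, so the stimulus only needs to be stored once:

md5sum /datasets/ds009/stimuli/face01.jpg /datasets/ds009/stimuli/face01_copy.jpg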

Implement Analysis Viewer

This will be a page to view everything about the analysis and its results without being able to edit anything. Locked analyses can only be opened in Analysis Viewer. Unlocked analyses can be opened in either the "Viewer" or "Builder". The Viewer could also be embedded in the "Review" tab of the Builder.

@tyarkoni @adelavega Am I thinking about this correctly? The viewer will need to be spec'ed out before implementing.

Switch to py.test for testing

py.test has some nice features we may want to use, and it also simplifies tests as compared to unittest. It seems to be the de facto standard now (nose is no longer being maintained), so we probably want to use it for all our tests.

Also, let's put the test modules in their own folder (web/tests), as there will be a whole bunch of them pretty soon.

PredictorRun cache table

In order to cache run-level descriptive stats, we need a PredictorRun join table with a single entry per predictor/run pair (hence the uniqueness requirement on the PKs), whose columns are various common descriptive stats that will be displayed to users.

Fields such as:
min
max
mean

How should these be served to clients though? One possibility is as a nested field in each predictor, which would vary based on the runs provided.

For example

/api/predictors/?run_id=1,2,3 would return as a field, a list of entries in PredictorRun?

Alternatively, we could have a different route for this such as:
/api/predictors/1/summary?run_id=1,2,3

Thoughts? @skept @tyarkoni

Enable Flask server to serve SPA

Currently we're using development servers such as yarn start and flask.

However, for production, a single server should serve both flask and the static files generated by yarn.

We need to think about how to combine these; perhaps yarn should build the SPA on docker build? Or possibly this could be incorporated into Flask, but I doubt it.

Import dataset from DataLad/git-annex

It would be cool if, instead of needing to have a specific dataset on hand, you could just specify a datalad dataset to import, and it would download just what is required and import it.

If we need the actual events.tsv files, then we need the actual dataset, rather than just a JSON description of it. But thanks to git-annex, we can pull just the files we need (i.e. no imaging data).

CLI Inputs/outputs

Opening the issue to discuss the CLI design. Below are some of my thoughts on possible designs.

The neuroscout command line interface will download a pre-compiled Python file containing a complete nipype workflow, which is executed using python-datalad to download all the necessary files. Files in the script will be referenced with relative paths, as in src/{dataset}/{file}.

The command line interface will also need to retrieve events.tsv or JSON bundle of the extracted features selected for an analysis in order to execute the workflow.

Alternatively, the CLI could obtain a "bundle" that includes the JSON representations necessary to compile the workflow. These would reference the input file paths, the transformations, and the selected predictors. The CLI would then compile and execute the workflow. An advantage of this approach is that we can write a single generic workflow that takes these inputs, for use on both the client and server, rather than having to create a static workflow on the server for download. I've already created such a workflow (see: scripts/fmri_bids_firstlevel.py) and it would simply have to be expanded further (and run with python-datalad).

However, this approach is potentially problematic as the client and server could have different versions of the workflow, and users could edit the inputs in between.

GET analysis interface

Once an analysis is designed, there should be a resource that can return the full analysis spec, including all run predictor data. This may be rather large and is intended for the nipype interface to read.

Use Marshmallow hierarchical schemas whenever possible to make this an easy read out of db.

Importing BIDS datasets with stimuli

Scripts need to be written to import stimulus information from BIDS datasets into the neuroscout db.

Version 1.0.1 of BIDS adds the ability to store stimulus files under 'stimuli/'. These files are referenced in each participant's events.tsv file in the stim_file column.

Modality of the stimulus should be inferred from the file extension.

To avoid representing the same stimulus multiple times in the database, an MD5 hash will be stored. Since the timing of stimuli could vary across runs/subjects, stimuli do not have onset information.

Analysis bundle endpoint

Although I've left predictor-events as repeated JSON objects, for a final download from the CLI a bundle which includes all dependencies makes sense.

PUT analysis route

To edit in-progress analyses, we need to add a PUT route at:

/api/analyses/{analysis_id}. Unlike POST, this will require the hash_id in the schema.
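A sketch of such a request with curl (the hash_id value and the name field are hypothetical; substitute your own token):

curl -X PUT http://localhost/api/analyses/AbC123 -H "Authorization: JWT <access_token>" -H "Content-Type: application/json" -d '{"hash_id": "AbC123", "name": "My analysis"}'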

Clone analysis

A route such as:
/api/analysis/1/clone should clone the current analysis (even if it is in a "locked" status) and generate a new analysis id.

To be a restful call, it should return the full representation of the new object, including the new id fields.
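For example, with curl (a sketch; the HTTP method is an assumption, since the issue does not specify one):

curl -X POST http://localhost/api/analysis/1/clone -H "Authorization: JWT <access_token>"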

DataLad super dataset

It would help management of data resources to create a datalad/git super dataset with submodules that are forks of datasets, with the appropriate additions to link to proprietary stimulus files. This would be a centralized place to manage all of the resources, without any patchy/hacky downloading/insertion of data files.

This would work in conjunction with a configuration file (#48), which allows one to select a subset of this dataset for ingestion into the database and describes which features to extract.

Nipype workflow for BIDS analysis

Write a nipype workflow for analyzing a BIDS dataset with basic settings. So far SpecifyEvents seems to be working with events.tsv and transformations.json. Add standard fMRI stuff to this workflow and run a full analysis.

Edit (PUT), compile and expose (GET) analysis nipype workflows.

Given an existing analysis (created using POST request), add ability to edit this analysis using a PUT request.

Analysis will have a locked field, which indicates whether it is still editable (and thus allows PUT requests to it), as well as a locked_at timestamp field.

Once a user is done editing their analysis, a POST request to:
/api/analyses/1/compile will compile the interface, and lock the analysis.

Once an analysis is validated, a nipype workflow is generated and available at:
/api/analyses/1/workflow

and a graph of the workflow at:
/api/analyses/1/graph

These routes (especially workflow) should be restricted to logged-in users, so that they have agreed to the user license.

Also, add counts of workflow retrievals/downloads to a table, timestamping every time someone retrieves a workflow.

Stable database fixtures for tests

Right now, all test fixtures reset after each test, because the session rolls back after tests. This is very useful for most tests, as the database is in a clean state for every test.

However, this is time intensive for certain operations like adding a full BIDS dataset, which happens across many different tests. A session object that does not roll back and works with session level scoping is needed for this.

So far this doesn't seem to be a total deal breaker, so I'm going to leave it as is, but it's worth keeping an eye on.

Neuroscout nipype interface

Interface that communicates with backend to get cached extracted features from db.

Input: JSON spec of pliers features, dataset and task.

Output: Same as pliers interface.

Issues:

  • Should other features be returned? Original features are stored in the database; should events.tsv be reconstructed, or should we simply append extracted features like the pliers interface does? If the latter, we could get only the extracted features from the db and then append them (like the pliers interface) to the events.tsv that the workflow is expected to have.
  • Need to think about when cache is "too old", but this should be the job of another process.

Add models for BIDS ingestion

To populate the database with BIDS Datasets, we need to add a few models:

Run. Basic unit of fMRI analysis, which can encompass different sessions and tasks, without explicitly modeling sessions and tasks.

Variable. This model defines each variable.
Some variables are not extracted: anything in the events file, plus nuisance regressors.
ExtractedVariable is a subclass of Variable:

  • Only ExtractedVariable has a foreign key to an extractor
  • ExtractedVariables are created by an extractor applied to a stimulus
  • Onsets are relative to the Stimulus? Or is this in the predictor?

And modify a few:
Predictor. Specific instance of a variable in an analysis. Join table of variable and analysis.
Do we still need a join table of predictor and analysis given that it is specific for each?

Analysis:

  • transformations as a JSON type
  • contrasts as a JSON type

Add additional fields to /api/user response

Currently the response type is

export interface ApiUser {
  email: string;
  name: string;
  analyses: { hash_id: string, name: string }[];
}

Would like it to be:

export interface ApiUser {
  email: string;
  name: string;
  analyses: { hash_id: string, name: string, 
              status: string, description: string, modified_at: string }[];
}

We need this information for the main analysis management page (i.e. homepage). This will still be a very small payload but will save lots of additional roundtrips to the server and simplify the frontend code. See #62 about the new status field.
