brain-score / vision
A framework for evaluating models on their alignment to brain and behavioral measurements (50+ benchmarks)
Home Page: http://brain-score.org
License: MIT License
Using neural fit to go from hvm to itself (i.e. `neural_fit(hvm, hvm)`) yields a comparably low score of around .78 (see this unit test). Did others experience something similar (@qbilius)? Is that an issue with the regression?
The RDMs seem to be different after the recent assembly re-formatting, suggesting that there's an error in the data. Make sure that the data is aligned properly.
DataAssembly.reset_index() does not work, because internally DataArray._replace() constructs a new object, and the DataAssembly constructor recreates the MultiIndexes.
A `.ini` or `.yml` (or whatever) file, and infrastructure for making config settings available throughout the project. Make a `config.example.yml` and add `config.yml` to `.gitignore` so users can add credentials without accidentally committing them.
Things to include:
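For instance, a minimal sketch of what `config.example.yml` could look like — the concrete keys below are assumptions, with AWS credentials being the most likely entry:

```yaml
# config.example.yml -- copy to config.yml (gitignored) and fill in real values
aws:
  access_key_id: YOUR_ACCESS_KEY_ID
  secret_access_key: YOUR_SECRET_ACCESS_KEY
cache:
  directory: ~/.brainscore  # hypothetical: where to cache downloaded assemblies
```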
PEP 508 allows the specification of `install_requires` requirements like so:
install_requires=[
    "result_caching @ git+https://github.com/mschrimpf/result_caching",
]
This functionality has been supported since pip 18.1 and removes the need for `--process-dependency-links` (which will be removed in pip 19). We need to update at least our setup.py accordingly.
There is no convenient method to obtain the names of the MultiIndex levels for a given dimension. Note that it's not simple: a dimension may have multiple MultiIndexes, and (I think) a given MultiIndex may apply to more than one dimension.
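A possible helper (a sketch; the function name is hypothetical and it assumes the assembly exposes xarray's `.indexes`):

```python
import xarray as xr

def index_level_names(assembly: xr.DataArray, dim: str) -> list:
    """Hypothetical helper: names of the (Multi)Index levels attached to `dim`."""
    index = assembly.indexes.get(dim)  # a pandas Index or MultiIndex, if any
    if index is None:
        return []
    return [name for name in index.names if name is not None]
```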
It seems like there is an issue in the interplay between PCA and the neural fit. Specifically, `test_metrics.test_neural_fit_metric_pca100` scores very low (<0.15), and I observed something similar when PCA'ing down to just 167 (= len(neuroids) - 1).
(Submitting as requested)
If I use the most recent xarray version, it gives an error that 'Score' does not have an attribute 'indexes'. Using an older version of xarray avoids the error.
Table doesn't fit on screen, yet there is no horizontal scrollbar. Tested on Firefox.
Possible solution (wild guess):

body {
    overflow: auto;
}
- `test_mkgu` has two unused declarations of `test_load`
- `type` shadows a Python builtin; consider replacing it with `kind`
- `Benchmark.calculate`, `Metric.apply` etc. might be better served by `__call__`
- `method_fetcher_types` might be better declared at the top and in capitals (`FETCHER_TYPES`)
- `return 0` (e.g., in `metrics.py`) is not a thing in Python
- `lookup.db` could be a simple csv file? Since it is unlikely to grow too big, there would be no performance penalty, but there would be an advantage of being able to quickly see dataset names and available assets.
- right now `coord, dims, value` is being yielded
- on the same note, make sure that we compute the correlation over `neuroid`s and not `id`s.
We should automatically run our examples with Travis to make sure they still work. At the moment, they keep getting outdated because there is no automatic test in place to ensure they run. There is an example of such a setup here: https://github.com/ghego/travis_anaconda_jupyter
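A minimal `.travis.yml` sketch for executing the example notebooks on every build — the Python version and the `examples/` path are assumptions:

```yaml
# .travis.yml (sketch): fail the build if any example notebook errors out
language: python
python:
  - "3.6"
install:
  - pip install jupyter nbconvert
script:
  # --execute re-runs every cell; any raised exception fails the command
  - jupyter nbconvert --to notebook --execute examples/*.ipynb
```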
Some of the Jupyter notebooks illustrating the use of brainio code are in this repo; they should be moved to the brainio or brainio-contrib repo.
Given the activations of a model, show how they can be fed into a Benchmark to retrieve a score for that model (see the sketch below).
Ideas for improving:
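A self-contained toy sketch of the flow such an example should demonstrate — the `Benchmark` class below and the scoring logic are illustrative assumptions, not the actual brainscore API:

```python
import numpy as np

# toy stand-in for a Benchmark: holds recorded responses and scores
# model activations against them (illustrative, not the brainscore API)
class Benchmark:
    def __init__(self, target_assembly):
        self.target = target_assembly  # presentations x neuroids recordings

    def __call__(self, model_activations):
        # score: mean per-neuroid correlation between model and recordings
        correlations = [np.corrcoef(model_activations[:, i], self.target[:, i])[0, 1]
                        for i in range(self.target.shape[1])]
        return float(np.mean(correlations))

rng = np.random.RandomState(0)
recordings = rng.rand(64, 10)   # stand-in neural data
activations = rng.rand(64, 10)  # stand-in model activations for the same stimuli
benchmark = Benchmark(recordings)
print(benchmark(activations))   # a single scalar score
```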
(transferring @qbilius' issue from https://github.com/mschrimpf/brain-score/issues/5)

df = brainscore.get_stimulus_set('dicarlo.hvm')
df = brainscore.stimuli.StimulusSet(df[df.variation == 6])

`df` is now a pandas DataFrame, not a StimulusSet. The expected output is that `df` remains a StimulusSet.
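The standard pandas idiom for keeping a DataFrame subclass across slicing is to override `_constructor`; whether this is the right fix for `StimulusSet` here is an assumption:

```python
import pandas as pd

class StimulusSet(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls _constructor to build the result of slicing/filtering,
        # so returning the subclass keeps df[df.variation == 6] a StimulusSet
        return StimulusSet
```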
Profile can then be used with session = boto3.Session(profile_name='mturk')
There is no convenient method to save a DataAssembly.
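One possible workaround (a sketch, assuming the assembly is an `xarray.DataArray` subclass): drop down to a plain DataArray, flatten the MultiIndexes (netCDF cannot serialize them directly), and write to disk:

```python
import xarray as xr

def save_assembly(assembly, path):
    # hypothetical helper: DataAssembly.reset_index() is broken (see above),
    # so convert to a plain DataArray before resetting the MultiIndexes
    da = xr.DataArray(assembly)
    da = da.reset_index(list(da.indexes.keys()))
    da.to_netcdf(path)
```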
Just saw this and was wondering about it: in https://github.com/dicarlolab/mkgu/blob/d054d65fc2410587c91fac066c21f82f1d32b0df/mkgu/stimuli.py#L10, should this not be moved to the `__init__` constructor?
For models such as CORnet-R with many zeros in their activations, PLS regression fails with a `numpy.linalg.LinAlgError: SVD did not converge`. The error originates from NaNs in the regression weights, which in turn stem from https://github.com/scikit-learn/scikit-learn/blob/a7a834bdb7a51ec260ff005715d50ab6ed01a16b/sklearn/cross_decomposition/pls_.py#L67 where `x_score = 0` and thus `y_weights = ... / 0 = NaN`.
Solution approaches
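One candidate mitigation (an assumption on my part, not a fix confirmed by the issue): drop zero-variance neuroids before fitting, so that no constant column can drive `x_score` to 0:

```python
import numpy as np

def drop_constant_columns(activations: np.ndarray) -> np.ndarray:
    # constant (e.g. all-zero) columns contribute nothing to the regression
    # and can produce NaN weights inside PLS; remove them up front
    variance = activations.var(axis=0)
    return activations[:, variance > 0]
```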
Following up on #1, it would be great if we could get some documentation. Ideally we'd create this automatically through readthedocs.io.
mkgu gives a NoCredentialsError if there is no AWS credentials file, even if the resource is public.
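boto3 supports anonymous access to public buckets; a sketch of the standard pattern (where exactly to hook this into mkgu's fetching, and the bucket/key names, are assumptions):

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# unsigned requests skip the credential lookup entirely
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
s3.download_file('some-public-bucket', 'some/key', 'local_file')  # illustrative names
```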
We should have the whole ventral stream in mkgu sooner rather than later, even if chunked. Right now we have V4-IT, and we can get V1 and V2 data from Jack Gallant's lab.
V1: http://crcns.org/data-sets/vc/pvc-4/about (paper)
V2: http://crcns.org/data-sets/vc/v2-1/about-v2-1 (paper)
(also nice to see the approach of a different data viewer)
To correct this, add the following field to the `stimulus_set_degrees` dictionary in `brainscore/__init__.py`:

'tolias.Cadena2017': 2
It seems like users still have to configure AWS credentials even if they only access public resources.
Can we get rid of that requirement @jjpr-mit? If yes, how long will it take?
It's also okay if we put a note in the README detailing how to configure AWS.
Installing mkgu with `python setup.py install` doesn't copy the `lookup.db` file to the `site-packages` directory. As a result, `SQLiteLookup` does not find the table names.
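A sketch of the usual setuptools fix — declaring the database as package data (the package layout is an assumption):

```python
from setuptools import setup, find_packages

setup(
    name="mkgu",
    packages=find_packages(),
    # ship lookup.db inside the installed package so SQLiteLookup can find it
    package_data={"mkgu": ["lookup.db"]},
)
```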
- presentation: presentation_id, image_id (currently it's called `id`; double-check it's not `_id`)
- neuroid: neuroid_id
- rename vars from 'V0' -> 0
- filenames -> strip the full path
Getting the following errors when creating a conda environment with `conda env create -f environment.yml`:

ResolvePackageNotFound:
  - netcdf4==1.2.4=np113py36_1

After removing that package:

UnsatisfiableError: The following specifications were found to be in conflict:
  - libnetcdf==4.4.1=1 -> jpeg=9
  - qt==5.6.2=2
Sometimes (!), mkgu raises an sqlite3 disk I/O error. This does not occur all the time; it seems to mostly happen when running jobs in batches. Maybe concurrent accesses to sqlite do not work?
Traceback (most recent call last):
  File "neural_metrics/compare.py", line 39, in main
    hvm = mkgu.get_assembly(name="HvM")
  File "/om/user/msch/miniconda3/envs/neural-metrics/lib/python3.6/site-packages/mkgu/__init__.py", line 11, in get_assembly
    return fetch.get_assembly(name)
  File "/om/user/msch/miniconda3/envs/neural-metrics/lib/python3.6/site-packages/mkgu/fetch.py", line 247, in get_assembly
    assy_record = get_lookup().lookup_assembly(name)
  File "/om/user/msch/miniconda3/envs/neural-metrics/lib/python3.6/site-packages/mkgu/fetch.py", line 129, in lookup_assembly
    cursor.execute(self.sql_lookup_assy, (name,))
sqlite3.OperationalError: disk I/O error
Make sure the following works:

import brainscore
brainscore.metrics

contents:

Also add a convenience method for getting coord names for a given dimension.
A Similarity takes as input `assembly1`, `assembly2` and outputs a `Score` object.
As of now, there are two kinds of similarities:
- `RDMSimilarity`: compute the similarity of two assemblies directly
- `NeuralFit`: first fit on a training set, then predict on a test set and compute the similarity based on the predictions

We also have additional utility on top of the simple case:
(1) computing the similarity per region (grouping over `region`)
(2) cross-validation, with `object_name` as part of `presentation`

There are several ways to organize the code around this:
1. An `OuterCrossValidationSimilarity` that all `Similarity` classes need to inherit from. This parent class implements (1) and (2) from above, and sub-classes only need to implement the simple case. Drawback: sub-classes cannot simply implement `apply` but need to adjust the method name.
2. Each `Similarity` class implements exactly one operation in `apply`. A chain operator then takes all these classes, applies them one after another and outputs only the final result. The result here is a list of assemblies which are then fed into a `Score` in `Similarity.__call__`. However, I don't know how to represent parametric and non-parametric similarities with this approach (one has to `fit`, `predict`, `compare_prediction`; the other just has to `compare`).
3. Split `Similarity`: all the specialized handling ((1) and (2) from the utilities) goes into `Similarity` sub-classes, and the operation on simplified assemblies goes into a `Computor` class. Hard to separate the two though; for instance, `Similarity` still needs to call `fit`, `predict` etc.

For now, approach (1) works, but after NIPS I would like to revisit the structuring here (a sketch of approach (1) follows below).
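A self-contained sketch of approach (1): `OuterCrossValidationSimilarity` and the parent/sub-class split follow the description above, while the `Score` stand-in, the split logic, and the `apply_split` name are illustrative assumptions:

```python
import numpy as np

class Score:
    """Illustrative stand-in: aggregates per-split similarity values."""
    def __init__(self, values):
        self.values = list(values)
        self.center = float(np.mean(self.values))

class OuterCrossValidationSimilarity:
    """Approach (1): the parent implements the utilities (cross-validation here);
    sub-classes only implement the simple case."""
    def __init__(self, splits=10, train_fraction=0.9, seed=0):
        self.num_splits = splits
        self.train_fraction = train_fraction
        self.rng = np.random.RandomState(seed)

    def __call__(self, assembly1, assembly2):
        num_presentations = assembly1.shape[0]
        values = []
        for _ in range(self.num_splits):
            order = self.rng.permutation(num_presentations)
            cut = int(self.train_fraction * num_presentations)
            train, test = order[:cut], order[cut:]
            values.append(self.apply_split(assembly1[train], assembly2[train],
                                           assembly1[test], assembly2[test]))
        return Score(values)

    def apply_split(self, train1, train2, test1, test2):
        raise NotImplementedError()  # sub-classes implement the simple case

class RDMSimilarity(OuterCrossValidationSimilarity):
    """Toy non-parametric case: compare the test assemblies directly."""
    def apply_split(self, train1, train2, test1, test2):
        rdm1 = 1 - np.corrcoef(test1)  # presentation x presentation dissimilarities
        rdm2 = 1 - np.corrcoef(test2)
        triu = np.triu_indices_from(rdm1, k=1)
        return float(np.corrcoef(rdm1[triu], rdm2[triu])[0, 1])

# usage on random stand-in assemblies (presentations x neuroids)
rng = np.random.RandomState(1)
a1, a2 = rng.rand(50, 20), rng.rand(50, 20)
print(RDMSimilarity()(a1, a2).center)
```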