
senteval's Introduction

SentEval: evaluation toolkit for sentence embeddings

SentEval is a library for evaluating the quality of sentence embeddings. We assess their generalization power by using them as features on a broad and diverse set of "transfer" tasks. SentEval currently includes 17 downstream tasks. We also include a suite of 10 probing tasks which evaluate what linguistic properties are encoded in sentence embeddings. Our goal is to ease the study and the development of general-purpose fixed-size sentence representations.

(04/22) SentEval new tasks: Added probing tasks for evaluating what linguistic properties are encoded in sentence embeddings

(10/04) SentEval example scripts for three sentence encoders: SkipThought-LN/GenSen/Google-USE

Dependencies

This code is written in Python. The main dependencies are NumPy/SciPy, PyTorch, and scikit-learn.

Transfer tasks

Downstream tasks

SentEval allows you to evaluate your sentence embeddings as features for the following downstream tasks:

Task Type #train #test needs_train set_classifier
MR movie review 11k 11k 1 1
CR product review 4k 4k 1 1
SUBJ subjectivity status 10k 10k 1 1
MPQA opinion-polarity 11k 11k 1 1
SST binary sentiment analysis 67k 1.8k 1 1
SST fine-grained sentiment analysis 8.5k 2.2k 1 1
TREC question-type classification 6k 0.5k 1 1
SICK-E natural language inference 4.5k 4.9k 1 1
SNLI natural language inference 550k 9.8k 1 1
MRPC paraphrase detection 4.1k 1.7k 1 1
STS 2012 semantic textual similarity N/A 3.1k 0 0
STS 2013 semantic textual similarity N/A 1.5k 0 0
STS 2014 semantic textual similarity N/A 3.7k 0 0
STS 2015 semantic textual similarity N/A 8.5k 0 0
STS 2016 semantic textual similarity N/A 9.2k 0 0
STS B semantic textual similarity 5.7k 1.4k 1 0
SICK-R semantic textual similarity 4.5k 4.9k 1 0
COCO image-caption retrieval 567k 5*1k 1 0

where needs_train means a model with parameters is learned on top of the sentence embeddings, and set_classifier means you can define the parameters of the classifier in the case of a classification task (see below).

Note: COCO comes with ResNet-101 2048d image embeddings. More details on the tasks.

Probing tasks

SentEval also includes a series of probing tasks to evaluate what linguistic properties are encoded in your sentence embeddings:

Task Type #train #test needs_train set_classifier
SentLen Length prediction 100k 10k 1 1
WC Word Content analysis 100k 10k 1 1
TreeDepth Tree depth prediction 100k 10k 1 1
TopConst Top Constituents prediction 100k 10k 1 1
BShift Word order analysis 100k 10k 1 1
Tense Verb tense prediction 100k 10k 1 1
SubjNum Subject number prediction 100k 10k 1 1
ObjNum Object number prediction 100k 10k 1 1
SOMO Semantic odd man out 100k 10k 1 1
CoordInv Coordination Inversion 100k 10k 1 1

Download datasets

To get all the transfer tasks datasets, run (in data/downstream/):

./get_transfer_data.bash

This will automatically download and preprocess the downstream datasets, and store them in data/downstream (warning: for MacOS users, you may have to use p7zip instead of unzip). The probing tasks are already in data/probing by default.

How to use SentEval: examples

examples/bow.py

In examples/bow.py, we evaluate the quality of the average of word embeddings.

To download state-of-the-art GloVe and fastText word embeddings:

curl -Lo glove.840B.300d.zip http://nlp.stanford.edu/data/glove.840B.300d.zip
curl -Lo crawl-300d-2M.vec.zip https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip

To reproduce the results for bag-of-vectors, run (in examples/):

python bow.py

As required by SentEval, this script implements two functions: prepare (optional) and batcher (required) that turn text sentences into sentence embeddings. Then SentEval takes care of the evaluation on the transfer tasks using the embeddings as features.

examples/infersent.py

To get the InferSent model and reproduce our results, download our best models and run infersent.py (in examples/):

curl -Lo examples/infersent1.pkl https://dl.fbaipublicfiles.com/senteval/infersent/infersent1.pkl
curl -Lo examples/infersent2.pkl https://dl.fbaipublicfiles.com/senteval/infersent/infersent2.pkl

examples/skipthought.py - examples/gensen.py - examples/googleuse.py

We also provide example scripts for three other encoders: SkipThought-LN, GenSen and the Google Universal Sentence Encoder.

Note that for SkipThought and GenSen, following the steps of the associated githubs is necessary. The Google encoder script should work as-is.

How to use SentEval

To evaluate your sentence embeddings, SentEval requires that you implement two functions:

  1. prepare (sees the whole dataset of each task and can thus construct the word vocabulary, the dictionary of word vectors etc)
  2. batcher (transforms a batch of text sentences into sentence embeddings)

1.) prepare(params, samples) (optional)

batcher only sees one batch at a time while the samples argument of prepare contains all the sentences of a task.

prepare(params, samples)
  • params: senteval parameters.
  • samples: list of all sentences from the transfer task.
  • output: No output. Arguments stored in "params" can further be used by batcher.

Example: in bow.py, prepare is used to build the vocabulary of words and construct the params.word_vec dictionary of word vectors.
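As a rough illustration (not the actual bow.py code), a minimal prepare for averaged word embeddings might look like the sketch below; the vector-file path and the 300-dimension assumption are placeholders:

import io
import numpy as np

PATH_TO_VEC = 'crawl-300d-2M.vec'  # hypothetical path to a word-vector file

def prepare(params, samples):
    # collect the vocabulary of this task (samples are lists of tokens)
    words = set(word for sent in samples for word in sent)
    # keep only the vectors for words that actually appear in the task
    params.word_vec = {}
    with io.open(PATH_TO_VEC, 'r', encoding='utf-8') as f:
        for line in f:
            word, vec = line.split(' ', 1)
            if word in words:
                params.word_vec[word] = np.array(vec.split(), dtype=np.float32)
    params.wvec_dim = 300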

2.) batcher(params, batch)

batcher(params, batch)
  • params: senteval parameters.
  • batch: numpy array of text sentences (of size params.batch_size)
  • output: numpy array of sentence embeddings (of size params.batch_size)

Example: in bow.py, batcher is used to compute the mean of the word vectors for each sentence in the batch using params.word_vec. Use your own encoder in that function to encode sentences.
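Correspondingly, a minimal bag-of-vectors batcher could look like this sketch, assuming params.word_vec and params.wvec_dim were filled by a prepare like the one above (again illustrative, not the exact bow.py code):

import numpy as np

def batcher(params, batch):
    # each sentence in the batch is a list of tokens; guard against empty sentences
    batch = [sent if sent != [] else ['.'] for sent in batch]
    embeddings = []
    for sent in batch:
        vecs = [params.word_vec[w] for w in sent if w in params.word_vec]
        if not vecs:  # no known word: fall back to a zero vector
            vecs = [np.zeros(params.wvec_dim, dtype=np.float32)]
        embeddings.append(np.mean(vecs, axis=0))
    return np.vstack(embeddings)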

3.) evaluation on transfer tasks

After having implemented the batcher and prepare functions for your own sentence encoder,

  1. to perform the actual evaluation, first import senteval and set its parameters:
import senteval
params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10}
  2. (optional) set the parameters of the classifier (when applicable):
params['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 64,
                                 'tenacity': 5, 'epoch_size': 4}

You can choose nhid=0 (Logistic Regression) or nhid>0 (MLP) and define the parameters for training.

  3. Create an instance of the class SE:
se = senteval.engine.SE(params, batcher, prepare)
  4. Define the set of transfer tasks and run the evaluation:
transfer_tasks = ['MR', 'SICKEntailment', 'STS14', 'STSBenchmark']
results = se.eval(transfer_tasks)

The current list of available tasks is:

['CR', 'MR', 'MPQA', 'SUBJ', 'SST2', 'SST5', 'TREC', 'MRPC', 'SNLI',
'SICKEntailment', 'SICKRelatedness', 'STSBenchmark', 'ImageCaptionRetrieval',
'STS12', 'STS13', 'STS14', 'STS15', 'STS16',
'Length', 'WordContent', 'Depth', 'TopConstituents','BigramShift', 'Tense',
'SubjNumber', 'ObjNumber', 'OddManOut', 'CoordinationInversion']
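Continuing the example above, each entry of the returned results dictionary is itself a dict whose keys depend on the task type; a rough sketch of inspecting it (exact key names may vary by task and SentEval version):

print(results['MR'])            # classification tasks: dev/test accuracy and split sizes
print(results['STS14']['all'])  # STS tasks: mean / weighted-mean Pearson and Spearman over the subsets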

SentEval parameters

Global parameters of SentEval:

# senteval parameters
task_path                   # path to SentEval datasets (required)
seed                        # seed
usepytorch                  # use cuda-pytorch (else scikit-learn) where possible
kfold                       # k-fold cross-validation for MR/CR/SUBJ/MPQA.

Parameters of the classifier:

nhid:                       # number of hidden units (0: Logistic Regression, >0: MLP); Default nonlinearity: Tanh
optim:                      # optimizer ("sgd,lr=0.1", "adam", "rmsprop" ..)
tenacity:                   # number of evaluations without dev accuracy improvement before training stops
epoch_size:                 # each epoch corresponds to epoch_size passes over the train set
max_epoch:                  # max number of epochs
dropout:                    # dropout for MLP

Note that to get a proxy of the results while dramatically reducing computation time, we suggest the prototyping config:

params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 5}
params['classifier'] = {'nhid': 0, 'optim': 'rmsprop', 'batch_size': 128,
                                 'tenacity': 3, 'epoch_size': 2}

which results in roughly a 5x speedup for classification tasks.

To produce results that are comparable to the literature, use the default config:

params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10}
params['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 64,
                                 'tenacity': 5, 'epoch_size': 4}

which takes longer but produces better results that are comparable to the literature.

For probing tasks, we used an MLP with a Sigmoid nonlinearity and tuned the nhid (in [50, 100, 200]) and dropout (in [0.0, 0.1, 0.2]) on the dev set.
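This tuning does not appear to be automated by the toolkit (the classifier parameters you pass are used as-is), so one way to reproduce it is a small manual grid search; a hedged sketch reusing params, batcher and prepare from above ('devacc' is assumed here to be the dev-set accuracy key):

best_cfg, best_dev = None, -1.0
for nhid in [50, 100, 200]:
    for dropout in [0.0, 0.1, 0.2]:
        params['classifier'] = {'nhid': nhid, 'dropout': dropout, 'optim': 'adam',
                                'batch_size': 64, 'tenacity': 5, 'epoch_size': 4}
        se = senteval.engine.SE(params, batcher, prepare)
        res = se.eval(['Length'])
        if res['Length']['devacc'] > best_dev:
            best_cfg, best_dev = (nhid, dropout), res['Length']['devacc']
print(best_cfg, best_dev)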

References

Please consider citing [1] if you use this code for evaluating sentence embedding methods.

SentEval: An Evaluation Toolkit for Universal Sentence Representations

[1] A. Conneau, D. Kiela, SentEval: An Evaluation Toolkit for Universal Sentence Representations

@article{conneau2018senteval,
  title={SentEval: An Evaluation Toolkit for Universal Sentence Representations},
  author={Conneau, Alexis and Kiela, Douwe},
  journal={arXiv preprint arXiv:1803.05449},
  year={2018}
}

Contact: [email protected], [email protected]

Related work


senteval's Issues

Can't obtain your scores with ST-LN :/

Hello, I am trying to get the same results as you for the ST-LN model (https://arxiv.org/pdf/1707.06320.pdf : first row of Table 2).

  1. I went to the layer-norm repo, downloaded the lngru_may13_1700000.npz files, and added the layer norm function to Kyros's Skipthought repo, as explained in https://github.com/ryankiros/layer-norm.
  2. And then I tried Step 4 of https://github.com/ryankiros/skip-thoughts/tree/master/training. Now I can encode sentences with his model. I used a 20,000-word vocabulary that I had from my own skip-thought implementation (I can't find the 20,000-word vocabulary used by Kyros for his lngru_may13_1700000.npz model).
  3. In SentEval, instead of 'import skipthoughts', I imported tools from your SentEval/examples/skipthought.py file. When I run the experiments, I get quite different results from yours (sometimes worse, sometimes a little better on certain benchmarks).
    Could you explain how you obtained these scores? Which vocabulary did you use? Is there any special trick I am not aware of that I didn't mention earlier?
    Thanks a lot, it would be a big help :)

Current bow.py will fail without cuda

Hi guys, just wanted to highlight that the bow.py example currently fails when no GPUs are available, due to the hard-coded .cuda conversions in senteval/tools/classifier.py. An easy fix is to init the classifier with a device variable and simply set .to(self.device) everywhere, as in the sketch below. Thanks for making this framework available!
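A minimal sketch of the kind of change I mean, in generic PyTorch (not the actual classifier.py code):

import torch

# pick a device once instead of hard-coding .cuda()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.nn.Linear(300, 2).to(device)   # stand-in for the classifier
X = torch.randn(8, 300).to(device)           # inputs moved to the same device
logits = model(X)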

PyTorch 0.2 compatibility

I've noticed that a couple of things break when using PyTorch 0.2. This appears to be related to reductions no longer keeping the reduction axis in the new version. I can submit a PR if required.

Error when running bow.py with MRPC

When I run bow.py with the MRPC task, I get the following error. Any idea why it happens?

2017-10-27 18:33:33,375 : ***** Transfer task : MRPC *****

2017-10-27 18:33:55,791 : Found 3 words with word vectors, out of 3 words
2017-10-27 18:33:55,791 : Computing embedding for test
Traceback (most recent call last):
File "bow.py", line 80, in
results = se.eval(transfer_tasks)
File "../senteval/senteval.py", line 62, in eval
self.results = {x: self.eval(x) for x in name}
File "../senteval/senteval.py", line 62, in
self.results = {x: self.eval(x) for x in name}
File "../senteval/senteval.py", line 104, in eval
self.results = self.evaluation.run(self.params, self.batcher)
File "../senteval/mrpc.py", line 76, in run
mrpc_embed[key][txt_type] = np.vstack(mrpc_embed[key][txt_type])
File "/share/data/speech/zewei/anaconda3/envs/py27/lib/python2.7/site-packages/numpy/core/shape_base.py", line 237, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate

STS host page returns 404

I ran the commands below on ubuntu 14:

git clone [email protected]:facebookresearch/SentEval.git
cd SentEval/data/downstream/
bash get_transfer_data.bash

In the STS dataset block, this line seems to fail:

Archive:  ./STS/data_STS12.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of ./STS/data_STS12.zip or
        ./STS/data_STS12.zip.zip, and cannot find ./STS/data_STS12.zip.ZIP, period.

I found that the host page of the STS dataset returns 404 when I access the zip files (e.g. http://ixa2.si.ehu.es/stswiki/images/4/40/STS2012-en-test.zip).

Probing Tasks Don't Work

Hi,

I try to run the bow.py example with just the probing tasks (I haven't downloaded the transfer data) and I get AssertionError: Length not in ['CR', 'MR', 'MPQA', 'SUBJ', 'SST2', 'SST5', 'TREC', 'MRPC', 'SICKRelatedness', 'SICKEntailment', 'STSBenchmark', 'SNLI', 'ImageCaptionRetrieval', 'STS12', 'STS13', 'STS14', 'STS15', 'STS16']

Thank You,
Peter

ImageCaptionRetrieval - unable to reproduce results in paper

I am unable to reproduce the results in the paper for the ImageCaptionRetrieval task.

I tried the scripts skipthought.py and infersent.py in the examples folder, setting the appropriate paths. I get very poor test scores, close to random. Here's the output for the Infersent model:

Test scores | Image to text: 0.08, 0.34, 0.84, 1058.8
Test scores | Text to image: 0.076, 0.372, 0.848, 509.0

Unable to Unzip BinClass Data

The URL works, but I now get an error trying to unzip the file:

Archive:  senteval_data/data_classif.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of senteval_data/data_classif.zip or
        senteval_data/data_classif.zip.zip, and cannot find senteval_data/data_classif.zip.ZIP, period.

Not able to download infersent1.pkl file

I am unable to download the infersent1.pkl file. curl -Lo examples/infersent1.pkl https://dl.fbaipublicfiles.com/senteval/infersent/infersent1.pkl gives a "Failed writing body" error.

problem with load_state_dict()

When I tried to evaluate the model trained with the updated versions of InferSent and SentEval, I got:
Traceback (most recent call last):
File "infersent.py", line 63, in
model.load_state_dict(torch.load(MODEL_PATH))
File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for InferSent:
Missing key(s) in state_dict: "enc_lstm.weight_ih_l0", "enc_lstm.weight_hh_l0", "enc_lstm.bias_ih_l0", "enc_lstm.bias_hh_l0", "enc_lstm.weight_ih_l0_reverse", "enc_lstm.weight_hh_l0_reverse", "enc_lstm.bias_ih_l0_reverse", "enc_lstm.bias_hh_l0_reverse".
Unexpected key(s) in state_dict: "encoder.enc_lstm.weight_ih_l0", "encoder.enc_lstm.weight_hh_l0", "encoder.enc_lstm.bias_ih_l0", "encoder.enc_lstm.bias_hh_l0", "encoder.enc_lstm.weight_ih_l0_reverse", "encoder.enc_lstm.weight_hh_l0_reverse", "encoder.enc_lstm.bias_ih_l0_reverse", "encoder.enc_lstm.bias_hh_l0_reverse", "classifier.0.weight", "classifier.0.bias", "classifier.1.weight", "classifier.1.bias", "classifier.2.weight", "classifier.2.bias".

Could you help find out what's wrong with it?
Thanks!

SentEval S3 bucket does not exist anymore

get_transfer_data.bash is not able to get the BINCLASSIF data because the S3 bucket seems to have vanished.

BINCLASSIF='https://s3.amazonaws.com/senteval/senteval_data/datasmall_NB_ACL12.zip'

<Error>
<Code>NoSuchBucket</Code>
<Message>The specified bucket does not exist</Message>
<BucketName>senteval</BucketName>
<RequestId>CD9C211F9430BBDA</RequestId>
<HostId>
A3Uc1f5nbAEl0vTqp5mFPITfxuvPogs5m5dDm/72UaTbZ8tG5TbCJH9K7tm6JJ8T1Y1R7ZgVpXE=
</HostId>
</Error>

Runtime memory error for SNLI dataset

Hi,
I am running my model with the SentEval framework, using SNLI as the target dataset, but I get a runtime memory error after the embedding process has completed and the classifier training begins.
Here is the error message:
For the record, the runtime environment is 2x K80 GPUs with CUDA and 2x 11 GiB of memory, but I am not sure whether this process used two GPUs or just one, so I am not sure whether the available memory is 11 GiB or 22 GiB.

2018-04-20 11:41:42,001 : Training pytorch-MLP-nhid0-adam-bs64 with standard validation..
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1518241554738/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "snm_senteval_dsvm_gpu.py", line 62, in <module>
    results = se.eval(transfer_tasks)
  File "../senteval/engine.py", line 56, in eval
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 56, in <dictcomp>
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 94, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "../senteval/snli.py", line 108, in run
    devacc, testacc = clf.run()
  File "../senteval/tools/validation.py", line 218, in run
    validation_data=(self.X['valid'], self.y['valid']))
  File "../senteval/tools/classifier.py", line 79, in fit
    accuracy = self.score(devX, devy)
  File "../senteval/tools/classifier.py", line 120, in score
    devX = torch.FloatTensor(devX).cuda()
  File "/anaconda/envs/py35/lib/python3.5/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518241554738/work/torch/lib/THC/generic/THCStorage.cu:58

I switched the classifier option to sklearn logistic regression (usepytorch = False) with 10 GiB of RAM; an error still shows up (not a memory error), but with a different description.

I am just wondering: since SNLI is a 110k-example dataset, how much memory is needed for the classifier to process the embedded sentences?

  1. Can you share your development experience from testing the SNLI dataset?
  2. How much memory were you using at that time?

Many Thanks

ImageCaptionRetrieval doesn't work with Infersent

Traceback (most recent call last):
File "infersent.py", line 75, in
results = se.eval(transfer_tasks)
File "../senteval/engine.py", line 59, in eval
self.results = {x: self.eval(x) for x in name}
File "../senteval/engine.py", line 59, in
self.results = {x: self.eval(x) for x in name}
File "../senteval/engine.py", line 119, in eval
self.evaluation.do_prepare(self.params, self.prepare)
File "../senteval/rank.py", line 39, in do_prepare
prepare(params, samples)
File "infersent.py", line 38, in prepare
params.infersent.build_vocab([' '.join(s) for s in samples], tokenize=False)
File "infersent.py", line 38, in
params.infersent.build_vocab([' '.join(s) for s in samples], tokenize=False)
TypeError: sequence item 0: expected str instance, bytes found

Bizarre dimension out of range error

Hi, I've been running SentEval just fine for a couple of weeks, and today, after transferring to a new machine (with PyTorch 0.2), all of a sudden I can't evaluate on TREC anymore.

This is not a problem for any other tasks, so I'm wondering why. Does this mean my TREC data is corrupted or is this a problem with PyTorch 0.2?

How come all other evaluation tasks (that all use classifier.py) are fine except TREC...does anyone have any suggestions?

2017-08-15 20:52:30,838 : ***** Transfer task : TREC *****


2017-08-15 20:52:35,352 : Found 9548(/9767) words with glove vectors
2017-08-15 20:52:35,352 : Vocab size : 9548
2017-08-15 20:52:36,461 : Computed train embeddings
2017-08-15 20:52:36,577 : Computed test embeddings
2017-08-15 20:52:36,578 : Training pytorch-LogReg with 10-fold cross-validation
2017-08-15 20:55:49,459 : [('reg:1e-05', 80.72), ('reg:0.0001', 80.76), ('reg:0.001', 80.7), ('reg:0.01', 80.39)]
2017-08-15 20:55:49,460 : Cross-validation : best param found is reg = 0.0001 with score 80.76
2017-08-15 20:55:49,460 : Evaluating...
Traceback (most recent call last):
  File "model_run.py", line 156, in <module>
    tf.app.run()
  File "/home/xx/miniconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "model_run.py", line 150, in main
    results_transfer = se.eval(transfer_tasks)
  File "/home/xx/Documents/SentEval/senteval.py", line 56, in eval
    self.results = {x:self.eval(x) for x in name}
  File "/home/xx/Documents/SentEval/senteval.py", line 56, in <dictcomp>
    self.results = {x:self.eval(x) for x in name}
  File "/home/xx/Documents/SentEval/senteval.py", line 91, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "/home/xx/Documents/SentEval/trec.py", line 76, in run
    devacc, testacc, _ = clf.run()
  File "/home/xx/Documents/SentEval/tools/validation.py", line 159, in run
    yhat = clf.predict(self.test['X'])
  File "/home/xx/Documents/SentEval/tools/classifier.py", line 137, in predict
    yhat = np.append(yhat, output.data.max(1)[1].squeeze(1).cpu().numpy())
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)

MRPC dataset download bash error/ code error

Hi,
When I run the evaluation using the MRPC dataset, I run into two problems:

  1. When I use the bash command to download the MRPC dataset, at the end it shows:
./get_transfer_data.bash: line 244: cabextract: command not found
cat: senteval_data/MRPC/_2DEC3DBE877E4DB192D17C0256E90F1D: No such file or directory
cat: senteval_data/MRPC/_D7B391F9EAFF4B1B8BCE8F21B20B1B61: No such file or directory
rm: cannot remove 'senteval_data/MRPC/_*': No such file or directory

It turns out the two files in senteval_data/MRPC are both empty.

  2. When it runs, it throws this error:
2018-04-20 15:25:36,263 : ***** Transfer task : MRPC *****


2018-04-20 15:25:36,264 : Computing embedding for test
Traceback (most recent call last):
  File "snm_senteval_dsvm_gpu.py", line 66, in <module>
    results = se.eval(transfer_tasks)
  File "../senteval/engine.py", line 56, in eval
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 56, in <dictcomp>
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 94, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "../senteval/mrpc.py", line 76, in run
    mrpc_embed[key][txt_type] = np.vstack(mrpc_embed[key][txt_type])
  File "/anaconda/envs/py35/lib/python3.5/site-packages/numpy/core/shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate

I checked the source code at mrpc.py line 76; it does not seem to be the correct way of using np.vstack, according to the numpy.vstack documentation.
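For reference, np.vstack requires a non-empty sequence of arrays, which is why an empty list of embeddings triggers this error; a minimal standalone reproduction (unrelated to SentEval's data):

import numpy as np

np.vstack([np.zeros((2, 3)), np.ones((1, 3))])  # fine: stacks into a (3, 3) array
np.vstack([])                                   # ValueError: need at least one array to concatenate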

Any idea about this error?

Many thanks

infersent.py: invalid device ordinal at torch/csrc/cuda/Module.cpp:84

I get this error running infersent.py

python infersent.py
/root/miniconda2/lib/python2.7/site-packages/torch/serialization.py:284: SourceChangeWarning: source code of class 'models.BLSTMEncoder' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=84 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "infersent.py", line 56, in <module>
    params_senteval.infersent = torch.load(MODEL_PATH)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/serialization.py", line 229, in load
    return _load(f, map_location, pickle_module)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/serialization.py", line 377, in _load
    result = unpickler.load()
  File "/root/miniconda2/lib/python2.7/site-packages/torch/serialization.py", line 348, in persistent_load
    data_type(size), location)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/serialization.py", line 85, in default_restore_location
    result = fn(storage, location)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/serialization.py", line 67, in _cuda_deserialize
    return obj.cuda(device_id)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/_utils.py", line 57, in _cuda
    with torch.cuda.device(device):
  File "/root/miniconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 127, in __enter__
    torch._C._cuda_setDevice(self.idx)
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84

Note: bow.py works, and all tasks are evaluated correctly.

Add a task to SentEval

Hi,

I'm trying to add a custom sentiment classification task (SplitClassification), very similar to SNLI/SST, to senteval.py. I simply put a python file on the same level as snli.py and sst.py and imported it into senteval.py, but what I get is:

Traceback (most recent call last):
  File "examples/infersent.py", line 23, in <module>
    import senteval
  File "/afs/.../SentEval/senteval.py", line 21, in <module>
    from senti import SentiEval
ImportError: cannot import name SentiEval

infersent.py already appends the path to senteval, and when I dropped my custom SentiEval task, the whole thing works fine. What went wrong?

non-compatibility with InferSent

Hi,

The updated version of SentEval raises errors when trying to use infersent.py to evaluate the trained infersent model.

Traceback (most recent call last):
File "infersent.py", line 61, in
results = se.eval(transfer_tasks)
File "../senteval/engine.py", line 56, in eval
self.results = {x: self.eval(x) for x in name}
File "../senteval/engine.py", line 56, in
self.results = {x: self.eval(x) for x in name}
File "../senteval/engine.py", line 94, in eval
self.results = self.evaluation.run(self.params, self.batcher)
File "../senteval/binary.py", line 47, in run
embeddings = batcher(params, batch)
File "infersent.py", line 38, in batcher
tokenize=False)
File "/raid/data/oanuru/SentEval_dld/SentEval/examples/models.py", line 200, in encode
if self.use_cuda:
File "/data/dgx1/oanuru/anaconda3/envs/infersent/lib/python2.7/site-packages/torch/nn/modules/module.py", line 398, in getattr
type(self).name, name))
AttributeError: 'BLSTMEncoder' object has no attribute 'use_cuda'

Thanks!

Incomplete dataset (MR, SUBJ, SICK-E)

Some of the datasets I downloaded are incomplete:

The MR dataset I downloaded only contains 74 sentences instead of 11k; SUBJ only contains 5020 instead of 10k; the same goes for SICK-E and SICK-MSE.

I also downloaded the datasets via the links in the Readme; the SUBJ dataset still only contains 5020 sentences instead of 10k.

Can somebody take a look? Thanks.

Change batch size of encoder

Hi,

Thanks for your work! Since I am working with a relatively large model, I'm wondering if there is any way I can change the batch size used when computing the embedding for each sentence (i.e. the batch size of the batcher function). In the example code it seems only the batch size of the classifier is changed. I manually checked that no matter what I set the classifier batch size to, the batcher still takes 128 sentences at a time.

How is hyperparameter tuning done when there is no explicit train/test set? (Cross-validation case)

Hi,

I am trying to apply hyperparameter tuning to binary classification of the Movie Reviews (MR) data. Since there is no explicit train/test split for that dataset, I want to do k-fold CV as in the SentEval project. But I also want to do hyperparameter optimization. I wonder how you handle this on that data (or any other data where you applied CV). The SentEval paper says some hyperparameter tuning (such as learning rate) is performed on validation data. I would like some hints on whether you set aside a custom validation set for this kind of dataset, or whether you did something more complex.

Thanks!

Most of the Tasks Require cuda (i.e gpu)

Hi,

Thanks for the SentEval package. Aggregating all evaluation tasks helps a lot!

However, on my CPU-only PC, most of the tasks are giving the error:

RuntimeError: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library.

Only STS tasks are getting evaluated on my system. Do the other tasks really need a GPU or is there some way they could be evaluated on CPU as well?

Thanks!

bow.py example doesn't work

Hi, I'm using Python 3.6 and Torch 0.2.0.04.
I get an error when using senteval in my own scripts, so I ran bow.py and got this error:


2017-09-13 12:06:38,010 : Found 3 words with word vectors, out of         3 words
2017-09-13 12:06:38,015 : Computing embedding for train
Traceback (most recent call last):
  File "bow.py", line 80, in <module>
    results = se.eval(transfer_tasks)
  File "../senteval/senteval.py", line 62, in eval
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/senteval.py", line 62, in <dictcomp>
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/senteval.py", line 104, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "../senteval/mrpc.py", line 76, in run
    mrpc_embed[key][txt_type] = np.vstack(mrpc_embed[key][txt_type])
  File "/home/dsileo/.local/lib/python3.6/site-packages/numpy/core/shape_base.py", line 237, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate

I think it's MRPC-specific because the other tasks work.

Results difference between scikit-learn and pytorch

The results for the logistic regression using pytorch are very different from the results with the scikit-learn toolkit. Is that common, or is it a bug on my side?
By different I mean 5-10%, with scikit-learn always giving better results.

Code compatibility with the latest pytorch 0.4

I just upgraded pytorch to version 0.4, and running SentEval gives the following errors:

2018-04-26 09:47:53,927 : ***** Transfer task : CR *****


2018-04-26 09:47:54,220 : Generating sentence embeddings
2018-04-26 09:51:28,118 : Generated sentence embeddings
2018-04-26 09:51:28,118 : Training pytorch-MLP-nhid0-adam-bs64 with (inner) 10-fold cross-validation
../senteval/tools/classifier.py:108: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  all_costs.append(loss.data[0])
../senteval/tools/classifier.py:123: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  Xbatch = Variable(devX[i:i + self.batch_size], volatile=True)
../senteval/tools/classifier.py:124: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  ybatch = Variable(devy[i:i + self.batch_size], volatile=True)
2018-04-26 09:52:07,491 : Best param found at split 1: l2reg = 1e-05                 with score 0.0
Traceback (most recent call last):
  File "use_senteval.py", line 64, in <module>
    results = se.eval(transfer_tasks)
  File "../senteval/engine.py", line 56, in eval
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 56, in <dictcomp>
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 94, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "../senteval/binary.py", line 57, in run
    devacc, testacc = clf.run()
  File "../senteval/tools/validation.py", line 103, in run
    self.testresults.append(round(100*clf.score(X_test, y_test), 2))
TypeError: type Tensor doesn't define __round__ method

Same code runs OK with pytorch 0.3 with usepytorch=True
Same code runs OK if you use sklearn instead of the pytorch MLP.

Not sure whether this round() failure is caused by Python's native round() behavior or by a problem in pytorch.

Reminder: for developers who want to use the SentEval framework, be cautious about upgrading pytorch to version 0.4.0, especially if you are using the pytorch MLP classifier for benchmark comparison.
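A minimal sketch of the kind of workaround, assuming clf.score() now returns a 0-dim tensor under 0.4 (illustrative, not the actual validation.py code):

import torch

score = torch.tensor(0.8123)             # stand-in for what clf.score() returns on 0.4
testacc = round(100 * score.item(), 2)   # convert the 0-dim tensor to a Python float first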

Hyperparameter tuning - discrepancy between readme and code?

The readme states

For probing tasks, we used an MLP with a Sigmoid nonlinearity and tuned the nhid (in [50, 100, 200]) and dropout (in [0.0, 0.1, 0.2]) on the dev set.

However, in the code it looks like the parameters given by the user are always used. No tuning takes place and no predefined hyperparameters are loaded. Maybe I missed something?

Should I do hyperparameter tuning to get results that are comparable to the literature?

Original sentences in bigram_shift test set

Hi,

There are no detailed annotations about which two words are exchanged in the bigram_shift test set. Would it be possible to release the original sentences? We would like to compare the two sets. Many thanks.

Longyue Wang

Python 3.x compatibility

I'm currently incorporating SentEval in my training procedure, so it evaluates intermediate model parameters every x training steps. However I'm now finding myself hacking the code to make it work with Python 3.6, which I need for other components in my model. It would be wonderful if SentEval had out of the box support for Python 3.x.

COCO Caption Retrieval task broken under Python 3.5

With Python 3.5.2, the Image Caption Retrieval task fails because the pickled COCO data (presumably from Python 2) cannot be loaded:

Traceback (most recent call last):
  File "../../scripts/nm-senteval.py", line 99, in <module>
    main()
  File "../../scripts/nm-senteval.py", line 59, in main
    results = se.eval(transfer_tasks)
  File "/a/merkur3/cifka/miniconda3/envs/py35_tf_cpu/lib/python3.5/site-packages/senteval/engine.py", line 56, in eval
    self.results = {x: self.eval(x) for x in name}
  File "/a/merkur3/cifka/miniconda3/envs/py35_tf_cpu/lib/python3.5/site-packages/senteval/engine.py", line 56, in <dictcomp>
    self.results = {x: self.eval(x) for x in name}
  File "/a/merkur3/cifka/miniconda3/envs/py35_tf_cpu/lib/python3.5/site-packages/senteval/engine.py", line 89, in eval
    self.evaluation = ImageCaptionRetrievalEval(tpath + '/COCO', seed=self.params.seed)
  File "/a/merkur3/cifka/miniconda3/envs/py35_tf_cpu/lib/python3.5/site-packages/senteval/rank.py", line 31, in __init__
    train, dev, test = self.loadFile(task_path)
  File "/a/merkur3/cifka/miniconda3/envs/py35_tf_cpu/lib/python3.5/site-packages/senteval/rank.py", line 47, in loadFile
    cocodata = pickle.load(f)
  File "/a/merkur3/cifka/miniconda3/envs/py35_tf_cpu/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Getting error running infersent.py

Hi, I am getting the following error, which I believe has something to do with the cuda vs. cpu version. Has anyone faced the same problem? I would appreciate help.

Traceback (most recent call last):
  File "infersent.py", line 76, in <module>
    results = se.eval(transfer_tasks)
  File "../senteval/engine.py", line 59, in eval
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 59, in <dictcomp>
    self.results = {x: self.eval(x) for x in name}
  File "../senteval/engine.py", line 121, in eval
    self.results = self.evaluation.run(self.params, self.batcher)
  File "../senteval/mrpc.py", line 74, in run
    embeddings = batcher(params, batch)
  File "infersent.py", line 43, in batcher
    embeddings = params.infersent.encode(sentences, bsize=params.batch_size, tokenize=False)
  File "/tilde/fbrahman/parc/SentEval/examples/models.py", line 226, in encode
    (batch, lengths[stidx:stidx + bsize])).data.cpu().numpy()
  File "/tilde/fbrahman/parc/SentEval/examples/models.py", line 68, in forward
    sent_output = self.enc_lstm(sent_packed)[0]  # seqlen x batch x 2*nhid
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 324, in forward
    return func(input, *fargs, **fkwargs)
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 244, in forward
    nexth, output = func(input, hidden, weight, batch_sizes)
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 87, in forward
    hy, output = inner(input, hidden[l], weight[l], batch_sizes)
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 159, in forward
    hidden = inner(step_input, hidden, *weight)
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 34, in LSTMCell
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
  File "/tilde/fbrahman/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 1024, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #4 'mat1'

cabextract: command not found

I did this on Ubuntu 16.04 LTS:

git clone https://github.com/facebookresearch/SentEval
cd SentEval/
cd data/
./get_transfer_data.bash
...
./get_transfer_data.bash: line 194: cabextract: command not found
cat: senteval_data/MRPC/_2DEC3DBE877E4DB192D17C0256E90F1D: No such file or directory
cat: senteval_data/MRPC/_D7B391F9EAFF4B1B8BCE8F21B20B1B61: No such file or directory
rm: cannot remove 'senteval_data/MRPC/_*': No such file or directory

`byte` type sentences fed in for SNLI task

The source of the problem is that there is a line in loadFile that encodes str to bytes. I suppose that the original author intended to convert sentence encodings from latin-1 to utf-8, but codecs already does that for us, so .encode('utf-8') is unnecessary and might even cause further errors when the user performs string operations on the object.

I have made a pull request on the simple fix.

What's the purpose of SentEval and how can i integrate it it with my inferSent model?

Hi,

Can you please elaborate on how I can use SentEval with my InferSent model to get sentence embeddings for my own sentence corpus?

It would also be great to get more insight into interpreting and using the results for similar sentences for our business requirement (sentiment analysis for the keywords entered). The results for the STS16 transfer task are as follows:

{
'STS16':{
'answer-answer':{
'pearson':(0.25246563397263544,
4.702985203895257e-05 ),
'spearman':SpearmanrResult(correlation=0.3079584675460508,
pvalue=5.556290983707961e-07),
'nsamples':254
},
'headlines':{
'pearson':(0.3635692171614175,
3.385973583781225e-09 ),
'spearman':SpearmanrResult(correlation=0.3514220638099597,
pvalue=1.1960065660420662e-08),
'nsamples':249
},
'plagiarism':{
'pearson':(0.4232950489009525,
2.050858700377934e-11 ),
'spearman':SpearmanrResult(correlation=0.4132543728895399,
pvalue=6.698065484617935e-11),
'nsamples':230
},
'postediting':{
'pearson':(0.6907186645066123,
6.277476490673788e-36 ),
'spearman':SpearmanrResult(correlation=0.7436531853554167,
pvalue=3.323825847627377e-44),
'nsamples':244
},
'question-question':{
'pearson':(-0.028058870107370452,
0.6867356266558649 ),
'spearman':SpearmanrResult(correlation=-0.042384317092789704,
pvalue=0.5422977635039196),
'nsamples':209
},
'all':{
'pearson':{
'mean':0.3403979388868495,
'wmean':0.34964917170036636
},
'spearman':{
'mean':0.35478075450163543,
'wmean':0.3654022810828834
}
}
}
}

SNLI issue

Hi, my evaluation is not working with the SNLI dataset. The problem is here: https://github.com/facebookresearch/SentEval/blob/master/senteval/snli.py#L95

This list should be converted to a numpy array; afterwards everything works well.
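A minimal illustration of the conversion I mean, with hypothetical stand-in data (not the exact snli.py code):

import numpy as np

# embeddings collected batch by batch end up as a list of 2-D arrays
embed_list = [np.random.rand(64, 300).astype(np.float32) for _ in range(3)]
X_valid = np.vstack(embed_list)          # torch.from_numpy expects an ndarray, not a list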

Below the logs:

TypeError                                 Traceback (most recent call last)
<ipython-input-5-70817b0caa6d> in <module>
     30 #                   'BigramShift', 'Tense', 'SubjNumber', 'ObjNumber',
     31 #                   'OddManOut', 'CoordinationInversion']
---> 32 results = se.eval(transfer_tasks)

/data/jchledowski/ss/shooka/SentEval/senteval/engine.py in eval(self, name)
     57         # evaluate on evaluation [name], either takes string or list of strings
     58         if (isinstance(name, list)):
---> 59             self.results = {x: self.eval(x) for x in name}
     60             return self.results
     61 

/data/jchledowski/ss/shooka/SentEval/senteval/engine.py in <dictcomp>(.0)
     57         # evaluate on evaluation [name], either takes string or list of strings
     58         if (isinstance(name, list)):
---> 59             self.results = {x: self.eval(x) for x in name}
     60             return self.results
     61 

/data/jchledowski/ss/shooka/SentEval/senteval/engine.py in eval(self, name)
    119         self.evaluation.do_prepare(self.params, self.prepare)
    120 
--> 121         self.results = self.evaluation.run(self.params, self.batcher)
    122 
    123         return self.results

/data/jchledowski/ss/shooka/SentEval/senteval/snli.py in run(self, params, batcher)
    107 
    108         clf = SplitClassifier(self.X, self.y, config)
--> 109         devacc, testacc = clf.run()
    110         logging.debug('Dev acc : {0} Test acc : {1} for SNLI\n'
    111                       .format(devacc, testacc))

/data/jchledowski/ss/shooka/SentEval/senteval/tools/validation.py in run(self)
    216                 # TODO: Find a hack for reducing nb epoches in SNLI
    217                 clf.fit(self.X['train'], self.y['train'],
--> 218                         validation_data=(self.X['valid'], self.y['valid']))
    219             else:
    220                 clf = LogisticRegression(C=reg, random_state=self.seed)

/data/jchledowski/ss/shooka/SentEval/senteval/tools/classifier.py in fit(self, X, y, validation_data, validation_split, early_stop)
     67         # Preparing validation data
     68         trainX, trainy, devX, devy = self.prepare_split(X, y, validation_data,
---> 69                                                         validation_split)
     70 
     71         # Training

/data/jchledowski/ss/shooka/SentEval/senteval/tools/classifier.py in prepare_split(self, X, y, validation_data, validation_split)
     52 
     53         trainX = torch.from_numpy(trainX).to(device, dtype=torch.float32)
---> 54         trainy = torch.from_numpy(trainy).to(device, dtype=torch.int64)
     55         devX = torch.from_numpy(devX).to(device, dtype=torch.float32)
     56         devy = torch.from_numpy(devy).to(device, dtype=torch.int64)

TypeError: expected np.ndarray (got list)

Transfer performance on the MRPC dataset is slightly lower than reported

As is shown in https://github.com/facebookresearch/InferSent, the MRPC result is 76.2/83.1 based on the released infersent1 model, but my result is 74.14/82.06. My parameter settings are as below:
params_senteval = {'task_path': '/data3/zjm/dataset/MSRParaphraseCorpus', 'usepytorch': True, 'kfold': 10}
params_senteval['classifier'] = {'nhid': 512, 'optim': 'adam', 'batch_size': 128,
'tenacity': 3, 'epoch_size': 2}
{u'acc': 74.14, u'f1': 82.06, u'ntest': 1725, u'devacc': 75.370000000000005, u'ndev': 4076}

Could you provide the default setting for MRPC dataset?

Thank you very much!

Problem with Padding in BLSTMEncoder and equal length inputs

Hi,

Since it is possible that equal-length sentences appear as input in [1], and np.sort/np.argsort use quicksort by default, which is an unstable sorting algorithm, shouldn't the quicksort calls at #L45 and #L46 be replaced by mergesort so that the input and output have the same order for equal keys?

Changed to the following:

sent_len, idx_sort = np.sort(sent_len, kind='mergesort')[::-1], np.argsort(-sent_len, kind='mergesort')
idx_unsort = np.argsort(idx_sort, kind='mergesort')

[1] https://github.com/facebookresearch/SentEval/blob/master/examples/models.py#L44

have issue when run bow.py

Traceback (most recent call last):
File "bow.py", line 111, in
results = se.eval(transfer_tasks)
File "../senteval/engine.py", line 59, in eval
self.results = {x: self.eval(x) for x in name}
File "../senteval/engine.py", line 59, in
self.results = {x: self.eval(x) for x in name}
File "../senteval/engine.py", line 121, in eval
self.results = self.evaluation.run(self.params, self.batcher)
File "../senteval/binary.py", line 49, in run
enc_input = np.vstack(enc_input)
File "/home/zzh/wyk_paper/software/anaconda3/envs/sentEval/lib/python3.6/site-packages/numpy/core/shape_base.py", line 234, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate

This issue appears when running bow.py on the MR transfer task. How can it be solved, and why does it happen?

Segmentation fault with spearmanr

I get a segmentation fault every time I run the STS tasks. After some investigation I found that the error comes from the spearmanr function in sts.py.

I am using python 3.7, scipy 1.2.0, pytorch 0.4.1.

Here are the logs:

sys_scores
[0.53282106, 0.56179583, 0.63689476, 0.52020013, 0.70368159, 0.85529864, 0.69405955, 0.80750644, 0.50071383, 0.78061992, 0.73714286, 0.74759108, 0.77318597, 0.61328888, 0.82235664, 0.67099679, 0.46783599, 0.87001014, 0.67212039, 0.52007401, 0.60244519, 0.56914616, 0.55294085, 0.55580038, 0.7000789, 0.37525794, 0.6046868, 0.53792733, 0.57309884, 0.78869909, 0.52774405, 0.69139481, 0.6898036, 0.6959576, 0.62821692, 0.68083483, 0.56105924, 0.59035391, 0.54674476, 0.73587191, 0.75334722, 0.74477309, 0.80102426, 0.64415872, 0.56861484, 0.52249408, 0.58139539, 0.82671714, 0.74536121, 0.77357376, 0.70629591, 0.40831181, 0.66388333, 0.60138464, 0.82425904, 0.74319339, 0.67951113, 0.70138073, 0.77270043, 0.78673518, 0.71917921, 0.78075945, 0.7851581, 0.70233256, 0.78714317, 0.58923429, 0.64612728, 0.57743275, 0.67893326, 0.77131927, 0.54793221, 0.79382974, 0.7107833, 0.65085924, 0.43631819, 0.79180443, 0.69630986, 0.8032608, 0.6845845, 0.67379141, 0.60308737, 0.49806148, 0.7866559, 0.49621186, 0.80384904, 0.79054737, 0.834279, 0.59118289, 0.71131784, 0.74815035, 0.7889083, 0.61007786, 0.70626193, 0.31144464, 0.52034998, 0.85465163, 0.64460927, 0.72596329, 0.62866676, 0.76025343, 0.83724588, 0.77664566, 0.75103313, 0.61436921, 0.54367328, 0.80471706, 0.6748184, 0.67683917, 0.71606731, 0.74000257, 0.53928262, 0.68466693, 0.71783888, 0.70335972, 0.71639985, 0.67988008, 0.61855751, 0.81000304, 0.87668842, 0.69470048, 0.76580679, 0.48343068, 0.72444898, 0.66337252, 0.77593547, 0.62781852, 0.77682769, 0.46797526, 0.70053804, 0.65706426, 0.63090724, 0.77131999, 0.56111509, 0.72701353, 0.72136265, 0.72380322, 0.71729803, 0.72381616, 0.76242697, 0.7080372, 0.39157203, 0.65572792, 0.69320512, 0.73564279, 0.76948404, 0.67487121, 0.57728362, 0.75574845, 0.63954127, 0.66635221, 0.71936178, 0.52472252, 0.57108057, 0.66900474, 0.85823673, 0.73608321, 0.77027607, 0.61897272, 0.74442381, 0.67931819, 0.85658896, 0.66424078, 0.68785131, 0.64514256, 0.69124854, 0.66538209, 0.77916342, 0.74337095, 0.78974795, 0.4712033, 0.70299804, 0.62414801, 0.73369694, 0.64495879, 0.6124332, 0.8495757, 0.6273331, 0.78814358, 0.66349941, 0.72749186, 0.73088801, 0.71480709, 0.66348439, 0.71329457, 0.59407598, 0.67095512, 0.59238553, 0.58496535, 0.64888179]
gs_scores
[0.2, 2.0, 2.4, 0.0, 2.6, 3.4, 0.0, 1.0, 1.6, 2.2, 2.0, 1.4, 3.8, 1.2, 1.6, 0.0, 0.4, 1.0, 2.6, 3.4, 1.2, 0.4, 3.2, 2.4, 1.6, 0.0, 1.8, 0.6, 0.2, 2.2, 3.4, 1.4, 2.2, 1.0, 2.6, 0.0, 2.4, 2.6, 0.2, 1.4, 2.6, 2.2, 2.6, 2.0, 3.2, 0.8, 0.4, 2.0, 2.0, 2.2, 1.6, 2.8, 2.4, 3.2, 2.8, 3.0, 0.4, 0.0, 2.0, 0.4, 1.8, 3.2, 2.8, 0.8, 2.8, 0.2, 2.66666666666667, 0.0, 0.8, 0.8, 1.8, 1.8, 4.0, 0.4, 0.0, 0.8, 0.6, 0.8, 1.2, 2.4, 0.0, 0.0, 3.4, 1.0, 2.6, 2.0, 1.8, 0.8, 0.8, 1.6, 0.4, 1.5, 0.6, 0.6, 2.0, 2.0, 0.0, 1.4, 2.6, 2.4, 0.8, 1.2, 1.6, 0.2, 0.8, 1.8, 1.0, 1.5, 1.2, 2.4, 2.0, 1.0, 1.2, 1.6, 0.0, 2.2, 0.6, 1.4, 2.6, 0.0, 2.2, 1.6, 0.8, 2.6, 1.4, 0.0, 0.8, 0.2, 1.4, 1.2, 0.8, 2.4, 0.0, 2.2, 1.0, 2.2, 1.2, 0.6, 2.0, 1.6, 0.0, 0.8, 1.6, 3.0, 1.6, 0.0, 0.0, 2.6, 0.0, 1.6, 0.4, 0.6, 2.8, 2.6, 2.0, 1.0, 0.8, 1.8, 2.8, 2.8, 1.6, 1.33333333333333, 1.8, 1.8, 2.0, 1.6, 0.8, 0.6, 1.2, 1.0, 2.2, 0.0, 3.0, 0.2, 0.8, 2.4, 1.2, 2.25, 0.0, 1.2, 3.2, 2.4, 0.0, 0.4, 0.8, 0.0, 1.6, 1.2, 1.4]
spearmanr
<function spearmanr at 0x7fe1abaf5488>
spearmanr(sys_scores, gs_scores)
6735 Segmentation fault
