nyu-dl / dl4ir-doc2query
License: BSD 3-Clause "New" or "Revised" License
Sorry for the rudimentary question.
To predict queries, I run the following command as instructed in README.md:
python ./OpenNMT-py/translate.py \
-gpu 0 \
-model ${DATA_DIR}/doc2query_step_10000.pt \
-src ${DATA_DIR}/opennmt_format/src-collection.txt \
-output ${DATA_DIR}/opennmt_format/pred-collection_beam5.txt \
-batch_size 32 \
-beam_size 5 \
--n_best 5 \
-replace_unk \
-report_time
Then, I got an error as follows:
Traceback (most recent call last):
File "/home/work/doc2query/./OpenNMT-py/translate.py", line 46, in <module>
main(opt)
File "/home/work/doc2query/./OpenNMT-py/translate.py", line 25, in main
translator.translate(
File "/home/work/doc2query/OpenNMT-py/onmt/translate/translator.py", line 314, in translate
batch_data = self.translate_batch(
File "/home/work/doc2query/OpenNMT-py/onmt/translate/translator.py", line 498, in translate_batch
return self._translate_batch(
File "/home/work/doc2query/OpenNMT-py/onmt/translate/translator.py", line 650, in _translate_batch
beam.advance(log_probs, attn)
File "/home/work/doc2query/OpenNMT-py/onmt/translate/beam_search.py", line 155, in advance
torch.div(self.topk_ids, vocab_size, out=self._batch_index)
RuntimeError: result type Float can't be cast to the desired output type Long
ref:
self._batch_index = torch.empty([batch_size, beam_size],
dtype=torch.long, device=mb_device)
It seems self._batch_index would have to be torch.float for the torch.div call to succeed, but changing its dtype causes another error.
Does anyone know how to fix it?
Thanks in advance!
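In case it helps others hitting the same error: a common workaround (my assumption, not an official fix from this repo) is to pass torch.div an explicit rounding_mode, available since PyTorch 1.8, so the result stays integral and can be written into the long-typed out tensor. A minimal sketch with hypothetical shapes:

```python
import torch

# Toy shapes standing in for the beam-search state (hypothetical sizes).
batch_size, beam_size, vocab_size = 2, 5, 100
topk_ids = torch.randint(0, batch_size * vocab_size, (batch_size, beam_size))
batch_index = torch.empty([batch_size, beam_size], dtype=torch.long)

# Since PyTorch 1.8, torch.div on integer tensors performs true division
# and returns a float, which cannot be cast into a long `out` tensor.
# Requesting floor rounding keeps the result integral:
torch.div(topk_ids, vocab_size, rounding_mode="floor", out=batch_index)
```

With this change the line in beam_search.py no longer mixes a float result with a long output tensor.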
Hi, I noticed that the TREC CAR 2017 result is also presented in your paper. Would you mind releasing the trained model file for TREC CAR, as you have already done for MS MARCO? Thanks a lot!
Is it the same path?
Thank you!
@rodrigonogueira4
Hi,
I tried to run evaluation using the provided data and checkpoint, but ran into an error on the line for item in result:
The full stack trace is below:
Original stack trace for 'input_pipeline_task0/while/IteratorGetNext':
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 664, in launch_instance
app.start()
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
self._handle_recv()
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-9-300050db5ac8>", line 2, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "<ipython-input-8-f2c74e0682f2>", line 100, in main
print(list(result)[0])
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3072, in predict
yield_single_examples=yield_single_examples):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 622, in predict
features, None, ModeKeys.PREDICT, self.config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2857, in _call_model_fn
config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3148, in _model_fn
input_holders.generate_infeed_enqueue_ops_and_dequeue_fn())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1428, in generate_infeed_enqueue_ops_and_dequeue_fn
self._invoke_input_fn_and_record_structure())
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1547, in _invoke_input_fn_and_record_structure
wrap_fn(device=host_device, op_fn=enqueue_ops_fn))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3693, in _wrap_computation_in_while_loop_with_stopping_signals
parallel_iterations=1)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2753, in while_loop
return_same_structure)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2245, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/control_flow_ops.py", line 2170, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3681, in computation
return_value = op_fn()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1021, in enqueue_ops_fn
features, labels = inputs.features_and_labels() # Calls get_next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3923, in features_and_labels
inputs_with_signals = self._iterator.get_next()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
output_shapes=output_shapes, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
I was unable to debug the error, so I was wondering whether you are able to reproduce this, or whether it is a Colab issue.
Sorry, I'm new to this paper.
What is the training data for the doc2query transformer?
I guess the training data is pairs of original documents and their corresponding queries.
Thank you!!
@rodrigonogueira4
https://msmarco.blob.core.windows.net/msmarcoranking/collectionandqueries.tar.gz
The data above does not contain negative docs.
@rodrigonogueira4 Thank you!!!
The data is from readme here: https://github.com/nyu-dl/dl4ir-doc2query#ms-marco
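For what it's worth, OpenNMT-py expects line-aligned source/target files: line i of src-train.txt holds a passage and line i of tgt-train.txt holds a query for it. A toy sketch with made-up pairs (not real MS MARCO data):

```python
import os
import tempfile

# Hypothetical passage/query pairs, line-aligned across the two files.
src_lines = [
    "the manhattan project was a research and development undertaking",
    "presence of communication amid scientific minds was important",
]
tgt_lines = [
    "what was the manhattan project",
    "why is communication important in science",
]

out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "src-train.txt"), "w") as f:
    f.write("\n".join(src_lines) + "\n")
with open(os.path.join(out_dir, "tgt-train.txt"), "w") as f:
    f.write("\n".join(tgt_lines) + "\n")
```

The preprocess.py command shown below consumes exactly this pair of files via -train_src and -train_tgt.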
[tong.guo@gpu-4 dl4ir-doc2query-master]$ /data4/tong.guo/Py36/bin/python3 ./OpenNMT-py/preprocess.py \
-train_src ${DATA_DIR}/opennmt_format/src-train.txt \
-train_tgt ${DATA_DIR}/opennmt_format/tgt-train.txt \
-valid_src ${DATA_DIR}/opennmt_format/src-dev.txt \
-valid_tgt ${DATA_DIR}/opennmt_format/tgt-dev.txt \
-save_data ${DATA_DIR}/opennmt_format/preprocessed \
-src_seq_length 10000 \
-tgt_seq_length 10000 \
-src_seq_length_trunc 400 \
-tgt_seq_length_trunc 100 \
-dynamic_dict \
-share_vocab \
-src_vocab_size 32000 \
-tgt_vocab_size 32000 \
-shard_size 100000
[2019-07-05 16:19:16,352 INFO] Extracting features...
[2019-07-05 16:19:16,352 INFO] * number of source features: 0.
[2019-07-05 16:19:16,352 INFO] * number of target features: 0.
[2019-07-05 16:19:16,352 INFO] Building `Fields` object...
[2019-07-05 16:19:16,352 INFO] Building & saving training data...
[2019-07-05 16:19:16,352 INFO] Reading source and target files: ./data/opennmt_format/src-train.txt ./data/opennmt_format/tgt-train.txt.
[2019-07-05 16:19:16,439 INFO] Building shard 0.
[2019-07-05 16:19:35,996 INFO] * saving 0th train data shard to ./data/opennmt_format/preprocessed.train.0.pt.
[2019-07-05 16:20:19,033 INFO] Building shard 1.
[2019-07-05 16:20:38,667 INFO] * saving 1th train data shard to ./data/opennmt_format/preprocessed.train.1.pt.
[2019-07-05 16:21:20,578 INFO] Building shard 2.
[2019-07-05 16:21:38,967 INFO] * saving 2th train data shard to ./data/opennmt_format/preprocessed.train.2.pt.
[2019-07-05 16:22:18,861 INFO] Building shard 3.
[2019-07-05 16:22:38,026 INFO] * saving 3th train data shard to ./data/opennmt_format/preprocessed.train.3.pt.
[2019-07-05 16:23:17,526 INFO] Building shard 4.
[2019-07-05 16:23:36,450 INFO] * saving 4th train data shard to ./data/opennmt_format/preprocessed.train.4.pt.
[2019-07-05 16:24:14,980 INFO] Building shard 5.
[2019-07-05 16:24:21,203 INFO] * saving 5th train data shard to ./data/opennmt_format/preprocessed.train.5.pt.
[2019-07-05 16:24:33,794 INFO] Building & saving validation data...
[2019-07-05 16:24:33,794 INFO] Reading source and target files: ./data/opennmt_format/src-dev.txt ./data/opennmt_format/tgt-dev.txt.
[2019-07-05 16:24:33,798 INFO] Building shard 0.
[2019-07-05 16:24:35,017 INFO] * saving 0th valid data shard to ./data/opennmt_format/preprocessed.valid.0.pt.
[2019-07-05 16:24:37,548 INFO] Building & saving vocabulary...
[2019-07-05 16:24:56,466 INFO] * reloading ./data/opennmt_format/preprocessed.train.0.pt.
[2019-07-05 16:25:20,221 INFO] * reloading ./data/opennmt_format/preprocessed.train.1.pt.
[2019-07-05 16:25:43,888 INFO] * reloading ./data/opennmt_format/preprocessed.train.2.pt.
[2019-07-05 16:26:07,105 INFO] * reloading ./data/opennmt_format/preprocessed.train.3.pt.
[2019-07-05 16:26:30,643 INFO] * reloading ./data/opennmt_format/preprocessed.train.4.pt.
[2019-07-05 16:26:41,342 INFO] * reloading ./data/opennmt_format/preprocessed.train.5.pt.
[2019-07-05 16:26:42,716 INFO] * tgt vocab size: 32004.
[2019-07-05 16:26:45,684 INFO] * src vocab size: 32002.
[2019-07-05 16:26:45,684 INFO] * merging src and tgt vocab...
[2019-07-05 16:26:50,727 INFO] * merged vocab size: 32004.
Traceback (most recent call last):
File "./OpenNMT-py/preprocess.py", line 155, in <module>
main(opt)
File "./OpenNMT-py/preprocess.py", line 140, in main
build_save_vocab(train_dataset_files, fields, opt)
File "./OpenNMT-py/preprocess.py", line 91, in build_save_vocab
torch.save(fields, vocab_path)
File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/serialization.py", line 218, in save
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/serialization.py", line 143, in _with_file_like
return body(f)
File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/serialization.py", line 218, in <lambda>
return _with_file_like(f, "wb", lambda f: _save(obj, f, pickle_module, pickle_protocol))
File "/data4/tong.guo/Py36/lib/python3.6/site-packages/torch/serialization.py", line 291, in _save
pickler.dump(obj)
_pickle.PicklingError: Can't pickle <function Field.<lambda> at 0x7ffaad122d90>: attribute lookup Field.<lambda> on torchtext.data.field failed
torch 1.0.0
torchfile 0.1.0
torchtext 0.3.1
torchvision 0.2.1
OpenNMT-py 0.8.2
python3.6
Thank you!
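As general background (standard Python pickling behavior, not specific to OpenNMT-py): torch.save serializes with pickle, and pickle cannot serialize lambdas because anonymous functions have no importable name. The traceback suggests a Field attribute holding a lambda, which is why the torch/torchtext version combination matters. A self-contained reproduction of the failure, and the usual remedy (a named module-level function):

```python
import pickle

def lowercase(x):
    # A module-level named function pickles by qualified name and is
    # re-imported on load, so torch.save-style pickling succeeds.
    return x.lower()

class Field:
    def __init__(self, use_lambda):
        # Storing a lambda as an attribute reproduces the failure mode:
        # pickle cannot serialize anonymous functions.
        self.preprocessing = (lambda x: x.lower()) if use_lambda else lowercase

try:
    pickle.dumps(Field(use_lambda=True))
    print("lambda pickled")  # not reached
except Exception as e:
    print("pickling failed:", type(e).__name__)

# The named-function version round-trips cleanly.
restored = pickle.loads(pickle.dumps(Field(use_lambda=False)))
print(restored.preprocessing("ABC"))
```

In practice the fix reported for this repo's setup is usually to install the torchtext version matching the pinned OpenNMT-py release rather than to patch the library, but the snippet shows why the error occurs.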
@rodrigonogueira4
I know question-question matching is a text-similarity problem.
What about question-answer or question-document matching? That is what is used in information retrieval.
Question-question matching is indeed text similarity, but how do you define question-answer similarity?
Thank you!!
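For context: doc2query does not score question-answer similarity directly. It appends model-predicted queries to each document and then runs standard term matching (BM25 in the paper) over the expanded text, which turns vocabulary mismatch into ordinary retrieval. A minimal sketch of the expansion step, with hypothetical data:

```python
def expand_document(doc, predicted_queries):
    # Append predicted queries so a term-matching retriever (e.g. BM25)
    # sees query-like vocabulary when the document is indexed.
    return doc + " " + " ".join(predicted_queries)

doc = "the manhattan project produced the first nuclear weapons"
queries = ["what was the manhattan project",
           "who made the first nuclear weapon"]
print(expand_document(doc, queries))
```

The expanded string, not the raw document, is what gets indexed; retrieval itself then needs no explicit question-answer similarity function.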