macaw's Introduction

Macaw: An Extensible Conversational Information Seeking Platform

Conversational information seeking (CIS) has been recognized as a major emerging research area in information retrieval. Such research requires data and tools that allow the implementation and study of conversational systems. Macaw is an open-source framework with a modular architecture for CIS research. Macaw supports multi-turn, multi-modal, and mixed-initiative interactions for tasks such as document retrieval, question answering, recommendation, and structured data exploration. Its modular design encourages the study of new CIS algorithms, which can be evaluated in batch mode. It can also integrate with a user interface, which allows user studies and data collection in an interactive mode, where the back end can be fully algorithmic or a Wizard of Oz setup.

Macaw could be of interest to researchers and practitioners working on information retrieval, natural language processing, and dialogue systems.

For more information on Macaw, please refer to this paper.

Macaw Architecture

Macaw has a modular architecture, which allows further development and extension. The high-level architecture of Macaw is presented below:

The high-level architecture of Macaw

For more information on each module in Macaw, refer to this paper.

Interfaces

Macaw supports the following interfaces:

  • Standard IO: For development purposes
  • File IO: For batch experiments (see the examples in the data folder for input and output file formats)
  • Telegram bot: For interaction with real users

Here is an example of the Telegram interface for Macaw. It supports multi-modal interactions (text, speech, click, etc).

Telegram interface for Macaw

Retrieval

Macaw features the following search engines:

  • Indri: an open-source search engine that can be used for any arbitrary text collection.
  • Bing web search API: sending a request to the Bing API and getting the results.

Answer Selection and Generation

For question answering, Macaw only features the DrQA model in its current version.

Installation

Macaw requires Python >= 3.6 and pip3. If you don't have setuptools, run sudo pip3 install setuptools. To install Macaw, first clone macaw from this repo and then follow the installation steps below. The installation commands shown here were written for Ubuntu. You can use the same or similar commands on other Linux distributions. If you are using Windows 10, we recommend installing Macaw and all the required packages on Windows Subsystem for Linux.
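Before starting the steps below, it can help to confirm that the interpreter meets the requirements stated above. The following is a minimal sketch and not part of Macaw: the function name check_prerequisites is made up for illustration, and it checks only what this section mentions (Python >= 3.6, pip3, and setuptools).

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in this README


def check_prerequisites(version_info=sys.version_info):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    found = tuple(version_info[:2])
    if found < MIN_PYTHON:
        problems.append("Python >= 3.6 is required, found %d.%d" % found)
    if shutil.which("pip3") is None:
        problems.append("pip3 was not found on PATH")
    if importlib.util.find_spec("setuptools") is None:
        problems.append("setuptools is missing (run: sudo pip3 install setuptools)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the file with python3 prints one warning per unmet requirement and nothing when all three checks pass.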

Step 1: Installing MongoDB server

Macaw uses MongoDB for storing and retrieving user interactions (conversations). To install MongoDB server, run the following command:

sudo apt-get install mongodb-server-core

Step 2: Installing Indri and Pyndri

Indri is an open-source search engine for information retrieval research, implemented as part of the Lemur Project. Pyndri is a Python interface to Indri. Macaw uses Indri for retrieving documents from an arbitrary text collection. To install Indri, first download it from https://sourceforge.net/projects/lemur/files/lemur/. As suggested by pyndri, we have used Indri-5.11. This version can be installed as follows:

# download indri-5.11.tar.gz
sudo apt install g++ zlib1g-dev
tar xzvf indri-5.11.tar.gz
rm indri-5.11.tar.gz
cd indri-5.11
./configure CXX="g++ -D_GLIBCXX_USE_CXX11_ABI=0"
make
sudo make install

Then, clone the pyndri repository from https://github.com/cvangysel/pyndri and run the following command:

python3 setup.py install

At this point, you can verify that the installation is complete by running the pyndri tests.
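Besides the pyndri tests, a quick query against an existing Indri index can confirm that the Python binding loads. The sketch below is an illustration under assumptions: the helper name top_documents and the index path are placeholders, and it relies on the Index, query, and document calls as they appear in the pyndri documentation. It returns None when pyndri cannot be imported, so it is safe to run before Step 2 is finished.

```python
def top_documents(index_path, query, k=5):
    """Return up to k (external_doc_id, score) pairs, or None without pyndri."""
    try:
        import pyndri  # only importable after Step 2 has succeeded
    except ImportError:
        return None
    index = pyndri.Index(index_path)
    # query() yields (internal document id, retrieval score) pairs.
    results = index.query(query, results_requested=k)
    hits = []
    for internal_id, score in results:
        # document() returns the external id and the document's token ids.
        external_id, _tokens = index.document(internal_id)
        hits.append((external_id, score))
    return hits
```

For example, top_documents("/path/to/indri/index", "hello world") would list the five best-scoring documents, where the path stands for an index you have already built.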

Step 3: Installing Stanford Core NLP

Stanford Core NLP can be used for tokenization and, most importantly, for co-reference resolution. If you do not need co-reference resolution, you can skip this step. Stanford Core NLP requires Java. Download and unpack Stanford Core NLP with the following commands:

wget -O "stanford-corenlp-full-2017-06-09.zip" "http://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip"
sudo apt-get install unzip
unzip "stanford-corenlp-full-2017-06-09.zip"
rm "stanford-corenlp-full-2017-06-09.zip"

If you don't have Java, install it using:

sudo apt-get install default-jre

Step 4: Installing DrQA

Macaw also supports answer extraction / generation for user queries from retrieved documents. For this purpose, it features DrQA. If you do not need this functionality, ignore this step (you can also install this later). To install DrQA, run the following commands:

git clone https://github.com/facebookresearch/DrQA.git
cd DrQA
pip3 install -r requirements.txt
pip3 install torch
sudo python3 setup.py develop

To use the pre-trained DrQA model, run the following command:

./download.sh

This downloads a 7.5GB (compressed) file and requires 25GB (uncompressed) space. This may take a while!
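Because of these sizes, it may be worth checking free disk space before starting the download. The sketch below is not part of DrQA or Macaw; the function name is hypothetical, and it assumes (conservatively) that the compressed archive and the unpacked data need to coexist on the same disk during extraction.

```python
import shutil

GIB = 1024 ** 3
COMPRESSED_GB = 7.5     # download size stated above
UNCOMPRESSED_GB = 25.0  # unpacked size stated above


def has_room_for_drqa(path=".", free_bytes=None):
    """True if `path` has space for the archive plus the unpacked data."""
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    # Assumption: archive and extracted files exist side by side for a while.
    needed = (COMPRESSED_GB + UNCOMPRESSED_GB) * GIB
    return free_bytes >= needed
```

Calling has_room_for_drqa() from inside the DrQA directory checks the disk that holds it; the free_bytes argument exists only so the logic can be exercised without touching the file system.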

Step 5: Installing FFmpeg

To support speech interactions with users, Macaw requires FFmpeg for some multimedia processing steps. If you don't need speech support from Macaw, you can skip this step. To install FFmpeg, run the following command:

sudo apt-get install ffmpeg
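Steps 3 and 5 are optional, so it is easy to lose track of which external programs are actually present. The following sketch (an illustration only; the dictionary and function names are invented here) reports which of the optional executables mentioned in those steps are missing from PATH.

```python
import shutil

# Executable name -> the optional feature that needs it, per the steps above.
OPTIONAL_TOOLS = {
    "java": "Stanford Core NLP (co-reference resolution)",
    "ffmpeg": "speech interactions",
}


def missing_tools(which=shutil.which):
    """Return {executable: feature} for tools that are not on PATH."""
    return {
        name: feature
        for name, feature in OPTIONAL_TOOLS.items()
        if which(name) is None
    }
```

An empty result means both programs were found; otherwise the returned dictionary names the feature that will be unavailable.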

Step 6: Installing Macaw

After cloning Macaw, use the following commands for installation:

cd macaw
sudo pip3 install -r requirements.txt
sudo python3 setup.py install

Running Macaw

If you run Macaw in interactive (or live) mode, first start the MongoDB server using the following command:

sudo mongod

Note that this command uses the default database directory (/data/db) for storing the data. You may need to create this directory if it does not exist yet. You can also use another location via the --dbpath argument.
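To confirm that the server is up before launching Macaw, a plain TCP connection test is usually enough. The sketch below uses only the standard library and assumes mongod is listening on its default port, 27017, on the local machine; the function name is made up for this example.

```python
import socket


def mongod_is_listening(host="localhost", port=27017, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

This only shows that something accepts connections on that port; it does not verify that the process is MongoDB or that Macaw can write to it.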

We provide three different main scripts (i.e., app):

  • live_main.py: An interactive conversational search and question answering system. It can use both STDIO and Telegram interfaces.
  • batch_ext_main.py: A model for running experiments on a reusable dataset. This main script uses FILEIO as the interface.
  • wizard_of_oz_main.py: A main script for Wizard of Oz experiments.

After selecting the desired main script, open the Python file and provide the required parameters. For example, to run the live_main.py script you need to supply your Bing subscription key (if using Bing), the path to the Indri index (if using Indri), the Telegram bot token (if using the Telegram interface), etc. You can then run the chosen main script as follows:

python3 live_main.py
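One way to avoid pasting secrets such as the Bing key or the Telegram token into the script is to read them from environment variables and fail early when one is missing. This is only a sketch of that idea: the variable names (MACAW_BING_KEY and so on) and the feature labels are hypothetical and are not the parameter names used in live_main.py.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_ENV = {
    "bing": "MACAW_BING_KEY",            # Bing subscription key
    "indri": "MACAW_INDRI_INDEX",        # path to the Indri index
    "telegram": "MACAW_TELEGRAM_TOKEN",  # Telegram bot token
}


def collect_settings(features, environ=os.environ):
    """Map each requested feature to its value, failing early if one is unset."""
    settings = {}
    missing = []
    for feature in features:
        variable = REQUIRED_ENV[feature]
        value = environ.get(variable)
        if value:
            settings[feature] = value
        else:
            missing.append(variable)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return settings
```

For instance, collect_settings(["bing", "telegram"]) returns both values or raises an error naming every unset variable, and the values can then be copied into whatever parameters the chosen main script expects.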

Bug Report and Feature Request

For bug reports and feature requests, you can open an issue on GitHub or send an email to Hamed Zamani at [email protected].

Citation

If you found Macaw useful, you can cite the following article:

Hamed Zamani and Nick Craswell, "Macaw: An Extensible Conversational Information Seeking Platform", arXiv preprint arXiv:1912.08904, 2019.

bibtex:

@article{macaw,
  title={Macaw: An Extensible Conversational Information Seeking Platform},
  author={Zamani, Hamed and Craswell, Nick},
  journal={arXiv preprint arXiv:1912.08904},
  year={2019},
}

License

Macaw is distributed under the MIT License. See the LICENSE file for more information.

Contribution

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

macaw's People

Contributors

hamed-zamani, microsoft-github-operations[bot], microsoftopensource

macaw's Issues

Integrate with Docker to make it reproducible easily and in all OS

The current dependencies (pyndri, Indri-5.11, DrQA, etc.) require specific versions of C++, pip, python3, etc. to run. This makes it very difficult or impossible to run Macaw on newer Ubuntu/Linux systems. Also, Indri doesn't provide support for Mac OSX other than 10.11.3. To remove these issues, we can add integration with Docker so that Macaw can run in any OS with the fixed set of dependencies that it has been tested with.

Installation on a Mac

Hello,

I have been trying to install macaw on my Mac OS Catalina 10.15.2 (Python 3.7.6).

First, I was not able to install indri-5.11.tar.gz. I was only able to do it using the version from this GitHub repository: https://github.com/diazf/indri. Now the pyndri installation is not working. I get the following message:

Collecting pyndri
Using cached https://files.pythonhosted.org/packages/d6/ee/e1c2f865d5f7471167cbed85c945321cc4842ce245de5a5e374e3b5e4563/pyndri-0.4.tar.gz
Installing collected packages: pyndri
Running setup.py install for pyndri ... error
ERROR: Command errored out with exit status 1:
command: /Users/gustavopenha/personal/macaw/env/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/mp/hx7qsrv11bl5gm0hfsh2c2840000gn/T/pip-install-pr6bswwt/pyndri/setup.py'"'"'; file='"'"'/private/var/folders/mp/hx7qsrv11bl5gm0hfsh2c2840000gn/T/pip-install-pr6bswwt/pyndri/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/mp/hx7qsrv11bl5gm0hfsh2c2840000gn/T/pip-record-xy5sx3p3/install-record.txt --single-version-externally-managed --compile --install-headers /Users/gustavopenha/personal/macaw/env/include/site/python3.7/pyndri
cwd: /private/var/folders/mp/hx7qsrv11bl5gm0hfsh2c2840000gn/T/pip-install-pr6bswwt/pyndri/
Complete output (256 lines):
running install
running build
running build_py
creating build
creating build/lib.macosx-10.14-x86_64-3.7
creating build/lib.macosx-10.14-x86_64-3.7/pyndri
copying py/compat.py -> build/lib.macosx-10.14-x86_64-3.7/pyndri
copying py/__init__.py -> build/lib.macosx-10.14-x86_64-3.7/pyndri
copying py/utils.py -> build/lib.macosx-10.14-x86_64-3.7/pyndri
copying py/dictionary.py -> build/lib.macosx-10.14-x86_64-3.7/pyndri
running build_ext
building 'pyndri_ext' extension
creating build/temp.macosx-10.14-x86_64-3.7
creating build/temp.macosx-10.14-x86_64-3.7/src
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Tk.framework/Versions/8.5/Headers -D_GLIBCXX_USE_CXX11_ABI=0 -DP_NEEDS_GNU_CXX_NAMESPACE=1 -D_FILE_OFFSET_BITS=64 -UNDEBUG -I/usr/local/include -I/usr/local/opt/[email protected]/include -I/usr/local/opt/sqlite/include -I/Users/gustavopenha/personal/macaw/env/include -I/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c src/pyndri.cpp -o build/temp.macosx-10.14-x86_64-3.7/src/pyndri.o
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:22:
In file included from /usr/local/include/indri/QuerySpec.hpp:25:
In file included from /usr/local/include/indri/Packer.hpp:22:
/usr/local/include/indri/XMLNode.hpp:189:18: warning: comparison of integers of different signs: 'unsigned int' and 'int' [-Wsign-compare]
if( mainLength != length ) {
~~~~~~~~~~ ^ ~~~~~~
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:22:
/usr/local/include/indri/QuerySpec.hpp:701:9: warning: field '_windowSize' will be initialized after field '_children' [-Wreorder]
_windowSize(windowSize),
^
/usr/local/include/indri/QuerySpec.hpp:1698:16: warning: unused variable 'accumulator' [-Wunused-variable]
UINT64 accumulator = 53;
^
/usr/local/include/indri/QuerySpec.hpp:2712:9: warning: field '_contextSize' will be initialized after field '_terms' [-Wreorder]
_contextSize(0),
^
/usr/local/include/indri/QuerySpec.hpp:3229:9: warning: field '_child' will be initialized after field '_exponent' [-Wreorder]
_child(child),
^
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:21:
In file included from /usr/local/include/indri/BeliefNode.hpp:22:
In file included from /usr/local/include/indri/InferenceNetworkNode.hpp:23:
In file included from /usr/local/include/indri/Index.hpp:29:
In file included from /usr/local/include/indri/VocabularyIterator.hpp:22:
In file included from /usr/local/include/indri/DiskTermData.hpp:22:
/usr/local/include/lemur/Keyfile.hpp:26:19: warning: field '_handleSize' will be initialized after field '_handle' [-Wreorder]
Keyfile() : _handleSize(0), _handle(NULL) {
^
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:21:
In file included from /usr/local/include/indri/BeliefNode.hpp:22:
In file included from /usr/local/include/indri/InferenceNetworkNode.hpp:23:
In file included from /usr/local/include/indri/Index.hpp:30:
In file included from /usr/local/include/indri/TermList.hpp:23:
/usr/local/include/indri/FieldExtent.hpp:32:9: warning: field 'number' will be initialized after field 'ordinal' [-Wreorder]
number(_number), ordinal(_ordinal),
^
/usr/local/include/indri/FieldExtent.hpp:32:26: warning: field 'ordinal' will be initialized after field 'parentOrdinal' [-Wreorder]
number(_number), ordinal(_ordinal),
^
/usr/local/include/indri/FieldExtent.hpp:39:9: warning: field 'number' will be initialized after field 'ordinal' [-Wreorder]
number(f.number), ordinal(f.ordinal),
^
/usr/local/include/indri/FieldExtent.hpp:39:27: warning: field 'ordinal' will be initialized after field 'parentOrdinal' [-Wreorder]
number(f.number), ordinal(f.ordinal),
^
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:23:
/usr/local/include/indri/ListIteratorNode.hpp:47:13: warning: unused variable 'sorted' [-Wunused-variable]
int sorted=0;
^
/usr/local/include/indri/ListIteratorNode.hpp:67:13: warning: unused variable 'sorted' [-Wunused-variable]
int sorted=0;
^
/usr/local/include/indri/ListIteratorNode.hpp:117:25: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
while((_lastpos < exts.size()) && (exts[_lastpos].begin < begin)){
~~~~~~~~ ^ ~~~~~~~~~~~
/usr/local/include/indri/ListIteratorNode.hpp:122:25: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
while((_lastpos < exts.size()) && (exts[_lastpos].begin < end)) {
~~~~~~~~ ^ ~~~~~~~~~~~
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:25:
In file included from /usr/local/include/indri/Repository.hpp:26:
In file included from /usr/local/include/indri/MemoryIndex.hpp:36:
/usr/local/include/indri/FieldStatistics.hpp:34:9: warning: field 'lastCount' will be initialized after field 'byteOffset' [-Wreorder]
lastCount(0),
^
/usr/local/include/indri/FieldStatistics.hpp:48:9: warning: field 'lastCount' will be initialized after field 'byteOffset' [-Wreorder]
lastCount(0),
^
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:25:
In file included from /usr/local/include/indri/Repository.hpp:26:
In file included from /usr/local/include/indri/MemoryIndex.hpp:37:
/usr/local/include/indri/CorpusStatistics.hpp:27:43: warning: field 'totalDocuments' will be initialized after field 'baseDocument' [-Wreorder]
CorpusStatistics() : totalTerms(0), totalDocuments(0), baseDocument(0), maximumDocument(0), uniqueTerms(0), maximumDocumentLength(0) {}
^
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:25:
In file included from /usr/local/include/indri/Repository.hpp:26:
In file included from /usr/local/include/indri/MemoryIndex.hpp:39:
/usr/local/include/indri/ReadersWritersLock.hpp:88:9: warning: field '_tail' will be initialized after field '_head' [-Wreorder]
_tail(0),
^
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:25:
In file included from /usr/local/include/indri/Repository.hpp:26:
/usr/local/include/indri/MemoryIndex.hpp:57:11: warning: field 'list' will be initialized after field 'next' [-Wreorder]
list(allocator),
^
In file included from src/pyndri.cpp:14:
In file included from /usr/local/include/indri/LocalQueryServer.hpp:26:
In file included from /usr/local/include/indri/QueryServer.hpp:23:
In file included from /usr/local/include/indri/InferenceNetwork.hpp:29:
In file included from /usr/local/include/indri/DocumentStructureHolderNode.hpp:29:
/usr/local/include/indri/DocumentStructure.hpp:50:7: warning: field '_index' will be initialized after field '_numNodes' [-Wreorder]
: _index( &index ) ,
^
/usr/local/include/indri/DocumentStructure.hpp:59:5: warning: field '_index' will be initialized after field '_numNodes' [-Wreorder]
_index( &index ) ,
^
/usr/local/include/indri/DocumentStructure.hpp:67:5: warning: field '_index' will be initialized after field '_numNodes' [-Wreorder]
_index( 0 ) ,
^
In file included from src/pyndri.cpp:20:
In file included from /usr/local/include/indri/QueryEnvironment.hpp:25:
/usr/local/include/indri/NetworkStream.hpp:155:29: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
while( bytesWritten < (int) length ) {
~~~~~~~~~~~~ ^ ~~~~~~~~~~~~
src/pyndri.cpp:105:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"repository_path", NULL};
^
src/pyndri.cpp:207:17: warning: using the result of an assignment as a condition without parentheses [-Wparentheses]
while (item = PyIter_Next(iterator)) {
~~~~~^~~~~~~~~~~~~~~~~~~~~~~
src/pyndri.cpp:207:17: note: place parentheses around the assignment to silence this warning
while (item = PyIter_Next(iterator)) {
^
( )
src/pyndri.cpp:207:17: note: use '==' to turn this assignment into an equality comparison
while (item = PyIter_Next(iterator)) {
^
==
src/pyndri.cpp:470:5: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'UINT64' (aka 'unsigned long long') [-Wsign-compare]
CHECK_EQ(PyDict_Size(token2id), self->index_->uniqueTermCount());
^ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/pyndri.cpp:33:46: note: expanded from macro 'CHECK_EQ'
#define CHECK_EQ(first, second) assert(first == second)
~~~~~ ^ ~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) : (void)0)
^
src/pyndri.cpp:471:5: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'UINT64' (aka 'unsigned long long') [-Wsign-compare]
CHECK_EQ(PyDict_Size(id2token), self->index_->uniqueTermCount());
^ ~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/pyndri.cpp:33:46: note: expanded from macro 'CHECK_EQ'
#define CHECK_EQ(first, second) assert(first == second)
~~~~~ ^ ~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) : (void)0)
^
src/pyndri.cpp:472:5: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'UINT64' (aka 'unsigned long long') [-Wsign-compare]
CHECK_EQ(PyDict_Size(id2df), self->index_->uniqueTermCount());
^ ~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/pyndri.cpp:33:46: note: expanded from macro 'CHECK_EQ'
#define CHECK_EQ(first, second) assert(first == second)
~~~~~ ^ ~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) : (void)0)
^
src/pyndri.cpp:507:5: warning: comparison of integers of different signs: 'Py_ssize_t' (aka 'long') and 'UINT64' (aka 'unsigned long long') [-Wsign-compare]
CHECK_EQ(PyDict_Size(id2tf), self->index_->uniqueTermCount());
^ ~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/pyndri.cpp:33:46: note: expanded from macro 'CHECK_EQ'
#define CHECK_EQ(first, second) assert(first == second)
~~~~~ ^ ~~~~~~
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE__, #e) : (void)0)
^
src/pyndri.cpp:588:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"index", "rules", "baseline",
^
src/pyndri.cpp:588:39: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"index", "rules", "baseline",
^
src/pyndri.cpp:588:48: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"index", "rules", "baseline",
^
src/pyndri.cpp:641:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"query_str",
^
src/pyndri.cpp:642:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
"document_set",
^
src/pyndri.cpp:643:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
"results_requested",
^
src/pyndri.cpp:644:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
"include_snippets",
^
src/pyndri.cpp:679:21: warning: using the result of an assignment as a condition without parentheses [-Wparentheses]
while (item = PyIter_Next(iterator)) {
~~~~~^~~~~~~~~~~~~~~~~~~~~~~
src/pyndri.cpp:679:21: note: place parentheses around the assignment to silence this warning
while (item = PyIter_Next(iterator)) {
^
( )
src/pyndri.cpp:679:21: note: use '==' to turn this assignment into an equality comparison
while (item = PyIter_Next(iterator)) {
^
==
src/pyndri.cpp:852:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"query_env", "fb_docs", "fb_terms", NULL};
^
src/pyndri.cpp:852:43: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"query_env", "fb_docs", "fb_terms", NULL};
^
src/pyndri.cpp:852:54: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"query_env", "fb_docs", "fb_terms", NULL};
^
src/pyndri.cpp:863:16: warning: implicit conversion of NULL constant to 'int' [-Wnull-conversion]
return NULL;
~~~~~~ ^~~~
0
src/pyndri.cpp:868:16: warning: implicit conversion of NULL constant to 'int' [-Wnull-conversion]
return NULL;
~~~~~~ ^~~~
0
src/pyndri.cpp:900:30: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
static char* kwlist[] = {"query_str",
^
src/pyndri.cpp:1114:17: error: expected expression
IndexType = {
^
src/pyndri.cpp:1159:28: error: expected expression
QueryEnvironmentType = {
^
src/pyndri.cpp:1204:25: error: expected expression
QueryExpanderType = {
^
43 warnings and 3 errors generated.
error: command 'clang' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/gustavopenha/personal/macaw/env/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/mp/hx7qsrv11bl5gm0hfsh2c2840000gn/T/pip-install-pr6bswwt/pyndri/setup.py'"'"'; file='"'"'/private/var/folders/mp/hx7qsrv11bl5gm0hfsh2c2840000gn/T/pip-install-pr6bswwt/pyndri/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /private/var/folders/mp/hx7qsrv11bl5gm0hfsh2c2840000gn/T/pip-record-xy5sx3p3/install-record.txt --single-version-externally-managed --compile --install-headers /Users/gustavopenha/personal/macaw/env/include/site/python3.7/pyndri Check the logs for full command output.

Syntax Error in running `live_main.py` after following all the prerequisite steps.

Using Ubuntu and all the specified versions of the libraries given in the macaw repo readme.

Traceback (most recent call last):
  File "live_main.py", line 8, in <module>
    from macaw.core import mrc, retrieval
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/usr/local/lib/python3.8/dist-packages/macaw-0.1-py3.8.egg/macaw/core/mrc/__init__.py", line 7, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/usr/local/lib/python3.8/dist-packages/macaw-0.1-py3.8.egg/macaw/core/mrc/drqa_mrc.py", line 5, in <module>
  File "/home/asharma19/dev/DrQA/drqa/__init__.py", line 20, in <module>
    from . import tokenizers
  File "/home/asharma19/dev/DrQA/drqa/tokenizers/__init__.py", line 20, in <module>
    from .corenlp_tokenizer import CoreNLPTokenizer
  File "/home/asharma19/dev/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 14, in <module>
    import pexpect
  File "/home/asharma19/.local/lib/python3.8/site-packages/pexpect/__init__.py", line 75, in <module>
    from .pty_spawn import spawn, spawnu
  File "/home/asharma19/.local/lib/python3.8/site-packages/pexpect/pty_spawn.py", line 14, in <module>
    from .spawnbase import SpawnBase
  File "/home/asharma19/.local/lib/python3.8/site-packages/pexpect/spawnbase.py", line 224
    def expect(self, pattern, timeout=-1, searchwindowsize=-1, async=False):
                                                               ^
SyntaxError: invalid syntax
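
The SyntaxError at the end of this traceback points at a parameter named async. async has been a reserved keyword since Python 3.7, so a pexpect release that still uses it as a parameter name cannot be imported under Python 3.8. The snippet below is a small standard-library illustration of that behaviour, not code from Macaw, DrQA, or pexpect; the helper name compiles is made up here.

```python
import keyword
import sys

# The signature from the traceback, reduced to a one-line function.
OLD_SIGNATURE = (
    "def expect(self, pattern, timeout=-1, searchwindowsize=-1, async=False): pass"
)


def compiles(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    # Shows the interpreter version, whether async is a keyword there,
    # and whether the old signature still parses.
    print(sys.version_info[:2], keyword.iskeyword("async"), compiles(OLD_SIGNATURE))
```

If the signature fails to compile on your interpreter, installing a newer pexpect release or running Macaw under an older Python version are the two likely ways around the error.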

Installing Macaw in a Virtual Environment

I'm trying to get the Macaw framework up and running for a project to build a conversational search agent and was having trouble with the installation. I downloaded and installed Macaw on Ubuntu 20.04 by going through the installation process. However, when it came to running the live_main.py file I got this error message:

Traceback (most recent call last):
  File "live_main.py", line 11, in <module>
    from macaw.core import mrc, retrieval
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/usr/local/lib/python3.8/dist-packages/macaw-0.1-py3.8.egg/macaw/core/mrc/__init__.py", line 7, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/usr/local/lib/python3.8/dist-packages/macaw-0.1-py3.8.egg/macaw/core/mrc/drqa_mrc.py", line 5, in <module>
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/DrQA/drqa/__init__.py", line 20, in <module>
    from . import tokenizers
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/DrQA/drqa/tokenizers/__init__.py", line 20, in <module>
    from .corenlp_tokenizer import CoreNLPTokenizer
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 14, in <module>
    import pexpect
  File "/usr/local/lib/python3.8/dist-packages/pexpect/__init__.py", line 75, in <module>
    from .pty_spawn import spawn, spawnu
  File "/usr/local/lib/python3.8/dist-packages/pexpect/pty_spawn.py", line 14, in <module>
    from .spawnbase import SpawnBase
  File "/usr/local/lib/python3.8/dist-packages/pexpect/spawnbase.py", line 224
    def expect(self, pattern, timeout=-1, searchwindowsize=-1, async=False):
                                                               ^
SyntaxError: invalid syntax

After doing some searching, I realised this issue was down to "async" being a reserved word in Python 3.8, the default version of Python used by Ubuntu 20.04. I decided to create a virtual environment with virtualenv so that I could use the correct version. Once I had downloaded and installed everything in that virtual environment and attempted to run the live_main.py script, I received an error claiming it couldn't find the 'macaw' module. The same happened for many of the imports used in Macaw. I had to either manually install the imported modules, such as func-timeout, google, and telegram-bot, or add the directories I had downloaded, such as Macaw or DrQA, to the file's sys.path. I kept doing that and running the script until I got to this error:

Traceback (most recent call last):
  File "live_main.py", line 11, in <module>
    from macaw.core import mrc, retrieval
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/macaw/macaw/core/mrc/__init__.py", line 7, in <module>
    from macaw.core.mrc import drqa_mrc
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/macaw/macaw/core/mrc/drqa_mrc.py", line 17, in <module>
    from macaw.core.retrieval.doc import Document
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/macaw/macaw/core/retrieval/__init__.py", line 6, in <module>
    import macaw.core.retrieval.bing_api
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/macaw/macaw/core/retrieval/bing_api.py", line 9, in <module>
    from macaw.core.retrieval.doc import Document
  File "/home/patrick-easton/Documents/CSA_Project_Patrick_Easton_2021/macaw/macaw/core/retrieval/doc.py", line 9, in <module>
    import justext
ModuleNotFoundError: No module named 'justext'

It seems I have to install every module manually, but why is this? Aren't all these packages installed from Macaw's requirements.txt file? Is the issue with the virtual environment? I'm really lost with this and was hoping someone could lend a helping hand. Thanks.
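
For anyone hitting the same wall, a rough diagnostic sketch may help narrow it down. It assumes only the standard library; the helper name find_missing is hypothetical, and the module list is simply the top-level names that appear in the tracebacks above. A virtual environment has its own site-packages, so anything installed for the system Python is not visible inside it unless it is installed there again.

```python
import importlib.util
import sys


def find_missing(module_names):
    """Return the top-level module names the current interpreter cannot locate."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Which interpreter is actually running, and does it look like a virtual environment?
# (`real_prefix` is set by older virtualenv releases, `base_prefix` by newer ones.)
in_virtualenv = hasattr(sys, "real_prefix") or (
    sys.prefix != getattr(sys, "base_prefix", sys.prefix)
)
print("interpreter:", sys.executable)
print("looks like a virtual environment:", in_virtualenv)

# Import names taken from the tracebacks above. Note that the import name can
# differ from the name used with pip (func_timeout is installed as func-timeout).
wanted = ["justext", "pexpect", "func_timeout"]
print("missing:", find_missing(wanted))
```

Running this with the same interpreter that launches live_main.py shows whether the packages are absent from that environment or whether a different Python is being picked up.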

Tokenizer issues / errors and live_main.py capabilities

I am trying to get Macaw to work, expecting results similar to Figure 1 (b) in the paper. Currently, I am working from a clean ubuntu:bionic Docker image, because it provides Python 3.6 by default and lets me install Java 8 (for Stanford CoreNLP). Long story short, I am able to run python3 live_main.py and arrive at the ENTER COMMAND: prompt with stdio as the interface.

Firstly, the default simple tokenizer as set in drqa_mrc.py causes the following error (but of course does not affect retrieval of a list of URLs from Bing):

Macaw Logger - 2020-02-26 15:54:58,964 - INFO - New query: who is barack obama
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/macaw-0.1-py3.6.egg/macaw/core/input_handler/actions.py", line 104, in run_action
    return_dict[action] = func_timeout(params['timeout'], action_func, args=[conv_list, params])
  File "/usr/local/lib/python3.6/dist-packages/func_timeout/dafunc.py", line 108, in func_timeout
    raise_exception(exception)
  File "/usr/local/lib/python3.6/dist-packages/func_timeout/py3_raise.py", line 7, in raise_exception
    raise exception[0] from None
  File "/usr/local/lib/python3.6/dist-packages/macaw-0.1-py3.6.egg/macaw/core/input_handler/actions.py", line 81, in run
    return params['actions']['qa'].get_results(conv_list, doc)
  File "/usr/local/lib/python3.6/dist-packages/macaw-0.1-py3.6.egg/macaw/core/mrc/drqa_mrc.py", line 76, in get_results
    predictions = self.predictor.predict(doc, q, None, self.params['qa_results_requested'])
  File "/root/DrQA/drqa/reader/predictor.py", line 88, in predict
    results = self.predict_batch([(document, question, candidates,)], top_n)
  File "/root/DrQA/drqa/reader/predictor.py", line 128, in predict_batch
    batch_exs = batchify([vectorize(e, self.model) for e in examples])
  File "/root/DrQA/drqa/reader/predictor.py", line 128, in <listcomp>
    batch_exs = batchify([vectorize(e, self.model) for e in examples])
  File "/root/DrQA/drqa/reader/vector.py", line 33, in vectorize
    q_lemma = {w for w in ex['qlemma']} if args.use_lemma else None
TypeError: 'NoneType' object is not iterable
THE RESPONSE STARTS
----------------------------------------------------------------------
#get_doc https://www.biography.com/us-president/barack-obama  |  Barack Obama - U.S. Presidency, Education &amp; Family - Biography
#get_doc https://en.wikipedia.org/wiki/Barack_Obama  |  Barack Obama - Wikipedia
#get_doc https://www.britannica.com/biography/Barack-Obama  |  Barack Obama | Biography, Presidency, &amp; Facts | Britannica
----------------------------------------------------------------------
THE RESPONSE STARTS

... which makes sense, given the earlier warning from DrQA stating:

WARNING:drqa.tokenizers.simple_tokenizer:SimpleTokenizer only tokenizes! Skipping annotators: {'pos', 'ner', 'lemma'}
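
The failure mode can be reproduced in isolation. This is a minimal sketch, not DrQA's code: the dictionary ex and the helper question_lemmas are hypothetical stand-ins that assume the example ends up with 'qlemma' set to None when the tokenizer skips the lemma annotator.

```python
def question_lemmas(ex, use_lemma):
    """Same set comprehension as in the traceback, guarded against a missing 'qlemma'."""
    if not use_lemma or ex.get("qlemma") is None:
        return None
    return {w for w in ex["qlemma"]}


# Hypothetical example as it might look when no lemmas were produced.
ex = {"question": ["who", "is", "barack", "obama"], "qlemma": None}

# Unguarded version: iterating over None raises the TypeError seen above.
try:
    {w for w in ex["qlemma"]}
    error = None
except TypeError as exc:
    error = str(exc)
print("unguarded:", error)

# Guarded version: returns None instead of raising.
print("guarded:", question_lemmas(ex, use_lemma=True))
```

The guard only shows where the None comes from. Whether a model that expects lemma features gives sensible answers without them is a separate question that this sketch does not address.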

Switching to the corenlp tokenizer and re-running python3 setup.py install for the change to take effect results in the following output with the same query:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/macaw-0.1-py3.6.egg/macaw/core/input_handler/actions.py", line 104, in run_action
    return_dict[action] = func_timeout(params['timeout'], action_func, args=[conv_list, params])
  File "/usr/local/lib/python3.6/dist-packages/func_timeout/dafunc.py", line 108, in func_timeout
    raise_exception(exception)
  File "/usr/local/lib/python3.6/dist-packages/func_timeout/py3_raise.py", line 7, in raise_exception
    raise exception[0] from None
  File "/usr/local/lib/python3.6/dist-packages/macaw-0.1-py3.6.egg/macaw/core/input_handler/actions.py", line 81, in run
    return params['actions']['qa'].get_results(conv_list, doc)
  File "/usr/local/lib/python3.6/dist-packages/macaw-0.1-py3.6.egg/macaw/core/mrc/drqa_mrc.py", line 76, in get_results
    predictions = self.predictor.predict(doc, q, None, self.params['qa_results_requested'])
  File "/root/DrQA/drqa/reader/predictor.py", line 88, in predict
    results = self.predict_batch([(document, question, candidates,)], top_n)
  File "/root/DrQA/drqa/reader/predictor.py", line 107, in predict_batch
    q_tokens = list(map(self.tokenizer.tokenize, questions))
  File "/root/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 96, in tokenize
    self.corenlp.expect_exact('NLP>', searchwindowsize=100)
  File "/usr/local/lib/python3.6/dist-packages/pexpect/spawnbase.py", line 390, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.6/dist-packages/pexpect/expect.py", line 99, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/usr/local/lib/python3.6/dist-packages/pexpect/pty_spawn.py", line 437, in read_nonblocking
    if not self.isalive():
  File "/usr/local/lib/python3.6/dist-packages/pexpect/pty_spawn.py", line 662, in isalive
    alive = ptyproc.isalive()
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/dist-packages/pexpect/pty_spawn.py", line 23, in _wrap_ptyprocess_err
    raise ExceptionPexpect(*e.args)
pexpect.exceptions.ExceptionPexpect: isalive() encountered condition where "terminated" is 0, but there was no child process. Did someone else call waitpid() on our process?
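
One general way to end up with "no child process" is to call waitpid() from a process that is not the parent of the target. The sketch below illustrates only that mechanism on a POSIX system, using the standard library; it rests on the unverified assumption that tokenize() runs in a different process from the one that started CoreNLP, and the helper name waitpid_from_other_process is made up.

```python
import os
import subprocess
import sys


def waitpid_from_other_process(pid):
    """Fork a worker that calls waitpid() on `pid`; return True if it saw ECHILD."""
    worker = os.fork()
    if worker == 0:
        # In the worker, `pid` belongs to a sibling process, not to one of our children.
        try:
            os.waitpid(pid, os.WNOHANG)
            os._exit(0)
        except ChildProcessError:
            os._exit(42)
    _, status = os.waitpid(worker, 0)
    return os.WEXITSTATUS(status) == 42


# Stand-in for a long-running helper process started by the parent.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(2)"])
got_echild = waitpid_from_other_process(child.pid)
child.terminate()
child.wait()
print("worker saw 'no child process':", got_echild)
```

If that assumption holds for Macaw's action runner, creating the tokenizer inside the process that uses it would be one thing to try, though that has not been tested here.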

If it's any help, I am able to use the DrQA interactive demo with the Stanford CoreNLP tokenizer without errors.

Aside from these errors, maybe I have misunderstood the capabilities or scope of the live_main.py demo? Would it be capable of an interaction similar to the one shown in Figure 1 (b) in the paper?

Thanks in advance for all assistance!
