czcorpus / mquery-sru

CLARIN FCS 2.0 Endpoint for Manatee-open corpus search engine

License: GNU General Public License v3.0

Go 91.17% C++ 1.77% C 0.90% HTML 2.39% Dockerfile 0.14% XSLT 3.62%
corpora linguistics clarin fcs fcs-endpoint

MQuery-SRU

MQuery-SRU is an easy-to-set-up endpoint for CLARIN FCS 2.0 (Federated Content Search). It is based on the Manatee-open corpus search engine and is developed and maintained by the Institute of the Czech National Corpus.

Features

  • Full support for the FCS-QL query language
    • definable mapping between FCS-QL layers and Manatee-open positional attributes
  • Level 1 support for basic search via CQL (Contextual Query Language)
  • simultaneous search in multiple defined corpora
  • (optional) backlinks to respective concordances in KonText

Requirements

  • a working Linux server with installed Manatee-open library
  • Redis database
  • Go language compiler and tools
  • (optional) an HTTP proxy server (Nginx, Apache, ...)

How to install

  1. Install the Go language environment, either via a package manager or manually from the Go download page
    1. make sure /usr/local/go/bin and ~/go/bin are in your $PATH so you can run any installed Go tools without specifying a full path
  2. Install Manatee-open from the download page. No specific language bindings are required.
    1. ./configure --with-pcre --disable-python && make && sudo make install && sudo ldconfig
  3. Get MQuery-SRU sources (git clone --depth 1 https://github.com/czcorpus/mquery-sru.git)
  4. Run ./configure
  5. Run make
  6. Run make install
    • the application will be installed in /opt/mquery-sru
    • for data and registry, /var/opt/corpora/data and /var/opt/corpora/registry directories will be created
    • systemd services mquery-sru-server.service and mquery-sru-worker-all.target will be created
  7. Copy at least one corpus and its configuration (registry) into respective directories (/var/opt/corpora/data, /var/opt/corpora/registry)
  8. Update corpora entries in /opt/mquery-sru/conf.json file to match your installed corpora
  9. Start the service:
    • systemctl start mquery-sru-server
    • systemctl start mquery-sru-worker-all.target

HTTP access

In most cases, exposing the server directly to the Internet is not recommended; put the service behind an HTTP proxy instead. For example, an Nginx configuration may look like this:

location /mquery-fcs/ {
    proxy_pass http://127.0.0.1:8080/;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_read_timeout 30;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header X-Forwarded-Proto $scheme;
}

Worker considerations

Even endpoints with low traffic can benefit from multiple workers. If an endpoint is configured to search across multiple corpora, MQuery-SRU can use the workers to run the searches in parallel. Given enough workers, all configured corpora are queried simultaneously, which can significantly reduce the response time even under minimal load.
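
The effect of the worker count can be illustrated with a minimal, in-process Go sketch. It is not MQuery-SRU's actual code (the real workers are separate processes that receive queries via Redis); searchAll, searchFunc and the corpus names are hypothetical, and the search itself is simulated with a fixed delay.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// searchFunc is a hypothetical stand-in for a single-corpus search
// that returns the number of hits.
type searchFunc func(corpus string) int

// searchAll queries every corpus using at most numWorkers goroutines
// and returns hit counts keyed by corpus name.
func searchAll(corpora []string, numWorkers int, search searchFunc) map[string]int {
	if numWorkers < 1 {
		numWorkers = 1
	}
	jobs := make(chan string)
	results := make(map[string]int, len(corpora))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker takes corpora from the shared queue until it is closed.
			for corpus := range jobs {
				hits := search(corpus)
				mu.Lock()
				results[corpus] = hits
				mu.Unlock()
			}
		}()
	}
	for _, c := range corpora {
		jobs <- c
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	corpora := []string{"corpus_a", "corpus_b", "corpus_c"} // hypothetical names
	slowSearch := func(corpus string) int {
		time.Sleep(100 * time.Millisecond) // simulated search latency
		return len(corpus)
	}
	for _, n := range []int{1, 3} {
		start := time.Now()
		res := searchAll(corpora, n, slowSearch)
		fmt.Printf("workers=%d corpora=%d elapsed=%v\n",
			n, len(res), time.Since(start).Round(10*time.Millisecond))
	}
}
```

With one worker the three simulated searches run one after another; with three workers they overlap, so the elapsed time should be roughly that of a single search.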

Configuration

To run the endpoint, you need to configure at least:

  1. a listening address and port
  2. the path to your Manatee corpora registry (= configuration) files
  3. the corpora to be served, each with:
    • the positional attributes to be exposed and the layers they belong to
    • a mapping of FCS-QL's within structures (s, sentence, p etc.) to the structures of your specific corpus
  4. the address of your Redis service and the number of the database used for passing queries and results around

See configuration reference and/or conf.sample.json for detailed info.
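
As a rough illustration of the four required items, the following Go sketch parses and validates a configuration. Every JSON key and type name in it is hypothetical and was chosen only for this example; the real schema is the one described in the configuration reference and conf.sample.json.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// NOTE: all JSON keys below are hypothetical, NOT the real conf.json schema.

type PosAttr struct {
	Name  string `json:"name"`
	Layer string `json:"layer"`
}

type Corpus struct {
	ID               string            `json:"id"`
	PosAttrs         []PosAttr         `json:"posAttrs"`
	StructureMapping map[string]string `json:"structureMapping"`
}

type Redis struct {
	Address string `json:"address"`
	DB      int    `json:"db"`
}

type Conf struct {
	ListenAddress string   `json:"listenAddress"`
	ListenPort    int      `json:"listenPort"`
	RegistryDir   string   `json:"registryDir"`
	Corpora       []Corpus `json:"corpora"`
	Redis         Redis    `json:"redis"`
}

const sampleConf = `{
  "listenAddress": "127.0.0.1",
  "listenPort": 8080,
  "registryDir": "/var/opt/corpora/registry",
  "corpora": [{
    "id": "syn2020",
    "posAttrs": [{"name": "word", "layer": "text"}, {"name": "lemma", "layer": "lemma"}],
    "structureMapping": {"sentence": "s", "paragraph": "p"}
  }],
  "redis": {"address": "127.0.0.1:6379", "db": 0}
}`

// parseConf decodes the JSON and checks that the four required items are present.
func parseConf(data []byte) (*Conf, error) {
	var c Conf
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	switch {
	case c.ListenAddress == "" || c.ListenPort < 1 || c.ListenPort > 65535:
		return nil, errors.New("listening address and port are required")
	case c.RegistryDir == "":
		return nil, errors.New("registry directory is required")
	case len(c.Corpora) == 0:
		return nil, errors.New("at least one corpus is required")
	case c.Redis.Address == "":
		return nil, errors.New("address of the Redis service is required")
	}
	for _, corp := range c.Corpora {
		if len(corp.PosAttrs) == 0 {
			return nil, fmt.Errorf("corpus %q exposes no positional attributes", corp.ID)
		}
	}
	return &c, nil
}

func main() {
	conf, err := parseConf([]byte(sampleConf))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("listening on %s:%d, corpora configured: %d\n",
		conf.ListenAddress, conf.ListenPort, len(conf.Corpora))
}
```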

OS integration (systemd)

This applies in case make install is not used.

(Here we assume the service will run as user www-data.)

Create a directory for logging (e.g. /var/log/mquery-sru) and set proper permissions for www-data to be able to write there.

You can use predefined systemd files from /scripts/systemd/*. Copy (or link) them to /etc/systemd/system and then run:

systemctl enable mquery-sru-server.service
systemctl enable mquery-sru-worker-all.target

Now you can try to run the service:

systemctl start mquery-sru-server
systemctl start mquery-sru-worker-all.target

See MQuery-SRU in action

A CNC instance of MQuery-SRU is running as one of the endpoints for the CLARIN Content Search page.

mquery-sru's People

Contributors

mzimandl, tomachalek

mquery-sru's Issues

In case the first resource gives 0 lines, the server fails to process the query

[GIN-debug] [WARNING] Headers were already written. Wanted to override status code 200 with 500
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: panic: runtime error: index out of range [0] with length 0
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: goroutine 1 [running]:
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: fcs/corpus/conc.(*LineParser).parseTokenQuadruple(0xc0005f7878, {0xc0002c6b80, 0x4, 0xffffffffffffffff?})
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/corpus/conc/conc.go:61 +0x212
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: fcs/corpus/conc.(*LineParser).parseRawLine(0x8dd640?, {0xc000576140, 0x139})
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/corpus/conc/conc.go:107 +0x233
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: fcs/corpus/conc.(*LineParser).Parse(0xc000520000?, {{0xc0006cc1e0?, 0xc00055a7e0?, 0xd?}, 0xc000738100?})
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/corpus/conc/conc.go:115 +0x85
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: fcs/worker.(*Worker).concExample(0x3?, {{0xc000520000, 0x21}, {0x0, 0x0}, {0xc00055a7e0, 0xd}, {0xc000738100, 0x4, 0x4}, ...})
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/worker/worker.go:152 +0xf9
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: fcs/worker.(*Worker).tryNextQuery(0xc0005f7a98)
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/worker/worker.go:109 +0x3e5
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: fcs/worker.(*Worker).Listen(0xc0005f7a98)
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/worker/worker.go:133 +0xba
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: main.runWorker(0xc000001e00, {0xc00001405a, 0x1}, 0xc000636190, 0xc00058e000)
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/fcs.go:129 +0x159
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: main.main()
led 19 17:57:44 kontext-017 mquery-fcs[2298225]: /home/machalek/superhome/tools/mquery-fcs/fcs.go:220 +0xa3f
led 19 17:57:44 kontext-017 systemd[1]: [email protected]: Main process exited, code=exited, status=2/INVALIDARGUMENT
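
The panic message (index out of range [0] with length 0) is what Go reports when element 0 of an empty slice is accessed. The sketch below shows the general guard against this class of error; firstAttr is a hypothetical helper and not the project's parseTokenQuadruple, whose source is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errEmptyToken = errors.New("empty token")

// firstAttr is a hypothetical helper. It returns the first
// whitespace-separated field of a raw token string.
func firstAttr(raw string) (string, error) {
	fields := strings.Fields(raw)
	if len(fields) == 0 {
		// Without this check, fields[0] would panic with
		// "index out of range [0] with length 0".
		return "", errEmptyToken
	}
	return fields[0], nil
}

func main() {
	for _, raw := range []string{"druh NN", ""} {
		attr, err := firstAttr(raw)
		if err != nil {
			// An empty input is reported as an error instead of crashing the worker.
			fmt.Printf("skipping %q: %v\n", raw, err)
			continue
		}
		fmt.Printf("first attribute: %s\n", attr)
	}
}
```

Returning an error lets the caller decide whether an empty result is a failure or simply a resource with zero matching lines.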

Unsupported context set error in FCS-QL query

/?query=[+word+=+"druh"+&+word+=+"typ"+]&queryType=fcs&startRecord=1&maximumRecords=10&recordSchema=http://clarin.eu/fcs/resource&x-fcs-context=cnc/corpora/syn2020

Please note that the FCS portal inserts the corpus PID. I've already made a related update, but in FCS-QL mode it does not seem to work.
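
When reproducing such a request by hand, note that an & inside the query value must be percent-encoded, otherwise it is read as a parameter separator. A minimal Go sketch that builds the same request with the standard net/url package follows; buildSearchURL and the base address http://127.0.0.1:8080/ are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildSearchURL is a hypothetical helper that assembles an FCS-QL
// search request with all parameter values properly encoded.
func buildSearchURL(base, query, context string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("query", query)
	q.Set("queryType", "fcs")
	q.Set("startRecord", "1")
	q.Set("maximumRecords", "10")
	q.Set("recordSchema", "http://clarin.eu/fcs/resource")
	q.Set("x-fcs-context", context)
	u.RawQuery = q.Encode() // escapes &, =, quotes and brackets in values
	return u.String(), nil
}

func main() {
	s, err := buildSearchURL(
		"http://127.0.0.1:8080/", // assumed local address
		`[ word = "druh" & word = "typ" ]`,
		"cnc/corpora/syn2020",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```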
