
DraCor API

This is the eXist-db application providing the API for https://dracor.org.

The API Documentation is available at https://dracor.org/doc/api/.

Getting Started

git clone https://github.com/dracor-org/dracor-api.git
cd dracor-api
docker compose up
# load data, see below

We provide a compose.yml that allows you to run an eXist database with dracor-api locally, together with the supporting dracor-metrics service and a triple store. With Docker installed, simply run:

docker compose up

This pulls the necessary images from Docker Hub and starts the respective containers. The eXist database will become available at http://localhost:8080/. To check that the DraCor API is up, run

curl http://localhost:8088/api/v1/info

By default, when you run docker compose up for the first time, a password for the admin user of the eXist database is generated and printed to the console. If you instead want to use a specific password, set the EXIST_PASSWORD environment variable like this:

EXIST_PASSWORD=mysecret docker compose up

To use the database with an empty password, e.g. on a local machine, run:

EXIST_PASSWORD= docker compose up

The Docker Compose setup also includes a DraCor frontend connected to the local eXist instance. It can be accessed by opening http://localhost:8088/ in a browser.

Load Data

To load corpus data into the database, use the DraCor API. First add a corpus:

curl https://raw.githubusercontent.com/dracor-org/testdracor/main/corpus.xml | \
curl -X POST \
  -u admin: \
  -d@- \
  -H 'Content-type: text/xml' \
  http://localhost:8088/api/v1/corpora

Then load the TEI files for the newly added corpus (in this case test):

curl -X POST \
  -u admin: \
  -H 'Content-type: application/json' \
  -d '{"load":true}' \
  http://localhost:8088/api/v1/corpora/test

This may take a while. Eventually the added plays can be listed with

curl http://localhost:8088/api/v1/corpora/test

With jq installed you can pretty print the JSON output like this:

curl http://localhost:8088/api/v1/corpora/test | jq

VS Code Integration

For the Visual Studio Code editor, an eXist-db extension is available that allows syncing a local working directory with an eXist database, thus enabling comfortable development of XQuery code.

We provide a configuration template to connect your dracor-api working copy to the dracor-v1 workspace in a local eXist database (e.g. the one started with docker compose up).

After installing the VS Code extension, copy the template to create an .existdb.json configuration file:

cp .existdb.json.tmpl .existdb.json

Adjust the settings if necessary and restart VS Code. You should now be able to start the synchronization from a button in the status bar at the bottom of the editor window.

XAR Package

To build a dracor-api XAR EXPath package that can be installed via the dashboard of any eXist-db instance, just run ant.

Webhook

The DraCor API provides a webhook (/webhook/github) that can trigger an update of the corpus data when the configured GitHub repository for the corpus changes.

Note: For the webhook to work, the shared secret between DraCor and GitHub needs to be configured at /db/data/dracor/secrets.xml in the database.
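GitHub signs each webhook delivery with an HMAC-SHA256 of the raw request body, sent in the X-Hub-Signature-256 header, keyed with that shared secret. A minimal standalone Python sketch of the check the endpoint has to perform (not the actual XQuery implementation; payload and secret are made up):

```python
import hashlib
import hmac

def verify_signature(payload: bytes, secret: str, signature_header: str) -> bool:
    """Recompute the HMAC of the body and compare it to X-Hub-Signature-256."""
    expected = "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

body = b'{"ref": "refs/heads/main"}'  # hypothetical delivery payload
header = "sha256=" + hmac.new(b"mysecret", body, hashlib.sha256).hexdigest()
print(verify_signature(body, "mysecret", header))  # True
```

hmac.compare_digest is used instead of == to keep the comparison constant-time.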

License

dracor-api is MIT licensed.

dracor-api's People

Contributors: afuetterer, cmil, dependabot[bot], ingoboerner, mathias-goebel

dracor-api's Issues

Add number of words per character to node data in GEXF

#24 (comment):

One other obvious thing to provide on a per-node basis would be the number of words per character (i.e., everything within <sp>/<p> and <sp>/<l>, including <emph> but excluding <note> and <stage> within <sp>…).

It would add to our visualisations to align the node sizes with the number of words per character, an aspect we can't visualise at the moment…
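A toy sketch of that counting rule (standalone Python over a made-up <sp> snippet; apart from the element names taken from the issue, everything here is an assumption):

```python
import re
import xml.etree.ElementTree as ET

SP = """<sp who="#vroni">
  <speaker>VRONI</speaker>
  <stage>sitzt.</stage>
  <p>Erzähl weiter <emph>von</emph> meiner Mutter!</p>
</sp>"""

def spoken_text(el):
    """Concatenate text, descending into children but skipping <note> and <stage>."""
    if el.tag in ("note", "stage"):
        return ""
    parts = [el.text or ""]
    for child in el:
        parts.append(spoken_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

sp = ET.fromstring(SP)
words = []
for block in sp:
    if block.tag in ("p", "l"):  # only <sp>/<p> and <sp>/<l> count
        words += re.findall(r"\w+", spoken_text(block))

print(len(words))  # 5: Erzähl, weiter, von, meiner, Mutter
```

The <emph> content is counted (it is reached by the recursion) while <stage> and <note> subtrees are dropped, matching the rule proposed above.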

Add CORS headers

To allow the API to be used by other web apps, we need to add appropriate CORS headers.

Calculate and store corpus metrics when database is updated

To be able to show more detailed corpus metrics (e.g. token counts) on the dracor.org home page, those metrics should be calculated after an update of the corpus files in any sub collection of /db/data/dracor. The numbers should be stored in the database, possibly at /db/data/dracor/metrics.xml.

The re-calculations could be triggered after updates from both load.xq and github-webhook.xq.

Add table with character data to API

Proposed name:

/corpora/{corpusname}/play/{playname}/characters/csv

Proposed values:

ID and label:

  • Character ID
  • Character Label

Three quantitative measures:

  • Scene Appearances
    • = number of scenes a character appears in
  • Speech Acts
    • = number of <sp> per character
  • Number of Words

Five network-based measures (per character)

  • Degree
  • Weighted Degree
  • Betweenness Centrality
  • Closeness Centrality
  • Eigenvector Centrality

As far as I can see, we do not calculate network values per character for API purposes yet. Our Shiny app has already implemented this in the Vertices tab and may serve as a point of reference for these values.
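The two degree-based measures fall straight out of the co-presence edge list; a standalone Python sketch with made-up characters and weights (the three centrality measures would additionally require a graph library or the metrics service):

```python
from collections import Counter

# Hypothetical co-presence edges: (character, character, number of shared scenes)
edges = [("Odoardo", "Emilia", 3), ("Emilia", "Prinz", 2), ("Prinz", "Marinelli", 5)]

degree = Counter()           # Degree: number of distinct co-players
weighted_degree = Counter()  # Weighted Degree: sum of edge weights
for a, b, w in edges:
    degree[a] += 1
    degree[b] += 1
    weighted_degree[a] += w
    weighted_degree[b] += w

print(degree["Emilia"], weighted_degree["Emilia"])  # 2 5
```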

Load corpora asynchronously

To avoid long-running HTTP requests and possible timeouts, the loading of an entire corpus from its repository should be done asynchronously. Similar to how we now deal with webhook deliveries, a POST /corpora/{corpusname} should just schedule a job, which then runs independently and updates the database.

collection structure

@cmil, you came up with notable thoughts on the current structure of the collections in the database. As far as I remember, this should lead to something like the following:

db
└── data
    └── dracor
        ├── metrics
        │   ├── ger
        │   ├── rus
        │   ├── shake
        │   └── swe
        ├── rdf
        │   ├── ger
        │   ├── rus
        │   ├── shake
        │   └── swe
        └── tei
            ├── ger
            ├── rus
            ├── shake
            └── swe

Is this correct? If not, maybe you can provide a better sample.

Add segmentation data in CSV and/or JSON

For example, for .csv:

member, scene
Podkolesin, Действие первое | Явление I
Podkolesin, Действие первое | Явление II
Stepan, Действие первое | Явление II
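A sketch of serialising such a table (standalone Python; the segmentation data is hard-coded purely for illustration):

```python
import csv
import io

# Hypothetical segmentation data: (scene label, characters appearing in it)
segments = [
    ("Действие первое | Явление I", ["Podkolesin"]),
    ("Действие первое | Явление II", ["Podkolesin", "Stepan"]),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["member", "scene"])
for scene, members in segments:
    for member in members:
        writer.writerow([member, scene])

print(buf.getvalue())
```

The csv module also takes care of quoting, should scene labels ever contain commas; a JSON variant would just emit the same (member, scene) pairs as objects.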

Implement PUT /corpora

In order to decouple data and application code and to be more flexible in how we populate a dracor-api instance, I suggest following a RESTful approach and implementing the appropriate methods to add and update data via the API itself.

A first step would be to add corpora by PUTting their metadata to /corpora. The payload would be a simple JSON structure like this:

{
    "name": "ger",
    "title": "German Drama Corpus",
    "repo": "https://github.com/dracor-org/gerdracor"
}

This creates an index.xml file in the corpus's TEI collection storing the metadata that is currently held in corpora.xml.

Optionally the TEI files can be loaded from the GitHub archive.

This gives us more flexibility in setting up a test environment without having to build different xar packages and allows us to add corpora to an installation without having to modify the software.

Improve precision of word counts

The dracor stats module currently uses a simple \W+ regular expression for tokenising texts to count words. This pattern treats characters like dashes (-) and apostrophes (') as word boundaries, which results in an imprecise word count. We should find a better regular expression and use the count of tei:w elements in shakedracor as a comparison for testing.
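The effect can be demonstrated in a few lines (standalone Python; the improved pattern is just one candidate, not a settled choice):

```python
import re

text = "Don't go half-heartedly, it won't work."
# Current approach: split on non-word characters, so ' and - break words apart.
naive = [t for t in re.split(r"\W+", text) if t]
# Candidate: allow apostrophes and hyphens inside a word.
better = re.findall(r"\w+(?:[-']\w+)*", text)
print(len(naive), len(better))  # 9 6
```

The naive split counts "Don't" as two tokens and "half-heartedly" as two, inflating the total from 6 to 9.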

Add playId to metadata table

Since we now have stable IDs for each play throughout all corpora, we could add playId to the metadata table, next to playName.

Web hook caching problem

Frequent corpus updates of the same TEI document pushed separately to GitHub fail to properly update the DraCor database via the webhook. This is because we currently use raw.githubusercontent.com to obtain the data for individual documents, which apparently caches requests for about 5 minutes.

To alleviate this problem we could obtain the documents as blobs from the GitHub Data API (see https://developer.github.com/v3/git/blobs/), which seems to cache for only about 1 minute.

This should probably be done before #49.

Add possibility to download graphs in GEXF format

The CSV format we offer for download is limited in what it can comprise. We should slowly start building a GEXF export. The first version could just comprise what the CSV comprises, but on top of the IDs also feature the labels, i.e., character names from <persName> (or <name>, for person groups). Here is an easy example of how to build the GEXF format.

Add normalisedYear to /corpora/{corpusname} output

The title of the ticket should be self-explanatory. ;)

Also, while doing this, we could change the name of the column "year" in the metadata table (/corpora/{corpusname}/metadata.csv) to "yearNormalised".

Offer various text slices as txt downloads via API (for quantitative research)

It should be possible to obtain various text slices as needed:

  • all spoken text within <sp> (excluding <stage>, including <emph>) per play as simple txt stream
  • all stage texts per play (everything within <stage>) as simple txt stream

Additional option:

  • offer spoken text (as detailed above) separated by {male | female | unknown} property if available

GitHub webhook causes timeout when updating too many files

The GitHub webhook currently processes the modified files right away. This takes too long when there are many of them and causes GitHub to abort the request, resulting in some files not being updated properly.

The webhook should be changed to just record the modified files so that a separately scheduled process could do the actual processing and database update.

Provide global ID resolver

dracor.org/id/rus000001 (etc.) should, depending on the request header:

  • provide an RDF triple (if RDF is requested), or
  • forward to respective play (if called via browser)

Refine RDF generation

With #9 and #41 we now have RDF documents for each play. See, for instance:

There are still a few issues:

  • dc:title is missing in the RDFs for ger and shake corpora
  • dc:creator has firstname lastname in shake, but lastname, firstname in ger and rus
  • network measures are not yet included
  • owl:sameAs seems to appear twice
  • Plays in Graph <https://dracor.org/ger> lack rdfs:label

To quote @ingoboerner in #9 (comment):

RDF should look as follows:

<rdf:RDF xml:lang="en" 
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dracon="http://dracor.org/ontology#">
    <rdf:Description rdf:about="https://dracor.org/rus/andreyev-mysl">
        
        <rdfs:label xml:lang="en">Andreyev, Leonid: A Thought</rdfs:label>
        
        <rdfs:label xml:lang="ru">Андреев, Леонид Николаевич: Мысль</rdfs:label>
        
        <dc:creator xml:lang="en">Leonid Andreev</dc:creator>
        <dc:creator xml:lang="ru">Леонид Николаевич Андреев</dc:creator>
        
        <dc:title xml:lang="en">A Thought</dc:title>
        <dc:title xml:lang="ru">Мысль</dc:title>
        
        <!-- http://dracor.org/ontology#normalisedYear -->
        <!-- http://dracor.org/ontology#premiereYear -->
        <!-- http://dracor.org/ontology#printYear -->
        <!-- http://dracor.org/ontology#writtenYear -->
        
        
        <!-- Author as blank node -->
        <dracon:has_author>
            <rdf:Description>
                <rdfs:label xml:lang="en">Leonid Andreev</rdfs:label>
                <rdfs:label xml:lang="ru">Леонид Николаевич Андреев</rdfs:label>
                <owl:sameAs rdf:resource="http://www.wikidata.org/entity/Q310866"/>
            </rdf:Description>
        </dracon:has_author>
        
       <!-- network-measures -->
        
        <!-- http://dracor.org/ontology#averageClustering -->
        <!-- http://dracor.org/ontology#averageDegree -->
        <!-- http://dracor.org/ontology#averagePathLength -->
        <!-- http://dracor.org/ontology#density -->
        <!-- http://dracor.org/ontology#diameter -->
        <!-- http://dracor.org/ontology#maxDegree -->
        <!-- http://dracor.org/ontology#maxDegreeIds -->
        <!-- http://dracor.org/ontology#numOfActs -->
        <!-- http://dracor.org/ontology#numOfSegments -->
        <!-- http://dracor.org/ontology#numOfSpeakers -->
        
        
        <dracon:in_corpus rdf:resource="https://dracor.org/rus"/>
        
        <owl:sameAs rdf:resource="http://www.wikidata.org/entity/Q59355429"/>
        
    </rdf:Description>
</rdf:RDF>

I generate a blank node for the author because within dracor it does not have its own dracor-id; I tested example-rdf in my local installation of Jena and it worked.

The drdf:play-to-rdf() function in rdf.xqm should be adjusted to get there.

Consolidate use of 'id' and 'name' properties

The id properties of some resources (/corpora/{corpusName}, /corpora/{corpusName}/play/{playName}) still provide the play name (e.g. "gogol-revizor"), while /corpora/{corpusName}/metadata already shows the now available DraCor IDs of the plays. This should be changed so that id properties referring to a play always give the DraCor ID whereas the play name is provided by a name property.

Add function stage-directions-incl-speakers

As proposed by @nilsreiter, let us add another function called stage-directions-incl-speakers. This would be the same as the stage-directions function, but it would add the speaker strings if they directly precede a stage direction. So, for example, this one …

<sp who="#vroni">
  <speaker>VRONI</speaker>
  <stage>hat den Eimer umgestülpt und sich auf denselben gesetzt.</stage>
  <p>Erzähl weiter von meiner Mutter!</p>
</sp>

… would transform into:

VRONI hat den Eimer umgestülpt und sich auf denselben gesetzt.

The expected advantage of this added function is that part-of-speech tagging might work better with the subject of the sentence present.
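A sketch of the intended transformation (standalone Python over the example above; the actual implementation would of course be XQuery):

```python
import xml.etree.ElementTree as ET

SP = """<sp who="#vroni">
  <speaker>VRONI</speaker>
  <stage>hat den Eimer umgestülpt und sich auf denselben gesetzt.</stage>
  <p>Erzähl weiter von meiner Mutter!</p>
</sp>"""

sp = ET.fromstring(SP)
lines = []
prev = None
for el in sp:
    if el.tag == "stage":
        text = (el.text or "").strip()
        if prev is not None and prev.tag == "speaker":
            # Prepend the speaker string when it directly precedes the stage direction.
            text = (prev.text or "").strip() + " " + text
        lines.append(text)
    prev = el

print(lines[0])  # VRONI hat den Eimer umgestülpt und sich auf denselben gesetzt.
```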

Handle payload encoding in api:sparql

In api:sparql the Content-Encoding of the POST data needs to be taken into account when processing the query. Depending on the request the encoding can differ.

While the following httpie request seems to get properly decoded (although resulting in a 400 response because of the nonsensical query)

echo "QUERY" | http -v https://dracor.org/api/sparql

the same query with curl results in an internal server error (500):

curl -v -X POST "https://dracor.org/api/sparql" -H "accept: application/sparql-results+xml" -H "Content-Type: application/sparql-query" -d "QUERY"

Include attribute definition into GEXF format

In order to recognise attributes, Gephi needs them defined upfront, so in our case (also proposing to lowercase "gender" in the id attribute):

<graph defaultedgetype="undirected" mode="static">
  <attributes class="node" mode="static">
    <attribute id="gender" title="Gender" type="string"></attribute>
  </attributes>
  <nodes>
  […]

While we are at it, maybe we could add a header:

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.3" version="1.3">

Thus, also jumping from GEXF 1.2draft to 1.3?
Also, "MALE" and "FEMALE" are correctly inserted, but "UNKNOWN" is not (no <attvalues> given in these cases).
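A standalone Python sketch of the proposed structure, built with xml.etree, including an explicit <attvalue> for a node so that no case is left without one (node data is made up):

```python
import xml.etree.ElementTree as ET

NS = "http://www.gexf.net/1.3"
ET.register_namespace("", NS)  # serialise with a default namespace, no prefix

gexf = ET.Element(f"{{{NS}}}gexf", version="1.3")
graph = ET.SubElement(gexf, f"{{{NS}}}graph",
                      defaultedgetype="undirected", mode="static")
# Declare node attributes upfront so Gephi recognises them.
attributes = ET.SubElement(graph, f"{{{NS}}}attributes",
                           {"class": "node", "mode": "static"})
ET.SubElement(attributes, f"{{{NS}}}attribute",
              id="gender", title="Gender", type="string")
nodes = ET.SubElement(graph, f"{{{NS}}}nodes")
node = ET.SubElement(nodes, f"{{{NS}}}node", id="vroni", label="VRONI")
attvalues = ET.SubElement(node, f"{{{NS}}}attvalues")
ET.SubElement(attvalues, f"{{{NS}}}attvalue",
              {"for": "gender", "value": "UNKNOWN"})

xml = '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(gexf, encoding="unicode")
print(xml)
```

The dict form is used for the class and for attributes because both are Python keywords.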

Implement PUT on /corpora/{corpusname}/play/{playname}

As a second step after #44, I suggest implementing PUT and POST on a single play. This would accept a TEI document and create a new resource or update an existing one in the database, allowing us to add or update a single play without depending on a GitHub push or reloading the entire corpus.

Retire github-webhook.xq

In #48 we added a new webhook endpoint to the API. Once this proves to work well enough we should remove the old webhook implementation in github-webhook.xq.

Eigenvector Centrality in cast list differs from values calculated by Gephi and igraph

We're not the first to notice this mismatch (cf. "Eigenvector Centrality Oddity with iGraph, Gephi, and NetworkX"). While that article finds diverging values for all three, igraph, Gephi and NetworkX, we find that igraph and Gephi throw the same results, while NetworkX begs to differ.

To add another example, here's what our R script throws (using igraph) for "Emilia Galotti":

r_screenshot

The documentation for igraph and NetworkX both insinuate that they're relying on the same algorithm. Could you maybe check if you throw the 'edge weights' into the formula (which we don't do)? This could explain the different values…

Originally posted by @lehkost in #31 (comment)
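One way to test the edge-weight hypothesis without any of the three libraries is a plain power iteration on a toy graph (standalone Python, not DraCor data; the + x[n] shift iterates on A + I, which keeps the iteration from oscillating on bipartite graphs while preserving the eigenvectors):

```python
def eigenvector_centrality(adj, iters=500):
    """Power iteration on (A + I) for a dict-of-dicts weighted adjacency."""
    x = {n: 1.0 for n in adj}
    for _ in range(iters):
        new = {n: x[n] + sum(w * x[m] for m, w in adj[n].items()) for n in adj}
        norm = max(new.values())
        x = {n: v / norm for n, v in new.items()}
    return x

# Path a - b - c; the a-b edge is much heavier than b-c.
weighted = {"a": {"b": 5}, "b": {"a": 5, "c": 1}, "c": {"b": 1}}
unweighted = {n: {m: 1 for m in nbrs} for n, nbrs in weighted.items()}

u = eigenvector_centrality(unweighted)
w = eigenvector_centrality(weighted)
```

Ignoring the weights makes a and c score identically (they are symmetric in the unweighted path), while taking the weights into account ranks a clearly above c, so the two conventions produce different values for the same network.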

Offer metadata table via API

We would need something like this table for research purposes:
https://github.com/lehkost/RusDraCor/blob/master/Ira_Scripts/NEW_calculations.csv

Our Python scripts would welcome such an up-to-date metadata table, and so would our Shiny app. It would be nice to know how to calculate the more complicated network metrics in XQuery and whether there are any libraries out there that might help in doing so. In any case, we can certainly start with a subset that is easy to calculate.

I propose the following columns:

  • title: filename of the play (incl. author)
  • genre: value of <term type="genreTitle" subtype="{tragedy|comedy|etc.}"/> (can be empty)
  • year (normalised): one year according to our algorithm specifying the normalised year out of written/print/premiere data
  • number of segments
  • number of acts: just counting occurrences of <div type="act">
  • network size
  • density
  • diameter
  • average path length
  • average clustering coefficient
  • average degree
  • maximum degree
  • characters with maximum degree: up to three, possibly divided by "|", if more than three, then "several characters"

(See #18)
In addition (no hurry) we should introduce some quantitative values, like:

  • number of words uttered by female|male characters
  • number of speech acts by female|male characters

For the quantitative values, it is maybe a good idea to have a cron job calculate the table every so often instead of generating it live with every request?
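For the normalised year column, a purely hypothetical sketch of such a fallback rule (an illustration of the idea, not necessarily the algorithm referred to above):

```python
def normalised_year(written=None, premiere=None, printed=None):
    """Hypothetical normalisation: earliest public year, else the written year."""
    public = [y for y in (premiere, printed) if y is not None]
    return min(public) if public else written

print(normalised_year(written=1770, premiere=1772, printed=1772))  # 1772
```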

Cast function of API throws server error for some plays

The play where it first turned up was Lermontov's "Maskarad":

Looking at the error message, it seems to have to do with the way numbers are stored, probably coming from the metrics service?

Most plays seem to work fine, but there are a few others that are affected, like:

Build issues

This ticket collects a few build issues that are still extant after the merge of PR #36. The below output of ant devel demonstrates some of them.

  • the exist package is extracted twice, in dependencies and prepare-exist, which is unnecessarily time-consuming
  • there is an ERROR: Could not bind to port because Address already in use at the end of the init target. While this does not seem to be of any consequence for running the development database, it looks fishy and should be solved one way or the other.
  • currently, when building a XAR package you cannot see from the outside whether it has been built with the -Dtestdracor=true option or not, which may lead to confusion
$ ant devel
Buildfile: /Users/cmil/Projects/dracor/dracor-api/build.xml

check-devel:

test-corpora:

corpora:
     [copy] Copying 1 file to /Users/cmil/Projects/dracor/dracor-api

create-dirs:

xar:
     [copy] Copying 1 file to /Users/cmil/Projects/dracor/dracor-api
      [zip] Building zip: /Users/cmil/Projects/dracor/dracor-api/build/dracor-0.33.0.xar

dependencies:
      [get] Destination already exists (skipping): /Users/cmil/Projects/dracor/dracor-api/build/dependencies/eXist-db-4.5.0.tar.bz2
    [untar] Expanding: /Users/cmil/Projects/dracor/dracor-api/build/dependencies/eXist-db-4.5.0.tar.bz2 into /Users/cmil/Projects/dracor/dracor-api/devel
      [get] Destination already exists (skipping): /Users/cmil/Projects/dracor/dracor-api/build/dependencies/crypto-0.3.5.xar
      [get] Destination already exists (skipping): /Users/cmil/Projects/dracor/dracor-api/build/dependencies/sparql-latest.xar

prepare-exist:
     [echo] install eXist to devel/eXist-db-4.5.0
      [get] Destination already exists (skipping): /Users/cmil/Projects/dracor/dracor-api/build/dependencies/eXist-db-4.5.0.tar.bz2
    [untar] Expanding: /Users/cmil/Projects/dracor/dracor-api/build/dependencies/eXist-db-4.5.0.tar.bz2 into /Users/cmil/Projects/dracor/dracor-api/devel
     [copy] Copying 2 files to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/autodeploy

set-ports:
     [xslt] Processing /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-http.xml to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-http-tmp.xml
     [xslt] Loading stylesheet /Users/cmil/Projects/dracor/dracor-api/resources/ant/jetty-port-update.xslt
     [move] Moving 1 file to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc
     [xslt] Processing /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-ssl.xml to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-ssl-tmp.xml
     [xslt] Loading stylesheet /Users/cmil/Projects/dracor/dracor-api/resources/ant/jetty-port-update.xslt
     [move] Moving 1 file to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc
     [xslt] Processing /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty.xml to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-tmp.xml
     [xslt] Loading stylesheet /Users/cmil/Projects/dracor/dracor-api/resources/ant/jetty-port-update.xslt
     [move] Moving 1 file to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc

init:
     [exec] 
     [exec] 20 Dec 2018 18:25:40,219 [main] INFO  (JettyStart.java [run]:149) - Running with Java 1.8.0_191 [Oracle Corporation (Java HotSpot(TM) 64-Bit Server VM) in /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/jre] 
     [exec] 20 Dec 2018 18:25:40,219 [main] INFO  (JettyStart.java [run]:156) - Running as user 'cmil' 
     [exec] 20 Dec 2018 18:25:40,220 [main] INFO  (JettyStart.java [run]:157) - [eXist Home : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0] 
     [exec] 20 Dec 2018 18:25:40,238 [main] INFO  (JettyStart.java [run]:158) - [eXist Version : 4.5.0] 
     [exec] 20 Dec 2018 18:25:40,238 [main] INFO  (JettyStart.java [run]:159) - [eXist Build : 201811211903] 
     [exec] 20 Dec 2018 18:25:40,238 [main] INFO  (JettyStart.java [run]:160) - [Git commit : e29b4099c] 
     [exec] 20 Dec 2018 18:25:40,238 [main] INFO  (JettyStart.java [run]:162) - [Operating System : Mac OS X 10.14.1 x86_64] 
     [exec] 20 Dec 2018 18:25:40,239 [main] INFO  (JettyStart.java [run]:163) - [log4j.configurationFile : file:///Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/log4j2.xml] 
     [exec] 20 Dec 2018 18:25:40,245 [main] INFO  (JettyStart.java [run]:164) - [jetty Version: 9.4.10.v20180503] 
     [exec] 20 Dec 2018 18:25:40,246 [main] INFO  (JettyStart.java [run]:165) - [jetty.home : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty] 
     [exec] 20 Dec 2018 18:25:40,246 [main] INFO  (JettyStart.java [run]:166) - [jetty.base : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty] 
     [exec] 20 Dec 2018 18:25:40,246 [main] INFO  (JettyStart.java [run]:167) - [jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/standard.enabled-jetty-configs] 
     [exec] 20 Dec 2018 18:25:40,747 [main] INFO  (JettyStart.java [run]:176) - Configuring eXist from /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/conf.xml 
     [exec] 20 Dec 2018 18:25:47,656 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty.xml] 
     [exec] 20 Dec 2018 18:25:47,773 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-gzip.xml] 
     [exec] 20 Dec 2018 18:25:47,800 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-http.xml] 
     [exec] 20 Dec 2018 18:25:47,831 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-jaas.xml] 
     [exec] 20 Dec 2018 18:25:47,836 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-jmx.xml] 
     [exec] 20 Dec 2018 18:25:47,869 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-requestlog.xml] 
     [exec] 20 Dec 2018 18:25:47,878 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-ssl.xml] 
     [exec] 20 Dec 2018 18:25:47,887 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-ssl-context.xml] 
     [exec] 20 Dec 2018 18:25:47,913 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-https.xml] 
     [exec] 20 Dec 2018 18:25:48,013 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-deploy.xml] 
     [exec] 20 Dec 2018 18:25:48,034 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-plus.xml] 
     [exec] 20 Dec 2018 18:25:48,054 [main] INFO  (JettyStart.java [run]:200) - [loading jetty configuration : /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/tools/jetty/etc/jetty-annotations.xml] 
     [exec] 20 Dec 2018 18:25:48,058 [main] INFO  (JettyStart.java [startJetty]:516) - [Starting jetty component : org.eclipse.jetty.server.Server] 
     [exec] 20 Dec 2018 18:25:48,059 [main] INFO  (JettyStart.java [lifeCycleStarting]:662) - Jetty server starting... 
     [exec] 20 Dec 2018 18:25:49,877 [main] ERROR (JettyStart.java [run]:368) - ---------------------------------------------------------- 
     [exec] 20 Dec 2018 18:25:49,878 [main] ERROR (JettyStart.java [run]:369) - ERROR: Could not bind to port because Address already in use 
     [exec] 20 Dec 2018 18:25:49,878 [main] ERROR (JettyStart.java [run]:370) - java.net.BindException: Address already in use 
     [exec] 20 Dec 2018 18:25:49,878 [main] ERROR (JettyStart.java [run]:371) - ---------------------------------------------------------- 
     [exec] /Library/Java/JavaVirtualMachines/jdk1.8.0_191.jdk/Contents/Home/bin/java -Xms128m -Xmx2048m -Dfile.encoding=UTF-8 -Dexist.home=/Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0 -jar /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/start.jar shutdown -u admin -p 
     [exec] Shutting down database instance at 
     [exec] 	xmldb:exist://localhost:8080/exist/xmlrpc/db
     [exec] 20 Dec 2018 18:25:52,695 [qtp416878771-23] INFO  (JettyStart.java [shutdown]:602) - Database shutdown: stopping server in 1sec ... 
     [xslt] Processing /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/conf.xml to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/conf.xml.tmp
     [xslt] Loading stylesheet /Users/cmil/Projects/dracor/dracor-api/resources/ant/exist-conf.xslt
     [move] Moving 1 file to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0
     [copy] Copying 1 file to /Users/cmil/Projects/dracor/dracor-api/devel/eXist-db-4.5.0/autodeploy

devel:
   [delete] Deleting: /Users/cmil/Projects/dracor/dracor-api/anttmp-062451

check-metrics:
     [echo] start the import process with `bash devel/eXist-db-4.5.0/bin/startup.sh`

BUILD SUCCESSFUL
Total time: 1 minute 1 second

/sparql does not accept `application/x-www-form-urlencoded; charset=UTF-8`

While POST requests to /sparql work fine when the Content-type header is exactly application/x-www-form-urlencoded, they won't be accepted when a charset is added to the MIME type, e.g. application/x-www-form-urlencoded; charset=UTF-8. In this case a 400 "HTTP method POST is not supported by this URL" response is sent.

This appears to be a limitation in eXist's RESTXQ implementation. Since content types with a charset are sent by default by clients like YASGUI, we should find a way to accept those.

Replace /corpora/{corpusname}/load with POST /corpora

The current way of loading the data of a corpus into the database by GET /corpora/{corpusname}/load should be replaced by a POST request on /corpora with a JSON payload like this:

{
  "load": true,
  "corpora": ["ger", "rus"]
}

Not only would this allow us to update multiple corpora at once, it would also be more RESTful, since load is not really a resource but rather an action upon a resource.

This is related to #44 and #45.
