fredracor's People

Forkers

lucagiovannini7

fredracor's Issues

Fix digital source URL

For each play we currently provide two URLs for the digital source: one is an HTML page, the other is the actual TEI file from Théâtre Classique. For instance:

<bibl type="digitalSource">
<name>Théâtre Classique</name>
<idno type="URL">http://theatre-classique.fr/pages/programmes/edition.php?t=../documents/ABEILLE_ARGELIE.xml</idno>
<idno type="URL">http://theatre-classique.fr/pages/documents/ABEILLE_ARGELIE.xml</idno>

Although issue #22, which was caused by this, has been fixed at the API level, we should still stick to a single source URL. I would suggest dropping the reference to the HTML page and keeping only the TEI URL, since these are the documents we actually use for the import into FreDraCor. What do you think, @lehkost?

See original post by @cmil in #22 (comment)
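A minimal sketch of the proposed cleanup, written in Python rather than in the XQuery of the actual transformation. It assumes un-namespaced elements as in the snippet above (real TEI files would need the TEI namespace), and the function name `keep_tei_url_only` is hypothetical. It tells the two URLs apart by their path: the HTML viewer's path ends in `edition.php`, the TEI document's path ends in `.xml`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Snippet from the issue, with a closing </bibl> added so that it parses.
SAMPLE = """<bibl type="digitalSource">
<name>Théâtre Classique</name>
<idno type="URL">http://theatre-classique.fr/pages/programmes/edition.php?t=../documents/ABEILLE_ARGELIE.xml</idno>
<idno type="URL">http://theatre-classique.fr/pages/documents/ABEILLE_ARGELIE.xml</idno>
</bibl>"""


def keep_tei_url_only(bibl: ET.Element) -> None:
    """Remove URL idno children whose URL path does not end in .xml."""
    for idno in list(bibl.findall("idno")):  # copy the list: we mutate bibl
        if idno.get("type") != "URL":
            continue
        # The query string (?t=...xml) is ignored; only the path is checked.
        path = urlparse((idno.text or "").strip()).path
        if not path.endswith(".xml"):
            bibl.remove(idno)
```

Calling `keep_tei_url_only(ET.fromstring(SAMPLE))` would leave only the `idno` that points at the TEI document.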

unresolved pair characters with 'et' in many French plays

FreDraCor networks seem to be plagued by glued-together characters such as "Ugande et Alcif" or "Corisande et Florestan". In some 17th-century plays such characters easily make up 30-50% of the network nodes, which basically renders the whole network false. See the attached example network for Amadis by Philippe Quinault.
[Image: amadins_quinault]
@lucagiovannini7 is going to check some of the plays relevant to his PhD research, but we need a more systematic solution for the whole corpus, especially since this looks automatable (split by ' et ').
Of course, there are also many harder cases, such as 'first african', 'second african', and 'both africans' making 3 nodes instead of 2, which also affects network metrics. But even resolving all "A et B" cases would be a giant leap for FreDraCor.
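The "split by ' et '" idea could be sketched as follows. This is only an illustration in Python, not part of the FreDraCor tooling; the function name `split_pair_label` is hypothetical, and it assumes that "et" surrounded by whitespace always joins separate characters, which would have to be checked against the corpus.

```python
import re

# Assumption: the conjunction "et" surrounded by whitespace separates names.
# Names that merely contain the letters "et" (e.g. "Juliette") are untouched.
PAIR_SEPARATOR = re.compile(r"\s+et\s+", re.IGNORECASE)


def split_pair_label(label: str) -> list[str]:
    """Split a glued label such as "Ugande et Alcif" into single names."""
    parts = [part.strip() for part in PAIR_SEPARATOR.split(label.strip())]
    return [part for part in parts if part]
```

For example, `split_pair_label("Ugande et Alcif")` yields the two names separately, while a label without the conjunction comes back as a one-element list. The harder cases mentioned above ('both africans') are not covered by this.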

CSV metadata table unreachable

At the moment, the CSV metadata table at https://dracor.org/api/corpora/fre/metadata/csv cannot be downloaded. The JSON table works, though. Error message:

HTTP ERROR 500 javax.servlet.ServletException: javax.servlet.ServletException: An error occurred while processing request to /exist/restxq/v0/corpora/fre/metadata/csv:
err:XPTY0004 checking function parameter 1 in call string-join(untyped-value-check[xs:anyAtomicType, for <697> $c in $api:metadata-columns return <698> if ( count($m($c)) = 0 ) then "" else dutil:csv-escape(untyped-value-check[xs:string, $m($c)]) ], "",""):
XPTY0004: The actual cardinality for parameter 1 does not match the cardinality declared in the function's signature: dutil:csv-escape($string as xs:string) xs:string. Expected cardinality: exactly one, got 2. [at line 698, column 74, source: /db/apps/dracor-v0/modules/api.xqm]
In function:
dutil:csv-escape(xs:string) [698:51:/db/apps/dracor-v0/modules/util.xqm]
api:get-corpus-meta-data-csv(item()) [732:3:/db/apps/dracor-v0/modules/api.xqm]
api:corpus-meta-data-csv-endpoint(item()) [-1: -1:/db/apps/dracor-v0/modules/api.xqm]
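The message suggests that some metadata column holds two values for at least one play, while the escaping function accepts exactly one string. The following Python analogue (not the XQuery code from api.xqm or util.xqm) illustrates one possible way to handle this: collapse the values into a single cell before escaping. The function names and the "|" separator are assumptions; the real fix might equally be to correct the data or to pick a single value.

```python
def csv_escape(value: str) -> str:
    """Quote one value if it contains a comma, a double quote or a line break."""
    if any(ch in value for ch in ',"\n\r'):
        return '"' + value.replace('"', '""') + '"'
    return value


def csv_cell(values: list[str], separator: str = "|") -> str:
    """Collapse zero, one or several values into a single escaped CSV cell."""
    # Joining first guarantees that csv_escape always receives one string.
    return csv_escape(separator.join(values))
```

With this approach an empty list gives an empty cell, a single value is passed through, and two values end up in one cell joined by the separator.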

Fix print dates

Print dates lack the when attribute even when a year is given as the text content of the respective date element. For instance:

<date type="print">1672</date>

This should be fixed in tc2dracor.xq.
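The intended result can be sketched in Python (the actual fix belongs in tc2dracor.xq, which is XQuery). The helper name `add_when` is hypothetical, and the sketch assumes that only a bare four-digit year should be copied into the attribute.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: a usable print date is exactly four digits, e.g. "1672".
YEAR = re.compile(r"\d{4}")


def add_when(date: ET.Element) -> None:
    """Set @when on a date element whose text content is a bare year."""
    text = (date.text or "").strip()
    if "when" not in date.attrib and YEAR.fullmatch(text):
        date.set("when", text)
```

Applied to the element above, this would add `when="1672"` and leave the text content unchanged; dates that already carry the attribute, or whose text is not a plain year, are left alone.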

Introduce slug and dates in "ids.xml"

In order to be able to add missing dates (or correct erroneous ones), we should introduce a corresponding option in "ids.xml".

The same goes for a more readable slug that divides the words in a meaningful way.

Both are demonstrated with this example of "Sermon Joyeux de Bien Boire" (DraCor ID: fre000037):

<play id="fre000037" file="ANONYME_SERMONJOYEUX.xml"/>

will be:

<play id="fre000037" file="ANONYME_SERMONJOYEUX.xml" slug="anonyme-sermon-joyeux-de-bien-boire" print="1545" premiere="" written=""/>

If available, this additional data would be written into the DraCor files when transforming the original files. These dates would also override dates from the original files, since there are sometimes discrepancies between the first print edition (whose date we collect, i.e. the date of the first printing) and the edition used by TC.

Ranges in the style of notBefore and notAfter are also possible; in these cases the two years are separated by an en dash (–).
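How the proposed attributes might be read can be sketched in Python. This is an illustration under assumptions, not the project's implementation: the function names are hypothetical, and the mapping of a single year to `when` and of a range to `notBefore`/`notAfter` follows the description above.

```python
import xml.etree.ElementTree as ET

EN_DASH = "\u2013"  # the range separator proposed in the issue


def parse_date_value(value: str) -> dict[str, str]:
    """Turn "1545" into a when value and "1640–1645" into a range."""
    value = value.strip()
    if not value:
        return {}  # empty attribute: no override
    if EN_DASH in value:
        start, end = (part.strip() for part in value.split(EN_DASH, 1))
        return {"notBefore": start, "notAfter": end}
    return {"when": value}


def play_overrides(play: ET.Element) -> dict:
    """Collect the slug and any non-empty date overrides of a play entry."""
    dates = {
        kind: parse_date_value(play.get(kind, ""))
        for kind in ("print", "premiere", "written")
    }
    return {
        "slug": play.get("slug"),
        "dates": {kind: d for kind, d in dates.items() if d},
    }
```

For the fre000037 entry shown above, this would report the slug and a single print date, since `premiere` and `written` are empty.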

Add classCode elements for matching genre information

The transformation at

fredracor/transform.xq

Lines 777 to 782 in 55e2b63

<textClass>
<keywords>
<term type="genreTitle">{string($doc//*:SourceDesc/*:genre)}</term>
<term type="genreTitle">{string($doc//*:SourceDesc/*:type)}</term>
</keywords>
</textClass>
should add appropriate classCode elements where genre information found in the originals matches the recognised text classes defined in dracor-org/dracor-api#122.

See also original discussion in dracor-org/dracor-api#120.

The codes with matching genre attributions (incomplete suggestions) would be:

  • Q40831 for Comedy
    • Comédie ballet
    • Comédie héroïque
    • Comédie Parade
    • Comédie-ballet
    • Comédie
    • Farce
    • Saynète
  • Q80930 for Tragedy
    • Tragédie
    • ...
  • Q192881 for Tragicomedy
    • Tragi-comédie
    • ...
  • Q131084 for Libretto
    • Ballet
    • Comédie ballet
    • Comédie-ballet
    • Opéra Bouffe
    • Opéra comique
    • Opéra
    • Opérette
    • Tragédie en musique
    • Tragédie lyrique

I would also suggest adding a scheme attribute to the keywords element and omitting term/@type, in order to make clear where this classification comes from and to avoid confusing it with the keywords we recently added to GerDraCor and RusDraCor.

The textClass markup could then look like this:

<textClass>
  <keywords scheme="http://theatre-classique.fr">
    <term>Tragédie</term>
    <term>vers</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q80930</classCode>
</textClass>

or, for a libretto (e.g. moliere-bourgeoisgentilhomme.xml):

<textClass>
  <keywords scheme="http://theatre-classique.fr">
    <term>Comédie-ballet</term>
    <term>mixte</term>
  </keywords>
  <classCode scheme="http://www.wikidata.org/entity/">Q131084</classCode>
  <classCode scheme="http://www.wikidata.org/entity/">Q40831</classCode>
</textClass>
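The suggested genre-to-code mapping could be expressed as a lookup table. The sketch below is in Python rather than in the XQuery of transform.xq; the names `CLASS_CODES` and `class_codes` are hypothetical. It contains only the genre labels listed above (the lists marked "..." stay incomplete) and assumes exact, case-insensitive matching.

```python
# Wikidata class codes and the Théâtre Classique genre labels suggested
# for each of them in this issue. The lists are incomplete by design.
CLASS_CODES = {
    "Q40831": [  # Comedy
        "Comédie ballet", "Comédie héroïque", "Comédie Parade",
        "Comédie-ballet", "Comédie", "Farce", "Saynète",
    ],
    "Q80930": ["Tragédie"],  # Tragedy
    "Q192881": ["Tragi-comédie"],  # Tragicomedy
    "Q131084": [  # Libretto
        "Ballet", "Comédie ballet", "Comédie-ballet", "Opéra Bouffe",
        "Opéra comique", "Opéra", "Opérette", "Tragédie en musique",
        "Tragédie lyrique",
    ],
}


def class_codes(genre: str) -> list[str]:
    """Return every class code whose genre list contains the given label."""
    wanted = genre.strip().casefold()
    return [
        code
        for code, genres in CLASS_CODES.items()
        if wanted in {g.casefold() for g in genres}
    ]
```

A label such as "Comédie-ballet" matches both the comedy and the libretto code, as in the second example above; an unlisted label matches nothing, so no classCode would be emitted for it.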

wikipediaLinkCount always "0" for FreDraCor

The wikipediaLinkCount column always equals "0" in the FreDraCor metadata file, even in cases where it shouldn't (i.e., if there is a Wikidata ID for a play and sitelinks >= 1). I also checked GerDraCor and RusDraCor; the column works for those corpora.
