
mromanello / hucitlib

HuCit KB: a knowledge base of classical texts and citable text units.

Home Page: https://hucitlib.rtfd.io/

License: GNU General Public License v3.0

Languages: Python 18.60%, Shell 0.40%, Jupyter Notebook 81.00%
Topics: digital-classics, python, knowledge-base, canonical-texts, canonical-references

hucitlib's Introduction

HuCit Knowledge Base

Status

NB: hucitlib is currently being ported from Python 2 to 3. For legacy reasons, the version available on PyPI (hucitlib 0.2.9) still supports only Python 2. If you need Python 3 support, you can install version 0.3.0 from the issue-3/py3 branch. It must be installed from source because it currently requires a forked version of surf, as the official surf does not yet support Python 3.

Description

The HuCit KB is a knowledge base about classical (Greek and Latin) texts, developed with the precise aim of supporting the automatic extraction of bibliographic references to such texts.

The data model of the HuCit KB is based on the following ontologies:

It builds upon and connects with the following resources:

Installation

git clone https://github.com/mromanello/hucit_kb.git
cd hucit_kb
python setup.py install

Or via pip:

pip install hucitlib

Command Line

The library comes with a (development) Command Line Interface.

To see the documentation, try running:

hucit --help

For example, you can search works by name:

hucit find "Iliad"

or look up authors/works by CTS URNs:

hucit find urn:cts:greekLit:tlg0012.tlg001
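The URNs accepted by the lookup follow the general CTS URN layout (urn:cts:namespace:textgroup.work). As a rough illustration of how an author-level URN differs from a work-level one, here is a minimal Python 3 sketch. The helper parse_cts_urn and the CtsUrn tuple are hypothetical names introduced for this example and are not part of hucitlib.

```python
from collections import namedtuple

# Hypothetical container, introduced only for this illustration.
CtsUrn = namedtuple("CtsUrn", ["namespace", "textgroup", "work"])


def parse_cts_urn(urn):
    """Split a CTS URN into namespace, textgroup and (optional) work."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError("not a CTS URN: {!r}".format(urn))
    namespace = parts[2]
    # The fourth component is 'textgroup' or 'textgroup.work'.
    work_component = parts[3].split(".")
    textgroup = work_component[0]
    work = work_component[1] if len(work_component) > 1 else None
    return CtsUrn(namespace, textgroup, work)


author = parse_cts_urn("urn:cts:greekLit:tlg0012")
iliad = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001")
print(author)
print(iliad)
```

An author-level URN leaves the work field empty, which is one simple way to tell the two kinds of identifier apart.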

Stats

Basic stats

                       Total  Min  Max  Mean      Variance
Author names            4842    1   27  3.12791   9.81298
Author abbreviations     774    0    2  0.5       0.26309
Work titles            10354    1   31  1.99154   6.4174
Work abbreviations      2377    0    3  0.457203  0.574496

LOD stats

         Link to Perseus Catalog (%)  Link to CWKB (%)  Link to VIAF (%)  Link to Wikidata (%)
Authors  4.91                         100.00            5.88              4.91

Example

This is an example of how to use the HuCit KB programmatically:

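>>> import pkg_resources
>>> from knowledge_base import KnowledgeBase

>>> # locate the Virtuoso configuration file shipped with the package
>>> virtuoso_cfg_file = pkg_resources.resource_filename('knowledge_base','config/virtuoso.ini')

>>> kb = KnowledgeBase(virtuoso_cfg_file)

>>> # search authors and works by name
>>> search_results = kb.search('Omero')

>>> # assumption: search() returns a sequence whose items expose to_json();
>>> # adjust the next line if it returns (label, resource) pairs instead
>>> result = search_results[0]
>>> print(result.to_json())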
{
  "name_abbreviations": [
    "Hom."
  ],
  "urn": "urn:cts:greekLit:tlg0012",
  "works": [
    {
      "urn": "urn:cts:greekLit:tlg0012.tlg001",
      "titles": [
        {
          "language": "it",
          "label": "Iliade"
        },
        {
          "language": "la",
          "label": "Ilias"
        },
        {
          "language": "en",
          "label": "Iliad"
        },
        {
          "language": "de",
          "label": "Ilias"
        },
        {
          "language": "fr",
          "label": "L'Iliade"
        }
      ],
      "uri": "http://purl.org/hucit/kb/works/2815",
      "title_abbreviations": [
        "Il."
      ]
    },
    {
      "urn": "urn:cts:greekLit:tlg0012.tlg002",
      "titles": [
        {
          "language": "en",
          "label": "Odyssey"
        },
        {
          "language": "de",
          "label": "Odyssee"
        },
        {
          "language": "la",
          "label": "Odyssea"
        },
        {
          "language": "fr",
          "label": "l'Odyss\u00e9e"
        },
        {
          "language": "it",
          "label": "Odissea"
        }
      ],
      "uri": "http://purl.org/hucit/kb/works/2816",
      "title_abbreviations": [
        "Od.",
        "Odyss."
      ]
    },
    {
      "urn": "urn:cts:cwkb:927.2814",
      "titles": [
        {
          "language": "la",
          "label": "Epigrammata"
        }
      ],
      "uri": "http://purl.org/hucit/kb/works/2814",
      "title_abbreviations": [
        "epigr."
      ]
    }
  ],
  "uri": "http://purl.org/hucit/kb/authors/927",
  "names": [
    {
      "language": "fr",
      "label": "Hom\u00e8re"
    },
    {
      "language": "la",
      "label": "Homerus"
    },
    {
      "language": null,
      "label": "Homeros"
    },
    {
      "language": "en",
      "label": "Homer"
    },
    {
      "language": "it",
      "label": "Omero"
    }
  ]
}
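Once the record is available as JSON, it can be processed with the standard library alone. The following Python 3 sketch works on an abridged copy of the output shown above and collects, for each work, the title in a given language. The function titles_by_language is a hypothetical helper written for this example, not a hucitlib API.

```python
import json

# Abridged excerpt of the to_json() output shown above.
record_json = """
{
  "urn": "urn:cts:greekLit:tlg0012",
  "works": [
    {
      "urn": "urn:cts:greekLit:tlg0012.tlg001",
      "titles": [
        {"language": "it", "label": "Iliade"},
        {"language": "en", "label": "Iliad"}
      ],
      "title_abbreviations": ["Il."]
    },
    {
      "urn": "urn:cts:greekLit:tlg0012.tlg002",
      "titles": [
        {"language": "en", "label": "Odyssey"},
        {"language": "it", "label": "Odissea"}
      ],
      "title_abbreviations": ["Od.", "Odyss."]
    }
  ]
}
"""


def titles_by_language(record, language):
    """Map each work URN to its title in the given language, if any."""
    titles = {}
    for work in record["works"]:
        for title in work["titles"]:
            if title["language"] == language:
                titles[work["urn"]] = title["label"]
    return titles


record = json.loads(record_json)
english_titles = titles_by_language(record, "en")
print(english_titles)
```

Works without a title in the requested language are simply absent from the returned dictionary.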

hucitlib's People

Contributors

mromanello

Forkers

paregorios

hucitlib's Issues

remove duplicate records

  1. Quintus Smyrnaeus, Posthomerica :: urn:cts:EpiBau:epibau118.epibau001 and
    Quintus, Posthomerica :: urn:cts:greekLit:tlg2046.tlg001

  2. Kallimachos, Hymn to Apollo :: urn:cts:greekLit:tlg0533.tlg016 and
    Callimachus, Hymnus in Apollinem :: urn:cts:EpiBau:epibau021.epibau001 (the same problem exists for Hymn to Diana and Hymn to Delos)

speed up method fetching of lexical information from the KB

(The same applies to the properties KnowledgeBase.author_abbreviations, .work_titles, etc.)

To parallelize the generation of counts, a method is needed that gets all CTS URNs from the KB. The SPARQL query would be:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX frbroo: <http://erlangen-crm.org/efrbroo/>
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?resource_URI ?resource_type
WHERE {
    ?resource_URI rdf:type ?resource_type .
    ?resource_URI crm:P1_is_identified_by ?urn .
    ?urn a crm:E42_Identifier .
    ?urn rdfs:label ?label
}
LIMIT 10
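One possible way to make such a query usable for parallel processing is to split it into pages with LIMIT and OFFSET, so that each page can be handed to a separate worker. The Python 3 sketch below only builds the query strings; it does not contact any endpoint. The function paged_queries, the ORDER BY clause (added to keep paging stable) and the idea of knowing the total in advance are assumptions made for this illustration.

```python
# Query from the issue, with an explicit rdf prefix and paging placeholders.
QUERY_TEMPLATE = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX crm: <http://erlangen-crm.org/current/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?resource_URI ?resource_type
WHERE {{
    ?resource_URI rdf:type ?resource_type .
    ?resource_URI crm:P1_is_identified_by ?urn .
    ?urn a crm:E42_Identifier .
    ?urn rdfs:label ?label
}}
ORDER BY ?label
LIMIT {limit}
OFFSET {offset}
"""


def paged_queries(total, page_size):
    """Return one query string per page of at most page_size rows.

    'total' is the number of rows to cover; obtaining it (for example
    with a separate COUNT query) is outside the scope of this sketch.
    """
    return [
        QUERY_TEMPLATE.format(limit=page_size, offset=offset)
        for offset in range(0, total, page_size)
    ]


queries = paged_queries(total=25, page_size=10)
print(len(queries))
print(queries[-1])
```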

req support for Python 3

Evidently this package only works with Python 2. Installing with pip under Python 3.6.5 on OS X El Capitan (10.11.6) fails with the following error:

$ pip install hucitlib
Collecting hucitlib
  Downloading https://files.pythonhosted.org/packages/80/8a/768433eba0d9ae07efae3b4f170a6c15dd9484296ab5c05aba292fe91c17/hucitlib-0.2.8.tar.gz (1.9MB)
    100% |████████████████████████████████| 1.9MB 1.7MB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/z8/2zsyr5dn2cl06wl3fwz3f2_00000gp/T/pip-install-mr7vicxo/hucitlib/setup.py", line 5, in <module>
        execfile('{0}/__version__.py'.format(NAME))
    NameError: name 'execfile' is not defined
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/z8/2zsyr5dn2cl06wl3fwz3f2_00000gp/T/pip-install-mr7vicxo/hucitlib/

Installing via git clone and setup.py fails with the same error:

$ git clone https://github.com/mromanello/hucit_kb.git
Cloning into 'hucit_kb'...
remote: Counting objects: 590, done.
remote: Total 590 (delta 0), reused 0 (delta 0), pack-reused 590
Receiving objects: 100% (590/590), 6.29 MiB | 3.23 MiB/s, done.
Resolving deltas: 100% (326/326), done.
$ cd hucit_kb/
$ python setup.py install
Traceback (most recent call last):
  File "setup.py", line 5, in <module>
    execfile('{0}/__version__.py'.format(NAME))
NameError: name 'execfile' is not defined
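The NameError arises because execfile was removed in Python 3. A common replacement is to read the file and pass its compiled contents to exec. The sketch below shows that pattern in isolation on a throwaway file; the helper load_version and the assumption that the version file defines a __version__ variable are hypothetical, since the contents of the real __version__.py are not shown here.

```python
import os
import tempfile


def load_version(path):
    """Execute a version file and return the names it defines."""
    namespace = {}
    with open(path) as version_file:
        # Equivalent of Python 2's execfile(path, namespace).
        exec(compile(version_file.read(), path, "exec"), namespace)
    return namespace


# Demonstration on a temporary file. In setup.py the path would be
# '{0}/__version__.py'.format(NAME), as in the traceback above.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "__version__.py")
with open(demo_path, "w") as handle:
    handle.write("__version__ = '0.0.0'\n")  # placeholder value

version_info = load_version(demo_path)
print(version_info["__version__"])
```

Executing the file into an explicit dictionary also avoids leaking names into the namespace of setup.py itself.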

[epibau] add new records

  • Apollodorus Library Epitome (Perseus catalog record)
  • Scaliger, Julius Caesar
  • Milton, Paradise Lost
  • Orphic argonautica
  • Eratosthenes Katasterismos
  • remove duplicated entry for Theocritus, Idyllia :: urn:cts:EpiBau:epibau128.epibau001

enable passing kb settings as keyword args instead of config file

Instead of being forced to do

kb = KnowledgeBase(configuration_file)

I want to be able to do

KnowledgeBase(reader='sparql_protocol', writer='sparql_protocol', server='localhost', endpoint='http://nlp.dainst.org:8888/sparql', port=8888, default_context='http://purl.org/hucit/kb')

The minimum required subset of parameters, if any, still needs to be determined.
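One way this request could be approached is to merge values read from an optional configuration file with keyword arguments, letting the keyword arguments win. The Python 3 sketch below illustrates only that merging step. The function load_settings and the section name "surf" are hypothetical and do not reflect how KnowledgeBase actually reads its configuration.

```python
import configparser


def load_settings(config_file=None, section="surf", **overrides):
    """Merge settings from an optional INI file with keyword overrides."""
    settings = {}
    if config_file is not None:
        parser = configparser.ConfigParser()
        # read() silently skips files that do not exist.
        parser.read(config_file)
        if parser.has_section(section):
            settings.update(parser.items(section))
    # Keyword arguments take precedence over values read from the file.
    settings.update(overrides)
    return settings


settings = load_settings(
    reader="sparql_protocol",
    writer="sparql_protocol",
    server="localhost",
    endpoint="http://nlp.dainst.org:8888/sparql",
    port=8888,
    default_context="http://purl.org/hucit/kb",
)
print(sorted(settings))
```

Note that values read from an INI file are strings, while keyword arguments keep their Python types, so a real implementation would need to normalize them.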

[KB] add records for Mythologiae project

[KB] mapping of CTS URNs

The KB contains several legacy custom URNs that need to be mapped to Perseus Catalog's CTS URNs. The full list can be found here. For example, urn:cts:cwkb:535 needs to be mapped to urn:cts:latinLit:phi0926.
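As a minimal illustration of such a mapping, the Python 3 sketch below keeps legacy URNs in a lookup table and falls back to the original URN when no mapping is known. Only the single pair mentioned above is included; the names LEGACY_TO_PERSEUS and map_urn are hypothetical and the table would need to be filled from the full list.

```python
# Lookup table from legacy custom URNs to Perseus Catalog CTS URNs.
# Only the example given in the issue is listed here.
LEGACY_TO_PERSEUS = {
    "urn:cts:cwkb:535": "urn:cts:latinLit:phi0926",
}


def map_urn(urn, mapping=LEGACY_TO_PERSEUS):
    """Return the mapped CTS URN, or the input unchanged if unmapped."""
    return mapping.get(urn, urn)


print(map_urn("urn:cts:cwkb:535"))
print(map_urn("urn:cts:greekLit:tlg0012"))
```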
