culturesofknowledge / emplaces


Early Modern Places

License: MIT License

HTML 9.51% Python 73.96% Shell 3.01% CSS 1.86% TypeScript 11.66%
geo gazetteer historical

emplaces's Introduction

EM Places

Early Modern Places (EM Places) is a collaboratively curated historical geo-gazetteer for the sixteenth to eighteenth centuries, under development by the Cultures of Knowledge project at Oxford University. It is the first of what will eventually become three Linked Open Data resources (the others being EM People and EM Dates), built on a shared humanities infrastructure platform in collaboration with the Humanities Cluster of the Royal Dutch Academy of Arts and Sciences (KNAW) in Amsterdam.

You can read an overview of EM Places in the 'Places' chapter of Hotson & Wallnig (eds.) Reassembling the Republic of Letters in the Digital Age: Standards, Systems, Scholarship (Göttingen, 2019).

Goals

EM Places is being designed to meet four goals:

The first is to be a resource for identifying early modern places by means of their current and historical name variants. To this end, EM Places will combine current (and alternative) place names, a current administrative hierarchy, and location data from reference gazetteers such as GeoNames and Wikidata, and extend this with further place name attestations provided by contributors from primary sources. Users will be able to browse and search for places on multiple criteria, refine their results over facets, and export their search results. To facilitate semi-automatic disambiguation of bulk metadata, EM Places will function as a reconciliation service for OpenRefine.

The second is to provide a means for researchers to contribute richer historical contexts to places. EM Places will provide means for capturing i) basic partitive data on historical polities in political-administrative and ecclesiastical hierarchies (later, also military and judicial hierarchies), ii) either current or, where available, georeferenced historical maps, iii) the dates of transition between official calendars in a region (e.g. from the Julian to the Gregorian) for reuse in EM Dates, iv) custom attributes describing ‘associations’ between places (e.g. the time and cost for mail to travel between two postal stations on a named postal route), and v) links to additional historical resources and bibliographies.

The third is to fully credit, source, and cite all contributions to the gazetteer by individual researchers. Regular contributors with registered accounts on EM Places will be able to submit new data or suggest revisions to existing data using either a web interface or a bulk upload facility. More experienced users with editorial privileges will have the means to review and approve these contributions. Users will be able to see whether data in the gazetteer originated from a reference gazetteer such as GeoNames or was added by an individual contributor. All contributors to EM Places will be able to call up a listing of their contributions and revisions to the gazetteer.

The fourth and final goal is to make the EM Places source code and datasets easily accessible and reusable by others. To this end, the source code for EM Places, based on the Timbuctoo technical infrastructure developed by the KNAW Humanities Cluster, will be shared under open source and made available for reuse in virtual Docker containers. The data in EM Places will be shared under open access licenses and distributed over multiple channels: as user-initiated exports of individual records from the application itself, on popular open repositories such as GitHub, and via the EM Places GraphQL API.

Our intent is to prepare the gazetteer in as transparent and collaborative a manner as possible, to allow it to become a useful resource for the early modern community and an active participant in the proposed Pelagios network. In support of this, in addition to CSV, Excel, and RDF-Turtle, EM Places will support the export of structured data in the new Geo-JSON-T Linked Places Interconnection Format.

Design

The draft design documents (display, search, edit) offer an informal description of the planned features for EM Places together with a first set of interface mock-ups. More details will be added here as the gazetteer’s features are finalized.

Data Model

A draft of the proposed data model for EM Places is available, together with a set of data model diagrams. More details will be added here as the data model is refined.

Sample Data

Historical sample data from Silesia (c. 1500-1900) including information on administrative, ecclesiastical, judicial and military hierarchies, calendrical transition dates, toponyms, map references, and associated resources.

Status

June 2019: Data model sufficiently complete for pilot, sample data created, search/results & record detail interfaces refined; development sprints now underway; first beta scheduled for end of August.

December 2018: Second revision of internal tool for creating core metadata for place records from reference gazetteers; created sample dataset from EMLO places; reviewed Timbuctoo infrastructure dependencies; defined basic editorial workflows.

November 2018: Further revisions to the data model; provisional plan for searching and browsing place records. Initial version of internal Linked Data web editor to be used for prototyping and data entry until the EM Places web editor is ready.

August 2018: Provisional full draft of complete data model completed; first release of tool for processing GeoNames data; completed draft user interface for record detail view.

July 2018: First public draft of the design document describing the proposed features of the gazetteer with schematic mock-ups of potential UI elements. First public draft of the overview data model document.

March - June 2018: Private drafts of the gazetteer's design document and data model.

Feedback and Comments

We are keen to get your comments and feedback on EM Places. Please get in touch with Arno Bosse (Digital Project Manager, Cultures of Knowledge) by email ([email protected]), via @kintopp on Twitter, or by creating a new GitHub issue in the repository with your comment or question.

Contributors

Arno Bosse (Oxford - Project Management), Howard Hotson (Oxford - Director), Graham Klyne (Oxford - Data Modelling), Miranda Lewis (Oxford - Editor), Martijn Maas (HuC - Systems Development), Glauco Mantegari (Design Consultant), Jauco Noordzij (HuC - Systems Development), Marnix van Berchum (HuC - Project Management), Mat Wilcoxson (Oxford - Systems Development), Rob Zeeman (HuC - Systems Development).

Acknowledgements

We would like to acknowledge the inspiration we drew and the help we received from several related projects, including the COST Action 'Reassembling the Republic of Letters', GeoNames, Das Geschichtliche Orts-Verzeichnis (GOV), the Getty TGN, the Herder Institute for Historical Research on East Central Europe, the Pelagios Project, Wikidata, and the World Historical Gazetteer. Particular thanks are due to Dariusz Gierczak for providing us with sample historical gazetteer data on Silesia.

EM Places, EM People, and EM Dates were funded 2017-2019 by a grant to the University of Oxford from the Andrew W. Mellon Foundation.

emplaces's People

Contributors

gklyne, kintopp, mmaas3

emplaces's Issues

Discussion: Bibliographic citations

We need to decide which citation formats we're going to support (e.g. Chicago Manual of Style, MLA, BibTeX, RIS) and how and where (in the workflow) we're going to carry out the transformations (e.g. using CSL tools).

Discussion: Authorities

Which are included in GeoNames? Are these represented correctly in the sample RDF? Do we want to add more (and if so, where will we find them)?

GeoNames admin hierarchy inconsistencies

(I'm recording this here partly as a placeholder, pending an issue submission to GeoNames)

I've run up against an inconsistency in the GeoNames hierarchy data for Opole (id 3090048).

The hierarchy displayed by GeoNames, and recorded within the Opole record is:

<gn:parentFeature rdf:resource="http://sws.geonames.org/7532413/"/>

<gn:parentADM3 rdf:resource="http://sws.geonames.org/7532413/"/>
<gn:parentADM2 rdf:resource="http://sws.geonames.org/7530818/"/>
<gn:parentADM1 rdf:resource="http://sws.geonames.org/3337495/"/> - Opole Voivodeship
<gn:parentCountry rdf:resource="http://sws.geonames.org/798544/"/> - Poland

But if I chase down the parentFeature references, I end up with a different hierarchy:

7532413
7530818
3082777 - Upper Silesia: plausible(?), but...
3077311 - ... Czechia !

I would expect that following the parentFeature references would yield the same values that are stored within the original record. In this case, the redundant information appears to be out-of-phase.
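
A check of this kind could be sketched in Python as follows. This is only an illustration, not part of the EM Places tooling: `PARENT_FEATURE`, `RECORDED`, `chase_parents` and `first_mismatch` are hypothetical names, and the IDs are hard-coded from the example above rather than fetched from GeoNames.

```python
# Hypothetical in-memory copy of the gn:parentFeature links quoted above.
PARENT_FEATURE = {
    3090048: 7532413,  # Opole -> parent recorded as ADM3
    7532413: 7530818,  # -> parent recorded as ADM2
    7530818: 3082777,  # -> Upper Silesia
    3082777: 3077311,  # -> Czechia
}

# Hierarchy stored redundantly in the Opole record:
# parentADM3, parentADM2, parentADM1, parentCountry.
RECORDED = [7532413, 7530818, 3337495, 798544]


def chase_parents(place_id, parent_of):
    """Follow parentFeature links until no parent is known (guarding against cycles)."""
    chain, seen = [], {place_id}
    current = parent_of.get(place_id)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


def first_mismatch(recorded, chased):
    """Return (index, recorded_id, chased_id) for the first difference, else None.

    Only the overlapping prefix of the two lists is compared.
    """
    for index, (rec, got) in enumerate(zip(recorded, chased)):
        if rec != got:
            return index, rec, got
    return None


chased = chase_parents(3090048, PARENT_FEATURE)
print(first_mismatch(RECORDED, chased))
```

With the IDs above, the two chains agree for the first two levels and diverge at the third, where the record says 3337495 and the chased chain says 3082777.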

Discussion: Provenance pages

The sample data will need to be provided per element; that is, separate Sources texts (as modal pop-ups) for each functional area in the detail view.

Discussion: Info popups

The sample data will need to be provided per element; that is, separate info texts for each functional area in the detail view.

Discussion: Search customizations

  • Show in the GUI what search parameters (~2 etc.) are possible
  • Allow for search results with multiple fields
  • We will get requests for custom ranking weights, so we should iterate until we have something that works well

Later

  • We also discussed creating a hierarchical facet where you can "pin" a place and then only search in the children of that place.
  • We also discussed an "attribute" filter that does a full text search/filter on all properties that are used in the search result display

Discussion: Location

I think we should also represent the decimal degrees latitude/longitude as degrees, minutes, and seconds. Need to clarify whether the data is also represented this way in GeoNames. If not, decide where (when in the workflow) to convert it. Later, in advanced search, do we want to allow search by location expressed in (partial) degrees, minutes, and seconds?

And if we're going to perform one kind of transformation anyway, we could also look into displaying additional, early modern prime meridians here (e.g. Paris, Cadiz).
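
As a rough sketch of the decimal-to-DMS conversion discussed above (wherever in the workflow it ends up), something like the following would do. The function names are hypothetical and the sample values are arbitrary, not taken from GeoNames.

```python
def to_dms(value, is_latitude):
    """Convert decimal degrees to (degrees, minutes, seconds, hemisphere)."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Work in whole arc-seconds so rounding can never produce 60 seconds.
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return degrees, minutes, seconds, hemisphere


def format_dms(value, is_latitude):
    """Render a coordinate as a display string such as 50°30′00″N."""
    degrees, minutes, seconds, hemisphere = to_dms(value, is_latitude)
    return f"{degrees}°{minutes:02d}′{seconds:02d}″{hemisphere}"


print(format_dms(50.5, True), format_dms(-17.75, False))
```

Rounding to whole arc-seconds is a design choice for display; if partial-DMS search is added later, the stored decimal value should remain the source of truth.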

Support for multiple start/end dates for a historical place

Up to now, we've worked with a simplified notion of the start and end of a (historical) place on the assumption that it came into existence on some year and dissolved at some later year. But the historical record is more complex.

Take for example the Duchy of Opole and Racibórz. According to Wikipedia (for the sake of argument, let's just assume the data there is correct), it came in and out of existence several times:

1202–1281
1521–1532
1551–1556

Are we able to model these intermittent states of existence in its own place record? In this entry, since we are dealing with the place 'as such' we need to record 1202-1281 as well.
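
One way to think about this, independent of how it is finally expressed in RDF, is to let a place carry a list of timespans instead of a single start/end pair. The sketch below is only an illustration; `Timespan` and `HistoricalPlace` are hypothetical names, not terms from the EM Places data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Timespan:
    start: int  # first year of existence (inclusive)
    end: int    # last year of existence (inclusive)


@dataclass
class HistoricalPlace:
    name: str
    timespans: list = field(default_factory=list)

    def existed_in(self, year):
        """True if the year falls inside any recorded period of existence."""
        return any(span.start <= year <= span.end for span in self.timespans)

    def overall_extent(self):
        """Earliest start and latest end, i.e. the simplified single-span view."""
        return (
            min(span.start for span in self.timespans),
            max(span.end for span in self.timespans),
        )


duchy = HistoricalPlace(
    "Duchy of Opole and Racibórz",
    [Timespan(1202, 1281), Timespan(1521, 1532), Timespan(1551, 1556)],
)
print(duchy.existed_in(1250), duchy.existed_in(1400), duchy.overall_extent())
```

The simplified start/end we have used so far can still be derived from the list, so existing displays would not need to change.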

Add analytics tracking

We would like to add a piece of JavaScript to each Place page so that we can record statistics on how much, and in what ways, EMPlaces is being used. One possibility is Google Analytics.

EMPlaces vocabulary namespace URIs

This relates to issue #6, but is different, hence a separate issue. This is about the vocabulary terms used by the data model rather than the identifiers assigned to places themselves. There is a little overlap in areas like time period identifiers, etc.

For reference, see https://github.com/culturesofknowledge/emplaces/blob/master/models/20180802-opole-example-multisourced.ttl#L26:

@prefix em:         <http://emplaces.namespace.example.org/> .   #@@@@ TBD
@prefix emt:        <http://emplaces.namespace.example.org/timespan/> .
@prefix eml:        <http://emplaces.namespace.example.org/language/> .
@prefix ems:        <http://emplaces.namespace.example.org/source/> .

And there's this, which will be updated based on the resolution of issue #6:

@prefix ex:         <http://emplaces.data.example.org/> .        #@@@@ TBD

I've been using placeholder namespace URIs in example.org domain for:

  • em: general vocabulary terms
  • emt: predefined time period identifiers; specifically emt:Current
  • eml: language identifiers; this might be replaced by an existing vocabulary (e.g., lexvo?)
  • ems: predefined source/authority identifiers; currently only ems:EMPlaces defined, to represent material under EMPlaces editorial control. (Related: do we want to distinguish EMLO here?)

So the issue is: what domain to use for these. Possible options are:

  • earlymodern.org (or whatever domain has been registered for this project). Are we confident that the registration will be maintained long-term? Dereferencability is desirable but not essential.
  • vocab.ox.ac.uk - (a) creates a possible perception of being Oxford-centric; (b) what do we know about the current state of the commitment (by the Bodleian/university) to keep this live in the long term?
  • purl.org - has had some problems, but is widely used and I assume there's a lot of motivation out there to keep it going.
  • doi.org - we could allocate this via Zenodo as a Datacite URI, and publish the ontologies in GitHub (I've done something similar for the Annalist software, e.g. https://doi.org/10.5281/zenodo.594496)
  • Others? (I haven't listed ARK, as this seems to be a different use-case.)

I lean towards using a Datacite DOI for these, as there's a lot of community support behind these, and treating the ontology as a data publication seems about right, and hopefully these will tie in to long-term preservation strategies. The downside is that I'm not sure if we can content-negotiate a DOI to directly retrieve an RDFS/OWL ontology - it usually ends up on a splash-page.

Discussion: Calendars

Is all the RDF data being represented properly or is it missing and has to be added?

Discussion: Toponyms

At present we import the full list of alternative names (toponyms) from GeoNames for indexing. Only a subset should be displayed:

For example, while we can allow users to search for and find places using a Persian transliteration of a place (Opole: اوپوله), there is no need to display this particular transliteration in the list of alternative names. For the display list, alternative names will be drawn from a short list of major European languages and historical forms (i.e. Latin). Next, the list of alternative names will be compared with one or more additional gazetteers. From these lists, a merged set of unique alternative names will finally be shown. This is to avoid having to list multiple, identical instances of e.g. 'Opole' for the many language transliterations supported by GeoNames.

For the purposes of the sample Opole RDF, we can define this manually. Later, both for bulk imports and when adding single records, we'll need to provide a rule.
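
A rule of this kind might look roughly like the sketch below. It is an assumption-laden illustration: the language shortlist, the `(language, name)` input shape, the function name and the sample data are all hypothetical, and the sample names are illustrative rather than extracted from GeoNames.

```python
# Hypothetical shortlist of display languages (ISO 639-1 codes); "la" covers Latin forms.
DISPLAY_LANGUAGES = {"en", "de", "fr", "pl", "la"}


def display_names(geonames_names, other_names, languages=DISPLAY_LANGUAGES):
    """Merge (language, name) pairs into an ordered list of unique display names.

    Every name remains indexed for search elsewhere; this only selects what is shown.
    """
    merged, seen = [], set()
    for language, name in list(geonames_names) + list(other_names):
        if language not in languages:
            continue  # searchable, but not displayed
        key = name.casefold()  # treat 'Opole' and 'opole' as the same name
        if key not in seen:
            seen.add(key)
            merged.append(name)
    return merged


GEONAMES_SAMPLE = [("pl", "Opole"), ("en", "Opole"), ("de", "Oppeln"), ("fa", "اوپوله")]
OTHER_SAMPLE = [("de", "Oppeln"), ("fr", "Opole")]

print(display_names(GEONAMES_SAMPLE, OTHER_SAMPLE))
```

Here the Persian form is filtered out of the display list and the repeated 'Opole' and 'Oppeln' entries collapse to one each, while input order is preserved.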

Discussion: Maps

Placeholder for a discussion on how we could provide support for historical maps.

Discussion: View customizations

Handling citations has been deferred to later.

  • info
  • add hover-hint about how long the higher level place existed
    Show the 4 tabs in the hierarchy with military and judicial greyed out. Whether ecclesiastical will be included is still up for debate; it will be greyed out when not available.
  • provenance
  • change to the coordinates
  • permanent uri
  • authorities (geonames, getty TGN, wikidata, GND)
  • grab calendar data from the first entity up in the tree that has it. Keep it as a table.
  • add related places
  • show the historical maps (tabs and an image with a link to the original resource)
  • add licence (make sure to highlight that parts are CC-BY and parts are CC-0)
  • add creator and contributors (everyone who edited the record) from timbuctoo provenance
  • add export links
  • make hierarchy links
  • add issue #19 (cofk repo) and see how possible it is given the current provenance status

Discussion: Related Resources

This is a custom list (not what's being shown now, which is partially the authorities list). Arno will need to provide this from the Opole Word document prepared by Dariusz.

Create (or define) a sample historical place

We need to define the core and extended data elements of a historical place. Probably the simplest would be to pick a place in Opole's historical administrative hierarchy, e.g. Duchy of Opole and Racibórz.

Need to decide whether this will require a full RDF sample (derived in large part from Opole) or whether a description of the differences would suffice.

BUG: geonames extractor problems with admin hierarchy

I just noticed a problem with some data generated by the geonames extractor/converter.

E.g., see: https://github.com/culturesofknowledge/emplaces/blob/master/src/geonamesdataexport/data-20190624/geonames-data-ref-by-EMLO-0001-0100.ttl#L20467

The em:hasRelation showing Friesland_ADM1_2755812_geonames as part of Netherlands_PCLI_2750405_geonames has an incorrect value for em:relationType at line 20496: it is recorded as em:P_PART_OF_A, but should be em:A_PART_OF_A.

At the time of writing, I don't know if this is throughout the data - I suspect it is: I may have failed to take account of the type of the place for which the relation is described.

Code for this is at about:

https://github.com/culturesofknowledge/emplaces/blob/master/src/geonamesdataexport/get_geonames_data.py#L865

It does appear that the relationship type is hard-wired, rather than being derived based on the place type.
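
A possible shape for the fix is to derive the relation type from the GeoNames feature class of both places. The sketch below is not the code in get_geonames_data.py: the `FEATURE_CLASS` table and `relation_type` function are hypothetical, and the assumption that relation types follow the pattern `em:<child class>_PART_OF_<parent class>` is inferred only from the two values mentioned above.

```python
# GeoNames feature class for a few feature codes
# (A = country/administrative division, P = populated place).
FEATURE_CLASS = {
    "PCLI": "A",
    "ADM1": "A",
    "ADM2": "A",
    "ADM3": "A",
    "PPL": "P",
    "PPLA": "P",
}


def relation_type(child_code, parent_code):
    """Build a relation type such as em:A_PART_OF_A from two feature codes.

    Raises KeyError for feature codes missing from the table, so that
    unexpected place types are noticed instead of silently mislabelled.
    """
    child_class = FEATURE_CLASS[child_code]
    parent_class = FEATURE_CLASS[parent_code]
    return f"em:{child_class}_PART_OF_{parent_class}"


print(relation_type("ADM1", "PCLI"))  # Friesland within the Netherlands
print(relation_type("PPLA", "ADM3"))  # a populated place within an admin division
```

Combinations other than P-in-A and A-in-A would need checking against the vocabulary before being emitted.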

Add support for multiple prime meridians > multiple lat/longs

Several different prime meridians were in active use in the period most relevant to EM Places (1500-1800). Of these, the (arguably) most important cartographically ran through the Canaries, Cape Verde, and the Azores. For example, location references in Diderot's Encyclopédie assume the meridian runs through the island of Ferro in the western Canaries.

For details, see: Jonkers, A.R.T., 2005. "Parallel meridians: Diffusion and change in early-modern oceanic reckoning," in: Noord-Zuid in Oostindisch perspectief, ed. J. Parmentier. The Hague: Walburg, pp. 17-42.
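
To illustrate what supporting several prime meridians would involve computationally, here is a small sketch that re-expresses a longitude relative to a different meridian. The offsets are approximate, illustrative assumptions (Paris taken as roughly 2°20′14″ east of Greenwich, Ferro as the conventional 20° west of Paris); historical definitions varied and the real values would need to be sourced. The names `MERIDIAN_OFFSETS`, `normalise` and `convert_longitude` are hypothetical.

```python
# Approximate longitude of each prime meridian relative to Greenwich,
# in decimal degrees, east positive. Illustrative values only.
MERIDIAN_OFFSETS = {
    "greenwich": 0.0,
    "paris": 2.3372,
    "ferro": -17.6628,
}


def normalise(longitude):
    """Wrap a longitude into the range [-180, 180)."""
    return (longitude + 180.0) % 360.0 - 180.0


def convert_longitude(longitude, source, target):
    """Re-express a longitude measured from one prime meridian relative to another."""
    relative_to_greenwich = longitude + MERIDIAN_OFFSETS[source]
    return normalise(relative_to_greenwich - MERIDIAN_OFFSETS[target])


# A longitude of 10° E of Greenwich, expressed relative to Ferro.
print(convert_longitude(10.0, "greenwich", "ferro"))
```

Latitude is unaffected, so each additional meridian only adds one more longitude value per place.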

Current hierarchy is not showing correct labels for place entities

Currently shows:

+ {Name unknown}
  + http://emplaces.data.example.org/Opole_ADM2
    + http://emplaces.data.example.org/Opole_ADM3
     + City of Opole

Should instead show:

+ Poland
  + Opole Voivodeship
    + Opole 
      + Opole
        + City of Opole

Need to clarify if this is an issue with the RDF data or an issue with the mapping.

Revise coredataextractor feature code labelling

abosse$ python get_core_data.py placehierarchy 3090048
3090048   # Opole (Populated place (ppla))
798544    # Poland (Country)
3337495   # Opole Voivodeship (Region (adm1))
7532413   # Opole (City   (adm3))
7530818   # Opole (County (adm2))

Later, we'll want to revise the labels accompanying the feature codes (PPLA etc.) provided by GeoNames. For example, ADM3 in this context means 'administrative seat of the county of Opole' rather than 'city' of Opole as a 'populated place', which is captured by the PPL hierarchy.

See http://www.geonames.org/export/codes.html
