saw-leipzig / csv2cmi

a little program to transform a table of letters into the CMI format

License: MIT License

Python 83.43% XSLT 16.57%
cmi-format catalogue digital-humanities csv-parser csv tei-xml correspondence xslt hacktoberfest

csv2cmi's Introduction

CSV2CMI

About

CSV2CMI is a little program to transform a table of letters (given as .csv) into the CMI format. The CMI format is the underlying data format for the web service correspSearch which facilitates searching across diverse distributed letter repositories.

It is mainly intended for print-only editions and catalogues of letters.

Usage

You have to name your columns as follows:

  • name of the sender: "sender"
  • name of the addressee: "addressee"
  • IDs of the named person or organization: "senderID" and "addresseeID" (this is essential for correspSearch)
  • the date on which the letter was sent: "senderDate"

You may provide additional information:

  • the place a letter was sent from: "senderPlace" (with the corresponding "senderPlaceID" as a proper GeoNames URL)
  • the place a letter was received: "addresseePlace" (with the corresponding "addresseePlaceID" as a proper GeoNames URL)
  • the date on which a letter was received: "addresseeDate"

Furthermore, an "edition" column for a bibliographic record, a "key" column for the number of the letter in the edition, and a "note" column can be added.
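The column requirements above can be checked before running a conversion. The following is a minimal sketch and not part of csv2cmi itself; the function name `missing_columns` and the sample row are assumptions made for illustration.

```python
import csv
import io

# Column names taken from the list above. Treating exactly these five as
# required follows the README's wording, not csv2cmi's source code.
REQUIRED_COLUMNS = {"sender", "addressee", "senderID", "addresseeID", "senderDate"}


def missing_columns(csv_text):
    """Return the required column names absent from the header row, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)


# Hypothetical sample table that lacks the "addresseeID" column.
sample = "sender,senderID,senderDate,addressee\nA,http://example.org/a,1730-10-28,B\n"
print(missing_columns(sample))
```

An empty result means all required columns are present.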

Multiple senders or addressees of a letter must be written in the same cell, separated by the delimiter specified with the "--extra-delimiter" option (the IDs must follow in the same order).
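As an illustration of the same-order rule, the sketch below pairs each name in a cell with the ID at the same position. It is not csv2cmi's actual implementation; the function name `split_correspondents` and the default delimiter "|" are assumptions.

```python
def split_correspondents(names, ids, delimiter="|"):
    """Pair each name with the ID at the same position in the ID cell."""
    name_list = [name.strip() for name in names.split(delimiter)]
    id_list = [value.strip() for value in ids.split(delimiter)] if ids else []
    # Pad missing IDs with None so that every name still gets an entry.
    id_list += [None] * (len(name_list) - len(id_list))
    return list(zip(name_list, id_list))


pairs = split_correspondents(
    "Johann Sebastian Bach|Maria Barbara Bach",
    "http://viaf.org/viaf/12304462|http://viaf.org/viaf/45361994",
)
for name, identifier in pairs:
    print(name, identifier)
```

Because the pairing is purely positional, a swapped order in either cell assigns the wrong ID to a person without raising any error.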

Dates must be entered in ISO format. EDTF is supported for uncertain or approximate dates, intervals and sets.
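A quick pre-check for plain ISO dates might look like the sketch below. It only covers the simple forms YYYY, YYYY-MM and YYYY-MM-DD and deliberately does not handle EDTF; the function name `is_plain_iso_date` is an assumption and not taken from csv2cmi.

```python
import re
from datetime import date

# Year, optionally followed by month, optionally followed by day.
ISO_PATTERN = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def is_plain_iso_date(value):
    """Return True for a valid YYYY, YYYY-MM or YYYY-MM-DD string."""
    match = ISO_PATTERN.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day default to 1 for the calendar check.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ("1730-10-28", "1730-10", "1730-02-30", "28.10.1730"):
    print(candidate, is_plain_iso_date(candidate))
```

Values that fail this check are not necessarily wrong, since they may be valid EDTF expressions.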

To provide essential CMI information such as the editor's name or the publisher, an INI file is needed.

Check that your table uses UTF-8 encoding!
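One way to test the encoding is to try decoding the raw bytes of the file. This is a minimal sketch; the helper name `is_utf8` and the file name in the usage comment are assumptions.

```python
def is_utf8(data):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Usage with a file (hypothetical file name):
#   from pathlib import Path
#   is_utf8(Path("letters.csv").read_bytes())
print(is_utf8("Händel".encode("utf-8")))
print(is_utf8("Händel".encode("latin-1")))
```

Pure ASCII files also pass this check, since ASCII is a subset of UTF-8.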

For options and further information check the wiki.

License

This program is available under the MIT License (MIT).

If you use this software, please cite it!

csv2cmi's People

Contributors

jadolan, rettinghaus, ukretschmer

csv2cmi's Issues

updated GND-IDNs

When CSV2CMI finds an updated ID in the authority file, should it simply use the new ID or just throw a warning to the editor?

orgName

add possibility to mark addressee as organization

support various senders/addressees

It could be useful to allow various correspondents of the same type, since many letters have more than one sender or addressee. A possible way of handling them could be a special character as a delimiter within the cell. See the following example:

sender,senderID,senderPlace,senderPlaceID,senderDate,addressee,addresseeID,addresseePlace,addresseePlaceID,edition,key,note
Johann Sebastian Bach|Maria Barbara Bach,http://viaf.org/viaf/12304462|http://viaf.org/viaf/45361994,Leipzig,http://www.geonames.org/2879139,1730-10-28,Georg Erdmann,http://viaf.org/viaf/3264804,Danzig,http://www.geonames.org/3099434,"Bach-Dokumente Bd. 1, Leipzig 1963",23,four years later

Examples do not work in CMIF Creator

Hi!

After running python csv2cmi.py examples/Example.csv and uploading examples/Example.xml to https://correspsearch.net/en/cmif-creator.html, the tool returns various errors and does not parse the correspondence correctly.

I have several other examples of CSV files that do not work. I was able to modify csv2cmi.py to create a CMIF file that validates in CMIF Creator.

My question is: starting from a CSV, is it enough to use csv2cmi, or must the result also validate in CMIF Creator before I send it to correspSearch?

BTW, it is not clear whether correspSearch accepts only CMIF files or whether the CSV is enough.

Thanks!

Put extra-delimiter in ini file

It would be nice to store the extra delimiter in the ini file.
That way handling multiple projects (with different separators) would become easier.

switch from ini to json

Wouldn't it be better to store the additional configuration metadata in a json file instead of an ini file?

Add dates automatically

In bigger contexts it could be useful to add dates to letters automatically that correspond to the life span of the sender.

Use textual date information?

In addition to the technical date information, extracted from senderDate or addresseeDate and represented in corresponding attributes of the date element, it would be nice to use textual content for the date element, instead of leaving it empty, if there is such information within the CSV file, e.g. in columns labeled as senderDateText and addresseeDateText.

This would enable representations like

<date notBefore="1767-12-21" notAfter="1768-01-10">Ende Dezember 1767 oder Anfang Januar 1768</date>
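To illustrate the proposal, the sketch below builds such a date element with Python's standard xml.etree.ElementTree. It shows the requested output only, not current csv2cmi behaviour; the function name `build_date` is an assumption, and the text argument stands for a value from the proposed senderDateText column.

```python
import xml.etree.ElementTree as ET


def build_date(not_before, not_after, text=None):
    """Create a date element; the textual content is optional."""
    element = ET.Element("date", {"notBefore": not_before, "notAfter": not_after})
    if text:
        # Only set text when the CSV provides it, so the element stays empty otherwise.
        element.text = text
    return element


with_text = build_date(
    "1767-12-21", "1768-01-10", "Ende Dezember 1767 oder Anfang Januar 1768"
)
without_text = build_date("1767-12-21", "1768-01-10")
print(ET.tostring(with_text, encoding="unicode"))
print(ET.tostring(without_text, encoding="unicode"))
```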

Bibliography replacement

It would be nice to have a mechanism that replaces the edition short titles used in the CSV file with full titles configured in the ini file.

Example:

Used short title: JBW1, JBW2

ini file part:

[Bibliography]
JBW1 = Jacobi Briefwechsel, Band 1, usw.
JBW2 = Jacobi Briefwechsel, Band 2, usw.
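A possible shape for this lookup, using Python's standard configparser, is sketched below. It is not an existing csv2cmi feature; the function names and the shortened titles are assumptions. Note that configparser lowercases keys by default, so the sketch overrides optionxform to keep short titles such as JBW1 case-sensitive.

```python
import configparser

# Hypothetical ini content following the example above.
INI_TEXT = """
[Bibliography]
JBW1 = Jacobi Briefwechsel, Band 1
JBW2 = Jacobi Briefwechsel, Band 2
"""


def load_bibliography(ini_text):
    """Read the [Bibliography] section into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys such as "JBW1" in their original case
    parser.read_string(ini_text)
    return dict(parser["Bibliography"])


def expand_edition(short_title, bibliography):
    """Return the full title, or the input unchanged if no mapping exists."""
    return bibliography.get(short_title, short_title)


bibliography = load_bibliography(INI_TEXT)
print(expand_edition("JBW1", bibliography))
print(expand_edition("Some other edition", bibliography))
```

Falling back to the unchanged value keeps tables working that already contain full titles.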

Supporting multiple projects

It would be good to have an option to choose the output path and/or the path of the ini file, to better support project-specific settings, e.g.:

  • projectfolder1
    -- input.csv
    -- csv2cmi.ini
    -- output.xml
  • projectfolder2
    -- input.csv
    -- output.xml
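Such options could be sketched with argparse as follows. The option names `--ini` and `--output` and the fallback rules are hypothetical and are not claimed to be existing csv2cmi flags; check the wiki for the actual options.

```python
import argparse
from pathlib import Path


def parse_args(argv):
    """Parse hypothetical per-project paths, defaulting to the CSV's folder."""
    parser = argparse.ArgumentParser(description="Sketch of project-specific paths")
    parser.add_argument("csv", type=Path, help="input table")
    parser.add_argument("--ini", type=Path, default=None, help="path of the ini file")
    parser.add_argument("--output", type=Path, default=None, help="path of the output file")
    args = parser.parse_args(argv)
    if args.ini is None:
        # Assumed fallback: look for csv2cmi.ini next to the input table.
        args.ini = args.csv.parent / "csv2cmi.ini"
    if args.output is None:
        # Assumed fallback: write the XML next to the input table.
        args.output = args.csv.with_suffix(".xml")
    return args


args = parse_args(["projectfolder1/input.csv"])
print(args.ini, args.output)
```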

date shouldn't contain text

CMIF is very strict and doesn't allow content in date elements. We could implement a switch to toggle between strict CMIF and full TEI, or put date strings into comments.
I'd prefer the former with default to strict.
Any opinion on this @ukretschmer?

add -a option

Use only published letters by default; add an -a option for all letters in the table.
