Jupyter notebook for exploring a cleaned slice of the Catalogus Epistularum Neerlandicarum, a catalog of metadata on early modern letters.

Home Page: https://skillnet.nl

License: MIT License

Jupyter Notebook 100.00%

cen-catalogus-skillnet-exploration's Issues

Explain better what the letter counts mean for sender/receiver

One of the participants (Koen) asks:
"How did you make the list of names with sum(sender+receiver) for every name? It could be interesting to see the ratio sender/receiver. E.g., if the ratio is > 1, one sends more than receives, and if < 1, the opposite is true. If the ratio is > 50, you know the collection is biased, and probably only occurs in their own correspondence collection (in the case of Huygens, Grotius, etc.). Watch out for dividing by 0 if receiver is 0."

Take care of this in the following steps:

Explain/document better what the counts mean
Add counts per letter alongside counts per hit (this is also based on issue #1)
Create code that counts the number of letters sent vs the number of letters received
Consider whether it is possible to add documentation (or a value) indicating possible bias in the collection, following Koen's suggestion (what he meant is that some catalogs have entered all the correspondence received by one person). Requirements still need to be gathered for this one.
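
A minimal sketch of the sent/received counts and the ratio Koen describes, assuming the data is a pandas DataFrame with one row per record. The column names "sender" and "receiver" and the toy rows are hypothetical, not taken from the notebook. Where a name has received nothing, the ratio is left as missing instead of dividing by zero.

```python
import pandas as pd

# Hypothetical toy records; the column names "sender" and "receiver" are assumptions.
letters = pd.DataFrame({
    "sender":   ["Huygens", "Huygens", "Grotius", "Huygens", "Vossius", "Barlaeus"],
    "receiver": ["Grotius", "Vossius", "Huygens", "Grotius", "Huygens", "Huygens"],
})

# Count how often each name appears as sender and as receiver.
sent = letters["sender"].value_counts()
received = letters["receiver"].value_counts()

# Align both counts on the union of names; a name missing on one side gets 0.
counts = pd.DataFrame({"sent": sent, "received": received}).fillna(0).astype(int)
counts["total"] = counts["sent"] + counts["received"]

# Guard against division by zero: where received == 0 the ratio stays NaN.
counts["ratio"] = counts["sent"] / counts["received"].where(counts["received"] > 0)

print(counts.sort_values("total", ascending=False))
```

Note that this counts records (hits), not letters. A ratio far above 1 would flag the kind of one-sided collection Koen mentions, but the threshold to use is still an open requirement.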

Correct explanation of number of letters vs number of records

In the section "Letters per unique senders" and the other subsections under "Letters per person/entity", the number of hits (records) is confused with the number of letters. Either change the comments/explanations, or also include a count of the number of letters using the "amount" column. Based on Aron's input during the workshop.
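
To illustrate the difference, here is a small sketch that reports both numbers side by side. It assumes that each row is one catalog record and that the "amount" column holds the number of letters that record describes; the "sender" column name and the values are hypothetical.

```python
import pandas as pd

# Hypothetical records: one row per catalog record, "amount" = letters in that record.
records = pd.DataFrame({
    "sender": ["Huygens", "Huygens", "Grotius"],
    "amount": [1, 12, 3],
})

# "records" counts rows (hits); "letters" sums the amount column.
per_sender = records.groupby("sender")["amount"].agg(records="size", letters="sum")

print(per_sender)
```

In this toy data Huygens has 2 records but 13 letters, which is exactly the gap the explanations need to make visible.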

Change data source from SurfDrive to Dataverse

Get Dataverse to work with Jupyter Notebook and Google Colab (https://guides.dataverse.org/en/latest/api/dataaccess.html#download-by-dataset-api), then change the code that grabs the data from SurfDrive so that it uses the final version in Dataverse.
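
A possible starting point, following the "Download By Dataset" pattern in the linked guide: build the access URL from the server address and the dataset's persistent identifier. The helper name, the server URL, and the DOI below are placeholders, not the real dataset; the endpoint is expected to return a zip archive of the dataset's files, which would still need to be unpacked before loading.

```python
from urllib.parse import urlencode


def dataset_download_url(server_url: str, persistent_id: str) -> str:
    """Build a Dataverse 'download by dataset' URL (hypothetical helper)."""
    # Keep ':' and '/' readable in the DOI, as in the guide's curl example.
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{server_url.rstrip('/')}/api/access/dataset/:persistentId/?{query}"


# Placeholder values: replace with the actual Dataverse server and dataset DOI.
url = dataset_download_url("https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE")
print(url)

# In the notebook the archive could then be fetched, for example with
# urllib.request.urlretrieve(url, "dataset.zip"), and unpacked with zipfile.
```

Because the same URL works from a local Jupyter installation and from Colab, this would remove the dependency on a SurfDrive share link.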

Explain what the values mean when "shape" is used

Based on a question from Ellen during the workshop. For example, in the section
"# display the number of letters in the second-degree network dataset
letters_2ndD_unique.shape"
I should add an explanation of what the two values in shape mean, or display only the relevant value.
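
For reference, the shape attribute of a pandas DataFrame is a tuple (number of rows, number of columns). A sketch of how the cell could show only the relevant value, using a small stand-in for letters_2ndD_unique with hypothetical columns:

```python
import pandas as pd

# Stand-in for the notebook's letters_2ndD_unique; columns and values are hypothetical.
letters_2ndD_unique = pd.DataFrame({
    "sender":   ["Huygens", "Grotius", "Vossius"],
    "receiver": ["Grotius", "Huygens", "Huygens"],
})

# shape is (rows, columns): unpack it so each value gets a label.
n_rows, n_columns = letters_2ndD_unique.shape
print(f"{n_rows} rows (records) and {n_columns} columns")

# If only the number of rows matters, len() gives the same first value.
print(len(letters_2ndD_unique))
```

The rows here are records, so the label in the notebook should say whether the number refers to records or to letters.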

Add shelfmark column to data displays

Based on Dirk's input during the workshop: include the "signatuur" (shelfmark) column every time data is displayed. He made this comment on the section "Original data (sender/receiver)".

Reset index when data is displayed

Novice users find it confusing what the index means: during the workshop, a participant thought it was the number of letters. I should reset the index every time data is displayed.
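
A short sketch of why the index is misleading after filtering, and how reset_index(drop=True) fixes the display. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data; after filtering, rows keep their original index labels.
letters = pd.DataFrame({
    "sender": ["Huygens", "Grotius", "Vossius", "Barlaeus"],
    "year":   [1640, 1625, 1641, 1630],
})

subset = letters[letters["year"] >= 1640]
print(subset)          # index shows 0 and 2, which looks like arbitrary numbers

# drop=True discards the old labels instead of adding them as a new column.
display_ready = subset.reset_index(drop=True)
print(display_ready)   # index now runs 0, 1
```

Even after the reset, the index is only a row position, so a one-line note next to the first table would still help.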

Test/improve counts of certain/uncertain letter years

As of the version of August 23, 2022, letters per year are counted in three groups: (1) certain items (no ranges, 1 letter, certain year), (2) items with a missing year, and (3) uncertain items (year ranges plus single letters with an uncertain year). The third group could be split in two, because some letter year ranges are certain. Thus:

Consult the team, test the options
Implement improvements to that section
Check consistency with criteria used for Ingeborg's article to separate certain/uncertain slices
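
One option to test could look like the sketch below, which splits the current third group into certain ranges and uncertain items. How the notebook actually stores start year, end year, and uncertainty is not specified here, so the function signature and the sample rows are assumptions; the sketch also leaves out the "1 letter" condition of the first group.

```python
from collections import Counter
from typing import Optional


def classify_year(start: Optional[int], end: Optional[int], uncertain: bool) -> str:
    """Assign a record to one of four proposed groups (hypothetical encoding)."""
    if start is None:
        return "missing year"
    if uncertain:
        return "uncertain"
    if end is not None and end != start:
        return "certain range"
    return "certain single year"


# Hypothetical rows: (start year, end year, uncertain flag).
rows = [
    (1640, None, False),   # certain single year
    (1640, 1645, False),   # certain range
    (None, None, False),   # missing year
    (1640, None, True),    # uncertain single year
    (1640, 1645, True),    # uncertain range
]

group_counts = Counter(classify_year(*row) for row in rows)
print(group_counts)
```

Whether uncertain ranges and uncertain single years should stay together is one of the points to raise with the team.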

Add item to curiosa: letters written in the year a person died

Curiously, many letters were written in the year a person died: more than 300 letters in total. Add this to the notebook.

Add DOI to citation

Create repository DOI via Zenodo and add it to citation.

Repository: lilimelgar/cen-catalogus-skillnet-exploration
