Jupyter notebook to explore a cleaned dataset from a slice of the Catalogus Epistularum Neerlandicarum (a catalog of metadata on early modern letters)
One of the participants (Koen) asks:
"How did you make the list of names with sum(sender+receiver) for every name? It could be interesting to see the ratio sender/receiver. E.g., if the ratio is > 1, one sends more than receives, and if < 1, the opposite is true. If the ratio is > 50, you know the collection is biased, and probably only occurs in their own correspondence collection (in the case of Huygens, Grotius, etc.). Watch out for dividing by 0 if receiver is 0."
Take care of this in the following steps:
Explain/document better what the counts mean
Add counts per letter in addition to counts per hit (this is also based on issue #1)
Create code that counts the number of letters sent vs the number of letters received
Consider whether it is possible to add documentation (or a value) indicating possible bias in the collection, following Koen's suggestion (what he meant is that some catalogs have entered all the correspondence received by one person). Need to gather requirements for this one.
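A minimal sketch of the sent/received counts and the ratio Koen describes, assuming the data has one row per record with columns named "sender" and "receiver" (both names and the toy rows are assumptions, not the notebook's actual code). The ratio is left as NaN when nothing was received, which avoids the division by zero.

```python
import pandas as pd

# Toy data for illustration only; "sender" and "receiver" are assumed column names.
letters = pd.DataFrame({
    "sender":   ["Huygens", "Huygens", "Grotius", "Huygens", "Hooft"],
    "receiver": ["Grotius", "Vossius", "Huygens", "Barlaeus", "Huygens"],
})

# Count how often each name appears as sender and as receiver.
sent = letters["sender"].value_counts()
received = letters["receiver"].value_counts()

# Align both counts on the union of names; a name missing on one side gets 0.
counts = pd.DataFrame({"sent": sent, "received": received}).fillna(0).astype(int)
counts["total"] = counts["sent"] + counts["received"]

# Ratio sent/received; where received is 0 the denominator becomes NaN,
# so the ratio is NaN instead of raising or returning infinity.
counts["ratio"] = counts["sent"] / counts["received"].where(counts["received"] > 0)

print(counts.sort_values("total", ascending=False))
```

These are counts of records (hits). A threshold such as Koen's "ratio > 50" could be applied to the ratio column once the requirements for flagging bias are clear.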
In the section "Letters per unique senders" and the other subsections under "Letters per person/entity", the number of hits (= records) is confused with the number of letters. Either change the comments/explanations, or also count the number of letters using the "amount" column. Based on Aron's input during the workshop.
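The difference between the two counts could be shown side by side, roughly as follows. This is a sketch that assumes each row is one catalog record and that the "amount" column holds the number of letters described by that record; the "sender" column name and the toy values are assumptions.

```python
import pandas as pd

# Toy data for illustration only; one row per catalog record (hit).
records = pd.DataFrame({
    "sender": ["Huygens", "Huygens", "Grotius"],
    "amount": [1, 12, 3],  # number of letters described by each record
})

# n_records counts rows (hits); n_letters sums the "amount" column.
per_sender = records.groupby("sender").agg(
    n_records=("amount", "size"),
    n_letters=("amount", "sum"),
)

print(per_sender)
```

In this toy example Huygens has 2 records but 13 letters, which is exactly the distinction the explanations need to make.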
Based on a question from Ellen during the workshop. For example, in the section
"# display the number of letters in the second-degree network dataset
letters_2ndD_unique.shape"
I should add an explanation of what the two values in shape mean (number of rows, number of columns), or display only the relevant value.
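One possible way to display this, sketched with a small stand-in DataFrame (the real letters_2ndD_unique comes from the notebook; the columns and rows here are made up):

```python
import pandas as pd

# Stand-in for the notebook's letters_2ndD_unique; contents are illustrative only.
letters_2ndD_unique = pd.DataFrame({
    "sender":   ["Huygens", "Grotius", "Hooft"],
    "receiver": ["Grotius", "Huygens", "Huygens"],
})

# shape is a tuple: (number of rows, number of columns).
n_rows, n_columns = letters_2ndD_unique.shape
print(f"{n_rows} rows (records) and {n_columns} columns")

# If only the number of rows is relevant, len() gives it directly.
print(len(letters_2ndD_unique))
```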
Based on Dirk's input during the workshop: every time data is displayed, include the "signatuur". He made this comment in the section "Original data (sender/receiver)".
It is confusing for novice users what the index means. During the workshop, a participant thought it was the number of letters. I should reset the index every time data is displayed.
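Both display points (showing the "signatuur" and resetting the index) could be handled in one step before displaying a selection. A sketch, with made-up rows and placeholder "signatuur" values; the "sender" and "receiver" column names are assumptions:

```python
import pandas as pd

# Toy data for illustration only; the index values mimic leftovers from earlier filtering.
letters = pd.DataFrame(
    {
        "sender":    ["Huygens", "Grotius", "Huygens"],
        "receiver":  ["Grotius", "Huygens", "Hooft"],
        "signatuur": ["SIG-A", "SIG-B", "SIG-C"],  # placeholder values
    },
    index=[17, 42, 108],
)

# Filtering keeps the original index labels (17 and 108), which can be
# mistaken for a number of letters.
subset = letters[letters["sender"] == "Huygens"]

# Put "signatuur" first and replace the index with a fresh 0..n-1 range.
shown = subset[["signatuur", "sender", "receiver"]].reset_index(drop=True)
print(shown)
```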
At the moment (version of August 23, 2022) there are three groups for counting letters per year: (1) certain items (no ranges, 1 letter, certain year), (2) items with a missing year, (3) uncertain items (ranges + single letters with an uncertain year). The third group could be split into two, because some year ranges are certain. Thus:
Consult the team, test the options
Implement improvements to that section
Check consistency with the criteria used for Ingeborg's article to separate certain/uncertain slices
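As a starting point for discussion with the team, the split into four groups might look like the sketch below. Everything about the data layout is hypothetical: the columns "year_start", "year_end" and "year_uncertain", and the rule that a range counts as certain when it is not flagged as uncertain, are assumptions that still need to be checked against the real criteria.

```python
import pandas as pd

# Toy data for illustration only; all three column names are hypothetical.
records = pd.DataFrame({
    "year_start":     [1630, None, 1640, 1650],
    "year_end":       [1630, None, 1645, 1650],
    "year_uncertain": [False, False, False, True],
})

def year_group(row):
    """Assign a record to one of four groups (assumed criteria)."""
    if pd.isna(row["year_start"]):
        return "missing year"
    if row["year_uncertain"]:
        return "uncertain"
    if row["year_start"] != row["year_end"]:
        return "certain range"
    return "certain single year"

records["year_group"] = records.apply(year_group, axis=1)
print(records["year_group"].value_counts())
```

The sketch does not yet take the number of letters per record into account, which the current first group does ("1 letter"); that rule would have to be added once the options have been tested.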