
lilimelgar / cen-catalogus-skillnet-exploration


Jupyter notebook to explore a cleaned dataset of a slice of the Catalogus Epistularum Neerlandicarum (a catalog of early modern history letters' metadata)

Home Page: https://skillnet.nl

License: MIT License

Jupyter Notebook 100.00%

cen-catalogus-skillnet-exploration's People

Contributors

lilimelgar

Forkers

kintopp

cen-catalogus-skillnet-exploration's Issues

Explain better what the letter counts mean for sender/receiver

One of the participants (Koen) asks:
"How did you make the list of names with sum(sender+receiver) for every name? It could be interesting to see the ratio sender/receiver. E.g., if the ratio is > 1, one sends more than receives, and if < 1, the opposite is true. If the ratio is > 50, you know the collection is biased, and probably only occurs in their own correspondence collection (in the case of Huygens, Grotius, etc.). Watch out for dividing by 0 if receiver is 0."

Take care of this in four steps:

  • Explain/document better what the counts mean
  • Add counts per letter alongside counts per hit (this is also based on issue #1)
  • Create code that counts the number of letters sent vs the number of letters received
  • Consider whether it is possible to add documentation (or a value) indicating possible bias in the collection, following Koen's suggestion (what he meant is that some catalogs have entered all the correspondence received by one person). Requirements for this one still need to be gathered.
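
The counting steps above could be sketched as follows. This is a minimal illustration with pandas, not the notebook's actual code: the toy data and the column names "sender", "receiver" and "amount" are assumptions. It sums letters sent and received per name and computes the sender/receiver ratio, leaving the ratio empty (NaN) when nothing was received so that no division by zero occurs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per catalogue record.
# The column names "sender", "receiver" and "amount" are assumptions.
records = pd.DataFrame({
    "sender":   ["Huygens", "Huygens", "Grotius", "Huygens", "Barlaeus", "Vossius"],
    "receiver": ["Grotius", "Barlaeus", "Huygens", "Grotius", "Huygens", "Huygens"],
    "amount":   [1, 3, 1, 2, 1, 1],
})

# Letters sent and received per name. Summing "amount" counts a record
# that describes several letters as several letters.
sent = records.groupby("sender")["amount"].sum()
received = records.groupby("receiver")["amount"].sum()

# Align both counts on the union of names; a name missing on one side gets 0.
counts = (
    pd.concat([sent.rename("sent"), received.rename("received")], axis=1)
    .fillna(0)
    .astype(int)
)
counts["total"] = counts["sent"] + counts["received"]

# Ratio sent/received. A zero denominator is replaced by NaN first,
# so the result is NaN rather than an error or infinity.
counts["ratio"] = counts["sent"] / counts["received"].replace(0, np.nan)

print(counts.sort_values("total", ascending=False))
```

A ratio above 1 means the person sent more than they received in this slice. Whether a fixed threshold (Koen mentions 50) is a useful bias indicator is left open here, since the requirements still need to be gathered.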

Correct explanation of number of letters vs number of records

In the section "Letters per unique senders" and the other subsections under "Letters per person/entity", the number of hits (records) is confused with the number of letters. Either change the comments/explanations, or also include a count of the number of letters using the "amount" column. Based on Aron's input during the workshop.
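
To make the difference concrete, here is a small sketch that reports both numbers side by side. The toy data is invented, and it assumes that "amount" holds the number of letters described by each record.

```python
import pandas as pd

# Hypothetical toy data: three records, one of which describes four letters.
records = pd.DataFrame({
    "sender": ["Huygens", "Huygens", "Grotius"],
    "amount": [1, 4, 2],
})

# n_records counts rows (hits); n_letters sums the "amount" column.
per_sender = records.groupby("sender").agg(
    n_records=("amount", "size"),
    n_letters=("amount", "sum"),
)

print(per_sender)
```

In this example Huygens has 2 records but 5 letters, which is exactly the distinction the explanations should spell out.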

Explain what the values mean when "shape" is used

Based on a question from Ellen during the workshop. For example, in the section
"# display the number of letters in the second-degree network dataset
letters_2ndD_unique.shape"
I should explain what the two values in shape mean, or display only the relevant value.
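
A possible way to do this, sketched with an invented stand-in for letters_2ndD_unique: shape is a tuple of (number of rows, number of columns), so it can be unpacked and only the row count shown.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's letters_2ndD_unique dataframe.
letters_2ndD_unique = pd.DataFrame({
    "sender":   ["Huygens", "Grotius", "Barlaeus"],
    "receiver": ["Grotius", "Huygens", "Huygens"],
})

# shape is (rows, columns): here 3 rows and 2 columns.
n_rows, n_columns = letters_2ndD_unique.shape

# Show only the relevant value, with a label instead of a bare tuple.
print(f"Records in the second-degree network dataset: {n_rows}")
```

Note that the first value is the number of rows (records), which equals the number of letters only if every record describes a single letter.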

Add shelfmark column to data displays

Based on Dirk's input during the workshop: every time data is displayed, include the "signatuur" (shelfmark) column. He made this comment in the section "Original data (sender/receiver)".

Reset index when data is displayed

The meaning of the index is confusing for novice users: during the workshop, a participant thought it was the number of letters. I should reset the index every time data is displayed.
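
One way to handle this together with the shelfmark request is a small display helper. The sketch below is only an illustration: the function name for_display and the toy data are hypothetical, and only the "signatuur" column name comes from the issues above.

```python
import pandas as pd

# Hypothetical toy data; "signatuur" is the shelfmark column.
letters = pd.DataFrame({
    "sender":    ["Huygens", "Grotius", "Huygens", "Barlaeus"],
    "receiver":  ["Grotius", "Huygens", "Barlaeus", "Huygens"],
    "signatuur": ["A 1", "B 2", "C 3", "D 4"],
})

def for_display(df, columns):
    """Return the chosen columns plus "signatuur", with the index renumbered from 0."""
    cols = list(columns)
    if "signatuur" not in cols:
        cols.append("signatuur")
    # drop=True discards the old index labels instead of adding them as a column.
    return df[cols].reset_index(drop=True)

# Filtering keeps the original row labels (0 and 2), which is what confuses readers.
from_huygens = letters[letters["sender"] == "Huygens"]

shown = for_display(from_huygens, ["sender", "receiver"])
print(shown)
```

After the reset, the index simply numbers the displayed rows starting at 0, so it is still worth stating in the text that it is a row label and not a letter count.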

Test/improve counts of certain/uncertain letter years

At the moment (version of August 23, 2022) there are three groups for counting letters per year: (1) certain items (no ranges, 1 letter, certain year), (2) missing year, (3) uncertain items (ranges + single letters with an uncertain year). The third group could be split into two, because some letter year ranges are certain. Thus:

  • Consult the team, test the options
  • Implement improvements to that section
  • Check consistency with criteria used for Ingeborg's article to separate certain/uncertain slices
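
As a starting point for testing the options, the proposed split could look roughly like this. Everything here is an assumption made for illustration: the column names "year_start", "year_end" and "year_uncertain", the group labels, and the rule that a range counts as certain when it is not flagged as uncertain. The actual criteria still need to be agreed with the team.

```python
import pandas as pd

# Hypothetical toy data; all column names are assumptions.
letters = pd.DataFrame({
    "year_start":     [1630, 1640, 1650, None, 1660],
    "year_end":       [1630, 1645, 1650, None, 1670],
    "year_uncertain": [False, False, True, False, True],
})

def year_group(row):
    """Assign a record to one of four groups based on its year information."""
    if pd.isna(row["year_start"]):
        return "missing year"
    if row["year_uncertain"]:
        return "uncertain"
    if row["year_start"] != row["year_end"]:
        return "certain range"
    return "certain single year"

letters["year_group"] = letters.apply(year_group, axis=1)

# Number of records per group.
group_counts = letters["year_group"].value_counts()
print(group_counts)
```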
