princeton-cdh / geniza

version 4.x of the Princeton Geniza Project

Home Page: https://geniza.princeton.edu

License: Apache License 2.0

Python 77.18% HTML 6.90% CSS 0.59% JavaScript 4.08% SCSS 11.12% Shell 0.13%

python django digital-humanities geniza judaeo-arabic

geniza's Introduction

Princeton Geniza Project

Python/Django web application for version 4.x of the Princeton Geniza Project.

Python 3.9 / Django 3.2 / Node 16 / PostgreSQL / Solr 9.2

Technical documentation is available at https://princeton-cdh.github.io/geniza/

For developer instructions, see DEVNOTES.

License

This software is licensed under the Apache 2.0 License.

geniza's Issues

test aligning TEI transcriptions with output from eScriptorium and Google Cloud Vision API

As a user, I want to search on text with or without accents so that I can easily find variants of words with accents and diacritics in the data.

edit schema to add unicode folding to the fields where we want it (maybe just add to existing text field)
change indexing/search config to use the new field if necessary (if new field name/type)
test to confirm & document required steps for the update to take effect (is copying files + restart + reindex sufficient?)
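
The folding behavior described above can be sketched in plain Python. This is only an illustration of the idea, not the Solr schema change itself; the function names and the sample words are assumptions, and Solr's own folding filters may differ in detail.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks (accents, diacritics) and normalize case."""
    # NFKD splits characters such as "ā" into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, value: str) -> bool:
    """True if the folded query occurs in the folded value."""
    return fold_accents(query) in fold_accents(value)
```

With folding applied to both the indexed text and the query, a search for "fustat" would find "Fusṭāṭ" and vice versa.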

As a global admin, I want to be able to add and edit a language, script, or correlation between language and script in our ontology, in order to expand our content footprint.

testing notes

list view should include language, script, and optional display name
should be able to edit existing records or add new one; display name should be optional
should be able to delete an existing record
if you try to enter a language + script combination that already exists, you should get an error

should be able to view languages and scripts but not add, edit, delete

1) Adding a new language or script or correlation (for example, Georgian or Arabic in Greek script); 2) Editing a language or script already in the list, or the correlation between them.

Revise the sitemap and site flow diagram so that the project team would know about the content that would exist on the site and the possible ways they are connected in a more comprehensible way to meet their needs

Link to the revised sitemap

revised the map structure to make it easier to read
separated "content" from "features"
color coded page hierarchy by levels (0-4)
visually identified "potentially out of scope" pages and content and features
removed details for the two pages that are "potentially out of scope" and new in concept

Questions for you:

Can you read this map?
Does separating content from features make sense to you in this way?
Please read through all the "content" and "features" and tell me, are there items that are categorized incorrectly? If yes, what are they?
Does this map answer any questions for you? What are they? Are there questions it's not answering for you?
Does this level of detail make sense to you?

run sample content through google cloud vision api

get text
get zones it recognizes on the page

Revise the formatting of edition/discussion/transcription and linking to them on scholarship records to make linked content clearer, easier to read, and more meaningful for users

As a data editor, I want metadata from the geniza spreadsheet indexed into Solr so I can run more advanced searches to find geniza materials.

See index_incipits.py in pemm-scripts

As a content editor, I want to see the number of documents we have in each language/script combination, so that I can understand the relative proportions and provide information for data visualization and research.

testing notes

on the language+script admin list view, confirm that:

counts are present and accurate for language
counts are present and accurate for probable language
clicking on language count takes you to the correctly filtered document list
clicking on probable language count takes you to the correctly filtered list
sorting on language and probable language counts works

This is in the "language/script" interface.

Prototype the holistic search so that users who are exploring documents through tags, or are looking for documents by inputting search terms/keywords can navigate through the site.

Design prototypes are here.

draft data diagram for relational database

Revise the sitemap diagram based on comments in #51 to make sure the proposed website content and their hierarchies are comprehensible to the project team

Here is the link to the revised sitemap.

Note:

I have added notes in blue where more context was needed. Please read.
Please do not pay attention to what the "explore fragments" and "explore words" and "citation/scholarship records" will entail – we will discuss during our meeting. (just view the levels at which they are placed).

Description:

this is the revised sitemap, covering the content and functionality, and levels of the following pages on the site:

homepage,
cluster search,
browse documents by cluster (the page shown once a cluster is selected for further exploration),
search results (the page shown when there is an input in the search box)
document details
Citation/Scholarship Records
Contact Us
About, with its subpages: credits, how to cite, data exports, technical, and FAQ

Pages that may be out of scope depending on data and priorities: (will discuss at our meeting)

Citation/Scholarship Records
Explore Fragments
Explore words

Questions:

Does the sitemap make sense to you? Does the legend make sense? How about the page levels? If not, please say why
Do you have any additions to what's considered as "Content" vs. "functionality" on any of the pages? Anything that's missing? Or you consider unnecessary?
Would you want to revise the name of any of the pages? For instance, which one makes more sense, "Citation records" or "Scholarship Records"?
Is there anything that you expected to see and is missing?

Update ansible deploy so we can use PUL Solr 8 for search

Ideally, we'll be using PUL's Solr infrastructure, which requires Solr 8. parasolr, our Solr python wrapper, is currently tested against Solr 6.6.5.

Updates to parasolr for Solr 8 will be tracked on parasolr's solr-8 branch

See Princeton-CDH/parasolr#48 ~~and Princeton-CDH/parasolr#49~~ (parasolr/issue/49 is unique to docker, not Solr8) for existing issues concerning the Solr 8 update.

As a data editor, I want a report on automatically identified named entities in item descriptions so I can compare it to entities I've already manually identified.

identify places
identify people
check list of places against team-provided known list of places
check list of people against team-provided known list of people

As a user, I want to search on metadata and transcriptions together so that I can find records by description or content.

As a data editor, I want to access links to external records where available so that I can view other versions of the item more easily.

Index Link to image column from metadata into solr
Add "View external record" link to record in search.
Index IIIF manifest for Cambridge records

As a global admin and content editor, I want to clone a document and have a record of the process, to keep track of the origins of the document record and changes in the data.

testing notes

Choose a document from the document list for editing, and use the 'save as new' button at the bottom. Confirm that all fields are populated from the original document record, and there is a note added that this record was cloned from the other one.
Do the same test, but add content to the notes field of the first document before saving it. Confirm that the note about the record being cloned is added to whatever text you added in the notes field.
Confirm that date created and last modified are set accurately for the new record, and not copied from the previous one.

dev notes

ideal implementation:

add a button to the edit view next to "save and continue", etc. that's "clone this" or similar
button links to a new add view with all fields prepopulated using current values from the object, but doesn't submit (so user can change it before submitting)
prepopulate notes field with text "cloned from str(document)"
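
A minimal sketch of the notes handling, assuming the clone note is appended on its own line after any existing notes. The function name and the sample document label are hypothetical, and this is plain Python rather than the actual admin code.

```python
def cloned_notes(source_label: str, existing_notes: str = "") -> str:
    """Return the notes text for a cloned record.

    source_label stands in for str(document) of the original record.
    """
    clone_note = f"Cloned from {source_label}"
    if existing_notes.strip():
        # Keep whatever the editor already wrote and add the clone note after it.
        return f"{existing_notes.rstrip()}\n{clone_note}"
    return clone_note
```

The result would be used as the initial value of the notes field on the prepopulated add form, so the editor can still change it before submitting.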

Create a sitemap of the search, holistic search, and doc detail page to communicate the site structure and how the pieces are connected

Here is a link to the sitemap (note: just the holistic search and the doc detail page are there now)

Revise the new search prototype so that when a search term is entered document tags are matched while showing contextual information about the clusters in which the documents appear so that the user can learn more about the position of the document with respect to the clusters

Design document detail pages so that researchers can learn more about a particular document and the fragment where it belongs

This issue is created mainly to communicate with RSK so that we can catch any inconsistencies that might exist between the goal of the designs/users' needs and the database design.

As a user, I want to search with multiple tags and choose how they should be combined (ANY/ALL) so that I can drill down or combine search results.

add faceting on tags field in the solr queryset
add tag facets input to search form; display tag name and count for current search (use checkboxes to allow multiple)
add any/all configuration option specific to tags (radio boxes; "find documents that match [ALL/ANY] selected tags")
in the view, filter the search based on the selected tags and specified mode
when generating the search page with tags selected, make sure the form reflects current status (any/all and any selected tags)
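
The ANY/ALL filtering step could look roughly like this. It is a sketch that builds a Solr filter query string by hand; the `tags` field name and the function name are assumptions, and the real view may well go through a queryset wrapper instead of raw strings.

```python
def tag_filter_query(tags, mode="ANY", field="tags"):
    """Build a Solr filter query for the selected tags, or None if none selected.

    mode "ANY" joins tags with OR; mode "ALL" joins them with AND.
    """
    if mode not in ("ANY", "ALL"):
        raise ValueError(f"unsupported mode: {mode!r}")
    if not tags:
        return None
    operator = " OR " if mode == "ANY" else " AND "
    # Quote each tag as a phrase and escape backslashes and embedded quotes.
    quoted = [
        '"%s"' % tag.replace("\\", "\\\\").replace('"', '\\"') for tag in tags
    ]
    return "%s:(%s)" % (field, operator.join(quoted))
```

For example, selecting "tax" and "letter" with ALL would give `tags:("tax" AND "letter")`, which can be passed as an `fq` parameter alongside the keyword search.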

Revise the information that appears on the cluster search results to better match researchers' needs

As a user, I want to see image thumbnails with search results when available, so that I can quickly see which records have images and what they look like.

testing notes

On the QA site, do some searches and confirm that you can see image thumbnails for each result
- Check documents with no images, one image, two images, and 3+ images
- Comment feedback here about the design and implementation

Revise the sitemap and site flow diagram so that the project team would know about the content that would exist on the site and the possible ways they are connected

Here is a revision to the sitemap

As a content editor, I want to download any data sets created through filters so that I can work on a subset outside of the database.

As a content editor, I want to edit all of the documents associated with the fragment on the same screen that I use to edit the fragment, so that in the case of demerging I can make sure that the data is split correctly.

Testing notes

On the fragment admin display, ensure that the fields are displayed properly in the TextBlock inline.
Click on the document string and ensure that it leads to the proper page.

dev note

create simple inline for text block and enable on fragment edit

editable: side & text+extent/region
display document id + description

As a data editor, I want to search across multiple metadata fields so that I can find geniza items by keyword or phrase.

When searching by description, bring up sorted matches with their associated PGPID, library, and shelfmark.

display tags on search result
display document type on search result
include library in fields searched

Design a way to navigate to a cluster from the search results so the user can view a specific cluster and learn more about the documents it contains

As a user, I want to read the geniza project in my native language so it's easier to understand.

testing notes

note - this issue is a companion to #36, but they address different issues and function differently. this issue covers all the text on the site that isn't actual "PGP data" - in other words, nothing that would come from the database.

you might encounter documentation (or people) that use the terms i18n and l10n. these are lazy ways of writing the long words "internationalization" and "localization", where the numbers mean how many letters you skipped in the middle of the word. the former refers to writing code that can be translated into multiple languages, while the latter refers to actually doing the translation ("localizing") that code into some particular language. for the PGP, the developers will be doing the i18n, but the l10n can be done by the project team.

as a user

visit the test site and you'll see a (very basic) homepage that tells you what language you're currently reading the PGP in.
you'll notice that the default is english; there's also now a /en/ appended to the URL to indicate this.
check out the list of professions in english and you'll see the transliterated profession names.
choose another language to read in using the dropdown at the top right, and click "Go." (these languages are configurable and we can choose as many or as few as we want).
you should now see the profession names change to reflect your choice, and the url suffix should also change to a language code (e.g. /he/ for hebrew). note though that the actual URLs will still be in english (e.g. /ar/people). let us know in a comment if this makes sense - it's possible to instead do /ar/اشخاص/ or /ar/ashas (idk if these are correct but you get the point).
go back to the homepage by clicking "home" in the top left, and you should see the text after "your language is" has also changed. once you pick a language, the website will "remember" it until you make another choice - including if you refresh, close the tab, close the browser, etc. you can remove the choice by clearing your cookies to default back to english, or just choose it from the dropdown.
note that the rest of the language on the homepage didn't change! to make that work, we have to do some extra work from the content editor's point of view.

as a content editor

have a look at the locale folder on github. you'll see three folders: ar, en, and he, corresponding to our three language options. each has a folder inside called LC_MESSAGES (this is a standard name that's required to use).
go ahead and click the folder until you see a file called django.po. this .po file (called a "message file") stores translations for each bit of text on the website that can be translated into multiple languages. if you have a look at the .po file for hebrew, you'll see starting on line 46 entries for all of the bits of text on the homepage ("Geniza multi-language testing", etc).
there's a lot going on here, so let's review the type of messages that you may see in this file. notes I added are in parentheses.

#. Translators: button on language chooser in navigation  (note left by the developer for the translator)
#: templates/base.html:33  (where in the code this bit of text is located)
msgctxt "choose this option"  (extra context for the translator, since "go" could be translated many ways)
msgid "go"    (what the original (untranslated) text reads)
msgstr ""       (the place where the translation goes)

#. Translators: subheading on homepage                               
#: templates/home.html:9                                                      
#, python-format   (indicates this bit of text contains python code)
msgid "Your language is: %(lang_name)s" ("lang_name" will be filled in later; we don't know what it is right now)
msgstr "" (the translation will include a placeholder for "lang_name")
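
To illustrate how the two entries above behave at display time, here is a small sketch in plain Python. The dictionary is a hypothetical stand-in for a compiled message file, the "[he]" string is a placeholder rather than a real translation, and the function names are assumptions; Django's translation machinery handles all of this for you.

```python
# Hypothetical in-memory stand-in for a message file.
# Keys are (msgctxt, msgid); an empty msgstr means "not translated yet".
CATALOG_HE = {
    ("choose this option", "go"): "",
    (None, "Your language is: %(lang_name)s"): "[he] Your language is: %(lang_name)s",
}


def translate(catalog, msgid, context=None):
    """Return the translation, or the original text if none has been entered."""
    return catalog.get((context, msgid)) or msgid


def render(catalog, msgid, context=None, **params):
    """Translate first, then fill in python-format placeholders."""
    return translate(catalog, msgid, context) % params
```

The placeholder `%(lang_name)s` has to appear unchanged in the translated msgstr, because it is only filled in after the translation is looked up. An entry whose msgstr is still empty simply displays the original text.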

time to add a translation! there are dedicated programs available for editing .po files, but the easiest way to test it right now is just to edit on github. click the "pencil" icon in the top right to edit it directly:

now you can fill in some of the msgstr fields with translations in whatever language's file you're editing. they don't need to be correct or "real" translations, but doing them in the correct language would be good.
when you're finished, scroll to the bottom and find "commit changes". fill in a commit message in the top box (something like "add arabic homepage translations") and if you're feeling fancy also add an extended description of what you did in the bottom box (not required).
make sure you click "create a new branch for this commit and start a pull request". this way, the developers will be notified of the changes and will have a chance to review everything before adding the code. it also helps prevent situations where two people edit the file at the same time and one person's changes "win" (unlikely, but possible). the name for your branch isn't important; something like "arabic-homepage-l10n"
you're done! you just localized something. comment with your thoughts/opinions about the process. if you want to try a fancier way, also check out poedit, which is one possible solution for doing lots of localization at once.

dev notes

Basic django app with some public interface and menus or site content to be translated

add a super basic css framework/stylesheet for prototyping
create the base/home template
create a basic header or footer

As a global admin, I want to be able to add a library collection not represented on the list in order to expand our content footprint and edit those already in the list.

testing notes

list view should include library, collection name, library abbreviation, abbreviation, location
should be able to edit existing records or add new one
should be able to save a record with collection name but no library
should be able to save a record with library but no collection name
should not be allowed to save a record with both collection name and library empty
should be able to delete an existing record
should be able to add a new record
if you try to enter a collection with a library + collection combination that already exists, you should get an error
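
The save rules in this list can be summarized as a small validation sketch. It is plain Python with hypothetical names and messages, not the actual model code, and it assumes uniqueness is checked on the exact library + collection name pair.

```python
def validate_collection(library, name, existing):
    """Return a list of error messages for a proposed collection record.

    existing is a set of (library, name) pairs already in the database.
    """
    library, name = library.strip(), name.strip()
    errors = []
    if not library and not name:
        errors.append("Either library or collection name is required.")
    elif (library, name) in existing:
        errors.append("This library + collection combination already exists.")
    return errors
```

An empty list means the record can be saved; records with only a library or only a collection name pass, as the testing notes require.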

should be able to view collections but not add, edit, delete

For example, when new collections become available or there are name changes.

As a user, I want transcription language detected and used for search so that I can see what languages are used and take advantage of language-specific search functionality.

add logic to transcription script to autodetect language; try spacy fastlang
pull xml tags from the xml when present; check against detected languages
include languages in transcription json files
update indexing script to index detected languages
update solr config & indexing script to use language-specific text fields

As a content editor, I want to add data to the database in multiple languages so that I can fully represent existing project data.

testing notes

note - this issue is a companion to #35, but they address different issues and function differently. this issue covers all the actual PGP data that can be stored in multiple languages, rather than text on the site.

as a user

see "as a user" on #35, since this part works exactly the same way. then, pick a person from the list of people. you'll see a basic testing page that tells you what profession that person had. currently, nobody has any profession. let's add one.

as a content editor

go to the admin backend for the site and click "people". click a person's name to edit them.
you'll now see the fields that are available on a person: their name, and their profession. go ahead and pick any profession from the list and save the model.
note also that there are three tabs here, labeled according to the language codes of the languages that you can currently browse the test site in. if you go to one of those tabs, the field will be empty, indicating there's no value in that language. we can control which languages are available at the level of each field, but the default is to allow every language that you can browse the site in.
go ahead and add a (fake or real) version of the person's name in another language and save the model.
if you go back to the public site and visit the page of the person you edited, you'll see a message like "X was a Y", where X is the person's name and Y is the profession you chose. if you switch language, both X and Y should now display in the translated versions you entered! if you didn't enter a translation for the language you chose, it will just display in english instead. note that the middle part of this text ("was a") isn't actually "PGP data", and thus it would be translated via the methods covered in #35.
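
The fallback behavior for translated data can be sketched like this. The data layout (one value per language code for each field), the function names, and the sample person are all assumptions for illustration; the prototype relies on a Django translation package rather than code like this.

```python
def localized(values, language, default="en"):
    """Return the value for the requested language, falling back to the default."""
    return values.get(language) or values.get(default, "")


def describe(person, language):
    """Build the "X was a Y" sentence from translated field values."""
    name = localized(person["name"], language)
    profession = localized(person["profession"], language)
    # On the real site "was a" is interface text and would be translated via #35.
    return f"{name} was a {profession}"


# Hypothetical record: the name has a Hebrew value, the profession does not.
person = {
    "name": {"en": "Example Person", "he": "[he] Example Person"},
    "profession": {"en": "merchant", "he": ""},
}
```

Here `describe(person, "he")` uses the Hebrew name but falls back to the English profession, because no Hebrew value was entered for it.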

leave comments/opinions below on how this process went for you.

dev notes

model editable in django admin with translation
simple list page
simple detail page

As a global admin, I want a one-time import of all documents and fragments currently in the PGP spreadsheet and the fields in the db populated accordingly, in order to work with the data in the database.

testing notes

Check a variety of documents and fragments from the PGP metadata spreadsheet and test how they have been imported.

for fragments:

check that fields are populated accurately from the spreadsheet:
- shelfmark
- historic shelfmark
- library/collection
- multifragment (yes/no)
- link to image
- iiif url for CUL documents with link to image (can verify via iiif viewer on fragment edit page)
- test that items with Library CUL in spreadsheet are assigned to the right collection based on shelfmark (T-S, Or., Add.)
check that record history documents creation via import script
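
As a rough sketch of the CUL rule, assuming the collection is decided purely by how the shelfmark begins. The function name and the sample shelfmarks are illustrative only, and the import script may handle more cases than this.

```python
# Collection prefixes named in the testing notes for Library "CUL".
CUL_PREFIXES = ("T-S", "Or.", "Add.")


def cul_collection(shelfmark):
    """Return the CUL collection prefix for a shelfmark, or None if none matches."""
    cleaned = shelfmark.strip()
    for prefix in CUL_PREFIXES:
        if cleaned.startswith(prefix):
            return prefix
    return None
```

A shelfmark that matches none of the prefixes returns None, which is a useful signal that the row needs a manual look during testing.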

for documents:

Check a few documents with joins to confirm that the document is linked to all fragments referenced by shelfmark in the join column

I want the following fields populated from the spreadsheet: library, shelfmark (current/historical), recto or verso, language/script, description, type and tags, and, if available, link to image.

dev notes

revisions after testing:

As a user, I want to see a list of topic clusters so I can get a sense of the major areas of content and relationships between them.

Create a new view & template in flask app
Use Solr facet pivot to get a list of tags that appear together, with some minimum (maybe 50, to start)
Output list of tag pairs & number of documents

As a global admin, I want a one-time import of the list of library collections that includes name, abbreviation and location, so that I can manage the library collection info in the database.

testing notes

Confirm that the list of libraries/collections has been correctly imported from the ontology spreadsheet
Check the edit history for a few records to confirm that there is a log entry showing the record was imported by script

Design a generous search interface for the holistic search idea so that users can learn about the relationship between clusters of data

Here is the link to the proposed data scheme and two versions for stage 1 UI + flow for the cluster view

Note: "stage 1 UI" is the first step for designing a UI – colors, font, line, and shape weights, and alignments are not complete.

have proposed a way to categorize the clusters and the documents (The data scheme) – the condition mentioned in the data scheme is later used to create a graph where the clusters can be placed. But this is just a suggestion and I want to propose this to the project team and have conversations on the data scheme proposal and the condition mentioned.
have proposed two ways of representing the clusters (v1 and v2)

v1: using the most common tag such as "tax" and grouping all the other tags that accompany it
v2: using the logic in v1 but labeling the group with a unique name such as "finance" – or not labeling it at all but that might be problematic once in the document list.

have proposed 3 ways of navigating the clusters in each version.

to view all of the clusters
to view one cluster
to view a sub cluster

have shown how a shared sub cluster between two clusters might be handled
the document list shows the sub cluster with the largest number of documents first
Note: the document list represents the intended logic, but the number of documents in the mockup does not match, because exact counts are not necessary to convey the goal of this design.

Questions for you:

Does the design and the data scheme make sense to you? If not, please say why
Is there something that you need to help you understand the design which is missing here?

set up new django project for the new site

internationalization turned on (adapt from i18n prototype)
cas login configured and enabled
basic local settings & readme

As a global admin, I want a one-time import of the list of all languages and scripts, and their correlation, so that I can access, display and manage the information in the database.

testing notes

Confirm that the list of languages + scripts has been correctly imported from the ontology spreadsheet
Check the edit history for a few records to confirm that there is a log entry showing the record was imported by script

As a content editor, I want to create, edit, filter and search documents so that I can add/edit information on documents in the database and find pertinent documents.

Testing Notes

List display:

Edit

Test that changes to a document through the edit display are saved and reflected in the list view.
Test that read only fields are displayed and not editable
Test that text blocks for associated fragments are listed and can be updated
Test that you can't add unknown for a probable language
Test that you aren't allowed to set the same language+script as both language and probable language
Test that you aren't allowed to set unknown language+script as probable language
Test that multiple text blocks on the same shelfmark don't result in repeated shelfmark in document combined shelfmark
Test setting text block order and confirm that join shelfmark follows the specified order
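
The last two checks describe how the combined shelfmark should behave. A minimal sketch, assuming each text block is an (order, shelfmark) pair and that shelfmarks are joined with " + " (both are assumptions, as are the names):

```python
def combined_shelfmark(text_blocks, separator=" + "):
    """Join the shelfmarks of a document's text blocks, in order, without repeats.

    text_blocks is an iterable of (order, shelfmark) pairs.
    """
    ordered = sorted(text_blocks, key=lambda block: block[0])
    # dict.fromkeys keeps the first occurrence of each shelfmark, in order.
    unique = dict.fromkeys(shelfmark for _, shelfmark in ordered)
    return separator.join(unique)
```

Two text blocks on the same fragment contribute the shelfmark once, and changing the order values changes the order of the joined string.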

dev notes

revisions after testing:

switch language and probable language to autocomplete
fix language/probable language validation
add help text for language and probable language (to be supplied by the team)
fix document shelfmark so it includes unique shelfmarks in order
add multifragment text field
make extent filter empty/not empty
add empty/not empty multifragment filter
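
The language and probable language rules from the testing notes can be written down as a small check. This is a plain Python sketch with hypothetical names and messages; it assumes unknown is represented by the label "Unknown".

```python
UNKNOWN = "Unknown"  # assumed label for the unknown language+script record


def validate_languages(languages, probable_languages):
    """Return error messages for a document's language selections."""
    errors = []
    if UNKNOWN in probable_languages:
        errors.append("Unknown cannot be selected as a probable language.")
    overlap = sorted(set(languages) & set(probable_languages))
    if overlap:
        errors.append(
            "Cannot be both language and probable language: " + ", ".join(overlap)
        )
    return errors
```

In a Django admin form, rules like these would typically run in the form's clean step so the editor sees the error before anything is saved.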

Fields
Provide a list of the fields and entities to be included on this model.

Fragments (entity)
description
side
text-block
Type (entity)
Tags (entity)
language
Footnotes for edition and translation (entity)
notes
input by
date entered

List Fields
Which fields should be included on the django admin list view?

shelfmark
description
type
tags
language
text-block (boolean)
edition

Edit Fields
Which fields should be editable via django admin view? Please list them in the order you want them to appear. Indicate which fields are optional, and any fields that should be displayed but not editable.

shelfmark (read only) at least one, possibly more
historical shelfmark (read only)
side (to indicate recto/verso)
Image thumbnail (optional)
type (optional)
language
description
tags (optional)
edition (optional)
translation (optional)
notes (optional)
text-block (optional)
Legacy input by (read only)
Legacy date entered (read only)

Search Fields
Which fields should be searchable in the django admin list view?

description
tags
shelfmark
edition
input by

Default sort

Shelfmark

Filters
Which fields should be used for list filters?

Type
language
Text-block

List Fields (optional)
Which fields (if any) should be editable on the django admin list view?
None.

Related models
List any directly related models (database tables) that would be helpful to edit when on the same page when editing this one.
• Fragment model

Additional context
Add any other context about the model or database here. Include a link to a database diagram if one is available.

Old version
~~List view fields (in order):~~
- Shelfmark (current) (searchable)
- Shelfmark historical (boolean: does it have a historical shelfmark?)
- Library (abbreviation) (filter)
- recto/verso (filter)
- type (filter)
- tags (filter & searchable)
- language/script (filter)
- description (searchable)
- editor (searchable)
- translator (searchable)
- link to image/image (thumbnail, IIIF),
- multifragment (boolean: is it part of a multifragment?)
- text-block (boolean: is it assigned text-blocks?)
- Input by (searchable)
~~- date entered (searchable).~~

~~None editable in the list view.~~
~~Filter: shelfmark (historical), Library, Recto/verso, type, tags, language/script, multifragment, text-block~~
~~Searchable: Shelfmark (current), tags, description, editor, translator, input by, date entered.~~

Create playbook for geniza production deploys

Pointing at QA solr for now since we don't have a real "production" index.

As a user, I want to see existing TEI transcriptions displayed with IIIF images so I can see how using annotations for transcription might work.

testing notes

Go to the test site at https://test-geniza.cdh.princeton.edu/iiif/ — you should see the Mirador 3 IIIF viewer with a list of manuscripts to select (note that it does load quite slowly because there are so many manifests in the list).

When you choose a document to view, it should open with the annotations panel displayed and should show a transcription beside an image. When you close the document you're looking at, use the "start here" at the top right to go back to the list and open another.

The documents available for testing right now are only those that we currently have both IIIF and transcriptions for. For now, I'm attaching all annotations to the first image in the IIIF manifest and setting the annotation zone to be the entire canvas. I'm creating separate annotations for each block of text within a TEI document, as indicated by a label preceding a new section.

I'm setting language direction as rtl and left-aligning everything; I also put all transcription lines into an ordered list, but I haven't yet looked into correcting the TEI documents with un-numbered lines that wrap, so I expect there are cases where the numbers will be wrong. There is no language detection yet.

For sign off on this story, please confirm:

transcriptions are associated with the correct documents (the association is based on PGP ID in the TEI and IIIF manifest in the metadata spreadsheet)
distinct blocks of text within a single TEI are correctly grouped

Feel free to add comments with things you notice that are missing and should be addressed when we start refining this. Here are a few things I've noticed so far:

Transcriptions should sometimes be associated with the second image in a manifest, but I'm not sure how to determine where they belong. It looks like if there is a "recto" or "verso" label, any sections after it are on that side.
Transcriptions of marginal text often seem to use / to indicate line breaks
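
The block grouping, together with the recto/verso observation above, can be sketched as follows. The input format (a flat list of ("label", text) and ("line", text) pairs already extracted from the TEI) and the function name are assumptions; real TEI parsing is not shown, and the side heuristic is only the guess described in the notes.

```python
def group_blocks(items):
    """Group transcription lines into blocks, one per label.

    Each block records the most recent recto/verso label seen, so later
    sections stay on that side until another side label appears.
    """
    blocks = []
    side = None
    current = None
    for kind, text in items:
        if kind == "label":
            lowered = text.strip().lower()
            if "recto" in lowered:
                side = "recto"
            elif "verso" in lowered:
                side = "verso"
            current = {"label": text, "side": side, "lines": []}
            blocks.append(current)
        else:
            if current is None:
                # Lines before any label still need a block to live in.
                current = {"label": None, "side": side, "lines": []}
                blocks.append(current)
            current["lines"].append(text)
    return blocks
```

Each resulting block would become one annotation, and the side could later be used to pick the first or second image in the manifest.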

First pass conversion of TEI transcriptions to IIIF annotation (only for subset of documents with IIIF and transcriptions, for now)

As a user, when I search by keyword I want to see a list of matching clusters with the most relevant documents so that I can get to results but also see the context of related materials.

add keyword search input & form
add new flask view & template for new search results (ideally template snippet for record display common to #23 )
implement solr search to retrieve and display documents in the cluster result format
determine and show which clusters documents in the search belong to
link clusters to cluster browse

test eScriptorium on sample content

setup local instance
import sample content provided by the project team
generate exports that can be used to test matching ocr + zones against tei transcripts

Create deploy playbook for geniza i18n prototype

needs Apache configured to allow deployment side-by-side with search prototype
needs separate playbook with different app_name; uses django-specific roles/localsettings setup
needs mysqlclient deps installed on target machine so it can talk to the MySQL database (via common role)
clean up template_path settings in group vars for all projects since the default was updated to be smarter

Synthesize and write a report to communicate the results of usability testing so that design decisions can be made and prioritized

Here is the link to the usability testing report

Please feel free to review and comment on the doc if you have questions about any part; I will share it with the project team in January.

As a content editor, I want to suppress documents, so that the records are kept updated.

testing notes

Change the "Status" on a few documents and ensure that the "public" field on the list view updates properly.
Filter by status in the list view side bar and ensure that it works properly.

dev notes

We have similar functionality in ppa, maybe a useful reference: https://github.com/Princeton-CDH/ppa-django/blob/main/ppa/archive/models.py#L285-L293

add status choice field to Document with options public, suppressed; set default to public
make status editable in admin
make status visible in list view as boolean field (see ppa implementation: https://github.com/Princeton-CDH/ppa-django/blob/main/ppa/archive/models.py#L323-L328)
enable filter on status in list view
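
As a rough, framework-free illustration of the status field described in these dev notes (not the actual Django model): the constant names, the single-letter codes, and the `pgpid` field are assumptions for this sketch. In Django the same idea would be a `CharField` with `choices` and a `default`.

```python
from dataclasses import dataclass

# Hypothetical status codes; the real values are up to the implementation.
STATUS_PUBLIC = "P"
STATUS_SUPPRESSED = "S"
STATUS_CHOICES = (
    (STATUS_PUBLIC, "Public"),
    (STATUS_SUPPRESSED, "Suppressed"),
)


@dataclass
class Document:
    pgpid: int
    status: str = STATUS_PUBLIC  # new documents default to public

    def is_public(self) -> bool:
        """Boolean view of status, suitable for a list-view column."""
        return self.status == STATUS_PUBLIC


def filter_by_status(documents, status):
    """Plain-Python stand-in for the list-view status filter."""
    return [doc for doc in documents if doc.status == status]
```

Exposing `is_public` as a boolean keeps the list view readable even if more status options are added later.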

May want to implement #78 in tandem with this one

As a content editor, I want to be able to flag and annotate a document as needing examination by a global admin, so that the global admin can either resolve the issue or submit a request for changes to the CDH.

New field: needs_review text field
Ability to create a link to a filter search of things that need review on the dashboard

testing notes

edit a few documents, add text to the 'needs review' field, and save
navigate to admin main index page: the documents you edited should be listed as awaiting review
you should be able to click on the document to go straight to the edit page
you should be able to click on the document heading to go to a filtered document list view showing all documents that need review
navigate to the corpus section of the admin site; confirm that the awaiting review section appears and functions the same as on the admin index page
create a test staff user account without permissions to view or edit documents and login as that user to confirm that they do not see the list of documents awaiting review

dev notes

add needs_review text field to Document; help text to indicate what it is for
add empty/nonempty filter on needs_review to document list view
create new template snippet/include with a brief list of the 10 most recent (based on last modified) documents with nonempty needs_review; it should report how many items in total need review and link to the document list view filtered to show all items that need review
- (Kevin): Document.objects.exclude(needs_review='').order_by('-last_modified')[:10]
extend corpus app_index admin template and include the document review snippet in the sidebar (see https://docs.djangoproject.com/en/3.1/ref/contrib/admin/#templates-which-may-be-overridden-per-app-or-model)
extend admin index and add document review in the sidebar after other sidebar content
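
The data the review snippet needs (total count plus the most recently modified items) can be sketched in plain Python. This mirrors the queryset suggested above without requiring Django; the `Document` dataclass and the `review_summary` name are stand-ins invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Document:
    pgpid: int
    needs_review: str
    last_modified: datetime


def review_summary(documents, limit=10):
    """Return the total number of documents awaiting review and the
    `limit` most recently modified ones, newest first."""
    # Same condition as exclude(needs_review=''): keep nonempty values only.
    pending = [doc for doc in documents if doc.needs_review != ""]
    pending.sort(key=lambda doc: doc.last_modified, reverse=True)
    return {"total": len(pending), "recent": pending[:limit]}
```

In the admin template, `total` would feed the heading link to the filtered list view and `recent` would populate the short list.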

May want to implement #77 at the same time, since the functionality is the same but for fragments

Design a document detail page so that users can view all the contextual information for a single document

Design the information architecture + flow of all the pieces that belong to this section in this issue
Include preliminary UI – e.g. potential position of features and content on the page.

Duplicate of #73

As a user, when I select a topic cluster I want to browse the documents in that cluster and see their metadata and other tags so that I can explore everything in that cluster.

make cluster list (topic pairs) into links; pass tags via keyword arg to same view
if tag is specified, retrieve documents and display short record view (based on design + fields available currently)
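
One possible shape for the "same view, optional tags" logic, as a sketch only: the dictionary-based document records and the function name are assumptions, and exact tag matching is a simplification.

```python
def documents_for_cluster(documents, tags=None):
    """Return documents that carry every tag in the selected cluster.

    Returns None when no tags are given, so the caller can fall back
    to showing the cluster list instead of a document list.
    """
    if not tags:
        return None
    wanted = set(tags)
    return [doc for doc in documents if wanted <= set(doc["tags"])]
```

A view could call this with the tags parsed from the request and render either the cluster list or the short record view depending on the result.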

Draft web annotation data model for transcription data

Create an Ansible deploy script for QA for the Flask prototype

Should be able to adapt pretty directly from the pemm Ansible playbook
