nulib / donut


Digital Object Northwestern University Toolkit

Ruby 59.37% JavaScript 3.14% HTML 9.99% XSLT 5.92% Shell 0.02% Dockerfile 0.64% SCSS 20.92%

donut's Introduction

DONUT (Archived)

Note: the Donut project is no longer in use. Please visit Meadow to see our current repository.

Donut is a Hydra head based on Hyrax

Dependencies

Ruby >= 2.6
- you can use rbenv or rvm to install Ruby
Follow the Dev Environment Setup instructions
Docker (we're using docker for mac: https://www.docker.com/docker-mac)
Install devstack according to the instructions in the README
Geonames user registration
- The geonames_username key is defined in our shared configuration file.
Fits > 1.0.5 (brew install fits)
Vips (brew install vips)

Initial Setup

Clone the Donut GitHub repository
Install Bundler (the version in Gemfile.lock) if it's not already installed: gem install bundler -v "~>2.0.1"
Install dependencies: bundle install
Run devstack up donut in a separate tab to start dependency services
Run rake donut:seed to initialize the stack.
- Optional arguments to donut:seed (may be used in combination):
  - bundle exec rake donut:seed ADMIN_USER=[your NetID] ADMIN_EMAIL=[your email] to automatically add yourself as an admin user
  - bundle exec rake donut:seed ADMIN_USER=[your NetID] ADMIN_EMAIL=[your email] SEED_FILE=[path to YAML file] to automatically add users and admin_sets. There is a sample seed file in spec/fixtures/files/test_seed.yml
Create a fake AWS profile:

$ aws --profile fake configure
# enter dummy values for "AWS Access Key ID" and "AWS Secret Access Key".
# Set the "Default region name" to "us-east-1", use default[None] for format

# add this to your .zshrc, .bashrc, etc.
export AWS_PROFILE=fake

Running the App

bundle exec rails s

Donut should be live at: https://devbox.library.northwestern.edu:3000/

Stopping the application

You can stop the Rails server with Ctrl + C

You can stop devstack by running devstack down. Your local data (from the database, LDAP, etc.) will persist after devstack shuts down.

If you need to clear your data and reset the entire development environment, run devstack down -v

After initial setup, you don't need to run rake donut:seed... again unless you've run devstack down -v.

Set up an "NUL Collection" Collection Type

Donut only wants "NUL Collection" types to be public. In order to make these available to the front-end React app:

Go to Dashboard > Settings > Collection Type and add a "NUL Collection" collection type.
In config/settings/development.local.yml, add the gid of the "NUL Collection" collection type (or one you want to index in Elasticsearch). Ex: nul_collection_type: gid://nextgen/hyrax-collectiontype/3.
Re-start the Rails server

Note: Only Donut collections of the collection type "NUL Collection" will appear in the front-end application.
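
For readers unfamiliar with the gid format used in that setting, the sketch below splits the example value into its parts using only Ruby's standard library. It is an illustration, not how Donut reads the setting: in a Rails app the GlobalID library normally handles these identifiers, and the parse_gid helper is a hypothetical name.

```ruby
require "uri"

# Split a gid string of the form gid://<app>/<model>/<id> into its parts.
# parse_gid is a hypothetical helper, written only for this illustration.
def parse_gid(gid)
  uri = URI.parse(gid)
  raise ArgumentError, "not a gid URI: #{gid}" unless uri.scheme == "gid"

  model, id = uri.path.delete_prefix("/").split("/", 2)
  { app: uri.host, model: model, id: id }
end

parts = parse_gid("gid://nextgen/hyrax-collectiontype/3")
puts "app=#{parts[:app]} model=#{parts[:model]} id=#{parts[:id]}"
```

The trailing number is the database id of the collection type, so it will differ between environments.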

More detailed information on Collection/Indexing setup here: Collection Type Indexing

Running the Tests

Bring up the test stack in one window with:

$ devstack -t up donut

Run the SEED task for the test environment:

$ rake donut:seed RAILS_ENV=test

Run the test suite in another window:

$ rake spec

Or, you can run rubocop and the test suite together:

$ rake donut:ci

You can alternatively run rubocop and the specs independently with:

$ rake donut:ci:rubocop
$ rake donut:ci:rspec

Run the JavaScript tests

Run Jasmine server:

$ rake jasmine

Run all tests by opening this URL in a browser:

http://localhost:8888

Run the JavaScript test suite:

$ rake jasmine:ci

Running the Batch importer from the command line

Run either the rake donut:seed or the rake s3:setup task to create and populate the S3 bucket
Run the importer from the application root directory with the command:

$ bin/import_from_s3 dev-batch sample.csv

Seed Data

Run the batch importer with the seed-data.csv file to load 30 sample records (this will take some time)
Make sure you have first run bundle exec rake s3:setup to populate the s3 bucket
Then run:

$ bin/import_from_s3 dev-batch seed-data.csv

Running the tests for our new CSV importer work from hyrax

The active elastic job gem requires an environment variable to be set; otherwise all the specs fail. Run this first:

$ export PROCESS_ACTIVE_ELASTIC_JOBS=true

Notes on the Docker stack

You can replace up with daemon in docker:dev:up and docker:test:up to run the Docker services in the background instead of in a separate tab. To stop the stack, use (for example) rake docker:dev:down.
The test stack always cleans up its data when it comes down. To clean the dev stack, use rake docker:dev:clean.

Adding an Admin user and assigning workflow roles

Run the development servers with rake docker:dev:up (or daemon) and rails s
Go to https://devbox.library.northwestern.edu:3000/ and login
To make the user who logged in an admin, run rake donut:add_admin_role ADMIN_USER=[your NetID]
Go to https://devbox.library.northwestern.edu:3000/admin/workflow_roles and grant workflow roles if needed

donut's Issues

Simplify minio setup

Right now donut requires minio running to mimic s3, have a bucket created, and then that bucket needs to be populated to test out our import feature. It requires a few files to be created in the users home directory, an environment variable or two to be set up, and the aws cli scripts have to be run manually every time minio goes up or down.

It's great that we can mimic s3 locally and help speed up dev without relying on outside sources, but it's become unwieldy, and missing any one of those steps means that your tests will fail or give you a false positive. I'm going to try to organize and automate this as much as possible.

Add Exif-Tool Job To DONUT

refs: https://github.com/nulib/next-generation-repository/issues/352

Add the job to DONUT
write tests for the job using a (resized to smaller) fixture tiff
write functional test to check after_create hook on the image model and that the job fires as a perform_later

Write a Spec for Image TechMD

refs: https://github.com/nulib/next-generation-repository/issues/352

Possibly a shared spec? Something to exercise the model and ensure the attributes are set and solrized.

Fix deprecation warnings

This is just good practice and we'll thank ourselves later.

After Carrick's-update-to-the-latest-hyrax branch passes and is merged I'll start fixing the warnings

Allow Authority Driven Dropdown for CREATOR ROLE

Description

(Breakout of Issue https://github.com/nulib/next-generation-repository/issues/90)
-- for CREATOR ROLE
As a Collection Manager, I want to have authorities attached to certain fields and be able to grab them from a drop down menu (editing) so that I don't have to worry about editors putting in inconsistent information.

Here is an example of using the relator endpoint through our local questioning authority:
http://devbox.library.northwestern.edu/authorities/search/loc/relators?q=art

Done Looks Like

A dropdown for single authority added

Verify resources ingested by rake task have derivatives created

The Image worktype should have the default derivatives created, verify this occurs when using our new ingest path

derivatives created
update case handled correctly

Write Rake Task to Call our Ingest Code

Rake task should take a CSV and fire off the actual ingest for each row.

Remove model param from csv_importer

We don't need this feature and it's extra overhead in the test suite. Just specify the model in the CSV and that will work.

Check DONUT for redis-store security flaw

The redis-store dependency defined in Gemfile.lock has a known high severity security vulnerability in version range < 1.4.0 and should be updated.

This kicked in with Avalon, it may impact DONUT and DONUT might be a Gemfile update.

avalonmediasystem/avalon#2702 shows the fix in Avalon

Add two new properties for storing Exif specific data

refs: https://github.com/nulib/next-generation-repository/issues/415

We need to create two new properties for technical metadata to store the entire exif hash and the version of the exiftool.

Also document this so people can parse this hash later.

Exclude CreateWithRemoteFilesActor from rubocop

Right now we're overriding CreateWithRemoteFilesActor from Hyrax so we can exclude the area where it's encoding the URL one too many times, but it's also making rubocop upset.

Since it's not our file, we shouldn't really care if it's violating rubocop rules, and it should be excluded from its checks.

AWS worker and webapp file system permissions are wrong

For items in /bin , the execute bit isn't being set properly. Right now you'll have to ssh into the eb instances and change them manually, but we should get this fixed.

Investigate user key issue

Description

The user model is storing escaped email strings as ids, which seems to break things like deleting a user from a role. For example, first.last@northwestern is getting stored as the user key but trying to delete the user from role fails with a user key not found error looking for [email protected].

This might be solved by storing the netid as the user key in User.rb, by changing to_s to:

  def to_s
    username
  end

Done looks like:

the proper user key is used
users can be deleted from roles, etc. without not found errors
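
As a rough illustration of why an escaped email makes a fragile key, here is a small standalone sketch. The User struct, the sample NetID and address, and the choice of URI.encode_www_form_component as the escaping step are all assumptions made for the example; the issue does not say which escaping the application actually applies.

```ruby
require "uri"

# Hypothetical stand-in for the application's User model.
User = Struct.new(:username, :email) do
  # Use the NetID (username) as the string form, as proposed in the issue.
  def to_s
    username
  end
end

user = User.new("abc123", "first.last@example.edu")

# Escaping changes the email ("@" becomes "%40"), so a lookup by the
# escaped form no longer matches the stored value.
escaped_email = URI.encode_www_form_component(user.email)
puts escaped_email == user.email # false

# A plain alphanumeric NetID is unchanged by the same escaping.
escaped_netid = URI.encode_www_form_component(user.to_s)
puts escaped_netid == user.to_s # true
```

A key that survives escaping unchanged cannot drift between the stored and looked-up forms, which is the appeal of the NetID here.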

Review batch upload docs

Description

Review docs from #338 and get familiar with them.

Done looks like

review docs linked in #338

Create upstream pr from controlled vocabulary

Description

Determine what controlled vocab updates implemented locally would benefit or be appropriate for Hyrax core. At a minimum, see if Authority Select works with multiple items in Hyrax core, and fix that. Up for debate whether controlled vocab mixed with Authority Select is a generic enough use case for users outside of Northwestern.

Done Looks Like

Make a decision whether local controlled vocab / authority select updates are necessary for Hyrax core.
If yes, do the work.

Tasks

- Decide whether local controlled vocab / authority select updates are necessary for Hyrax core.
If so, update Hyrax and submit a PR.
Merge PR back into Donut and check everything still works (create a new ticket for this).

Meet to discuss s3 bucket pipeline

Description

Do we need to have an automated process for this?
What does the process look like (Avalon inspired)?
Do we have a UI or a magical manifest sensing?
Review https://github.com/nulib/next-generation-repository/wiki/Batch-Ingestion-Workflow

Done looks like

workflow meeting takes place
document the s3 pipeline and workflow
~~Create a bucket!~~
~~Create Cyberduck script for any user to upload to bucket (for staging we can have one generic user)~~

get spec/factory specs working

part of #53

Remove deprecated defaultOperator and defaultSearchField solr configs

As per same fix in blacklight: projectblacklight/blacklight@b88b93a

Compare current validation with BFF spreadsheet and Image model required fields

Primarily for Berkeley at this point: given the JSON, validate it and determine whether a resource can be created. If it cannot, write an error to the log.

For MVP required validations are:

File is present in the S3 bucket
It has a title
It has a collection to put it in

Done Looks Like

Update validator to conform to Northwestern model.
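
The three MVP checks could be sketched roughly as follows. This is not the project's validator: the field names ("title", "collection", "file"), the validation_errors helper, and the plain array standing in for a listing of the S3 bucket are all assumptions made for illustration.

```ruby
# Hypothetical validator for one parsed row; the field names are assumptions.
REQUIRED_FIELDS = %w[title collection].freeze

# bucket_keys stands in for the object keys known to exist in the S3 bucket.
# Returns a list of error messages; an empty list means the row is valid.
def validation_errors(row, bucket_keys)
  errors = REQUIRED_FIELDS
           .select { |field| row[field].to_s.strip.empty? }
           .map { |field| "#{field} is missing" }

  file = row["file"].to_s
  errors << "file '#{file}' not found in bucket" unless bucket_keys.include?(file)
  errors
end

row = { "title" => "Coffee", "collection" => "", "file" => "coffee.jpg" }
errors = validation_errors(row, ["library.jpg"])

# A real importer would write to the Rails log; warn is a stand-in here.
warn errors.join("; ") unless errors.empty?
```

Returning every error at once, rather than stopping at the first, lets one log entry describe everything wrong with a row.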

Refactor route for omniauth callbacks

We're getting a deprecation warning: DEPRECATION WARNING: Using a dynamic :action segment in a route is deprecated and will be removed in Rails 5.2. (called from block (2 levels) in <top (required)> at /home/travis/build/nulib/donut/config/routes.rb:15)

here: https://github.com/nulib/donut/blob/deploy/staging/config/routes.rb#L15

We should refactor this sooner rather than later, but Carrick and I weren't sure what the new syntax was and didn't want to spend all day on it. I'm putting in this issue as a reminder that we'll need to change this before rails 5.2 is released (which is kind of soon)

Missed on first pass: AuthoritySelect field for CREATOR

Description

We missed this one in the initial round...

(Breakout of Issue https://github.com/nulib/next-generation-repository/issues/90)
-- for CREATOR

As a Collection Manager, I want to have authorities attached to certain fields and be able to grab them from a drop down menu (editing) so that I don't have to worry about editors putting in inconsistent information.

Spreadsheet: https://docs.google.com/spreadsheets/d/1F35hLSD11a1mf9UTXvgAc7xKXkAixOaBwVvzYQulnkc/edit#gid=396400352

Done Looks Like

An AuthoritySelect dropdown plus autocomplete is added.

Thumbnails not showing up

Here's a good example: http://donut.repo.rdc-staging.library.northwestern.edu/concern/images/cee2e75c-1d2e-4551-a46b-661878aa9b5d?locale=en#?c=0&m=0&s=0&cv=0&xywh=-783%2C-58%2C2588%2C1137

The images show up in the universal viewer, but there aren't any representative images showing up on the #show page.

I'm seeing this error in the logs:

I, [2018-01-23T18:23:09.387540 #26275]  INFO -- : [239a4161-c3be-4b85-a51b-0e01a357da65] Started POST "/" for 127.0.0.1 at 2018-01-23 18:23:09 +0000
D, [2018-01-23T18:23:09.427695 #26275] DEBUG -- : [239a4161-c3be-4b85-a51b-0e01a357da65]   Load LDP (21.5ms) http://fcrepo.repo.vpc.rdc-staging.library.northwestern.edu/rest/bb/52/0f/f8/bb520ff8-9d94-47db-9107-cd0b275b9ad0 Service: 47398585891020
D, [2018-01-23T18:23:09.493206 #26275] DEBUG -- : [239a4161-c3be-4b85-a51b-0e01a357da65]   Hyrax::Operation Load (1.8ms)  SELECT  "curation_concerns_operations".* FROM "curation_concerns_operations" WHERE "curation_concerns_operations"."id" = $1 LIMIT $2  [["id", 167], ["LIMIT", 1]]
F, [2018-01-23T18:23:09.495636 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65]
F, [2018-01-23T18:23:09.496104 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65] ActiveRecord::RecordNotFound (Couldn't find Hyrax::Operation with 'id'=167):
F, [2018-01-23T18:23:09.496198 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65]
F, [2018-01-23T18:23:09.496325 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65] activerecord (5.1.4) lib/active_record/relation/finder_methods.rb:343:in `raise_record_not_found_exception!'

which is weird, because I can pull up that record in the Rails console. Maybe it's a race condition or something?

Anyway, I'm looking into this now.

Fix TechMD Predicates For EXIF

See the exif.rb model, fix the two todos for the ns ones

FITS issues on donut workers

I was just testing our import from s3 job on AWS and the jobs are failing on fits:

E, [2018-01-18T17:50:40.060580 #26161] ERROR -- : [fee2029b-36e4-4401-abea-770862632455] [ActiveJob] [CharacterizeJob] [576bc9c2-67f1-4be4-aa0f-dc9a8b88ee92] Error performing CharacterizeJob (Job ID: 576bc9c2-67f1-4be4-aa0f-dc9a8b88ee92) from BetterActiveElasticJob(default) in 140.21ms: RuntimeError (Unable to execute command "/usr/local/fits-1.0.5/fits.sh -i "/tmp/d20180118-26161-5qzrty/coffee.jpg""
Picked up JAVA_TOOL_OPTIONS: -Xmx128m
Error: Could not find or load main class edu.harvard.hul.ois.fits.Fits
):
/opt/rubies/ruby-2.4.2/lib/ruby/gems/2.4.0/gems/hydra-file_characterization-0.3.3/lib/hydra/file_characterization/characterizer.rb:51:in `internal_call'

Write Code to Take JSON Representation of Object and Fire CreateWorkJob

Given JSON read in from a CSV (#37) and validated (#38), fire the CreateWorkJob

get rid of factory_girl deprecation warnings

There's always a lot of noise in the specs and this is a pretty easy fix

Figure out a way to avoid env checking for S3 urls

Description

URL encoding is handled differently between Minio and S3. This is being handled by checking the Rails environment now, but that is not ideal.

Done looks like:

Conditional logic removed from Importer::Factory::ObjectFactory for Rails environment.

Update README for aws-cli and minio local configuration

Demo Rake Task Powered Ingest of a CSV

Ingests using CSV populated with Berkeley Metadata
Records display in DONUT
Records in DONUT have metadata
Records in DONUT have derivatives
Records in DONUT are owned by the nul-ingest user
Errors are logged into the environment (development, test, production) log file

Verify Various Failure States Log Errors

Description

Once derivatives are being created successfully, force a failure and follow it through the logs to verify that we're logging it. Besides #38, where an error is written if the metadata is invalid, ensure all the other edge cases write out errors, namely:

Fedora timeouts
Derivative Failures
File not found on S3
Error opening/reading CSV
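
One way to make sure no failure passes silently is to wrap each ingest step in a helper that logs and re-raises. The sketch below uses only Ruby's standard Logger; the with_error_logging helper and the step name are hypothetical, and a StringIO stands in for the environment log file.

```ruby
require "logger"
require "stringio"

# Run one ingest step; log any failure and re-raise it so the caller
# (for example a job runner) still sees the error.
def with_error_logging(logger, step)
  yield
rescue StandardError => e
  logger.error("#{step} failed: #{e.class}: #{e.message}")
  raise
end

log_output = StringIO.new
logger = Logger.new(log_output)

begin
  with_error_logging(logger, "read CSV") do
    raise Errno::ENOENT, "sample.csv"
  end
rescue Errno::ENOENT
  # Already logged above; a real job might retry or mark the row as failed.
end

puts log_output.string
```

Because the helper re-raises, forcing each failure state in a spec and asserting on the captured log output would verify the logging without changing how errors propagate.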

Write Code for Reading CSV and creating JSON representation of the resource

Read the CSV and parse each row as JSON, pass JSON to validator
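
A minimal sketch of that step, using Ruby's csv and json libraries. The column names and sample rows are assumptions for illustration and do not reflect the actual batch spreadsheet layout; rows_as_json is a hypothetical helper name.

```ruby
require "csv"
require "json"

# Turn CSV text into one JSON string per data row, keyed by column header.
def rows_as_json(csv_text)
  CSV.parse(csv_text, headers: true).map { |row| row.to_h.to_json }
end

csv_text = <<~CSV
  title,collection,file
  Coffee,Sample Collection,coffee.jpg
  Library,Sample Collection,library.jpg
CSV

json_rows = rows_as_json(csv_text)
json_rows.each { |json| puts json }
```

Each JSON string can then be handed to the validator on its own, so one bad row does not block the rest of the file.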

Clean start with hyrax 3

Description

Since donut was started on hyrax before 1.0 was released (I think), there might be generated views, configs, controllers, etc. that were applicable at the time they were generated, but have since been refactored away or are no longer needed.

Carrick and I were talking about starting fresh with Hyrax 2 and bringing over our customizations and configs from donut, but we think a more appropriate time to do that will be when Hyrax 3 is released, since that'll be valkyrie based and will be significantly different than hyrax 2 anyway.

So once Hyrax 3 is released and we're ready to transition Donut to it, we should start a new rails project, run all the updated generators, and then carefully bring over our customizations and configs and refactor where needed.

Fix derivative creation for import_from_s3 import script

When running our new import_from_s3 script, records are being imported and show up in donut but no file derivatives are showing up. We should see the coffee and library thumbnails but we're just getting the placeholder thumbnails instead.

My guess is that this has something to do with pulling the binaries from s3 to create derivatives and making sure we're hitting the remote_files part of the actor stack

Get travis build running with Minio

Investigate how other samvera apps deal with administrative metadata

Get csv_parser_spec working

Related to #53

Deal with Admin Sets in batches

Description

Per our workflow, a work needs to be in exactly one admin set. Our spreadsheet batch ingestion needs an admin set column that takes an admin set ID.

Done looks like

Column added to batch spreadsheet that requires admin set ID
Validation takes place that ensures admin sets are there.
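
A rough sketch of what that validation might look like for a single row. The column name "admin_set_id", the admin_set_error helper, the separator characters treated as a sign of multiple IDs, and the array of known IDs are all assumptions; a real check would look the ID up in the repository.

```ruby
# Hypothetical check that a row names exactly one existing admin set.
# Returns an error message, or nil when the row passes.
def admin_set_error(row, known_admin_set_ids)
  id = row["admin_set_id"].to_s.strip
  return "admin_set_id is required" if id.empty?
  # Assumption: whitespace or a list separator means more than one ID was given.
  return "admin_set_id must be a single ID" if id.match?(/[\s,;|]/)
  return "admin set '#{id}' does not exist" unless known_admin_set_ids.include?(id)

  nil
end

known_ids = ["admin_set/default"]
puts admin_set_error({ "admin_set_id" => "" }, known_ids)
puts admin_set_error({ "admin_set_id" => "missing-set" }, known_ids)
```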

Trigger ingest from S3 add/update

When an ingest manifest spreadsheet is added to the correct S3 bucket, trigger ingest via the queue.

Jobify the existing command line app
Rewrite the command line app to run through the job class
Set up the S3 notification and queueing
Carrick will test out batch import

Ensure that Berkeley Spreadsheets are supported by current images model

Spreadsheet example

Predicate and PCDM

Ensure that all the columns that need to be ingested (note not all columns need to be ingested, such as BE and BF) map to something in the Images resource for DONUT.

Spawn tickets to expand resource as needed. The tickets can be done in a later phase.

Upgrade to collections extensions branch

Description

Upgrade CE

Done looks like

#174 merged
docs on collection extension read
Tested by jen, laura, david

figure out why derivatives aren't working on AWS (for batch upload)

batch uploads are creating derivative images locally, but aren't when we run it on staging. Look into why! We know this was working before, the Import URLs were being double encoded in a way that was easy to fix. We had it working using Minio in local dev environments.

Checked workers, they're running
Checked app for errors
We have to investigate where the double encoding is in Hyrax and fix it upstream.
Test with Bespoke Fedora (maybe it was simultaneous writes)
Create more verbose log to dig into
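
To make the double-encoding symptom concrete, the sketch below shows how encoding an already-encoded file name changes it, along with one possible guard. It uses only Ruby's standard library and is not the Hyrax code path; encode_once is a hypothetical helper, and it assumes file names never contain a literal "%" or "+".

```ruby
require "erb"
require "uri"

# Encoding twice turns the "%" of the first pass into "%25".
once  = ERB::Util.url_encode("my file.jpg") # "my%20file.jpg"
twice = ERB::Util.url_encode(once)          # "my%2520file.jpg"

# Hypothetical guard: decode first, then encode, so that input which is
# already encoded comes out the same as input that is not.
# Assumes the name holds no literal "%" or "+"; malformed percent
# sequences make the decode step raise ArgumentError.
def encode_once(segment)
  ERB::Util.url_encode(URI.decode_www_form_component(segment))
end

puts once
puts twice
puts encode_once(once)
```

A request for the doubly encoded name points at an object that does not exist, which would be consistent with derivatives being skipped because the source file could not be fetched.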

Bring DONUT up to Hyrax Master Before Starting Ingest Work

Bundle update our DONUT and see what happens

get hyku importer specs passing in donut

The specs from hyku run in donut successfully now, but not all of them are passing yet. We should get them all green (we may have to modify some of the specs because we aren't going to be using filesystem based ingestion)

csv_importer_spec
csv_parser_spec
image_factory_spec
string_literal_processor_spec

Add Ruby 2.5 to the Travis Build Matrix

Files not present for derivative generation

We're getting this error message when trying to create derivatives on AWS

Errno::ENOENT (No such file or directory @ rb_sysopen - /var/donut-temp/hyrax/uploaded_file/file/34/<filename>.jpg)

on our EB worker instance, the /var/donut-temp folder exists but there are no subfolders under it.

So the file from s3 isn't being copied over to a temp folder and no derivatives are being created. We need to figure out why and where it's happening

get hyku importer specs running for donut

Hyku's importer code has associated rspec tests, we should get them running in donut to validate our work

Add TechMD Properties to DONUT

From the spreadsheet in https://github.com/nulib/next-generation-repository/issues/352

Add these properties following the pattern in: #158

Create Job That Deletes Masterfiles in the pending bucket after ingest success

Once CreateWorkJob has successfully ingested a resource, CreateWorkJob should enqueue a cleanup job for that masterfile. This could be done via hooks or by calling out to super for CreateWorkJob and then adding in desired code.

This job should delete the file from the pending bucket (#35)

Done looks like

job is written that cleans up after a successful ingest.

Find a way to mimic S3 on Dev and Test (Travis)

So we don't need code for using FileUtils on dev and test environments but the aws-sdk on the production environment (which also means the aws code is never tested by ci).

Right now we're going to start with minio:

https://jacky.wtf/weblog/arc-minio/

Host fits zip in an S3 bucket

Relates to #89

We probably shouldn't rely on Harvard for hosting this zip file since it stopped working for us last week.

Done looks like:

Fits zip file uploaded to an S3 bucket, and .ebextensions/01_packages.config updated to point to our hosted version.
