
nulib / donut


Digital Object Northwestern University Toolkit

Ruby 59.37% JavaScript 3.14% HTML 9.99% XSLT 5.92% Shell 0.02% Dockerfile 0.64% SCSS 20.92%

donut's Introduction

DONUT (Archived)

  • Note: the Donut project is no longer in use. Please visit Meadow to see our current repository.

Donut is a Hydra head based on Hyrax

Dependencies

Initial Setup

  • Clone the Donut GitHub repository

  • Install Bundler (the version pinned in Gemfile.lock) if it isn't already installed: gem install bundler -v "~>2.0.1"

  • Install dependencies: bundle install

  • Run devstack up donut in a separate tab to start dependency services

  • Run rake donut:seed to initialize the stack.

    • Optional arguments to donut:seed (may be used in combination):
      • bundle exec rake donut:seed ADMIN_USER=[your NetID] ADMIN_EMAIL=[your email] to automatically add yourself as an admin user
      • bundle exec rake donut:seed ADMIN_USER=[your NetID] ADMIN_EMAIL=[your email] SEED_FILE=[path to YAML file] to automatically add users and admin_sets. There is a sample seed file in spec/fixtures/files/test_seed.yml
  • Create a fake AWS profile:

$ aws --profile fake configure
# enter dummy values for "AWS Access Key ID" and "AWS Secret Access Key".
# Set the "Default region name" to "us-east-1", use default[None] for format

# add this to your .zshrc, .bashrc, etc.
export AWS_PROFILE=fake

Running the App

bundle exec rails s

Donut should be live at: https://devbox.library.northwestern.edu:3000/

Stopping the application

You can stop the Rails server with Ctrl + C

You can stop devstack by running devstack down. Your local data (from the database, LDAP, etc.) will persist after devstack shuts down.

If you need to clear your data and reset the entire development environment, run devstack down -v

After initial setup, you don't need to run rake donut:seed again unless you've run devstack down -v.

Read more about Devstack commands here.

Set up an "NUL Collection" Collection Type

Donut only wants "NUL Collection" types to be public. In order to make these available to the front-end React app:

  1. Go to Dashboard > Settings > Collection Type and add a "NUL Collection" collection type.
  2. In config/settings/development.local.yml, add the gid of the "NUL Collection" collection type (or one you want to index in Elasticsearch). Ex: nul_collection_type: gid://nextgen/hyrax-collectiontype/3.
  3. Re-start the Rails server

Note: Only Donut collections of the collection type "NUL Collection" will appear in the front-end application.
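
The gid value in step 2 is a GlobalID-style URI. As a rough illustration of its parts (app name, model name, record id), the sketch below splits one with Ruby's standard URI library. The helper name parse_gid is hypothetical; this is not how Donut or the globalid gem parses the value.

```ruby
require 'uri'

# Hypothetical helper: split a GlobalID-style string into its parts.
# Illustration only; not Donut's or the globalid gem's implementation.
def parse_gid(gid)
  uri = URI.parse(gid)
  raise ArgumentError, "not a gid URI: #{gid}" unless uri.scheme == 'gid'

  # The path looks like "/<model-name>/<id>".
  _, model, id = uri.path.split('/', 3)
  { app: uri.host, model: model, id: id }
end

parts = parse_gid('gid://nextgen/hyrax-collectiontype/3')
```

Here parts[:id] is the numeric id of the collection type, which is the piece that changes between environments.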

More detailed information on Collection/Indexing setup here: Collection Type Indexing

Running the Tests

Bring up the test stack in one window with:

$ devstack -t up donut

Run the seed task for the test environment:

$ rake donut:seed RAILS_ENV=test

Run the test suite in another window:

$ rake spec

Or, you can run the full CI suite (rubocop and the specs) with a single task:

$ rake donut:ci

You can alternatively run rubocop and the specs independently with:

$ rake donut:ci:rubocop
$ rake donut:ci:rspec

Run the JavaScript tests

Run Jasmine server:

$ rake jasmine

Open the Jasmine runner in a browser to run all tests:

http://localhost:8888

Run the JavaScript test suite from the command line:

$ rake jasmine:ci

Running the Batch importer from the command line

  • Run either the rake donut:seed or the rake s3:setup task to create and populate the S3 bucket
  • Run the importer from the application root directory with the command:
$ bin/import_from_s3 dev-batch sample.csv

Seed Data

  • Run the batch importer with the seed-data.csv file to load 30 sample records (this will take some time)
  • Make sure you have first run bundle exec rake s3:setup to populate the s3 bucket
  • Then run:
$ bin/import_from_s3 dev-batch seed-data.csv

Running the tests for our new CSV importer work from hyrax

The Active Elastic Job gem requires an environment variable to be set; otherwise all the specs fail. Run this first:

$ export PROCESS_ACTIVE_ELASTIC_JOBS=true

Notes on the Docker stack

  • You can replace up with daemon in docker:dev:up and docker:test:up to run the Docker services in the background instead of in a separate tab. To stop the stack, use (for example) rake docker:dev:down.
  • The test stack always cleans up its data when it comes down. To clean the dev stack, use rake docker:dev:clean.

Adding an Admin user and assigning workflow roles

  1. Run the development servers with rake docker:dev:up (or daemon) and rails s
  2. Go to https://devbox.library.northwestern.edu:3000/ and login
  3. To make the user who logged in an admin, run rake donut:add_admin_role ADMIN_USER=[your NetID]
  4. Go to https://devbox.library.northwestern.edu:3000/admin/workflow_roles and grant workflow roles if needed

donut's People

Contributors

adamjarling, bmquinn, carrickr, csyversen, davidschober, dependabot-preview[bot], kdid, mbklein, toputnal

donut's Issues

Simplify minio setup

Right now donut requires minio running to mimic s3, have a bucket created, and then that bucket needs to be populated to test out our import feature. It requires a few files to be created in the users home directory, an environment variable or two to be set up, and the aws cli scripts have to be run manually every time minio goes up or down.

It's great that we can mimic s3 locally and help speed up dev without relying on outside sources, but it's become unwieldy, and missing any one of those steps means that your tests will fail or give you a false positive. I'm going to try to organize and automate this as much as possible.

Fix deprecation warnings

This is just good practice and we'll thank ourselves later.

After Carrick's-update-to-the-latest-hyrax branch passes and is merged I'll start fixing the warnings

Allow Authority Driven Dropdown for CREATOR ROLE

Description

(Breakout of Issue https://github.com/nulib/next-generation-repository/issues/90)
-- for CREATOR ROLE
As a Collection Manager, I want to have authorities attached to certain fields and be able to grab them from a drop down menu (editing) so that I don't have to worry about editors putting in inconsistent information.

Here is an example of using the relator endpoint through our local questioning authority:
http://devbox.library.northwestern.edu/authorities/search/loc/relators?q=art

Done Looks Like

  • A dropdown for single authority added

Exclude CreateWithRemoteFilesActor from rubocop

Right now we're overriding CreateWithRemoteFilesActor from Hyrax so we can exclude the area where it's encoding the URL one too many times, but it's also making rubocop upset.

Since it's not our file, we shouldn't really care if it's violating rubocop rules, and it should be excluded from its checks.

Investigate user key issue

Description

The user model is storing escaped email strings as ids, which seems to break things like deleting a user from a role. For example, first.last@northwestern is getting stored as the user key but trying to delete the user from role fails with a user key not found error looking for [email protected].

This might be solved by storing the netid as the user key in User.rb, changing to_s to:

  def to_s
    username
  end

Done looks like:

  • the proper user key is used
  • users can be deleted from roles, etc. without not found errors

Create upstream pr from controlled vocabulary

Description

Determine what controlled vocab updates implemented locally would benefit or be appropriate for Hyrax core. At a minimum, see if Authority Select works with multiple items in Hyrax core, and fix that. Up for debate whether controlled vocab mixed with Authority Select is a generic enough use case for users outside of Northwestern.

Done Looks Like

  • Make a decision whether local controlled vocab / authority select updates are necessary for Hyrax core.
  • If yes, do the work.

Tasks

  • Decide whether local controlled vocab / authority select updates are necessary for Hyrax core.
  • If so, update Hyrax and submit a PR.
  • Merge PR back into Donut and check everything still works (create a new ticket for this).

Compare current validation with BFF spreadsheet and Image model required fields

Primarily for Berkeley at this point: given JSON, validate it and determine whether a resource can be created. If it cannot, write an error to the log.

For MVP required validations are:

  • File is present in the S3 bucket
  • It has a title
  • It has a collection to put it in

Done Looks Like

  • Update validator to conform to Northwestern model.
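
The three MVP checks above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's validator: the method names, the record keys ("title", "collection", "file") and the idea of passing in a list of known bucket keys are all hypothetical.

```ruby
require 'json'
require 'logger'

# Hypothetical MVP validator; names and record shape are assumptions.
REQUIRED_FIELDS = %w[title collection].freeze

# True when the value is neither nil nor blank.
def present?(value)
  !value.to_s.strip.empty?
end

# record:      Hash parsed from the incoming JSON (string keys)
# bucket_keys: object keys known to exist in the S3 bucket
# Returns an array of error messages; empty means the resource can be created.
def validation_errors(record, bucket_keys)
  errors = REQUIRED_FIELDS.reject { |field| present?(record[field]) }
                          .map { |field| "missing #{field}" }
  file = record['file']
  errors << "file not found in bucket: #{file.inspect}" unless bucket_keys.include?(file)
  errors
end

logger = Logger.new($stderr)
record = JSON.parse('{"title": "Coffee", "collection": "", "file": "coffee.jpg"}')
errors = validation_errors(record, ['coffee.jpg'])
logger.error("cannot create resource: #{errors.join('; ')}") unless errors.empty?
```

Returning a list of messages rather than a boolean keeps every reason available for the log entry.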

Refactor route for omniauth callbacks

We're getting a deprecation warning: DEPRECATION WARNING: Using a dynamic :action segment in a route is deprecated and will be removed in Rails 5.2. (called from block (2 levels) in <top (required)> at /home/travis/build/nulib/donut/config/routes.rb:15)

here: https://github.com/nulib/donut/blob/deploy/staging/config/routes.rb#L15

We should refactor this sooner rather than later, but Carrick and I weren't sure what the new syntax was and didn't want to spend all day on it. I'm putting in this issue as a reminder that we'll need to change this before Rails 5.2 is released (which is kind of soon).

Missed on first pass: AuthoritySelect field for CREATOR

Description

We missed this one in the initial round...

(Breakout of Issue https://github.com/nulib/next-generation-repository/issues/90)
-- for CREATOR

As a Collection Manager, I want to have authorities attached to certain fields and be able to grab them from a drop down menu (editing) so that I don't have to worry about editors putting in inconsistent information.

Spreadsheet: https://docs.google.com/spreadsheets/d/1F35hLSD11a1mf9UTXvgAc7xKXkAixOaBwVvzYQulnkc/edit#gid=396400352

Done Looks Like

  • An AuthoritySelect dropdown plus autocomplete is added.

Thumbnails not showing up

Here's a good example: http://donut.repo.rdc-staging.library.northwestern.edu/concern/images/cee2e75c-1d2e-4551-a46b-661878aa9b5d?locale=en#?c=0&m=0&s=0&cv=0&xywh=-783%2C-58%2C2588%2C1137

The images show up in the universal viewer, but there aren't any representative images showing up on the #show page.

I'm seeing this error in the logs:

I, [2018-01-23T18:23:09.387540 #26275]  INFO -- : [239a4161-c3be-4b85-a51b-0e01a357da65] Started POST "/" for 127.0.0.1 at 2018-01-23 18:23:09 +0000
D, [2018-01-23T18:23:09.427695 #26275] DEBUG -- : [239a4161-c3be-4b85-a51b-0e01a357da65]   Load LDP (21.5ms) http://fcrepo.repo.vpc.rdc-staging.library.northwestern.edu/rest/bb/52/0f/f8/bb520ff8-9d94-47db-9107-cd0b275b9ad0 Service: 47398585891020
D, [2018-01-23T18:23:09.493206 #26275] DEBUG -- : [239a4161-c3be-4b85-a51b-0e01a357da65]   Hyrax::Operation Load (1.8ms)  SELECT  "curation_concerns_operations".* FROM "curation_concerns_operations" WHERE "curation_concerns_operations"."id" = $1 LIMIT $2  [["id", 167], ["LIMIT", 1]]
F, [2018-01-23T18:23:09.495636 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65]
F, [2018-01-23T18:23:09.496104 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65] ActiveRecord::RecordNotFound (Couldn't find Hyrax::Operation with 'id'=167):
F, [2018-01-23T18:23:09.496198 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65]
F, [2018-01-23T18:23:09.496325 #26275] FATAL -- : [239a4161-c3be-4b85-a51b-0e01a357da65] activerecord (5.1.4) lib/active_record/relation/finder_methods.rb:343:in `raise_record_not_found_exception!'

which is weird, because I can pull up that record in the rails console. Maybe it's a race condition or something?

Anyway, I'm looking into this now.

FITS issues on donut workers

I was just testing our import from s3 job on AWS and the jobs are failing on fits:

E, [2018-01-18T17:50:40.060580 #26161] ERROR -- : [fee2029b-36e4-4401-abea-770862632455] [ActiveJob] [CharacterizeJob] [576bc9c2-67f1-4be4-aa0f-dc9a8b88ee92] Error performing CharacterizeJob (Job ID: 576bc9c2-67f1-4be4-aa0f-dc9a8b88ee92) from BetterActiveElasticJob(default) in 140.21ms: RuntimeError (Unable to execute command "/usr/local/fits-1.0.5/fits.sh -i "/tmp/d20180118-26161-5qzrty/coffee.jpg""
Picked up JAVA_TOOL_OPTIONS: -Xmx128m
Error: Could not find or load main class edu.harvard.hul.ois.fits.Fits
):
/opt/rubies/ruby-2.4.2/lib/ruby/gems/2.4.0/gems/hydra-file_characterization-0.3.3/lib/hydra/file_characterization/characterizer.rb:51:in `internal_call'

Figure out a way to avoid env checking for S3 urls

Description

URL encoding is handled differently between Minio and S3. This is being handled by checking the Rails environment now, but that is not ideal.

Done looks like:

  • Conditional logic removed from Importer::Factory::ObjectFactory for Rails environment.

Demo Rake Task Powered Ingest of a CSV

  • Ingests using CSV populated with Berkeley Metadata
  • Records display in DONUT
  • Records in DONUT have metadata
  • Records in DONUT have derivatives
  • Records in DONUT are owned by the nul-ingest user
  • Errors are logged into the environment (development, test, production) log file

Verify Various Failure States Log Errors

Description

Once derivatives are being created successfully, force a failure and follow it through the logs to verify that we're logging it. Besides #38, where an error is written if the metadata is invalid, ensure all the other edge cases write out errors, namely:

  • Fedora timeouts
  • Derivative Failures
  • File not found on S3
  • Error opening/reading CSV
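
One way to check that each failure state reaches the log is to wrap every step in a small helper and force the error. The sketch below is only an illustration: with_error_logging and the step names are hypothetical, and the errors are raised by hand rather than by Fedora or S3.

```ruby
require 'logger'
require 'stringio'
require 'timeout'

# Hypothetical wrapper: run one ingest step and log any failure instead of
# letting it pass silently. Returns true on success, false on failure.
def with_error_logging(logger, step)
  yield
  true
rescue StandardError => e
  logger.error("#{step} failed: #{e.class}: #{e.message}")
  false
end

# Capture log output in memory so it can be inspected afterwards.
log_output = StringIO.new
logger = Logger.new(log_output)

# Simulated failures standing in for a missing S3 file and a Fedora timeout.
with_error_logging(logger, 'fetch file from S3') { raise Errno::ENOENT, 'coffee.jpg' }
with_error_logging(logger, 'write to Fedora') { raise Timeout::Error, 'execution expired' }
```

After running this, log_output.string holds one ERROR line per simulated failure, which is the property the issue asks us to verify for the real edge cases.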

Clean start with hyrax 3

Description

Since donut was started on hyrax before 1.0 was released (I think), there might be generated views, configs, controllers, etc. that were applicable at the time they were generated but have since been refactored away or are no longer needed.

Carrick and I were talking about starting fresh with Hyrax 2 and bringing over our customizations and configs from donut, but we think a more appropriate time to do that will be when Hyrax 3 is released, since that'll be valkyrie based and will be significantly different than hyrax 2 anyway.

So once Hyrax 3 is released and we're ready to transition Donut to it, we should start a new rails project, run all the updated generators, and then carefully bring over our customizations and configs and refactor where needed.

Fix derivative creation for import_from_s3 import script

When running our new import_from_s3 script, records are being imported and show up in donut but no file derivatives are showing up. We should see the coffee and library thumbnails but we're just getting the placeholder thumbnails instead.

My guess is that this has something to do with pulling the binaries from s3 to create derivatives and making sure we're hitting the remote_files part of the actor stack

Deal with Admin Sets in batches

Description

Per our workflow, a work needs to be in exactly one admin set, so our spreadsheet batch ingestion needs an admin set column that takes an admin set ID.

Done looks like

  • Column added to batch spreadsheet that requires admin set ID
  • Validation takes place that ensures admin sets are there.
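
A rough sketch of that validation, assuming the column is called admin_set_id and that the set of existing admin set IDs is passed in by the caller (both are assumptions, as are the method name and the error wording):

```ruby
require 'csv'
require 'set'

# Hypothetical check for the batch spreadsheet.
# Returns an array of error messages; empty means every row is acceptable.
def admin_set_errors(csv_text, known_admin_set_ids)
  table = CSV.parse(csv_text, headers: true)
  return ['missing admin_set_id column'] unless table.headers.include?('admin_set_id')

  known = known_admin_set_ids.to_set
  table.each_with_index.filter_map do |row, index|
    id = row['admin_set_id'].to_s.strip
    # Row numbers are 1-based and do not count the header line.
    "row #{index + 1}: unknown admin set #{id.inspect}" unless known.include?(id)
  end
end

csv = "title,admin_set_id\nCoffee,abc123\nLibrary,\n"
errors = admin_set_errors(csv, ['abc123'])
```

In this example the second row has an empty admin set, so it is reported while the first row passes.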

Trigger ingest from S3 add/update

When an ingest manifest spreadsheet is added to the correct S3 bucket, trigger ingest via the queue.

  • Jobify the existing command line app
  • Rewrite the command line app to run through the job class
  • Set up the S3 notification and queueing
  • Carrick will test out batch import
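
For the notification step, an S3 event payload carries the bucket name and object key under Records[].s3. The sketch below picks out CSV manifests from such a payload; the helper name csv_manifests and the ".csv" filter are assumptions, and actually enqueueing the job is left out.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: return [bucket, key] pairs for CSV manifests found in
# an S3 event notification. Object keys arrive URL-encoded in these events,
# so they are decoded before use.
def csv_manifests(event_json)
  JSON.parse(event_json).fetch('Records', []).filter_map do |record|
    bucket = record.dig('s3', 'bucket', 'name')
    key = URI.decode_www_form_component(record.dig('s3', 'object', 'key').to_s)
    [bucket, key] if key.end_with?('.csv')
  end
end

# Hand-built sample payload with one manifest and one image.
event = {
  'Records' => [
    { 's3' => { 'bucket' => { 'name' => 'dev-batch' }, 'object' => { 'key' => 'sample+batch.csv' } } },
    { 's3' => { 'bucket' => { 'name' => 'dev-batch' }, 'object' => { 'key' => 'coffee.jpg' } } }
  ]
}.to_json

manifests = csv_manifests(event)
```

Each returned pair would then be handed to whatever job class wraps the existing command line importer.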

figure out why derivatives aren't working on AWS (for batch upload)

Batch uploads are creating derivative images locally, but not when we run them on staging. Look into why! We know this was working before: the import URLs were being double encoded in a way that was easy to fix, and we had it working using Minio in local dev environments.

  • Checked workers, they're running
  • Checked app for errors
  • We have to investigate where the double encoding is happening in hyrax and fix it upstream.
  • Test with Bespoke Fedora (maybe it was simultaneous writes)
  • Create more verbose log to dig into
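
To make the double-encoding symptom concrete: encoding an already-encoded name turns each "%" into "%25", so the resulting URL no longer points at the original object. The sketch below shows the effect and one hypothetical guard (encode_once); it is not the fix that was applied in Hyrax or Donut.

```ruby
require 'erb'
require 'uri'

# Encoding once gives a usable name; encoding the result again corrupts it.
once = ERB::Util.url_encode('coffee cup.jpg')
twice = ERB::Util.url_encode(once)

# Hypothetical guard: decode first so encoding is applied exactly once.
# Caveat: this misreads names that legitimately contain "%" or "+".
def encode_once(name)
  ERB::Util.url_encode(URI.decode_www_form_component(name))
end

safe_from_raw = encode_once('coffee cup.jpg')
safe_from_encoded = encode_once(once)
```

Here once is "coffee%20cup.jpg" while twice is "coffee%2520cup.jpg"; both calls to encode_once return the singly encoded form.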

get hyku importer specs passing in donut

The specs from hyku run in donut successfully now, but not all of them are passing yet. We should get them all green (we may have to modify some of the specs because we aren't going to be using filesystem based ingestion)

  • csv_importer_spec
  • csv_parser_spec
  • image_factory_spec
  • string_literal_processor_spec

Files not present for derivative generation

We're getting this error message when trying to create derivatives on AWS

Errno::ENOENT (No such file or directory @ rb_sysopen - /var/donut-temp/hyrax/uploaded_file/file/34/<filename>.jpg)

On our EB worker instance, the /var/donut-temp folder exists but there are no subfolders under it.

So the file from s3 isn't being copied over to a temp folder and no derivatives are being created. We need to figure out why and where it's happening.

Create Job That Deletes Masterfiles in the pending bucket after ingest success

Once CreateWorkJob has successfully ingested a resource, CreateWorkJob should enqueue a cleanup job for that masterfile. This could be done via hooks or by calling out to super for CreateWorkJob and then adding in desired code.

This job should delete the file from the pending bucket (#35)

Done looks like

  • job is written that cleans up after a successful ingest.
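
The "call super, then add code" option could look roughly like the plain-Ruby sketch below. Every class here is a hypothetical stand-in: the real CreateWorkJob comes from Hyrax, and a real cleanup job would delete the object from the pending bucket rather than append to an in-memory array.

```ruby
# Stand-in for the upstream job; pretends the ingest succeeds for any
# non-empty file key.
class BaseCreateWorkJob
  def perform(file_key)
    !file_key.to_s.empty?
  end
end

# Stand-in for a queued cleanup job; records what would be deleted.
class CleanupJob
  def self.queue
    @queue ||= []
  end

  def self.perform_later(file_key)
    queue << file_key
  end
end

# Hypothetical subclass: run the original ingest, then schedule cleanup.
class CreateWorkJobWithCleanup < BaseCreateWorkJob
  def perform(file_key)
    succeeded = super
    # Only schedule deletion once the ingest has actually succeeded.
    CleanupJob.perform_later(file_key) if succeeded
    succeeded
  end
end

CreateWorkJobWithCleanup.new.perform('coffee.jpg')
CreateWorkJobWithCleanup.new.perform('')
```

Keeping the cleanup in its own job means a failed deletion can be retried without re-running the ingest.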

Host fits zip in an S3 bucket

Relates to #89

We probably shouldn't rely on Harvard for hosting this zip file since it stopped working for us last week.

Done looks like:

Fits zip file uploaded to an S3 bucket, and .ebextensions/01_packages.config updated to point to our hosted version.
