
dryad-product-roadmap's Introduction

dryad-product-roadmap

This repository is used to support the product development roadmap.
Work will be defined via the issue tracker and tracked via GitHub Projects.

dryad-product-roadmap's People

Contributors: ahamelers, ryscher, marisastrong, mariapraetzellis, sfisher


dryad-product-roadmap's Issues

Missing Datasets? Match emails with user record (from My Datasets)

After logging in, if the user has no datasets, we'd show a notification like in this wireframe (forget about the GDPR element on that screen for now).

https://jxosix.axshare.com/#g=1&p=my_datasets_-_v1

Then they'd go through this validation routine.

https://jxosix.axshare.com/#g=1&p=migrate_data_-_v1

I believe this would switch the current_user to the matching email record, update any information in that user record, and remove the discarded user record.

Put together instructions on how to get tests to run locally

It's great that we can commit and see results on Travis CI, but for writing tests and verifying they work, that long lag time is really unacceptable.

Tests are working locally on my machine, but I need good instructions for setting them up so others can run them easily as well.

UI & SOLR/Harvester Servers for Dryad/Dash

We need servers set up, and that may involve a number of things. The development environment will be the most time-critical to get into a basic usable state.

  • Things installed on them from Puppet scripts, which might include the following:
  • Shibboleth Service Provider
  • Certificates for SSL/HTTPS
  • Apache/configs
  • Ruby
  • Some basic Linux libraries like MySQL and a few other things (I hope these are in Puppet)
  • Crons
  • Any external Shibboleth shenanigans needed to get logins working for the servers, such as registering the domains
  • I'm sure there is more; Marisa and Jim, help me fill out this list

Complete development environment setup

The development environment has running UI code, but it needs some additional things done to fully function.

  • @jimvanderveen is finishing our development SOLR/Harvester server setup and puppet deployment
  • Install harvester
  • Install SOLR
  • Configure SOLR with geoblacklight configuration
  • Get and configure Merritt collection for the generic Dryad tenant

Finishing touches for migrate button

  • Formatting of the UI
  • Error messages for wrong code
  • Add "No" button for no migration
  • If "No" or migration successful, make message disappear from dashboard.
  • Add option to send code to new email

Remove developer login

  • Remove developer login cases from callbacks since we do not need them unless we have trouble getting ORCID to enable localhost for the dev/sandbox login
  • We can manually set an associated partner in the database for an ORCID if we want to test per-institution functionality like logos, specific identifier minting or submission to specific Merritt collection
  • I also added some patches to the default developer login layout and functioning that we can remove

Migration for user accounts in the database

Perform the database-side user migration.

  • Determine whether old user accounts should go into the main (stash users) table, or into a completely new table. What info do we need to store in this table? (When this is done, notify team so they can build to the new user info).
  • Modify the tables appropriately (i.e., make a rails migration to update the schema)
  • Determine whether user accounts should be migrated via SQL import/export commands, or via some more rails-oriented method
  • Implement migration of the user account records (though these won't do anything until the UI account migration is enabled)

GitHub/Slack integration

It is useful to be notified of GitHub activity through Slack.

  • Create a new (shared) slack channel for notifications and subscribe the appropriate people
  • Set up GitHub notifications from all appropriate repositories to the notification channel
  • Test whether checklist changes are added to Slack

Remove "Connect your ORCID" in metadata entry

Remove "connect your ORCID" icon from the metadata entry page since they will already by logged in with orcid.

Also remove anything in the callbacks for this.

Note that we still need a callback for co-author orcids and people that receive an invitation to connect an orcid.

Email claim mechanics

I believe this might be how we take care of this.

  • We insert user records already connected to their resources/datasets, but missing the ORCID, when we do the real data migration.
  • After a user successfully claims their email, we update all resources that had the user id corresponding to the claimed user. They are updated to the new, claimed user id.
  • Do we delete the old, claimed user?
  • Other tables to update user_ids from old to new: stash_engine_resource_states, stash_engine_resources.current_editor_id
  • Do we need to update tenant_id in stash_engine_resources? Or maybe it will be updated correctly at the main data migration for any Dryad and DataONE records.
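The user-id remapping described above could be sketched like this (illustrative only: in-memory hashes stand in for the real stash_engine tables, and in the real app these would be ActiveRecord update_all calls):

```ruby
# Sketch: repoint rows from the old (claimed) user id to the new user id.
def remap_user_id!(rows, column, old_id, new_id)
  rows.each { |row| row[column] = new_id if row[column] == old_id }
  rows
end

resources = [
  { id: 1, user_id: 42, current_editor_id: 42 },
  { id: 2, user_id: 7,  current_editor_id: 42 }
]

# Repoint both ownership and current-editor references.
remap_user_id!(resources, :user_id, 42, 99)
remap_user_id!(resources, :current_editor_id, 42, 99)
```

The same call would be repeated for each table/column that references the old user id (stash_engine_resource_states, current_editor_id, etc.).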

Validating required metadata through API

https://github.com/CDL-Dryad/dryad-product-roadmap/wiki/Migration

We need to be able to validate metadata by means other than the review screen or attempting to submit a dataset.

  • There is existing code to validate correct descriptive metadata that we can hook in
  • Deeper validation may be required to be sure all required internal tables are populated
  • We should consider adding another mechanism, e.g. checking for the presence of PATCH in the Allow: header on the /datasets/doi URL, or maybe something more sophisticated
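The Allow-header check could look like this (a sketch of the header parsing only; not tied to any particular HTTP client, and the method name is an assumption):

```ruby
# Returns true if an HTTP Allow: header value advertises PATCH,
# e.g. "GET, PUT, PATCH" => true. A missing header (nil) => false.
def patch_allowed?(allow_header)
  allow_header.to_s.split(',').map { |m| m.strip.upcase }.include?('PATCH')
end
```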

API additional actions or information for Dryad

https://github.com/CDL-Dryad/dryad-product-roadmap/wiki/Migration

The API needs to have some additional things for the Dryad migration.

  • Way to validate a dataset without submitting
    • There is existing code to validate correct descriptive metadata that we can hook in
    • Deeper validation may be required to be sure all required internal tables are populated
    • We should consider adding another mechanism, e.g. checking for the presence of PATCH in the Allow: header on the /datasets/doi URL, or maybe something more sophisticated
  • Submit a specific version instead of only allowing one version in progress

I may need to talk through what the needs are here some more.

Add missing # rubocop:enable comments, and fix any resulting style issues

(description copied from CDLUC3/stash#2 by @dmolesUC3)

When I first set up RuboCop, I assumed # rubocop:disable was block scoped, but it turns out it's meant to be paired with a matching # rubocop:enable. Without that, it means "disable till further notice".

We've put # rubocop:enable in a few places (e.g. Resource::duplicate_filenames), but there are still a lot of orphan disables out there. At some point we should clean these up, and fix any style issues we discover in code where checks were unintentionally disabled.
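For example, a properly paired disable/enable looks like this (the method body here is hypothetical; only the comment pairing matters):

```ruby
# rubocop:disable Metrics/MethodLength
# Returns the filenames that appear more than once in the given list.
def duplicate_filenames(filenames)
  filenames.group_by { |f| f }.select { |_, group| group.size > 1 }.keys
end
# rubocop:enable Metrics/MethodLength
```

Without the closing `# rubocop:enable`, the cop stays disabled for the rest of the file, which is how the orphan disables crept in.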

Login overview placeholder

Login is more of a milestone, probably, but putting some things in here where we can find them for background.

Jane has created this flow and these wireframes for us. I believe the email flow is essentially OK, though the login flow wasn't complete for all the things we talked about. I'm sure we were both confused about this and could've used more definition.

We talked through a simplified login flow on Friday among our team: login_v2.pdf. The main point of this flow is that everyone will log in with ORCID. The possible additional steps on an initial login are then 1) associating yourself with a partner institution by logging in via Shibboleth, and 2) claiming/migrating your account, if you were a previous Dryad user, by way of something similar to a two-factor authentication flow: getting a code to validate your email.

Seems like there is also a feature I'd forgotten about in there, which is GDPR acceptance? Also the "missing data" thing for users on the list-of-datasets page.

Add privacy policy and terms of service to login process

When logging in for the first time, if the user does not have a previous Dryad account, they should be presented with information about Dryad's Privacy Policy and Terms of Service, and asked to accept the Terms of Service.

The location within the login process should be after the ORCID login (and after institutional login). It could be included within the data migration page, but it may be better to present it as a separate page.

The current wording from old Dryad can be re-used:
[screenshot of the old Dryad privacy/terms wording, 2018-07-09]

The links should go to:

Broken Unit Tests

Unit tests are going to start breaking in a number of areas:

  • Because of config changes
  • Perhaps in users model
  • Some unexpected places
  • Adding new tests for added methods in new or existing models

Support Dryad DataCite Acct for new DOIs

It sounds like we should get the account from them and configure the new Dryad code to use it.

Also, do we still need to support the old EZID setup for previously submitted datasets?

Select your partner institution (or none)

Jane has https://jxosix.axshare.com/

There are two main options. The non-partner option just sets the tenant to be Dryad.

Choosing a partner does a secondary (shibboleth) login and then comes back to set the tenant to be more specific after login is validated.

The Dryad option is already done.

Once shibboleth is working it should call back to the sessions controller as shown at https://github.com/CDL-Dryad/stash/blob/master/stash_engine/app/controllers/stash_engine/sessions_controller.rb .

It will be the callback method that the login returns to. A tenant_id parameter should be passed along in the URL.

Once the user has successfully validated via Shibboleth, their tenant_id should be written to that field in their user record (StashEngine::User.tenant_id).

That could be done before redirecting them to the dashboard path.

Once a user's tenant has been set after logging in, it does not need to be set again on subsequent logins.
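The set-once tenant logic could be as simple as the following sketch (the real code lives in sessions_controller.rb; the method name and the OpenStruct stand-in for StashEngine::User are assumptions):

```ruby
require 'ostruct'

# On a successful Shibboleth callback, record the tenant only if it
# hasn't already been set on a previous login.
def record_tenant!(user, tenant_id)
  user.tenant_id ||= tenant_id
  user
end

user = OpenStruct.new(tenant_id: nil)
record_tenant!(user, 'ucla')   # first login sets the tenant
record_tenant!(user, 'dryad')  # subsequent logins leave it alone
```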

Check deploy and fix deploy from new repository location on github

We have an almost working deploy on uc3-dryad-dev.cdlib.org but are running into issues.

IMPORTANT: there is a branch for this stuff that I don't want to merge into the master branch until the deploy works. It has Capistrano changes and other things (and I set up the server as described at https://confluence.ucop.edu/display/UC3/dryad_server_setup_steps ).

The branch you want is here: https://github.com/CDL-Dryad/dryad/tree/deploy_test , and it's called deploy_test. You might need a working application to deploy with Capistrano (maybe). I think this gives the basic steps: https://github.com/CDL-Dryad/dryad/blob/master/documentation/dryad_install.md . Our private config repo is at https://github.com/cdlib/dryad-config .

After all that fun, you should be able to type 'cap development deploy' and it will deploy to our new dryad-dev.cdlib.org server.

Right now it fails because of missing libraries on the server when it tries to compile a native-code gem. Jim needs to install it.

We hope that after that it will deploy and run, and you can log in (or try to) to the application. Or maybe we'll have to fix Shibboleth problems. Yay.

Do not update DOI in some circumstances

  • Set up way to skip updating the DOI if flag is presented

How to test:

After a new dataset version has been edited in the UI, find the version in the database and set its 'skip_datacite_update' flag to 1.

Submit the dataset (or new version of it).

The metadata in EZID should then not be updated (or, for a first version, the DOI should not exist at all).
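The submission-side guard might look like this (illustrative; skip_datacite_update is the flag named above, but the method name and the OpenStruct stand-in for the version record are assumptions — ActiveRecord would cast the 0/1 column to a boolean):

```ruby
require 'ostruct'

# Decide whether to push metadata/target updates to the DOI registrar
# for this version. The flag suppresses the update when set.
def update_doi?(version)
  !version.skip_datacite_update
end

flagged = OpenStruct.new(skip_datacite_update: true)
normal  = OpenStruct.new(skip_datacite_update: false)
```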

Fork code for Dash into Dryad repositories

We want to move code over so that we have a separate space for it rather than just branching in the current repositories. I believe we can fork our current code to make this happen.

The current three repositories we've been using are below and development currently happens in the "development" branch. This could all be changed in the future.

https://github.com/CDLUC3/dashv2
https://github.com/CDLUC3/stash
https://github.com/cdlib/dash2-config (much of the info in here needs to stay private)

Our new space that Marisa created is at https://github.com/CDL-Dryad

Marisa created the repository at https://github.com/CDL-Dryad/dryad . This could be for the main app (currently dashv2) or if it doesn't make sense, please create a new repo.

The current stash repo could be given a different name. It contains the engines and gems where the heavier functionality lives. Maybe "dryad-engines", or whatever makes the most sense to you.

The config repository has sensitive information, so it needs to stay private. I'm not sure we can create a private repo under CDL-Dryad, because a paid account may be needed for private repos. CDL has a paid account under the /cdlib space, so if needed we can create a new private repository there.

Investigate Stripe Payment

I've asked Melissanne and Elizabeth to figure out what their needs are for Stripe invoicing and create an account. Once the account credentials are created we can hook it up to our system.

Stripe account set up by Elizabeth.

Our need for Stripe:
I need Stripe to be triggered to invoice a submitter when a data submission is approved by the curation team, if there is no fee waiver or sponsorship for the DPC. If a payment fails, I need Stripe to email the submitter with instructions for updating their payment information. When a payment is made, I need Stripe to send the researcher a customized receipt.

User record schema changes

It looks like these fields are no longer needed in the Dryad system with the main ORCID login. The table containing them is stash_engine_users.

  • uid
  • provider (always orcid)
  • oauth_token (was used for Google and is not used, at least for now)

I believe for now we'll want to keep tenant_id (dryad for a generic login and some other value for more specifics for partner), though the way tenants will be used might be a little different soon.

To remove these fields, read over http://guides.rubyonrails.org/v4.2/active_record_migrations.html for an overview of how migrations track the state of the database and keep it in sync with the code. If you want an overview of ActiveRecord, you might end up using it soon as well (though it isn't necessary yet for migrations).

http://railscasts.com/ is also a really good resource for tutorials, though some of them are out of date because the site has been abandoned and not updated in a while. They are still good for general concepts or an outline of how something works, though. Rails 3 onward has had fewer changes than the earlier versions, which had some fairly big changes from one major version to the next, so something covering Rails 3 is often similar to Rails 4 (or maybe 5).

You can use a generator to create the basic file as outlined in that article. We would be creating the migration within the stash/stash_engine directory. This will put a file inside that engine for the migration.

The way we are using migrations is a bit complicated: it is set up so the main application takes care of the migrations for all engine sub-components. This means that to run the migration itself you'd run "rake db:migrate" from the main dryad application.

Running the migration there will not only modify whatever development database you're using, but will also change schema.rb in the dryad application.

Because of that you'll need to add parallel changes to both the stash repo and the dryad repo.
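The migration itself might look like the following sketch (generated inside stash/stash_engine as described above; the class name and the assumption that all three columns are strings are mine, and the timestamped filename is illustrative):

```ruby
# stash/stash_engine/db/migrate/<timestamp>_remove_oauth_fields_from_users.rb
class RemoveOauthFieldsFromUsers < ActiveRecord::Migration
  def change
    # Giving the column type makes remove_column reversible on rollback.
    remove_column :stash_engine_users, :uid, :string
    remove_column :stash_engine_users, :provider, :string
    remove_column :stash_engine_users, :oauth_token, :string
  end
end
```

Running `rake db:migrate` from the main dryad application would then apply it and regenerate schema.rb there.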

Repair Capybara/Chrome Webdriver browser testing

  • Many things will break with modifications to the login flow, and we will need to log in with the ORCID sandbox
  • Will these tests still work from Travis? (they probably will if they use localhost)
  • Add tests for claiming an old account
  • It will be difficult to do anything with Shibboleth, though

Travis tests break with pull request flow

Error like below:

Cloning https://github.com/CDLUC3/stash:
+git clone https://github.com/CDLUC3/stash
Cloning into 'stash'...
remote: Counting objects: 38426, done.
remote: Compressing objects: 100% (294/294), done.
remote: Total 38426 (delta 215), reused 256 (delta 113), pack-reused 38003
Receiving objects: 100% (38426/38426), 27.25 MiB | 14.60 MiB/s, done.
Resolving deltas: 100% (23869/23869), done.
+echo 'Checking out stash branch doc-update'
Checking out stash branch doc-update
+cd stash
+git checkout doc-update
error: pathspec 'doc-update' did not match any file(s) known to git.
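A likely fix (an assumption about the build script, not stated in the source) is to fall back to master when the stash repo has no branch matching the PR's branch. The branch-picking logic amounts to:

```ruby
# Given the PR's branch name and the branches that exist in the stash
# repo, choose the branch to check out, falling back to master.
def stash_branch_for(pr_branch, stash_branches)
  stash_branches.include?(pr_branch) ? pr_branch : 'master'
end
```

In the failing build above, 'doc-update' exists only in the main repo, so the fallback would have checked out master instead of erroring.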

Create new branch for Dryad development

When this branch is created, change Capistrano deploys to point toward new server names so we don't accidentally wipe out existing dev/stage/demo/production until we are ready.

Specific tenant (partner) config changes

We need to evaluate how these will change, specifically these issues/questions.

  • We are still likely to have per-institution (partner) settings
  • Determine what to do with all settings
    • Authentication becomes secondary authentication scheme for partner validation since primary is ORCID
    • Different submission sizes or max files per partner?
    • campus contacts or manager emails?
    • agreements?
    • different DOI registrars?
    • Remove some items. Move other items to main configuration instead of per-partner institution.
    • Search through code and update any of the config items if they have changed or been removed
  • Add tenant for generic Dryad users without a special partner institution

Update DOI targets with DataCite for landing pages

Dryad-classic DOIs currently resolve to URLs like https://datadryad.org/resource/doi:10.5061/dryad.9g . Our normally set URLs are in the format /stash/dataset/<doi>

Problem description, and links to relevant helpdesk tickets:

We need to update the target URLs for non-Dash, Dryad-classic DOIs.

Describe the solution you'd like:

In a rake task, but some of the code may be in other places, also.

  • Generate the list of DOIs to update. Order by ID so the order doesn't change.
    • This would be items from non-Dash collections.
    • The list should only include items that are published or embargoed.
    • Add some basic test or validation to be sure the selection is correct.
  • For each DOI in the list, call our update code that is in our Rails app. This will update the metadata and target to current values.
    • Add delays between updates of DataCite metadata so as not to overload DataCite's servers (maybe a second or two), and be sure we have generous retries, since DataCite will undoubtedly have problems at some point, or there will be network problems, or whatever.
    • Create some way of restarting the script if it fails in the middle. Maybe as simple as saying "start at item n", so long as the list we're working from is consistent and isn't regenerated.
  • Run and monitor the updates. I don't know how long this task will run, but it might take a long time if we have to update a lot of datasets.
  • There will be some items with problems. Keep a list of items that need manual intervention to fix the small amount of bad data from Dryad classic or other weird problems.
  • Fix one-off or weird problems. I hope there aren't too many of them.
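The delay-and-retry loop for the DataCite calls could be sketched as follows (the helper name and defaults are assumptions, not existing code):

```ruby
# Run the given block, retrying up to max_attempts times with a pause
# between attempts so we don't hammer DataCite when it hiccups.
def with_retries(max_attempts: 3, pause: 2)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    raise if attempts >= max_attempts
    sleep(pause)
    retry
  end
end
```

Each DOI in the list would then be wrapped in something like `with_retries { update_doi_target(doi) }` (update_doi_target being a hypothetical name for the existing Rails update code), with an additional sleep between items.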

Oh, some other random notes about checking targets manually for validation: it looks like URLs like https://doi.org/10.5061/dryad.9gf10?action=showurls will show the target URLs easily without special headers.

More info than I ever wanted about DOIs at https://www.doi.org/doi_handbook/TOC.html .

And a handy API for viewing, like https://api.datacite.org/dois/10.5061/dryad.9gf10 . See docs at https://support.datacite.org/docs/api-get-doi .

Fix email fill-in from ORCID

This was filling in an email address in the user record if it was exposed from ORCID, but it no longer seems to work.
