
dryad-product-roadmap's Introduction

dryad-product-roadmap

This repository is used to support the product development roadmap.
Work will be defined via the issue tracker and tracked via GitHub Projects.

dryad-product-roadmap's People

Contributors: ahamelers, ryscher, marisastrong, mariapraetzellis, sfisher


dryad-product-roadmap's Issues

Missing Datasets? Match emails with user record (from My Datasets)

After logging in, if the user has no datasets, we'd show a notification like in this wireframe (forget about the GDPR element on that screen for now).

https://jxosix.axshare.com/#g=1&p=my_datasets_-_v1

Then they'd go through this validation routine.

https://jxosix.axshare.com/#g=1&p=migrate_data_-_v1

I believe this would switch the current_user to the matching email record, update any information in that user record, and remove the discarded user record.

Put together instructions on how to get tests to run locally

It's great that we can commit and see results on Travis CI, but for writing tests and verifying they work, that long lag time is really unacceptable.

Tests are working locally on my machine, but I need good instructions for setting them up so others can run them easily as well.

UI & SOLR/Harvester Servers for Dryad/Dash

We need servers set up, and that may involve a number of things. The development environment will be the most time-critical to get into a basic usable state.

  • Things installed on them from Puppet scripts, which might include the following:
  • Shibboleth Service Provider
  • Certificates for SSL/HTTPS
  • Apache/configs
  • Ruby
  • Some basic Linux libraries like MySQL and a few other things (I hope these are in Puppet)
  • Crons
  • Any external Shibboleth shenanigans needed to get logins working for the servers, such as registering the domains
  • I'm sure there is more; Marisa and Jim, help me fill out this list

Complete development environment setup

The development environment has running UI code, but it needs some additional things done to fully function.

  • @jimvanderveen is finishing our development SOLR/Harvester server setup and puppet deployment
  • Install harvester
  • Install SOLR
  • Configure SOLR with geoblacklight configuration
  • Get and configure Merritt collection for the generic Dryad tenant

Finishing touches for migrate button

  • Formatting of the UI
  • Error messages for wrong code
  • Add "No" button for no migration
  • If "No" or migration successful, make message disappear from dashboard.
  • Add option to send code to new email

Remove developer login

  • Remove developer login cases from callbacks since we do not need them unless we have trouble getting ORCID to enable localhost for the dev/sandbox login
  • We can manually set an associated partner in the database for an ORCID if we want to test per-institution functionality like logos, specific identifier minting or submission to specific Merritt collection
  • I also added some patches to the default developer login layout and functioning that we can remove

Migration for user accounts in the database

Perform the database-side user migration.

  • Determine whether old user accounts should go into the main (stash users) table, or into a completely new table. What info do we need to store in this table? (When this is done, notify team so they can build to the new user info).
  • Modify the tables appropriately (i.e., make a rails migration to update the schema)
  • Determine whether user accounts should be migrated via SQL import/export commands, or via some more rails-oriented method
  • Implement migration of the user account records (though these won't do anything until the UI account migration is enabled)

GitHub/Slack integration

It is useful to be notified of GitHub activity through Slack.

  • Create a new (shared) slack channel for notifications and subscribe the appropriate people
  • Set up GitHub notifications from all appropriate repositories to the notification channel
  • Test whether checklist changes are added to Slack

Remove "Connect your ORCID" in metadata entry

Remove "connect your ORCID" icon from the metadata entry page since they will already by logged in with orcid.

Also remove anything in the callbacks for this.

Note that we still need a callback for co-author orcids and people that receive an invitation to connect an orcid.

Email claim mechanics

I believe this might be how we take care of this.

  • We insert user records already connected to their resources/datasets, but missing the ORCID, when we do the real data migration.
  • After a user successfully claims their email, we update all resources that had the user id corresponding to the claimed user. They are updated to the new, claimed user id.
  • Do we delete the old, claimed user?
  • Other tables to update user_ids from old to new: stash_engine_resource_states, stash_engine_resources.current_editor_id
  • Do we need to update tenant_id in stash_engine_resources? Or maybe it will be updated correctly at the main data migration for any Dryad and DataONE records.
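The user-id remapping described above could be sketched like this (illustrative only: in-memory hashes stand in for the real stash_engine tables, and in the real app these would be ActiveRecord update_all calls):

```ruby
# Sketch: repoint rows from the old (claimed) user id to the new user id.
def remap_user_id!(rows, column, old_id, new_id)
  rows.each { |row| row[column] = new_id if row[column] == old_id }
  rows
end

resources = [
  { id: 1, user_id: 42, current_editor_id: 42 },
  { id: 2, user_id: 7,  current_editor_id: 42 }
]

# Repoint both ownership and current-editor references.
remap_user_id!(resources, :user_id, 42, 99)
remap_user_id!(resources, :current_editor_id, 42, 99)
```

The same call would be repeated for each table/column that references the old user id (stash_engine_resource_states, current_editor_id, etc.).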

Validating required metadata through API

https://github.com/CDL-Dryad/dryad-product-roadmap/wiki/Migration

We need to be able to validate metadata by means other than the review screen or attempting to submit a dataset.

  • There is existing code to validate correct descriptive metadata that we can hook in
  • Deeper validation may be required to be sure all required internal tables are populated
  • We should consider adding another mechanism, e.g. checking for the presence of PATCH in the Allow: header on the /datasets/doi URL, or maybe something more sophisticated
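The Allow-header check could look like this (a sketch of the header parsing only; not tied to any particular HTTP client, and the method name is an assumption):

```ruby
# Returns true if an HTTP Allow: header value advertises PATCH,
# e.g. "GET, PUT, PATCH" => true. A missing header (nil) => false.
def patch_allowed?(allow_header)
  allow_header.to_s.split(',').map { |m| m.strip.upcase }.include?('PATCH')
end
```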

API additional actions or information for Dryad

https://github.com/CDL-Dryad/dryad-product-roadmap/wiki/Migration

The API needs to have some additional things for the Dryad migration.

  • Way to validate a dataset without submitting
    • There is existing code to validate correct descriptive metadata that we can hook in
    • Deeper validation may be required to be sure all required internal tables are populated
    • We should consider adding another mechanism, e.g. checking for the presence of PATCH in the Allow: header on the /datasets/doi URL, or maybe something more sophisticated
  • Submit a specific version instead of only allowing one version in progress

I may need to talk through what the needs are here some more.

Add missing # rubocop:enable comments, and fix any resulting style issues

(description copied from CDLUC3/stash#2 by @dmolesUC3)

When I first set up RuboCop, I assumed # rubocop:disable was block scoped, but it turns out it's meant to be paired with a matching # rubocop:enable. Without that, it means "disable till further notice".

We've put # rubocop:enable in a few places (e.g. Resource::duplicate_filenames), but there are still a lot of orphan disables out there. At some point we should clean these up, and fix any style issues we discover in code where checks were unintentionally disabled.
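For example, a properly paired disable/enable looks like this (the method body here is hypothetical; only the comment pairing matters):

```ruby
# rubocop:disable Metrics/MethodLength
# Returns the filenames that appear more than once in the given list.
def duplicate_filenames(filenames)
  filenames.group_by { |f| f }.select { |_, group| group.size > 1 }.keys
end
# rubocop:enable Metrics/MethodLength
```

Without the closing `# rubocop:enable`, the cop stays disabled for the rest of the file, which is how the orphan disables crept in.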

Login overview placeholder

Login is more of a milestone, probably, but putting some things in here where we can find them for background.

Jane has created this flow and these wireframes for us. I believe the email flow is essentially OK, though the login flow wasn't complete for all the things we talked about. I'm sure we were both confused about this and could've used more definition.

We talked through a simplified login flow on Friday among our team: login_v2.pdf. The main point of this flow is that everyone will log in with ORCID. The possible additional steps on an initial login are then 1) associating yourself with a partner institution by logging in via Shibboleth, and 2) claiming/migrating your account, if you were a previous Dryad user, by way of something similar to a two-factor authentication flow: getting a code to validate your email.

Seems like there is also a feature I'd forgotten about in there, which is GDPR acceptance? Also the "missing data" thing for users on the list-of-datasets page.

Add privacy policy and terms of service to login process

When logging in for the first time, if the user does not have a previous Dryad account, they should be presented with information about Dryad's Privacy Policy and Terms of Service, and asked to accept the Terms of Service.

The location within the login process should be after the ORCID login (and after institutional login). It could be included within the data migration page, but it may be better to present it as a separate page.

The current wording from old Dryad can be re-used:
[screenshot of the old Dryad privacy/terms wording, 2018-07-09]

The links should go to:

Broken Unit Tests

Unit tests are going to start breaking in a number of areas:

  • Because of config changes
  • Perhaps in users model
  • Some unexpected places
  • Adding new tests for added methods in new or existing models

Support Dryad DataCite Acct for new DOIs

It sounds like we should get the account from them and configure the new Dryad code to use it.

Also, do we still need to support the old EZID setup for previously submitted datasets?

Select your partner institution (or none)

Jane has https://jxosix.axshare.com/

There are two main options. The non-partner option just sets the tenant to be Dryad.

Choosing a partner does a secondary (shibboleth) login and then comes back to set the tenant to be more specific after login is validated.

The Dryad option is already done.

Once shibboleth is working it should call back to the sessions controller as shown at https://github.com/CDL-Dryad/stash/blob/master/stash_engine/app/controllers/stash_engine/sessions_controller.rb .

It will be the callback method that the login returns to. A tenant_id parameter should be passed along in the URL.

Once the user has successfully validated via Shibboleth, their tenant_id should be written to that field in their user record (StashEngine::User.tenant_id).

That could be done before redirecting them to the dashboard path.

Once a user's tenant has been set after logging in, it does not need to be set again on subsequent logins.
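The set-once tenant logic could be as simple as the following sketch (the real code lives in sessions_controller.rb; the method name and the OpenStruct stand-in for StashEngine::User are assumptions):

```ruby
require 'ostruct'

# On a successful Shibboleth callback, record the tenant only if it
# hasn't already been set on a previous login.
def record_tenant!(user, tenant_id)
  user.tenant_id ||= tenant_id
  user
end

user = OpenStruct.new(tenant_id: nil)
record_tenant!(user, 'ucla')   # first login sets the tenant
record_tenant!(user, 'dryad')  # subsequent logins leave it alone
```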

Check deploy and fix deploy from new repository location on github

We have an almost working deploy on uc3-dryad-dev.cdlib.org but are running into issues.

IMPORTANT: there is a branch for this stuff that I don't want to merge into the master branch until the deploy works. It has Capistrano changes and other things (and I set up the server as described at https://confluence.ucop.edu/display/UC3/dryad_server_setup_steps ).

The branch you want is here: https://github.com/CDL-Dryad/dryad/tree/deploy_test , and it's called deploy_test. You might need a working application to deploy with Capistrano (maybe). I think this gives the basic steps: https://github.com/CDL-Dryad/dryad/blob/master/documentation/dryad_install.md . Our private config repo is at https://github.com/cdlib/dryad-config .

After all that fun, you should be able to type 'cap development deploy' and it will deploy to our new dryad-dev.cdlib.org server.

Right now it fails because of missing libraries on the server when it tries to compile a native-code gem. Jim needs to install it.

We hope that after that it will deploy and run, and you can log in (or try to) to the application. Or maybe we'll have to fix Shibboleth problems. Yay.

Do not update DOI in some circumstances

  • Set up way to skip updating the DOI if flag is presented

How to test:

After a new dataset version has been edited in the UI, find the version in the database and set its 'skip_datacite_update' flag to 1.

Submit the dataset (or new version of it).

The metadata in EZID should then not be updated (or, for a first version, the DOI should not exist at all).
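The submission-side guard might look like this (illustrative; skip_datacite_update is the flag named above, but the method name and the OpenStruct stand-in for the version record are assumptions — ActiveRecord would cast the 0/1 column to a boolean):

```ruby
require 'ostruct'

# Decide whether to push metadata/target updates to the DOI registrar
# for this version. The flag suppresses the update when set.
def update_doi?(version)
  !version.skip_datacite_update
end

flagged = OpenStruct.new(skip_datacite_update: true)
normal  = OpenStruct.new(skip_datacite_update: false)
```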

Fork code for Dash into Dryad repositories

We want to move code over so that we have a separate space for it rather than just branching in the current repositories. I believe we can fork our current code to make this happen.

The current three repositories we've been using are below and development currently happens in the "development" branch. This could all be changed in the future.

https://github.com/CDLUC3/dashv2
https://github.com/CDLUC3/stash
https://github.com/cdlib/dash2-config (much of the info in here needs to stay private)

Our new space that Marisa created is at https://github.com/CDL-Dryad

Marisa created the repository at https://github.com/CDL-Dryad/dryad . This could be for the main app (currently dashv2) or if it doesn't make sense, please create a new repo.

The current stash repo could be given a different name. It contains the engines and gems where the heavier functionality lives. Maybe "dryad-engines", or whatever makes the most sense to you.

The config repository has sensitive information, so it needs to stay private. I'm not sure we can create a private repo under CDL-Dryad, because a paid account may be needed for private repos. CDL has a paid account under the /cdlib space, so if needed we can create a new private repository there.

Investigate Stripe Payment

I've asked Melissanne and Elizabeth to figure out what their needs are for Stripe invoicing and create an account. Once the account credentials are created we can hook it up to our system.

Stripe account set up by Elizabeth.

Our need for Stripe:
I need Stripe to be triggered to invoice a submitter when a data submission is approved by the curation team, if there is no fee waiver or sponsorship for the DPC. If a payment fails, I need Stripe to email the submitter with instructions for updating their payment information. When a payment is made, I need Stripe to send the researcher a customized receipt.

User record schema changes

It looks like these fields are no longer needed in the Dryad system with the main ORCID login. The table containing them is stash_engine_users.

  • uid
  • provider (always orcid)
  • oauth_token (was used for Google and is not used, at least for now)

I believe for now we'll want to keep tenant_id (dryad for a generic login and some other value for more specifics for partner), though the way tenants will be used might be a little different soon.

To remove these fields, read over http://guides.rubyonrails.org/v4.2/active_record_migrations.html for an overview of how migrations track the state of the database and keep it in sync with the code. If you want an overview of ActiveRecord, you might end up using it soon as well (though it isn't necessary yet for migrations).

http://railscasts.com/ is also a really good resource for tutorials, though some of them are out of date because the site has been abandoned and not updated in a while. They are still good for general concepts or an outline of how something works, though. Rails 3 onward has had fewer changes than the earlier versions, which had some fairly big changes from one major version to the next, so something covering Rails 3 is often similar to Rails 4 (or maybe 5).

You can use a generator to create the basic file as outlined in that article. We would be creating the migration within the stash/stash_engine directory. This will put a file inside that engine for the migration.

The way we are using migrations is a bit complicated: it is set up so the main application takes care of the migrations for all engine sub-components. This means that to run the migration itself you'd run "rake db:migrate" from the main dryad application.

Running the migration there will not only modify whatever development database you're using, but will also change schema.rb in the dryad application.

Because of that you'll need to add parallel changes to both the stash repo and the dryad repo.
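The migration itself might look like the following sketch (generated inside stash/stash_engine as described above; the class name and the assumption that all three columns are strings are mine, and the timestamped filename is illustrative):

```ruby
# stash/stash_engine/db/migrate/<timestamp>_remove_oauth_fields_from_users.rb
class RemoveOauthFieldsFromUsers < ActiveRecord::Migration
  def change
    # Giving the column type makes remove_column reversible on rollback.
    remove_column :stash_engine_users, :uid, :string
    remove_column :stash_engine_users, :provider, :string
    remove_column :stash_engine_users, :oauth_token, :string
  end
end
```

Running `rake db:migrate` from the main dryad application would then apply it and regenerate schema.rb there.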

Repair Capybara/Chrome Webdriver browser testing

  • Many things will break with modifications to the login flow, and we will need to log in with the ORCID sandbox
  • Will these tests still work from Travis? (they probably will if they use localhost)
  • Add tests for claiming an old account
  • It will be difficult to do anything with Shibboleth, though

Travis tests break with pull request flow

Error like below:

Cloning https://github.com/CDLUC3/stash:
+git clone https://github.com/CDLUC3/stash
Cloning into 'stash'...
remote: Counting objects: 38426, done.
remote: Compressing objects: 100% (294/294), done.
remote: Total 38426 (delta 215), reused 256 (delta 113), pack-reused 38003
Receiving objects: 100% (38426/38426), 27.25 MiB | 14.60 MiB/s, done.
Resolving deltas: 100% (23869/23869), done.
+echo 'Checking out stash branch doc-update'
Checking out stash branch doc-update
+cd stash
+git checkout doc-update
error: pathspec 'doc-update' did not match any file(s) known to git.
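A likely fix (an assumption about the build script, not stated in the source) is to fall back to master when the stash repo has no branch matching the PR's branch. The branch-picking logic amounts to:

```ruby
# Given the PR's branch name and the branches that exist in the stash
# repo, choose the branch to check out, falling back to master.
def stash_branch_for(pr_branch, stash_branches)
  stash_branches.include?(pr_branch) ? pr_branch : 'master'
end
```

In the failing build above, 'doc-update' exists only in the main repo, so the fallback would have checked out master instead of erroring.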

Create new branch for Dryad development

When this branch is created, change Capistrano deploys to point toward new server names so we don't accidentally wipe out existing dev/stage/demo/production until we are ready.

Specific tenant (partner) config changes

We need to evaluate how these will change, specifically these issues/questions.

  • We are still likely to have per-institution (partner) settings
  • Determine what to do with all settings
    • Authentication becomes secondary authentication scheme for partner validation since primary is ORCID
    • Different submission sizes or max files per partner?
    • campus contacts or manager emails?
    • agreements?
    • different DOI registrars?
    • Remove some items. Move other items to main configuration instead of per-partner institution.
    • Search through code and update any of the config items if they have changed or been removed
  • Add tenant for generic Dryad users without a special partner institution

Update DOI targets with DataCite for landing pages

Dryad-classic DOIs currently resolve to URLs like https://datadryad.org/resource/doi:10.5061/dryad.9g . Our normally set URLs are in the format /stash/dataset/<doi>

Problem description, and links to relevant helpdesk tickets:

We need to update the target URLs for non-Dash, Dryad-classic DOIs.

Describe the solution you'd like:

In a rake task, but some of the code may be in other places, also.

  • Generate the list of DOIs to update. Order by ID so the order doesn't change.
    • This would be items from non-Dash collections.
    • The list should only include items that are published or embargoed.
    • Add some basic test or validation to be sure the selection is correct.
  • For each DOI in the list, call our update code that is in our Rails app. This will update the metadata and target to current values.
    • Add delays between updates of DataCite metadata so as not to overload DataCite's servers (maybe a second or two), and be sure we have generous retries, since DataCite will undoubtedly have problems at some point, or there will be network problems, or whatever.
    • Create some way of restarting the script if it fails in the middle. Maybe as simple as saying "start at item n", so long as the list we're working from is consistent and isn't regenerated.
  • Run and monitor the updates. I don't know how long this task will run, but it might take a long time if we have to update a lot of datasets.
  • There will be some items with problems. Keep a list of items that need manual intervention to fix the small amount of bad data from Dryad classic or other weird problems.
  • Fix one-off or weird problems. I hope there aren't too many of them.
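The delay-and-retry loop for the DataCite calls could be sketched as follows (the helper name and defaults are assumptions, not existing code):

```ruby
# Run the given block, retrying up to max_attempts times with a pause
# between attempts so we don't hammer DataCite when it hiccups.
def with_retries(max_attempts: 3, pause: 2)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    raise if attempts >= max_attempts
    sleep(pause)
    retry
  end
end
```

Each DOI in the list would then be wrapped in something like `with_retries { update_doi_target(doi) }` (update_doi_target being a hypothetical name for the existing Rails update code), with an additional sleep between items.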

Oh, some other random notes about checking targets manually for validation: it looks like URLs like https://doi.org/10.5061/dryad.9gf10?action=showurls will show the target URLs easily without special headers.

More info than I ever wanted about DOIs at https://www.doi.org/doi_handbook/TOC.html .

And a handy API for viewing, like https://api.datacite.org/dois/10.5061/dryad.9gf10 . See docs at https://support.datacite.org/docs/api-get-doi .

Fix email fill-in from ORCID

This was filling in an email address in the user record if it was exposed from ORCID, but it no longer seems to work.
