
i-guide / catalog


The I-GUIDE Catalog is part of the I-GUIDE Platform and provides search, discovery, and dynamic interaction with resources created by or used by I-GUIDE researchers.

Home Page: https://i-guide.io/platform/

License: BSD 3-Clause "New" or "Revised" License

Python 49.69% Dockerfile 0.26% Makefile 0.15% JavaScript 0.03% Shell 0.02% HTML 0.13% Vue 36.39% SCSS 0.34% CSS 0.88% TypeScript 12.11%
catalog discovery interactive-content search i-guide actionable-data

catalog's People

Contributors

horsburgh, hydrocheck, igarousi, maurier, pkdash, sblack-usu


catalog's Issues

Create SchemaOrg Example: Web Application

Explore how SchemaOrg properties can be used to describe a Web Application. The outcomes of this task will be:

A document containing:

  • A JSON-LD example containing required metadata
  • A JSON-LD example containing recommended metadata
  • A JSON-LD example for a HydroShare Web Application

Release the new user interface

Collaborators at USU will work on this. Updates include the new interface of the data catalog, mainly the landing page of the submitted items.

#JIRA=CAM-54

Update Pydantic models

The pydantic models need to be updated to match the recent changes to the data catalog schema specification requirements.

Need a way to add properties to primitive array items in pydantic

We have subschemas consisting of arrays of primitive items for which there is no way to tap into the primitive items and add properties to the resulting schema.

For example, take the field below:

identifier: Optional[List[str]] = Field(
        title="Identifiers",
        description="Any kind of identifier for the resource. Identifiers may be DOIs or unique strings "
                    "assigned by a repository. Multiple identifiers can be entered. Where identifiers can be "
                    "encoded as URLs, enter URLs here."
    )

Generated schema:

"identifier": {
      "title": "Identifiers",
      "description": "Any kind of identifier for the resource. Identifiers may be DOIs or unique strings assigned by a repository. Multiple identifiers can be entered. Where identifiers can be encoded as URLs, enter URLs here.",
      "type": "array",
      "items": {
        "type": "string"
      }
    },

We would like to add a title to the schema generated for the str type so that the resulting schema looks like this:

"identifier": {
      "title": "Identifiers",
      "description": "Any kind of identifier for the resource. Identifiers may be DOIs or unique strings assigned by a repository. Multiple identifiers can be entered. Where identifiers can be encoded as URLs, enter URLs here.",
      "type": "array",
      "items": {
        "title": "Identifier",  // <----------
        "type": "string"
      }
    },

This will allow the renderers to show a title in fields like these:
[image]
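One possible workaround, sketched below, is to post-process the generated schema dictionary instead of relying on a pydantic feature. The function name `add_item_titles` and the `item_titles` mapping are hypothetical and not part of the catalog code base. In pydantic v1, a function like this could be called from a `Config.schema_extra` hook, but that wiring is an assumption and is not shown here.

```python
import copy
import json


def add_item_titles(schema: dict, item_titles: dict) -> dict:
    """Return a copy of `schema` with a title set on primitive array items.

    `item_titles` maps a property name to the title its items should get.
    """
    result = copy.deepcopy(schema)
    for name, title in item_titles.items():
        prop = result.get("properties", {}).get(name)
        # Only touch array properties that declare an `items` subschema.
        if prop and prop.get("type") == "array" and isinstance(prop.get("items"), dict):
            prop["items"]["title"] = title
    return result


# A trimmed copy of the generated schema from the issue text.
generated = {
    "properties": {
        "identifier": {
            "title": "Identifiers",
            "type": "array",
            "items": {"type": "string"},
        }
    }
}

patched = add_item_titles(generated, {"identifier": "Identifier"})
print(json.dumps(patched["properties"]["identifier"], indent=2))
```

The input schema is copied rather than modified in place, so the same generated schema can still be used unpatched elsewhere.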

Create dedicated HTML page for Dataset landing page

The renderers' 'view' mode does not provide an elegant enough way to display the information in the Dataset landing page. Now that the Dataset JSON schema has settled, we should create a dedicated page with HTML and CSS that elegantly displays the information.

Event for metadata extraction

The simplest would be to just fire it after a workflow runs and after files finish uploading in the browser.

#JIRA=CAM-54

The error related to incorrect spatial coverage information is not shown explicitly on the page

When entering values for a bounding box that are not within the appropriate ranges (-90 to 90 for latitude and -180 to 180 for longitude), the 'Save' button does not function. Currently, the only message displayed indicates that the submission has failed. To enhance user experience, it would be beneficial to direct the user to the spatial coverage field or provide a specific error message within the same pop-up box where the failure message appears.
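As a rough illustration of the kind of specific message that could be returned, the sketch below checks the ranges stated above and reports each out-of-range value by name. The function name, the parameter names (`north`, `south`, `east`, `west`), and the message wording are assumptions, not the catalog's actual field names.

```python
def validate_bounding_box(north, south, east, west):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    # Latitudes must fall within -90 to 90.
    for label, value in (("north", north), ("south", south)):
        if not -90 <= value <= 90:
            errors.append(
                f"Spatial coverage: {label} latitude {value} is outside -90 to 90."
            )
    # Longitudes must fall within -180 to 180.
    for label, value in (("east", east), ("west", west)):
        if not -180 <= value <= 180:
            errors.append(
                f"Spatial coverage: {label} longitude {value} is outside -180 to 180."
            )
    return errors


for message in validate_bounding_box(north=95, south=41.6, east=-111.8, west=-200):
    print(message)
```

Returning a list rather than raising on the first problem lets the pop-up show every invalid value at once.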

Pin Pydantic to version 1

Pydantic v2 has breaking changes. We need to pin it to v1 to keep the current code base working. The upgrade to v2 will be done separately (Issue #38).

List of specific types of CreativeWork for the I-GUIDE Data Catalog

We need to identify a list of specific types of CreativeWork and then create a schema example for each. This will help us complete the "What is a record" and "What is NOT a record" sections in the README.md file.

  • CreativeWork > TextDigitalDocument
  • CreativeWork > DigitalDocument
  • CreativeWork > MediaObject > ImageObject | DataDownload | VideoObject
  • CreativeWork > DataCatalog
  • CreativeWork > Dataset
  • CreativeWork > Course
  • CreativeWork > SoftwareSourceCode
  • CreativeWork > SoftwareApplication

We also need to identify a list of specific types of Thing.

  • Thing > Person
  • Thing > Organization
  • Thing > Place
  • Thing > Intangible > Grant
  • Thing > Intangible > Language
  • Thing > Intangible > StructuredValue > PropertyValue

Export user and resource access control to mongo collections

A branch exists in HydroShare that exports user/resource access control to a mongo collection and is deployed to beta.hydroshare.org. Set up listeners on the mongo collection change stream to:

  1. Add/remove discoverable resources from discovery
  2. Map user access control with resources that externally reference S3 resources to console.minio.cuahsi.io

That mongo database can be found on Atlas at CUAHSI->CZNET->Cluster0->hydroshare_beta. The two collections to listen to are resourceaccess and userprivileges.

Add/remove discoverable resources from discovery
Documents in the resourceaccess collection look like:

{
    "resource_id": "8bb057d9653c4abba8bb2e48fe3642ce",
    "is_public": true,
    "show_in_discover": true,
    "minio_resource_url": "some url"
}

Only listen for documents that have a non-null "minio_resource_url" value. Add/remove documents from discovery based on show_in_discover.
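The rule above can be sketched as a small pure function that a change stream listener might call for each document. The function name and the returned action strings are hypothetical; the listener itself and the discovery collection writes are left out.

```python
def discovery_action(doc: dict) -> str:
    """Decide what to do with a resourceaccess document.

    Returns "ignore", "add", or "remove".
    """
    # Documents without a minio_resource_url are not of interest here.
    if doc.get("minio_resource_url") is None:
        return "ignore"
    # show_in_discover controls presence in the discovery collection.
    return "add" if doc.get("show_in_discover") else "remove"


example = {
    "resource_id": "8bb057d9653c4abba8bb2e48fe3642ce",
    "is_public": True,
    "show_in_discover": True,
    "minio_resource_url": "some url",
}
print(discovery_action(example))
```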

Map user access control with resources that externally reference S3 resources to console.minio.cuahsi.io
Documents in the userprivileges collection look like:

{
    "username": "sblack",
    "all": {},
    "minio": {
        "owner": [
            {
                "owners": [
                    "sblack"
                ],
                "resource_id": "8bb057d9653c4abba8bb2e48fe3642ce",
                "minio_resource_url": "https://console.minio.cuahsi.io/browser/sblack/YXJnb193b3JrZmxvd3MvcGFyZmxvdy9kYzRlYWZkNi0yNTM0LTQwMjEtODNiZS1iZjM2YWNhNDhhMjIv"
            }
        ],
        "edit": [
            {
                "owners": [
                    "sblack-admin"
                ],
                "resource_id": "b9ac783296cc4a93b8996247e120aa61",
                "minio_resource_url": "https://console.minio.cuahsi.io/browser/sblack-admin/editable"
            }
        ],
        "view": [
            {
                "owners": [
                    "sblack-admin"
                ],
                "resource_id": "b4c9b612f157452dbb6826aabeb15b0e",
                "minio_resource_url": "https://console.minio.cuahsi.io/browser/sblack-admin/viewable"
            }
        ]
    }
}

The username is the user that the access control applies to. The all property is a complete dump of all HydroShare resource privileges for the user and should be ignored. The minio property contains the user's privileges for resources that have an additional metadata key of minio_resource_url, whose value is copied to the mongo document. It holds three lists: view, edit, and owner. Each item in those lists has an owners property, and the first owner maps to the bucket name. resource_id is the HydroShare resource id. minio_resource_url is the value in additional_metadata that points to a path on MinIO.
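To make the document layout concrete, the sketch below flattens the minio property into one row per privilege entry, taking the first owner as the bucket name as described above. The function name and the row keys are hypothetical, and the example document is shortened from the one shown in this issue.

```python
def flatten_minio_privileges(doc: dict) -> list:
    """Flatten a userprivileges document into one dict per minio entry."""
    rows = []
    for privilege in ("view", "edit", "owner"):
        for item in doc.get("minio", {}).get(privilege, []):
            owners = item.get("owners", [])
            if not owners:
                # Without an owner there is no bucket name to map to.
                continue
            rows.append(
                {
                    "username": doc["username"],
                    "privilege": privilege,
                    "bucket": owners[0],  # the first owner maps to the bucket
                    "resource_id": item["resource_id"],
                    "minio_resource_url": item["minio_resource_url"],
                }
            )
    return rows


user_doc = {
    "username": "sblack",
    "all": {},  # ignored, as described above
    "minio": {
        "owner": [],
        "edit": [
            {
                "owners": ["sblack-admin"],
                "resource_id": "b9ac783296cc4a93b8996247e120aa61",
                "minio_resource_url": "https://console.minio.cuahsi.io/browser/sblack-admin/editable",
            }
        ],
        "view": [
            {
                "owners": ["sblack-admin"],
                "resource_id": "b4c9b612f157452dbb6826aabeb15b0e",
                "minio_resource_url": "https://console.minio.cuahsi.io/browser/sblack-admin/viewable",
            }
        ],
    },
}

for row in flatten_minio_privileges(user_doc):
    print(row["privilege"], row["bucket"], row["resource_id"])
```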

#JIRA=CAM-54

Catalog Hydroshare resource

Given a HydroShare resource identifier or URL as input, a user should be able to catalog the HydroShare resource metadata. This functionality is similar to HydroShare resource registration in DSP.

Only public HydroShare resources can be cataloged.

Modification to software source code schema

Below are comments from an outstanding PR on software source code. Moving these to an issue so we can merge the PR and consolidate repositories.

  1. Consider changing to something like the following:

To classify a record as a computer programming source code, "@type: "SoftwareSourceCode" should be used in the json schema. This will classify the record such as compile ready solutions, code snippet samples, scripts, etc. as a specific Schema.Org type called SoftwareSourceCode for which the metadata should be described using the core metadata, as well as the software-source-code-specific properties for the Schema:SoftwareSourceCode class. The following table outlines the required and optional properties selected from Schema.Org vocabulary to design the I-GUIDE software source code metadata schema. These properties are encoded as 1 or 1+ for required and 0,1 or 0+ for optional in the Cardinality column of the table below.

to

To classify a record as computer programming source code, use "@type": "SoftwareSourceCode" in the JSON schema. This is appropriate for records including code snippet samples, scripts, notebooks, etc. The following table outlines the required and optional properties to sufficiently describe software source code objects. Required properties have a cardinality of 1 or 1+ and optional properties have a cardinality of 0,1 or 0+.

  2. I'm not sure that we want to embed source code within the schema. I wonder if this could have security implications. See:
| [text](https://schema.org/text) | CreativeWork | Text | 0,1 | The textual content of the source code. |

Register Public S3 datasets

Assuming the extracted metadata files are present and valid, pick up the root metadata file and register it in the catalog.

Sync the Discovery database

Listen to changes in the resourceaccess collection and sync any resource changes to the discovery collection. When a MinIO resource is made discoverable, retrieve the extracted metadata from S3 and place it in the discovery collection. When a MinIO resource is made NOT discoverable, delete the entry from the discovery collection.

#JIRA=CAM-54

Add "Open With" button to resource page

Open with functionality

  • When rendering the landing page, check to see if the resource is a HydroShare resource
  • If yes, add a “Launch on I-GUIDE Platform” button at the top right of the form
  • Construct a URL for the button from the resource metadata - this URL will launch the whole resource into JupyterHub
  • When the user clicks this button, it will launch the URL
  • The user will be taken to the I-GUIDE platform JupyterHub instance, which will download the resource using NBFetch
  • The user can then select a notebook from the folder to run

Add new attributes to MediaObject pydantic model

The MediaObject schema, as part of associatedMedia, needs to include the following two additional attributes:

  • sha256
  • isPartOf

Using isPartOf we will be able to associate a content file to its metadata file.
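A minimal sketch of how the two attributes might be populated is shown below, using only the standard library. The helper names, the shape of the isPartOf value, and the metadata file name `data.csv.json` are all assumptions for illustration; the actual pydantic model may represent these differently.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def media_object_for(path, metadata_file_name):
    """Build a MediaObject-like dict that links a content file to its metadata file."""
    path = Path(path)
    return {
        "@type": "MediaObject",
        "name": path.name,
        "sha256": sha256_of_file(path),
        # Hypothetical shape: reference the metadata file by name.
        "isPartOf": {"@type": "CreativeWork", "name": metadata_file_name},
    }


# Demonstrate with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.csv"
    sample.write_bytes(b"abc")
    media = media_object_for(sample, "data.csv.json")

print(media["name"], media["sha256"])
```

Reading in chunks keeps memory use flat for large content files.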

What units should contentSize be expressed as?

According to SchemaOrg contentSize is:

"File size in (mega/kilo) bytes."

This will be strange for small or very large files:

256 TB ~= 2.56e+8 MB
256 B ~= 0.256 KB

Moreover, if we round to the nearest integer, this becomes even worse: 256 B = 0 KB or 1 KB

One solution is to recommend that all contentSize values simply contain units, e.g. Bytes (B), Kilobytes (KB), Megabytes (MB), Gigabytes (GB), Terabytes (TB).
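As a sketch of that recommendation, the function below turns a byte count into a string that always carries a unit. It assumes decimal (1000-based) units, which matches the arithmetic above; the function name and the two-decimal rounding are illustrative choices, not an agreed convention.

```python
UNITS = ("B", "KB", "MB", "GB", "TB")


def format_content_size(num_bytes: int) -> str:
    """Format a byte count as a string with an explicit decimal unit."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    size = float(num_bytes)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop at the first unit that keeps the number below 1000,
        # or at the largest unit available.
        if size < 1000 or unit == UNITS[-1]:
            break
        size /= 1000
    # Keep at most two decimals and drop trailing zeros.
    text = f"{size:.2f}".rstrip("0").rstrip(".")
    return f"{text} {unit}"


for value in (256, 1500, 256 * 10**12):
    print(value, "->", format_content_size(value))
```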

Metadata extractor changes needed

The extractor needs to set the name property of the MediaObject. The name should be set to the file name. The 'hasPart' object name property needs to be set to the name of the metadata file?

Store temporal coverage as date type in discovery

Temporal coverage in the catalog is of type string. In order to search records by temporal coverage date range, the string date values need to be converted to date type and stored in the discovery collection.
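The conversion might look like the sketch below. It assumes the string is an ISO 8601 interval of the form "start/end" (the convention Schema.org describes for temporalCoverage); the catalog's actual string format may differ. The function name and the `startDate`/`endDate` keys are hypothetical. Python datetime values are stored as date types by the standard MongoDB driver, which is what range queries need.

```python
from datetime import datetime


def parse_temporal_coverage(value: str) -> dict:
    """Split an ISO 8601 'start/end' interval into datetime values.

    A string without '/' is treated as a start date with no end date.
    """
    start_text, _, end_text = value.partition("/")
    start = datetime.fromisoformat(start_text.strip())
    end = datetime.fromisoformat(end_text.strip()) if end_text.strip() else None
    return {"startDate": start, "endDate": end}


coverage = parse_temporal_coverage("2011-01-01/2012-12-31")
print(coverage["startDate"].isoformat(), coverage["endDate"].isoformat())
```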

Turn off fetching of hydroshare resource files metadata

As part of registering a HydroShare resource, we currently fetch metadata for each of the files in that resource. If a resource has thousands of files, fetching metadata for all of them can cause a timeout error. Displaying metadata for a large number of files will probably also require changes to the UI. Because of these issues, for now we need to turn off fetching of file metadata as part of HydroShare resource registration.

PropertyValue should have a property called "unitText"

The unitText property is an optional feature when using "PropertyValue" type. For example, if the PropertyValue is used to express a measured variable, the unitText should contain the unit of the measured variable. See the example below:

{
    "variableMeasured": {
        "@type": "PropertyValue",
        "name": "Streambed interface temperature values",
        "unitText": "degC"
    }
}

Clicking on the "contribute data" button takes users to the CZ Hub web page.

To register a record to the data catalog, a user has two options:

  1. Click on the "contribute data" button in the middle of the home page. This will redirect the user to the CZ Hub website. Please use the correct link.
  2. Click on the "contribute" button found on the top right toolbar. This will take the user directly to the I-GUIDE data catalog submission page.

API for resource creation and metadata update

Add a POST endpoint that creates a hydroshare resource with the additional metadata key minio_resource_url and a value that points to an S3 path. Also add a PUT endpoint for updating metadata that updates the metadata files on S3.

#JIRA=CAM-54

Create Example for Schema with Graph Diagram

Create an example of the schema that illustrates the relationships between schema.org properties. Create a graphical visualization of this example using json-ld.org/playground.

Map user/resource privileges to S3 JSON policies and save to CUAHSI MinIO

The CUAHSI Subsetter application has a router that maps the resourceaccess documents to JSON policies, along with the ability to save the policies on an S3 server, here. Copy this router to the catalogapi and wire it up to events that get the JSON policies saved to MinIO for the user. Below is a proposal to use a Mongo change stream, but it could instead be accomplished with an alternate solution.

Listen to the Mongo change stream for the userprivileges collection, map each document that has entries in the minio property to S3 JSON policies, and save them to console.minio.cuahsi.io. The catalog already uses change streams; an example of usage can be found at https://github.com/I-GUIDE/catalogapi/blob/develop/triggers/update_catalog.py#L32
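For orientation only, the sketch below builds an S3-style JSON policy document from a list of grants. It is not the Subsetter router's logic, which is not reproduced here. The privilege-to-action mapping, the function name, and the example bucket and prefix values are assumptions, and how a prefix is derived from minio_resource_url is deliberately left out because this issue does not specify it.

```python
import json

# Assumed mapping from privilege level to S3 actions.
ACTIONS_BY_PRIVILEGE = {
    "view": ["s3:GetObject"],
    "edit": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
    "owner": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
}


def build_policy(grants):
    """Build a policy dict from (privilege, bucket, prefix) tuples."""
    statements = []
    for privilege, bucket, prefix in grants:
        statements.append(
            {
                "Effect": "Allow",
                "Action": ACTIONS_BY_PRIVILEGE[privilege],
                # Grant access to every object under the prefix.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical grants for one user.
policy = build_policy(
    [
        ("view", "sblack-admin", "viewable/"),
        ("edit", "sblack-admin", "editable/"),
    ]
)
print(json.dumps(policy, indent=2))
```

The resulting dictionary can be serialized to a file and handed to whatever mechanism ends up saving policies to the MinIO server.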

The MinIO client needs to be installed and configured with the image. Installation of the client can be found here: https://github.com/CUAHSI/domain-subsetter/blob/subsetter_argo/app/api/Dockerfile#L10

A MinIO client alias needs to be set up for the CUAHSI server. This is done on the FastAPI startup event: https://github.com/CUAHSI/domain-subsetter/blob/subsetter_argo/app/api/subsetter/main.py#L103

#JIRA=CAM-54

Resource Landing Page

Using the i-guide resource landing page, with the CUAHSI MinIO server, update it to:

  • read/write metadata files
  • read metadata extracted files (includes aggregation metadata)
  • Upload files
  • Download files
  • Create resource (create a resource in hydroshare with the minio_resource_url key)
  • Bonus: a file viewer

The subsetter application has a router for creating presigned urls for GET/PUT. Copy this router to the catalog and use the endpoints to generate the urls. Use the urls to GET or PUT files directly from the browser to the CUAHSI S3 server.

https://github.com/CUAHSI/domain-subsetter/blob/subsetter_argo/app/api/subsetter/app/routers/storage/router.py#L10

@Maurier - I'm happy to help you break this up into smaller chunks as you prepare issues around the resource landing page.

#JIRA=CAM-54

Setup authentication refresh tokens

The site logs users out as soon as the authentication token expires, even if they are interacting with the site. We should set up authlib refresh token endpoints.
