
iodepo / oceanbestpractices


Repository storing the open-source version of the code developed by E84 for OceanBestPractices.org

Home Page: https://oceanbestpractices.org

License: GNU Affero General Public License v3.0

JavaScript 70.79%, Ruby 0.40%, HTML 1.24%, CSS 7.34%, SCSS 3.02%, TypeScript 17.04%, Shell 0.15%, Dockerfile 0.02%

oceanbestpractices's People

Contributors

arnounesco, brianandres2, dependabot[bot], it-iode, justinykuo, parksjr, paulpilone, pbuttigieg, rpropri


oceanbestpractices's Issues

Need SPARQL query to build relationship graph from a selected term.

The user is able to select an existing term, and the application should display the relationships (defined by IODEPO) for that term. We need SPARQL queries that can build that relationship graph for a given term. I assume we'll need queries specific to each ontology (or at least to groups of ontologies). I can offer the current query as an example, but it's so complex (and wrong) that I'm not sure it'll be of much help.

This is to address OBP-256: As the owner, I want to see expected semantic neighborhoods with full and accurate relationships.
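A minimal sketch of a one-hop neighbourhood query, assuming the selected term is identified by its IRI. It returns both direct links to other resources and OWL existential restrictions (`rdfs:subClassOf` an `owl:Restriction` with `owl:onProperty` / `owl:someValuesFrom`), since OBO ontologies such as ENVO and CHEBI encode most of their relationships that way. The helper name `buildNeighbourhoodQuery` is hypothetical, and deciding which predicates count as IODEPO-defined relationships would still need per-ontology filtering.

```typescript
// Hypothetical helper: builds a SPARQL query returning the one-hop
// neighbourhood (outgoing relationships) of a term identified by its IRI.
const buildNeighbourhoodQuery = (termIri: string): string => {
  // Reject anything that is not a plain absolute IRI, so user input
  // cannot break out of the <...> delimiters and alter the query.
  if (!/^[a-z][a-z0-9+.-]*:[^\s<>"{}|\\^`]+$/i.test(termIri)) {
    throw new Error(`Not a valid IRI: ${termIri}`);
  }
  return `PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?predicate ?predicateLabel ?target ?targetLabel WHERE {
  {
    # Direct links to other resources (e.g. skos:broader, or rdfs:subClassOf a named class).
    <${termIri}> ?predicate ?target .
    FILTER (isIRI(?target) && ?predicate != rdf:type)
  }
  UNION
  {
    # OWL existential restrictions, e.g. subClassOf (part_of some X).
    <${termIri}> rdfs:subClassOf ?restriction .
    ?restriction a owl:Restriction ;
                 owl:onProperty ?predicate ;
                 owl:someValuesFrom ?target .
    FILTER (isIRI(?target))
  }
  OPTIONAL { ?predicate rdfs:label ?predicateLabel }
  OPTIONAL { ?target rdfs:label ?targetLabel }
}`;
};
```

Labels are optional so that relationships without an `rdfs:label` still appear in the graph, just without a display name.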

SPARQL query(ies) to extract labels for tagger index after ingesting a new ontology.

We're introducing dynamic ingest of ontologies and are therefore updating the tagging routine to respond when a new ontology is added to Neptune. The tagging routine needs to extract the labels from the new ontology in order to index them into the tagging index in OpenSearch. We need a SPARQL query that we can run as soon as a new ontology is loaded into Neptune. If we need to support different queries depending on the type of ontology ingested, we need to know how to identify which query to use.
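A possible starting point, assuming each ontology is loaded into its own named graph in Neptune (an assumption, not something stated in this issue) and that labels live in `rdfs:label`, `skos:prefLabel`, `skos:altLabel` or `oboInOwl:hasExactSynonym`. `buildLabelExtractionQuery` and its options are hypothetical; the query pages through results with `ORDER BY` / `LIMIT` / `OFFSET` so that large ontologies can be read in chunks.

```typescript
// Hypothetical options: one named graph per ingested ontology is an assumption.
interface LabelQueryOptions {
  graphIri: string;
  limit?: number;
  offset?: number;
}

// Label-bearing properties to extract; different ontology types could use different lists.
const LABEL_PROPERTIES = [
  'rdfs:label',
  'skos:prefLabel',
  'skos:altLabel',
  'oboInOwl:hasExactSynonym',
];

const buildLabelExtractionQuery = (
  { graphIri, limit = 10000, offset = 0 }: LabelQueryOptions
): string => {
  if (!Number.isInteger(limit) || limit < 1 || !Number.isInteger(offset) || offset < 0) {
    throw new Error('limit must be a positive integer and offset a non-negative integer');
  }
  // Only accept plain absolute IRIs for the graph name.
  if (!/^[a-z][a-z0-9+.-]*:[^\s<>"{}|\\^`]+$/i.test(graphIri)) {
    throw new Error(`Not a valid IRI: ${graphIri}`);
  }
  return `PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
SELECT DISTINCT ?term ?labelProperty ?label WHERE {
  GRAPH <${graphIri}> {
    VALUES ?labelProperty { ${LABEL_PROPERTIES.join(' ')} }
    ?term ?labelProperty ?label .
    FILTER (isIRI(?term) && isLiteral(?label))
    # Skip terms the ontology marks as obsolete.
    FILTER NOT EXISTS { ?term owl:deprecated true }
  }
}
ORDER BY ?term ?labelProperty ?label
LIMIT ${limit}
OFFSET ${offset}`;
};
```

The tagger would repeat the query with `offset` increased by `limit` until a page returns fewer than `limit` rows. Keeping the label properties in a list is one way to handle ontologies that need different queries: each ontology type could get its own list.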

Add Working group about page

Add a web page or a link/redirect to an external webpage describing the OBPS Working Group and Steering Committee.

A decision on where the page will be hosted is yet to be made.

Need SPARQL query for fetching synonyms and like words.

When the user searches by keyword they have the option to ask for synonyms and like words to be included in the search query. We need a SPARQL query that supports finding those words based on the keyword entered in the search field.

For example, the current query that performs this looks like:

 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> \
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> \
 PREFIX owl: <http://www.w3.org/2002/07/owl#> \
 PREFIX skos: <http://www.w3.org/2004/02/skos/core#> \
 PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> \
 SELECT DISTINCT ?annotatedTarget ?annotatedPropertyLabel ?sameAsLabel \
 WHERE { \
  { \
    ?nodeID owl:annotatedSource ?xs . \
    ?nodeID owl:annotatedProperty ?annotatedProperty . \
    ?nodeID owl:annotatedTarget ?annotatedTarget . \
    ?nodeID ?aaProperty ?aaPropertyTarget . \
    OPTIONAL {?annotatedProperty rdfs:label ?annotatedPropertyLabel} . \
    OPTIONAL {?aaProperty rdfs:label ?aaPropertyLabel} . \
    FILTER ( isLiteral( ?annotatedTarget ) ) . \
    FILTER ( ?aaProperty NOT IN ( owl:annotatedSource, rdf:type, owl:annotatedProperty, owl:annotatedTarget ) ) \
    { \
      SELECT DISTINCT ?xs WHERE { \
        ?xs rdfs:label ?xl . \
        FILTER (?xl = '${term}'^^xsd:string) \
      } \
    }\
  } \
  UNION \
  { \
    SELECT ?sameAsLabel \
    WHERE { \
      ?concept skos:prefLabel ?prefLabel . \
      FILTER (str(?prefLabel) = '${term}') \
      ?concept owl:sameAs ?sameAsConcept . \
      ?sameAsConcept skos:prefLabel ?sameAsLabel . \
    } \
  } \
}

This only works because we know exactly which ontologies we've ingested, and even then it barely works. This can be a challenging query because the user isn't forced to select from a list of terms before submitting their search query. So we need a query that first finds the word they typed in and then (I assume) uses the result of that lookup to find synonyms and like words.
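One possible shape for a more general query, sketched under the assumption that synonyms are stored as `skos:altLabel`, `oboInOwl:hasExactSynonym` or `oboInOwl:hasRelatedSynonym` triples on the concept. It first matches the typed keyword against concept labels case-insensitively and then follows the synonym properties from the matched concepts, which is the two-step lookup described above. It also escapes the keyword rather than interpolating it raw as `'${term}'` does, so quotes in user input cannot break the query. `escapeSparqlString` and `buildSynonymQuery` are hypothetical names.

```typescript
// Escape a user-typed keyword for use inside a SPARQL double-quoted string literal.
const escapeSparqlString = (value: string): string =>
  value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');

// Hypothetical query builder: find concepts whose label equals the keyword
// (ignoring case), then return their synonyms.
const buildSynonymQuery = (keyword: string): string => {
  const term = escapeSparqlString(keyword.trim().toLowerCase());
  return `PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
SELECT DISTINCT ?concept ?synonym WHERE {
  # Step 1: find the concept(s) the user typed.
  VALUES ?labelProp { rdfs:label skos:prefLabel }
  ?concept ?labelProp ?label .
  FILTER (lcase(str(?label)) = "${term}")
  # Step 2: follow synonym properties from those concepts.
  VALUES ?synonymProp { skos:altLabel oboInOwl:hasExactSynonym oboInOwl:hasRelatedSynonym }
  ?concept ?synonymProp ?synonym .
}`;
};
```

An exact-match `FILTER` like this has to compare against every label, so at scale a text index would likely be needed for the first step.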

Improve deployments by migrating to Serverless Framework.

Serverless Framework (https://www.serverless.com/) has become close to a de facto standard for managing serverless deployments, and OBP is a perfect candidate for it. Migrating the CloudFormation templates to this framework has a number of benefits, including shorter deployment times and reduced complexity.

Each "component" of the app can become a service. Resources such as Elasticsearch can still be defined with CloudFormation. We can start with a single serverless file for now and if it becomes too cumbersome replace it with multiple services managed by a single deployment script.
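A rough sketch of what a single `serverless.ts` might contain. Serverless Framework accepts a TypeScript configuration file, but the service name, handler paths, schedule, runtime and Elasticsearch version below are illustrative assumptions rather than values from the repository. The Elasticsearch domain stays as a plain CloudFormation resource under `resources`, as suggested above.

```typescript
// Hypothetical serverless.ts for a single OBP service. All names and values
// here are illustrative assumptions, not taken from the repository.
const serverlessConfiguration = {
  service: 'obp',
  provider: {
    name: 'aws',
    runtime: 'nodejs14.x',
    stage: 'staging',
    region: 'us-east-1',
    environment: {
      // Resolved by Serverless at deploy time, so the indexer can read its region.
      REGION: '${self:provider.region}',
    },
  },
  functions: {
    // Each app "component" becomes a function (or, later, its own service).
    scheduler: {
      handler: 'scheduler.handler',
      events: [{ schedule: 'rate(1 day)' }],
    },
    indexer: {
      handler: 'indexer.handler',
    },
  },
  // Resources such as Elasticsearch remain plain CloudFormation.
  resources: {
    Resources: {
      ElasticsearchDomain: {
        Type: 'AWS::Elasticsearch::Domain',
        Properties: {
          ElasticsearchVersion: '7.10',
        },
      },
    },
  },
};

// In a real serverless.ts the object would be exported, e.g.
// module.exports = serverlessConfiguration;
```

The `${self:provider.region}` string is single-quoted on purpose, so TypeScript leaves it alone and Serverless resolves it at deploy time.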

obp-scheduler-function-staging is throwing an exception

From an email from Paul to Arno (21/02/2020):

I started to look at why Elasticsearch was empty and found that the obp-scheduler-function-staging is throwing an exception (I found this by looking at the CloudWatch Logs for this function):

{
  "errorType": "Runtime.ImportModuleError",
  "errorMessage": "Error: Cannot find module 'xml2js'\nRequire stack:\n- /var/task/scheduler.js\n- /var/runtime/UserFunction.js\n- /var/runtime/index.js",
  "stack": [
    "Runtime.ImportModuleError: Error: Cannot find module 'xml2js'",
    "Require stack:",
    "- /var/task/scheduler.js",
    "- /var/runtime/UserFunction.js",
    "- /var/runtime/index.js",
    " at _loadUserApp (/var/runtime/UserFunction.js:100:13)",
    " at Object.module.exports.load (/var/runtime/UserFunction.js:140:17)",
    " at Object.<anonymous> (/var/runtime/index.js:43:30)",
    " at Module._compile (internal/modules/cjs/loader.js:955:30)",
    " at Object.Module._extensions..js (internal/modules/cjs/loader.js:991:10)",
    " at Module.load (internal/modules/cjs/loader.js:811:32)",
    " at Function.Module._load (internal/modules/cjs/loader.js:723:14)",
    " at Function.Module.runMain (internal/modules/cjs/loader.js:1043:10)",
    " at internal/main/run_main_module.js:17:11"
  ]
}

It looks like it can't find the XML parsing library. It's possible this has something to do with updating the Node.js runtime.

bulk indexer output

The output of the bulk indexer does not really reflect whether the indexing was successful. It would be handy if it could be changed so that there is an immediate indication that action has to be taken to fix problems with the indexing.
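A minimal sketch of how the bulk indexer could summarise an Elasticsearch `_bulk` response. The response carries a top-level `errors` flag plus one entry per action in `items`, each with a `status` and, on failure, an `error` object; only those fields are modelled here. The function name and summary shape are hypothetical.

```typescript
// Minimal shape of an Elasticsearch _bulk response (only the fields used here).
interface BulkItemResult {
  _id?: string;
  status: number;
  error?: { type: string; reason: string };
}

interface BulkResponse {
  errors: boolean;
  // Each item is keyed by its action: index, create, update or delete.
  items: Array<Record<string, BulkItemResult>>;
}

interface BulkSummary {
  succeeded: number;
  failed: Array<{ id?: string; status: number; reason: string }>;
}

const summarizeBulkResponse = (response: BulkResponse): BulkSummary => {
  // Fast path: Elasticsearch reports errors=false only if every action succeeded.
  if (!response.errors) {
    return { succeeded: response.items.length, failed: [] };
  }
  const summary: BulkSummary = { succeeded: 0, failed: [] };
  for (const item of response.items) {
    for (const result of Object.values(item)) {
      if (result.error || result.status >= 300) {
        summary.failed.push({
          id: result._id,
          status: result.status,
          reason: result.error
            ? `${result.error.type}: ${result.error.reason}`
            : `HTTP ${result.status}`,
        });
      } else {
        summary.succeeded += 1;
      }
    }
  }
  return summary;
};
```

The script could then print `failed` and set a non-zero exit code (for example `process.exitCode = 1`) whenever it is non-empty, so a failed run is obvious immediately.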

Document clustering

At the AtlantOS symposium at UNESCO, a desire was expressed to adapt the UX to use the tagging to cluster submissions by similarity, and to display smaller, more focused and informal best practices around those that are peer reviewed, come from a major consortium (which went through internal review), and/or are endorsed by some authority (e.g. a GOOS Panel).

The search will prioritise those that have higher QC, with an option to display all.

This is a means to prevent the highest-quality documents from drowning in small or unreviewed (but still valuable) submissions.
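A minimal sketch of one way to combine the two ideas, assuming each submission carries its tags and a QC tier. It uses Jaccard similarity between tag sets (an assumed choice of similarity measure) and greedy clustering in which higher-QC submissions are considered first, so they anchor clusters and smaller or unreviewed submissions group around them. The tier names, the `threshold` default and the function names are hypothetical.

```typescript
// Hypothetical QC tiers, highest first.
const QC_TIERS = ['endorsed', 'peer-reviewed', 'consortium-reviewed', 'unreviewed'] as const;
type QcTier = (typeof QC_TIERS)[number];

interface Submission {
  id: string;
  tags: string[];
  qc: QcTier;
}

// Jaccard similarity of two tag sets: |A ∩ B| / |A ∪ B|.
const jaccard = (a: string[], b: string[]): number => {
  const setA = new Set(a);
  const setB = new Set(b);
  const intersection = Array.from(setA).filter((tag) => setB.has(tag)).length;
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 0 : intersection / union;
};

// Greedy clustering: each submission joins the most similar existing cluster
// (compared with the cluster's anchor, its first member) if the similarity
// reaches the threshold; otherwise it starts a new cluster.
const clusterByTags = (submissions: Submission[], threshold = 0.5): Submission[][] => {
  // Highest-QC submissions are considered first, so they become the anchors.
  const ordered = [...submissions].sort(
    (x, y) => QC_TIERS.indexOf(x.qc) - QC_TIERS.indexOf(y.qc)
  );
  const clusters: Submission[][] = [];
  for (const submission of ordered) {
    let best: Submission[] | undefined;
    let bestScore = threshold;
    for (const cluster of clusters) {
      const score = jaccard(cluster[0].tags, submission.tags);
      if (score >= bestScore) {
        best = cluster;
        bestScore = score;
      }
    }
    if (best) {
      best.push(submission);
    } else {
      clusters.push([submission]);
    }
  }
  return clusters;
};
```

Because each cluster is ordered by QC tier, the search UI could show only the anchor by default and expand the rest when the user asks to display all.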

Auto-tweet new submissions from RSS feed

Grab new entries in the RSS feed and push to our Twitter channel.
Intelligent hash-tagging would be a nice-to-have feature, if selected metadata fields can be parsed.

Display OBPS record Google Analytics metric on the UI

Google Analytics (GA) provides download metrics for each OBPS record, but they can only be accessed on the DSpace repository. We need to display this download metric on the individual record in the UI results display. This metric is a real selling point for OBPS.

Update copy on first visit screen.

REWORD:

The Ocean Best Practices System (OBPS) is a secure, permanent global repository of ocean research, operations, data/information management and applications methodologies (also known as “Best Practices”). ** The OBPS invites the ocean community to submit their own methodologies to share globally with their colleagues.

Please note, unless it is annotated as Endorsed by an Expert Panel, inclusion of a methodology in OBPS does not indicate a recommendation by OBPS.


(Please make this a smaller font size)** A Best Practice is defined as “a methodology that has repeatedly produced superior results relative to other methodologies with the same objective”. To be fully elevated to a best practice, a promising method will have been adopted and employed by multiple organizations.

Fix the indexer function to support the correct Elasticsearch index region.

The indexer.js file reads the region from an environment variable:

const region = process.env.REGION || 'us-east-1';

which is used to build the requests for the Elasticsearch index. However, the CloudFormation template that deploys the function neither supports a REGION parameter nor sets the environment variable for you. The CloudFormation template should define an environment variable in the function definition so that you can specify the region of the Elasticsearch index if it is something other than us-east-1.
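A minimal sketch of a more forgiving lookup for indexer.js, assuming it is acceptable to fall back to the function's own region. Lambda sets `AWS_REGION` automatically, so an explicit `REGION` (for an index in another region) is checked first. The helper name is hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: explicit REGION wins (index in another region),
// then the Lambda-provided AWS_REGION, then the current us-east-1 default.
const resolveElasticsearchRegion = (env: Env): string =>
  env.REGION || env.AWS_REGION || 'us-east-1';
```

`resolveElasticsearchRegion(process.env)` would replace the existing line. On the CloudFormation side, a template parameter would be passed through the function's `Environment.Variables` as `REGION`.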

Allow section-level endorsement of documents by reviewers/panels

At times, Panels / reviewers will endorse components of methodological documents in the OBPS. The technology / metadata should work towards allowing this granularity, promoting the synthesis of submissions with complementary and endorsed parts.

[Discussed at the PEGASuS / GOOS BioEco Panel meeting 2019-12-05]

Link submissions to the networks / programmes they come from

Expand the metadata field list with a field tied, ideally, to a controlled list of networks/programmes (new or unregistered programmes would have to register; xref the ODIS work).


This field would be distinct from the publisher field, and could have multiple entries identifying the networks/programmes a submission came from.

Provide thematic search UXs

Some users may want to search the corpus along a specific axis (e.g. devices, environments); a tailored search interface (and associated UX) for each of these major types is desired. Providing these entry points would make the UX smoother.

Add "Endorsements" metadata field

xref #24 #27 #29

This field will have links to documents submitted by endorsing bodies (e.g. GOOS Panels) that will also be uploaded in the OBPS. For example:

Key Value
Endorsements [DOI of endorsement document]
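A minimal sketch of how the field could be modelled and checked, assuming endorsements are stored as a list of DOIs. The `endorsements` property name is hypothetical, and the pattern is Crossref's commonly cited regular expression for modern DOIs, which matches most but not all DOIs ever registered.

```typescript
// Hypothetical record shape; only the endorsement field is sketched.
interface MetadataRecord {
  title: string;
  endorsements?: string[]; // DOIs of endorsement documents also held in the OBPS
}

// Crossref's recommended pattern for modern DOIs.
const DOI_PATTERN = /^10\.\d{4,9}\/[-._;()\/:A-Z0-9]+$/i;

// Accept "https://doi.org/...", "doi:..." or bare DOIs.
const normalizeDoi = (value: string): string =>
  value.trim().replace(/^(https?:\/\/(dx\.)?doi\.org\/|doi:)/i, '');

// Return the endorsement values that do not look like DOIs.
const invalidEndorsements = (record: MetadataRecord): string[] =>
  (record.endorsements ?? [])
    .map(normalizeDoi)
    .filter((doi) => !DOI_PATTERN.test(doi));
```

Normalising to the bare DOI before storing keeps the field comparable across submissions, whichever form the submitter pasted.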

Replace Ingest Queue with SNS utils and batch messaging.

Consider creating lib/sns-utils.ts with:

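import { chunk } from 'lodash';
import pMap from 'p-map';
import { sns } from './aws-clients';

interface PublishBatchInput {
  topicArn: string,
  messages: string[],
  concurrency?: number
}

// SNS PublishBatch accepts at most 10 entries per request.
const SNS_MAX_BATCH_SIZE = 10;

export const publishBatch = async (
  params: PublishBatchInput
): Promise<void> => {
  const { concurrency, messages, topicArn: TopicArn } = params;

  await pMap(
    chunk(messages, SNS_MAX_BATCH_SIZE),
    async (messageChunk) => {
      // Ids only need to be unique within a single batch request.
      const PublishBatchRequestEntries = messageChunk.map((Message, index) => ({
        Id: index.toString(),
        Message,
      }));

      const { Failed = [] } = await sns().publishBatch({
        TopicArn,
        PublishBatchRequestEntries,
      }).promise();

      // PublishBatch can partially succeed, so surface per-entry failures.
      if (Failed.length > 0) {
        throw new Error(
          `Failed to publish ${Failed.length} message(s): ${Failed.map((f) => f.Message ?? f.Code).join('; ')}`
        );
      }
    },
    // p-map treats an undefined concurrency as unlimited.
    { concurrency }
  );
};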

With that utility, this could be refactored to something like:

const dSpaceItems = await pMap(
  feed.channel[0].item,
  (feedItem) => dspaceClient.find(
    dspaceEndpoint,
    'dc.identifier.uri',
    feedItem.link[0]
  ),
  { concurrency: 5 }
);

const uuids = dSpaceItems.flat().map((i) => i.uuid);

await publishBatch({
  topicArn: ingestTopicArn,
  messages: uuids,
  concurrency: 5,
});

That feels a bit easier to read.

Originally posted by @marchuffnagle in #106 (comment)

documentation on the index updater

We need information/documentation on the script that can be run to update the index after entries in the repository have changed.
Paul wrote and tested that script on 23/08/2019.

included metadata fields in the ORG

Ensure all the new metadata fields are included in the ORG upload, i.e. that metadata content from these fields is searchable.
EH has included this with the EOV field, so could it be a joint EOV/ECV field?
Adding the ECVs to the OBPS tech would be more straightforward, as most of them can go into ENVO. ASK PLB FOR CLARIFICATION

PLB: Best to use the ontology IRIs for these. Arno, contact me when you get here.

Controlled vocabularies not retrieved during submission process

During the submission of a new document, some metadata fields that link to controlled vocabularies have issues: the links do not resolve and return a "Not Found" error.

vocabulary:https://www.oceanbestpractices.net/JSON/controlled-vocabulary?vocabularyIdentifier=paradis&metadataFieldName=dc_subject_parameterDiscipline 

For example:

Subject : Parameter Discipline:

Click the ‘Subject Categories’ link below to select appropriate parameter discipline keywords or phrases.

Need stop word lists for all supported ontologies.

OBP-250: As an admin, I want to be able to upload a text file with arbitrary terms for each vocabulary, to help reduce less meaningful search results.

We need the list of terms for at least the following ontologies:
[ ] CHEBI
[ ] ENVO
[ ] SDGIO
[ ] L05
[ ] L06
[ ] L22
[ ] WoRMS

It'd probably be easiest if those files were in CSV format, but whatever is easiest for now is fine; we can decide on the format when we implement this feature.
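Since the format is still open, here is a sketch assuming a two-column CSV with one `ontology,term` pair per line (header optional). Only the first comma separates the columns, so a term may itself contain commas; quoted CSV fields are not handled. Both function names are hypothetical.

```typescript
// Parse "ontology,term" lines into a map of ontology -> set of stop words.
// Ontology keys are upper-cased and terms lower-cased for case-insensitive matching.
const parseStopWords = (csv: string): Map<string, Set<string>> => {
  const stopWords = new Map<string, Set<string>>();
  for (const rawLine of csv.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.toLowerCase() === 'ontology,term') continue;
    const comma = line.indexOf(',');
    if (comma === -1) continue; // skip malformed lines
    const ontology = line.slice(0, comma).trim().toUpperCase();
    const term = line.slice(comma + 1).trim().toLowerCase();
    if (!stopWords.has(ontology)) stopWords.set(ontology, new Set());
    stopWords.get(ontology)!.add(term);
  }
  return stopWords;
};

// True if the label is a stop word for the given ontology.
const isStopWord = (
  stopWords: Map<string, Set<string>>,
  ontology: string,
  label: string
): boolean =>
  stopWords.get(ontology.toUpperCase())?.has(label.trim().toLowerCase()) ?? false;
```

A single file keyed by ontology keeps the upload to one file; one file per vocabulary, as OBP-250 describes, would just drop the first column.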

Track citations of each document in literature

Crawl the peer reviewed literature and publications from other professional sources to evaluate the uptake / use / citation of an OBPS DOI (or a DOI mapped to that DOI).

Present this on the interface and dashboards.

Update citations help text.

Change to:

To export citations, select individual record check boxes or click 'Select All'. A Download Citation box will appear; click it and the records will be downloaded to your designated Download folder. Alternatively, just copy and paste.

Dashboard: Documents by EOV

Create a dashboard to summarise the number of documents linked to each EOV, highlighting those that have been endorsed by a GOOS Panel.

Likely to be undertaken by CSIC.
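A minimal in-memory sketch of the counting logic, assuming each document lists its EOVs and has a flag for GOOS Panel endorsement (both field names hypothetical). In the live system the same numbers would more likely come from an Elasticsearch `terms` aggregation with a `filter` sub-aggregation.

```typescript
// Hypothetical document shape.
interface ObpsDocument {
  id: string;
  eovs: string[]; // EOVs the document is linked to
  endorsedByGoos: boolean;
}

interface EovCount {
  eov: string;
  total: number;
  endorsed: number;
}

const countDocumentsByEov = (docs: ObpsDocument[]): EovCount[] => {
  const counts = new Map<string, EovCount>();
  for (const doc of docs) {
    // De-duplicate so a document listing an EOV twice is counted once.
    for (const eov of Array.from(new Set(doc.eovs))) {
      const entry = counts.get(eov) ?? { eov, total: 0, endorsed: 0 };
      entry.total += 1;
      if (doc.endorsedByGoos) entry.endorsed += 1;
      counts.set(eov, entry);
    }
  }
  // Most-documented EOVs first; ties broken alphabetically.
  return Array.from(counts.values()).sort(
    (a, b) => b.total - a.total || a.eov.localeCompare(b.eov)
  );
};
```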

Examine and fix tag exploration bugs

@paulpilone did something change in the backend relating to the semantic neighbourhood exploration?
It seems that the tag exploration isn't working as smoothly / evenly as it was before.

Looks okay for CHEBI and the NERC Vocabs, but the neighbourhood for ENVO classes isn't always displayed.
To replicate:

  1. Search for "sea ice"
  2. Click on "View tags" for the first hit (Sea-Ice Information Services in the World. 3rd Edition, 2006 [SUPERSEDED by http://hdl.handle.net/11329/283])
  3. Clicking on "first year ice", "sea ice", or many other ENVO terms yields: "You must select a tag to view relationships"

This, of course, shouldn't happen. Compare this to clicking on "A-factor" from CHEBI.
