r-geoflow / geoflow

R engine to orchestrate and run (meta)data workflows

Home Page: https://github.com/r-geoflow/geoflow/wiki


geoflow's Introduction

geoflow

geoflow provides an engine to facilitate the orchestration and execution of metadata-driven data management workflows, in compliance with the FAIR (Findable, Accessible, Interoperable and Reusable) data management principles. By means of a pivot metadata model relying on the Dublin Core standard, a single source of metadata can be used to operate multiple, interconnected data management actions. Users can customise their workflows by creating their own actions, but the library also comes with a set of native actions identified as key steps for most data managers, in particular actions oriented to the web publication of metadata and data resources through standard discovery and access services.

At first, the default actions of the library focused on providing turn-key actions for geospatial (meta)data:

  • by creating and managing geospatial (meta)data complying with ISO/TC211 and OGC geographic information standards (eg. 19115/19119/19110/19139) and related best practices (eg. INSPIRE); and
  • by facilitating the extraction, reading and publishing of standard geospatial (meta)data within the widely used software components that make up a Spatial Data Infrastructure (SDI), including spatial databases (eg. 'PostGIS'), metadata catalogues (eg. 'GeoNetwork', CSW servers) and data servers (eg. 'GeoServer').

The library was then extended to actions for other domains:

  • biodiversity (meta)data standard management including handling of EML metadata, and their management with DataOne servers,
  • in situ sensors, remote sensing and model outputs (meta)data standard management by handling part of the CF conventions, the 'NetCDF' data format and the OPeNDAP access protocol, and their management with THREDDS servers,
  • generic / domain-agnostic (meta)data standard managers (Dublin Core, DataCite), to facilitate the publication of data within (meta)data repositories such as Zenodo or Dataverse.

Executing several actions then makes it possible to cross-reference (meta)data resources in each action performed, offering a way to bind resources to each other (eg. reference 'Zenodo' DOIs in 'GeoNetwork'/'GeoServer' metadata, or vice versa reference 'GeoNetwork'/'GeoServer' links in 'Zenodo' or EML metadata). The use of standardized configuration files (JSON format) allows fully reproducible workflows, to facilitate the work of data and information managers.

Please check the online documentation for more details! (documentation in preparation)

For questions about using or contributing to geoflow, please use the discussions panel: https://github.com/r-geoflow/geoflow/discussions

Sponsors

Many thanks to the following organizations that have provided funding for strengthening the geoflow package:


The following projects have contributed to strengthening geoflow:

  • Blue-Cloud: Blue-Cloud has received funding from the European Union's Horizon programme call BG-07-2019-2020, topic: [A] 2019 - Blue Cloud services, Grant Agreement No. 862409.
  • CCSAFE
  • G2OI project, cofinanced by the European Union, the Reunion region, and the French Republic.

Sponsoring

To sponsor or fund new geoflow developments, enhancements or support requests, please contact me by e-mail.

Citation

We thank in advance those who use geoflow for citing it in their work / publication(s). For this, please use the citation provided at this link: DOI

geoflow's People

Contributors

abennici, bastienird, eblondel, emilielerigoleur, jeroen, kikislater, wheintz, yvanlebras

geoflow's Issues

Enrich DOI CSV output table with Zenodo deposit action

The output CSV will contain:

  • Identifier: entity identifier
  • Status: the Zenodo record status, either draft or published
  • DOI_for_allversions: generic / top level DOI (named "conceptdoi" by Zenodo).
  • DOI_for_version: record version-specific DOI
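
As a rough illustration of the table layout listed above, here is a minimal sketch in base R. The entity identifiers, the DOIs and the output file name are placeholders invented for the example, not real Zenodo records, and this is not necessarily how geoflow itself writes the file.

```r
# Placeholder rows: identifiers and DOIs are made up for illustration only
doi_table <- data.frame(
  Identifier = c("my-entity-1", "my-entity-2"),
  Status = c("draft", "published"),
  DOI_for_allversions = c("10.5281/zenodo.0000001", "10.5281/zenodo.0000003"),
  DOI_for_version = c("10.5281/zenodo.0000002", "10.5281/zenodo.0000004"),
  stringsAsFactors = FALSE
)

# Write the table as CSV (hypothetical file name), then read it back
out_file <- file.path(tempdir(), "zenodo_dois.csv")
write.csv(doi_table, out_file, row.names = FALSE)
read_back <- read.csv(out_file, stringsAsFactors = FALSE)
```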

Keep spatial coverage if not a bbox to enrich geometa with bounding polygons

The spatial coverage may be a simple bbox WKT, but it could also be a more complex spatial coverage (eg. simplified polygons, simplified lines) that is less reductive than a bounding box. At present geoflow reduces the WKT to a bounding box, which results in a loss of meta information. Some logic should be added to keep such more detailed spatial coverages. In the geometa action, this will result in setting bounding polygons in addition to the geographic bounding box.
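
To make the information loss concrete, here is a small base R sketch that reduces a WKT polygon to its bounding box. The helper wkt_bbox is hypothetical and only scans plain 2D "x y" coordinate pairs (no SRID prefix, no Z/M values); a real workflow would more likely rely on a spatial package.

```r
# Hypothetical helper: bounding box of a WKT geometry, obtained by scanning
# its numeric coordinates. Assumes plain 2D "x y" pairs only.
wkt_bbox <- function(wkt) {
  numbers <- as.numeric(regmatches(wkt, gregexpr("-?[0-9]+\\.?[0-9]*", wkt))[[1]])
  coords <- matrix(numbers, ncol = 2, byrow = TRUE)
  c(xmin = min(coords[, 1]), ymin = min(coords[, 2]),
    xmax = max(coords[, 1]), ymax = max(coords[, 2]))
}

# A triangle covers only half of its bounding box
triangle <- "POLYGON((0 0, 10 0, 5 8, 0 0))"
bbox <- wkt_bbox(triangle)
```

Here the bounding box spans 10 x 8 units while the triangle covers half of that area, which is the kind of detail a bounding polygon would preserve.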

Support for software argument handler

In some software, such as for DB connectors (#30), we need to be able to specify a handler function for a given argument.

For example, in the case of DBI, the main handler is DBI::dbConnect, but its first argument drv requires a driver instance, which can be instantiated using DBI::dbDriver(<drivername>). The software argument definition could be extended with such a handler, in this case DBI::dbDriver.

The software function getHandlerInstance should then be able to evaluate such argument handler.
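
The idea can be sketched as follows. The helpers resolve_handler and apply_argument_handlers are hypothetical names used for illustration, not geoflow's actual implementation, and base::as.integer stands in for DBI::dbDriver so that the example runs without a database driver installed.

```r
# Hypothetical helper: resolve a handler given as a "package::function" string
resolve_handler <- function(handler) {
  parts <- strsplit(handler, "::", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  getExportedValue(parts[1], parts[2])
}

# Hypothetical helper: pass each raw argument value through its handler,
# when a handler is declared for that argument
apply_argument_handlers <- function(args, arg_handlers) {
  for (name in names(arg_handlers)) {
    if (!is.null(args[[name]])) {
      handler_fun <- resolve_handler(arg_handlers[[name]])
      args[[name]] <- handler_fun(args[[name]])
    }
  }
  args
}

# Stand-in example: for DBI the declaration would be list(drv = "DBI::dbDriver"),
# and the evaluated arguments would then be passed on to DBI::dbConnect
connect_args <- apply_argument_handlers(
  args = list(drv = "3", dbname = "mydb"),
  arg_handlers = list(drv = "base::as.integer")
)
```

Arguments without a declared handler (dbname here) are passed through unchanged.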

Support multiple entity types

By default, a single type will be specified as generic.
For the Zenodo action, several types will be managed:

  • upload: zenodoUploadType. If missing, by default, the generic type will be used
  • publication: zenodoPublicationType, specified if available and if zenodo upload type is set to publication
  • image: zenodoImageType, specified if available and if zenodo upload type is set to image
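
The fallback rules above could be sketched like this. The function resolve_zenodo_types and the representation of entity types as a named list are assumptions made for the example, as are the sample type values.

```r
# Hypothetical helper: pick the Zenodo types from an entity's declared types
resolve_zenodo_types <- function(types) {
  # The upload type falls back to the generic type when missing
  upload_type <- types[["zenodoUploadType"]]
  if (is.null(upload_type)) upload_type <- types[["generic"]]
  resolved <- list(upload_type = upload_type)
  # Publication and image types only apply to the matching upload type
  if (identical(upload_type, "publication") && !is.null(types[["zenodoPublicationType"]])) {
    resolved$publication_type <- types[["zenodoPublicationType"]]
  }
  if (identical(upload_type, "image") && !is.null(types[["zenodoImageType"]])) {
    resolved$image_type <- types[["zenodoImageType"]]
  }
  resolved
}

only_generic <- resolve_zenodo_types(list(generic = "dataset"))
report <- resolve_zenodo_types(list(
  generic = "dataset",
  zenodoUploadType = "publication",
  zenodoPublicationType = "report",
  zenodoImageType = "figure"
))
```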

Add action for generating feature catalogue (ISO 19110)

The action should:

  • interpret the entity data component to see whether attributes and variables are declared. When attributes are declared, the list of their unique values will be added to the feature attribute description. If no attributes are listed, all attributes of the source features will be set by default (raw feature catalogue description).
  • produce an ISO 19110 feature catalogue as an ISO 19139 XML object
  • add FC scopes for geoflow (uploadType) and OpenFairViewer (query strategy among ogc_filters, ogc_dimensions - not yet supported in geoflow -, or ogc_viewparams) required for OFV feature catalogue handling
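
One possible reading of the first point, sketched in base R: columns declared as attributes get their unique values listed, while all other columns are described without listed values. The helper describe_columns, its output structure and the sample data are hypothetical; the real action builds ISO 19110 objects rather than plain lists.

```r
# Hypothetical helper: describe every column of the source features, listing
# unique values only for the columns declared as attributes
describe_columns <- function(features, attributes = character(0)) {
  lapply(stats::setNames(names(features), names(features)), function(col) {
    values <- if (col %in% attributes) sort(unique(features[[col]])) else NULL
    list(name = col, listed_values = values)
  })
}

# Made-up sample features for illustration
features <- data.frame(
  species = c("YFT", "SKJ", "YFT"),
  catch = c(10.5, 3.2, 7.1),
  stringsAsFactors = FALSE
)
fc <- describe_columns(features, attributes = "species")
raw_fc <- describe_columns(features)
```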

In addition:

  • action geonapi-publish-iso-19139 should detect the produced FC and publish it
  • action ows4R-publish-iso-19139 should detect the produced FC and publish it

Excluded from this task:

  • handling of vocabularies/codelists to enrich feature attribute listed values with labels/definitions

Add global option to skip file download

When files do not change and we only want to update the record metadata or trigger the final record publication, in particular in Zenodo, we may want to skip the file download. This can be done by adding a global option skipFileDownload.

Capacity to configure SQL layers, either directly (data sql) or through sql file

The configuration of SQL layers could be done in two ways:

  • as a SQL file path / URL given as data source. In that case geoflow will try to read the SQL file to pick up the SQL query (then set using geoflow_data$setSql()), or
  • directly as sql data property in the Data column

Some code will be tested to read the SQL query when a DB driver is set up in the configuration. Reading the SQL query will enable dynamic metadata retrieval, as is already done for shapefiles (uploadType = 'shp').
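
The two configuration routes could be sketched as below. The helper get_layer_sql is hypothetical, as are the table and column names; it only shows how a query may come either from an inline sql property or from a .sql source, and does not call geoflow itself.

```r
# Hypothetical helper: return the SQL query of a layer, either from an inline
# sql property or from a source pointing to a .sql file (local path or URL)
get_layer_sql <- function(sql = NULL, source = NULL) {
  if (!is.null(sql)) return(sql)
  if (!is.null(source) && grepl("\\.sql$", source, ignore.case = TRUE)) {
    return(paste(readLines(source, warn = FALSE), collapse = "\n"))
  }
  NULL
}

# Route 1: the query is stored in a SQL file (made-up query)
sql_file <- tempfile(fileext = ".sql")
writeLines(c("SELECT id, name, geom", "FROM my_schema.my_table"), sql_file)
query_from_file <- get_layer_sql(source = sql_file)

# Route 2: the query is given directly as a data property
query_inline <- get_layer_sql(sql = "SELECT * FROM my_schema.my_table")
```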

Support multiple data sources

To manage multiple sources, the geoflow data component requires a refactoring. From the table viewpoint, the column "Data" will be refactored:

  • source: now the same logic as other columns can be applied. For a given file, we will specify the syntax filename@fileurl or filename@filepath. Example: sample.pdf@http://someurl.org/sample.pdf
  • sourceName: this property is removed. The source name is now the filename specified in source
  • upload: this remains. Until now it was mandatory; if not specified, upload is now set to TRUE by default. upload = TRUE means the data will be uploaded to the web tool considered. If set to FALSE, the data files in source will not be uploaded. This is valid for Zenodo and GeoServer.
  • type: given that it deals with the upload task, it is renamed uploadType for consistency. Until now it was mandatory; if not specified, uploadType is now set to 'other'.
  • identifier: this is specific to the production of OGC web layers and is renamed layername.

Example of a Data element for a Zenodo file upload:

source:sample1.pdf@http://www.africau.edu/images/default/sample.pdf,sample2.pdf@https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf;
uploadType:other;
upload:true

Example of a Data element for a Geoserver shapefile upload:

source:shapefile1.zip@D://sandbox-geoflow/shapefile1.zip;
uploadType:shp;
upload:true;
layername:layer1
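
To illustrate how such a Data element can be read, here is a minimal parsing sketch in base R. The function parse_data_element is hypothetical and not geoflow's own parser; it assumes properties are separated by ";" and/or line breaks, that each property is split on its first ":" only, and that sources are comma-separated filename@location pairs.

```r
# Hypothetical helper: parse a Data element string into its properties
parse_data_element <- function(x) {
  # Properties are separated by ';' and/or line breaks
  parts <- trimws(unlist(strsplit(x, "[;\n]+")))
  parts <- parts[nzchar(parts)]
  # Split on the first ':' only, since URLs and paths contain ':' too
  keys <- trimws(sub(":.*$", "", parts))
  values <- trimws(sub("^[^:]*:", "", parts))
  props <- as.list(stats::setNames(values, keys))

  # Each source is 'filename@location'; several sources are comma-separated
  sources <- trimws(strsplit(props[["source"]], ",", fixed = TRUE)[[1]])
  upload <- props[["upload"]]
  upload_type <- props[["uploadType"]]
  list(
    source = data.frame(
      name = sub("@.*$", "", sources),
      location = sub("^[^@]*@", "", sources),
      stringsAsFactors = FALSE
    ),
    # Defaults described above: upload is TRUE, uploadType is 'other'
    upload = if (is.null(upload)) TRUE else identical(tolower(upload), "true"),
    uploadType = if (is.null(upload_type)) "other" else upload_type,
    layername = props[["layername"]]
  )
}

shp <- parse_data_element(paste(
  "source:shapefile1.zip@D://sandbox-geoflow/shapefile1.zip;",
  "uploadType:shp;",
  "upload:true;",
  "layername:layer1",
  sep = "\n"
))
```

Exact matching with [[ ]] is used on purpose: with $, a missing upload property would partially match uploadType.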

Zenodo: allow to upload multiple files separately or as Zip

New data options (to use in the Data entity column):

  • uploadZip: TRUE/FALSE; default is FALSE. If TRUE, the data source files will be zipped, and the zip will be uploaded in the Zenodo action.
  • uploadZipOnly: TRUE/FALSE; default is FALSE. Ignored if uploadZip is FALSE. If TRUE, only the zip file will be kept and uploaded in the Zenodo action.
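
The interplay of the two options can be sketched as a small decision function. The helper files_to_upload and the zip file name are hypothetical, and the assumption that the individual files are uploaded alongside the zip when uploadZipOnly is FALSE is one reading of the description above.

```r
# Hypothetical helper: decide which files end up in the Zenodo upload
files_to_upload <- function(files, uploadZip = FALSE, uploadZipOnly = FALSE,
                            zip_name = "data.zip") {
  if (!uploadZip) return(files)        # uploadZipOnly is ignored in this case
  if (uploadZipOnly) return(zip_name)  # keep only the zip
  c(files, zip_name)                   # assumed: individual files plus the zip
}

files <- c("sample1.pdf", "sample2.pdf")
separate <- files_to_upload(files)
with_zip <- files_to_upload(files, uploadZip = TRUE)
zip_only <- files_to_upload(files, uploadZip = TRUE, uploadZipOnly = TRUE)
```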

Support registers handling

The principle is to let users configure registers. Later, geoflow may embed default common registers (eg. country codelists). For now, no register is included in geoflow, but registers can be declared in the configuration under registers.

Different handling of gsheet entities depending on 'readr' availability

Depending on readr availability in the R session, gsheet::gsheet2tbl will not return the same output. As stated in its documentation: "If the package readr is available, then it will be used. This can produce slightly different, but normally better, parsings."

To make sure that geoflow entity/contact loading works in the same way for everyone, readr is going to be added as a dependency.

Support upload of Geotiff coverage

Manage the upload of raster (coverage) resources, starting with the GeoTIFF format supported by the GeoServer base distribution. Other formats will be tackled in separate tickets, as they are not directly targeted by use cases for now:

  • ArcGrid,
  • WorldImage,
  • ImageMosaic.

See eblondel/geosapi#31.

Additional support required for extensions such as NetCDF.

Spatial Extent troubles

Hi again Emmanuel,

Spatial extent of datasets with native EPSG is not visible in geonetwork (although it's in the xml document).

Cheers,

wilfried

Zenodo action: support update based on concept DOI / DOI

When a DOI is available as the identifier of an entity, the zen4R action should try to get the existing Zenodo record based on this DOI, first by concept DOI (assuming the DOI is a concept DOI), then by simple DOI. This will allow getting and updating published records (metadata update only), including existing versions of a record that are uniquely identified by a DOI.
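
The lookup order can be sketched as follows. The function find_record_by_doi and the in-memory lookups are stand-ins written for this example; in the real action the two lookups would be calls to the Zenodo API through zen4R, and the DOIs below are placeholders.

```r
# Hypothetical helper: try the concept DOI lookup first, then the plain DOI
find_record_by_doi <- function(doi, get_by_concept_doi, get_by_doi) {
  record <- get_by_concept_doi(doi)
  if (is.null(record)) record <- get_by_doi(doi)
  record
}

# Stand-in lookups over an in-memory list (placeholder DOIs)
records <- list(
  list(conceptdoi = "10.5281/zenodo.0000001", doi = "10.5281/zenodo.0000002")
)
lookup <- function(field) {
  function(doi) {
    hits <- Filter(function(r) identical(r[[field]], doi), records)
    if (length(hits) > 0) hits[[1]] else NULL
  }
}

by_concept <- find_record_by_doi("10.5281/zenodo.0000001", lookup("conceptdoi"), lookup("doi"))
by_version <- find_record_by_doi("10.5281/zenodo.0000002", lookup("conceptdoi"), lookup("doi"))
not_found <- find_record_by_doi("10.5281/zenodo.9999999", lookup("conceptdoi"), lookup("doi"))
```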
