
digideskio / gbif-dataset-metrics


This project forked from datafable/gbif-dataset-metrics


Get insights into GBIF-mediated datasets with charts and metrics.

Home Page: https://chrome.google.com/webstore/detail/gbif-dataset-metrics/kcianglkepodpjdiebgidhdghoaeefba

License: MIT License

Python 66.75% CSS 5.82% JavaScript 22.33% HTML 5.11%


GBIF dataset metrics

Rationale

The Global Biodiversity Information Facility (GBIF) facilitates access to over 13,233 species occurrence datasets, collectively holding more than 570 million records. GBIF dataset pages are important access points to GBIF-mediated data (e.g. via DOIs). They currently show dataset metadata, a map of georeferenced occurrences, some basic statistics, and a paged table of download events. A user who wants to know more about the occurrences in a dataset has to filter and page through a table of occurrences or download the data. Neither is a convenient way to get quick insights or to assess fitness for use.

Result

For the 2015 GBIF Ebbe Nielsen Challenge, we developed a proof of concept that enhances GBIF dataset pages with aggregated occurrence metrics. These metrics are visualized as stacked bar charts (showing the distribution of occurrences by basis of record, coordinates, multimedia, and taxa matched with the GBIF backbone), complemented by an interactive taxonomy partition and a recent downloads chart. Metrics that score particularly well are highlighted as achievements. Together, these features not only tell users what a dataset contains and whether it is fit for use, but also help data publishers discover which aspects could be improved.
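This README does not spell out how a stacked bar or an achievement is computed. As a minimal sketch, assuming each metric is a set of category counts per dataset, the shares behind one bar and a simple achievement rule could look like the following. The function names, the coordinate category labels (`valid`, `invalid`, `missing`), and the 0.9 threshold are hypothetical and not taken from the project.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold: the README does not define what "scores particularly well" means.
ACHIEVEMENT_THRESHOLD = 0.9


def bar_segments(counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Turn category counts into (category, share) pairs for one stacked bar, largest first."""
    total = sum(counts.values())
    if total == 0:
        return []  # no occurrences, nothing to draw
    shares = ((category, n / total) for category, n in counts.items())
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def has_achievement(counts: Dict[str, int], good_category: str,
                    threshold: float = ACHIEVEMENT_THRESHOLD) -> bool:
    """True if the 'good' category makes up at least `threshold` of all occurrences."""
    total = sum(counts.values())
    return total > 0 and counts.get(good_category, 0) / total >= threshold


if __name__ == "__main__":
    # Hypothetical coordinate counts for one dataset.
    coordinates = {"valid": 950, "invalid": 20, "missing": 30}
    print(bar_segments(coordinates))              # [('valid', 0.95), ('missing', 0.03), ('invalid', 0.02)]
    print(has_achievement(coordinates, "valid"))  # True
```

Working with shares rather than raw counts keeps bars comparable between a dataset with a hundred occurrences and one with millions.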


The proof of concept consists of two parts: 1) an extraction and aggregation module that processes GBIF occurrence downloads and calculates, aggregates, and stores the metrics for each dataset, and 2) a Google Chrome extension that lets you view these metrics in context on the GBIF website.
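A minimal sketch of the aggregation step for one metric (basis of record), assuming a tab-separated occurrence download whose header row contains `datasetKey` and `basisOfRecord` columns. The file layout, the `UNKNOWN` placeholder, and both function names are assumptions for illustration, not the project's actual module.

```python
import csv
from collections import Counter, defaultdict
from typing import Dict, Iterable


def aggregate_basis_of_record(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count occurrences per basisOfRecord value for each dataset.

    `lines` is a tab-separated occurrence download whose header row is assumed
    to contain `datasetKey` and `basisOfRecord` columns (hypothetical layout).
    """
    per_dataset: Dict[str, Counter] = defaultdict(Counter)
    # QUOTE_NONE: treat quote characters as ordinary text, as in a plain TSV file.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Empty or missing values are grouped under a placeholder label.
        basis = row["basisOfRecord"] or "UNKNOWN"
        per_dataset[row["datasetKey"]][basis] += 1
    return dict(per_dataset)


def merge_aggregates(a: Dict[str, Counter], b: Dict[str, Counter]) -> Dict[str, Counter]:
    """Combine per-dataset counts computed on separate chunks of a download.

    Neither input is modified; counts for the same dataset and category are summed.
    """
    merged = {key: Counter(counts) for key, counts in a.items()}
    for key, counts in b.items():
        merged.setdefault(key, Counter()).update(counts)
    return merged
```

For a real file, something like `with open("occurrence.txt", newline="", encoding="utf-8") as f: metrics = aggregate_basis_of_record(f)` would work (the filename is hypothetical). Because the counts are additive, chunks of a very large download can be aggregated independently and combined with `merge_aggregates`.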

For Round 2 of the 2015 GBIF Ebbe Nielsen Challenge, we added a sample of the images referenced in (the occurrences of) a dataset. Together with the multimedia bar and achievement, it highlights the currently undervalued multimedia richness of some datasets. We also improved our extraction and aggregation module so that it processes all GBIF occurrences on the Amazon EC2 infrastructure, and we now provide metrics for all GBIF occurrence datasets. However, we strongly believe that the functionality of our proof of concept, if considered useful, should be implemented on the GBIF infrastructure. For our motivation, including the challenges and opportunities involved, see our feedback to the jury comments.
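This README does not say how the image sample is drawn. One streaming-friendly option, sketched here as an assumption rather than the project's method, is reservoir sampling (Algorithm R): it keeps a uniform random sample of `k` image URLs while reading a dataset's occurrences once, without holding every URL in memory. The function name and the example URLs are hypothetical.

```python
import random
from typing import Iterable, List, Optional


def sample_image_urls(urls: Iterable[str], k: int,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Reservoir sampling (Algorithm R): uniform sample of up to k items from a stream."""
    rng = rng or random.Random()
    reservoir: List[str] = []
    for i, url in enumerate(urls):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(url)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = url
    return reservoir


if __name__ == "__main__":
    urls = (f"https://example.org/img/{n}.jpg" for n in range(1000))  # hypothetical URLs
    print(sample_image_urls(urls, 5, random.Random(42)))
```

Passing a seeded `random.Random` makes the sample reproducible between runs, which is convenient when regenerating metrics.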

Installation

Install the Google Chrome Extension and visit a GBIF dataset page.

How it works


Limitations

  • The metrics are processed using a download of all occurrences made on September 1, 2015, containing 13,221 occurrence datasets and 570,238,726 occurrences. Datasets published since then have no metrics, and datasets republished since then may have out-of-date metrics. In either case, a message is shown on the dataset page. If you want us to reprocess a specific dataset, submit an issue.
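The limitation above boils down to comparing a dataset's publication dates with the snapshot date. A minimal sketch, assuming the first and latest publication dates of a dataset are known; the function name and message texts are hypothetical and not the extension's actual wording.

```python
from datetime import date
from typing import Optional

# Date of the occurrence download the metrics were computed from.
SNAPSHOT_DATE = date(2015, 9, 1)


def metrics_status(first_published: date, last_published: date,
                   snapshot: date = SNAPSHOT_DATE) -> Optional[str]:
    """Return a notice for the dataset page, or None when the metrics are current."""
    if first_published > snapshot:
        # The dataset did not exist in the snapshot, so no metrics were computed.
        return "No metrics: this dataset was published after the metrics were processed."
    if last_published > snapshot:
        # The dataset existed, but it changed after the snapshot.
        return "Metrics may be out of date: this dataset was republished after processing."
    return None


if __name__ == "__main__":
    print(metrics_status(date(2014, 1, 1), date(2015, 12, 1)))
```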

Follow @Datafable to be notified of new metrics or improvements.

Contributors

Developed by Datafable:

License

LICENSE

GitHub contributors: bartaelterman, niconoe, peterdesmet
