
inveniosoftware / invenio-rdm-records

DataCite-based data model for InvenioRDM flavour.

Home Page: https://invenio-rdm-records.readthedocs.io

License: MIT License

Python 72.91% Shell 0.13% JavaScript 18.99% HTML 0.12% XSLT 6.27% Jinja 1.57%

invenio-rdm-records's Introduction

Invenio-RDM-Records

DataCite-based data model for Invenio.

Further documentation is available on https://invenio-rdm-records.readthedocs.io/

Development

Install

Choose a version of search and database, then run:

pipenv run pip install -e .[all]
pipenv run pip install invenio-search[<opensearch[1]>]
pipenv run pip install invenio-db[<[mysql|postgresql|]>]

Tests

pipenv run ./run-tests.sh

invenio-rdm-records's People

Contributors

alejandromumo, alexdutton, anikachurilova, carlinmack, chriz-uniba, dfdan, egabancho, fenekku, frankois, glignos, ines-cruz, jennur, jrcastro2, kpsherva, lnielsen, max-moser, mb-wali, ntarocco, phette23, pineirin, psaiz, ptamarit, rekt-hard, slint, sotostsepe, tlgino, tmorrell, utnapischtim, yashlamba, zzacharo

invenio-rdm-records's Issues

review theme module structure

Is the __init__.py file needed? I assume yes, because it is a Python module. Next question: is the version import needed? I think it is not; shall it be removed?

from ..version import __version__

__all__ = ('__version__', )

datamodel: create datacite model based on zenodo

InvenioRDM data model

Metadata fields

  • Resource Type (resourceTypeGeneral)
    • {'resourceTypeGeneral': ..., 'resourceType': ...} -> {'type': ..., 'subtype': ...}
    • Cannot use the CV from DataCite.
    • We should have a mapping from internal types to DataCite types
    • Two level type: Type and Subtype (see Zenodo implementation).
  • Identifiers:
    • wait
  • Creators: See #19
    • Use DataCite
  • Titles
    • title (string field)
    • additional_titles (like DataCite titles)
  • Publisher:
    • Use DataCite
    • (to be discussed in detail)
  • Publication Year:
    • Use publication_date instead. Must be a valid date. In the future we can support extended time format or similar to represent only years.
  • Subjects
    • Use DataCite
    • To be investigated if we can simplify
  • Contributors: See #19
    • Use DataCite
  • Dates
    • Use DataCite (and see Zenodo implementation)
    • Special dates: publication_date, embargo_date.
    • How to deal with Embargo Date?
    • Check Zenodo for description text
  • Language
    • Use DataCite
  • Alternate and Related Identifiers:
    • Use DataCite
    • Issues with naming and metadata schema
    • Issue with resourceTypeGeneral
  • Sizes
    • Use DataCite
    • sizes. Not present in Zenodo, followed DataCite.
      Questions:
      • Shall we separate the unit (e.g. MB) from the number? This would be helpful for sorting/filtering.
      • Shall it be only one size in the file? Do we want metadata sizes too?
  • Formats
    • Use DataCite
      Questions:
      • Shall it be only one format in the file? Do we want metadata formats too (This does not make much sense IMHO)?
  • Version:
    • Use DataCite
  • Rights list:
    • Use DataCite
    • To be understood if we can use it to link to a license vocabulary or we need a separate license field.
  • Descriptions:
    • description (string field)
    • additional_descriptions (like DataCite descriptions)
    • How does it relate to Zenodo notes field (Note, abstract, descriptions).
  • GeoLocations:
    • Use Zenodo (see needs a description text)
  • FundingReference:
    • Use DataCite
    • However, we need to understand like with licenses/rights how to link to a controlled vocabulary.

Extras for all

  • Add references (need to check whether the Zenodo structure is OK). Not all references have related identifiers; sometimes it is just a reference to the text.
  • internal notes (discuss with ILS what they have). "curators notes", non-public notes.
  • Access right. Discuss _access TBD in the future.
  • Access condition

How to handle custom fields?

  • See Zenodo implementation (custom)

Zenodo custom fields

  • Imprint
  • Journal
  • Part of?
  • Thesis

TODO

  • Check against ILS schema.

Discussion points for community:

  • Keywords or only subjects?

*CV: Controlled Vocabulary

publication_date: support for EDTF lvl 0

Currently only publication_date is specified to support EDTF.

However, all the other fields have date-time as their format. Should they only be of format date? date was only introduced in JSONSchema draft 7 (more info here).

Redefinition of this task ;)

As a depositor, I may be unsure about the publication_date of a record (e.g. sometime in World War II), but I want to convey this range as the publication date, because a publication date is needed to mint a DOI (DataCite) and it is more informative than nothing.

technical implications
Allow publication_date field to be an EDTF of lvl 0

  • marshmallow
  • jsonschema
  • elasticsearch + create a lower end date for sorting purposes
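
The "lower end date for sorting purposes" mentioned above could be derived as in the following minimal sketch. It uses only the standard library and assumes that EDTF level 0 here means a date (YYYY, YYYY-MM or YYYY-MM-DD) or an interval of two such dates separated by "/"; date-times are left out. The function names are hypothetical and not part of this module.

```python
import re
from datetime import date

# EDTF level 0 date: YYYY, YYYY-MM or YYYY-MM-DD (date-times are not handled here).
_EDTF0_DATE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def _start_of(edtf_date):
    """Return the earliest calendar day covered by a single EDTF level 0 date."""
    match = _EDTF0_DATE.match(edtf_date)
    if match is None:
        raise ValueError(f"Not an EDTF level 0 date: {edtf_date!r}")
    year, month, day = match.groups()
    # date() raises ValueError for impossible dates such as 2020-02-31.
    return date(int(year), int(month or 1), int(day or 1))


def publication_date_lower_bound(value):
    """Return the lower end of a date or of a 'start/end' interval."""
    parts = value.split("/")
    if len(parts) > 2:
        raise ValueError(f"Not an EDTF level 0 date or interval: {value!r}")
    # For an interval, the first part is the start and therefore the lower end.
    return _start_of(parts[0])


# World War II as a publication date range, and a month-precision date.
print(publication_date_lower_bound("1939/1945"))
print(publication_date_lower_bound("2019-12"))
```

The resulting date could be stored in a separate, hypothetical index field so that the original EDTF string stays untouched in the record.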

Potential simplification of additional titles/descriptions

We could potentially collapse additional_descriptions and additional_titles fields into a descriptions and titles fields resp. if we build an indexing hook (or other means) to allow description=mySearchTerm and title=mySearchTerm to still be valid search box searches.

Support less specific dates

When dealing with historic material, dates are not always known to an accuracy of a specific day.

Would it be possible to support YYYY / YYYY-MM (and even, dare I ask -MM-DD) both in publication_date, and the general dates?

datamodel: reference sheet

This is a reference sheet for the core metadata shared by InvenioRDM records:

Jsonschema as of 2019-12-20 :

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "id": "http://localhost/schemas/records/record-v1.0.0.json",
  "title": "Invenio Datacite based Record Schema v1.0.0",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "_access": {
      "metadata_restricted": {
        "default": false,
        "description": "Record metadata accessibility. Public by default (False).",
        "type": "boolean"
      },
      "files_restricted": {
        "default": false,
        "description": "Record associated files accessibility. Public by default (False).",
        "type": "boolean"
      }
    },
    "_bucket": {
      "description": "Record bucket.",
      "type": "string"
    },
    "access_right": {
      "default": "open",
      "description": "Access right for record.",
      "type": "string"
    },
    "additional_descriptions": {
      "type": "array",
      "items": {
          "type": "object",
          "properties": {
              "description": {
                "description": "Description/abstract for record.",
                "type": "string"
              },
              "description_type": {
                "description": "Type of description.",
                "type": "string"
              },
              "lang": {
                "description": "Language of the description. ISO 639-3 language code.",
                "type": "string",
                "maxLength": 3
              }
          },
          "required": ["description", "description_type"]
      },
      "uniqueItems": true
    },
    "additional_titles": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
            "title": {
              "description": "Title of the record.",
              "type": "string"
            },
            "title_type": {
              "description": "Type of title.",
              "type": "string"
            },
            "lang": {
              "description": "Language of the title. ISO 639-3 language code.",
              "type": "string",
              "maxLength": 3
            }
        },
        "required": ["title"]
      },
      "uniqueItems": true
    },
    "contributors": {
      "description": "Contributors in order of importance.",
      "minItems": 1,
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "ids": {
            "description": "List of IDs related with the person.",
            "type": "array",
            "uniqueItems": true,
            "items": {
              "additionalProperties": false,
              "type": "object",
              "properties": {
                "source": {
                  "type": "string"
                },
                "value": {
                  "type": "string"
                }
              }
            }
          },
          "name": {
            "description": "Full name of person or organisation. Personal name format: family, given.",
            "type": "string"
          },
          "affiliations": {
            "description": "Affiliation(s) for the purpose of this specific record.",
            "type": "array",
            "uniqueItems": true,
            "items": {
              "type": "string"
            }
          },
          "email": {
            "type": "string",
            "description": "Contact email for the purpose of this specific record.",
            "format": "email"
          },
          "role": {
            "description": "",
            "type": "string"
          }
        },
        "required": [
          "name"
        ]
      }
    },
    "dates": {
      "description": "Date interval.",
      "items": {
        "additionalProperties": false,
        "properties": {
          "description": {
            "description": "Description of the date interval.",
            "type": "string"
          },
          "end": {
            "description": "End date.",
            "type": "string",
            "format": "date-time"
          },
          "start": {
            "description": "Start date.",
            "type": "string",
            "format": "date-time"
          },
          "type": {
            "description": "Type of the date interval."
          }
        },
        "required": [
          "type"
        ],
        "type": "object"
      },
      "type": "array"
    },
    "description": {
      "description": "Description for record.",
      "type": "string"
    },
    "embargo_date": {
      "description": "Embargo date of record (ISO8601 formatted date).",
      "type": "string",
      "format": "date-time"
    },
    "keywords": {
      "description": "Free text keywords.",
      "items": {
        "type": "string"
      },
      "type": "array"
    },
    "language": {
      "description": "Primary language of the resource. ISO 639-3 language code.",
      "type": "string",
      "maxLength": 3
    },
    "owners": {
      "description": "List of user IDs that are owners of the record.",
      "items": {
        "type": "number"
      },
      "type": "array",
      "minItems": 1,
      "uniqueItems": true
    },
    "publication_date": {
      "description": "Record publication date (ISO8601-formatted). EDTF support to be added for field.",
      "type": "string",
      "format": "date-time"
    },
    "recid": {
      "description": "Invenio record identifier (alphanumeric).",
      "type": "string"
    },
    "resource_type": {
      "additionalProperties": false,
      "description": "Record resource type.",
      "properties": {
        "subtype": {
          "description": "Specific resource type.",
          "type": "string"
        },
        "type": {
          "default": "publication",
          "description": "General resource type.",
          "type": "string"
        }
      },
      "required": [
        "type",
        "subtype"
      ],
      "type": "object"
    },
    "rights": {
      "description": "Any rights information for this resource.",
      "type": "array",
      "items": {
          "type": "object",
          "properties": {
              "rights": {
                "description": "The right itself. Free text.",
                "type": "string"
              },
              "uri": {
                "description": "The URI of the license.",
                "type": "string",
                "format": "uri"
              },
              "identifier": {
                "description": "A short, standardized version of the license name.",
                "type": "string"
              },
              "identifier_scheme": {
                "description": "The name of the scheme.",
                "type": "string"
              },
              "scheme_uri": {
                "description": "The URI of the identifier_scheme.",
                "type": "string",
                "format": "uri"
              },
              "lang": {
                "description": "Language of the right information. ISO 639-3 language code.",
                "type": "string",
                "maxLength": 3
              }
          }
      },
      "uniqueItems": true
    },
    "title": {
      "description": "Record title.",
      "type": "string"
    },
    "version": {
      "description": "Record version tag.",
      "type": "string"
    }
  },
  "required": [
    "_access",
    "access_right",
    "contributors",
    "description",
    "owners",
    "publication_date",
    "resource_type",
    "title"
  ]
}

Fields

  • _access / access_right : See #37

  • Access levels : See #47

  • Authors / Creators / Collaborators : See RFC: inveniosoftware/rfcs#11 Implementation: #19

  • Dates

    • Use DataCite (and see Zenodo implementation)
    • Special dates: publication_date, embargo_date.
    • How to deal with Embargo Date?
    • Check Zenodo for description text
    • See: #22, #23
  • FundingReference:

    • Use DataCite
    • However, we need to understand like with licenses/rights how to link to a controlled vocabulary.
  • Identifiers: See inveniosoftware/rfcs#11

    • scheme
    • controlled vocabulary
    • Issues with naming and metadata schema
    • Issue with resourceTypeGeneral
  • Language

    • Use DataCite
  • publication_date (instead of publication_year):

    • Must be a valid date. In the future we can support extended time format or similar to represent only years.
  • Publisher:

    • Use DataCite
    • (to be discussed in detail)
  • Resource Type (resourceTypeGeneral)

    • {'resourceTypeGeneral': ..., 'resourceType': ...} -> {'type': ..., 'subtype': ...}
    • Cannot use the CV from DataCite.
    • We should have a mapping from internal types to DataCite types
    • Two level type: Type and Subtype (see Zenodo implementation).
    • Required: See #43
    • Customization: See #5
  • Sizes

    • Use DataCite
    • sizes. Not present in Zenodo, followed DataCite.
      Questions:
      • Shall we separate the unit (e.g. MB) from the number? This would be helpful for sorting/filtering.
      • Shall it be only one size in the file? Do we want metadata sizes too?
  • Formats

    • Use DataCite
      Questions:
      • Shall it be only one format in the file? Do we want metadata formats too (This does not make much sense IMHO)?
  • Rights list:

    • Use DataCite
    • To be understood if we can use it to link to a license vocabulary or we need a separate license field.
  • Subjects

    • Use DataCite
    • To be investigated if we can simplify
    • See #3
  • Titles

    • title (string field)
    • additional_titles (like DataCite titles)
  • Version:

    • Use DataCite

Extras for all

  • Add references (need to check whether the Zenodo structure is OK). Not all references have related identifiers; sometimes it is just a reference to the text.
  • internal notes (discuss with ILS what they have). "curators notes", non-public notes.
  • Access condition
  • "internal fields" : See #38

How to handle custom fields?

  • See Zenodo implementation (custom)
  • See #2

Zenodo custom fields

  • Imprint
  • Journal
  • Part of?
  • Thesis

TODO

  • Check against ILS schema.

Discussion points for community:

  • Keywords or only subjects?

*CV: Controlled Vocabulary
updated 2019-08-16 with comments below.
updated 2019-12-20 with issues.

identifiers field (and subfields) as dict

The identifiers field is an array of scheme and identifier.

Pros:

  • matches up well with a potentially reusable UI component that adds an element to a list.
  • allows for preference among different identifiers (the first one is the preferred one to be displayed)

Cons: (in contrast with identifiers as a dict of scheme key and identifier value)

  • no structural enforcement of unique scheme / identifier
  • no reason we could not create a reusable UI component that does the trick for the dict approach
  • no reason we could not add preference as a field (but it would be finicky, to be honest)

Perhaps we should switch it to a dict.

After In video life (IVL, I am coining this :) ) conversation

Use dict for identifiers in top level identifiers and creators/contributors identifiers. Other cases have more data. To be seen on a case by case basis.
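
As an illustration of the trade-off, here is a minimal sketch of converting the array form into the dict form. It assumes each array item has the keys "scheme" and "identifier", as described above; the function name is hypothetical.

```python
def identifiers_to_dict(identifiers):
    """Convert [{'scheme': ..., 'identifier': ...}, ...] into {scheme: identifier}.

    Raises ValueError when a scheme appears twice, which is the uniqueness
    that the dict form enforces structurally.
    """
    result = {}
    for entry in identifiers:
        scheme = entry["scheme"]
        if scheme in result:
            raise ValueError(f"Duplicate identifier scheme: {scheme!r}")
        result[scheme] = entry["identifier"]
    return result


as_list = [
    {"scheme": "doi", "identifier": "10.1234/example"},
    {"scheme": "orcid", "identifier": "0000-0002-1825-0097"},
]
print(identifiers_to_dict(as_list))
```

Python dicts keep insertion order, so the "first one is preferred" convention survives in Python code, although JSON objects themselves do not guarantee key order.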

customization: resource types

Support custom resource types. There are different approaches:

  • Add the list as a configuration variable in invenio.cfg.
  • Create a command in the CLI (in rdm-records or in the scripts) in order to allow: load, search, removal, and edit of the types.

Note: We must consider the maximum size we expect this list to grow, as in the second case we need to store them in DB. Consider the use of controlled vocabularies.

demo: create and load records

As a hoster/developer I want to be able to test my instance with demo data.

  • Create demo dataset.
  • Create a script to load the data in the repository. (Note: A fixture approach might be easier)

Add files to records

Invenio 3.2 brings a solid Files bundle that we want to rely on to attach files to records.

This task includes integration of the Files upload API.

We will use prior files' fields for the data model.

datamodel: treating internal fields with `_`

Currently the internal fields are prefixed with an underscore. However, the user submits the record without them (e.g. access).

An issue also arises when displaying the record. Currently the record view shows the record with _access.

  • Marshmallow?
  • Record API class?
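
Whichever layer ends up owning this, the display side could be as small as the sketch below. It is only an illustration: the function name is hypothetical, and it filters top-level keys only, on the assumption that internal fields live at the top level of the record.

```python
def strip_internal_fields(record):
    """Return a shallow copy of the record without underscore-prefixed keys."""
    return {key: value for key, value in record.items() if not key.startswith("_")}


record = {
    "_access": {"metadata_restricted": False, "files_restricted": False},
    "_bucket": "some-bucket-id",
    "title": "A Romans story",
}
print(strip_internal_fields(record))
```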

Mint records with DataCite-like provider

Confirmation is needed on this task:

Implement and integrate minters to generate internal record persistent identifiers (pid_type=recid). Rather than legacy record identifiers, random 10-character alphanumeric string (with checksum) should be used.

This is the task that will finally connect base32-lib functionality with record minting (via invenio-pidstore).
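
To make the idea of a "random 10-character alphanumeric string (with checksum)" concrete, here is a standard-library sketch. It is not the base32-lib implementation and does not use its API; the alphabet (Crockford-style base32), the split into 8 random characters plus 2 check digits, and the mod 97 checksum are all assumptions made for illustration.

```python
import secrets

# Crockford-style base32 alphabet: digits plus lowercase letters without i, l, o, u.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def _decode(body):
    """Interpret the body as a base32 number."""
    number = 0
    for char in body:
        number = number * 32 + ALPHABET.index(char)
    return number


def generate_recid(body_length=8):
    """Return body_length random characters followed by two check digits."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(body_length))
    # Choose the check digits so that (number * 100 + checksum) % 97 == 1.
    checksum = 98 - (_decode(body) * 100) % 97
    return f"{body}{checksum:02d}"


def is_valid_recid(recid):
    """Check the alphabet and the two trailing check digits."""
    if len(recid) < 3:
        return False
    body, checksum = recid[:-2], recid[-2:]
    if not (checksum.isascii() and checksum.isdigit()):
        return False
    if any(char not in ALPHABET for char in body):
        return False
    return (_decode(body) * 100 + int(checksum)) % 97 == 1


recid = generate_recid()
print(recid, is_valid_recid(recid))
```

In the actual task the generated value would be handed to invenio-pidstore when registering the recid; that step is not shown here.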

Data model extension

As a developer-hoster I want to be able to extend the core metadata schema with fields specific to my domain so that my metadata schema addresses my needs.

These custom fields are limited to the following types: array, string, integer and date.
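
A minimal sketch of how the four allowed types could be enforced is shown below. The declaration format (a mapping from field name to type name) and the function name are assumptions for illustration only, and dates are assumed to arrive as ISO formatted strings.

```python
from datetime import date


def _is_date(value):
    """Accept ISO formatted date strings such as '2019-12-20'."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# The four types that extension fields are limited to.
CHECKERS = {
    "array": lambda value: isinstance(value, list),
    "string": lambda value: isinstance(value, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda value: isinstance(value, int) and not isinstance(value, bool),
    "date": _is_date,
}


def extension_errors(declared, values):
    """Return {field name: message} for every extension value that is not valid."""
    errors = {}
    for name, value in values.items():
        type_name = declared.get(name)
        if type_name is None:
            errors[name] = "Unknown extension field."
        elif type_name not in CHECKERS:
            errors[name] = f"Unsupported extension type {type_name}."
        elif not CHECKERS[type_name](value):
            errors[name] = f"Expected a value of type {type_name}."
    return errors


declared = {"number_in_sequence": "integer", "presentation_location": "string"}
print(extension_errors(declared, {"number_in_sequence": "3"}))
```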

Links of interest
Marshmallow schema

jsonschema

Inner filling for ES indexing

An example of such extension would be added biomedical profile fields:

The "biomedical" profile will have the following extensions to the core metadata (see #1)

Field | or equivalent | Notes | Why | Task | Implemented
Language | language | Language of the content, optional. Just 1 for now. | filtering, legacy | | in main model as language
Presentation location | presentation_location | Location where content was presented, optional. Applies to exhibits and presentations. Geolocation values or controlled vocabulary values or both? | filtering, legacy | |
Content location | content_location | Location pertaining to the content itself. (e.g., Uganda for dataset of vaccinated population in Uganda). Optional. Geolocation values or controlled vocabulary values (Feed from MeSH) or both? We may or may not be able to stick this in terms | searching, filtering, legacy | | in main model as location
Number in sequence | number_in_sequence | Indicates page or order of record in an ensemble. Integer, optional | sorting, in collection record ordering, synergy with part relation type | |
Private Note | private_note | Free text, optional. Used internally for repo managers. SUPER_USER, librarian, owner, proxy can see it. | Need to understand use case better? | | in main model as internal_notes
Subject: Name | (re-use terms) | Name of person/organization referred in content (e.g. book about someone). Optional. Fed from controlled vocabulary | legacy, searching, filtering | | in main model as subjects

Note that acknowledgements are not included for now.
"Abstract" and "content date" will be addressed by core metadata.

dependencies: pin invenio-records-permissions

setup.py includes an unpinned reference to invenio-records-permissions. This is so that breaking changes do not go unnoticed. Once a stable version of invenio-records-permissions has been released, it should be pinned.

jsonschema: removed fields

The following fields have been removed in order to use DataCite's schema, according to what was decided in #18.

  • contributors:
    • ids

Validation errors don't get passed for resource_type

Typically, when one creates a record with an incorrect field, a validation error with a helpful message is returned:

e.g. 'titles': [{'title': 'A Romans story', 'type': 'Otherss', 'lang': 'eng'}] returns an errors field that includes Invalid title type. Otherss not one of MainTitle, AlternativeTitle, Subtitle, TranslatedTitle, Other

However when one provides an incorrect resource_type in either the type or subtype field no errors field is returned:

e.g. 'resource_type': {'type': 'imagess', 'subtype': 'photo'} only returns {"status": 400, "message": "Validation error."}
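
For comparison, a field-level check for resource_type that reports the same kind of message as the titles example might look like the sketch below. The vocabulary, the message wording and the function name are hypothetical; this is not the module's validation code.

```python
# Hypothetical two-level vocabulary: type -> allowed subtypes.
RESOURCE_TYPES = {
    "image": {"photo", "diagram"},
    "publication": {"article", "book"},
}


def resource_type_errors(value, vocabulary=RESOURCE_TYPES):
    """Return {subfield: message} describing what is wrong with a resource_type."""
    errors = {}
    rtype = value.get("type")
    subtype = value.get("subtype")
    if rtype not in vocabulary:
        allowed = ", ".join(sorted(vocabulary))
        errors["type"] = f"Invalid resource type. {rtype} not one of {allowed}"
    elif subtype not in vocabulary[rtype]:
        allowed = ", ".join(sorted(vocabulary[rtype]))
        errors["subtype"] = f"Invalid resource subtype. {subtype} not one of {allowed}"
    return errors


print(resource_type_errors({"type": "imagess", "subtype": "photo"}))
```

Returning a mapping keyed by subfield would let the response carry an errors field next to the generic "Validation error." message.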

ui: creator/contributor icon

In contributors.html:

<span class="text-muted" {% if creator.affiliations and creator.affiliations[0] %}data-toggle="tooltip" title="{{creator.affiliations[0].name}}"{% endif %}>{{creator.name}}</span>{% if not loop.last %}; {% endif %}

Could eventually get away with no ; and always have an icon (even if generic) next to a creator

Implement Creators and Contributors equivalent metadata fields

This issue tracks implementation of the "creators" / "contributors" metadata fields.

An accompanying RFC would be the place to define and document to others those fields (I think).

Priming content for RFC (should be moved to it):

Schema discussion starting point

    "<creators/contributors>": {
      "description": "Contributors in order of importance.",
      "minItems": 1,
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "ids": {
            "description": "List of IDs related with the person.",
            "type": "array",
            "uniqueItems": true,
            "items": {
              "additionalProperties": false,
              "type": "object",
              "properties": {
                "source": {
                  "type": "string"
                },
                "value": {
                  "type": "string"
                }
              }
            }
          },
          "name": {
            "description": "Full name of person or organisation. Personal name format: family, given.",
            "type": "string"
          },
          "affiliations": {
            "description": "Affiliation(s) for the purpose of this specific record.",
            "type": "array",
            "uniqueItems": true,
            "items": {
              "type": "string"
            }
          },
          "email": {
            "type": "string",
            "description": "Contact email for the purpose of this specific record.",
            "format": "email"
          },
          "role": {
            "description": "",
            "type": "string",
            "enum": [
              "ContactPerson",
              "Researcher",
              "Other"
            ]
          }
        },
        "required": [
          "name"
        ]
      }
    }

For unknown authors, use DataCite's unknown.

See the W3C recommendations on names.

Organization as author is something we will eventually want. Perhaps they are an alternative with their own fields. The organization use case may also be an opportunity to solve the "too many authors" problem. Organization ids: Research Organization Registry

Why this field, these properties and this implementation?

  • For citation purposes
  • For DOI minting
  • To respect the W3C standard
  • To disambiguate authors
  • To allow auto-complete from a source

version: new schema

Switch to the new versioning schema:

  • Change setup.py to not be alpha
  • Update version to 0.0.1
  • Requires invenio-records-permissions released under the new schema first

dates: support for ranges and dates metadata

Currently dates are specified as two fields (start + end). This comes from ES2 and makes certain advanced range queries (e.g. intersections) difficult.

ES6.x onwards has the range type, which would allow this type of queries.

However, we should also take into account actual metadata of type date. For example, from @fenekku:

for actual date metadata (as opposed to created and updated which are meta-metadata). We have "present day" books about "olden days" medical practices for instance.

Open discussion and possible solutions:

  1. Keep the same structure and do not support advanced queries. In this case both use cases (ranges and content metadata) could be addressed with a structure similar to:
"dates": {
      "description": "Related dates and intervals.",
      "items": {
        "additionalProperties": false,
        "properties": {
          "description": {
            "description": "Description of the date or interval.",
            "type": "string"
          },
          "value": {
            "description": "Date value in ISO-8601 format. If interval, this is the start.",
            "type": "string"
          },
          "value_end": {
            "description": "End date value of interval.",
            "type": "string"
          },
          "type": {
            "description": "Type of date/interval: 'created', 'updated', 'content'..."
          }
        },
        "required": [
          "type", "value"
        ],
        "type": "object"
      },
      "type": "array"
    },
  2. Have two fields. One for ranges, using the ES6.x+ range type, and one for dates metadata. This might require the creation of some sort of query parsing.
  3. One of the two should be added as a custom field.
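
To illustrate what an "intersection" query means for the value / value_end structure from the first option, here is a small application-side sketch. It assumes both fields hold full YYYY-MM-DD dates and that a missing value_end means a single day; the function names are hypothetical, and this says nothing about how the search engine itself would evaluate such a query.

```python
from datetime import date


def _bounds(entry):
    """Return (start, end) for a dates entry; a single date is a one-day interval."""
    start = date.fromisoformat(entry["value"])
    end = date.fromisoformat(entry.get("value_end", entry["value"]))
    return start, end


def intersects(first, second):
    """True when the two date entries share at least one day."""
    first_start, first_end = _bounds(first)
    second_start, second_end = _bounds(second)
    return first_start <= second_end and second_start <= first_end


content = {"type": "content", "value": "1850-01-01", "value_end": "1900-12-31"}
query = {"type": "content", "value": "1890-06-01", "value_end": "1950-01-01"}
print(intersects(content, query))
```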

templates: bug rendering record detail template due to edtf dates

The following code in record_landing_page.html does not support EDTF dates:

{{ record.publication_date|to_date|dateformat(format='long') }}

Example traceback:

Traceback (most recent call last):
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/werkzeug/middleware/proxy_fix.py", line 232, in __call__
    return self.app(environ, start_response)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/werkzeug/middleware/dispatcher.py", line 66, in __call__
    return app(environ, start_response)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/invenio_records_ui/views.py", line 205, in record_view
    return view_method(pid, record, template=template, **kwargs)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/invenio_records_ui/views.py", line 227, in default_view_method
    record=record,
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/templating.py", line 140, in render_template
    ctx.app,
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/flask/templating.py", line 120, in _render
    rv = template.render(context)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/Users/lnielsen/src/invenio-rdm-records/invenio_rdm_records/theme/templates/invenio_rdm_records/record_landing_page.html", line 9, in top-level template code
    {%- extends config.BASE_TEMPLATE %}
  File "/Users/lnielsen/src/invenio-app-rdm/invenio_app_rdm/theme/templates/invenio_app_rdm/page.html", line 7, in top-level template code
    #}
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/invenio_theme/templates/invenio_theme/page.html", line 28, in top-level template code
    {%- endblock head_title %}
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/invenio_theme/templates/invenio_theme/page.html", line 31, in block "body"
    {%- if keywords %}<link rel="canonical" href="{{ canonical_url }}"/>{% endif %}
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/invenio_theme/templates/invenio_theme/page.html", line 32, in block "body_inner"
    {%- block head_links_langs %}
  File "/Users/lnielsen/src/invenio-rdm-records/invenio_rdm_records/theme/templates/invenio_rdm_records/record_landing_page.html", line 12, in block "page_body"
    {{ webpack['invenio-app-rdm-theme.css'] }}
  File "/Users/lnielsen/src/invenio-rdm-records/invenio_rdm_records/theme/templates/invenio_rdm_records/record_landing_page.html", line 13, in block "record_body"
    {{ webpack['invenio-rdm-records-theme.css'] }}
  File "/Users/lnielsen/src/invenio-rdm-records/invenio_rdm_records/theme/views.py", line 52, in to_date
    return arrow.get(date_string).date()
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/arrow/api.py", line 21, in get
    return _factory.get(*args, **kwargs)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/arrow/factory.py", line 196, in get
    dt = parser.DateTimeParser(locale).parse_iso(arg)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/arrow/parser.py", line 211, in parse_iso
    return self._parse_multiformat(datetime_string, formats)
  File "/Users/lnielsen/envs/cli/lib/python3.6/site-packages/arrow/parser.py", line 494, in _parse_multiformat
    string, ", ".join(formats)
arrow.parser.ParserError: Could not match input '1970-06-16/2003-12-05' to any of the following formats: YYYY-MM-DD, YYYY-M-DD, YYYY-M-D, YYYY/MM/DD, YYYY/M/DD, YYYY/M/D, YYYY.MM.DD, YYYY.M.DD, YYYY.M.D, YYYYMMDD, YYYY-DDDD, YYYYDDDD, YYYY-MM, YYYY/MM, YYYY.MM, YYYY
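
The input in the error above is a date interval (two ISO dates separated by "/"), which a single-date parse cannot match. A minimal sketch of a tolerant parser follows, using only the standard library; the name to_date_range is hypothetical and this is not the fix adopted in the repository.

```python
from datetime import date


def to_date_range(date_string):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD/YYYY-MM-DD' into (start, end).

    Hypothetical helper: a single date is returned as a range whose
    start and end are the same day.
    """
    # partition() yields an empty `end` when there is no "/" in the input.
    start, _, end = date_string.partition("/")
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end) if end else start_date
    return start_date, end_date
```

A template filter built on such a helper would then have to decide how to display both ends of the range rather than a single date.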

communities: schema fields

Currently the schema for the communities field is:

"community": {
  "type": "object",
  "properties": {
    "primary": {"type": "string"},
    "secondary": {
      "type": "array",
      "minItems": 0,
      "items": {"type": "string"}
    }
  }
}

Is this enough or should we add something like (also applicable to secondary):

{
  "primary": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "identifier": {"type": "string"}
    }
  }
}

marshmallow: upgrade to version 3

The current Marshmallow schemas are written for version 2. Until Invenio v3.2 is released, compatibility with version 3 is not widely available across Invenio.

For now, it is pinned to 'marshmallow<3' in setup.py. Fix and unpin once Invenio v3.2 is out.

Ref

Reuse IdentifierScheme

From @ppanero

IdentifierScheme could be reused in many other schemas. However, there is no easy and clean way to flatten its attributes. I tested Pluck and Method. The latter worked, but the resulting code is harder to understand, which in my view makes it a poor choice.

ui: stats collapsed message

Currently it reads "See more details" even when rolled out. It should then read "Show less details".

I tried to implement it with the same logic as the files, but the behavior was inconsistent:

1- The first time, it shows "See less details" when collapsed
2- After the first click, it keeps showing "See less details"
3- On subsequent clicks, it switches between labels as expected :O

Terms datamodel (index)

The "terms" index contains:

  • source : keyword
  • id : keyword
  • value : text or keyword?
  • definition : text
  • deprecated : bool
  • suggest : list of stopword-removed words making up the value that will be used by the suggester.

As a depositor, when I search for a term through the auto-complete feature, I want to be able to select one-word terms. I don't want them to be occluded by other, longer keywords that contain that word.

To solve the above issue, a scoring algorithm on the suggest endpoint can be used to give a higher score to shorter suggest lists (I think).
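
The ranking idea can be sketched in plain Python as follows. This is only an illustration of "shorter suggest list first"; in practice the scoring would live in the search engine's suggester, and the function name and sample terms here are hypothetical.

```python
def rank_suggestions(query, terms):
    """Return matching terms, those with the shortest suggest list first.

    `terms` is a list of dicts with at least "value" and "suggest" keys,
    mirroring the fields of the "terms" index described above.
    """
    q = query.lower()
    # A term matches when any word of its suggest list starts with the query.
    matches = [
        t for t in terms
        if any(word.lower().startswith(q) for word in t["suggest"])
    ]
    # Fewer suggest words means a higher rank; ties are broken alphabetically.
    return sorted(matches, key=lambda t: (len(t["suggest"]), t["value"]))


terms = [
    {"value": "water pollution", "suggest": ["water", "pollution"]},
    {"value": "water", "suggest": ["water"]},
    {"value": "air", "suggest": ["air"]},
]
ranked = [t["value"] for t in rank_suggestions("wat", terms)]
```

With this ordering the one-word term "water" is listed before "water pollution" instead of being pushed down by it.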

Combine / disambiguate _access and access_right

2 parts to this task: discuss the differences / combination and implement.

access_right can currently be:

  • 'open'
  • 'embargoed'
  • 'restricted'
  • 'closed'

'_access' can be:

    'metadata_restricted': <true|false>    
    'files_restricted': <true|false>

This leads to strange combinations: an open access record with files restricted or a record with metadata restricted but files not restricted or an open record with metadata restricted... The first in particular is something participants in our usability sessions have complained about: "An open access record should have its files available" to paraphrase.

What also makes things hard to keep straight is that we have a rights field (the license) and an access_levels field.

Because of this, I suggest we merge them together. This way, only combinations that make sense are possible, and the combinations live within 1 field instead of across different fields. We can always have the UI reflect what we want from this 1 field. By enforcing strict semantics, we can cover most cases (I think?) with something like the following, for example:

| New "access_right" field | metadata available? | files available? |
|---|---|---|
| open | ✔️ | ✔️ |
| embargoed | ✔️ | ❌ until embargo date (unless user has permissions) |
| scheduled (or embargoed with added metadata) | ❌ until embargo date (unless user has permissions) | ❌ until embargo date (unless user has permissions) |
| restricted | ✔️ | ❌ (unless user has permissions) |
| closed private | ❌ (unless user has permissions) | ❌ (unless user has permissions) |
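
To make the proposed semantics concrete, the rows above could be interpreted roughly as in the sketch below. Everything here is an assumption for discussion: the function name, the "scheduled" and "closed" keys, and the choice that an embargo counts as lifted on the embargo date itself.

```python
from datetime import date


def resolve_access(access_right, today, embargo_date=None, has_permission=False):
    """Return (metadata_available, files_available) for one access_right value."""
    # Assumption: the embargo is considered lifted on the embargo date itself.
    embargo_lifted = embargo_date is not None and today >= embargo_date
    if access_right == "open":
        return True, True
    if access_right == "embargoed":
        return True, has_permission or embargo_lifted
    if access_right == "scheduled":
        visible = has_permission or embargo_lifted
        return visible, visible
    if access_right == "restricted":
        return True, has_permission
    if access_right == "closed":
        return has_permission, has_permission
    raise ValueError("unknown access_right: {!r}".format(access_right))
```

Because both answers come from a single field, a combination such as "open record with restricted files" cannot be expressed at all.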

I am completely fine with having access_right be bibliographic AND "interpreted" metadata.

What are your thoughts @lnielsen @ppanero ?

[UPDATED 2019-12-20]

DOI template fails if there is no DOI

Similar to #69 - the DOI template assumes a DOI is present and fails to render if the identifier(s) are of a separate scheme.

EDIT from discussions below
A record may not have a DOI. This task is about fixing the template to account for that without crashing.
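
One way to keep the template from crashing is to look the DOI up defensively and render the DOI section only when a value comes back. The sketch below assumes, hypothetically, that identifiers are available as a mapping of scheme to value; the real record layout and the helper name get_doi are not taken from the repository.

```python
def get_doi(identifiers):
    """Return the DOI from a scheme -> value mapping, or None if there is none."""
    # `identifiers or {}` also covers records that have no identifiers at all.
    for scheme, value in (identifiers or {}).items():
        if scheme.lower() == "doi":
            return value
    return None
```

The template can then wrap the DOI markup in a conditional on this result instead of assuming the key exists.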

Validate marshmallow identifiers via idutils

We could use idutils to validate identifier scheme in the marshmallow schema.

Make sure that references to identifiers are made to their lowercase form.

There was talk about distinguishing identifiers for different uses / targets (people / orgs / objects).
Should we (invenio-rdm-records) check for those appropriately, or is that idutils' job?

[Parent Issue] datamodel: still todo

TODO as outcome of #49:

  • Implement versioning (conceptrecid minting)
  • OAI serializing/schema, add to tests and fixtures.
  • Customization of enums
  • Resource types are loaded from a JSON file. We need to converge on how to load them, for all enums.
  • IdentifierScheme could be reused in many other schemas. However, there is no easy and clean way to flatten its attributes. I tested Pluck and Method. The latter worked, but the resulting code is harder to understand, which in my view makes it a poor choice.
  • Tests for access_condition, since the access part is not fully defined yet. It is not added to the fixtures either.
  • Tests for FilesSchema, since it is dump_only (see Zenodo). How should this be tested?
  • Access: #37

marshmallow: evaluate fields.Str vs SanitizedUnicode

Several fields, such as type, are strings that can only take one value from an enumeration. Therefore, at the moment there is no need to sanitize them, and they could simply be Str(). However, if we open these up to customization (e.g. introducing your own CVs), this might cause problems (perhaps in languages other than English).

Ran some performance tests comparing marshmallow.fields.Str with invenio_records_rest.schemas.fields.SanitizedUnicode.

1 load/dump:

Sanitized: 0.0006556510925292969 seconds
Marshmallow Str: 0.0002028942108154297 seconds

1000 load/dumps:

Sanitized:  0.18001985549926758 seconds
Marshmallow Str: 0.09990668296813965 seconds

100000 load/dumps:

Sanitized: 18.859431743621826 seconds
Marshmallow Str: 10.20322585105896 seconds
