"Identifiers review" recipe (the-fair-cookbook issue, closed)

 avatar commented on August 23, 2024
"Identifiers review" recipe

from the-fair-cookbook.

Comments (49)

AlasdairGray commented on August 23, 2024

Thanks both for the responses. I missed the update of the recipe (too much marking).

I think the challenges are the lack of a shared understanding of the target audience, and of how many recipes to split this into.

I'll take a look over the revised draft.

 avatar commented on August 23, 2024

Dear Chris, dear Alasdair,

I would like to invite you to contribute a recipe to the FAIR cookbook.

It is meant to be an “ingredient description”, i.e. it should include some basic facts and a few initial references to turn to.

To clarify the scope of this recipe together, I would like to have a short abstract / objective for this recipe as a first step, and if we see that we want to take different perspectives on this broad topic, split it up into multiple recipes.

I would like to have the objective (= abstract) ready by the end of next week, i.e. on Friday, 10.04.2020.

Could you draft something by then, or let me know now if you are unavailable for this task at this time?

I am always happy to receive feedback, via email or a call. Right now we are in the process of ramping up the recipes, so it would be great if you could contribute already. :)

If you have any questions about the recipe fitting into the scope of the overall cookbook, you might want to turn to the current draft of the Table of Contents, available here: https://docs.google.com/spreadsheets/d/13B2aLm5ZXFUwAu6DlcL7soWL8yzX6juR4ijgUpzIsv4/edit?ts=5e662956#gid=818333385

Looking forward to hearing from you,
Robert

 avatar commented on August 23, 2024

/remind me on Tuesday

reminders commented on August 23, 2024

@robertgiessmann set a reminder for Apr 6th 2020

AlasdairGray commented on August 23, 2024

I'll try to draft something on Monday, but I will be taking some leave from the middle of next week into the following week.

/remind me on Monday

reminders commented on August 23, 2024

@AlasdairGray set a reminder for Apr 6th 2020

 avatar commented on August 23, 2024

I'll try to draft something on Monday, but I will be taking some leave from the middle of next week into the following week.

/remind me on Monday

Great, thanks!

 avatar commented on August 23, 2024

/remind me on Tuesday

reminders commented on August 23, 2024

@robertgiessmann set a reminder for Apr 7th 2020

 avatar commented on August 23, 2024

/remind @AlasdairGray on Monday

reminders commented on August 23, 2024

@robertgiessmann set a reminder for Apr 6th 2020

reminders commented on August 23, 2024

👋 @AlasdairGray,

AlasdairGray commented on August 23, 2024

@robertgiessmann sorry, I didn't get to it yesterday. A critical issue came up in another project. Hopefully I'll get to it this afternoon.

 avatar commented on August 23, 2024

Sure, no problem! Looking forward to it! :)

/remind me on Wednesday, 9am

reminders commented on August 23, 2024

@robertgiessmann set a reminder for Apr 8th 2020

reminders commented on August 23, 2024

👋 @robertgiessmann,

AlasdairGray commented on August 23, 2024

Started drafting things in https://docs.google.com/document/d/1oLSte9rPKOS_sXAq7_SBFWZ75vfyMTBHoMQXZPCww3A/edit?usp=sharing

Chris-Evelo commented on August 23, 2024

I will try to look at it on Friday; until then I am fully blocked. I remember we had an ELIXIR-Excelerate report on this 1-2 years ago. That might help a lot.

proccaserra commented on August 23, 2024

@AlasdairGray @Chris-Evelo thx for starting on this.

If you are going to use a gdoc, I am pointing you to the gdoc template available on the FAIRplus drive.

AlasdairGray commented on August 23, 2024

The gdoc is very much just for drafting ideas at this point. I would hope that we would write the recipe directly in MD, but thanks for the pointer to the template as well.

 avatar commented on August 23, 2024

Sounds great! (I fear you might be the first and only one writing in markdown, but don't let Philippe hear this...) I started to provide some feedback in the gdoc. What do you think: shall we start with a concrete recipe, or with an ingredient description? I'm slightly more in favor of a concrete recipe, either on BridgeDb or identifiers.org... What's your opinion?

 avatar commented on August 23, 2024

/remind me on 3pm

reminders commented on August 23, 2024

@robertgiessmann set a reminder for Apr 9th 2020

 avatar commented on August 23, 2024

Hi @AlasdairGray , shall we wait for Chris Evelo's feedback? Or do you have a specific recipe / ingredient description already in mind? Happy to hear from you, and first of all: have a good Easter break! (I will move this invitation to finalize in the next cycle)

reminders commented on August 23, 2024

👋 @robertgiessmann, on

 avatar commented on August 23, 2024

@AlasdairGray, I heard that you are very busy with bioschemas.org now due to corona. Is that correct? Shall I come back to you another time?

(Considering also the current trend for annotating files with schemas, maybe the chapter "3.1.2.1. discovery by search engines: schema.org, bioschema" might be a good fit for you?)

AlasdairGray commented on August 23, 2024

It's likely to be at the F2F meeting that I'll have time to work on the recipe, although I'm happy to use that time for interactions with others as well.

AlasdairGray commented on August 23, 2024

First draft identifier recipe available in commit 1e749df

@nsjuty can you check the identifier resolution section?

AlasdairGray commented on August 23, 2024

Should we discuss the AZ approach as published at https://fairtoolkit.pistoiaalliance.org/use-cases/adoption-and-impact-of-an-identifier-policy-astrazeneca/

AlasdairGray commented on August 23, 2024

@robertgiessmann bringing the updated status of the identifier recipe to your attention. I hope that I've updated the labels appropriately.

First draft identifier recipe available in commit 1e749df

@nsjuty can you check the identifier resolution section?

 avatar commented on August 23, 2024

Hi @AlasdairGray, thank you, yes, this slipped my attention. Having scrolled over it, I can already say from an editorial point of view that this recipe is too broad and would work better as a set of smaller recipes (or needs collapsible/expandable sections). I propose to split it up.

I also believe that it would need a layman's introduction to identifiers to get absolute beginners started in that direction. --> that is not necessarily your task.

Some illustrations (simple graphics) would probably be helpful to make this recipe more easily digestible.

Tasklist for myself:

  • create a meta-tracker for this issue pond
  • how to split it up -> editorial board
  • assign someone to split it up
  • get the split version delivered
  • invite an introduction for absolute beginners
  • assign someone to illustrate this recipe
  • get illustrations delivered

AlasdairGray commented on August 23, 2024

@rgiessmann where are we with this recipe?

 avatar commented on August 23, 2024

Hi @AlasdairGray! I assume you have seen that @proccaserra created a derivative of this recipe; right? It lives in dev branch: https://github.com/FAIRplus/the-fair-cookbook/blob/dev/docs/content/recipes/findability/identifiers.md

We (@proccaserra and myself) actually have a dispute about how to proceed with the recipe, as I believe it is too complicated for laymen, BUT: we don't have the target audience for the cookbook properly scoped in the consortium (is it future IMI projects? All R&D scientists?).

What would be your wish/opinion how to proceed?

proccaserra commented on August 23, 2024

@AlasdairGray I notified you of the revision about a month ago. I felt it was a good start, but in several areas assumptions were made, and I felt it was necessary to expand on them and provide more insights.
We (@robertgiessmann and I) have been discussing this (a discussion rather than a dispute, I hope!).
Making things too simple (simplistic) also carries a risk: alienating people looking for guidance and support on how to implement things. So I guess we'll need another round of iteration to strike the right balance.

AlasdairGray commented on August 23, 2024
  1. URI Resolution:

URI resolution is fundamentally about directing requests to the relevant identified entity

The standard approach would be to have a REST service with content negotiation.

I don't agree that REST is the standard approach. Serving back web pages is the standard approach (with content negotiation).

 avatar commented on August 23, 2024

@AlasdairGray, I understand you mean "REST" (including the HTTP actions: GET, PUT, POST, DELETE...) should not be the standard approach (meaning: that it is required to react to all actions), but just replying to GET (which will send back the webpage then)?

Actually, I found the best practice for answering a request for an entity identifier to be a reply with one of the "30x" status codes (301, 302, 303, 308; open question: which one?) (cf https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection). This then redirects you to a page with information about the entity. What do you think, @AlasdairGray @proccaserra? Is my point a relevant discussion after all? 🙈
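A minimal sketch of the redirect pattern discussed here, assuming a hypothetical lookup table (`LOCATIONS`) from identifier paths to pages describing each entity, on a made-up domain `data.example.org`. It answers with 303 See Other, the code the W3C note "Cool URIs for the Semantic Web" suggests for pointing from a thing to a document about it; 301 or 308 would instead claim that the resource itself has moved permanently. It returns a status and headers rather than running a server, so it is framework-agnostic.

```python
from http import HTTPStatus

# Hypothetical table: path of a stable entity identifier -> current page about the entity.
LOCATIONS: dict[str, str] = {
    "/id/gene/42": "https://data.example.org/page/gene/42",
}


def resolve(path: str) -> tuple[HTTPStatus, dict[str, str]]:
    """Answer a GET on an entity identifier.

    Known identifiers get a 303 See Other pointing at a page that describes
    the entity; unknown identifiers get 404 Not Found.
    """
    target = LOCATIONS.get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND, {}
    return HTTPStatus.SEE_OTHER, {"Location": target}


status, headers = resolve("/id/gene/42")
print(int(status), headers["Location"])
```

Updating `LOCATIONS` when a page moves keeps the identifier itself unchanged, which is the kind of persistent resolution discussed later in the thread.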

Chris-Evelo commented on August 23, 2024

I think that is an overly complex way to say you want to just have the page returned that the identifier refers to.

I think this recipe explains a bit more than just resolving an identifier. It mentions CURIEs but assumes that these are somehow magically resolved. For things like DOIs that magic might actually be provided by a modern web browser, but for most biological database IDs that is not the case.

A CURIE like Database:Item can be resolved by identifiers.org and n2t.net (both are already mentioned, but this specific feature is not). So you can use https://identifiers.org/Database:item (e.g. https://identifiers.org/Uniprot:P12345) and that will be resolved. Additional advantages are:

  • The ID becomes persistent (basically that is the case for all identifiers.org IDs). Support for that is centralized at identifiers.org. If the database is ever moved, or its resolution scheme is changed, we can just update the resolution procedure at identifiers.org.
  • The ID becomes more human-readable. Typically you will at least understand which database is referred to. This is even more so if you also use a database ID type that aims to be human-readable (and still unique and resolvable), like an HGNC label.
  • It is possible to have the ID resolution work with a specific data provider, or to choose one that is currently active if the main website is down.
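A minimal sketch of CURIE resolution with provider fallback as described in this list, assuming the `https://identifiers.org/prefix:id` pattern given above and, as an assumption to check against its documentation, an equivalent `https://n2t.net/prefix:id` pattern. The `PROVIDERS` table and the `is_up` availability check are hypothetical; a real check might send an HTTP HEAD request.

```python
from typing import Callable, Optional

# Hypothetical provider URL templates per (lower-case) prefix; "{id}" is the local ID.
PROVIDERS: dict[str, list[str]] = {
    "uniprot": [
        "https://identifiers.org/uniprot:{id}",
        "https://n2t.net/uniprot:{id}",  # assumed equivalent pattern
    ],
}


def split_curie(curie: str) -> tuple[str, str]:
    """Split 'Prefix:LocalId' at the first colon; the prefix is lower-cased."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix.lower(), local_id


def candidate_urls(curie: str) -> list[str]:
    """All resolver URLs known for this CURIE, in order of preference."""
    prefix, local_id = split_curie(curie)
    return [template.format(id=local_id) for template in PROVIDERS.get(prefix, [])]


def resolve_first(curie: str, is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate URL that the caller-supplied check reports as up."""
    for url in candidate_urls(curie):
        if is_up(url):
            return url
    return None
```

Lower-casing the prefix treats `Uniprot:P12345` and `uniprot:P12345` as the same CURIE; whether a given resolver does the same is worth checking.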

Given these advantages, I think we should actually choose this approach in the recipe, or at least clearly describe its advantages rather than just listing it as one option. I have not really written it yet because I think @nsjuty can write this better.

Of course, this has consequences for minting your own identifiers. If you do that and want a resolution method that offers the same kind of advantages, you might want to install an identifiers.org instance locally. That is something I discussed with Nick, but we haven't really decided whether it is something to try.

 avatar commented on August 23, 2024

@Chris-Evelo, Thanks for your input! Really a minor comment because I do slightly disagree on this:

The ID becomes persistent (basically that is the case for all identifiers.org IDs). Support for that is centralized at identifiers.org. If the database is ever moved, or its resolution scheme is changed, we can just update the resolution procedure at identifiers.org.

What if UniProt drops (or rather: changes) a local identifier on their side? identifiers.org is not able to keep track of this; the persistence has to come from UniProt itself (the point is that identifiers.org does not make the identifier persistent).

The other way around: taking https://identifiers.org/resolve?query=uniprot:P99O12 as an example, an invalid identifier seems to be persistently available, even though it does not exist in the database.

BUT, after all: I agree that identifiers.org is making a great contribution by acting as a central hub to resolve originally local identifiers! 💪

Chris-Evelo commented on August 23, 2024

You are right, of course. If UniProt removes or changes the actual ID, then it will indeed not be persistent. Of course, UniProt should not do that; instead it should deprecate old IDs and have them link to new entries. We have seen complicated examples of this in the past, especially in UniGene, where gene clusters were split and one old ID linked to multiple new ones.

What I meant to write is that identifiers.org makes the resolution persistent. If the ID is still available in UniProt (or even in a copy after an original database disappears), then identifiers.org can make sure the ID is resolved to that new location.

For the other aspect. I know @nsjuty used to set up regular expressions that will filter out IDs that do not conform to the database identifier scheme. That allows giving warnings when an ID should not be in the database. But yes typically you can also form non-existing IDs that fulfill that regular expression.
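To illustrate the regular-expression point: the sketch below checks UniProtKB accessions against the accession format published in the UniProt documentation (assumed current; identifiers.org keeps its own pattern per namespace, which may differ). The identifier `P99O12` from the example above passes even though it does not exist in the database, so a pattern can only rule out malformed IDs, not confirm that an entry exists.

```python
import re

# UniProtKB accession format as documented by UniProt (assumed current).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_wellformed_uniprot(accession: str) -> bool:
    """True if the string has the shape of a UniProtKB accession.

    This says nothing about whether the entry actually exists.
    """
    return UNIPROT_ACCESSION.fullmatch(accession) is not None


for candidate in ["P12345", "P99O12", "12345", "p12345"]:
    print(candidate, is_wellformed_uniprot(candidate))
```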

 avatar commented on August 23, 2024

👍 I really like the functionality of seeing in one place which backups/alternatives would be available to resolve an identifier, too.

AlasdairGray commented on August 23, 2024

@robertgiessmann wrote

@AlasdairGray, I understand you mean "REST" (including the HTTP actions: GET, PUT, POST, DELETE...) should not be the standard approach (meaning: that it is required to react to all actions), but just replying to GET (which will send back the webpage then)?

Actually, I found best-practice for answering to an entity identifier to be a reply with one (open question: which?) status code "30x" (301, 302, 303, 308) (cf https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection). This redirects you to a page with information about the entity, then. What do you think, @AlasdairGray @proccaserra ? Is my point after all a relevant discussion? 🙈

Content negotiation does not necessarily mean redirection. It means that you get the form of content that you requested back, so from the same IRI I can get HTML, JSON, CSV, etc without needing to be passed onto a different location.

@Chris-Evelo wrote

I think that is an overly complex way to say you want to just have the page returned that the identifier refers to.

I agree with Chris that bringing in REST is an unnecessary complication at this point. Resolution of an IRI means GET from the HTTP protocol, there is no need to implement a web service in order to achieve this functionality. What we need is that something can be retrieved from the IRI.
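A simplified sketch of content negotiation as described here: one IRI, several representations, chosen from the client's `Accept` header with a plain GET and no redirect. The `REPRESENTATIONS` table is hypothetical, and the matching ignores parts of the HTTP rules (e.g. precedence of `text/html` over `text/*` at equal weight), so treat it as an illustration rather than a complete implementation.

```python
from typing import Optional

# Hypothetical representations served from the same IRI, keyed by media type.
REPRESENTATIONS: dict[str, str] = {
    "text/html": "<html><body>Gene 42</body></html>",
    "application/ld+json": '{"@id": "https://data.example.org/id/gene/42"}',
    "text/csv": "id,label\n42,example gene\n",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so equally weighted types keep the client's order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept: str, default: str = "text/html") -> Optional[str]:
    """Pick the media type to serve, or None (answer 406 Not Acceptable)."""
    if not accept.strip():
        return default
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            major = media_type[:-1]  # e.g. "text/"
            for available in REPRESENTATIONS:
                if available.startswith(major):
                    return available
        elif media_type in REPRESENTATIONS:
            return media_type
    return None
```

A real server would also send `Vary: Accept` so that caches keep the representations of the same IRI apart.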

AlasdairGray commented on August 23, 2024

The Understanding URLs section of the recipe uses URI and URL interchangeably without explaining that in most situations they can be used interchangeably. It also does not mention IRI.
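On the IRI point: an IRI may contain non-ASCII characters, and RFC 3987 maps it to a URI by percent-encoding those characters as UTF-8. A simplified sketch with Python's standard library, assuming an ASCII host name (internationalised domain names need IDNA/punycode instead, which this does not handle):

```python
from urllib.parse import quote

# Characters left untouched: reserved URI delimiters plus '%' (already-encoded octets).
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode (as UTF-8) characters that are not allowed in a URI.

    Simplification: assumes the host name is already ASCII.
    """
    return quote(iri, safe=_KEEP)


print(iri_to_uri("http://example.org/gène"))
```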

AlasdairGray commented on August 23, 2024

Again in the Understanding URLs section, there is hyperlink syntax for URL and HTTP, but the links are empty. Are you intending to point to another recipe, out to Wikipedia, or something else?

AlasdairGray commented on August 23, 2024

I found that there was no description of some parts of the URL structure. I've added these as part of commit 1e52376.
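The components a URL is commonly broken into can be inspected with Python's standard `urllib.parse.urlsplit`. The second URL below is the identifiers.org example quoted earlier in this thread; the first is a made-up one with every component filled in. This is a sketch for orientation and does not reproduce the recipe's own wording.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Break a URL into its named components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # e.g. "https"
        "host": parts.hostname,      # lower-cased host name
        "port": parts.port,          # None if not given explicitly
        "path": parts.path,
        "query": parts.query,        # text after '?', without the '?'
        "fragment": parts.fragment,  # text after '#', not sent to the server
    }


for url in [
    "https://www.example.org:8443/data/item?format=json#section-2",
    "https://identifiers.org/resolve?query=uniprot:P99O12",
]:
    print(url_parts(url))
```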

AlasdairGray commented on August 23, 2024

@proccaserra should there be a discussion of hash vs slash based IRIs?

Note that this is a different usage of the term 'hash' from the hash generated identifiers discussed in the first part of the recipe.

sgtp commented on August 23, 2024

You are right, of course. If UniProt removes or changes the actual ID, then it will indeed not be persistent. Of course, UniProt should not do that; instead it should deprecate old IDs and have them link to new entries. We have seen complicated examples of this in the past, especially in UniGene, where gene clusters were split and one old ID linked to multiple new ones.

What I meant to write is that identifiers.org makes the resolution persistent. If the ID is still available in UniProt (or even in a copy after an original database disappears), then identifiers.org can make sure the ID is resolved to that new location.

For the other aspect. I know @nsjuty used to set up regular expressions that will filter out IDs that do not conform to the database identifier scheme. That allows giving warnings when an ID should not be in the database. But yes typically you can also form non-existing IDs that fulfill that regular expression.

The question is what you want as a "persistent" identifier. That the thing will be resolvable forever is something nobody can really guarantee. But you want that, as long as the resource produces information, it uses URIs coherently and never re-assigns them. So the source is really the authority here. Identifiers.org helps in brokering translations for non-URIified resources.

sgtp commented on August 23, 2024

@proccaserra should there be a discussion of hash vs slash based IRIs?

Note that this is a different usage of the term 'hash' from the hash generated identifiers discussed in the first part of the recipe.

Very old thread...
I think # puts you in a corner for a range of possible usages.
If you think that you may want URIs as http://whatever/planet/country/city/street/house... the moment you put a # you limit how you can expand the pattern. Maybe it's good if you want to hard-code a set (like for earlier ontologies).
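A small illustration of the practical difference, using Python's `urllib.parse.urldefrag`: an HTTP client strips the fragment (everything after `#`) before sending a request, so every hash IRI under one base is served by the same document, whereas each slash IRI reaches the server on its own and can be resolved, or extended with deeper path levels, individually. The example IRIs are hypothetical.

```python
from urllib.parse import urldefrag


def request_target(iri: str) -> str:
    """URL an HTTP client actually requests for this IRI (fragment removed)."""
    return urldefrag(iri).url


hash_iris = ["http://example.org/onto#Gene", "http://example.org/onto#Protein"]
slash_iris = ["http://example.org/onto/Gene", "http://example.org/onto/Protein"]

# Both hash IRIs collapse to one request: the server only ever sees ".../onto".
print({request_target(i) for i in hash_iris})
# Slash IRIs stay distinct, so the server can answer each term separately.
print({request_target(i) for i in slash_iris})
```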

Chris-Evelo commented on August 23, 2024

You are right, of course. If UniProt removes or changes the actual ID, then it will indeed not be persistent. Of course, UniProt should not do that; instead it should deprecate old IDs and have them link to new entries. We have seen complicated examples of this in the past, especially in UniGene, where gene clusters were split and one old ID linked to multiple new ones.
What I meant to write is that identifiers.org makes the resolution persistent. If the ID is still available in UniProt (or even in a copy after an original database disappears), then identifiers.org can make sure the ID is resolved to that new location.
For the other aspect. I know @nsjuty used to set up regular expressions that will filter out IDs that do not conform to the database identifier scheme. That allows giving warnings when an ID should not be in the database. But yes typically you can also form non-existing IDs that fulfill that regular expression.

The question is what you want as a "persistent" identifier. That the thing will be resolvable forever is something nobody can really guarantee. But you want that, as long as the resource produces information, it uses URIs coherently and never re-assigns them. So the source is really the authority here. Identifiers.org helps in brokering translations for non-URIified resources.

Yes, I do agree. However, there may be advantages in using separate resolver URIs even for resources that do provide their own. Some recent examples where that could have been beneficial even if the resource had provided its own URIs:

  1. The change in database organisation at NCBI, where Entrez Gene identifiers are now part of NCBI Gene. Of course, NCBI could have forwarded URIs there (in fact, it may have; I am not really sure about the current status of NCBI RDF). But by having a separate resolver, the community could support that even if NCBI does not.
  2. The renaming of a number of genes by HGNC to prevent the use of common words and the typical autocorrect problems in tools like Excel. Again, NCBI could offer resolution of the old names to the new ones, but being able to do that separately too can be an advantage. (And yes, you could have used HGNC gene IDs instead of HGNC gene labels, but then you give up on the idea that it is helpful if humans understand the data too.)
  3. EBI stopped support (updates) for RDF. If it goes away completely and we used BridgeDb URIs (e.g. in Open PHACTS), we can just update the resolution.

Note that all these examples are really about ID mapping from old IDs to new IDs, and maybe mapping between ID types. So we should also cover that in the recipe about mapping. For the HGNC problem, we are actually looking into whether we can provide a separate BridgeDb linkset.
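A minimal sketch of the old-to-new ID mapping these examples call for, including the one-to-many case described for UniGene and chains of successive retirements. The identifiers and the `REPLACED_BY` table are hypothetical placeholders, not real HGNC, NCBI or BridgeDb data; in practice the table would come from the source's deprecation records or an ID-mapping service such as BridgeDb.

```python
# Hypothetical mapping of retired identifiers to their replacements. One old ID
# may map to several new ones, and a replacement may itself be retired later.
REPLACED_BY: dict[str, list[str]] = {
    "OLD:1": ["NEW:10"],
    "OLD:2": ["NEW:20", "NEW:21"],  # one-to-many split
    "NEW:20": ["NEWER:200"],        # chained retirement
}


def current_ids(identifier: str, replaced_by: dict[str, list[str]] = REPLACED_BY) -> list[str]:
    """Follow replacement links until only current (non-retired) identifiers remain."""
    result: list[str] = []
    stack = [identifier]
    seen: set[str] = set()  # guards against cycles in the mapping
    while stack:
        ident = stack.pop()
        if ident in seen:
            continue
        seen.add(ident)
        successors = replaced_by.get(ident)
        if successors:
            # Reverse so successors are processed in their listed order.
            stack.extend(reversed(successors))
        else:
            result.append(ident)
    return result


print(current_ids("OLD:2"))
```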

sgtp commented on August 23, 2024

I can't see the text of the whole recipe, but maybe, from what I get in the thread, we can articulate this in different scenarios:

  • If you are a resource provider, make sure that you follow good policies for the IDs, and that you guarantee resolution as long as the resource is actively produced.
  • If you are a repository or integrator, consider adding/using URIs that abstract away from the source, to guarantee that your consumers will not be affected by updates or other changes where IDs can be traced back.
  • If you are a user consuming data...
