GithubHelp home page GithubHelp logo

Comments (8)

stefan-korn avatar stefan-korn commented on July 23, 2024 1

@dafeder : Thanks for your reply. We are now relying in some cases on "Dataset properties to be stored as separate entities", so hopefully you will not remove this feature entirely in the future.

I still remember your offer for connection outside Github. We did not manage to get a scheduling on our end yet, but I will hopefully can come back to your offer in near future.

from dkan.

stefan-korn avatar stefan-korn commented on July 23, 2024

Please see PR #4055 for an idea how to handle this differently without creating a new distribution (or other embedded elements) every tme.

from dkan.

dafeder avatar dafeder commented on July 23, 2024

@stefan-korn for now it is expected behavior, although we are aware it is counter-intuitive. Because distributions are linked to datasets through a reference, and the reference system as it exists now only uses node UUIDs and not version IDs, there is no way to see previous versions of the dataset if we don't know which version of the distribution to load inside of it. So for now, datasets are versioned but referenced items are not, and are simply saved as new nodes so that both the old and new revisions of the dataset will be dereferenced to show the intended values for the distributions.

I think your PR would suffer from the same issue. We are looking into better solutions for this, since the way it is now does work but doesn't follow expected patterns in Drupal and creates an impractical number of distributions in a lot of cases. One question is, should distributions be referenced at all? Maybe they don't need to be, and we could just store the array of distributions within the dataset node...

If they do need to be stored, we should probably figure out a way to revision them so that we don't keep making new ones. But then we need to rethink references to include the version ID, otherwise we could be showing incorrect data for old revisions of the dataset.

from dkan.

stefan-korn avatar stefan-korn commented on July 23, 2024

@dafeder : Thanks for the explanation.

What do you mean by

One question is, should distributions be referenced at all? Maybe they don't need to be, and we could just store the array of distributions within the dataset node

No node for the distributions, saving them together with dataset? Technically there is this option now with unchecking this metastore setting?

Dataset properties to be stored as separate entities

Though this probably won't work because of some special handling of the distribution. But in a more general way, I suppose the problem with creating new entities for references is prevalent for other properties too that are stored as separate entities and will be edited inside the dataset.
One thing that does look a bit difficult to me is, that the the hash of the properties values is considered for deciding whether a new entity is created or not. Then if you maybe allow only a few values of the property be edited inside dataset and the full range of values only in the separate editing of the property, this maybe cause some troubles (though I am not sure, if this is considered to be valid practice now).

from dkan.

dafeder avatar dafeder commented on July 23, 2024

No node for the distributions, saving them together with dataset? Technically there is this option now with unchecking this metastore setting?

There is. I think the datasets would save but most likely other things would break. Certainly datastore would not work, and the frontend would have to be refactored. But I think in general we are not particularly well-served by having distributions be saved separately. The important thing is the dataset-to-resource relationship and the distribution ID/reference complicates it with no real benefits as far as I can see. So factoring out the distribution-specific code and just finding a way to signal to DKAN where resource URLs can be found in the dataset object would I think resolve a lot of these problems and also open the door to more diverse schema structures we could support.

from dkan.

dafeder avatar dafeder commented on July 23, 2024

Another reason for having distributions decoupled from datasets was that in theory you could have datasets that are published with distributions that are not. But I think few people are doing this and the same thing could be accomplished by having a published version of the dataset w/o the distribution and an unpublished draft that has it.

from dkan.

stefan-korn avatar stefan-korn commented on July 23, 2024

@dafeder : Coming back to your explanations on distribution I would like to know if you prefer (in the long term) to get rid of any "Dataset properties to be stored as separate entities" in the metastore (/admin/dkan/properties)? Or do you still see this as a valid approach?

I am currently thinking about integrating publishers in search api by providing a SearchApiDatasource and ComplexDataFacade analogous to how it is done with the dataset. If doing this, it would rely on publishers being saved as separate entities. Therefore if DKAN will be going away from the concept of saving dataset properties as separate entities, I maybe would not want to go this way with the Search API integration.

from dkan.

dafeder avatar dafeder commented on July 23, 2024

@stefan-korn it is a bit of an open question still to be honest. I think having distributions as separate entities is maybe more trouble than its worth, but there are a lot of situations where publishers really need to exist in the system somehow independently of the datasets. I would love to hear more about your use case and experience, maybe we can find a way to connect outside of github? :)

from dkan.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.