Describe the bug Editing an embedded distribution inside a dataset

Editing distribution inside a dataset creates new distribution (dkan, 8 comments, open)

stefan-korn commented on July 23, 2024

Editing distribution inside a dataset creates new distribution

from dkan.

Comments (8)

stefan-korn commented on July 23, 2024

@dafeder: Thanks for your reply. In some cases we now rely on "Dataset properties to be stored as separate entities", so I hope you will not remove this feature entirely in the future.

I still remember your offer to connect outside GitHub. We have not managed to schedule anything on our end yet, but I hope to come back to your offer in the near future.


stefan-korn commented on July 23, 2024

Please see PR #4055 for an idea of how to handle this differently, without creating a new distribution (or other embedded element) every time.


dafeder commented on July 23, 2024

@stefan-korn for now it is expected behavior, although we are aware it is counter-intuitive. Because distributions are linked to datasets through a reference, and the reference system as it exists now only uses node UUIDs and not version IDs, there is no way to see previous versions of the dataset if we don't know which version of the distribution to load inside of it. So for now, datasets are versioned but referenced items are not, and are simply saved as new nodes so that both the old and new revisions of the dataset will be dereferenced to show the intended values for the distributions.

I think your PR would suffer from the same issue. We are looking into better solutions for this, since the way it is now does work but doesn't follow expected patterns in Drupal and creates an impractical number of distributions in a lot of cases. One question is, should distributions be referenced at all? Maybe they don't need to be, and we could just store the array of distributions within the dataset node...

If they do need to be stored, we should probably figure out a way to revision them so that we don't keep making new ones. But then we need to rethink references to include the version ID, otherwise we could be showing incorrect data for old revisions of the dataset.
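To make the trade-off above concrete, here is a minimal Python sketch of that last idea: revisioning distributions and putting the version ID into the reference. It is not DKAN code. The `Store` class, its `save`, `resolve` and `resolve_latest` methods, and the `(uuid, version)` reference shape are all hypothetical, chosen only to illustrate why a UUID-only reference cannot reproduce an old dataset revision.

```python
import uuid


class Store:
    """Hypothetical revisioned store: uuid -> list of revisions (list index = version ID)."""

    def __init__(self):
        self.revisions = {}

    def save(self, data, item_id=None):
        """Append a revision and return a (uuid, version) reference to it."""
        if item_id is None:
            item_id = str(uuid.uuid4())
        versions = self.revisions.setdefault(item_id, [])
        versions.append(dict(data))
        return (item_id, len(versions) - 1)

    def resolve(self, ref):
        """Version-aware lookup: load exactly the revision the reference points at."""
        item_id, version = ref
        return self.revisions[item_id][version]

    def resolve_latest(self, item_id):
        """All a UUID-only reference can do: load the newest revision."""
        return self.revisions[item_id][-1]


store = Store()

# Dataset revision 1 references distribution version 0.
ref_v0 = store.save({"downloadURL": "https://example.org/data-2023.csv"})
dataset_rev1 = {"title": "Budget", "distribution": [ref_v0]}

# Editing the distribution adds a revision under the same UUID instead of a new node.
ref_v1 = store.save({"downloadURL": "https://example.org/data-2024.csv"}, item_id=ref_v0[0])
dataset_rev2 = {"title": "Budget", "distribution": [ref_v1]}

old_url = store.resolve(dataset_rev1["distribution"][0])["downloadURL"]
new_url = store.resolve(dataset_rev2["distribution"][0])["downloadURL"]
uuid_only_url = store.resolve_latest(ref_v0[0])["downloadURL"]
```

In this sketch the old dataset revision still resolves to the 2023 file, while a UUID-only lookup would show the 2024 file for both revisions. That mismatch is the incorrect data for old revisions described above, and avoiding it without version IDs is what currently forces a new node per edit.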


stefan-korn commented on July 23, 2024

@dafeder : Thanks for the explanation.

What do you mean by "One question is, should distributions be referenced at all? Maybe they don't need to be, and we could just store the array of distributions within the dataset node"?

No node for the distributions, saving them together with dataset? Technically there is this option now with unchecking this metastore setting?

"Dataset properties to be stored as separate entities"

Though this probably won't work because of some special handling of the distribution. More generally, I suppose the problem of creating new entities for references also affects other properties that are stored as separate entities and edited inside the dataset.
One thing that looks a bit difficult to me is that the hash of the property's values is used to decide whether a new entity is created. If you allow only a few of the property's values to be edited inside the dataset, and the full range of values only when editing the property separately, this may cause some trouble (though I am not sure whether this is considered valid practice now).
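The concern about hashing can be illustrated with a small Python sketch. It assumes, purely for illustration, that an entity's identity is a SHA-256 hash of its canonical JSON values; the `value_hash` and `save_property` helpers and the publisher fields are hypothetical, and the hash function DKAN actually uses may differ.

```python
import hashlib
import json

# Hypothetical store of property entities, keyed by the hash of their values.
entities = {}


def value_hash(values):
    """Hash a canonical JSON encoding so key order does not affect identity."""
    canonical = json.dumps(values, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_property(values):
    """Return (identifier, created); a new entity is made only for unseen values."""
    key = value_hash(values)
    created = key not in entities
    if created:
        entities[key] = dict(values)
    return key, created


# Full publisher record, as edited on its own form (assumed fields).
full = {"name": "City of Example", "email": "data@example.org"}
# The same publisher as submitted from a dataset form that only exposes `name`.
partial = {"name": "City of Example"}

_, created_full = save_property(full)
_, created_again = save_property(full)
_, created_partial = save_property(partial)
```

Under these assumptions, saving identical values reuses the existing entity, but the partial record hashes differently and produces a second publisher entity, which is the kind of trouble described above.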


dafeder commented on July 23, 2024

"No node for the distributions, saving them together with dataset? Technically there is this option now with unchecking this metastore setting?"

There is. I think the datasets would save but most likely other things would break. Certainly datastore would not work, and the frontend would have to be refactored. But I think in general we are not particularly well-served by having distributions be saved separately. The important thing is the dataset-to-resource relationship and the distribution ID/reference complicates it with no real benefits as far as I can see. So factoring out the distribution-specific code and just finding a way to signal to DKAN where resource URLs can be found in the dataset object would I think resolve a lot of these problems and also open the door to more diverse schema structures we could support.
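One way to picture "signalling where resource URLs can be found" is a configurable path into the dataset object. The Python sketch below is only an illustration of that idea, not an existing DKAN feature: the `find_resource_urls` helper and the `distribution[].downloadURL` path syntax are made up for this example.

```python
def find_resource_urls(dataset, path):
    """Walk a dotted path through a dataset dict; a '[]' suffix means 'iterate this list'."""
    current = [dataset]
    for part in path.split("."):
        is_list = part.endswith("[]")
        key = part[:-2] if is_list else part
        next_items = []
        for item in current:
            if not isinstance(item, dict) or key not in item:
                continue  # tolerate entries that lack the field
            value = item[key]
            if is_list:
                if isinstance(value, list):
                    next_items.extend(value)
            else:
                next_items.append(value)
        current = next_items
    # Only string values count as URLs in this sketch.
    return [value for value in current if isinstance(value, str)]


# Dataset with distributions embedded directly, with no separate nodes or references.
dataset = {
    "title": "Budget",
    "distribution": [
        {"title": "2023", "downloadURL": "https://example.org/a.csv"},
        {"title": "Documentation only"},
        {"title": "2024", "downloadURL": "https://example.org/b.csv"},
    ],
}

urls = find_resource_urls(dataset, "distribution[].downloadURL")
```

A different schema could then point at its resources with a different path, without any distribution-specific code.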


dafeder commented on July 23, 2024

Another reason for having distributions decoupled from datasets was that in theory you could have datasets that are published with distributions that are not. But I think few people are doing this and the same thing could be accomplished by having a published version of the dataset w/o the distribution and an unpublished draft that has it.


stefan-korn commented on July 23, 2024

@dafeder: Coming back to your explanations on distributions, I would like to know whether you would prefer (in the long term) to get rid of any "Dataset properties to be stored as separate entities" in the metastore (/admin/dkan/properties). Or do you still see this as a valid approach?

I am currently thinking about integrating publishers into Search API by providing a SearchApiDatasource and ComplexDataFacade, analogous to how it is done for the dataset. That would rely on publishers being saved as separate entities. So if DKAN is moving away from the concept of saving dataset properties as separate entities, I might not want to go this way with the Search API integration.


dafeder commented on July 23, 2024

@stefan-korn it is a bit of an open question still, to be honest. I think having distributions as separate entities is maybe more trouble than it's worth, but there are a lot of situations where publishers really need to exist in the system somehow independently of the datasets. I would love to hear more about your use case and experience, maybe we can find a way to connect outside of GitHub? :)

