Comments (8)
@dafeder : Thanks for your reply. We now rely in some cases on "Dataset properties to be stored as separate entities", so hopefully you will not remove this feature entirely in the future.
I still remember your offer to connect outside GitHub. We have not managed to schedule anything on our end yet, but I hope to come back to your offer in the near future.
from dkan.
Please see PR #4055 for an idea of how to handle this differently without creating a new distribution (or other embedded elements) every time.
@stefan-korn for now it is expected behavior, although we are aware it is counter-intuitive. Because distributions are linked to datasets through a reference, and the reference system as it exists now only uses node UUIDs and not version IDs, there is no way to see previous versions of the dataset if we don't know which version of the distribution to load inside of it. So for now, datasets are versioned but referenced items are not, and are simply saved as new nodes so that both the old and new revisions of the dataset will be dereferenced to show the intended values for the distributions.
I think your PR would suffer from the same issue. We are looking into better solutions for this, since the way it is now does work but doesn't follow expected patterns in Drupal and creates an impractical number of distributions in a lot of cases. One question is, should distributions be referenced at all? Maybe they don't need to be, and we could just store the array of distributions within the dataset node...
If they do need to be stored, we should probably figure out a way to revision them so that we don't keep making new ones. But then we need to rethink references to include the version ID, otherwise we could be showing incorrect data for old revisions of the dataset.
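To make the versioning problem above concrete, here is a minimal Python sketch. It is not DKAN's actual (PHP) code: `Store`, `save`, `load`, and the `vid` key are hypothetical names chosen for illustration. It assumes a store where each UUID has a list of revisions, and compares a reference that carries only the UUID with one that also carries a revision ID.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Store:
    """Hypothetical in-memory store: uuid -> list of revisions (index = revision ID)."""
    revisions: dict = field(default_factory=dict)

    def save(self, uuid: str, data: dict) -> int:
        """Append a new revision and return its revision ID."""
        self.revisions.setdefault(uuid, []).append(data)
        return len(self.revisions[uuid]) - 1

    def load(self, uuid: str, vid: Optional[int] = None) -> dict:
        """Load a specific revision, or the latest one if no revision ID is given."""
        history = self.revisions[uuid]
        return history[-1] if vid is None else history[vid]


store = Store()

# Revision 0 of the distribution, and a dataset revision that references it.
dist_v0 = store.save("dist-1", {"downloadURL": "https://example.com/old.csv"})
dataset_old = {"title": "Example", "distribution": {"uuid": "dist-1", "vid": dist_v0}}

# The distribution is then edited in place (a new revision under the same UUID).
dist_v1 = store.save("dist-1", {"downloadURL": "https://example.com/new.csv"})

ref = dataset_old["distribution"]
uuid_only = store.load(ref["uuid"])              # ignores the revision ID
versioned = store.load(ref["uuid"], ref["vid"])  # honors the revision ID
```

With the UUID-only lookup, the old dataset revision is dereferenced to the new distribution values; with the revision ID included, it resolves to the values that were current when that dataset revision was saved.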
@dafeder : Thanks for the explanation.
What do you mean by
> One question is, should distributions be referenced at all? Maybe they don't need to be, and we could just store the array of distributions within the dataset node
No node for the distributions, saving them together with dataset? Technically there is this option now with unchecking this metastore setting?
"Dataset properties to be stored as separate entities"
Though this probably won't work because of some special handling of the distribution. More generally, I suppose the problem of creating new entities for references also affects other properties that are stored as separate entities and edited inside the dataset.
One thing that looks a bit difficult to me is that the hash of the property's values is used to decide whether a new entity is created. If you allow only a few values of the property to be edited inside the dataset, and the full range of values only when editing the property separately, this may cause some trouble (though I am not sure whether this is considered valid practice now).
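The concern can be illustrated with a minimal Python sketch. This is an assumption about the general mechanism, not a copy of DKAN's implementation: `property_hash` and `reference_for` are hypothetical names, and MD5 over sorted JSON is an arbitrary choice of hash. If the identity of a referenced entity is derived from a hash of its full value, saving a value with only a subset of the fields yields a different hash and therefore a new entity.

```python
import hashlib
import json


def property_hash(value: dict) -> str:
    """Hash a property value; key order is normalized so equal values hash equally."""
    canonical = json.dumps(value, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def reference_for(value: dict, entities: dict) -> str:
    """Return the ID of the entity storing this value, creating one if the hash is new."""
    key = property_hash(value)
    entities.setdefault(key, value)
    return key


entities = {}
full = {"name": "City of Example", "email": "data@example.com"}
partial = {"name": "City of Example"}  # e.g. only a subset is editable in the dataset form

ref_full = reference_for(full, entities)
# Same fields in a different key order: same hash, so the existing entity is reused.
ref_same = reference_for(dict(reversed(list(full.items()))), entities)
# Fewer fields: different hash, so a second entity is created.
ref_partial = reference_for(partial, entities)
```

Here the partial value ends up as a second stored entity next to the full one, even though both describe the same publisher.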
> No node for the distributions, saving them together with dataset? Technically there is this option now with unchecking this metastore setting?
There is. I think the datasets would save but most likely other things would break. Certainly datastore would not work, and the frontend would have to be refactored. But I think in general we are not particularly well-served by having distributions be saved separately. The important thing is the dataset-to-resource relationship and the distribution ID/reference complicates it with no real benefits as far as I can see. So factoring out the distribution-specific code and just finding a way to signal to DKAN where resource URLs can be found in the dataset object would I think resolve a lot of these problems and also open the door to more diverse schema structures we could support.
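As a rough illustration of what "signal to DKAN where resource URLs can be found in the dataset object" could look like, here is a minimal Python sketch. The path format and the function `find_resource_urls` are invented for this example and are not an existing DKAN API; only the `distribution` and `downloadURL` keys follow the DCAT-US schema.

```python
from typing import Any, Iterator


def find_resource_urls(node: Any, path: list[str]) -> Iterator[str]:
    """Walk `path` through nested dicts, fanning out over lists along the way."""
    if isinstance(node, list):
        for item in node:
            yield from find_resource_urls(item, path)
    elif not path:
        # End of the path: only string values count as URLs.
        if isinstance(node, str):
            yield node
    elif isinstance(node, dict) and path[0] in node:
        yield from find_resource_urls(node[path[0]], path[1:])


dataset = {
    "title": "Example dataset",
    "distribution": [
        {"title": "CSV", "downloadURL": "https://example.com/data.csv"},
        {"title": "API docs only"},
        {"title": "JSON", "downloadURL": "https://example.com/data.json"},
    ],
}

urls = list(find_resource_urls(dataset, ["distribution", "downloadURL"]))
```

A configurable path like this would let distributions stay embedded in the dataset while still telling the datastore which values are resources, and a different schema could supply a different path.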
Another reason for having distributions decoupled from datasets was that in theory you could have datasets that are published with distributions that are not. But I think few people are doing this and the same thing could be accomplished by having a published version of the dataset w/o the distribution and an unpublished draft that has it.
@dafeder : Coming back to your explanations on distribution I would like to know if you prefer (in the long term) to get rid of any "Dataset properties to be stored as separate entities" in the metastore (/admin/dkan/properties)? Or do you still see this as a valid approach?
I am currently thinking about integrating publishers into Search API by providing a SearchApiDatasource and ComplexDataFacade, analogous to how it is done for datasets. This would rely on publishers being saved as separate entities. So if DKAN is moving away from the concept of saving dataset properties as separate entities, I might not want to go this way with the Search API integration.
@stefan-korn it is a bit of an open question still, to be honest. I think having distributions as separate entities is maybe more trouble than it's worth, but there are a lot of situations where publishers really need to exist in the system somehow independently of the datasets. I would love to hear more about your use case and experience, maybe we can find a way to connect outside of GitHub? :)