Comments (5)
Yes. The responses were very helpful.
from commoncoreontologies.
The standard schema for mapping data is to create an IBE (Information Bearing Entity) for each value (i.e. the content of each cell in each column). So, in most cases, if you have 4 columns of data to map, you will have 4 IBEs per row.
In the use case you mention, you want to identify that all of this information came from a single file. We handle that by creating an additional IBE to represent the file and then linking the file IBE to the other IBEs using the ro:has_part relation. Note: you may wish to use a subclass of IBA (Information Bearing Artifact), such as cco:Spreadsheet, instead of the standard IBE to represent the Excel file.
I assume the data properties listed above are shorthand for a fuller graph representation since we want to avoid burying semantic content in relations. The resulting mapping for your 4 columns of data should look something like:
Artifact1 ---designated_by---> DesignativeName1 ---inheres_in---> IBE1 ---has_text_value---> "literal value"
Artifact1 ---prescribed_by---> ArtifactModel1 ---designated_by---> ArtifactModelName1 ---inheres_in---> IBE2 ---has_text_value---> "literal value"
Artifact1 ---bearer_of---> Weight1 ---is_measured_by---> MeasurementICE1 ---inheres_in---> IBE3 ---has_decimal_value---> "literal value"
IBE3 ---uses_measurement_unit---> URI of pound measurement unit
Artifact1 ---bearer_of---> Length1 ---is_measured_by---> MeasurementICE2 ---inheres_in---> IBE4 ---has_decimal_value---> "literal value"
IBE4 ---uses_measurement_unit---> URI of inch measurement unit
Note that the measurement units are linked to the IBEs that have the appropriate values.
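The four-column pattern above can be sketched in plain Python, using (subject, predicate, object) tuples to stand in for RDF triples. Instance labels such as Artifact1 and IBE1_1, and the unit URIs, are illustrative placeholders rather than canonical CCO IRIs:

```python
# Sketch of the IBE-per-cell mapping pattern; all names are illustrative.
POUND = "unit:pound"   # hypothetical measurement-unit URIs
INCH = "unit:inch"

def map_row(i, name, model_name, weight, length):
    """Return the triples for one spreadsheet row: one IBE per cell."""
    return [
        # Column 1: name of the artifact
        (f"Artifact{i}", "designated_by", f"DesignativeName{i}"),
        (f"DesignativeName{i}", "inheres_in", f"IBE{i}_1"),
        (f"IBE{i}_1", "has_text_value", name),
        # Column 2: name of the artifact's model
        (f"Artifact{i}", "prescribed_by", f"ArtifactModel{i}"),
        (f"ArtifactModel{i}", "designated_by", f"ArtifactModelName{i}"),
        (f"ArtifactModelName{i}", "inheres_in", f"IBE{i}_2"),
        (f"IBE{i}_2", "has_text_value", model_name),
        # Column 3: weight measurement, with its unit linked to the IBE
        (f"Artifact{i}", "bearer_of", f"Weight{i}"),
        (f"Weight{i}", "is_measured_by", f"MeasurementICE{i}_1"),
        (f"MeasurementICE{i}_1", "inheres_in", f"IBE{i}_3"),
        (f"IBE{i}_3", "has_decimal_value", weight),
        (f"IBE{i}_3", "uses_measurement_unit", POUND),
        # Column 4: length measurement, likewise
        (f"Artifact{i}", "bearer_of", f"Length{i}"),
        (f"Length{i}", "is_measured_by", f"MeasurementICE{i}_2"),
        (f"MeasurementICE{i}_2", "inheres_in", f"IBE{i}_4"),
        (f"IBE{i}_4", "has_decimal_value", length),
        (f"IBE{i}_4", "uses_measurement_unit", INCH),
    ]

row1 = map_row(1, "Widget A", "Model X", 67.3, 107.0)
```

Each row contributes one IBE per cell, and the two measurement IBEs additionally carry their unit links.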
To assert that all of this information is from a particular Excel file, or, even more specifically, from a particular line in a particular Excel file, you would also include:
Spreadsheet1 ---has_part---> InformationLine1
InformationLine1 ---has_part---> IBE1
InformationLine1 ---has_part---> IBE2
InformationLine1 ---has_part---> IBE3
InformationLine1 ---has_part---> IBE4
The result should be 1 spreadsheet with as many rows as needed, each of which links to the specific items included in that row. You may also want to add information about the spreadsheet, such as when it was created, who created it, and what its name is. Doing so will produce a lot of provenance data that can be easily retrieved using queries.
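The file- and line-level linking can be sketched in the same (subject, predicate, object)-tuple style (names illustrative, not canonical CCO terms); a simple has_part lookup then retrieves everything recorded on a given line:

```python
# Sketch of file/line provenance linking; all instance names are illustrative.
triples = [
    ("Spreadsheet1", "has_part", "InformationLine1"),
    ("InformationLine1", "has_part", "IBE1"),
    ("InformationLine1", "has_part", "IBE2"),
    ("InformationLine1", "has_part", "IBE3"),
    ("InformationLine1", "has_part", "IBE4"),
]

def parts_of(graph, whole):
    """Retrieve everything linked to `whole` via has_part (one level down)."""
    return [o for s, p, o in graph if s == whole and p == "has_part"]

parts_of(triples, "InformationLine1")  # -> ["IBE1", "IBE2", "IBE3", "IBE4"]
```

A real deployment would express this as a SPARQL query over the stored graph; the lookup here only illustrates the shape of the retrieval.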
Thanks, Alex, for the answer. That was very helpful. Now I have a clear understanding of how to link ICEs to IBEs and their values.
I have a question about when we eliminate the IBE, for cases in which provenance data is not required or just for the sake of computational efficiency. My understanding is that we can use the annotation property 'is tokenized by' to directly link an ICE to its value. For example:
Artifact1 ---bearer_of---> Length1 ---is_measured_by---> MeasurementICE2 ---is_tokenized_by---> "literal value"
MeasurementICE2 ---is_tokenized_by---> URI of inch measurement unit
So I am using the 'is tokenized by' property for both the value of the measured length and its unit of measurement. Is this the right way to do it? Don't we need to create more specialized sub-types of 'is tokenized by' to designate different types of values in the range of the property?
Farhad
The property is_tokenized_by is, as you say, intended to simplify some representations by providing a shortcut relation that holds between the ICE and the literal value. The simplification typically comes at the expense of provenance (though there is nothing preventing you from combining the standard -- ICE inheres_in IBE has_value "literal" -- representation with the use of ICE is_tokenized_by "literal" if you'd like to simplify queries without foregoing provenance). As you note, the simplification also inhibits the use of semantics that attach to IBEs. This includes the use of measurement units, languages, and spatial or temporal reference systems.
You could avoid this shortcut shortcoming by pre-processing your data so the measurement values and units appear as a single literal (e.g. 107 inches, or 67.3 lbs.). Doing so would allow you to use a single is_tokenized_by relation to connect a MeasurementICE to a useful literal value; however, you'll have to weigh the benefits of this approach against your data retrieval and querying needs. For example, suppose you want to find all measurements that use imperial units (perhaps because you want to convert them to metric units). If you used the standard representation you can search for any IBE that uses a measurement unit from a list of imperial units. There will be no ambiguity or guesswork involved in identifying these values assuming you mapped them properly. Alternatively, if you used is_tokenized_by to link to literals that include textual measurement units, you will have to perform a series of regex searches to (hopefully) find every literal that uses imperial units. Depending on your data sources, this could be a painful task (e.g. search for 'pound', 'pounds', 'lb', 'lbs', 'lbs.', and so on).
In general, we do not recommend using the is_tokenized_by shortcut for standard data representations. This is especially the case for measurements, data that is expected to change (e.g. a person's weight, height, or age), and data for which provenance is important (e.g. who reported or measured it, when was it reported or measured, what method was used, etc.). In such cases, you want to be able to identify all of the relevant information and the CCO accomplishes a lot of this work through the use of IBEs.
The shortcut is primarily used to simplify visual diagrams where provenance is irrelevant. It can, however, also be useful when mapping certain types of data, especially designators that may appear many times in a dataset.
For example, you may want to say that a particular person only has 1 name (e.g. 'John Example Doe'), but acknowledge that that person's name may be written in various ways (e.g. 'John Example Doe', 'John Doe', 'John E. Doe', 'JE Doe', 'JD', etc.). You may also be uninterested in tracking the provenance of every token of the person's name each time it occurs in your data. Using is_tokenized_by to link the name to the literal can be useful here. If the person John Example Doe appears 100 times in your data and you used is_tokenized_by to link the name to the literal, you will end up with 1 triple for each variation on the spelling of John Example Doe's name instead of 100 triples linking the nameICEs to the IBEs and another 100 triples linking the IBEs to the literals. This is where the shortcut can reduce complexity and improve query results.
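The triple-count saving can be made concrete with a rough sketch (hypothetical names and counts): 100 occurrences of a name with 5 spelling variants yield 5 is_tokenized_by triples, versus 200 triples under the full ICE-to-IBE-to-literal pattern:

```python
# 100 occurrences of the same person's name, across 5 spelling variants.
occurrences = ["John Example Doe", "John Doe", "John E. Doe", "JE Doe", "JD"] * 20

# Shortcut: one is_tokenized_by triple per distinct spelling.
shortcut = {("PersonName1", "is_tokenized_by", tok) for tok in occurrences}

# Full pattern: two triples (inheres_in + has_text_value) per occurrence.
full = []
for i, tok in enumerate(occurrences):
    full.append(("PersonName1", "inheres_in", f"IBE{i}"))
    full.append((f"IBE{i}", "has_text_value", tok))

len(shortcut), len(full)  # -> (5, 200)
```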
@fameri Did these responses resolve your questions?