Suppose I have an Excel file that lists the machines in a factory. It has the followin

Information Bearing Entity,about commoncoreontology/commoncoreontologies

Comments (5)

fameri commented on May 27, 2024

Yes. The responses were very helpful.

from commoncoreontologies.

APCox commented on May 27, 2024

The standard schema for mapping data is to create an IBE (Information Bearing Entity) for each value (i.e. the content of each cell in each column). So, in most cases, if you have 4 columns of data to map, you will have 4 IBEs per row.

In the use case you mention, you want to identify that all of this information came from a single file. We handle that by creating an additional IBE to represent the file and then linking the file IBE to the other IBEs using the ro:has_part relation. Note: you may wish to use a subclass of IBA (Information Bearing Artifact), such as cco:Spreadsheet, instead of the standard IBE to represent the Excel file.

I assume the data properties listed above are shorthand for a fuller graph representation since we want to avoid burying semantic content in relations. The resulting mapping for your 4 columns of data should look something like:

Artifact1 ---designated_by---> DesignativeName1 ---inheres_in---> IBE1 ---has_text_value---> "literal value"
Artifact1 ---prescribed_by---> ArtifactModel1 ---designated_by---> ArtifactModelName1 ---inheres_in---> IBE2 ---has_text_value---> "literal value"
Artifact1 ---bearer_of---> Weight1 ---is_measured_by---> MeasurementICE1 ---inheres_in---> IBE3 ---has_decimal_value---> "literal value"
IBE3 ---uses_measurement_unit---> URI of pound measurement unit
Artifact1 ---bearer_of---> Length1 ---is_measured_by---> MeasurementICE2 ---inheres_in---> IBE4 ---has_decimal_value---> "literal value"
IBE4 ---uses_measurement_unit---> URI of inch measurement unit

Note that the measurement units are linked to the IBEs that have the appropriate values.
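
As a rough illustration of the pattern above, here is a minimal Python sketch that emits the same chains as plain (subject, predicate, object) tuples for one row. It assumes the four columns are machine name, model name, weight in pounds, and length in inches. The predicate strings are the shorthand labels used in this thread rather than actual CCO IRIs, and every `ex:` identifier, the `map_row` function, and the sample name and model values are made up for the example.

```python
# Predicate names are the shorthand labels used in this thread, not real IRIs.
# Every "ex:" identifier is hypothetical.
POUND = "ex:PoundMeasurementUnit"
INCH = "ex:InchMeasurementUnit"


def map_row(row, name, model_name, weight_lb, length_in):
    """Return (subject, predicate, object) triples for one spreadsheet row."""
    artifact = f"ex:Artifact{row}"
    # One IBE per cell: four columns give four IBEs for this row.
    ibe1, ibe2, ibe3, ibe4 = (f"ex:IBE{row}_{col}" for col in range(1, 5))
    return [
        # Column 1: the machine's name.
        (artifact, "designated_by", f"ex:DesignativeName{row}"),
        (f"ex:DesignativeName{row}", "inheres_in", ibe1),
        (ibe1, "has_text_value", name),
        # Column 2: the name of the model that prescribes the machine.
        (artifact, "prescribed_by", f"ex:ArtifactModel{row}"),
        (f"ex:ArtifactModel{row}", "designated_by", f"ex:ArtifactModelName{row}"),
        (f"ex:ArtifactModelName{row}", "inheres_in", ibe2),
        (ibe2, "has_text_value", model_name),
        # Column 3: weight. The unit attaches to the IBE that holds the number.
        (artifact, "bearer_of", f"ex:Weight{row}"),
        (f"ex:Weight{row}", "is_measured_by", f"ex:MeasurementICE{row}_3"),
        (f"ex:MeasurementICE{row}_3", "inheres_in", ibe3),
        (ibe3, "has_decimal_value", weight_lb),
        (ibe3, "uses_measurement_unit", POUND),
        # Column 4: length, same shape as weight with a different unit.
        (artifact, "bearer_of", f"ex:Length{row}"),
        (f"ex:Length{row}", "is_measured_by", f"ex:MeasurementICE{row}_4"),
        (f"ex:MeasurementICE{row}_4", "inheres_in", ibe4),
        (ibe4, "has_decimal_value", length_in),
        (ibe4, "uses_measurement_unit", INCH),
    ]


triples = map_row(1, "Lathe A", "LX-200", 67.3, 107.0)
for triple in triples:
    print(triple)
```

In a real mapping these tuples would be written out with an RDF library and the actual CCO IRIs; the tuples here only show the shape of the graph.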

To assert that all of this information is from a particular Excel file, or even more specifically, from a particular line in a particular Excel file, you would also include:

Spreadsheet1 ---has_part---> InformationLine1
InformationLine1 ---has_part---> IBE1
InformationLine1 ---has_part---> IBE2
InformationLine1 ---has_part---> IBE3
InformationLine1 ---has_part---> IBE4

The result should be 1 spreadsheet with as many rows as needed, each of which links to the specific items included in that row. You may also want to add information about the spreadsheet such as when it was created, who created it, and what its name is. Doing so will produce a lot of provenance data that can be easily retrieved using queries.
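
To make the retrieval side concrete, here is a small Python sketch, again using plain tuples instead of a real triple store. It builds the has_part links for one row and then walks has_part transitively to list everything that came from the spreadsheet. The function names and `ex:` identifiers are hypothetical.

```python
def provenance_triples(spreadsheet, line, ibes):
    """Link a spreadsheet to one of its lines, and the line to its IBEs."""
    triples = [(spreadsheet, "has_part", line)]
    triples += [(line, "has_part", ibe) for ibe in ibes]
    return triples


def parts_of(triples, whole):
    """Return every node reachable from `whole` by following has_part."""
    found, seen, stack = [], set(), [whole]
    while stack:
        node = stack.pop()
        for subject, predicate, obj in triples:
            if subject == node and predicate == "has_part" and obj not in seen:
                seen.add(obj)
                found.append(obj)
                stack.append(obj)
    return found


triples = provenance_triples(
    "ex:Spreadsheet1",
    "ex:InformationLine1",
    ["ex:IBE1", "ex:IBE2", "ex:IBE3", "ex:IBE4"],
)
print(parts_of(triples, "ex:Spreadsheet1"))
```

With a SPARQL endpoint the same walk would likely be a property path over has_part; the loop above is only a stand-in for that query.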

fameri commented on May 27, 2024

Thanks, Alex, for the answer. That was very helpful. Now I have a clear understanding of how to link ICEs to IBEs and their values.

I have a question about eliminating the IBE in cases where provenance data is not required, or simply for the sake of computational efficiency. My understanding is that we can use the annotation property 'is tokenized by' to directly link an ICE to its value. For example:

Artifact1 ---bearer_of---> Length1 ---is_measured_by---> MeasurementICE2 ---is tokenized by---> "literal value"
MeasurementICE2 ---is tokenized by---> URI of inch measurement unit

So I am using the 'is tokenized by' property for both the value of the measured length and its unit of measurement. Is this the right way to do it? Don't we need to create more specialized sub-types of 'is tokenized by' to designate different types of values in the range of the property?

Farhad

APCox commented on May 27, 2024

The property is_tokenized_by is, as you say, intended to simplify some representations by providing a shortcut relation that holds between the ICE and the literal value. The simplification typically comes at the expense of provenance (though there is nothing preventing you from combining the standard -- ICE inheres_in IBE has_value "literal" -- representation with the use of ICE is_tokenized_by "literal" if you'd like to simplify queries without forgoing provenance). As you note, the simplification also inhibits the use of semantics that attach to IBEs. This includes the use of measurement units, languages, and spatial or temporal reference systems.

You could work around this shortcoming of the shortcut by pre-processing your data so that the measurement values and units appear as a single literal (e.g. 107 inches, or 67.3 lbs.). Doing so would allow you to use a single is_tokenized_by relation to connect a MeasurementICE to a useful literal value; however, you'll have to weigh the benefits of this approach against your data retrieval and querying needs. For example, suppose you want to find all measurements that use imperial units (perhaps because you want to convert them to metric units). If you used the standard representation, you can search for any IBE that uses a measurement unit from a list of imperial units. There will be no ambiguity or guesswork involved in identifying these values, assuming you mapped them properly. Alternatively, if you used is_tokenized_by to link to literals that include textual measurement units, you will have to perform a series of regex searches to (hopefully) find every literal that uses imperial units. Depending on your data sources, this could be a painful task (e.g. search for 'pound', 'pounds', 'lb', 'lbs', 'lbs.', and so on).
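
The difference between the two lookups can be sketched in a few lines of Python. This is only an illustration: the unit identifiers, the list of imperial units, and the regex are assumptions made for the example, and the regex is deliberately incomplete to show how spellings slip through.

```python
import re

# Hypothetical identifiers standing in for real measurement unit URIs.
IMPERIAL_UNITS = {"ex:PoundMeasurementUnit", "ex:InchMeasurementUnit"}


def imperial_ibes(triples):
    """Standard representation: select IBEs by the unit they point to."""
    return sorted(
        subject
        for subject, predicate, obj in triples
        if predicate == "uses_measurement_unit" and obj in IMPERIAL_UNITS
    )


# Shortcut representation: units are buried in text, so we have to guess spellings.
IMPERIAL_PATTERN = re.compile(r"\b(?:pounds?|lbs?|inch(?:es)?)\b", re.IGNORECASE)


def imperial_literals(literals):
    """Return the literals whose text matches one of the guessed spellings."""
    return [text for text in literals if IMPERIAL_PATTERN.search(text)]


unit_triples = [
    ("ex:IBE3", "uses_measurement_unit", "ex:PoundMeasurementUnit"),
    ("ex:IBE4", "uses_measurement_unit", "ex:InchMeasurementUnit"),
    ("ex:IBE5", "uses_measurement_unit", "ex:KilogramMeasurementUnit"),
]
print(imperial_ibes(unit_triples))

# The last literal writes inches as a double quote, which the pattern misses.
print(imperial_literals(["67.3 lbs.", "107 inches", "12 kg", '3"']))
```

The set lookup is exact as long as the units were mapped properly, while the pattern has to be extended for every new spelling that turns up in the data.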

In general, we do not recommend using the is_tokenized_by shortcut for standard data representations. This is especially the case for measurements, data that is expected to change (e.g. a person's weight, height, or age), and data for which provenance is important (e.g. who reported or measured it, when was it reported or measured, what method was used, etc.). In such cases, you want to be able to identify all of the relevant information and the CCO accomplishes a lot of this work through the use of IBEs.

The shortcut is primarily used to simplify visual diagrams where provenance is irrelevant. It can, however, also be useful when mapping certain types of data, especially designators that may appear many times in a dataset.

For example, you may want to say that a particular person only has 1 name (e.g. 'John Example Doe'), but acknowledge that that person's name may be written in various ways (e.g. 'John Example Doe', 'John Doe', 'John E. Doe', 'JE Doe', 'JD', etc.). You may also be uninterested in tracking the provenance of every token of the person's name each time it occurs in your data. Using is_tokenized_by to link the name to the literal can be useful here. If the person John Example Doe appears 100 times in your data and you used is_tokenized_by to link the name to the literal, you will end up with 1 triple for each variation on the spelling of John Example Doe's name instead of 100 triples linking the nameICEs to the IBEs and another 100 triples linking the IBEs to the literals. This is where the shortcut can reduce complexity and improve query results.
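
The triple counts in that example can be checked with a short Python sketch. It assumes 100 occurrences spread evenly over the five spellings listed above; the identifiers and function names are hypothetical.

```python
VARIANTS = ["John Example Doe", "John Doe", "John E. Doe", "JE Doe", "JD"]
# 100 occurrences of the name in the data, 20 per spelling (an assumption).
occurrences = VARIANTS * 20


def standard_triples(occurrences, name_ice="ex:JohnDoeName"):
    """One IBE per occurrence: name -> IBE, then IBE -> literal."""
    triples = set()
    for i, text in enumerate(occurrences, start=1):
        triples.add((name_ice, "inheres_in", f"ex:IBE{i}"))
        triples.add((f"ex:IBE{i}", "has_text_value", text))
    return triples


def shortcut_triples(occurrences, name_ice="ex:JohnDoeName"):
    """Repeated spellings collapse, leaving one triple per distinct literal."""
    return {(name_ice, "is_tokenized_by", text) for text in occurrences}


print(len(standard_triples(occurrences)))
print(len(shortcut_triples(occurrences)))
```

Because the shortcut triples share the same subject and predicate, repeated spellings are deduplicated by the set, which is where the reduction comes from.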

mark-jensen commented on May 27, 2024

@fameri Did these responses resolve your questions?

Information Bearing Entity about commoncoreontologies HOT 5 CLOSED
