das-rcn / das_metadata

Tools for standardizing Distributed Acoustic Sensing (DAS) metadata

License: Creative Commons Attribution 4.0 International

das distributed-acoustic-sensing reporting-format data-standard

das_metadata's Introduction

DAS-RCN Reporting Format for Distributed Acoustic Sensing (DAS) metadata v1.1.0

Distributed Acoustic Sensing (DAS) is a transformative technology, and its applications in geosciences and engineering are numerous and growing. To move this frontier technology forward, it is important to compare and integrate measurements across deployments and to make the data reusable by others following the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles (Wilkinson et al., 2016). Long-standing metadata standards developed for seismic data, such as the Standard for the Exchange of Earthquake Data (SEED) (Ahern et al., 2009), do not adapt well to DAS because of fundamental differences in sensor and data acquisition parameters. Existing seismic metadata formats that date from the early days of digital seismic recording, such as dataless SEED and SEG-Y headers, cannot accommodate all the acquisition parameters, cable environment, and channel location information needed to properly characterize a DAS deployment.

Here, the DAS-RCN Data Management Working Group presents the DAS metadata standard (v1.1.0), designed specifically for the DAS research community to facilitate the integration of DAS measurements across experiments and to increase reusability. This version, v1.1.0 (improved from v1.0.0), fully describes the five key components of a DAS experiment: interrogator, data acquisition, channels, cable, and fiber. The intent is for the standard to be independent of any specific implementation, with the emphasis on content and structure (i.e., the schema). This also implies that the metadata is independent of the time-series data.
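To illustrate the emphasis on content and structure, the five components can be sketched as nested records. This is a hypothetical Python sketch, not the official schema: all class and field names are assumptions, and the nesting (Cable holding Fibers; Interrogator holding Acquisitions, which hold Channel Groups of Channels) is one reading of the v1.1.0 change listed in the version history. Note that no time-series data appears anywhere in it.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names throughout; the standard defines its own attributes.

@dataclass
class Fiber:
    fiber_id: str

@dataclass
class Cable:
    cable_id: str
    fibers: List[Fiber] = field(default_factory=list)

@dataclass
class Channel:
    channel_id: str
    distance_along_fiber_m: float  # assumed unit: metres

@dataclass
class ChannelGroup:
    channel_group_id: str
    channels: List[Channel] = field(default_factory=list)

@dataclass
class Acquisition:
    acquisition_id: str
    channel_groups: List[ChannelGroup] = field(default_factory=list)

@dataclass
class Interrogator:
    interrogator_id: str
    acquisitions: List[Acquisition] = field(default_factory=list)

@dataclass
class DASMetadata:
    # Top-level record holding the two child branches (metadata only, no samples).
    interrogators: List[Interrogator] = field(default_factory=list)
    cables: List[Cable] = field(default_factory=list)

    def channel_count(self) -> int:
        """Total number of channels across all interrogators and acquisitions."""
        return sum(
            len(group.channels)
            for interrogator in self.interrogators
            for acquisition in interrogator.acquisitions
            for group in acquisition.channel_groups
        )
```

Keeping Cable and Interrogator as sibling branches means the same cable record can be referenced by several interrogators or acquisitions without being duplicated.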

Getting started

Optionally, use our reader-friendly website
Read the DAS metadata terms and definitions
Download templates for the standardized attributes
Explore our example gallery for DAS metadata in different deployment scenarios.

How to contribute

Want to make a change to the reporting format?

Submit a new issue here and use one of several templates that we provide to:

suggest a new term
modify a term
propose changes to documentation within this repository
request features
report bugs

Version history

• v1.1.0 (Sep 21, 2023): Updated DAS metadata schema.
Key changes include creating separate child branches for Cable (and Fiber) and for Interrogator (and Acquisition, Channel Group, and Channel).
• v1.0.0 (Aug 24, 2022): This is the first release of the DAS Metadata Reporting Format.

Copyright information

The DAS-RCN Reporting Format for DAS metadata is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0).

Funding and acknowledgements

The team (Voon Hui Lai, Kathleen Hodgkinson, Robert Porritt, Robert Mellors) would like to thank all who provided input, in particular the members of the DAS-RCN Metadata Working Group: Jonathan Ajo-Franklin; Kent Anderson; Sandra Barajas; Paul Bodin; Jerry Carter; Luigia Cristiano; Ben Evans; Ge Jin; Meghan Miller; Helle Pedersen; Javier Quinteros; Nigel Rees; Diane Rivet; Jonathan Schaeffer; Nicole Taverna; Chad Trabant; Arantza Ugalde; Antonio Villaseñor; Jon Weers; Ray Willemann; Lesley Wyborn; Nate Lindsey; Zack Spica; Andreas Wuestefeld. We would also like to give special thanks to Dan Auerbach, Marius Isken, Julián Pelaez, and the EarthScope Consortium (formerly IRIS) Data Management Center for their extensive input and suggestions for improving this metadata standard.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

KMH and RWP were supported by the Source Physics Experiment. The Source Physics Experiment (SPE) would not have been possible without the support of many people from several organizations. The authors wish to express their gratitude to the National Nuclear Security Administration, Defense Nuclear Nonproliferation Research and Development (DNN R&D), and the SPE working group, a multi-institutional and interdisciplinary group of scientists and engineers.

Recommended citation

A manuscript is currently being prepared for submission. In the meantime, the draft can be requested directly from Voon ([email protected]).


das_metadata's Issues

SSA/EGU comments

For completeness
SSA 2022, Seattle, Washington
• Important parameters for using a DAS dataset:
o Method of photonic estimation (dual-pulse, single-pulse, chirp, local oscillator) or at least require the category of photonic estimation (quantitative, non-quantitative)
o SEAFOM standard of noise level (e.g., noise level estimate in rad/√Hz at 1, 5, 10, and 50 km)
• Less important, but still relevant:
o provenance of location estimation (why, how, when, who)
o Dark fiber vs direct install
o Fiber owner
o Fiber operator
o OTDR for array (with provenance, as fibers do change)

EGU 2022
(related) Ways to transfer large amounts of data?
Add trace start time to channel metadata? [But would this also imply sample rate and number of samples?]
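The bracketed question has a simple arithmetic answer: with a start time t0, a sampling rate fs, and N samples, the last sample falls at t0 + (N − 1)/fs, so a start time alone does not fix a trace's extent. A minimal sketch; the helper name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

def trace_end_time(start: datetime, sampling_rate_hz: float, n_samples: int) -> datetime:
    """Time of the last sample of a trace (hypothetical helper).

    The last of n samples taken at sampling_rate_hz lies (n - 1)
    sample intervals after the first one.
    """
    if sampling_rate_hz <= 0 or n_samples < 1:
        raise ValueError("need a positive sampling rate and at least one sample")
    return start + timedelta(seconds=(n_samples - 1) / sampling_rate_hz)

# Example: one minute of data at 1 kHz.
start = datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(trace_end_time(start, 1000.0, 60_000).isoformat())
```

So storing the start time per channel would in practice also require the sample rate and sample count (or an explicit end time) to be unambiguous.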

Other
Timing could be other than GPS (e.g., NTP or PTP).
Could add a timing metric for segments where timing lock was missing.
Ownership of the cable, location of the first repeater, depth of water (marine or lake)
Use a pointer to a file containing locations rather than repeating them (implemented in the July 25 version)
Add metadata version number
Add “user-defined” space

cable/fiber ID

Right now it reads like there's just one cable-fiber ID for each portion of the fiber. Am I understanding that correctly?

As I'm working on an implementation of this in DASCore, I'm realizing it probably makes sense to keep a fiber ID and a cable ID for each portion of the light path.

Even though this may be redundant for some experiments, it's common enough to have a mix of straight stretches and loop-backs that I think it could help clarify acquisition setup for others after the fact.

A couple examples: This lets you leave a clear record when you start collecting data on a new fiber in the same cable. It also lets you have a clear indication of two fibers with a U connection at the end that sit in the same cable being differentiated from two fibers in two separate cables that are side-by-side or a cable deployed as a U. Or if you have a single mode DAS and a multimode DTS recording in the same cable (e.g. to do temp. corrections on your DAS data), and you use a similar metadata standard for DTS, then you can make it clear that they're in the same cable (and this means same thermal properties) versus side-by-side cables (potentially quite different thermal response).
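A minimal sketch of the idea, assuming every portion of the light path carries both a cable ID and a fiber ID. The `PathSegment` structure and its field names are hypothetical, not part of the standard. With both IDs stored, software can tell a U-turn inside one cable from two side-by-side cables:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class PathSegment:
    # One contiguous portion of the light path (hypothetical structure).
    cable_id: str
    fiber_id: str
    start_channel: int
    end_channel: int  # inclusive

def segment_for_channel(path: List[PathSegment], channel: int) -> PathSegment:
    """Return the segment that contains a given channel number."""
    for seg in path:
        if seg.start_channel <= channel <= seg.end_channel:
            return seg
    raise KeyError(f"channel {channel} is not on the light path")

def same_cable(path: List[PathSegment], ch_a: int, ch_b: int) -> bool:
    """True if two channels lie in the same physical cable, even on different fibers."""
    return segment_for_channel(path, ch_a).cable_id == segment_for_channel(path, ch_b).cable_id

# Two fibers joined by a U connection at the far end of one cable ...
u_in_one_cable = [
    PathSegment("CABLE-A", "F1", 0, 499),
    PathSegment("CABLE-A", "F2", 500, 999),
]
# ... versus two separate cables laid side by side.
two_cables = [
    PathSegment("CABLE-A", "F1", 0, 499),
    PathSegment("CABLE-B", "F1", 500, 999),
]
```

The same `cable_id` lookup would also let a DTS record in a comparable format declare that it shares a cable (and hence thermal properties) with a DAS fiber.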

Define cable mapping

Request new feature

Short description of new feature: EPSG designators
Submitter: David Podrasky, Silixa LLC

Feature request (reason, suggestion to implement):

Coordinate system descriptions are too vague. If provided in the metadata, geodetic or projected coordinate systems should be fully defined by things like horizontal datum, vertical datum, ellipsoid, units, projection, zone, etc. EPSG codes exist for this purpose, and I propose that they be required whenever sensing cable coordinates are included. Descriptive text is fine too, but EPSG codes should be a requirement.
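One way such a rule could be enforced, sketched with the standard library only: any record that includes cable coordinates must also carry an EPSG code. The field names (`coordinates`, `epsg_code`) are hypothetical. This sketch checks only the form `EPSG:<digits>`; confirming that a code exists in the EPSG registry would need a CRS library such as pyproj.

```python
import re
from typing import Any, Dict, List

EPSG_PATTERN = re.compile(r"^EPSG:\d+$")

def check_crs(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if record.get("coordinates"):
        code = record.get("epsg_code")
        if code is None:
            problems.append("coordinates given without an EPSG code")
        elif not EPSG_PATTERN.match(code):
            problems.append(f"malformed EPSG code: {code!r}")
    # A free-text CRS description may accompany the code but cannot replace it.
    return problems
```

For example, a record with `"epsg_code": "EPSG:4326"` passes, while one that only says `"WGS84"` is flagged.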

Optical path?

The PRODML standard has a concept called 'Optical Path', which resembles our cable/fiber but is intended to capture the entire optical path during a single acquisition and would map to an OTDR measurement (which might be linked as an ancillary file). In this way the entire response could be captured including any splices or connections and could be validated by comparison with an OTDR measurement, which is usually conducted prior to data collection anyway. Perhaps something to think about.
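A sketch of the validation idea: sum the optical lengths of every element in the path (fiber runs, splices, connectors) and compare the total with the OTDR-measured length. The segment list, the 5 m tolerance, and the numbers are illustrative assumptions, not values taken from PRODML or from the DAS-RCN standard.

```python
from typing import List, Tuple

def check_optical_path(segments: List[Tuple[str, float]],
                       otdr_length_m: float,
                       tolerance_m: float = 5.0) -> Tuple[float, bool]:
    """Compare summed segment lengths with an OTDR-measured length.

    segments: (label, optical length in metres), ordered from the interrogator.
    Returns the summed length and whether it agrees within tolerance_m.
    """
    total = sum(length for _, length in segments)
    return total, abs(total - otdr_length_m) <= tolerance_m

# Hypothetical path: connectors and splices are listed so the record is complete,
# even though they add (almost) no length.
path = [
    ("patch cord", 10.0),
    ("connector", 0.0),
    ("buried cable, fiber 3", 2480.0),
    ("splice", 0.0),
    ("borehole cable, fiber 1", 1200.0),
]
```

Listing splices and connectors explicitly, even at zero length, is what would let their OTDR loss events be matched to the metadata.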

geo-referenced coordinates

Reading through the white paper, it is not clear how the distinction between channels, cable distance, and geographic coordinates is incorporated.
It may be useful to have a dedicated section on coordinates with:

channel number /ID
channel spacing dx
channel distance to interrogator (optical distance)
geo-referenced coordinates (after tap test corrections):
- X,Y,Z
- optionally lat,long

This information all seems to be there, but it is a bit scattered. Since it is among the most critical information in the data, it might deserve a more prominent position.
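For a regularly spaced channel group, the optical distance of each channel follows from the first channel's distance and the spacing dx; the geo-referenced X, Y, Z still have to come from tap tests or surveys. A hedged sketch, assuming channel numbers increase with optical distance from the interrogator (function names are hypothetical):

```python
from typing import List

def optical_distances(first_channel: int, n_channels: int,
                      first_distance_m: float, dx_m: float) -> List[float]:
    """Optical distance from the interrogator for channels
    first_channel .. first_channel + n_channels - 1."""
    return [first_distance_m + i * dx_m for i in range(n_channels)]

def channel_at_distance(distance_m: float, first_channel: int,
                        first_distance_m: float, dx_m: float) -> int:
    """Channel number nearest to a given optical distance."""
    return first_channel + round((distance_m - first_distance_m) / dx_m)
```

Because the optical distance is derivable from three numbers, only channel number, first-channel distance, dx, and the surveyed coordinates strictly need to be stored.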

Clarification on description

Reading the metadata descriptions, several points come up that may need clarification:

Overview:
- Start/End Date: Why is that limited to yyyy-mm-dd and not the full yyyy-mm-ddTHH:MM:SS?
Cable
- CableCoordinates: This is redundant with the "Channel XYZ" coordinates. Such information should be stored in only one place to avoid confusion. It is also not specified here how the coordinates should be ordered (XYZ, YXZ, Lat/Long, ...).
- Connector_Coordinates: Maybe Channel Number is an easier and more unique descriptor?
- Fibre_Mode: Perhaps a boolean would be better, e.g., Is_Single_Mode?
- Refraction_Index: I am not a native speaker, but "refractive index" sounds more correct to me :-)
- Fibre Geometry is redundant with "winding angle"
- Winding angle: Clarify that 0 deg equals a straight cable
- Fiber start location: It says that it should be a float, but I guess you mean a list of floats? In what coordinate system?
- Comments: I like having this field available! Perhaps it should be available for all main sections?
Acquisition
- SamplingRate: Should that be the stored sampling rate?
- InterrogatorRate: Maybe more specific would be InterrogationRate?
- PRR: I wonder whether there is any use in keeping both InterrogationRate and PRR.
- PulseWidth: This may not be applicable to chirped-pulse interrogators (like ASN). Perhaps it is worthwhile to check with them.
- GaugeLength: It says "between a pair of PULSES", but it should be "pair of CHANNELS" right?
- ChannelSpacing: Some interrogators allow storing spatially downsampled data. If this metadata is storing all possible information from the interrogator, then this should be stored here as well. But I feel the native units should rather be stored in the interrogator field, to avoid confusion.
- Decimation: Clarify if this is spatial or temporal (Redundant anyway with native/stored sampling Rate)
- TimeFilter: Nice to have this as a separate field, but a more general "Applied_Pre-Processing" field may be better?
Channel
- Clarify if this should be a list of objects or a single object with lists (I prefer the latter)
- Distance along fiber: Clarify if this is optical, cable, or geodetic distance (optical should perhaps be covered in the fibre/cable section? Geodetic distance is redundant if coordinates are given)
- Elevation: perhaps be consistent and call this Z-Coordinate?
- Depth: To avoid confusion, I strongly suggest allowing only one Z-coordinate direction. Since this is for IRIS, I suggest sticking to that system, with positive up!
- Strike/Dip: This is redundant information and can be obtained programmatically from the coordinates.
- Direction of Pulse: Is there any case where this is not increasing? It says it is in units of degrees, that needs more clarification
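The two Channel layouts mentioned above (a list of per-channel objects versus a single object holding lists) are easy to convert between, which may make the choice less contentious than it looks. A standard-library sketch with hypothetical field names:

```python
from typing import Any, Dict, List

def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """List of per-channel objects -> one object holding a list per field."""
    if not records:
        return {}
    keys = list(records[0])
    for r in records:
        if set(r) != set(keys):
            raise ValueError("all channel records must have the same fields")
    return {k: [r[k] for r in records] for k in keys}

def to_records(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """One object of equal-length lists -> list of per-channel objects."""
    lengths = {len(v) for v in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all field lists must have the same length")
    n = lengths.pop() if lengths else 0
    return [{k: v[i] for k, v in columns.items()} for i in range(n)]
```

The column form is more compact for thousands of channels; the record form makes it harder to misalign fields. Either way, the conversion is lossless as long as every channel has the same fields.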

Potential new entries for Metadata + other comments/ideas

Request new features

Short description of new feature: Potential new entries for Metadata
Submitter: Julián Pelaez ([email protected])

Feature request:

Hi everyone, I wanted to propose a few additional entries for the metadata. I grouped them according to the categories agreed on so far. Most of them should not always be required, I think. I leave it up to you to decide. Feel free to change/adapt/etc.

-- Overview

Environmental conditions --> Weather/Ocean state/Air temperature or pressure/Rain... during acquisition or any other special environmental situation that might be worth mentioning for a given application (e.g. for oceanography,..)
Data Storage/Transmission --> some info about the protocol through which the data is being stored, sent to somewhere, converted,... this could help identify sources of gaps or corrupt files, speed transmission issues,.. I don't know much about the specifics here, so I'm guessing a lot.

-- Cable and Fiber:

Cable owner/manager --> Perhaps I missed it, but I didn't see this one in the current Metadata.

-- Interrogator:

Room temperature/other site-specific remarks --> where the interrogator rests. I'm not sure if useful, but this may say something about the performance of DAS and partially explain the data quality?

-- Acquisition:

Noise sources --> Remarkable urban noise, machinery, construction sites, rivers, etc. nearby that might be worth reporting.
Simultaneous recordings --> I've seen recent tests of DAS interrogation along fibers that are simultaneously transmitting independent data. This may become common in the future. Also, other DFOS (DTS, DSTS, ...) might be active on other fibers in the same cable, which may come in handy for a given campaign at a later stage.
Dedicated Geometry --> If experiments are done using cables as arrays with particular geometrical layouts, it may be good to label them accordingly for quick query, although this is obvious from the channel coordinates as well.

-- Channel

Geographic horizontal cable distance --> The distance along the XY-plane projection of the cable. This is generally not the same as the distance along cable, and could actually be computed from the geographical coordinates as well, but it may be also nice to know the true cable spread on a map when it has notable vertical variations.
Distance along fiber uncertainty --> just a measure of how accurate the Distance along fiber estimate is, as I'm never sure of how accurate it really is.
First/Last channel distance relative to interrogator --> an independent measure to accurately know where the cable begins or ends, as I've had problems in the past to georeference the cable when channel coordinates are missing or have notable uncertainties
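The difference between distance along the cable and its map-view (XY-projected) length can be computed from the channel coordinates. A sketch, assuming a projected Cartesian system in metres (not latitude/longitude) and straight segments between consecutive channels:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres; projected CRS assumed

def cumulative_lengths(points: List[Point]) -> Tuple[List[float], List[float]]:
    """Cumulative 3-D length and XY-projected length at each channel."""
    along, horizontal = [0.0], [0.0]
    for (x0, y0, z0), (x1, y1, z1) in zip(points, points[1:]):
        along.append(along[-1] + math.dist((x0, y0, z0), (x1, y1, z1)))
        horizontal.append(horizontal[-1] + math.hypot(x1 - x0, y1 - y0))
    return along, horizontal
```

For a cable that drops steeply, the horizontal length can be much shorter than the along-cable length, which is the map-spread distinction the entry above asks for.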

Other comments/ideas:

Saving a number of free metadata slots for custom/optional entries with descriptions can be helpful for specific cases. I guess the extra volume from a few extra entries is not dramatic. A custom name could be given to these entries (e.g., Feature1, Feature2, ...), or alternatively one of these could be assigned to each broad category (Overview, Interrogator, ...). Some of these may need to be customized during processing, and not always prior to acquisition.
I also think that it may be a good option to allow for coordinate reference frames that are not geographical, such as a user-defined Cartesian system with units of choice, for example, for small cables, geotechnical applications,...
As proposed in the meeting today, I also agree to generate a new category for pre-processing. It may even become customary to apply several procedures (perhaps even not yet popular...) to the raw data during acquisition.
Would a single, separate file for all the metadata make sense in the end? It could also include an overview with warnings about acquisition parameters changes during a given campaign. However, this may be a bit tricky to agree upon.

Use Markdown in RFC phase

I want to propose moving the proposal to Markdown instead of a static PDF.

Markdown has the advantage that it can be discussed, edited, and revised in a transparent and open fashion. Once the community/owner is happy, a Markdown document can be rendered into any format: PDF, HTML, Word, LaTeX, or whatever.

For a proper and transparent RFC I advocate strongly for modern Markdown on GitHub.

json example

The structure of the metadata makes it seem very suitable for the JSON format.
JSON would also be easy to read from other code, and to export if the metadata is used as a header in a file format (e.g., miniDAS https://github.com/DAS-RCN/RCN_DASformat).
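A sketch of what that could look like with Python's standard `json` module. The keys below are illustrative only and do not reproduce the standard's attribute names:

```python
import json

metadata = {
    "overview": {"location": "Parkfield, California, USA", "deployment_type": "permanent"},
    "cables": [{"cable_id": "CABLE-A", "fibers": [{"fiber_id": "F1"}]}],
    "interrogators": [{"interrogator_id": "I1", "acquisitions": []}],
}

# Serialize to text, e.g. to store alongside the data or embed in a file header.
text = json.dumps(metadata, indent=2, ensure_ascii=False)

# Any JSON-capable language can parse it back into native structures.
restored = json.loads(text)
assert restored == metadata
```

Nested objects and arrays map directly onto the hierarchical structure of the metadata, which is the main argument for JSON here.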

custom data field

Experience with many other formats is that during conception of the format it is never fully anticipated how it is going to be used.
Often, predefined header fields are cannibalized for other purposes (notably in SEG-Y...).
I would suggest to add an empty container that can be used for custom (non-standard) header information where people could write things like VSP charge, or moment tensors, or weather conditions...
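One possible shape for this suggestion, assuming a single reserved key (hypothetically named `custom`) whose contents validators do not interpret. The set of standard keys below is an illustrative subset, not the standard's real top level:

```python
from typing import Any, Dict, Set

STANDARD_KEYS: Set[str] = {"overview", "cables", "interrogators"}  # illustrative subset
CUSTOM_KEY = "custom"  # hypothetical reserved container

def split_custom(header: Dict[str, Any]) -> Dict[str, Any]:
    """Reject unknown top-level keys and return the free-form custom block."""
    unknown = set(header) - STANDARD_KEYS - {CUSTOM_KEY}
    if unknown:
        raise ValueError(f"non-standard keys outside '{CUSTOM_KEY}': {sorted(unknown)}")
    custom = header.get(CUSTOM_KEY, {})
    if not isinstance(custom, dict):
        raise TypeError(f"'{CUSTOM_KEY}' must be an object")
    return custom
```

The design choice is that anything non-standard (VSP charge, moment tensors, weather) must go inside the container, so standard fields never get repurposed.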

Strike and dip versus azimuth and inclination or dip

The spec currently suggests that the orientation of the fibre at each channel be given in strike ('degrees clockwise from east positive') and dip (angle downwards from the horizontal?).

Firstly, it feels unnatural to define an orientation or direction in space using 'strike'; it is usually used to define planes. (Trend and plunge are more common for directions or orientations.) Secondly, strike is more commonly measured as an azimuth from north, not from east. Finally, there are multiple geological conventions on how strike is measured, making this more ambiguous than need be.

I propose that we match existing conventions. SEED defines channel directions with azimuth (degrees east from local north) and dip (degrees down from the local horizontal; both in blockette 52). SAC uses inclination (degrees downwards from the local vertical direction) instead of dip.

My own preference is for the SAC convention, but would be happy with either as an improvement over the current proposal.

Tangentially, it is worth considering that strain(-rate) systems are orientational, not directional, so the direction of measure is 180°-ambiguous. In other words, it doesn't matter whether a channel is defined as pointing one way or the other in space.

Given that, should the spec constrain azimuths to the range [0°, 180°) (or [–90°, 90°))? Or is it better to leave open the possibility of specifying the channel orientation in two ways? Or perhaps it should be defined as the direction in which the laser travels? This would then remove the directional ambiguity.
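To make the proposal concrete, here is a sketch that derives SEED-style azimuth (degrees clockwise from north) and dip (degrees down from horizontal) for the segment between two consecutive channel coordinates. It assumes local Cartesian coordinates given as (east, north, up) in metres. The optional `axial` flag shows one way to resolve the 180° ambiguity: reverse the direction whenever the azimuth falls in [180°, 360°).

```python
import math
from typing import Tuple

def azimuth_dip(p0: Tuple[float, float, float],
                p1: Tuple[float, float, float],
                axial: bool = False) -> Tuple[float, float]:
    """Azimuth (deg clockwise from north) and dip (deg down from horizontal)
    of the segment p0 -> p1, with points given as (east, north, up) in metres.

    With axial=True the direction is flipped so that azimuth lies in [0, 180).
    """
    de, dn, du = (b - a for a, b in zip(p0, p1))
    horizontal = math.hypot(de, dn)
    if horizontal == 0 and du == 0:
        raise ValueError("coincident points have no orientation")
    azimuth = math.degrees(math.atan2(de, dn)) % 360.0
    dip = math.degrees(math.atan2(-du, horizontal))  # positive when heading down
    if axial and azimuth >= 180.0:
        azimuth -= 180.0
        dip = -dip
    return azimuth, dip
```

Computing orientation this way from coordinates also supports the point that strike/dip fields are redundant if coordinates are stored.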

Metadata formalize as JSON-Schema

I propose formalizing the metadata proposal in a machine- and human-readable format. I vote for JSON Schema: schemas can easily be validated and translated into various other schemas, see https://app.quicktype.io/.

It would look something like this:

{
    "title": "Distributed Acoustic Sensing Metadata - Overview",
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "http://example.com/example.json",
    "type": "object",
    "default": {},
    "required": [
        "location",
        "number_of_interrogators",
        "principle_investigators",
        "start_datetime"
    ],
    "properties": {
        "location": {
            "type": "string",
            "default": "",
            "title": "Description of geographic location",
            "examples": [
                "Parkfield, California, USA"
            ]
        },
        "deployment_type": {
            "type": "string",
            "default": "",
            "title": "Describes the permanency of the deployment",
            "examples": [
                "permanent"
            ]
        },
        "number_of_interrogators": {
            "type": "number",
            "default": 1,
            "title": "Number of interrogators used to collect data over the course of data collection",
            "examples": [
                2
            ]
        },
        "principle_investigators": {
            "type": "string",
            "default": "",
            "title": "Point of Contact(s)",
            "examples": [
                "P.I. Doe"
            ]
        },
        "start_datetime": {
            "type": "string",
            "format": "date-time",
            "default": "",
            "title": "Start date of experiment",
            "examples": [
                "2018-02-11T00:00:00Z"
            ]
        }
    },
    "examples": [
        {
            "location": "",
            "number_of_interrogators": 1,
            "principle_investigators": "",
            "start_datetime": ""
        }
    ]
}
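Before a full JSON Schema validator is adopted (for example, the Python `jsonschema` package provides `jsonschema.validate(instance, schema)`), the two rules this draft relies on most, `required` and `type`, can be checked with the standard library alone. The sketch below handles only those keywords plus a lenient `date-time` check, using a trimmed copy of the draft Overview schema; it is not a conforming JSON Schema implementation.

```python
from datetime import datetime
from typing import Any, Dict, List

# Trimmed copy of the draft Overview schema (only keywords this sketch understands).
SCHEMA = {
    "type": "object",
    "required": ["location", "number_of_interrogators",
                 "principle_investigators", "start_datetime"],
    "properties": {
        "location": {"type": "string"},
        "deployment_type": {"type": "string"},
        "number_of_interrogators": {"type": "number"},
        "principle_investigators": {"type": "string"},
        "start_datetime": {"type": "string", "format": "date-time"},
    },
}

PY_TYPES = {"string": str, "number": (int, float), "object": dict}

def check(instance: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    errors = [f"missing required property: {k}"
              for k in schema.get("required", []) if k not in instance]
    for key, rule in schema.get("properties", {}).items():
        if key not in instance:
            continue
        value = instance[key]
        # bool is a subclass of int in Python but is not a JSON number.
        if not isinstance(value, PY_TYPES[rule["type"]]) or isinstance(value, bool):
            errors.append(f"{key}: expected {rule['type']}")
        elif rule.get("format") == "date-time":
            try:
                # Lenient: also accepts naive times; 'Z' is rewritten for Python < 3.11.
                datetime.fromisoformat(value.replace("Z", "+00:00"))
            except ValueError:
                errors.append(f"{key}: not an ISO 8601 date-time")
    return errors
```

A proper validator would additionally enforce RFC 3339 offsets and the remaining keywords, which is the main reason to standardize on an existing JSON Schema tool.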