GithubHelp home page GithubHelp logo

wetzel-desy / scingestor Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jkotan/scingestor

0.0 0.0 0.0 1.76 MB

License: GNU General Public License v3.0

Shell 0.07% Python 99.40% Dockerfile 0.53%

scingestor's Introduction

Scicat Dataset Ingestor

github workflow docs Pypi Version Python Versions

The scingestor python package provides a support for scripts which ingest Datasets and OrigDatablocks into the SciCat metadata server.

scicat_dataset_ingestor

SciCat Dataset ingestor server ingests scan metadata just after a scan is finished. It can be executed by

scicat_dataset_ingestor -c ~/.scingestor.yaml

Configuration variables

The configuration written in YAML can contain the following variables

  • scicat_url (str) , default: "http://localhost:3000/api/v3"
  • ingestor_credential_file (str) , default: None
  • beamtime_dirs (list<str>) , default: []
  • beamtime_base_dir (str) , default: ""
  • ingestor_var_dir (str) , default: ""
  • ingestor_username (str) , default: "ingestor"
  • dataset_pid_prefix (str) , default: ""
  • dataset_update_strategy ("no", "patch", "create", "mixed") , default: "patch"
  • relative_path_in_datablock (bool) , default: False
  • scandir_blacklist (list<str>) , default: ["/gpfs/current/scratch_bl", "/gpfs/current/processed", "/gpfs/current/shared"]
  • beamtimeid_blacklist_file (str) , default: None
  • beamtime_type_blacklist (list<str>) , default: ["P"]
  • chmod_json_files (str) , default: None
  • max_scandir_depth (int), default: -1
  • oned_in_metadata (bool) , default: False
  • max_oned_size (int) , default: None
  • scan_metadata_postfix (str) , default: ".scan.json"
  • datablock_metadata_postfix (str) , default: ".origdatablock.json"
  • attachment_metadata_postfix (str) , default: ".attachment.json"
  • metadata_in_var_dir (bool) , default: False
  • use_corepath_as_scandir (bool) , default: False
  • beamtime_filename_postfix (str) , default: "beamtime-metadata-"
  • beamtime_filename_prefix (str) , default: ".json"
  • datasets_filename_pattern (str) , default: "scicat-datasets-{beamtimeid}.lst"
  • ingested_datasets_filename_pattern (str) , default: "scicat-ingested-datasets-{beamtimeid}.lst"
  • file_dataset_metadata_generator (str) , default: "nxsfileinfo metadata -k4 -o {metapath}/{scanname}{scanpostfix} -b {beamtimefile} -p {beamtimeid}/{scanname} -w {ownergroup} -c {accessgroups} {mastefile}"
  • dataset_metadata_generator (str) , default: "nxsfileinfo metadata -k4 -o {metapath}/{scanname}{scanpostfix} -c {accessgroups} -w {ownergroup} -b {beamtimefile} -p {beamtimeid}/{scanname}"
  • datablock_metadata_generator (str) , default: "nxsfileinfo origdatablock -s *.pyc,*{datablockpostfix},*{scanpostfix},*~ -p {pidprefix}{beamtimeid}/{scanname} -w {ownergroup} -c {accessgroups} -o {metapath}/{scanname}{datablockpostfix} "
  • datablock_metadata_stream_generator (str) , default: "nxsfileinfo origdatablock -s *.pyc,*{datablockpostfix},*{scanpostfix},*~ -w {ownergroup} -c {accessgroups} -p {pidprefix}{beamtimeid}/{scanname} "
  • datablock_metadata_generator_scanpath_postfix (str) , default: " {scanpath}/{scanname} "
  • attachment_metadata_generator (str) , default: "nxsfileinfo attachment -w {ownergroup} -c {accessgroups} -o {metapath}/{scanname}{attachmentpostfix} {plotfile} "
  • metadata_generated_callback (str) , default: "nxsfileinfo groupmetadata {lastmeasurement} -m {metapath}/{scanname}{scanpostfix} -d {metapath}/{scanname}{datablockpostfix} -a {metapath}/{scanname}{attachmentpostfix} -p {beamtimeid}/{lastmeasurement} -f -k4 "
  • metadata_group_map_file (str) , default: ""
  • raw_metadata_callback (bool) , default: False
  • skip_multi_datablock_ingestion (bool) , default: False
  • skip_multi_attachment_ingestion (bool) , default: False
  • skip_scan_dataset_ingestion (bool) , default: False
  • call_metadata_generated_callback (bool) , default: False
  • metadata_group_map_file_generator_switch (str) , default: " --group-map-file {groupmapfile} "
  • raw_metadata_callback_switch (str) , default: " --raw "
  • execute_commands (bool) , default: False
  • plot_file_extension_list (list<str>) , default: ["png", "nxs", "h5", "ndf", "nx", "fio"]
  • master_file_extension_list (list<str>) , default: ["nxs", "h5", "ndf", "nx", "fio"]
  • chmod_generator_switch (str) , default: " -x {chmod} "
  • relative_path_generator_switch (str) , default: " -r {relpath} "
  • oned_dataset_generator_switch (str) , default: " --oned "
  • max_oned_dataset_generator_switch (str) , default: " --max-oned-size {maxonedsize} "
  • override_attachment_signals_generator_switch (bool) , default: " --override "
  • hidden_attributes_generator_switch (str) , default: " -n {hiddenattributes} "
  • hidden_attributes (str) , default: "nexdatas_source,nexdatas_strategy,units"
  • attachment_signals_generator_switch (str) , default: " -s {signals} "
  • attachment_axes_generator_switch (str) , default: " -e {axes} "
  • attachment_frame_generator_switch (str) , default: " -m {frame} "
  • attachment_signal_names (str) , default: ""
  • attachment_axes_names (str) , default: ""
  • attachment_image_frame_number (str) , default: ""
  • ingest_dataset_attachment (bool) , default: False
  • override_attachment_signals (bool) , default:False
  • retry_failed_dataset_ingestion (bool) , default:False
  • retry_failed_attachement_ingestion (bool) , default:False
  • log_generator_commands (bool) , default: False
  • add_empty_units_generator_switch (str) , default: " --add-empty-units "
  • add_empty_units (bool) , default: True
  • metadata_copy_map_file (str) , default: None
  • metadata_copy_map_file_generator_switch (str) , default: " --copy-map-file {copymapfile} "
  • inotify_timeout (float) , default: 0.1
  • get_event_timeout (float) , default: 0.01
  • ingestion_delay_time (float) , default: 5.0
  • max_request_tries_number (int) , default: 100
  • recheck_dataset_list_interval (int) , default: 1000
  • recheck_beamtime_file_interval (int) , default: 1000
  • request_headers (dict<str,str>) , default: {"Content-Type": "application/json", "Accept": "application/json"}
  • scicat_datasets_path (str) , default: "Datasets"
  • scicat_proposals_path (str) , default: "Proposals"
  • scicat_datablocks_path (str), default: "OrigDatablocks"
  • scicat_attachments_path (str), default: "Datasets/{pid}/Attachments"
  • scicat_users_login_path (str), default: "Users/login"
  • owner_access_groups_from_proposal (bool), default: False
  • metadata_keywords_without_checks (list<str>), default: ["techniques", "classification", "createdBy", "updatedBy", "datasetlifecycle", "numberOfFiles", "size", "createdAt", "updatedAt", "history", "creationTime", "version", "scientificMetadata", "endTime"]

e.g.

beamtime_dirs:
  - "{homepath}/gpfs/current"
  - "{homepath}/gpfs/commissioning"
scicat_url: http://localhost:3000/api/v3
ingestor_credential_file: "{homepath}/gpfs/pwd"

Pattern keywords for configuration variables

The datasets_filename_pattern, ingested_datasets_filename_pattern and ingestor_var_dir can contain the {beamtimeid} and {hostname} keywords, e.g. "scicat-ingested-datasets-{beamtimeid}.lst" or "scicat-ingested-datasets-{hostname}-{beamtimeid}.lst" which is instantiated during the ingestor execution.

The beamtime_dirs, beamtime_base_dir, ingestor_var_dir, ingestor_credential_file, scandir_blacklist can contain the {homepath} keyword.

Similarly, file_dataset_metadata_generator, dataset_metadata_generator, datablock_metadata_generator, datablock_metadata_stream_generator, datablock_metadata_generator_scanpath_postfix, attachment_metadata_generator, chmod_generator_switch, relative_path_generator_switch can contain the following keywords: {beamtimeid} , {scanname}, {chmod}, {scanpath}, {metapath}, {relpath}, {beamtimeid}, {beamline}, {pidprefix}, {beamtimefile}, {scanpostfix}, {datablockpostfix}, {ownergroup}, {accessgroups}, {hostname}, {homepath}, {hiddenattributes}, {ext}, "{masterfile}", "{plotfile}"

The "{masterfile}" is either equal to "{scanpath}/{scanname}.{ext}" or "{scanpath}/{scanname}/{scanname}.{ext}". Also the "{plotfile}" is either equal to "{scanpath}/{scanname}.{plotext}" or "{scanpath}/{scanname}/{scanname}.{plotext}".

scicat_dataset_ingest

Re-ingestion script for SciCat Datasets and OrigDatablocks is usually launched at the end of the beamtime.

scicat_dataset_ingest -c ~/.scingestor.yaml

Its configuration written YAML like for scicat_dataset_ingestor

scicat_ingest

General ingestion script for SciCat Models could be used for manual scicat model ingestion, e.g. Sample, Instrument or DerivedDataset.

scicat_ingest  -m Samples  -c ~/.scingestor.yaml  ./metadata.json

Its configuration written YAML like for scicat_dataset_ingestor

Installation

Required packages

  • python3 >= 3.7
  • nxstools >= 3.38.0
  • inotifyx (python3 version)
  • requests
  • setuptools
  • pyyaml
  • pytest (to run tests)
  • sphinx (to build the documentation)

Install from sources

The code from https://github.com/jkotan/scingestor can be built with

python3 setup.py install

To build the documentation use

python3 setup.py build_sphinx

The resulting documentation can be found below build/sphinx/html in the root directory of the source distribution.

Finally, the package can be tested using

python3 -m pytest test

Install in conda or pip environment

The code can be installed in your conda environment by

conda create -n myenv python=3.9
conda activate myenv

pip install inotifyx-py3
pip install scingestor

or in your pip environment by

python3 -m venv myvenv
. myvenv/bin/activate

pip install inotifyx-py3
pip install scingestor

Debian and Ubuntu packages

Debian bookworm, bullseye, buster or Ubuntu lunar, jammy, focal packages can be found in the HDRI repository.

To install the debian packages, add the PGP repository key

sudo su
curl -s http://repos.pni-hdri.de/debian_repo.pub.gpg  | gpg --no-default-keyring --keyring gnupg-ring:/etc/apt/trusted.gpg.d/debian-hdri-repo.gpg --import
chmod 644 /etc/apt/trusted.gpg.d/debian-hdri-repo.gpg

and then download the corresponding source list, e.g. for bookworm

cd /etc/apt/sources.list.d
wget http://repos.pni-hdri.de/bookworm-pni-hdri.list

or jammy

cd /etc/apt/sources.list.d
wget http://repos.pni-hdri.de/jammy-pni-hdri.list

respectively.

Finally,

apt-get update
apt-get install python3-scingestor

scingestor's People

Contributors

jkotan avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.