
This project forked from aussrc/wallaby_database


Database schema for the WALLABY science post-processing

License: GNU Lesser General Public License v2.1



WALLABY Database

src

The repository contains all of the .sql scripts (in the src/ folder) necessary for initialising a PostgreSQL database with the schema used for WALLABY post-processing. The tables can be separated into three different groups:

  • Source finding
  • Kinematics
  • Multi-wavelength

Each group captures the tables required for contributions by the AusSRC, CIRADA and SpainSRC, respectively, to the production of WALLABY catalogue data.

Initialisation

The first script, 01-users.sql, creates the database and the users that will be able to access it. It will create:

  • admin user (full access)
  • wallaby user (read-only access)
  • vo user

You will need to create a file called psql.env which contains the environment variable POSTGRES_PASSWORD.
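A minimal psql.env contains a single line; the value shown here is a placeholder, not a real password:

```
POSTGRES_PASSWORD=<choose-a-secret>
```

This file is typically passed to the PostgreSQL container through Docker Compose's env_file option.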

Source finding

The 02-source-finding.sql and 05-privileges.sql scripts are used to create the tables required for source finding and access control. The tables are summarised below:

| Name | Description |
|------|-------------|
| run | Name of the run (a source finding application applied to a specific WALLABY data cube). |
| instance | Instance of the run (the source finding application is parallelised and splits the full WALLABY data cube into sub-cubes). |
| detection | A detection automatically identified by the source finding application. |
| product | Data products associated with a given detection (e.g. moment maps). |
| source | A source with a formal source name, as determined by a WALLABY admin. |
| source_detection | Many-to-one table mapping detections to a source. |
| comment | Comments applied to detections during manual inspection. |
| tag | Tag name and description. |
| tag_detection | Mapping from tags to detections, added during manual inspection. |
| tag_source_detection | Mapping from tags to the source_detection table, for specifying release data. |

Kinematics

The 03-kinematics.sql script is used to create the tables required for the kinematic analysis. The tables are summarised below:

..

Multi-wavelength

The 04-multi-wavelength.sql script is used to create the tables required for the multi-wavelength analysis. The tables are summarised below:

...

Provenance

Provenance metadata allows the results of an AusSRC WALLABY post-processing pipeline execution to be reproduced. This table, called run_metadata, links the Run with its metadata. The metadata that we include are:

  • repository (location of pipeline code)
  • branch (github branch to specify the component(s) of the pipeline that was executed)
  • version (github version or nextflow revision of the pipeline that was executed)
  • configuration (content of the nextflow.config file for the pipeline run)
  • parameters (content of the params.yaml for the pipeline run)
  • datetime (when the pipeline was executed)
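As an illustration, such a provenance record can be assembled as a simple mapping. The keys mirror the run_metadata fields listed above; the values (repository URL, branch, version) are hypothetical placeholders, not taken from a real pipeline run:

```python
from datetime import datetime, timezone

# Hypothetical provenance record; keys mirror the run_metadata fields,
# values are placeholders for a real pipeline execution.
run_metadata = {
    "repository": "https://github.com/org/pipeline",   # hypothetical pipeline repo
    "branch": "main",                                  # branch that was executed
    "version": "v1.0.0",                               # version or nextflow revision
    "configuration": "<contents of nextflow.config>",  # config file content
    "parameters": "<contents of params.yaml>",         # parameter file content
    "datetime": datetime.now(timezone.utc).isoformat(),
}

# Sanity check: every field described above is present.
expected = {"repository", "branch", "version", "configuration", "parameters", "datetime"}
assert set(run_metadata) == expected
```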

Deployment

NOTE: You will need to update the passwords in the create.sql file for any deployment to be secure. The current defaults are not appropriate for a production environment; change them before using either of the following deployment approaches.

Docker

The easiest method for deploying the WALLABY database is with Docker. We provide a Dockerfile that creates a PostgreSQL image initialised with the scripts in the src subdirectory. To get it up and running:

docker-compose up --build
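The repository ships its own compose configuration; purely as an illustrative sketch of how the pieces fit together (the service name and port mapping here are assumptions), such a file looks roughly like:

```yaml
# Illustrative sketch only -- use the compose file provided in the repository.
services:
  wallabydb:
    build: .            # build the provided Dockerfile (PostgreSQL + src/ scripts)
    env_file: psql.env  # supplies POSTGRES_PASSWORD
    ports:
      - "5432:5432"     # expose PostgreSQL on the host
```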

Manual

You can also install the schema on an existing PostgreSQL instance. You will need to install the dependencies postgis and pg_sphere. For Ubuntu and PostgreSQL 12, this can be done with the following commands:

sudo apt install postgresql postgresql-contrib postgis
sudo apt-get install postgresql-12-pgsphere

Once you have the dependencies you will need to initialise the database with the SQL scripts in the src/ subdirectory. To do this you can run the line below for each of the scripts in that directory. This is also how you can update an existing instance of the WALLABY database when there have been changes to the repository.

psql -h localhost -U admin -d wallabydb -f src/01-users.sql
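To apply every script in order (and to re-apply them after schema updates), a short helper along these lines can loop over the directory. This is a sketch, not part of the repository; the connection details mirror the command above:

```python
import subprocess
from pathlib import Path

def apply_schema(src="src", host="localhost", user="admin", db="wallabydb"):
    """Run each .sql script in filename order (01-users.sql first)."""
    for script in sorted(Path(src).glob("*.sql")):
        subprocess.run(
            ["psql", "-h", host, "-U", user, "-d", db, "-f", str(script)],
            check=True,  # stop on the first failing script
        )
```

Calling `apply_schema()` from the repository root applies all of the scripts; the numeric prefixes guarantee that users exist before tables are created.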

ORM

We also provide an object-relational mapper, built with SQLAlchemy, that is compatible with the database schema.
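As a sketch of how such an ORM is typically used (the Run model and its columns here are illustrative, not the repository's actual definitions, and the demo runs against in-memory SQLite rather than PostgreSQL):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Run(Base):
    """Illustrative mapping of the run table; see the orm/ directory for the real models."""
    __tablename__ = "run"
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)  # name of the source finding run

# Demo against an in-memory SQLite database; the real schema targets PostgreSQL.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Run(name="example_run"))
    session.commit()
```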

Tests

We have written some unit tests to ensure that the SQLAlchemy ORM works with the database schema. You can run them from the orm/ directory with the following commands:

python -m unittest tests/tests_wallaby_run.py
python -m unittest tests/tests_wallaby_instance.py

Operations

The file src/06-operations.sql contains WALLABY tables used for managing the operations of the survey. Below are the tables and their descriptions:

| Table | Description |
|-------|-------------|
| observation | Tracks observations and their quality (WALLABY footprint check) in CASDA. |
| tile | Pre-defined WALLABY tiles on the sky. |
| postprocessing | Tracks post-processing jobs (WALLABY pipeline) run on the tiles. |
| mosaic | Many-to-many mapping from tiles to a postprocessing job. |
| prerequisite | TBA |
| prerequisite_identifier | TBA |

Important fields

quality

The quality field refers to whether or not the footprint has passed the manual quality control step. This step involves running the WALLABY footprint check pipeline on the data once it becomes available through CASDA. These are the allowed values for quality:

| quality | Description |
|---------|-------------|
| NULL | No quality check pipeline has been executed on this data. |
| PENDING | Quality check pipeline has been triggered for this footprint and is awaiting input from WALLABY. |
| PASSED | Passed the quality check and can be used for tiles. |
| FAILED | Failed the quality check and should not be used for creating tiles. |

status

The status field refers to the status of the pipeline executed on the data. This can either be the quality check pipeline that we discuss in the subsection above, or the WALLABY post-processing pipeline. The allowed values are in the table below.

| status | Description |
|--------|-------------|
| NULL | Pipeline has not been executed for this data. |
| QUEUED | Slurm job for the pipeline has been submitted and is waiting to run. |
| RUNNING | Pipeline is currently running. |
| COMPLETED | Pipeline has run to completion. |
| FAILED | Pipeline failed before running to completion. |
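Client code that writes these values can guard against typos with a small validation helper. This is an illustrative sketch, not part of the repository; SQL NULL is represented by Python's None:

```python
# Allowed pipeline states, mirroring the status table above (None == SQL NULL).
ALLOWED_STATUS = {None, "QUEUED", "RUNNING", "COMPLETED", "FAILED"}

def validate_status(status):
    """Return the status unchanged, or raise if it is not an allowed state."""
    if status not in ALLOWED_STATUS:
        raise ValueError(f"invalid status: {status!r}")
    return status
```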

Contributors

  • axshen
  • manuparra
