
climsoft-web's Introduction

Climsoft

Climsoft is a software suite for storing climatic data in a secure and flexible manner and for extracting useful information from the data. Climsoft is a free open-source project, licensed under GPL3. It is widely used by the National Meteorological and Hydrological Services of developing countries.

This repository contains Climsoft Desktop for Windows. We recommend that this application be used only over a secure local area network (LAN) at a particular site, or over a virtual private network (VPN) when connecting from an external site over the internet.

Further information is available from climsoft.org and the Climsoft wiki. Here you can find information about Climsoft, including links to downloads of the guides and tutorials, discussion forums, the Climsoft road map and information about the history and governance of the project. The Climsoft project is controlled by a Steering Group, with a Technical Advisory Group, Project Coordinator and Lead Developers.

The Climsoft project welcomes Contributors to get involved and help us. If you wish to become a Climsoft Contributor you can fill in and submit a Climsoft Contributor's Agreement form from our Met eLearning site. If you have not already done so, you will need to set up a user profile on the Moodle site, and enrol yourself in the Climsoft "course". The form will be checked, and you will then be given access to the Contributors' discussion forum.

climsoft-web's People

Contributors

conlooptechnologies, patowhiz


climsoft-web's Issues

Quality Control Implementation

Overview

After reviewing the WMO CDMS specifications, I suggest developing the following quality control (QC) submodules to enhance our climate data management system:

  1. Duplicate Data Check: To eliminate duplicate entries during data ingestion, preventing unnecessary redundancy.

  2. Limits Check: To flag values outside the acceptable range for review during data ingestion.

  3. Source Check: To differentiate and validate identical data from various sources, designating the most reliable source as final.

  4. Missing Data Check: To detect data gaps, facilitating informed decisions on handling these absences for subsequent analysis.

  5. Internal Consistency Check: To verify the coherence of related data points within the dataset, such as temperature and dew point correlations. This check will include same-value, jump-value and inter-element checks.

  6. Temporal Consistency Check: To identify abrupt temporal changes, distinguishing between potential errors and actual environmental shifts.

  7. Spatial Consistency Check: To assess data across various locations, identifying spatial anomalies that may indicate localized discrepancies.

  8. Extreme Value Check: To scrutinize and authenticate any extreme values or statistical outliers beyond the normal range.

  9. Data Homogeneity Check: To correct biases from changes in observational methods or locations, especially vital for long-term climate studies.

  10. Metadata Check: To investigate metadata for additional insights that may elucidate detected anomalies or inconsistencies.

I recommend constructing a QC workflow that processes these checks in a logical and efficient sequence, starting with simpler tasks and advancing to more complex analyses. While some steps may occur concurrently, the overall process should be iterative, ensuring a comprehensive and nuanced data quality assessment.

Furthermore, each QC step will be systematically logged in the observation model, which is specifically designed to accommodate these checks, enhancing transparency and traceability in data quality control.

Some of these checks could be user-driven (manual), system-driven (automated) or semi-automated, depending on the nature of the quality control check.
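To make the intended sequencing concrete, below is a minimal sketch that runs two of the simpler checks in order and logs each outcome against a hypothetical observation record. The Observation shape, flag names and limits used here are assumptions for illustration only, not the final observation model.

```typescript
// Hypothetical shapes; the real observation model may differ.
interface Observation {
  stationId: string;
  elementId: number;
  datetime: string;          // ISO 8601 UTC
  value: number | null;
  qcLog: { check: string; passed: boolean; note?: string }[];
}

interface ElementLimits { lower: number; upper: number }

type QcCheck = (obs: Observation) => { check: string; passed: boolean; note?: string };

// Missing data check (check 4 above).
const missingCheck: QcCheck = (obs) => ({
  check: 'missing',
  passed: obs.value !== null,
  note: obs.value === null ? 'value is missing' : undefined,
});

// Limits check (check 2 above), parameterised by the element limits.
const limitsCheck = (limits: ElementLimits): QcCheck => (obs) => {
  const inRange = obs.value !== null && obs.value >= limits.lower && obs.value <= limits.upper;
  return {
    check: 'limits',
    passed: inRange,
    note: inRange ? undefined : `outside [${limits.lower}, ${limits.upper}]`,
  };
};

// Run checks in sequence (simpler first) and log every outcome on the observation.
function runQc(obs: Observation, checks: QcCheck[]): Observation {
  for (const check of checks) {
    obs.qcLog.push(check(obs));
  }
  return obs;
}

const obs: Observation = {
  stationId: 'ST0001',
  elementId: 2,
  datetime: '2024-03-10T21:00:00.000Z',
  value: 55.3,
  qcLog: [],
};
runQc(obs, [missingCheck, limitsCheck({ lower: -10, upper: 50 })]);
console.log(obs.qcLog); // missing check passes, limits check fails and is flagged for review
```

A real pipeline would append the remaining checks (source, consistency, spatial and so on) to the same sequence and persist the log alongside the observation.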

Request for Comments

I invite feedback on this proposal. Your insights and suggestions will be invaluable.

Dialogs vs Pages

When should we use dialogs instead of pages?

I think we should prefer pages over dialogs by default. Some controls may still show dialogs; for instance, an elements drop-down may launch a dialog for advanced search or options.

Climsoft Architecture, Technology Stack and Deployment Setup

Summary:

This feature proposal suggests a modernization of Climsoft's technology stack to adopt a microservices architecture and containerized deployment. The goal is to enhance scalability, maintainability, and performance through a modular setup of independent services and a progressive web application (PWA) client.

Detailed Description:

Server-side Components:

  • Database Layer:

    • Utilize PostgreSQL as the primary data storage solution for Climsoft, ensuring robust data management and relational capabilities.
  • API Layer:

    • Implement a TypeScript-based Rest API using the NestJS framework to facilitate input/output operations and serve as the bridge between the database and the client-side PWA (a minimal controller sketch follows this list).
  • Other services Layer:

    • Develop a suite of other services as microservices, each running in its own environment and performing distinct, performance-critical tasks.
    • Microservices will interact with the Rest API and handle specific operations, potentially integrating third-party modules for enhanced functionality.
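As a hedged sketch of the API layer described in this list, the NestJS controller below exposes a minimal CRUD-style observations endpoint. The route, DTO and in-memory store are hypothetical placeholders; the real API would delegate to services backed by TypeORM and PostgreSQL.

```typescript
import { Controller, Get, Post, Body, Query } from '@nestjs/common';

// Hypothetical DTO; field names are placeholders.
class CreateObservationDto {
  stationId: string;
  elementId: number;
  datetime: string; // ISO 8601 UTC
  value: number;
}

@Controller('observations')
export class ObservationsController {
  // Stand-in for a service layer backed by TypeORM/PostgreSQL.
  private readonly store: CreateObservationDto[] = [];

  @Get()
  find(@Query('stationId') stationId?: string): CreateObservationDto[] {
    return stationId ? this.store.filter(o => o.stationId === stationId) : this.store;
  }

  @Post()
  create(@Body() dto: CreateObservationDto): CreateObservationDto {
    this.store.push(dto);
    return dto;
  }
}
```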

Client-side Components:

  • PWA Client:
    • Construct a client-side PWA using TypeScript, HTML5, CSS, and Angular, enabling an app-like experience with offline capabilities and responsive design.

Architectural Notes:

  • Input/Output Operations:

    • The Rest API will predominantly manage IO operations, following standard CRUD patterns.
  • Other services Environment and Scalability:

    • Each service will be tailored to operate independently, allowing for isolated development, deployment, and scaling.
  • Containerized Deployment:

    • Embrace containerization for the deployment of all the services, using tools like Docker for consistency across development, testing, and production environments.

Proposal for Implementation:

  1. Design and Develop the Microservices:

    • Map out the Climsoft functionalities to be modularized into microservices.
    • Ensure each microservice is well-defined with clear interfaces and responsibilities.
  2. Rest API and Database Integration:

    • Develop the Rest API with robust endpoints, authentication, and authorization mechanisms.
    • Set up the PostgreSQL database schemas and ensure efficient data access patterns are established.
  3. PWA Client Development:

    • Craft the PWA client with user-centric design principles, focusing on a seamless, intuitive user experience that leverages modern web standards.
  4. Containerization Strategy:

    • Establish a containerization strategy that includes Dockerfiles and Docker Compose scripts for local development and testing.
    • Define a deployment pipeline that integrates with CI/CD workflows for smooth rollouts and updates.

Rationale:

By transitioning to a microservices architecture and embracing containerization, Climsoft can achieve greater agility in its development process and operational efficiency. This proposal aims to ensure that Climsoft's software infrastructure is future-proof, scalable, and robust, enabling the team to respond swiftly to changing requirements and growth.

Request for Team Feedback:

The Climsoft Development Team's input on this proposal is invaluable. We seek your thoughts on this modernization approach, and any insights or concerns you may have about its implementation.

Technology Stack, GitHub Repository, Development Deployment Stack and Production Deployment Stack:

(diagrams attached to the original issue)

Adoption of ISO 8601 Date Time Format in API Communication

Overview
I propose we standardise the Date Time format used in API communication to the ISO 8601 format.
All Date Time parameters must be converted to UTC and follow the YYYY-MM-DDTHH:mm:ss.sssZ format (e.g., 2024-03-10T21:00:00.000Z) for both sending to and receiving from the API.

Proposed Structure

  1. Date Time Format: All API routes that handle date-time parameters expect them in UTC, specifically in the ISO 8601 format (a conversion sketch follows this list).
  2. Underlying Implementation: The API utilizes TypeORM for CRUD operations, which converts string DateTime parameters into JavaScript Date objects for timestamptz column operations.
  3. Frontend and Backend Communication: Both frontend and backend, developed in JavaScript, will adhere to this standardized Date Time format for ease of data manipulation and communication.
  4. Database Interaction: When interacting directly with the PostgreSQL database, especially for raw queries, the extended ISO 8601 SQL standard format (YYYY-MM-DD HH:mm:ss.sss ±[hh]) should be used, acknowledging PostgreSQL's format requirements for timestamptz columns.
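A minimal sketch of the round trip implied above: JavaScript Date objects serialise to the proposed ISO 8601 UTC format via toISOString(), and incoming strings parse back into Date objects. The helper names are illustrative only.

```typescript
// Serialise any Date to the agreed API format: YYYY-MM-DDTHH:mm:ss.sssZ (UTC).
function toApiDateTime(date: Date): string {
  return date.toISOString(); // e.g. "2024-03-10T21:00:00.000Z"
}

// Parse an API date-time string back into a Date (a time-zone-agnostic instant).
function fromApiDateTime(value: string): Date {
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid ISO 8601 date-time: ${value}`);
  }
  return date;
}

const sent = toApiDateTime(new Date(Date.UTC(2024, 2, 10, 21, 0, 0))); // "2024-03-10T21:00:00.000Z"
const received = fromApiDateTime(sent);
console.log(sent, received.getTime() === Date.UTC(2024, 2, 10, 21, 0, 0)); // true
```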

Rationale
The decision to use the mentioned ISO 8601 format over the SQL date time standard is influenced by:

  1. Global Standardization: ISO 8601 is a globally recognized standard, facilitating easier integration and consistent formatting across different systems and technologies.
  2. Ecosystem Cohesion: The exclusive use of JavaScript across our stack simplifies DateTime handling and conversion, reducing the risk of format-related errors.
  3. Compatibility with TypeORM: TypeORM's reliance on JavaScript Date objects makes ISO 8601 the more compatible choice for our operations.
  4. UTC settings: ISO 8601 format will ensure consistency with our reliance on UTC settings for date time operations as outlined in issue #20.

Expected Benefits

  • Consistency: A unified Date Time format across the API ensures consistent data handling and interpretation.
  • Simplicity: Simplifies the conversion and comparison of Date Time values across different time zones.
  • Interoperability: Enhances the API's compatibility with external systems and services by adhering to a widely accepted standard.
  • Efficiency: Streamlines backend and frontend communication, reducing the need for repetitive format conversion.

Request for Comments
I invite all team members to review this proposal and provide feedback or concerns regarding this adoption for our API communications. Your insights are invaluable in ensuring that we choose the most effective and efficient approach to Date Time handling in our system.

Data Sources Servers or Connectors

Should we combine automatic stations and API sources under a single concept of servers for data sources, or should we call them connectors?
That means the metadata for a connector will have to define the communication protocol, for instance FTP or HTTP. Connectors will then use the sources structure to determine data ETL operations.
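To make the idea concrete for discussion, here is one possible shape for connector metadata; the field names, protocol list and example values are assumptions, not a settled design.

```typescript
// Hypothetical connector metadata; defines how data is fetched before the
// associated source structure drives the ETL operations.
type ConnectorProtocol = 'FTP' | 'HTTP';

interface ConnectorMetadata {
  name: string;                 // e.g. an AWS network or a third-party API
  protocol: ConnectorProtocol;  // communication protocol used to reach the server
  host: string;
  port?: number;
  credentialsRef?: string;      // reference to securely stored credentials
  sourceId: number;             // links to the existing sources structure for ETL
  scheduleCron?: string;        // how often data is pulled, e.g. '0 * * * *'
}

const exampleConnector: ConnectorMetadata = {
  name: 'AWS network FTP server',
  protocol: 'FTP',
  host: 'ftp.example.org',
  sourceId: 3,
  scheduleCron: '0 * * * *',
};
```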

Database Technology

Overview

Climsoft desktop utilises MariaDB for managing climate data. However, to accommodate our evolving requirements, including the need to store data properties in JSON format and the anticipated increase in spatial data related to climate and hydrology, we are considering a migration to PostgreSQL with the PostGIS extension. This shift is aimed at harnessing the superior spatial and JSON data handling capabilities of PostgreSQL and PostGIS.

MariaDB, while effective for time series data needs, falls short in managing complex JSON and spatial data structures that the database will be expected to store. The advanced indexing and querying capabilities needed for climate and hydrology datasets necessitate a more robust solution.

Rationale and Benefits

I propose that we migrate our database management system to PostgreSQL with the PostGIS extension. This change would bring the following benefits:

  1. Advanced JSON Handling: PostgreSQL's JSONB data type offers superior capabilities for storing, indexing, and querying JSON data, making it ideal for handling the structured and semi-structured data prevalent in our records, especially for tracking changes.

  2. Superior Spatial Capabilities: PostGIS extends PostgreSQL with comprehensive support for geographic objects, facilitating sophisticated spatial queries and analyses. This enhancement is crucial for effectively managing the spatial aspects of our climate and hydrology data, streamlining processes, and eliminating the need for additional spatial data handling solutions.

  3. Robust Indexing Options: The extensive indexing options provided by PostgreSQL and PostGIS, such as GIST and GIN indexes, are tailor-made for efficient retrieval of spatial and JSON data. This improvement will directly enhance the performance and responsiveness of our data querying and analytical operations (see the sketch after this list).

  4. Community and Ecosystem: PostgreSQL, being one of the most popular open-source relational database systems, has a large and active community. This ensures continuous improvements and support, along with a plethora of tools and integrations available, which can be beneficial for our system's maintainability.
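For illustration, the sketch below uses the node-postgres (pg) client to create a table with a JSONB column and a PostGIS point, together with the GIN and GIST indexes mentioned above. The table and column names are hypothetical, and it assumes the PostGIS extension is available on the server.

```typescript
import { Client } from 'pg';

async function createStationObservationsTable(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    await client.query(`CREATE EXTENSION IF NOT EXISTS postgis`);
    await client.query(`
      CREATE TABLE IF NOT EXISTS station_observations (
        id BIGSERIAL PRIMARY KEY,
        station_id TEXT NOT NULL,
        observed_at TIMESTAMPTZ NOT NULL,
        properties JSONB NOT NULL DEFAULT '{}'::jsonb,   -- semi-structured properties, e.g. change history
        location GEOMETRY(Point, 4326)                   -- PostGIS spatial column
      )`);
    // GIN index for querying inside the JSONB document.
    await client.query(
      `CREATE INDEX IF NOT EXISTS idx_obs_properties ON station_observations USING GIN (properties)`);
    // GIST index for spatial queries on the location column.
    await client.query(
      `CREATE INDEX IF NOT EXISTS idx_obs_location ON station_observations USING GIST (location)`);
  } finally {
    await client.end();
  }
}
```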

Request for Comments

I invite feedback on this proposal. Your insights and suggestions will be invaluable.

Forms Data Entry Operations

Overview:

This proposal outlines Climsoft's data ingestion process via user-friendly data entry forms, designed to accommodate the varying operational workflows of users such as National Meteorological and Hydrological Services (NMHS). It introduces a flexible forms-based data entry system to meet diverse user needs.

User Stories and Scenarios:

  1. User Story 1 (Hourly Data Entry for Multiple Elements):

    • As an observer at an official station, I need to enter data for multiple elements hourly, directly from instruments or record slips.
  2. User Story 2 (Daily Data Entry for a Single Element):

    • As an observer at a volunteer station, I want to enter data for a single element at a specific hour each day.
  3. User Story 3 (Bulk Monthly Data Entry for Multiple Stations):

    • As a user responsible for data entry across 10 or more stations, I need a system that allows for the input of multiple elements from monthly sheets.

Common Data Entry Case Scenarios:

  • Daily or monthly data entry for multiple elements at a specific hour.
  • Daily or monthly data entry for a single element at a specific hour.
  • Synoptic hour data entry for multiple elements, done daily or monthly.
  • Hourly data entry for multiple elements, done daily or monthly.

Proposal for Data Entry Process:

Administrator Responsibilities:

  1. Set Element Thresholds:

    • Define global and, optionally, monthly upper and lower limits for elements to assist in data validation during entry.
  2. Define Data Entry Forms:

    • Create and configure forms that dynamically generate the required metadata for building the entry interface.
  3. User and Station Assignment:

    • Allocate stations to users to grant access rights.
  4. Station and Element Assignment:

    • Allocate the elements measured at a station to that station.
  5. Station and Form Assignment:

    • Allocate the forms used for data entry to each station.

Dynamic Form Creation:

  • Administrators can craft forms that mirror the physical layout of record slips or return forms at stations by specifying the structure definitions (a hypothetical example follows).
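As an illustration of the kind of structure definition an administrator might specify, the sketch below describes a hypothetical form metadata object; the field names, selector types and example values are assumptions for discussion only.

```typescript
// Hypothetical form metadata used to dynamically build a data entry interface.
interface EntryFormMetadata {
  name: string;                                   // e.g. 'Daily maximum temperature form'
  selectors: ('ELEMENT' | 'DAY' | 'HOUR')[];      // controls the user picks before entering values
  fields: ('DAY' | 'HOUR' | 'ELEMENT')[];         // what each entry field in the grid represents
  elementIds: number[];                           // elements captured by this form
  hours: number[];                                // observation hours covered, in UTC
  validateAgainstLimits: boolean;                 // apply element upper/lower limits on entry
}

const dailyMaxTempForm: EntryFormMetadata = {
  name: 'Daily maximum temperature form',
  selectors: ['ELEMENT', 'DAY'],
  fields: ['DAY'],
  elementIds: [2],          // e.g. maximum temperature
  hours: [6],
  validateAgainstLimits: true,
};
```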

User Data Entry Workflow:

  1. Station Selection:

    • Users can access only the stations assigned to them.
  2. Form Access:

    • Forms associated with a station are made available to the user, with form accessibility hinging on station form metadata.
  3. Use of Entry Selectors:

    • Users make selections for data entry using controls defined by the form's metadata.
  4. Data Entry Fields:

    • Users input data via fields that are again defined by the form's metadata.
  5. Data Saving:

    • Once data entry is complete, the user can save the data, with the system validating entries against the predefined element limits and the form's other validation parameters.

Rationale:

The proposed feature addresses the real-world workflows of Climsoft users, like within NMHS organizations, by creating a customizable, role-specific data entry experience that enhances data integrity and simplifies the entry process.

Request for Team Feedback:

Feedback and suggestions from the Climsoft development team are welcomed and encouraged to refine this proposal and ensure that the final implementation meets user requirements effectively.

Implement Tooltips for Input Fields in Specific Modules

To enhance user experience and provide immediate context, we need to add tooltips to the input fields within specific modules such as 'sources'. These tooltips will serve as quick reminders of the purpose and requirements of each input field, aiding users in accurately completing forms and navigating the interface more efficiently.

Please identify the modules and corresponding input fields that would benefit most from the addition of tooltips. Contributions are needed to design and implement these tooltips effectively. Suggestions on tooltip content that clearly conveys field purposes are also welcome.

UTC Settings for API and Database

Overview
As we strive for uniformity and precision in handling date and time operations across our application, it's crucial we align on the standards and configurations that best suit our development and operational practices, especially regarding the use of Coordinated Universal Time (UTC). This prompts me to raise a discussion to evaluate the necessity of explicitly configuring UTC Settings for Our API and Database.

Here's a brief overview of our current setup and some inherent behaviors:

  • Date and Time Operations: Our expectation is that all date and time CRUD operations, including all requests to the API, operate in UTC to maintain consistency and accuracy.

  • PostgreSQL's Handling of Timestamps: By default, PostgreSQL stores timestamp with time zone values in UTC. Additionally, PostgreSQL sessions default to UTC when the session time zone isn't explicitly specified. Note also that the PostgreSQL session time zone determines whether date-time conversion occurs during retrieval and saving (when a time zone is part of the date-time value).

  • Node.js and TypeORM Behavior: In the context of Node.js, the default time zone setting is considered to be UTC. Correspondingly, TypeORM, through the pg driver, sends date-time values as UTC times, derived from JavaScript Date objects. It's important to note that JavaScript Date objects are inherently time zone agnostic, representing specific points in time without direct association to any time zone (a short sketch follows this list).

  • Current Time Zone Configuration: In our API's database connection settings, we haven't altered the default time zone configuration. This implicitly means we operate PostgreSQL under UTC settings. Similarly, we haven't modified the Node.js time zone setting from its default.
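A small sketch of the behaviours described above: a JavaScript Date is a time-zone-agnostic instant that serialises to UTC, and, should we decide explicit configuration is worthwhile, the PostgreSQL session time zone could be pinned to UTC with a plain SET TIME ZONE statement after connecting. The pg-based helper is an assumption about where such a statement might live, not current code.

```typescript
import { Client } from 'pg';

// JavaScript Dates are time-zone-agnostic instants; toISOString() always renders UTC.
const instant = new Date('2024-03-10T21:00:00.000Z');
console.log(instant.toISOString()); // "2024-03-10T21:00:00.000Z" regardless of server locale

// If we choose to be explicit rather than rely on defaults, the session
// time zone can be pinned to UTC right after the connection is opened.
async function connectWithUtcSession(connectionString: string): Promise<Client> {
  const client = new Client({ connectionString });
  await client.connect();
  await client.query(`SET TIME ZONE 'UTC'`); // explicit, even though UTC is often already the default
  return client;
}
```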

Given this context, I'd like us to consider and discuss the necessity of explicitly configuring UTC settings within our API and database connections. Some points to ponder:

  • Considering the default behaviors mentioned, is explicitly setting everything to UTC redundant or a necessary step for clarity and consistency across different environments?

  • Could there be scenarios or edge cases where not explicitly setting UTC might lead to discrepancies or bugs, particularly as our application scales or in distributed environments?

  • Is there additional maintenance overhead associated with explicitly configuring these settings, and does it outweigh the benefits of ensuring consistency and clarity in our date and time handling practices?

I believe having a clear and consistent approach to managing time zones, especially committing to UTC, could help mitigate potential issues down the line. However, I'm eager to hear your thoughts, insights, and any experiences you might have had related to this topic.

Challenges with Docker Containerization in Development Setup

Overview

I've encountered several challenges while attempting to containerize our full application stack for development deployment. These issues have significantly impacted the efficiency of our development process, particularly regarding our Angular and NestJS applications running within Docker containers. Below, I detail the specific problems faced:

Challenges Encountered

  1. Angular Development Server Accessibility:

    • The Angular development server does not respond when accessed from the host machine, despite running inside a Docker container.
  2. Angular Hot Reloading:

    • Hot reloading for the Angular application does not work, even after mounting the Docker volume to the host directory. Changes made to the codebase do not trigger a live reload, hindering efficient development.
  3. NestJS Hot Reloading:

    • Similarly, hot reloading for the NestJS application fails to function as expected. While there is a workaround mentioned on Stack Overflow, it introduces significant delays in compilation times following file changes, making it impractical for a smooth development workflow.

Temporary Resolution

Given these substantial hurdles, I've decided to limit Docker containerization solely to the PostgreSQL instance within our development deployment setup. This approach circumvents the issues encountered with the Angular and NestJS applications but is not an ideal solution.

Request for Assistance

I am seeking advice or solutions from the community that might help address these challenges, enabling full-stack containerization without compromising on development efficiency. Any insights, alternative approaches, or updates that could alleviate these issues would be greatly appreciated.

You can have a look at the attempted Docker development setup in the docker-backup-2 branch.

User Management and Audit Control Implementation

Overview:
This proposal outlines a user management structure and audit control mechanisms for Climsoft, tailored to suit the hierarchical operational needs of organizations like National Meteorological and Hydrological Services (NMHS). It introduces rigorous permissions linked to user roles and robust logging for database interactions.

User Management Implementation:

  1. User Roles:
    • Administrator:
      • Expected to operate at the headquarters level.
      • Granted full permissions for all types of database writes within Climsoft.
    • Approver:
      • Expected to operate at the headquarters or regional level.
      • Typically responsible for quality control; will have the authority to write to the database, adhering to strict guidelines.
      • Data access depends on station access permissions.
    • Entry Clerk:
      • Expected to operate at the station level.
      • Typically an observer or personnel responsible for data entry; authorized to write observation data into the database.
      • Data access depends on station access permissions.
    • Viewer:
      • Expected to operate at any level.
      • Typically an analytics personnel or consultant responsible for data analysis.
      • Data access depends on station access permissions.
  2. Audit Trails and User ID Tracking:

    • All database write operations will log the user ID of the individual performing the action.
    • Observation data entries, conducted by the Climsoft 'process', will be an exception to this rule.
    • The intent is to enable comprehensive future auditing and maintain integrity and traceability of data.
  3. Data Analysis and Access Rights:

    • All user roles, including Viewers, will be granted rights to perform advanced data analysis.
    • Access rights and analysis capabilities will be dependent on station access permissions.
    • An analysis module, operating in a sandboxed environment within Climsoft, will be developed to ensure data security while allowing complex analytical operations.
    • This strategy aims to permit users outside the organization to conduct in-depth analyses within Climsoft’s security framework.
  4. Database Export Oversight:

    • Database exports may require Administrator approval, subject to the organization’s chosen Climsoft configuration.
    • Export activities will be recorded in logs to facilitate audit trails and uphold data governance standards.
  5. Future Enhancements to Role-Based Access:

    • In the long term, Climsoft user roles will be augmented with optional, fine-grained control policies.
    • These policies will provide additional data access controls at the module level, offering a more tailored and secure user experience (a sketch of the current coarse role mapping, which these policies would refine, follows this list).
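As a rough sketch of the coarse role mapping described in this list, the snippet below encodes the four roles and a station-scoped permission check; the permission names and the access rule are illustrative assumptions rather than a finished design.

```typescript
// Hypothetical role and permission model for discussion.
type Role = 'ADMINISTRATOR' | 'APPROVER' | 'ENTRY_CLERK' | 'VIEWER';
type Permission = 'WRITE_METADATA' | 'QC_WRITE' | 'ENTER_OBSERVATIONS' | 'VIEW_DATA';

const rolePermissions: Record<Role, Permission[]> = {
  ADMINISTRATOR: ['WRITE_METADATA', 'QC_WRITE', 'ENTER_OBSERVATIONS', 'VIEW_DATA'],
  APPROVER: ['QC_WRITE', 'ENTER_OBSERVATIONS', 'VIEW_DATA'],
  ENTRY_CLERK: ['ENTER_OBSERVATIONS', 'VIEW_DATA'],
  VIEWER: ['VIEW_DATA'],
};

interface User {
  id: number;
  role: Role;
  stationIds: string[]; // stations the user has been granted access to
}

// Station-scoped permission check: administrators have full access,
// everyone else needs both the permission and access to the station in question.
function canPerform(user: User, permission: Permission, stationId: string): boolean {
  if (!rolePermissions[user.role].includes(permission)) return false;
  if (user.role === 'ADMINISTRATOR') return true;
  return user.stationIds.includes(stationId);
}
```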

Rationale:
This proposal is motivated by the need for a secure, scalable, and auditable user management system within Climsoft that reflects the operational hierarchy and responsibilities of organizations like NMHS. It seeks to establish a structured environment where data integrity, security, and traceability are the cornerstones of the system while maintaining flexible access to data for analysis.

Request for Team Feedback:
I invite the development team’s thoughtful feedback, comments, and suggestions on this proposal. Your expertise and insights will be helpful to the successful design and implementation of these user management and audit control features.

Climsoft Web First Release

As discussed in our annual catch-up meeting regarding the aims for 2024/5, I suggest we target the completion of the following modules in the first release of the web application.

(module diagram attached to the original issue)

Moving forward, we should delve into specifics about what needs to be included in these modules for the initial release.

I am currently refining some implementation ideas for the quality control module, so expect some changes in the sub-modules shown in the diagram.

The first release will be designed to work in conjunction with the desktop database model via the export module. We anticipate making some modifications to the desktop application to facilitate seamless integration with the web application.

Furthermore, we need to discuss the web model and application in terms of its utility for regional services. Key features should include seamless data transfer between regional services and their member institutes. We also need to address the user management aspects of such integration.

Element Metadata Implementation

Overview

I propose we adopt a hierarchical modeling approach for the storage of elements metadata, aligning with the structure outlined by the World Meteorological Organization's Global Climate Observing System (WMO GCOS) Essential Climate Variables (ECVs). This decision is inspired by the comprehensive table available at GCOS Essential Climate Variables. This approach will involve structuring our metadata around four key levels: domains, subdomains, types, and elements.

Proposed Structure

  • Domains: The highest categorization level, representing the broadest classification of climate variables (Atmosphere, Land and Ocean).
  • Subdomains: Subcategories within domains that offer more specific classification (e.g. Surface, Upper-air etc.).
  • Types: Even more specific classification within subdomains, grouping variables by their nature (e.g. Temperature etc.).
  • Elements: The most granular level, representing individual climate variables (e.g. Maximum Temperature etc.).

For example, the element "Maximum Temperature" will be categorized as follows (a metadata sketch in TypeScript follows the list):

  • Domain: Atmosphere
  • Subdomain: Surface
  • Type: Temperature
  • Element: Maximum Temperature
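A minimal sketch of this four-level hierarchy as it might be modelled in TypeScript; the identifiers, limits and units shown are illustrative and would ultimately come from the populated metadata tables.

```typescript
// Hypothetical hierarchical element metadata, following the GCOS ECV structure.
interface ElementMetadata {
  id: number;
  name: string;           // e.g. 'Maximum Temperature'
  domain: 'Atmosphere' | 'Land' | 'Ocean';
  subdomain: string;      // e.g. 'Surface'
  type: string;           // e.g. 'Temperature'
  units: string;          // standard unit used for storage
  lowerLimit?: number;    // user-editable
  upperLimit?: number;    // user-editable
  activated: boolean;     // user-editable
}

const maximumTemperature: ElementMetadata = {
  id: 2,                  // illustrative identifier only
  name: 'Maximum Temperature',
  domain: 'Atmosphere',
  subdomain: 'Surface',
  type: 'Temperature',
  units: '°C',
  lowerLimit: -10,
  upperLimit: 50,
  activated: true,
};
```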

Rationale

The adoption of the WMO GCOS structure for our metadata organization is expected to significantly streamline and enhance the user experience in terms of quality control and analysis of climate data elements. This structured approach ensures a clear, consistent, and logical hierarchy of climate variables, facilitating easier navigation, understanding, and manipulation of climate data. It also allows us to adopt standard element naming conventions.

Element Setup, Addition and Editing

  • We intend to populate the elements table with standard element names during the setup phase. This initial setup ensures consistency and adherence to global standards from the outset.
  • Upper limit, lower limit and activated are the only fields that are expected to be editable by users.
  • Editing default elements is restricted; default elements are those created by Climsoft as part of its setup.
  • Adding new elements is allowed. User-added elements will start from a specific element ID range.
  • Adding element types is allowed, but this should be carefully evaluated.

Element Naming Conventions

These conventions will guide our naming process, ensuring our names and descriptions are interoperable and consistent with international standards.

Element Units

  • All observation data will be stored using standard units. Conversion to the element standard unit will occur during ingestion. Data sources differing from the standard units must provide unit conversion metadata to ensure accuracy and consistency across our datasets.
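To illustrate the kind of unit conversion metadata a non-standard source might supply, here is a small sketch using a linear scale-and-offset rule; the metadata shape and the Fahrenheit example are assumptions for discussion.

```typescript
// Hypothetical unit conversion metadata supplied by a source whose units
// differ from the element's standard unit: standard = value * scale + offset.
interface UnitConversion {
  sourceUnit: string;
  standardUnit: string;
  scale: number;
  offset: number;
}

function toStandardUnit(value: number, conversion: UnitConversion): number {
  return value * conversion.scale + conversion.offset;
}

// Example: a source reporting temperature in °F while the standard unit is °C.
const fahrenheitToCelsius: UnitConversion = {
  sourceUnit: '°F',
  standardUnit: '°C',
  scale: 5 / 9,
  offset: -160 / 9, // (F - 32) * 5/9 rewritten as F * 5/9 - 160/9
};
console.log(toStandardUnit(98.6, fahrenheitToCelsius).toFixed(1)); // "37.0"
```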

Temporal Resolution Handling

It's important to note that the temporal resolution aspect will not be directly associated with the elements' metadata. Instead, temporal resolution will be associated with the observation itself. This means we will not differentiate elements in the metadata based on their temporal resolution (e.g., "Maximum Temperature Daily" vs. "Maximum Temperature Hourly"). The period aspect of the observation data table will take care of temporal resolution. This approach ensures a streamlined elements table, avoiding unnecessary duplication and simplifying the structure for users.

Expected Benefits:

  • Simple and Consistent User Functionality: Users will benefit from a straightforward and uniform interface that logically organizes climate variables, making data more accessible and easier to analyze.
  • Enhanced Quality Control: The clear definitions of relationships among climate variables, combined with a separate handling of temporal resolution, will aid in the development of more targeted and effective quality control measures.
  • Efficient Data Analysis: Analysts will be able to quickly identify and correlate related climate variables across different levels of the hierarchy, improving the efficiency and depth of climate data analysis, while also easily adapting to the specific temporal resolution of observations.

Implementation Considerations

  • We will need to carefully map all current desktop and institutional data elements to the new hierarchical structure, ensuring accuracy and consistency.
  • The design of the observation data table must accommodate the separation of temporal resolution from the elements' metadata, ensuring flexibility and clarity in data handling.
  • User interface and experience design will play a crucial role in making the hierarchical structure and the approach to temporal resolution intuitive and user-friendly.
  • Training and documentation will be essential to help users understand and leverage the new structure and temporal resolution handling for quality control and analysis.

Request for Comments

I invite all team members to review this proposed approach, including our strategy for handling temporal resolution, and provide feedback, suggestions, or concerns. Your input is crucial to refining our implementation strategy and ensuring that we fully leverage the potential of the GCOS structure to enhance our climate data management and analysis capabilities.


Station Metadata Implementation

Overview

I propose the adoption of a station metadata model inspired by the OSCAR/Surface observations metadata framework. This model will categorize station metadata into eight essential categories, facilitating comprehensive and structured data representation that aligns with global standards.

Proposed Structure

The station metadata will be organized into the following categories (a sketch of a possible record shape follows the list):

  1. Observation Processing Method: Determines how observations are processed at the station, categorized as Manual, Automatic, or Hybrid.
  2. Observation Environment: Specifies the monitoring environment type, including various settings like Air (fixed/mobile), Lake/River (fixed/mobile), Land (fixed/mobile/on ice), Sea (fixed/mobile/on ice), and Underwater (fixed/mobile).
  3. Observation Focus: Identifies the station's primary observation purpose, with options ranging from Agricultural to Space Weather stations.
  4. Drainage Basin: Indicates the geographical drainage area associated with the station.
  5. Climate Zone: Classifies the station's location according to the Köppen climate classification.
  6. Administration Unit: Defines the local governance entity the station falls under.
  7. Organisation: Identifies the owning organization of the station.
  8. Network Affiliation: Specifies the network(s) with which the station shares data.
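A hedged sketch of how the eight categories above might appear on a station metadata record; the names, optional fields and example values are placeholders rather than the final schema.

```typescript
// Hypothetical station metadata shape covering the eight proposed categories.
interface StationMetadata {
  id: string;
  name: string;
  observationProcessingMethod: 'Manual' | 'Automatic' | 'Hybrid';
  observationEnvironment: string;   // e.g. 'Land (fixed)'
  observationFocus: string;         // e.g. 'Agricultural'
  drainageBasin?: string;
  climateZone?: string;             // Köppen classification, e.g. 'Aw'
  administrationUnit?: string;
  organisation?: string;
  networkAffiliations: string[];
  location: { longitude: number; latitude: number; elevation: number }; // elevation in metres above sea level
}

const exampleStation: StationMetadata = {
  id: 'ST0001',
  name: 'Example Station',
  observationProcessingMethod: 'Manual',
  observationEnvironment: 'Land (fixed)',
  observationFocus: 'Agricultural',
  climateZone: 'Aw',
  networkAffiliations: [],
  location: { longitude: 36.8, latitude: -1.3, elevation: 1700 },
};
```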

Rationale

Adopting a structured and standardized approach to station metadata ensures consistency, facilitates data integration and interoperability, and aligns with international standards for meteorological and hydrological data processing and sharing.

Expected Benefits

Implementing this model will enhance data quality, improve metadata management, and support more accurate and reliable climate and hydrological analyses. It will also promote collaboration and data sharing across different meteorological and hydrological networks.

Implementation Considerations

  • Authorisation Handling: Data access will be primarily controlled at the station level, considering factors like climate zone, drainage basin, administrative unit, organisation, and network affiliation.
  • Spatial Handling: The station's location will be crucial for spatial quality control and analyses.
  • Elevation Handling: The default elevation value used for observations will be the station's elevation above sea level, unless specified otherwise during data ingestion.

Request for Comments

I invite feedback on this proposed metadata structure to ensure it meets users' needs and aligns with global best practices. Your insights and suggestions will be invaluable in refining and implementing this model effectively.

Data Flow

Overview:

This proposal aims to standardise the data flow within Climsoft from initial entry to the generation of final products, emphasizing the need for consistent data source identification, unified storage, robust QC checks, and transparent logging for auditability.

Detailed Description:

Data Ingestion:
Data ingestion is done through three source types that define the ingestion methods.

  • Forms: Allow users to manually input data via forms, capturing real-time observations.
  • Machine: Enable automated data capture from instruments and sensors.
  • Import: Provide functionality for batch imports of data from external sources.

Each entry method must clearly document the source of the data to ensure traceability. Each data source is associated with a source type.

Observations Table:

  • Centralise data storage by saving entries from all sources into one Observations table, maintaining data in its original form (a hypothetical entity sketch follows this list).
  • Ensure that the Observations table structure is conducive to identifying and querying the data source.
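As a hypothetical illustration of a source-aware Observations table, the TypeORM entity below keeps a reference to the data source and a QC log alongside each value; the entity, column names and the period column are assumptions for discussion, not the actual schema.

```typescript
import { Entity, PrimaryGeneratedColumn, Column, Index } from 'typeorm';

// Hypothetical central observations entity; every row records where it came from.
@Entity('observations')
export class Observation {
  @PrimaryGeneratedColumn('increment')
  id: number;

  @Index()
  @Column({ name: 'station_id' })
  stationId: string;

  @Column({ name: 'element_id' })
  elementId: number;

  @Column({ name: 'source_id' })
  sourceId: number; // links back to the form, machine or import source

  @Column({ name: 'date_time', type: 'timestamptz' })
  dateTime: Date; // stored in UTC

  @Column({ type: 'int' })
  period: number; // temporal resolution of the observation, in minutes

  @Column({ type: 'double precision', nullable: true })
  value: number | null;

  @Column({ type: 'jsonb', nullable: true })
  qcLog: unknown[] | null; // QC checks and edits logged against this observation
}
```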

Quality Control (QC) Protocol:

  • Establish a comprehensive QC protocol that scrutinises data for accuracy and consistency.
  • Make corrections within the Observations table, allowing for real-time data integrity enhancement.

Logging and Audit Trails:

  • Create a robust logging system that captures every action taken on the data, including QC checks and edits.
  • Ensure that data change logs and QC test logs are transparent and easily retrievable for audit purposes.

Final Product Generation:

  • Define a clear pathway for data to be classified as 'final' post-QC for use in Climsoft's product generation.
  • Emphasize that final products are based on the highest quality, QC-verified observations.

Proposal for Enhancements:

  1. Streamlined Data Entry:

    • Formalize data entry procedures that require source identification for every data input.
  2. Quality Control Reinforcement:

    • Implement a unified QC system that is both rigorous and standardized across all data types.
  3. Auditability and Transparency:

    • Develop an enhanced logging system for full transparency and accountability of data modifications and QC results.
  4. Finality in Product Creation:

    • Introduce criteria within Climsoft to determine and label data as 'final' for the production of climatological outputs.

Rationale:

The integrity of Climsoft's data and the trust in its climatological products hinge on a clear, accountable, and verifiable data management process. This proposal seeks to reinforce these aspects, ensuring Climsoft remains a reliable and authoritative tool for meteorological and hydrological data processing.

Request for Team Feedback:

I request feedback from the development community to refine this proposal. Contributions from the development team are essential to the successful enhancement of Climsoft's data workflow.


User Account Creation and Password Reset Processes

Overview:
This proposal aims to introduce an improved process for user account creation and password management for both Cloud/Internet Users and Local Network Users of Climsoft. The goal is to enhance security, streamline user onboarding, and provide a seamless password reset experience.

Cloud/Internet Users:

  1. Account Creation Process:

    • Upon the creation of a user account by an administrator, Climsoft will generate a temporary, random password for the new user (a generation sketch follows this list).
    • The system will then send an email to the user with a web link containing the temporary password credentials.
    • Clicking on the link will redirect the user to a Climsoft verification page, prompting them to set a new password by entering and confirming it.
    • Once the new password is set, the user will gain access to Climsoft.
  2. Password Reset Process:

    • Users forgetting their password can initiate a reset by entering their email.
    • Climsoft resets the password and sends a web link to the user's email for password credentials reset.
    • Following the link leads to a verification page where the user is prompted to set a new password.
    • After setting the new password, the user regains access to Climsoft.
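A minimal sketch of how the temporary credentials and reset link described above might be generated with Node's built-in crypto module; the lengths, expiry and link format are assumptions for discussion.

```typescript
import { randomBytes } from 'crypto';

// Generate a temporary, random password for a newly created account.
function generateTemporaryPassword(length = 12): string {
  // base64url gives a URL- and email-safe mix of letters, digits, '-' and '_'.
  return randomBytes(length).toString('base64url').slice(0, length);
}

// Generate a single-use reset token and the verification link emailed to the user.
function generateResetLink(baseUrl: string): { token: string; expiresAt: Date; link: string } {
  const token = randomBytes(32).toString('hex');
  const expiresAt = new Date(Date.now() + 60 * 60 * 1000); // hypothetical one-hour validity
  return { token, expiresAt, link: `${baseUrl}/reset-password?token=${token}` };
}

const temp = generateTemporaryPassword();
const reset = generateResetLink('https://climsoft.example.org'); // placeholder base URL
console.log(temp, reset.link);
```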

Local Network Users:

  1. Account Creation Process:

    • For user accounts created by an administrator within a local network environment (where the server is not internet-exposed), the backend will generate a temporary, random password.
    • This password is then sent back to the frontend portal used by the administrator, who gets one-time access to this password.
    • The administrator can then securely pass this temporary password to the user.
  2. Password Reset Process:

    • Users requiring a password reset are prompted to enter their email on a local network interface.
    • The backend processes this request and directs the user to a verification page without sending an email. On this page, the user is prompted to enter and confirm a new password.
    • Upon setting the new password, the user is granted access to Climsoft.

Rationale:
This proposal is motivated by the need for a secure, user-friendly process for managing Climsoft access for users across different environments (Cloud/Internet vs. Local Network). Considering these different environments ensures that user onboarding and password resets are smooth, efficient and secure.

Implementation Consideration:

  • Ensure the process is secure by using strong, temporary passwords and secure email communication.
  • For local network users, provide clear instructions to administrators on how to securely communicate the temporary password to the user.
  • Implement audit trails and logs for account creation and password reset activities for security and compliance purposes.

Additional Security Measures for Cloud/Internet Environment Users:
For users accessing Climsoft in a Cloud/Internet environment, it's important to note that while these enhancements aim to improve security within Climsoft, users are also expected to implement broader security measures to protect against exploits and vulnerabilities beyond Climsoft's scope. This includes using up-to-date anti-virus software, implementing strong network security protocols, and ensuring regular security training for all users.

Request for Comments:
I invite all team members' comments, suggestions, and feedback on this proposal. Your insights are valuable to refining and ensuring the effective implementation of these features.

Choosing an ETL/ELT Framework for Modules

Overview

Our project includes modules that will require common Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) operations. We have the option to use the Arquero JavaScript library or rely on languages that offer powerful ETL/ELT tools, such as SQL, Python, or R.

I propose we rely on SQL for our ETL/ELT operations for the following reasons:

  1. Platform Agnosticism: SQL is supported by numerous powerful analytical engines, ensuring our solution remains platform-agnostic.
  2. Ease of Use: SQL is relatively easier to learn and use compared to more programmatic approaches. This allows more team members, including those with lower coding skills, to contribute effectively.
  3. Performance and Efficiency: Analytical engines are optimized for handling large datasets, and their implementations use tools written in lower-level languages that guarantee optimal performance and efficiency.

After evaluating multiple analytical engines such as Apache Spark, Presto, and DuckDB, I recommend using DuckDB for most of our ETL/ELT operations. DuckDB is an in-process OLAP database that excels in a single-node environment, offering high performance for our current user needs. DuckDB is also very well supported in Node.js, Python and R, which means we can integrate it into our API and, in the future, into analytics and visualisation processes that may use Python or R.
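As a hedged example of the direction proposed here, the sketch below runs a SQL aggregation through DuckDB from Node.js, assuming the duckdb npm package and a hypothetical Parquet extract of observations; the file name and columns are placeholders.

```typescript
import duckdb from 'duckdb';

// In-memory DuckDB instance; it could also point at a persistent .duckdb file.
const db = new duckdb.Database(':memory:');

// Hypothetical ELT step: aggregate daily means per station and element
// straight from a Parquet extract of the observations table.
const sql = `
  SELECT station_id,
         element_id,
         date_trunc('day', date_time) AS obs_day,
         avg(value) AS mean_value
  FROM read_parquet('observations.parquet')   -- placeholder file name
  GROUP BY station_id, element_id, obs_day
  ORDER BY station_id, obs_day
`;

db.all(sql, (err, rows) => {
  if (err) {
    console.error('DuckDB query failed:', err);
    return;
  }
  console.log(rows.slice(0, 5)); // first few aggregated rows
});
```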

In the future, if our users require analysis that can only be efficiently performed in distributed environments, we can transition to Apache Spark, which also provides robust support for ETL/ELT operations in distributed settings using SQL.

Request for Comments

I invite feedback on this proposal. Your insights and suggestions will be invaluable.

Refactor Enforcement of Station Elements Metadata

We need to address the practicality of enforcing station elements metadata during data ingestion. The metadata in question includes the limits of the elements and the specific elements associated with each station. Currently, I have postponed the enforcement of these metadata rules to gather user feedback on their practicality. This approach allows us to adjust our strategy based on practical user insights before fully implementing these constraints.

Feedback is needed to determine the most practical point in the data ingestion process to enforce these metadata rules without disrupting user workflows. We must assess whether immediate enforcement hinders usability or if delayed enforcement compromises data integrity. Please share your experiences and suggestions on how to balance these considerations effectively.
