
asfhyp3 / its-live-monitoring

Monitoring stack for low-latency production of ITS_LIVE velocity granules

License: BSD 3-Clause "New" or "Revised" License

Python 96.13% Makefile 3.87%

its-live-monitoring's Introduction

HyP3 documentation

HyP3 documentation is built using MkDocs and the ASF Theme.

How to

Setting up a development environment

In order to automatically document some of our APIs, we use a conda environment with our APIs installed. You can get Miniconda (recommended) here:

https://docs.conda.io/en/latest/miniconda.html

Once conda is installed, you can create and activate a conda environment with all the necessary dependencies by running the following from the repository root:

conda env create -f environment.yml
conda activate hyp3-docs

Later, you can update the environment's dependencies with:

conda env update -f environment.yml

Build and view the documentation site

With the hyp3-docs conda environment activated, run

mkdocs serve

to generate the documentation. This will allow you to view it at http://127.0.0.1:8000/. MkDocs will automatically watch for new/changed files in this directory and rebuild the website so you can see your changes live (just refresh the webpage!).

Note: mkdocs serve captures your terminal; use ctrl+c to exit. We recommend using a second, dedicated terminal so you can keep this command running.

Deploy

This documentation site is deployed as a GitHub Organization website with a CNAME so that it's viewable at https://hyp3-docs.asf.alaska.edu/. The website is served out of the special https://github.com/ASFHyP3/ASFHyP3.github.io repository. Deployment is handled automatically with the .github/workflows/deploy_to_github_io.yml GitHub Action for any merge to main.

There is also a test site deployed to https://hyp3-docs.asf.alaska.edu/hyp3-docs, which tracks the develop branch of this repo and is served out of the gh-pages branch of this repo.

Enable or disable the announcement banner

We can display a site-wide banner for important announcements. The content of this banner is specified in overrides/main.html, which should contain the following placeholder text when the banner is not in use:

{% extends "partials/main.html" %}

{# Uncomment this block to enable the announcement banner:
{% block announce %}
<div id="announcement-content">
    ⚠️ TODO: Your announcement here.<br />
    <a class="announcement-link" href="TODO">Read the full announcement.</a>
</div>
{% endblock %}
#}

In order to enable the banner, uncomment the announce block and fill in the TODOs. Below is an example of an enabled announcement banner (taken from here):

{% extends "partials/main.html" %}

{% block announce %}
<div id="announcement-content">
    ⚠️ Monthly processing quotas were replaced by a credit system on April 1st.<br />
    <a class="announcement-link" href="/using/credits">Read the full announcement.</a>
</div>
{% endblock %}

When the announcement is no longer needed, restore the file to the placeholder text in order to disable the banner.

If you are building and viewing the site locally, you will need to exit with ctrl+c and then re-run mkdocs serve in order to re-render any changes you make to this file.
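Because the placeholder keeps the announce block inside a Jinja comment ({# ... #}), you can tell whether the banner is live by stripping comments and looking for the block. The helper below is a hypothetical sketch, not part of the repository; it assumes the banner is defined only in overrides/main.html as shown above.

```python
import re

# Jinja comments are delimited by {# ... #}; DOTALL lets them span multiple lines.
JINJA_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)
# Matches "{% block announce %}", allowing Jinja's optional whitespace-control dashes.
ANNOUNCE_BLOCK = re.compile(r"\{%-?\s*block\s+announce\s*-?%\}")


def banner_enabled(template_text: str) -> bool:
    """Return True if the template defines an announce block outside any comment."""
    without_comments = JINJA_COMMENT.sub("", template_text)
    return bool(ANNOUNCE_BLOCK.search(without_comments))
```

For example, `banner_enabled(Path("overrides/main.html").read_text())` (with `from pathlib import Path`) returns False for the placeholder text and True for the enabled example.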

Markdown formatting

MkDocs and GitHub parse Markdown documents slightly differently. Some compatibility tips:

  • Raw links should be wrapped in angle brackets: <https://example.com>

  • MkDocs is pickier about whitespace between block types (e.g., headers, paragraphs, lists) and seems to expect 4-space indents. So to get a representation like:


    • A list item

      A sub list heading
      • A sub-list item

    in MkDocs, you'll want to write it like:

    Good

    - A list item
    
        ##### A sub list heading
        - A sub-list item
    

    Bad

    - A list item
      ##### A sub list heading
      - A sub-list item
    
    - A list item
        ##### A sub list heading
        - A sub-list item
    
    - A list item
    
      ##### A sub list heading
      - A sub-list item
    
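If you want to catch the indentation problem before previewing, a quick line check can flag list items and headings whose indent is not a multiple of 4 spaces. This is a hypothetical sketch, not a project tool; it only checks indentation, so it will not catch the missing blank line in the second "Bad" example.

```python
import re

# A list marker ("-", "*", "+", or "1.") or an ATX heading ("#" to "######"),
# preceded by optional leading spaces and followed by whitespace.
BLOCK_START = re.compile(r"^( *)(?:[-*+]|\d+\.|#{1,6})\s")


def misindented_lines(markdown: str, width: int = 4) -> list[int]:
    """Return 1-based line numbers of list items/headings not indented by a multiple of `width`."""
    bad = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = BLOCK_START.match(line)
        if match and len(match.group(1)) % width != 0:
            bad.append(number)
    return bad
```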

its-live-monitoring's People

Contributors

andrewplayer3, asjohnston-asf, cirrusasf, jacquelynsmale, jhkennedy, jtherrmann

Forkers

has12zen

its-live-monitoring's Issues

RT scenes end up in DLQ due to repeated "scene not found in STAC catalog" exceptions

Landsat real-time scenes ending in _RT should not be processed. We currently exclude them by restricting to T1 and T2 scenes in _qualifies_for_processing. However, these RT scenes generally aren't available in the STAC catalog when published. This leads to a "scene not found in STAC catalog" exception being raised at https://github.com/ASFHyP3/its-live-monitoring/blob/main/landsat/src/main.py#L160 before we can check whether the scene qualifies for processing. These scenes are correctly not processed, but we'd like them dismissed quietly like other non-qualifying scenes, rather than repeatedly throwing exceptions and ending up in the dead letter queue.
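One way to dismiss these scenes quietly is to check the collection category encoded in the scene ID before the STAC lookup. This is a hypothetical sketch (`should_query_stac` and `PROCESSABLE_TIERS` are not names from the repository). It assumes Landsat Collection 2 product IDs, whose last underscore-separated field is the collection category (T1, T2, or RT). The example IDs in the tests are made up for illustration.

```python
import logging

log = logging.getLogger(__name__)

# Collection categories that qualify for processing; RT (real-time) scenes do not.
PROCESSABLE_TIERS = {"T1", "T2"}


def should_query_stac(scene_id: str) -> bool:
    """Return False (and log at INFO, not raise) for scenes rejectable by ID alone."""
    tier = scene_id.rsplit("_", 1)[-1]
    if tier not in PROCESSABLE_TIERS:
        log.info("%s is a %s scene; skipping without a STAC lookup", scene_id, tier)
        return False
    return True
```

Calling a check like this before the STAC lookup would let RT messages complete successfully instead of raising and landing in the DLQ.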

Handle Off-nadir scenes

In the initial ITS_LIVE pair-picking scripts, off-nadir scenes have special handling, but right now we restrict processing to nadir scenes only.

We should follow the methods in those scripts to enable processing of off-nadir scenes.
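A first step is to separate candidate scenes by view angle so the two groups can be handled differently. The sketch below is hypothetical. It assumes STAC items (as dicts) carry the STAC View extension property `view:off_nadir` in degrees, and it treats a missing value as nadir. It does not reproduce the original scripts' off-nadir handling.

```python
def split_by_view_angle(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition STAC item dicts into (nadir, off_nadir) lists by `view:off_nadir`."""
    nadir, off_nadir = [], []
    for item in items:
        angle = item.get("properties", {}).get("view:off_nadir", 0)
        # Any non-zero off-nadir angle is treated as an off-nadir acquisition.
        (off_nadir if angle != 0 else nadir).append(item)
    return nadir, off_nadir
```

Under the current restriction, only the `nadir` list would be processed.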

Move Landsat CloudFormation resources to a sub-stack

We renamed landsat/cloudformation.yml to cloudformation.yml as part of #41 because it had already been deployed as the its-live-monitoring-prod and its-live-monitoring-test CloudFormation stacks. Ideally we want the Landsat stack to be a sub-stack under the top-level CF template, but we didn't want to deal with renaming resources as part of that PR. I'm not sure how much of a hassle it will be to re-deploy those resources under a new sub-stack.

Adjust SQS redrive policy for production

The SQS queue is currently configured to quickly send messages that cannot be processed to the dead letter queue. We'll want to adjust those settings so that messages are retried more times over a longer period, as appropriate for an operational workload.

https://github.com/ASFHyP3/its-live-monitoring/blob/develop/landsat/cloudformation.yml#L33

  • VisibilityTimeout is the time between attempts
  • maxReceiveCount is the total number of attempts

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sqs-queue.html
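Together, these two settings roughly determine how long a failing message keeps being retried before it reaches the DLQ. The helpers below are a hypothetical sketch of that arithmetic, and the numbers in the tests are illustrative rather than the stack's actual settings.

```python
import math


def approx_retry_window_seconds(visibility_timeout: int, max_receive_count: int) -> int:
    """Approximate time a failing message is retried before moving to the DLQ.

    Each failed receive hides the message for `visibility_timeout` seconds; once its
    receive count exceeds `max_receive_count`, SQS moves it to the dead letter queue.
    Polling delays and Lambda run time are ignored, so this is only an estimate.
    """
    return visibility_timeout * max_receive_count


def receive_count_for_window(window_seconds: int, visibility_timeout: int) -> int:
    """Smallest maxReceiveCount whose approximate retry window covers `window_seconds`."""
    return math.ceil(window_seconds / visibility_timeout)
```

For example, retrying for about 6 hours with a 30-minute VisibilityTimeout needs a maxReceiveCount of 12. AWS also recommends setting the queue's visibility timeout to at least six times the Lambda function's timeout, which puts a floor on VisibilityTimeout.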

Setup deploy workflow

We should set up a deploy workflow for at least a test stack.

For a test stack, it should do everything except submit jobs to HyP3 and instead just print/log the prepared jobs and/or the pairs GeoDataFrame.
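A simple way to get this behavior is a switch that either submits or only logs the prepared jobs. This is a hypothetical sketch: the `PUBLISH_JOBS` environment variable and the `submit` callable are illustrative names, not part of the repository. The jobs are assumed to be JSON-serializable dicts.

```python
import json
import logging
import os

log = logging.getLogger(__name__)


def submit_or_log(prepared_jobs: list[dict], submit) -> list[dict]:
    """Submit jobs via `submit`, or only log them unless PUBLISH_JOBS is "true"."""
    if os.environ.get("PUBLISH_JOBS", "false").lower() == "true":
        return submit(prepared_jobs)
    # Dry run (test stack): show what would have been submitted, submit nothing.
    log.info("Dry run; would submit %d jobs:\n%s", len(prepared_jobs), json.dumps(prepared_jobs, indent=2))
    return []
```

Defaulting to the dry run means a stack that forgets to set the variable never submits jobs to HyP3 by accident.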

Add tests

Currently, there are no tests 🤠. We should flesh some out.

Implement partial batch responses

Right now, our Lambda consumes 10 SQS messages at a time, and all of them go back into the queue for re-processing if:

  1. any scene doesn't match the requirements in _check_scene, which raises an AssertionError, or
  2. any step after _check_scene raises an exception.

Only if processing all 10 scenes is successful are the messages removed from the queue.

We should instead implement partial batch responses, so that (1)s are considered successful and only (2)s are sent back to the queue for re-processing:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting

When implementing partial batch responses, we should follow the best practices and implement a dead letter queue to avoid snowball anti-patterns:
https://docs.aws.amazon.com/prescriptive-guidance/latest/lambda-event-filtering-partial-batch-responses-for-sqs/best-practices-partial-batch-responses.html
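A partial batch response is a handler return value of the form `{"batchItemFailures": [{"itemIdentifier": <messageId>}, ...]}`. Lambda only honors it when the SQS event source mapping lists `ReportBatchItemFailures` in `FunctionResponseTypes`. The sketch below is hypothetical: `SceneDoesNotQualify` stands in for the current AssertionError from _check_scene (a dedicated exception avoids swallowing unrelated assertion failures), and `process_scene` is a placeholder for the real work.

```python
import json


class SceneDoesNotQualify(Exception):
    """Hypothetical: raised when a scene fails the processing requirements (case 1)."""


def handle_records(records: list[dict], process) -> dict:
    """Process SQS records one at a time and report only the ones that failed."""
    batch_item_failures = []
    for record in records:
        try:
            process(record["body"])
        except SceneDoesNotQualify:
            # Case (1): not an error, so SQS deletes the message like a success.
            continue
        except Exception:
            # Case (2): report only this message; SQS makes it visible again and
            # the redrive policy eventually moves it to the dead letter queue.
            batch_item_failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": batch_item_failures}


def process_scene(body: str) -> None:
    """Hypothetical placeholder: parse the message, check the scene, submit jobs."""
    json.loads(body)  # the real implementation would go on to check and submit


def lambda_handler(event: dict, context) -> dict:
    return handle_records(event["Records"], process_scene)
```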

alert operators when SQS messages fail to be processed

We only attempt to process each SQS message a certain number of times before it is sent to the dead letter queue, as specified at https://github.com/ASFHyP3/its-live-monitoring/blob/develop/landsat/cloudformation.yml#L29

Any messages that end up in the dead letter queue will require manual review and will disappear after 14 days. Operators should receive some kind of push notification whenever any messages are available in the DLQ that require review.

That's probably a CloudWatch alarm on the number of messages in the DLQ plus an SNS email subscription, similar to HyP3's monitoring alarm: https://github.com/ASFHyP3/hyp3/blob/develop/apps/monitoring-cf.yml.j2
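The alarm would watch the DLQ's `ApproximateNumberOfMessagesVisible` metric in the `AWS/SQS` namespace and notify an SNS topic whenever it rises above zero. The function below is a hypothetical sketch that builds keyword arguments for boto3's CloudWatch `put_metric_alarm`. The queue name, topic ARN, and alarm name are placeholders, and in this repo the alarm would more likely be declared in CloudFormation, as HyP3 does.

```python
def dlq_alarm_params(queue_name: str, topic_arn: str) -> dict:
    """Build put_metric_alarm kwargs that fire when the DLQ has any visible messages."""
    return {
        "AlarmName": f"{queue_name}-not-empty",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # An empty queue with no data points should not page anyone.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }
```

Usage would look like `boto3.client("cloudwatch").put_metric_alarm(**dlq_alarm_params(dlq_name, topic_arn))`, with an email subscription on the SNS topic.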
