linz / geostore

Central storage, management and access for important geospatial datasets

License: MIT License

Python 97.09% Shell 1.01% Dockerfile 0.10% Nix 1.80%
data-lake-store geospatial-data metadata-management geospatial-datasets linz

geostore's Introduction

Geostore

Badges: Deploy | Total alerts | Language grade: Python | CodeQL Analysis | Code style: black | Coverage: 100% branches | Dependabot Status | hadolint: passing | Imports: isort | Kodiak | Checked with mypy | code style: prettier | pylint: passing | Python: 3.9 | shellcheck: passing | Conventional Commits

Central storage, management and access solution for important geospatial datasets. Developed by Toitū Te Whenua Land Information New Zealand.

Prerequisites

Geostore VPC

A Geostore VPC must exist in your AWS account before deploying this application. At Toitū Te Whenua LINZ, VPCs are managed internally by the IT team. If you are deploying this application outside Toitū Te Whenua LINZ, you will need to create a VPC with the following tags:

  • "ApplicationName": "geostore"
  • "ApplicationLayer": "networking"

You can achieve this by adding the networking_stack (infrastructure/networking_stack.py) into app.py before deployment as a dependency of application_stack (infrastructure/application_stack.py).
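
Before deploying, it can be useful to confirm that a candidate VPC carries both tags. The following is a minimal sketch, not part of Geostore: the function name is hypothetical, and it assumes tags arrive in the `[{"Key": ..., "Value": ...}]` shape that the EC2 API uses when describing a VPC.

```python
from typing import Iterable, Mapping

# Tags the text above says the Geostore VPC must carry.
REQUIRED_VPC_TAGS = {"ApplicationName": "geostore", "ApplicationLayer": "networking"}


def has_geostore_tags(tags: Iterable[Mapping[str, str]]) -> bool:
    """Return True if the tag list contains every required key/value pair.

    Hypothetical helper: `tags` is assumed to be a list of
    {"Key": ..., "Value": ...} mappings.
    """
    present = {tag["Key"]: tag["Value"] for tag in tags}
    return all(present.get(key) == value for key, value in REQUIRED_VPC_TAGS.items())
```

Extra tags on the VPC are ignored; only the two required pairs are compared.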

Verify infrastructure settings

This infrastructure by default includes some Toitū Te Whenua LINZ-specific parts, controlled by settings in cdk.json. To disable these, simply remove the context entries or set them to false. The settings are:

  • enableLDSAccess: if true, gives Toitū Te Whenua LINZ Data Service/Koordinates read access to the storage bucket.
  • enableOpenTopographyAccess: if true, gives OpenTopography read access to the storage bucket.
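
To see which of these settings are currently switched on, the context section of cdk.json can be inspected. This is only a sketch under the assumption that the two settings live under the top-level "context" key of cdk.json as JSON booleans; the function name is hypothetical.

```python
import json
from typing import List

# The two LINZ-specific settings named above.
LINZ_SPECIFIC_SETTINGS = ("enableLDSAccess", "enableOpenTopographyAccess")


def enabled_linz_settings(cdk_json_text: str) -> List[str]:
    """Return the LINZ-specific settings that are set to true.

    A missing "context" section or a missing entry counts as disabled,
    matching the advice to remove entries or set them to false.
    """
    context = json.loads(cdk_json_text).get("context", {})
    return [name for name in LINZ_SPECIFIC_SETTINGS if context.get(name) is True]
```

An empty result means neither access grant is enabled.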

Development setup

One-time setup which generally assumes that you're in the project directory.

Common

  1. Install Docker

  2. Configure Docker:

    1. Add yourself to the "docker" group: sudo usermod --append --groups=docker "$USER"
    2. Log out and back in to enable the new group
  3. Set up an AWS Azure login shortcut like this in your .bashrc:

    aws-azure-login() {
        docker run --interactive --rm --tty --volume="${HOME}/.aws:/root/.aws" sportradar/aws-azure-login:2021062807125386530a "$@"
    }

Ubuntu

  1. Install nvm:

    cd "$(mktemp --directory)"
    wget https://raw.githubusercontent.com/nvm-sh/nvm/master/install.sh
    echo 'b674516f001d331c517be63c1baeaf71de6cbb6d68a44112bf2cff39a6bc246a install.sh' | sha256sum --check && bash install.sh
  2. Install Poetry:

    cd "$(mktemp --directory)"
    wget https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py
    echo 'b35d059be6f343ac1f05ae56e8eaaaebb34da8c92424ee00133821d7f11e3a9c install-poetry.py' | sha256sum --check && python3 install-poetry.py
  3. Install Pyenv:

    sudo apt-get update
    sudo apt-get install --no-install-recommends build-essential curl libbz2-dev libffi-dev liblzma-dev libncurses5-dev libreadline-dev libsqlite3-dev libssl-dev libxml2-dev libxmlsec1-dev llvm make tk-dev wget xz-utils zlib1g-dev
    cd "$(mktemp --directory)"
    wget https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer
    echo '3aa49f2b3b77556272a80a01fe44d46733f4862dbbbc956002dc944c428bebd8 pyenv-installer' | sha256sum --check && bash pyenv-installer
  4. Enable the above by adding the following to your ~/.bashrc:

    if [[ -e "${HOME}/.local/bin" ]]
    then
        PATH="${HOME}/.local/bin:${PATH}"
    fi
    
    # nvm <https://github.com/nvm-sh/nvm>
    if [[ -d "${HOME}/.nvm" ]]
    then
        export NVM_DIR="${HOME}/.nvm"
        # shellcheck source=/dev/null
        [[ -s "${NVM_DIR}/nvm.sh" ]] && . "${NVM_DIR}/nvm.sh"
        # shellcheck source=/dev/null
        [[ -s "${NVM_DIR}/bash_completion" ]] && . "${NVM_DIR}/bash_completion"
    fi
    
    # Pyenv <https://github.com/pyenv/pyenv>
    if [[ -e "${HOME}/.pyenv" ]]
    then
        PATH="${HOME}/.pyenv/bin:${PATH}"
        eval "$(pyenv init --path)"
        eval "$(pyenv init -)"
        eval "$(pyenv virtualenv-init -)"
    fi
  5. Configure Docker:

    1. Add yourself to the "docker" group: sudo usermod --append --groups=docker "$USER"
    2. Log out and back in to enable the new group
  6. Install project Node.js: nvm install

  7. Install Go. This is required for running pre-commit (shfmt hook)

  8. Run ./reset-dev-env.bash --all to install packages.

  9. Enable the dev environment: . activate-dev-env.bash.

  10. Optional: Enable Dependabot alerts by email. (This is optional since it currently can't be set per repository or organisation, so it affects any repos where you have access to Dependabot alerts.)

Re-run ./reset-dev-env.bash when packages change. One easy way to use it pretty much seamlessly is to run it before every workday, with a crontab entry like this template:

HOME='/home/USERNAME'
0 2 * * 1-5 export PATH="${HOME}/.pyenv/shims:${HOME}/.pyenv/bin:${HOME}/.poetry/bin:/root/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/run/current-system/sw/bin" && cd "PATH_TO_GEOSTORE" && ./reset-dev-env.bash --all

Replace USERNAME and PATH_TO_GEOSTORE with your values, resulting in something like this:

HOME='/home/jdoe'
0 2 * * 1-5 export PATH="${HOME}/.pyenv/shims:${HOME}/.pyenv/bin:${HOME}/.poetry/bin:/root/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/run/current-system/sw/bin" && cd "${HOME}/dev/geostore" && ./reset-dev-env.bash --all

Re-run . activate-dev-env.bash in each shell.

Nix

  1. Run nix-shell.
  2. Optional: Install and configure lorri and run direnv allow . to load the Nix shell whenever you cd into the project.

Restart your nix-shell when packages change.

When setting up the project SDK, point it to .venv/bin/python, which is a symlink to the latest Nix shell Python executable.

Optional

Enable Dependabot alerts by email. (This is optional since it currently can't be set per repository or organisation, so it affects any repos where you have access to Dependabot alerts.)

AWS Infrastructure deployment

  1. Configure a named AWS profile with permission to deploy stacks

  2. Environment variables

    • GEOSTORE_ENV_NAME: set deployment environment. For your personal development stack: set GEOSTORE_ENV_NAME to your username.

      export GEOSTORE_ENV_NAME="$USER"

      Other values used by CI pipelines include: prod, nonprod, ci, dev or any string without spaces. Default: test.

    • AWS_DEFAULT_REGION: The region to deploy to. For practical reasons this is the nearest region.

      export AWS_DEFAULT_REGION=ap-southeast-2
    • RESOURCE_REMOVAL_POLICY: determines whether resources containing user content, such as the Geostore Storage S3 bucket or application database tables, are preserved when they are removed from the stack or the stack is deleted. Supported values:

      • DESTROY: destroy resource when removed from stack or stack is deleted (default)
      • RETAIN: retain orphaned resource when removed from stack or stack is deleted
    • GEOSTORE_SAML_IDENTITY_PROVIDER_ARN: SAML identity provider AWS ARN.

  3. Bootstrap CDK (only once per profile)

    cdk --profile=<AWS-PROFILE-NAME> bootstrap aws://unknown-account/ap-southeast-2
  4. Deploy CDK stack

    cdk --profile=<AWS-PROFILE-NAME> deploy --all

    Once comfortable with CDK you can add --require-approval=never above to deploy non-interactively.

If you export AWS_PROFILE=<AWS-PROFILE-NAME> you won't need the --profile=<AWS-PROFILE-NAME> arguments above.
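
The environment variables above could be read and checked in Python roughly as follows. This is a sketch only: the function name and the returned keys are hypothetical and not taken from the Geostore code base. The defaults (test and DESTROY) and the supported values are the ones listed above.

```python
from typing import Dict, Mapping

SUPPORTED_REMOVAL_POLICIES = ("DESTROY", "RETAIN")


def deployment_settings(environ: Mapping[str, str]) -> Dict[str, str]:
    """Resolve the environment name and removal policy, applying defaults."""
    env_name = environ.get("GEOSTORE_ENV_NAME", "test")
    # The environment name may be any string without spaces.
    if not env_name or any(character.isspace() for character in env_name):
        raise ValueError(f"Invalid GEOSTORE_ENV_NAME: {env_name!r}")

    removal_policy = environ.get("RESOURCE_REMOVAL_POLICY", "DESTROY")
    if removal_policy not in SUPPORTED_REMOVAL_POLICIES:
        raise ValueError(f"Unsupported RESOURCE_REMOVAL_POLICY: {removal_policy!r}")

    return {"env_name": env_name, "removal_policy": removal_policy}
```

Passing `os.environ` as the argument reads the real environment; passing a plain dict makes the function easy to test.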

Development

Third party updates

When Dependabot updates any Python dependencies in pip requirements files (*.txt), make sure to run ./generate-requirements-files.bash with the relevant path to update the version of all its dependencies. Sometimes this will revert the file to the previous state, which means that specific dependency update is not compatible with the rest of the packages in the same file. For example, say geostore/pip.txt lists a package foo, which depends on bar~=1.0. This information is not part of the requirements file, so Dependabot might update bar to version 2.0, not being aware that it's incompatible with the current version of foo. generate-requirements-files.bash effectively re-checks this, creating a file with a compatible set of dependencies, which may mean reverting the update done by Dependabot. In this case, simply close the Dependabot PR.
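
To make the foo/bar example concrete, the sketch below checks whether a version satisfies a compatible-release constraint such as bar~=1.0. It is a simplified illustration that handles only plain numeric versions; it is not what generate-requirements-files.bash does internally, and the function names are hypothetical.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Split a plain numeric version such as "1.4.2" into integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, spec: str) -> bool:
    """Check `candidate ~= spec`, i.e. `>= spec` and same leading segments."""
    spec_release = parse_release(spec)
    if len(spec_release) < 2:
        raise ValueError("~= needs at least two release segments")
    candidate_release = parse_release(candidate)
    # Pad the candidate with zeros so that "1" compares equal to "1.0".
    candidate_release += (0,) * (len(spec_release) - len(candidate_release))
    # All segments except the last one of the spec must match exactly.
    prefix = spec_release[:-1]
    return candidate_release >= spec_release and candidate_release[: len(prefix)] == prefix
```

With this definition, bar 1.5 satisfies bar~=1.0 but bar 2.0 does not, which is the situation where the Dependabot update ends up reverted.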

We're using poetry2nix to generate a Nix derivation from the poetry.lock file, to allow people to develop this project with either Nix or Poetry (see footnote 1). Sometimes package updates will break the Nix shell, usually because Python packages don't list all their build dependencies. These need to be set up as a poetry2nix override. First try upgrading nixpkgs using niv update and re-running nix-shell; maybe the latest stable poetry2nix already has an override for this package. If not, you either have to work one out yourself (see upstream overrides) or report it.

Adding or updating Python dependencies

To add a development-only package: poetry add --dev --lock PACKAGE='*'

To add a production package:

  1. Add the package using poetry add --lock --optional PACKAGE='*'.
  2. Put the package in alphabetical order within the list.
  3. Mention the package in the relevant lists in [tool.poetry.extras].
  • Make sure to update packages separately from adding packages. Basically, follow this process before running poetry add, and do the equivalent when updating Node.js packages or changing Docker base images:

    1. Check out a new branch on top of origin/master: git checkout -b update-python-packages origin/master.
    2. Update the Python packages: poetry update --lock. The rest of the steps are only necessary if this step changes poetry.lock. Otherwise you can just change back to the original branch and delete "update-python-packages".
    3. Commit, push and create pull request.
    4. Check out the branch where you originally wanted to run poetry add.
    5. Rebase the branch onto the package update branch: git rebase update-python-packages.

    At this point any poetry add commands should not result in any package updates other than those necessary to fulfil the new packages' dependencies.

    Rationale: Keeping upgrades and other packages changes apart is useful when reading/bisecting history. It also makes code review easier.

  • When there's a merge conflict in poetry.lock, first check whether either or both commits contain a package upgrade:

    • If neither of them do, simply git checkout --ours -- poetry.lock && poetry lock --no-update.
    • If one of them does, check out that file (git checkout --ours -- poetry.lock or git checkout --theirs -- poetry.lock) and run poetry lock --no-update to regenerate poetry.lock with the current package versions.
    • If both of them do, manually merge poetry.lock and run poetry lock --no-update.

    Rationale: This should avoid accidentally down- or upgrading when resolving a merge conflict.

  • Update the code coverage minimum in pyproject.toml and the badge above on branches which increase it.

    Rationale: By updating this continuously we avoid missing test regressions in new branches.

Upgrading Python version

To minimise the chance of discrepancies between environments it is important to run the same (or as close as possible) version of Python in the development environment, in the pipeline, and in deployed instances. At the moment the available versions are constrained by the following:

When updating Python versions you have to check that all of the above can be kept at the same minor version, and ideally at the same patch level.

Running tests

Prerequisites:

  • Authenticated to a profile which has access to a deployed Geostore.

To launch the full test suite, run pytest.

Debugging

To start debugging at a specific line, insert import ipdb; ipdb.set_trace().

To debug a test run, add --capture=no to the pytest arguments. You can also automatically start debugging at a test failure point with --pdb --pdbcls=IPython.terminal.debugger:Pdb.

Upgrading CI runner

jobs.<job_id>.runs-on in .github sets the runner type per job. We should make sure all of these use the latest specific ("ubuntu-YY.MM" as opposed to "ubuntu-latest") Ubuntu LTS version, to make sure the version changes only when we're ready for it.
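
A check like the following could flag runner labels that are not pinned to a specific Ubuntu version. It is a hypothetical helper, not part of the repository, and it only checks the "ubuntu-YY.MM" shape; it does not verify that the version is an LTS release or the latest one.

```python
import re

# Matches labels such as "ubuntu-20.04" but not "ubuntu-latest".
PINNED_UBUNTU_RUNNER = re.compile(r"ubuntu-\d{2}\.\d{2}")


def is_pinned_ubuntu_runner(runs_on: str) -> bool:
    """Return True if the runs-on value names a specific Ubuntu version."""
    return PINNED_UBUNTU_RUNNER.fullmatch(runs_on) is not None
```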

GitHub Actions cache clearing

To throw away the current cache (for example in case of a cache corruption), simply change the CACHE_SEED repository "secret", for example to the current timestamp (date +%s). Subsequent jobs will then ignore the existing cache.

Manual admin

Delete dataset with versions

To do this, you'll need the dataset title and ID.

Once a dataset has some files in it, it's much harder to delete. This is intentional, to avoid accidental loss of important and costly data. The following should be a complete set of actions to delete a dataset, with template values in UPPERCASE. Note the trailing slashes to make sure we limit the commands to the specific dataset!

  1. Remove reference to the dataset from the top-level catalog.json:
    1. Download the file: aws s3 cp s3://linz-geostore/catalog.json .
    2. Manually edit the file and delete the object with the deleted dataset
    3. Re-upload the file: aws s3 cp catalog.json s3://linz-geostore/catalog.json
  2. Remove the dataset from DynamoDB:
    1. Run geostore dataset delete --id=DATASET_ID
  3. Delete dataset files from S3:
    1. Run aws s3 rm --recursive s3://linz-geostore/DATASET_TITLE/.
    2. Ask AWS support to remove the delete markers returned by aws s3api list-object-versions --bucket=linz-geostore --prefix=DATASET_TITLE/ | jq .DeleteMarkers.
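
The manual edit in step 1 could also be scripted. The sketch below assumes that catalog.json is a STAC catalog whose "links" array holds one "child" entry per dataset, with an href such as ./DATASET_TITLE/catalog.json. That layout is an assumption, so check the real file before relying on it. The function name is hypothetical.

```python
from typing import Any, Dict


def remove_dataset_link(catalog: Dict[str, Any], dataset_title: str) -> Dict[str, Any]:
    """Return a copy of the catalog without the child link for one dataset."""

    def points_at_dataset(link: Dict[str, Any]) -> bool:
        href = link.get("href", "")
        if href.startswith("./"):
            href = href[2:]
        # The trailing slash limits the match to this exact dataset prefix.
        return link.get("rel") == "child" and href.startswith(f"{dataset_title}/")

    remaining = [link for link in catalog.get("links", []) if not points_at_dataset(link)]
    return {**catalog, "links": remaining}
```

As with the commands above, the trailing slash matters: a dataset titled "imagery" must not cause the link for "imagery-2020" to be removed.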

Geostore Release

Versioning and Frequency

We aim to release at the end of each agile sprint (fortnightly), or whenever required (e.g. bugfix, feature rollout). Each release triggers a production deployment via GitHub Actions.

Geostore follows semantic versioning. The release is tagged with release-major.minor.patch (e.g. release-0.11.0).
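
As an illustration of the tag format, a release tag can be parsed into its three numeric parts like this. The helper is hypothetical and not part of the release tooling.

```python
import re
from typing import Tuple

# release-major.minor.patch, for example release-0.11.0
RELEASE_TAG = re.compile(r"release-(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) or raise ValueError for other tags."""
    match = RELEASE_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"Not a release tag: {tag!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch
```

Because the result is a tuple of integers, tags compare correctly: release-0.11.0 sorts after release-0.9.0, which a plain string comparison would get wrong.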

Steps

The simplest way to deploy a release is to follow the process recommended by GitHub. Release notes can be automatically generated from GitHub. This is optional and provides a list of commit titles since the last release. Commits from dependabot are excluded from automatically generated release notes, as specified in .github/release.yml. You should always check the release notes and update accordingly as needed.

Note: Geostore has no rollback process. Any fixes will need to be carried out on a roll-forward basis.

Footnotes

  1. When using Nix, make sure to remove the .venv directory. Mixing Nix and Poetry leads to weird behaviour.

geostore's People

Contributors

amrouemad, billgeo, dependabot[bot], dwsilk, github-actions[bot], imincik, jimlinz, l0b0, mitchellpaff, strk


geostore's Issues

Use Python 3.8 across the board

We've already encountered some issues with Python 3.6 (cattrs no longer supporting it and less support for type annotations), and we have no reason to support multiple Python versions in production.

To do:

  • Set Python version in .python-version.
  • Use a single version in CI, removing the need for the strategy matrix.
  • Use the PYTHON_3_8 runtime for AWS Lambda jobs.
  • Change the version in pyproject.toml to ^3.8,<3.9.

Optimize S3 file read chunk size

botocore.response.StreamingBody.iter_chunks has a default chunk size of 1024. This may not be optimal for the file sizes we have to deal with. We should check whether different chunk sizes make a big difference to file processing time. Test process suggestion:

  1. Change the checksum Lambda function to take an optional chunk size parameter.
  2. Deploy the Lambda.
  3. Create files in S3 with representative sizes with content from /dev/random.
  4. Write a small separate Lambda function to profile the first one (to avoid a big HTTP overhead in the timing info) using different chunk sizes on the files created in the last step.
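
The first step, a checksum routine with an optional chunk size, might look like the sketch below. It reads from any binary stream in fixed-size chunks, with 1024 bytes as the default mentioned above. The use of SHA-256 and the function names are assumptions for illustration; the actual Lambda may use a different hash and would read from a botocore StreamingBody rather than a local stream.

```python
import hashlib
from typing import BinaryIO, Iterator


def iter_stream_chunks(stream: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def sha256_hex(stream: BinaryIO, chunk_size: int = 1024) -> str:
    """Hash the stream incrementally so the whole file never sits in memory."""
    digest = hashlib.sha256()
    for chunk in iter_stream_chunks(stream, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()
```

The digest does not depend on the chunk size, so the profiling run only needs to compare timings, not results.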

Cache pip/Poetry downloads

See Caching dependencies to speed up workflows and pip examples, bearing in mind some complexities:

  • We might want to cache the requirements for different endpoints separately, to minimize the amount of copying per job.
  • Alternatively, we might want to cache the non-development (pip install --no-dev) and full (pip install) dependencies separately.

Both of these should be doable using PIP_CACHE_DIR.

Convert metadata to ISO 19115/19139

User Story

In order to share my metadata with users via the LINZ Data Service, as a Data Maintainer, I want to convert the metadata I have already recorded in the Data Lake (STAC standard) to ISO 19115/19139 standard.

Acceptance Criteria

  • Given a Data Lake dataset has valid STAC metadata, When a Data Analyst or a connected system extracts the dataset, They optionally receive a valid ISO 19115/19139 metadata XML file.
  • Given a LINZ Data Service data source is created for a Data Lake dataset, when the Data Analyst adds metadata from the data source, they can successfully import metadata from the data source.
  • Given valid required STAC metadata content, when the metadata is converted, content is converted/copied to ISO valid content.
  • Given valid optional STAC metadata content, when the metadata is converted, content is converted/copied to ISO valid content.

Additional context

Destination:
Target
Content mapping

Definition of Ready

Definition of Done

Subtasks

Rename repo?

I've been calling the product 'LINZ Geospatial Data Lake' to make it clear it's not for non-geo data.
We should consider renaming this repo I think to encapsulate what it does.

Option could be linz-geospatial-data-lake or linz-geo-data-lake? Alternatively we could come up with a completely new name for it?

Should also remove the '3' at some point and rename https://github.com/linz/linz-data-lake to something that indicates it has been replaced.

aws service accounts required for CD

For CD via GitHub Actions to deploy the Data-Lake infrastructure, AWS service accounts for AWS LI Datalake NonProd and AWS LI Datalake Prod are required.

  • I will contact Terrence to ensure the process of getting such roles is still the same (i.e. go through service desk)
  • Start the process of getting these
  • Document these roles in confluence

Dataset Space: allow space deletion only if empty

User Story

So that I don't accidentally lose my important data, as Data Maintainer, I want to only delete datasets with no data in them.

Note: datasets can be altered in every way, so there should never be a need to delete them. We may implement an 'archive' feature later if needed.

Acceptance Criteria

  • Given a dataset with 1 or more versions, when a dataset DELETE is requested, then a message is returned and the dataset is not deleted
  • Given a dataset with no versions created, when a dataset DELETE is requested, then the dataset is deleted and no longer appears in the list of datasets

Tasks

  • [ ]

Ready

  • This story is ready

Done

  • This story is done.

Make Geospatial Data Lake repo public

There are organisations outside of LINZ that are interested in what we are doing. We should make the repo public as soon as possible. I think it's fine to do this while it's a work in progress as long as we indicate that somehow.

Tasks

  • LGTM
  • check source code for non public content
  • open and close tickets

Set global tags per AWS stack

  • app.py
core.Tag.add(app, "CostCentre", "1050")
core.Tag.add(app, "ApplicationName", "geospatial-data-lake")
core.Tag.add(app, "Owner", "Bill M. Nelson")
  • data_stores/data_lake_stack.py
core.Tag.add(self, "EnvironmentType", env)

Add Batch job to verify file checksum

Input: STAC asset with checksum.

Output:

Infra: slow launch of AWS Batch containers

There is significant latency (more than 10 seconds) when AWS Batch launches each new container in an array.

The ECS agent is configured to cache Docker images with the following configuration, but it might not work correctly.

echo ECS_IMAGE_PULL_BEHAVIOR=prefer-cached >> /etc/ecs/ecs.config

More investigation is needed. Requires ssh to ECS instance.

Verify test coverage

Goals:

  • Make sure we don't miss untested code in PRs.
  • Improve overall quality.
  • Ensure a strong final coverage.
  • Encourage writing more unit tests rather than higher-level, slower and more brittle tests.

Tasks:

  • Calculate test coverage in CI.
  • Produce a report of how much coverage there is per file.
  • If possible, save/publish the coverage report during CI runs for ease of use.
  • Fail CI if coverage is below the minimum.
  • Make it easy to update the coverage minimum (ideally a single number in the code base), and document how to do it.

Create the Datalake S3 bucket

An S3 bucket is required to store Data Lake data.

  • Bucket should be deployable via CDK to both prod and nonprod environments
  • This bucket (for now) should be private and only accessible to those within the data-lake-prod/non-prod roles

Validate the 'LINZ' top-level metadata extension

So that I can ensure my metadata has all the LINZ required metadata elements, as Data Maintainer, I want to validate the LINZ top-level metadata extension

LINZ metadata extension profile

Acceptance Criteria

  • LINZ top-level extension is validated as a 'required' (mandatory) extension
  • If the LINZ top-level extension is missing then an error is returned to the user and import is aborted
  • Validation errors are returned to the user
  • Validation rules are well-tested including negative tests (in the source STAC json schema repo?)

Tasks

  • Add LINZ STAC spec schema to Geostore validation
  • Make sure 'user friendly' error messages are returned to the user

Use specific Ubuntu version for CI

We currently run everything on "ubuntu-latest". If that image changes at an impractical time, such as at release, that could cause CI blockage for a while. We should probably use a fixed Ubuntu version and document how and when to upgrade it.

Write tests for the datalake s3 bucket

Tests are required to validate the state of the data-lake environment and source code.

Discussion required:

@imincik @billgeo it would be good to discuss test strategy.

I note that the latest CDK testing docs state:

currently, TypeScript is the only supported language for testing AWS CDK infrastructure, though 
we intend to eventually make this capability available in all languages supported by the AWS CDK. 

I am therefore expecting we

  • Use the PyTest framework for unit level tests that are executed locally and via CD
  • Have tests that run in CD after deployment to ensure the bucket is accessible and permissions are as expected

Verify code complexity

Cyclomatic complexity is a good proxy for detecting code which is difficult to reason about.

In an earlier Python project we started with a maximum cyclomatic complexity of four and ended up with a maximum of six after 30 months (~8 developer-years). This project should be able to stay well within that.
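
For a feel of what the metric measures, the sketch below approximates cyclomatic complexity by counting branch points in a piece of Python source with the standard ast module. It is a rough illustration only: dedicated linters may count some constructs differently, and the function name is hypothetical.

```python
import ast

# Node types that each add one decision point in this approximation.
BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.AsyncFor,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def approximate_complexity(source: str) -> int:
    """Return 1 plus the number of branch points found in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths, one per operator.
            complexity += len(node.values) - 1
    return complexity
```

A function with no branches scores 1, so a maximum of four to six leaves room for only a handful of conditions or loops per function.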

Store supplementary files

User Story

In order to provide context for my data, as a Data Maintainer, I want to store some supplementary files with my data and I don't want these validated.

Acceptance Criteria

  • ...
  • ...
  • ...

Additional context

Examples:

  1. Thumbnails (or a derived dataset?)
  2. Documents such as reports, spreadsheets, specifications, data dictionaries, plans etc
  3. Index data, such as vector data with extents of raster data tiles (or should these be a different dataset?)

Definition of Ready

Definition of Done

Subtasks

Lineage/linkages between datasets

User Story

In order to know where my data came from and find upstream data I may be interested in, as a Data Maintainer, I want to get the lineage, or linkages, between the datasets.

Acceptance Criteria

  • ...
  • ...
  • ...

Additional context

E.g. in hydro things like:

raw hydro survey data -> thick point clouds -> bathy grids

Definition of Ready

Definition of Done

Subtasks

Cache NPM downloads

This should speed up the CI pipelines at least somewhat, and avoid hitting the fairly unreliable npm.org on every CI run.

Decide on development environment (personal AWS accounts vs Localstack)

Personal AWS accounts

Pros

  • ready out-of-box
  • it is exactly the same place where prod environment will be deployed

Cons

  • AWS stack update deadlocks occur from time to time during development, and they usually last a couple of hours
  • slow deployment of new code (for example when developing Lambda functions)
  • difficult to run unit tests using Github Actions CI and AWS account
  • conflicts between global resource names deployed to multiple dev's accounts (S3 bucket names)
  • full AWS CI/CD Pipeline might be needed (CodePipeline, CodeDeploy)

Localstack running on dev's machine

Pros

  • nice Pytest integration
  • fast execution of Pytest unit tests against Localstack
  • nice Pytest integration with Github Actions CI
  • easier ways of code debugging on local machine
  • no infrastructure cost
  • in case of trouble, there is a paid Localstack Pro edition - https://localstack.cloud/

Cons

  • Localstack's AWS emulation might not be (is not) 100% complete and trouble-free

Verify STAC format of metadata file

Should be able to use the STAC JSON schema directly.

Questions:

  • How are the catalog, collection and item schemas related?

    STAC Collections share the same fields with Catalogs and therefore every Collection is also a valid Catalog.

    Catalogs and collections can link to items, catalogs and collections.

  • Can all STAC JSON files be validated against the top-level collection schema file? If not, how do we detect whether a file should be validated against the catalog, collection or item schema?

  • Are we comfortable with transforming JSON into Python dicts before validating them? They can't be exactly equivalent, since JSON parsing isn't consistent. jsonschema.validate seems to take a Python list or dict, so presumably the transformation is expected to be reliable enough. We are also going to be using Python as the main language in this project, so only some truly gnarly JSON should be able to cause issues.

  • Which links do we have to follow to verify an entire dataset? Should we follow all .links[] | .href, including .rel == "self" and .rel == "parent"? Are there other links we need to follow?

Notes:

  1. Use a Git submodule for the STAC JSON schema repo. This avoids independently tracking their content, and makes it trivial to upgrade the schema version when we want to.
  2. As an MVP, just let any validation error propagate all the way.
  3. Write a function which takes an S3 URL and validates the file behind it.
  4. Change the function to validate each link it hasn't yet visited.
  5. Given a link structure like A → B, A → C, B → D and C → D, make sure each file is only validated once. This means we can't use a naive recursive implementation, since D would be validated after each of B and C.
  6. Make sure to install and verify optional format validators. Includes at least date-time and uri.
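
Notes 3 to 5 can be sketched as an iterative traversal with a visited set, so that a file reachable through several paths is validated only once. The names below are hypothetical: get_links stands in for whatever reads a file and returns the URLs it links to, and validate stands in for the schema check.

```python
from collections import deque
from typing import Callable, Iterable, List


def validate_reachable(
    root: str,
    get_links: Callable[[str], Iterable[str]],
    validate: Callable[[str], None],
) -> List[str]:
    """Validate root and everything reachable from it, each URL exactly once.

    Returns the URLs in the order they were validated. Any exception
    raised by validate propagates to the caller.
    """
    visited = {root}
    queue = deque([root])
    order = []
    while queue:
        url = queue.popleft()
        validate(url)
        order.append(url)
        for link in get_links(url):
            # Marking on enqueue prevents D being queued by both B and C.
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order
```

The visited set also stops the traversal from looping when links such as "self" or "parent" point back at files that have already been seen.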

Document s3 storage

As part of the Store Topo Historic Imagery data epic, S3 data storage is to be delivered.

Relevant software documentation must also be delivered.

Discussion required

What degree of detail is required?

  • Docs already include running cdk deploy
  • Do we require as-builts that outline names and permissions? It could easily be argued that the CDK code is self-documenting in this sense.
  • What other docs do we require as part of the Store Topo Historic Imagery data documentation deliverable?

Create initial Data Lake CD

Task

  • deploy all changes from master branch to AWS nonprod account
  • when a release tag is created in a release-x.y branch, deploy to AWS prod account

Acceptance Criteria

  • automated deployment to prod and non prod AWS is tested and is working
