
nasa-pds / data-upload-manager


Data Upload Manager (DUM) component for managing the interface for data uploads to the Planetary Data Cloud from Data Providers and PDS Nodes.

Home Page: https://nasa-pds.github.io/data-upload-manager

License: Apache License 2.0

Languages: Python 73.72%, JavaScript 3.16%, HCL 23.12%
Topics: s3-storage, upload

data-upload-manager's Introduction

PDS Data Upload Manager

The PDS Data Upload Manager provides the client application and server interface for managing data deliveries and retrievals between Data Providers and the Planetary Data Cloud.

Prerequisites

The PDS Data Upload Manager has the following prerequisites:

  • python3 for running the client application and unit tests
  • awscli (optional) for deploying the service components to AWS (TBD)

User Quickstart

Install with:

pip install pds-data-upload-manager

To deploy the service components to an AWS environment:

TBD

To execute the client, run:

pds-ingress-client.py <ingress_path> [<ingress_path> ...]

Code of Conduct

All users and developers of the NASA-PDS software are expected to abide by our Code of Conduct. Please read this to ensure you understand the expectations of our community.

Development

To develop this project, use your favorite text editor, or an integrated development environment with Python support, such as PyCharm.

Contributing

For information on how to contribute to NASA-PDS codebases please take a look at our Contributing guidelines.

Installation

Install in editable mode and with extra developer dependencies into your virtual environment of choice:

pip install --editable '.[dev]'

Configure the pre-commit hooks:

pre-commit install && pre-commit install -t pre-push

Packaging

To isolate and reproduce the environment for this package, you should use a Python Virtual Environment. To do so, run:

python -m venv venv

Then exclusively use venv/bin/python, venv/bin/pip, etc. (It is no longer recommended to use venv/bin/activate.)

If you have tox installed and would like it to create your environment and install dependencies for you run:

tox --devenv <name you'd like for env> -e dev

Dependencies for development are specified as the dev extras_require in setup.cfg; they are installed into the virtual environment as follows:

pip install --editable '.[dev]'

Tooling

The dev extras_require included in this repo installs black, flake8 (plus some plugins), and mypy along with default configuration for all of them. You can run all of these (and more!) with:

tox -e lint

Tests

A complete "build" including test execution, linting (mypy, black, flake8, etc.), and documentation build is executed via:

tox

Unit tests

Our unit tests are launched with the command:

pytest

Documentation

You can build this project's docs with:

sphinx-build docs/source docs/build

You can access the build files in the following directory relative to the project root:

docs/build/

data-upload-manager's People

Contributors

collinss-jpl, dependabot[bot], jordanpadams, nutjob4life, pdsen-ci, tloubrieu-jpl


data-upload-manager's Issues

As a user, I want to upload only data products that have not been previously ingested

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I can upload only data I do not already have in the Planetary Data Cloud.

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

As a user, I want to parallelize upload of data products to PDC

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

No response

💪 Motivation

...so that I can [why do you want to do this?]

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

Implement automatic refresh of Cognito authentication token

💡 Description

The authentication token returned from Cognito has a default expiration of 1 hour, which is typically shorter than what is expected for large file transfers. Cognito authentication tokens can be refreshed by providing the "refresh" token that is supplied after initial authentication.

The DUM client script needs to be updated to support an automatic refresh of the authentication token based on when the token is expected to expire. This should allow long running transfers to complete without interruption.
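
As a rough illustration, a client-side refresh could look like the boto3 sketch below; the helper names and the five-minute skew are assumptions, and it presumes a Cognito app client without a client secret (otherwise a SECRET_HASH parameter would also be required):

```python
# Hedged sketch: exchange the Cognito refresh token for a new access token
# shortly before the current one expires. Names here are illustrative.
import time
import boto3

cognito = boto3.client("cognito-idp", region_name="us-west-2")

def refresh_access_token(client_id, refresh_token):
    """Request a fresh access token using the REFRESH_TOKEN_AUTH flow."""
    response = cognito.initiate_auth(
        AuthFlow="REFRESH_TOKEN_AUTH",
        AuthParameters={"REFRESH_TOKEN": refresh_token},
        ClientId=client_id,
    )
    result = response["AuthenticationResult"]
    return result["AccessToken"], time.time() + result["ExpiresIn"]

def token_expiring_soon(expires_at, skew=300):
    """True if the current token expires within `skew` seconds."""
    return time.time() >= expires_at - skew

# During a long-running transfer loop:
#   if token_expiring_soon(expires_at):
#       access_token, expires_at = refresh_access_token(client_id, refresh_token)
```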

⚔️ Parent Epic / Related Tickets

No response

As an admin, I want access to buckets to be restricted by subnet

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Cloud Admin / Operator

💪 Motivation

...so that I can add another layer of security to S3 bucket access.

📖 Additional Details

No response

Acceptance Criteria

Given a bucket that I have a write-access policy for via the Data Upload Manager, and I am within the allowed IP subnet
When I perform a DUM upload
Then I expect the data to upload successfully

Given a bucket that I have a write-access policy for via the Data Upload Manager, and I am outside the allowed IP subnet
When I perform a DUM upload
Then I expect the upload to be rejected

⚙️ Engineering Details

No response

DUM Client does not properly sanitize double-quotes from INI config

Checked for duplicates

No - I haven't checked

๐Ÿ› Describe the bug

When double-quotes are present in the string values within the INI config utilized by the DUM client, they end up being erroneously included in the JSON-serialized payload sent to API Gateway. This causes escaped double-quotes (\") to appear in the HTML header that defines the CloudWatch log group to submit client logs to. The CloudWatch API then rejects the log stream creation request with a SeriallizationError, causing client logs to not appear in CloudWatch.

๐Ÿ•ต๏ธ Expected behavior

The DUM Client INI config parser should be properly sanitizing double-quotes from parsed strings to ensure they are not escaped when serializing a payload to JSON. This will ensure that DUM client logs will populate in CloudWatch as-expected.
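
A minimal sketch of the kind of sanitization described above, assuming configparser is used to read the INI file; the helper name is illustrative, not DUM's actual code:

```python
# Hedged sketch: strip surrounding quotes from every parsed INI value so they
# cannot leak into the JSON-serialized payload sent to API Gateway.
from configparser import ConfigParser

def load_sanitized_config(path):
    parser = ConfigParser()
    parser.read(path)
    return {
        section: {key: value.strip('"').strip("'")
                  for key, value in parser[section].items()}
        for section in parser.sections()
    }

# With the reproduce case below, config["OTHER"]["log_group_name"] becomes
# /pds/nucleus/dum/client-log-group even if the INI value was double-quoted.
```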

📜 To Reproduce

Run the DUM pds-ingress-client using an INI config that surrounds the value for log_group_name in double-quotes:

...
[OTHER]
log_level = DEBUG
log_format = %(levelname)s %(threadName)s %(name)s:%(funcName)s %(message)s
log_group_name = "/pds/nucleus/dum/client-log-group"

After completing an ingest to S3, there will not be a corresponding log in the CloudWatch log group specified by the INI config. If the double-quotes are removed and the ingress client is rerun, then logs should appear in CloudWatch.

🖥 Environment Info

  • Version of this software [e.g. vX.Y.Z]
  • Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
    ...

📚 Version of Software Used

v1.2.0

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

Upload test data set with manual trigger of Nucleus

💡 Description

  • Needs new IAM roles for DUM. Get help from @sjoshi-jpl @viviant100
  • Test out deployment to MCP
  • Test uploading data from the internal pdsmcp-dev EC2 instance that can reach the private API Gateway
    • Ask SAs to help set up the EC2 instance and give access to the specific EN operator user group
  • Test uploading data from an on-prem EC2 instance to the public API Gateway
    • Ask SAs to set up the IP whitelist
  • Move on to deploying and running with SBN: #32

As a user, I want to skip upload of files already in S3 (nucleus staging bucket)

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I can avoid duplicate copies of data.

📖 Additional Details

The current design overwrites the data when a user uploads via DUM. Propose adding a capability to verify whether the file modification time and size remain unchanged, allowing the copy to S3 to be skipped. Additionally, an optional flag (e.g., --force-overwrite) would let users overwrite the file when needed.
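
A minimal sketch of the proposed skip/overwrite check, assuming boto3 access to the staging bucket; the helper name, bucket/key layout, and the "last_modified" metadata key are assumptions:

```python
# Hedged sketch: skip the copy to S3 when size and recorded mtime are unchanged,
# unless a --force-overwrite style flag is set.
import os
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def should_upload(local_path, bucket, key, force_overwrite=False):
    if force_overwrite:
        return True
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
    except ClientError:
        return True  # object not present (or not visible): upload it
    stat = os.stat(local_path)
    same_size = head["ContentLength"] == stat.st_size
    same_mtime = head.get("Metadata", {}).get("last_modified") == str(int(stat.st_mtime))
    return not (same_size and same_mtime)
```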

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Note: Per #91, rclone handles this functionality for us.

The user should also have the ability to force overwrite data that is already out there.

Log upload to CloudWatch fails during batch upload

Checked for duplicates

No - I haven't checked

๐Ÿ› Describe the bug

When testing upload of CSS sample data to DUM, the following warning is generated on every ingest:

`WARNING:root:Unable to submit to CloudWatch Logs, reason: 'LogRecord' object has no attribute 'message'

๐Ÿ•ต๏ธ Expected behavior

The LogRecords should still have a message field assigned, and all logs should be uploaded to CloudWatch without issue.
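
For context, a LogRecord only gains a .message attribute once it has been formatted, so a handler that reads record.message directly fails exactly as in the warning above. The sketch below shows the safe pattern; the handler class is hypothetical, not DUM's actual code:

```python
# Hedged sketch: call self.format()/record.getMessage() instead of reading
# record.message, which only exists after formatting has run.
import logging

class CloudWatchHandler(logging.Handler):
    def emit(self, record):
        text = self.format(record)   # sets record.message internally; always safe
        # text = record.message      # raises AttributeError if format() never ran
        self.send_to_cloudwatch(text)

    def send_to_cloudwatch(self, text):
        ...  # forward the formatted line to the CloudWatch Logs endpoint
```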

📜 To Reproduce

  1. Configure an instance of DUM on an EC2 instance that can communicate with the (currently Private) API gateway
  2. Use the pds-ingress-client.py script to upload a file
  3. Verify the above warning is reproduced in the output log

🖥 Environment Info

  • Version of this software [e.g. vX.Y.Z]
  • Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
    ...

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

As a user, I want timestamps added to the ongoing logs printed to stdout while running the job

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I can have a better gauge of the execution time of the application.

📖 Additional Details

From @mdrum:

  1. Add timestamps to the ongoing logs that are printed to stdout while running the job. We ran four separate jobs, and we had it configured to only print warnings, so we didn't get to see how far it got in the third job before pausing.

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

🎉 I&T

No response

Develop Cost Model

💡 Description

Cost model for data upload manager components. Should go hand-in-hand with Design Doc but this will be managed and maintained in a secure location. This Epic will also include consideration of deployment strategies, as needed.

Verify the node of the user against Cognito

💡 Description

  1. We need groups for each node.
  2. The Cognito user will be assigned node groups (at least one).
  3. The client will forward the access token to the API Gateway.
  4. The Lambda authorizer will decode the groups and check that they match the node specified in the header of the request (see the sketch below).
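
A minimal sketch of the authorizer check in step 4, assuming PyJWT; signature verification against the Cognito JWKS is omitted here, and the claim/header handling is illustrative:

```python
# Hedged sketch: read the cognito:groups claim from the access token and compare
# it against the node named in the request. Not DUM's actual authorizer code.
import jwt  # PyJWT

def node_is_authorized(access_token, requested_node):
    # In production the token signature would be verified against the user pool's JWKS.
    claims = jwt.decode(access_token, options={"verify_signature": False})
    groups = claims.get("cognito:groups", [])
    return requested_node.lower() in (group.lower() for group in groups)
```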

Motivation

So that the PDS users have a single login and password for all the PDS services.

Develop Initial Proof-of-Concept

💡 Description

Per discussions with team, looking at producing a POC for a few different architectures.

Some options:

  • Data provider stages data in their own S3 bucket, publish CNM to DAAC SNS topic
  • Data provider stages data in DAAC S3 bucket, publishes CNM to DAAC SNS topic
  • Data provider stages data in DAAC S3 bucket, DAAC ingest themselves using some crawler tool one time (for one-off collections)
  • Data provider stages data in DAAC S3 bucket, DAAC continuously ingest data as it comes in (new mechanism that is not yet completed)

Per discussion with the team, going to pursue an approach similar to what @collinss-jpl has proposed:

The approach I had been thinking about utilizes AWS API gateway connected to a Lambda (similar to TEA) to allow a client application on the SBN host to request an upload/sync of a local file or files. The Lambda uses information from the request to determine where in our S3 bucket hierarchy the requested files should get uploaded to (based on product type, PDS submitter node, or whatever other criteria we derive). The S3 URI(s) are then returned back through the API gateway to the client. The SBN client application then uses the returned URI(s) to perform the sync using the CLI or boto library. Eventually we could work in a job queue on the SBN client app so the uploads can be performed asynchronously from the upload requests. We would also be able to use the built-in throttling capability on API Gateway to control how much data or how many requests we'll allow within a window of time etc…

DUM Lambda Service can return pre-signed S3 URLs to non-existing buckets

Checked for duplicates

No - I haven't checked

๐Ÿ› Describe the bug

The DUM Lambda Service function utilizes a bucket map to determine the correct bucket that incoming data should be routed to, however, the function does not currently check if said bucket has actually be created in S3. This results in an invalid pre-signed URL being returned to the client script, which will encounter a error when attempting to use the URL to push data to S3.

๐Ÿ•ต๏ธ Expected behavior

The DUM Lambda Service should be checking for existence of the bucket read from the bucket map file, and return an error to the client if the bucket does not already exist.
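
A minimal sketch of the proposed existence check, assuming boto3; the helper name and error handling are illustrative:

```python
# Hedged sketch: verify the bucket from the bucket map exists before generating
# a pre-signed URL, so the client gets a clear error instead of a dead URL.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def bucket_exists(bucket_name):
    try:
        s3.head_bucket(Bucket=bucket_name)
        return True
    except ClientError as err:
        # "404" means the bucket is missing; "403" means it exists but this role
        # cannot access it, which is a different (configuration) problem.
        return err.response["Error"]["Code"] not in ("404", "NoSuchBucket")
```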

📜 To Reproduce

Configure a bucket-map.yaml file for use with the DUM Lambda service that routes files to a non-existing bucket for one of the PDS groups (ex: eng, sbn, img). Then use the client script to submit a file for ingress as a user assigned to said PDS group.

🖥 Environment Info

  • Version of this software [e.g. vX.Y.Z]
  • Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
    ...

📚 Version of Software Used

1.2.0

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

Develop Ingress Client Interface

💡 Description

The current command-line interface for the Ingress client script only allows a user to provide a single file path, as well as an (arbitrary) node ID. This interface needs to be developed to allow at a minimum:

  • Validation of the provided node ID against the standard set of PDS identifiers
  • Support for specifying multiple input paths
  • Support for distinguishing paths to files vs. paths to directories and performing the appropriate S3 sync logic

DUM client is unable to create CloudWatch Log Stream pds-ingress-client-sbn-* when uploading data to the cloud

Checked for duplicates

No - I haven't checked

๐Ÿ› Describe the bug

When the css data was uploaded to cloud via DUM client, the following error occured:

WARNING:root:Unable to submit to CloudWatch Logs, reason: Failed to create CloudWatch Log Stream pds-ingress-client-sbn-1709312322, reason: 403 Client Error: Forbidden for url: https://yofdsuex7g.execute-api.us-west-2.amazonaws.com/prod/createstream

🕵️ Expected behavior

No errors, and logs get pushed to the cloud.

📜 To Reproduce

Run DUM on any data and push data to the cloud.

🖥 Environment Info

Linux OS

📚 Version of Software Used

0.3.0

🩺 Test Data / Additional context

Any PDS4 data

🦄 Related requirements

⚙️ Engineering Details

As a workaround, let's plan to comment out the code that is causing this for the time being. Logging into AWS is a lower priority ("should") requirement.

As a user, I want status summary reports during a long running execution (batching functionality)

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I can regularly check the overall status of an execution in file upload "chunks".

📖 Additional Details

From @mdrum:

  1. Perhaps allow for batching of some sort. Rather than accepting all files and generating one big report at the end, you would want to split it up into groups of 1000 files or something and create mid-reports along the way, with a final tally being generated at the end. We could do this on our end, of course, but then it would mean we would have to combine the mid-reports manually ourselves. Something to think about.

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Initial thoughts: new flag(s) to allow someone to specify the reporting type (cumulative, batch, both) and select the batching "chunk" size (default: 1000 files)

🎉 I&T

No response

Develop initial design doc

💡 Description

After initial rapid prototyping has completed, develop a design and architecture diagram/document.

Ideally this document would be posted as part of the online documentation for this repository.

As a user, I want an end summary report in logs to show statistics of files uploaded

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I can have a summary of how things were uploaded

📖 Additional Details

  • files uploaded
  • files skipped
  • files overwritten

Acceptance Criteria

Given a set of files to be uploaded to S3
When I perform a nominal upload
Then I expect a final report to be output showing metrics of files read, files successfully uploaded, files skipped, and files overwritten

⚙️ Engineering Details

No response

Backoff/Retry logic masks errors from urllib3 exceptions

Checked for duplicates

No - I haven't checked

๐Ÿ› Describe the bug

Within the DUM client script, backoff/retry decorators are used to capture and inspect exceptions and from the requests.exceptions package to see if the error is recoverable (i.e. intermittent outtage). However, it's been observed that exceptions from the urllib3 package can also trigger the backoff/retry handlers, which results in the following error since the format of the exception is different from what is expected:

Ingress failed, reason: 'NoneType' object has no attribute 'status_code'

This essentially "masks" the true error, which makes debugging the underlying issue much more difficult.

🕵️ Expected behavior

The backoff/retry decorator code should gracefully handle exception classes that trigger the logic but do not conform to the expected structure (e.g., missing a response field with the HTTP status code).
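
A minimal sketch of the defensive check, assuming the backoff package is used for the retry decorators; the function names are illustrative:

```python
# Hedged sketch: use getattr() so exceptions without a populated .response
# (e.g. urllib3 errors surfaced through requests) are retried or reported
# cleanly instead of raising a secondary AttributeError.
import backoff
import requests

def fatal_code(exc):
    response = getattr(exc, "response", None)
    if response is None:
        return False  # no HTTP status available; let backoff keep retrying
    return 400 <= response.status_code < 500 and response.status_code != 429

@backoff.on_exception(backoff.expo, requests.exceptions.RequestException,
                      max_tries=5, giveup=fatal_code)
def put_with_retries(url, data):
    response = requests.put(url, data=data)
    response.raise_for_status()
    return response
```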

📜 To Reproduce

This bug can be reproduced when usage of the pre-signed S3 URL returned from the DUM lambda service results in a transfer failure that raises an exception from the urllib3 package, such as SSLError. This will likely need to be simulated using a mock function for the requests.put call made in pds_ingress_client.ingress_file_to_s3()

🖥 Environment Info

  • Version of this software [e.g. vX.Y.Z]
  • Operating System: [e.g. MacOSX with Docker Desktop vX.Y]
    ...

📚 Version of Software Used

1.2.0

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

🎉 Integration & Test

No response

As a user, I want to skip upload of files that are already in the Registry

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I do not try to reload the data

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

The easiest way to do this would be to search the registry either by the file path OR by checksum OR both? We could do this with the LID/LIDVID but I think that will add some significant overhead.

Do we want to figure out some sort of auto-generated UUID for every file we upload to the cloud and add this as metadata? Maybe this is something we could actually store then in the Nucleus database and eventually in the registry. It could link throughout the whole system, agnostic of the LIDVID for the products themselves.

Add argument to client script to follow symlinks

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

The current behavior of the pds-ingress-client.py script is to ignore any symbolic links encountered when traversing paths to be uploaded. Going forward, it could be useful to add a command-line option to instruct the client to follow encountered symlinks, rather than ignore them by default.

💪 Motivation

Would allow pds-ingress-client.py to be used with datasets that are compiled from pre-existing data via symlink to avoid data duplication.

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

As Nucleus, I want to use a lock file to know when DUM is writing to an S3 bucket folder

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

No response

💪 Motivation

...so that I can know when writing to a directory has completed, and we can fully evaluate all the products (XML + data files) in the directory and its sub-directories.

📖 Additional Details

  • Crawl the file system
  • For each directory you come across, write a dum.lock file with TBD information in it
  • Continue to crawl and write data; as you complete a directory and all its sub-directories, remove the dum.lock file (see the sketch below)
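
A minimal sketch of the lock-file convention described above; the bottom-up walk order and the lock-file contents are assumptions, and upload_file stands in for the actual upload call:

```python
# Hedged sketch: drop a dum.lock file in each directory while it is being written
# and remove it once that directory is finished, so a downstream reader (Nucleus)
# can tell when it is safe to evaluate the directory's products.
import os
from pathlib import Path

def upload_tree(root, upload_file):
    # topdown=False walks children before parents, so by the time a directory's
    # lock is removed, its sub-directories have already been completed.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        lock = Path(dirpath) / "dum.lock"
        lock.write_text("upload in progress")  # TBD contents
        try:
            for name in filenames:
                upload_file(os.path.join(dirpath, name))
        finally:
            lock.unlink()  # directory complete; downstream readers may proceed
```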

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

As a user, I want to include the modification datetime in the user-defined object metadata being sent in the upload payload

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Archivist

💪 Motivation

...so that I can match the modification datetime from the source system where the data is being copied.

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

No response

Develop IaC for Deployment

💡 Description

Develop the necessary documentation, Terraform scripts, and/or other definitions/scripts needed to deploy the app on a user system and to the cloud.

Add External Config Support to Ingress Client Script

💡 Description

The current Ingress client script contains a number of hardcoded constants related to AWS configuration (such as API Gateway ID and region) that should be refactored into an external .ini config (or similar) to allow easy customization without requiring code redeployment.

Populate Sphinx documentation for entire DUM service

💡 Description

Ticket to add Sphinx documentation for the entire DUM repository. Topics covered should include:

  • Installation instructions
  • Terraform deployment procedure
  • Cognito account creation
  • INI config format
  • Client script usage

Deploy v1.2.0 DUM to Production

💡 Description

  • Tag new DUM with v1.2.0
    • new summary report
    • new logging group /pds/nucleus/dum/client-log-group
    • token refresh
  • Update DUM client to test new capabilities
  • Debug session with SA
  • Request SBN to upgrade

⚔️ Parent Epic / Related Tickets

No response

Upgrade SBN to latest and Rename Bucket Folder

💡 Description

  • Tag new DUM with logging fixed
  • Deploy to production
  • Request SBN to upgrade
  • Rename root folder in S3 bucket from SBN to sbn (create sbn folder, move gbo... to the sbn folder)

⚔️ Parent Epic / Related Tickets

No response

Update lambda function to lowercase the node prefix in buckets

💡 Description

Right now data is being pushed to bucket paths like, for SBN, /SBN/my/data/here. The /SBN prefix seems a bit redundant, but we can leave it for now. However, we definitely want this lowercase.

⚔️ Parent Epic / Related Tickets

No response

As a user, I want to force an upload of a file that is already in S3 or the Registry

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Node Operator

💪 Motivation

...so that I can overwrite data that has already been loaded into the PDC

📖 Additional Details

No response

Acceptance Criteria

Given a file that has already been loaded into the registry
When I perform a DUM upload with the --overwrite flag enabled
Then I expect the data to overwrite the existing file in the system, and DUM to note that this occurred in the logs

⚙️ Engineering Details

No response

Develop Ingress Client Logging Capabilities

💡 Description

Ticket to develop the logging capabilities of the pds-ingress-client.py script. Added capabilities should include:

  • Addition of a logging or log utility module to control initialization of the global logger object
  • Implement features to allow control of the logging level/format from the command line
  • Addition of an API Gateway endpoint to submit client logs to a CloudWatch log group
  • Implement submission of all logged information during client execution to the API Gateway endpoint

Motivation

To help support the discipline nodes.

As a user, I want each product to take no more than X seconds to upload to AWS

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

One potential issue was raised for awareness: currently it seems to take about 5-10 seconds per CSS product to upload. That's going to need to be improved somewhere in the chain, because at that rate it will take more than 24 hours to upload the number of products that are generated every 24 hours.

💪 Motivation

...so that I can [why do you want to do this?]

📖 Additional Details

No response

Acceptance Criteria

Given
When I perform
Then I expect

⚙️ Engineering Details

Variables we need to take into account:

  • Bandwidth to AWS - is there a way we could improve upload throughput?
  • Size of each file / product
  • Number of files
  • Not currently generating checksums

Develop Ingress Service Routing Logic

💡 Description

The current prototype of the Data ingress lambda function contains some dummy logic for determining the product type from the provided file path/node ID. This ticket is to track the planning and implementation for the initial logic for determining the s3 path convention from the data payload provided by the client script.

Also within scope for this ticket is defining the input payload schema itself that will determine what is sent by the client.

As a result, a back-end service component is developed for the Data-Upload-Manager. It receives the upload request from the client and returns an S3 path (or eventually a pre-signed S3 URL) where the data should be uploaded by the client.

Add support for presigned upload URL usage

💡 Description

To further secure the data upload process, the ingress service Lambda needs to incorporate generation of pre-signed S3 URLs that the client can use to securely upload files to S3. This will allow all the PDS buckets in Nucleus to be private (so trying to guess an S3 upload URI should not work), while still providing a means for outside users to push to S3 without additional credentials or permissions.
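
A minimal sketch of the flow, assuming boto3 on the service side and requests on the client side; the bucket, key, and expiry values are illustrative:

```python
# Hedged sketch: the service generates a time-limited pre-signed PUT URL for a
# private bucket; the client then uploads with plain HTTP and no AWS credentials.
import boto3
import requests

s3 = boto3.client("s3")

def make_upload_url(bucket, key, expires=3600):
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )

def upload(local_path, url):
    with open(local_path, "rb") as handle:
        response = requests.put(url, data=handle)
    response.raise_for_status()
```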

Develop Ingress Lambda Logging Conventions

💡 Description

The Ingress Lambda function can log messages directly to AWS CloudWatch via the built-in logging library. This will likely be the primary mechanism for tracking incoming requests, so we need to define exactly what we would like to see logged for each request.
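
For illustration, a handler can emit per-request log lines with the standard logging module, which Lambda routes to the function's CloudWatch log group automatically; the fields shown below are examples, not a settled convention:

```python
# Hedged sketch: structured request logging from inside the ingress Lambda.
# The event fields ("node", "path") are assumptions about the request payload.
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("ingress request: node=%s path=%s request_id=%s",
                event.get("node"), event.get("path"), context.aws_request_id)
    return {"statusCode": 200, "body": json.dumps({"status": "ok"})}
```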

As a user, I want to include an MD5 checksum in the user-defined object metadata being sent in the upload payload

Checked for duplicates

No - I haven't checked

🧑‍🔬 User Persona(s)

Archivist

💪 Motivation

...so that I can include a checksum with the files being uploaded to ensure data integrity as the files flow through the system

📖 Additional Details

No response

Acceptance Criteria

Given a file to be uploaded to S3
When I perform data upload manager execution on that file
Then I expect data upload manager to generate a checksum and add to the object metadata for the S3 object

⚙️ Engineering Details

https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html
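
A minimal sketch of attaching an MD5 as user-defined object metadata (an x-amz-meta-* header), using a direct boto3 put_object purely for illustration since DUM itself uploads via pre-signed URLs; the metadata key name is an assumption:

```python
# Hedged sketch: compute an MD5 locally and store it as user-defined S3 metadata
# so downstream services can verify data integrity.
import hashlib
import boto3

s3 = boto3.client("s3")

def upload_with_checksum(local_path, bucket, key):
    with open(local_path, "rb") as handle:
        data = handle.read()
    checksum = hashlib.md5(data).hexdigest()
    s3.put_object(Bucket=bucket, Key=key, Body=data,
                  Metadata={"md5checksum": checksum})
    return checksum
```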
