This project forked from googlecloudplatform/storage-sdrs

Data retention tool for Google Cloud Storage

License: Apache License 2.0

Google Cloud Storage - Supplementary Data Retention Service (SDRS)

SDRS allows an organization to manage the Time to Live (TTL) for objects in Google Cloud Storage (GCS)
according to retention policies based on the creation time encoded in partition prefixes.

For example, an object's official age could be encoded in its name as follows:

bucketX/datasetY/{yyyy}/{mm}/{dd}/{hh}/log.txt

In this example, the information encoded in the object name, rather than the creation time in the
GCS object metadata, defines the object's age. An organization can define a TTL for datasets and thereby
reliably enforce object retention based on the encoded creation time.
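
As a minimal sketch of the idea (not the SDRS implementation), the snippet below reads the `{yyyy}/{mm}/{dd}/{hh}` prefix from an object path and compares the encoded time against a TTL. The function names are hypothetical, and treating the encoded time as UTC is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the {yyyy}/{mm}/{dd}/{hh} partition prefix from the example path.
PARTITION_RE = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/(\d{2})/")


def encoded_creation_time(object_path):
    """Return the creation time encoded in the path, or None if absent."""
    match = PARTITION_RE.search(object_path)
    if match is None:
        return None
    year, month, day, hour = (int(part) for part in match.groups())
    # Assumption: the encoded time is UTC.
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


def is_expired(object_path, ttl, now):
    """True when the encoded creation time is older than the TTL."""
    created = encoded_creation_time(object_path)
    if created is None:
        return False  # no encoded time, so this sketch leaves the object alone
    return now - created > ttl
```

For instance, `is_expired("bucketX/datasetY/2019/01/15/08/log.txt", timedelta(days=30), now)` compares 08:00 on 15 January 2019 with `now`, regardless of when the object was actually written to GCS.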

At the most fundamental level, SDRS enforces object retention by mapping policy rules, which define the time-to-live (TTL), to datasets in GCS buckets. Note: for scenarios where GCS object retention management can rely solely on object creation time rather than an encoded prefix, see Object Lifecycle Management.

Releases

See the latest release and other releases for details.

High Level Architecture

SDRS is an open-source GCP GitHub project

SDRS exists in two main parts:

  1. A server side service exposing functionality through a RESTful API, see Server Components
  2. A sample client demonstrating interaction with the server side services, see Client Components

SDRS is primarily written in the Java 8 and Python 3 programming languages.
Maven is used as the Java build management tool.
Deployment Manager is used as the DevOps Cloud Orchestration tool.

Key GCP Technologies Utilized in SDRS

Managed Instance Groups (MIGs)

Cloud Functions

Cloud Pub/Sub

Cloud Endpoints

Google Stackdriver

Cloud SQL

Cloud Scheduler

Storage Transfer Service (STS)

Cloud Deployment Manager

Getting Started with SDRS

To get started, clone the project from Google Cloud Platform's GitHub site.
The full source code for both the server and a sample client is included in the project, along with build and deployment instructions.

Local Development/Build Steps

The instructions in this section describe how to quickly get started and deploy SDRS to a DEV GCP environment.

  1. Ensure your local environment compiles and builds using Maven:
    mvn clean install package
  2. Create a Cloud SQL instance.

  3. Run MySQL DDL or mods to create/update a database schema in the Cloud SQL instance created above. Note: set log_bin_trust_function_creators to true to avoid an error that can occur when creating the database trigger.

  4. Create Pub/Sub infrastructure for SDRS to publish messages.

  5. Build the SDRS Docker image.

  6. Deploy the SDRS Docker image you just built to a Compute Engine VM.

  7. Run the Docker image on the VM:

    • SSH to VM instance.
    • Download your service account credential.json, which is used by SDRS.
    • Create your env.txt, which sets application settings (e.g. the database connection).
    • Stop any running container: sudo docker container stop [your_container_id]
    • Run the following to start SDRS:

    docker run --detach -v [credential_json_dir_on_host]:[docker_mount] --name=sdrs --env-file=[your_env_txt] --publish=8080:8080 [your_docker_image]

Note, the application is configured by two key files found in the src/main/resources directory:

  1. ApplicationConfiguration file
  2. Hibernate Configuration file

The sample appConfig.xml file contains example settings that can be used for a development deployment. In general, values that are known at compile/build/package time can be set directly in the applicationConfig file. For more details on these settings, see Configurable Values.

However, values that must be injected after the build (during deployment) are set through token-replacement environment variables.
See this sample environment file.
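
To illustrate token replacement in general terms, here is a small sketch that reads `KEY=VALUE` lines (the format accepted by `docker run --env-file`) and substitutes them into a configuration template. The `${NAME}` token syntax, the function names, and the sample keys are assumptions for illustration only; check the sample environment file for the tokens SDRS actually uses.

```python
from string import Template


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key] = value
    return env


def render_config(template_text, env):
    """Replace ${NAME} tokens with values from env.

    Raises KeyError when a token has no matching value, so a missing
    setting fails loudly at deployment rather than at runtime.
    """
    return Template(template_text).substitute(env)


# Hypothetical keys, not the ones SDRS defines.
env = parse_env_text("# database settings\nDB_HOST=10.0.0.5\nDB_NAME=sdrs\n")
rendered = render_config("jdbc:mysql://${DB_HOST}/${DB_NAME}", env)
```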

Enterprise Deployment Steps to Google Cloud Platform (GCP)

The instructions in this section serve as an example for deploying SDRS to a full production-like GCP environment. For details, see the main DevOps Deployment README.
In general, deploying SDRS to a production-like environment should occur in the following order:

Deploying the Server Side Components

  1. Cloud SQL Infrastructure Deployment: see the Cloud SQL Deployment README.
  2. MySQL DDL or mods execution: see the MySQL Schema and mods.
  3. Pub/Sub Infrastructure Deployment: see the Server Pub/Sub Deployment README.
  4. MIG Deployment: see the MIG Deployment README.

Deploying the Sample Client Side Components

  1. Cloud Function Deployment (includes client-side Pub/Sub triggers): see the Client Cloud Functions README.
  2. Cloud Scheduler crontab creation by way of the GCP Console UI: see the Cloud Scheduler README.

Server Components

Configuration Service Details

The Configuration Service is a server side component that is responsible for exposing a RESTful API that handles CRUD operations for the retention policies.

The Configuration Service is the key touchpoint to SDRS when provisioning or updating retention policies. For more details, see the Configuration Service README.

Execution Service Details

The Execution Service is a server side component that is responsible for exposing a RESTful API that manages the execution of retention policy enforcement (i.e. the deletion of objects). The Execution Service is capable of enforcing object retention for three specific use cases:

  1. Retention Policies - dataset specific policies provisioned by way of the configuration service
  2. Default/Global Policy - a global dataset rule that serves as a catch-all for datasets not already covered by specific retention policies
  3. On-demand Delete Markers - ad hoc requests to delete specific datasets immediately

For more details, see the Execution Service README.
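
The three use cases can be pictured as a simple lookup. The sketch below is illustrative only: `resolve_ttl` and its arguments are hypothetical names, and the precedence shown (delete marker first, then dataset policy, then the default) is an assumption rather than documented Execution Service behavior.

```python
from datetime import timedelta


def resolve_ttl(dataset, policies, default_ttl, delete_markers):
    """Pick the TTL that applies to a dataset under the three use cases."""
    if dataset in delete_markers:
        # On-demand delete marker: eligible for deletion immediately.
        return timedelta(0)
    if dataset in policies:
        # Dataset-specific retention policy.
        return policies[dataset]
    # Default/global policy acts as the catch-all.
    return default_ttl


policies = {"bucketX/datasetY": timedelta(days=30)}
markers = {"bucketX/datasetZ"}
default = timedelta(days=365)
```

With these sample values, `resolve_ttl("bucketX/datasetY", policies, default, markers)` yields the 30-day policy, while a dataset with no policy and no marker falls back to the 365-day default.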

Validation Service Details

The Validation Service is a server side component that is responsible for exposing a RESTful API that manages the execution of jobs that serve to validate the completion of already requested enforcement processes. For more details, see the Validation Service README.

Notification Service Details

The Notification Service is a server side component that is responsible for broadcasting notifications of SDRS events to interested parties by way of Pub/Sub.

Client Components

Sample Cloud Functions Deployment & Details

The Cloud Functions serve as an example client demonstrating how to interact with the server side SDRS RESTful API. Included in the code base are Cloud Functions that invoke the Configuration, Execution, Validation, and Notification services. For more details, see the Client Cloud Functions README.

Sample Cloud Scheduler Details

SDRS has several functional areas that can be scheduled to run on a recurring basis. The scheduler strategy in this sample uses Cloud Scheduler as a decoupled, externally managed crontab service: each scheduled job publishes to a Pub/Sub topic, which triggers a Cloud Function that in turn calls the SDRS Execution and Validation functionality.

For more details, see the Cloud Scheduler README.
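
A rough sketch of the Cloud Function end of that chain follows. Pub/Sub-triggered background functions in Python receive the message body base64-encoded in `event["data"]`. The JSON payload shape, the `action` field, and the injected `call_sdrs` callable are all hypothetical; the sample functions in the repository define the real message format and REST calls.

```python
import base64
import json


def handle_schedule_message(event, context, call_sdrs):
    """Decode a Pub/Sub message and forward it to SDRS.

    call_sdrs is injected so the decoding logic can be exercised
    without a network call; a deployed function would wrap this in a
    two-argument (event, context) entry point.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    action = payload["action"]  # hypothetical field: "execution" or "validation"
    if action not in ("execution", "validation"):
        raise ValueError(f"unsupported action: {action}")
    return call_sdrs(action, payload)


# Local usage with a stand-in for the REST call.
message = {"action": "validation"}
event = {"data": base64.b64encode(json.dumps(message).encode("utf-8"))}
result = handle_schedule_message(event, None, lambda action, body: (action, body))
```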

Contributing

See the contributing instructions to get started contributing.

License

All solutions within this repository are provided under the Apache 2.0 license. Please see the LICENSE file for more detailed terms and conditions.

Contributors

eshen1991, guptarhl, jtkersten, matt-gen, psiso, salguerod, tomflenniken, viperan, xiaoyangm55