
aws-samples / getting-started-with-amazon-redshift-data-api

60 stars · 7 watchers · 36 forks · 6.76 MB

Samples to help you get started with the Amazon Redshift Data API

License: MIT No Attribution

Java 7.37% · Python 19.46% · JavaScript 16.40% · Go 6.08% · TypeScript 4.00% · HTML 4.90% · Shell 2.68% · Jupyter Notebook 32.17% · PLpgSQL 6.36% · CSS 0.59%

getting-started-with-amazon-redshift-data-api's Introduction

Getting Started with Redshift Data API

In this repo we use AWS Lambda to access the Amazon Redshift Data API, but you can also reach the Data API from other platforms such as Amazon EC2, AWS Glue, Amazon SageMaker, and your on-premises resources. Each language has its own set of code samples, with the ultimate aim of complete language parity (the same set of examples in every language).

The examples demonstrate common SQL operations (CREATE, COPY, UPDATE, DELETE, and SELECT) implemented in either synchronous or asynchronous mode, patterns that are useful when building modern, event-driven applications.

Introduction

The Amazon Redshift Data API lets you efficiently access data in Amazon Redshift from all kinds of applications: traditional, cloud-native, containerized, serverless web-service-based, and event-driven. This includes, but is not limited to, AWS Lambda, Amazon SageMaker, AWS Cloud9, and many other AWS services.

The Amazon Redshift Data API simplifies data access, ingestion, and egress from the languages supported by the AWS SDK, such as Python, Go, Java, Node.js, PHP, Ruby, and C++.
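
To give a feel for what the samples do, here is a minimal Python sketch using boto3; the cluster identifier, database, and user are placeholder values, not names taken from this repo.

import boto3

# Create a Redshift Data API client; credentials come from the environment.
redshift_data_api_client = boto3.client("redshift-data")

# Submit a statement. ClusterIdentifier, Database, and DbUser are
# placeholders; substitute your own cluster details.
response = redshift_data_api_client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT current_date;",
)

# The call returns immediately with an Id that identifies the query.
print(response["Id"])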

Tutorial Overview

In this tutorial, we demonstrate how to get started with the Amazon Redshift Data API in different languages. We use AWS Lambda to access the Data API; however, with minor modifications to the code, you can access it from other platforms such as Amazon EC2, AWS Glue, Amazon SageMaker, and your on-premises resources.

We also demonstrate some common use cases that customers solve with the Redshift Data API, and provide code examples along with CloudFormation templates for these use cases.

Architectural Diagram

(architecture diagram image)

About Redshift Data API

The Amazon Redshift Data API is asynchronous by default: when you execute a SQL command, you receive a generated query Id and can retrieve the results later, for up to 24 hours. However, you can also configure your code to run SQL commands synchronously. Both modes are demonstrated in the code samples.
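
As an illustration, here is a hedged Python sketch of that flow with boto3; the polling interval and cluster details are arbitrary placeholders, and the repo's samples implement the same pattern more fully.

import time
import boto3

redshift_data_api_client = boto3.client("redshift-data")

# Submit the statement; the Data API returns a query Id immediately.
query_id = redshift_data_api_client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder
    Database="dev",                           # placeholder
    DbUser="awsuser",                         # placeholder
    Sql="SELECT r_regionkey, r_name FROM public.region;",
)["Id"]

# Any time within 24 hours, the Id can be used to check status and,
# once the query finishes, to fetch its result set.
while True:
    status = redshift_data_api_client.describe_statement(Id=query_id)["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)  # arbitrary polling interval

if status == "FINISHED":
    print(redshift_data_api_client.get_statement_result(Id=query_id)["Records"])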

Others

  • Considerations when calling the Amazon Redshift Data API: Link

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

getting-started-with-amazon-redshift-data-api's People

Contributors

amazon-auto, bhanu-pp, bipin007, itsdzhang, jasonpedreza, manashdeb, rkuppala1, vlmpk2003


getting-started-with-amazon-redshift-data-api's Issues

Task timed out

Hello, we use Redshift Serverless, and I modified RedShiftDataAPI.py as shown below, but I always get "Task timed out", even though I've tried increasing the Lambda timeout to 3 minutes.

import time
import traceback
from collections import OrderedDict

import boto3
from aws_lambda_powertools import Logger

logger = Logger()


def lambda_handler(event, context):
    # input parameters passed from the caller event
    # cluster identifier for the Amazon Redshift cluster
    redshift_workgroup_name = event["redshift_workgroup_name"]
    # database name for the Amazon Redshift cluster
    redshift_database_name = event["redshift_database"]
    # IAM Role of Amazon Redshift cluster having access to S3
    redshift_iam_role = event["redshift_iam_role"]
    # run_type can be either asynchronous or synchronous; try tweaking based on your requirement
    run_type = event["run_type"]

    sql_statements = OrderedDict()
    res = OrderedDict()

    if run_type != "synchronous" and run_type != "asynchronous":
        raise Exception(
            "Invalid Event run_type. \n run_type has to be synchronous or asynchronous."
        )

    isSynchronous = run_type == "synchronous"

    # initiate redshift-data redshift_data_api_client in boto3
    redshift_data_api_client = boto3.client("redshift-data")

    sql_statements["CREATE"] = (
        "CREATE TABLE IF NOT EXISTS public.region ("
        + "R_REGIONKEY bigint NOT NULL,"
        + "R_NAME varchar(25),"
        + "R_COMMENT varchar(152))"
        + "diststyle all;"
    )
    sql_statements["COPY"] = (
        "COPY public.region FROM 's3://redshift-immersionday-labs/data/region/region.tbl.lzo' "
        + "iam_role '"
        + redshift_iam_role
        + "' region 'us-west-2' lzop delimiter '|' COMPUPDATE PRESET;"
    )
    sql_statements[
        "UPDATE"
    ] = "UPDATE public.region set r_regionkey = 5 where r_name ='AFRICA';"
    sql_statements["DELETE"] = "DELETE From public.region where r_name = 'MIDDLE EAST';"
    sql_statements["SELECT"] = "SELECT r_regionkey,r_name from public.region;"
    logger.info("Running sql queries in {} mode!\n".format(run_type))

    try:
        for command, query in sql_statements.items():
            logger.info("Example of {} command:".format(command))
            res[command + " STATUS: "] = execute_sql_data_api(
                redshift_data_api_client,
                redshift_database_name,
                command,
                query,
                redshift_workgroup_name,
                isSynchronous,
            )

    except Exception as e:
        raise Exception(str(e) + "\n" + traceback.format_exc())
    return res


def execute_sql_data_api(
    redshift_data_api_client,
    redshift_database_name,
    command,
    query,
    redshift_workgroup_name,
    isSynchronous,
):

    MAX_WAIT_CYCLES = 20
    attempts = 0
    # Calling Redshift Data API with executeStatement()
    res = redshift_data_api_client.execute_statement(
        Database=redshift_database_name,
        Sql=query,
        WorkgroupName=redshift_workgroup_name,
    )
    query_id = res["Id"]
    desc = redshift_data_api_client.describe_statement(Id=query_id)
    query_status = desc["Status"]
    logger.info("Query status: {} .... for query-->{}".format(query_status, query))
    done = False

    # Wait until query is finished or max cycles limit has been reached.
    while not done and isSynchronous and attempts < MAX_WAIT_CYCLES:
        attempts += 1
        time.sleep(1)
        desc = redshift_data_api_client.describe_statement(Id=query_id)
        query_status = desc["Status"]

        if query_status == "FAILED":
            raise Exception("SQL query failed:" + query_id + ": " + desc["Error"])

        elif query_status == "FINISHED":
            logger.info(
                "query status is: {} for query id: {} and command: {}".format(
                    query_status, query_id, command
                )
            )
            done = True
            # print result if there is a result (typically from Select statement)
            if desc["HasResultSet"]:
                response = redshift_data_api_client.get_statement_result(Id=query_id)
                logger.info(
                    "Printing response of {} query --> {}".format(
                        command, response["Records"]
                    )
                )
        else:
            logger.info("Current working... query status is: {} ".format(query_status))

    # Timeout Precaution
    if not done and attempts >= MAX_WAIT_CYCLES and isSynchronous:
        logger.info(
            "Limit for MAX_WAIT_CYCLES has been reached before the query was able to finish. We have exited out of the while-loop. You may increase the limit accordingly. \n"
        )
        raise Exception(
            "query status is: {} for query id: {} and command: {}".format(
                query_status, query_id, command
            )
        )

    return query_status

Parameters not working in @aws-sdk/client-redshift-data

The following command in a Lambda function throws a ValidationException when sent to the client.

var _clientRedshiftData = require("@aws-sdk/client-redshift-data");

const client = new _clientRedshiftData.RedshiftDataClient({
  region: 'us-east-1'
});

const UPDATE_QUERY = "\
  UPDATE schema.user \
  SET \
    email = @email \
  WHERE id = @id;";

exports.handler = async (event, context) => {
  const results = await Promise.all(event.Records.map(async (record) => {
    const command = new _clientRedshiftData.ExecuteStatementCommand({
      ClusterIdentifier: 'cluster-test',
      Database: 'database',
      DbUser: 'user',
      Sql: UPDATE_QUERY,
      Parameters: [
        { name: "@id", value: 'sampleId' },
        { name: "@email", value: '[email protected]' },
      ]
    });
    return await client.send(command);
  }));
  return results;
};

The resulting error:
{
    "errorType": "ValidationException",
    "errorMessage": "ValidationException",
    "name": "ValidationException",
    "$fault": "client",
    "$metadata": {
        "httpStatusCode": 400,
        "requestId": "d866b77e-184c-4da9-b4cf-45bc7fd3d556",
        "attempts": 1,
        "totalRetryDelay": 0
    },
    "stack": [
        "ValidationException: ValidationException",
        "    at deserializeAws_json1_1ExecuteStatementCommandError (/opt/nodejs/node_modules/@aws-sdk/client-redshift-data/dist-cjs/protocols/Aws_json1_1.js:411:41)",
        "    at processTicksAndRejections (internal/process/task_queues.js:97:5)",
        "    at async /opt/nodejs/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:6:20",
        "    at async /opt/nodejs/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:11:20",
        "    at async StandardRetryStrategy.retry (/opt/nodejs/node_modules/@aws-sdk/middleware-retry/dist-cjs/StandardRetryStrategy.js:51:46)",
        "    at async /opt/nodejs/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:6:22",
        "    at async /var/task/index.js:156:13",
        "    at async Promise.all (index 0)",
        "    at async Runtime.exports.handler (/var/task/index.js:66:19)"
    ]
}

However, when I take out Parameters, the query works fine.
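
A likely cause: the Data API expects named placeholders written as :name inside the SQL text, with the bare name (no prefix) in the Parameters list, and all values passed as strings. Below is a sketch of that pattern in Python with boto3, assuming the JavaScript client behaves the same way; the cluster details are placeholders carried over from the issue.

import boto3

redshift_data_api_client = boto3.client("redshift-data")

# Named parameters use ":name" in the SQL and bare names in Parameters.
response = redshift_data_api_client.execute_statement(
    ClusterIdentifier="cluster-test",  # placeholder from the issue
    Database="database",               # placeholder from the issue
    DbUser="user",                     # placeholder from the issue
    Sql="UPDATE schema.user SET email = :email WHERE id = :id;",
    Parameters=[
        {"name": "id", "value": "sampleId"},
        {"name": "email", "value": "user@example.com"},  # placeholder address
    ],
)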

getting-started-with-amazon-redshift-data-api

Hi,

I was going through this use case for my web UI. While deploying the CloudFormation template, I'm running into an issue with the "Initialization steps" resource creation defined in the template: that resource fails to create, while every other resource is created when we hit Create stack.

Please suggest some steps to overcome that particular resource failure.

Regards,
Aditya

Resource Failure

Logical ID: InitializationSteps
Physical ID: ED-POA-InitializationSteps-1ARPZV595URQR
Type: Custom::SetupRedshiftLambdaFunction
Status: CREATE_FAILED
Reason: CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [468a0bdb-b825-4b12-b735-e33615bda333]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.

Parameterized query not working with unload operation

When running a parameterized query wrapped in an "unload" operation, the execute statement call fails with a 400 ValidationException:

com.amazonaws.services.redshiftdataapi.model.ValidationException: Parameters list contains unused parameters. Unused parameters: [id] (Service: AWSRedshiftDataAPI; Status Code: 400; Error Code: ValidationException

The unload query is of the pattern

unload('select * from table where id=:id') to '<s3location>' iam_role '<arn>' maxfilesize 6 GB CSV parallel off

Code snippet

executeStatementRequest.setSql(sql);
SqlParameter parameter = new SqlParameter();
parameter.setName("id");
parameter.setValue(id);

Artifact: com.amazonaws:aws-java-sdk-redshiftdataapi:1.12.122

When I exclude the unload wrapper and run just the inner query, execution goes fine.
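
A plausible explanation is that :id sits inside UNLOAD's quoted subquery, so the Data API treats it as literal string content and reports the supplied parameter as unused. One workaround (an assumption, not an official fix) is to validate and substitute the value yourself before wrapping the query in UNLOAD; a Python sketch follows, with the S3 location and IAM role left elided as in the issue.

import boto3

redshift_data_api_client = boto3.client("redshift-data")

record_id = 42
# Because this bypasses parameter binding, validate the value yourself;
# interpolating an unchecked string would invite SQL injection.
if not isinstance(record_id, int):
    raise ValueError("id must be an integer")

inner_query = f"select * from table where id={record_id}"
sql = (
    f"unload('{inner_query}') to '<s3location>' "
    "iam_role '<arn>' maxfilesize 6 GB CSV parallel off"
)
response = redshift_data_api_client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder
    Database="dev",                           # placeholder
    DbUser="awsuser",                         # placeholder
    Sql=sql,
)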

Slight error

Hello, good day. I have found a slight error in your code in the 'analytical-reporting-event-driven-web-app' use case: in 'script.sh', line 46 checks the status from output.json instead of output1.json. I am reporting it in case you want to correct it, because with this issue the WebSockets were not being created correctly for me. The corrected loop:

while [ "$status" != "CREATE_COMPLETE" ]
do
aws cloudformation describe-stacks --stack-name BackendSetup > output1.json
status=$(jq '.Stacks[].StackStatus' output1.json | sed -e 's/^"//' -e 's/"$//')
done
aws cloudformation describe-stacks --stack-name BackendSetup > output1.json

Interesting solution.
Greetings, thank you.
