integration-sample-lambda-msk's Introduction

This is a sample Lambda function written in Java that illustrates the integration between Amazon MSK and AWS Lambda. It is intended to be used with the Amazon MSK AWS Lambda Integration Lab. The Lambda function processes clickstream events from an Amazon MSK Apache Kafka topic, batches them, and sends them to Amazon S3 for backup and long-term storage.

Install

  • Clone the repository

    git clone https://github.com/aws-samples/integration-sample-lambda-msk.git
    cd integration-sample-lambda-msk
    
  • Run the deploy.sh script. It is intended to run on Linux and macOS. Note: The script requires some parameters. See Setup Lambda for details. The script does the following:

    1. It builds a jar file for the Lambda function.
    2. It creates an Amazon S3 bucket to be used for uploaded artifacts with a random prefix in its name.
    3. It uses CloudFormation to build the Lambda function and package its resources.
    4. It deploys the SAM template and creates a CloudFormation stack with multiple resources. The resources include:
      • An IAM policy to be used by the Lambda function.
      • An IAM role with the policy to be used by the Lambda function.
      • An IAM role to be used by Amazon Kinesis Data Firehose.
      • A Kinesis Data Firehose delivery stream.
      • The Lambda function, which processes records from a topic in Amazon MSK and sends them to Kinesis Data Firehose.
      • An EventSourceMapping mapping the Lambda function to the Amazon MSK Apache Kafka topic.
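The processing step performed by the Lambda function can be illustrated with a minimal sketch. This is not the repository's handler: the class and method names (ClickstreamBatcher, decodeValue, toBatches) are hypothetical, and the sketch uses only the JDK instead of the AWS SDK. It relies on two facts: Lambda delivers Kafka record values base64-encoded, and Kinesis Data Firehose PutRecordBatch accepts at most 500 records per call, so decoded events have to be grouped into bounded batches before sending.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

// Hypothetical helper, not part of the sample repository.
public class ClickstreamBatcher {

    /** Decodes the base64-encoded value of one Kafka record into a UTF-8 string. */
    public static String decodeValue(String base64Value) {
        return new String(Base64.getDecoder().decode(base64Value), StandardCharsets.UTF_8);
    }

    /** Decodes all values and groups them into batches of at most maxBatchSize entries. */
    public static List<List<String>> toBatches(List<String> base64Values, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String value : base64Values) {
            current.add(decodeValue(value));
            if (current.size() == maxBatchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        // Keep the final, partially filled batch.
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Three made-up record values ("a", "b", "c"), base64-encoded as Lambda would deliver them.
        List<String> values = Arrays.asList("YQ==", "Yg==", "Yw==");
        // A batch size of 2 keeps the demo small; a real sender would use a limit of up to 500.
        for (List<String> batch : toBatches(values, 2)) {
            System.out.println("Would send batch: " + batch);
        }
    }
}
```

In a real handler each batch would be passed to a Firehose client; that call is left out here so the sketch runs without any AWS dependency.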

integration-sample-lambda-msk's People

Contributors

amazon-auto, blacktooth, dependabot[bot], farbruno, rcchakr


integration-sample-lambda-msk's Issues

Parallelization Factor for MSK Event Source Mapping

Hi,

Is there a way to set the concurrency of the Lambda function so that multiple batches can be processed simultaneously?

From the AWS Lambda CLI docs I see there is a --parallelization-factor option, which I think is supported for Kinesis streams but not for MSK?

Are there any plans to support this for MSK?

If I understood the MSK event source correctly, it polls a target MSK topic across partitions and synchronously invokes a Lambda function with the obtained batch? So the invoked Lambda function must complete execution before a subsequent poll can begin?

Therefore, the throughput that can be handled is a function of:

  • The number of topic partitions (more partitions allows for higher throughput)
  • The batch size of the MSK event source (10,000 is currently the max)
  • The execution time of the Lambda function.

For example, whether the execution takes 100 ms or 10 s makes a difference to the consumer lag on high-throughput topics?

Many thanks,

David
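The throughput reasoning in the question above can be turned into a rough back-of-the-envelope calculation. The sketch below takes the question's model at face value (one in-flight batch per partition, every batch full, the next poll waiting for the invocation to finish); that model is an assumption, not documented service behavior. The class name ThroughputEstimate and the partition count of 3 are hypothetical.

```java
// Hypothetical estimator for the model described in the question.
public class ThroughputEstimate {

    /**
     * Upper bound on records per second, assuming each partition has exactly one
     * full batch in flight and a new batch starts only when the previous one ends.
     */
    public static double maxRecordsPerSecond(int partitions, int batchSize, long executionMillis) {
        if (partitions <= 0 || batchSize <= 0 || executionMillis <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        return partitions * (double) batchSize * 1000.0 / executionMillis;
    }

    public static void main(String[] args) {
        int partitions = 3;      // made-up partition count
        int batchSize = 10_000;  // maximum batch size quoted in the question

        // Compare the two execution times mentioned in the question: 100 ms and 10 s.
        System.out.printf("100 ms per batch: %.0f records/s%n",
                maxRecordsPerSecond(partitions, batchSize, 100));
        System.out.printf("10 s per batch:   %.0f records/s%n",
                maxRecordsPerSecond(partitions, batchSize, 10_000));
    }
}
```

Under these assumptions the two execution times differ by a factor of 100 in achievable throughput, which is why a slow function would let consumer lag build up on a busy topic.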

Can the Lambda function be invoked locally?

Hi. I tried to generate an MSK event using sam local generate-event, but could not find MSK in the supported event list. I'd like to know if it is possible to use sam local invoke to test the Lambda function locally. Thank you.
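One possible workaround is to write the event file by hand. The sketch below builds a trimmed MSK-style event as JSON, following the general shape Lambda uses for Kafka events: a records map keyed by topic-partition, with each record value base64-encoded. The class name SampleMskEvent, the topic name, the offset, the timestamp, and the payload are all made up for illustration. Fields such as eventSourceArn and bootstrapServers are omitted, and whether this sample's handler accepts the trimmed event has not been verified.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical generator for a hand-written test event.
public class SampleMskEvent {

    /**
     * Builds a single-record MSK-style event. The topic name is inserted without
     * JSON escaping, which assumes it contains only characters Kafka allows in
     * topic names (letters, digits, '.', '_' and '-').
     */
    public static String build(String topic, int partition, long offset, String payload) {
        // Lambda delivers Kafka record values base64-encoded, so the test event does the same.
        String value = Base64.getEncoder()
                .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        return "{\n"
                + "  \"eventSource\": \"aws:kafka\",\n"
                + "  \"records\": {\n"
                + "    \"" + topic + "-" + partition + "\": [\n"
                + "      {\n"
                + "        \"topic\": \"" + topic + "\",\n"
                + "        \"partition\": " + partition + ",\n"
                + "        \"offset\": " + offset + ",\n"
                + "        \"timestamp\": 1545084650987,\n"
                + "        \"timestampType\": \"CREATE_TIME\",\n"
                + "        \"value\": \"" + value + "\"\n"
                + "      }\n"
                + "    ]\n"
                + "  }\n"
                + "}";
    }

    public static void main(String[] args) {
        // Placeholder topic and payload; replace with values matching your setup.
        System.out.println(build("ExampleTopic", 0, 15L, "{\"userid\": 1}"));
    }
}
```

The output can be redirected to a file such as event.json and passed to sam local invoke with the --event option. The function's logical ID in the template would also need to be supplied and is not shown here.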
