acquia / fifo2kinesis

Continuously reads data from a named pipe and publishes it to a Kinesis stream.

License: MIT License

Languages: Go 98.55%, Shell 1.45%

FIFO to Kinesis Pipeline

This app continuously reads data from a named pipe (FIFO) and publishes it to a Kinesis stream.

fifo2kinesis cli demo

Why?

FIFOs are a great way to send data from one application to another. Having an open pipe that ships data to Kinesis facilitates a lot of interesting use cases. One such example is using the named pipe support in rsyslog and syslog-ng to send log streams to Kinesis.

Admittedly, a handful of lines of bash using the AWS CLI could achieve the same result. However, fifo2kinesis is designed to reliably handle large volumes of data: it makes good use of Go's concurrency primitives, buffers and batch-publishes the data read from the FIFO, and handles failures in a way that can tolerate network and AWS outages.

Installation

Either download the latest binary for your platform, or run the following command in the project's root to build the fifo2kinesis binary from source:

GOPATH=$PWD go build -o ./bin/fifo2kinesis fifo2kinesis

Usage

Create a named pipe:

mkfifo ./kinesis.pipe

Run the app:

./bin/fifo2kinesis --fifo-name=$(pwd)/kinesis.pipe --stream-name=my-stream

Write to the FIFO:

echo "Streamed at $(date)" > kinesis.pipe

The line will be published to the my-stream Kinesis stream within the default flush interval of 5 seconds.

Quick start for the impatient among us

If you are impatient like me and want your oompa loompa now, modify the --buffer-queue-limit, --flush-interval, and --flush-handler options so that what you send to the FIFO is written to STDOUT immediately instead of a buffered write to Kinesis. This doesn't do much, but it provides immediate gratification and shows how the app works when you play with the options.

./bin/fifo2kinesis --fifo-name=$(pwd)/kinesis.pipe --buffer-queue-limit=1 --flush-interval=0 --flush-handler=logger

Configuration

Configuration is read from command line options and environment variables in that order of precedence. The following options and env variables are available:

  • --fifo-name, FIFO2KINESIS_FIFO_NAME: The absolute path of the named pipe.
  • --stream-name, FIFO2KINESIS_STREAM_NAME: The name of the Kinesis stream.
  • --partition-key, FIFO2KINESIS_PARTITION_KEY: The partition key, a random string if omitted.
  • --buffer-queue-limit, FIFO2KINESIS_BUFFER_QUEUE_LIMIT: The number of items that trigger a buffer flush.
  • --failed-attempts-dir, FIFO2KINESIS_FAILED_ATTEMPTS_DIR: The directory that logs failed attempts for retry.
  • --flush-interval, FIFO2KINESIS_FLUSH_INTERVAL: The number of seconds before the buffer is flushed.
  • --flush-handler, FIFO2KINESIS_FLUSH_HANDLER: Defaults to "kinesis", use "logger" for debugging.
  • --region, FIFO2KINESIS_REGION: The AWS region that the Kinesis stream is provisioned in.
  • --role-arn, FIFO2KINESIS_ROLE_ARN: The ARN of the AWS role being assumed.
  • --role-session-name, FIFO2KINESIS_ROLE_SESSION_NAME: The session name used when assuming a role.
  • --debug, FIFO2KINESIS_DEBUG: Show debug level log messages.

The application also requires credentials to publish to the specified Kinesis stream. It uses the same configuration mechanism as the AWS CLI tool, minus the command line options.
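For example, assuming the default credential chain, a minimal ~/.aws/credentials file (with placeholder values) is picked up automatically:

```ini
# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```

Environment variables such as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY work as well, just as they do for the AWS CLI.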

Running With Upstart

Use Upstart to start fifo2kinesis during boot and supervise it while the system is running. Add a file to /etc/init with the following contents, replacing /path/to and my-stream according to your environment.

description "FIFO to Kinesis Pipeline"
start on runlevel [2345]

respawn
respawn limit 3 30
post-stop exec sleep 5

exec /path/to/fifo2kinesis --fifo-name=/path/to/named.pipe --stream-name=my-stream --region=us-east-1

Publishing Logs From Syslog NG

NOTE: You might also want to check out fluentd and the Amazon Kinesis Agent. You won't find an argument in this README as to why you should choose one over the other; I just want to make sure you have all the options in front of you so that you can make the best decision for your specific use case.

Syslog NG provides the capability to use a named pipe as a destination. Use fifo2kinesis to read log messages from the FIFO and publish them to Kinesis.

Make a FIFO:

mkfifo /var/syslog.pipe

Modify the syslog-ng configuration to send logs to the named pipe. For example, on Ubuntu 14.04 create a file named /etc/syslog-ng/conf.d/01-kinesis.conf with the following configuration:

destination d_pipe { pipe("/var/syslog.pipe"); };
log { source(s_src); destination(d_pipe); };

Start the app:

./bin/fifo2kinesis --fifo-name=/var/syslog.pipe --stream-name=my-stream

Restart syslog-ng:

service syslog-ng restart

The log stream will now be published to Kinesis.

Development

fifo2kinesis uses Glide to manage dependencies.

Tests

Run the following commands to run tests and generate a coverage report:

GOPATH=$PWD go test -coverprofile=build/coverage.out fifo2kinesis
GOPATH=$PWD go tool cover -html=build/coverage.out

fifo2kinesis's People

Contributors: cpliakas, beejeebus

fifo2kinesis's Issues

Add an assumed role option

Sometimes the Kinesis stream you want to publish to is in a different AWS account. You therefore need to assume a role in order to access it. This app should provide an option to be able to assume a role in order to support this use case.

App hangs when it encounters a read error

The app should shut down since it can no longer read from the fifo; instead it logs a CRIT and then hangs until ctrl+c is pressed. To replicate, pass in a non-existent fifo.

Implement a file buffer

We currently only have a memory buffer. A file buffer would consume more resources, but it would have less chance of losing data if the app or system crashes before the buffer is flushed.

Implement a buffer size limit

This app reads from the fifo as fast as it can so that applications writing to it aren't blocked. The channel that stores buffer chunks has a buffer size of 100, so theoretically we store up to ~ 500MB in memory, assuming that the fifo is flooded with the maximum size of requests (5MB max for 500 records, see #10 for more details on how we get these numbers).

There should be a --buffer-size-limit option, a multiple of 5MB, that makes this value configurable.

Add the ability to fetch the stack name from a URL

Because you cannot increase the number of shards in a Kinesis stream, scaling up requires creating a new resource. In order to facilitate autodiscovery of the new resource, it might help to be able to specify a URL that fifo2kinesis can check periodically to get the stream name. Thinking of a --stream-name-url option or something similar.

Flush interval values other than 0 and 5 don't behave as expected

Here is the relevant code snippet:

if w.FlushInterval > 0 {
    go func() {
        for {
            time.Sleep(time.Second * 5)
            forceFlush <- true

            // Send a flush command to unblock the fifo read in case no
            // lines are being written to the fifo. This command is
            // ignored below, the forceFlush channel is what matters.
            w.Fifo.SendCommand("flush")
        }
    }()
}

Notice the time.Sleep(time.Second * 5).

Define and document error handling

The docs say:

The application exits immediately with a non-zero exit code on all AWS errors

This is no longer true. We should define the error handling policy and document accordingly.

Make separate write requests once we have 5MB of data

Follow-up to #2. The buffer will flush when 500 lines have been received, however the data could exceed 5MB which would make the request fail. This is an unlikely scenario for our use case, however we should keep it on our radar to harden this app.

Add duplicate file detection when creating retry files

The filename is constructed from a timestamp and a random string, so collisions should practically never happen. However, it is pretty easy to loop a fixed number of times and re-generate the random string to ensure that the file is created. This is similar to how Go creates temp files.

Add a --region option

You can configure the region via the AWS_REGION environment variable, but it would be good to be able to configure it via the command line as the environment's region might not match the region that the stream is in.

Implement a partition key strategy

Right now we have a dummy partition key for testing, but we should implement some strategy so that this library can be used with multiple shards. The primary use case that spawned this library is to send log messages to Kinesis, so maybe the following logs would split into 3 partition keys:

Jan 11 17:26:04 localhost jenkins: INFO: plexus-rds-hourly-backup #1448 main build action completed: SUCCESS
Jan 11 17:27:19 localhost dhclient: DHCPREQUEST of 10.40.10.4 on eth0 to 10.40.10.1 port 67 (xid=0x7697c2c5)
Jan 11 17:27:19 localhost dhclient: DHCPACK of 10.40.10.4 from 10.40.10.1
Jan 11 17:27:19 localhost dhclient: bound to 10.40.10.4 -- renewal in 1620 seconds.
Jan 11 17:29:10 localhost sudo: pam_unix(sudo:session): session closed for user root

localhost:jenkins, localhost:dhclient, and localhost:sudo. Obviously localhost needs to change.

Determine how to handle lines that exceed 1MB

The maximum size for a record (plus partition key) is 1MB. We should figure out how to handle lines that exceed 1MB so that we don't send a big request that we know will fail (and thus be captured in our retry system).

Implement a write buffer

Right now, one message = one put request to Kinesis. This is fine for our use case, but as the volume of logs increases we will likely need to buffer put requests so that we can send them in batch.

A better way to close your bufio.Scanner

I noticed this:

https://github.com/acquia/fifo2kinesis/blob/master/src/fifo2kinesis/fifo.go#L72-L78

So, the way I've handled this is to open a file and launch a goroutine to which you pass that filehandle. The bufio.Scanner is then created around that filehandle (e.g. scanner := bufio.NewScanner(filehandle)) inside the child goroutine. When the parent routine sees that it's time to shut down (via context.Context or whatever), the parent routine closes the filehandle, and that causes the scanner.Scan() in the child goroutine to abort.

Something like this:

file, err := os.OpenFile(f.Name, os.O_RDONLY, os.ModeNamedPipe)
if err != nil {
	return err
}

sigs := make(chan os.Signal, 1)
readerDone := make(chan struct{}, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

go readLog(file, readerDone)

select {
case <-sigs:
	// We received a SIGINT or SIGTERM, so we close the filehandle to
	// abort the scanner, cancel our context (cancel comes from a
	// context.WithCancel set up earlier), and break out of this loop
	// to wait for the workers to finish.
	log.Println("Exiting...")
	file.Close()
	cancel()
case <-readerDone:
	// Our reader finished up (hit EOF or got an error), so we cancel
	// our context and break out of the loop to wait for workers to
	// finish.
	cancel()
}

func readLog(f *os.File, readerDone chan struct{}) {
	defer close(readerDone)
	scanner := bufio.NewScanner(f)

	for scanner.Scan() {
		line := scanner.Text()
		// Do something with the line.
		_ = line
	}

	if err := scanner.Err(); err != nil {
		log.Println("error reading:", err)
	}
}

Add better error handling to retry mechanism

For example:

  • What if writing certain lines back to the fifo fails?
  • What if there are scanner errors, and the file was only partially read?
  • What if we cannot remove the retry file?

Marking as an enhancement, since we made a conscious decision to accept these risks: the reward of having retry handling, even without covering these what-ifs, is worth it.
