justinazoff / flow-indexer

Flow-Indexer indexes flows found in chunked log files from bro, nfdump, syslog, or pcap files.

flow-indexer's Introduction

Flow Indexer

flow-indexer indexes flows

Usage: 
  flow-indexer [command]

Available Commands: 
  compact     Compact the database
  daemon      Start daemon
  expandcidr  Expand a CIDR range from those seen in the database
  index       Index flows
  search      Search flows
  help        Help about any command

Flags:
      --dbpath="flows.db": Database path
  -h, --help[=false]: help for flow-indexer


Use "flow-indexer [command] --help" for more information about a command.

Quickstart

Install

$ export GOPATH=~/go
$ go get github.com/JustinAzoff/flow-indexer

Create configuration

$ cp ~/go/src/github.com/JustinAzoff/flow-indexer/example_config.json config.json
$ vi config.json # Adjust log paths and database paths.

The indexer configuration is as follows (a sketched example appears after the list):

  • name - The name of the indexer. Keep this short and lowercase, as you will use it as an HTTP query parameter.
  • backend - The backend log IP extractor to use. Choices: bro, bro_json, nfdump, syslog, pcap, and argus.
  • file_glob - The shell globbing pattern that should match all of your log files.
  • recent_file_glob - The strftime+shell globbing pattern that should match today's log files.
  • filename_to_database_regex - A regular expression applied to each filename to extract the variables used to name the database.
  • database_root - Where databases will be written. This should be indexer-specific.
  • database_path - The name of an individual database. This can contain $variables set by filename_to_database_regex.
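
Here is a minimal sketch of a single indexer entry. It assumes example_config.json wraps entries in a top-level "indexers" list (check the shipped file for the exact layout); the paths, regex, and database naming are illustrative only, giving one database per month:

{
    "indexers": [
        {
            "name": "conn",
            "backend": "bro",
            "file_glob": "/bro/logs/*/conn.*gz",
            "recent_file_glob": "/bro/logs/%Y-%m-%d/conn.*gz",
            "filename_to_database_regex": "(?P<year>\\d{4})-(?P<month>\\d{2})-\\d{2}",
            "database_root": "/data/flow-indexer/db",
            "database_path": "conn-$year-$month.db"
        }
    ]
}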

The deciding factor for how to partition the databases is how many unique IPs you see per day. I suggest starting with monthly indexes; if indexing performance takes a huge hit by the end of the month, switch to daily indexes.

Run initial index

The indexall command will expand file_glob and index any log file that matches.

$ ~/go/bin/flow-indexer indexall

Start Daemon

Once the initial index is complete, start the daemon. The daemon will expand recent_file_glob and index any recently created log file that matches.

$ ~/go/bin/flow-indexer daemon

It does this in a 60-second loop to keep itself up to date.

Query API

$ curl -s 'localhost:8080/search?i=conn&q=1.2.3.0/24'
$ curl -s 'localhost:8080/dump?i=conn&q=1.2.3.0/24'
$ curl -s 'localhost:8080/stats?i=conn&q=1.2.3.0/24'
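
The i parameter selects the indexer by name and q is the address or CIDR to look up. For programmatic access, here is a minimal Go sketch of the same /search call; treating the response body as newline-separated file names is an assumption based on the search command output shown further below:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	q := url.Values{}
	q.Set("i", "conn")       // indexer name from the configuration
	q.Set("q", "1.2.3.0/24") // address or CIDR to search for
	resp, err := http.Get("http://localhost:8080/search?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body)) // one matching log file per line (assumed)
}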

Service Configuration

Running flow-indexer as a service

systemd

To run flow-indexer as a service on a system using systemd, you can use the provided flow-indexer.service file.
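
If you are not using the shipped file, a unit along these lines is a reasonable starting point; the install paths and the flowindexer user are assumptions borrowed from the upstart example below:

[Unit]
Description=Flow Indexer Daemon
After=network.target

[Service]
User=flowindexer
Group=flowindexer
ExecStart=/path/to/bin/flow-indexer daemon --config /path/to/flow-indexer/config.json
Restart=on-failure

[Install]
WantedBy=multi-user.target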

upstart

If you plan to run flow-indexer as a service on a system that uses upstart, you may want a conf file like the following, which sends stdout and stderr from flow-indexer to syslog and runs it as a non-root user.

# flow-indexer - Flow Indexer
#
# flow-indexer is a service that indexes and allows retrieval of flows using bro logs

description     "Flow Indexer Daemon"

start on runlevel [345]
stop on runlevel [!345]

setuid flowindexer
setgid flowindexer

exec /path/to/bin/flow-indexer daemon --config /path/to/flow-indexer/config.json 2>&1 | logger -t flow-indexer

Common Issues

To avoid "too many open files" errors, you may want to raise the open-file limit for the user that flow-indexer runs as. This can be done by changing the nofile setting in /etc/security/limits.conf as shown below.

flowindexer soft nofile 65535
flowindexer hard nofile 65535

Lower level commands example

These commands are not really used in practice anymore; the daemon is the recommended way to run flow-indexer. They remain useful for testing and development, though.

Index flows

./flow-indexer --dbpath /tmp/f/flows.db index /tmp/f/conn*
2016/02/06 23:36:51 /tmp/f/conn.00:00:00-01:00:00.log.gz: Read 4260 lines in 24.392765ms
2016/02/06 23:36:51 /tmp/f/conn.00:00:00-01:00:00.log.gz: Wrote 281 unique ips in 2.215219ms
2016/02/06 23:36:51 /tmp/f/conn.01:00:00-02:00:00.log.gz: Read 4376 lines in 24.186168ms
2016/02/06 23:36:51 /tmp/f/conn.01:00:00-02:00:00.log.gz: Wrote 310 unique ips in 1.495277ms
[...]
2016/02/06 23:36:51 /tmp/f/conn.22:00:00-23:00:00.log.gz: Read 7799 lines in 18.350788ms
2016/02/06 23:36:51 /tmp/f/conn.22:00:00-23:00:00.log.gz: Wrote 775 unique ips in 5.155262ms
2016/02/06 23:36:51 /tmp/f/conn.23:00:00-00:00:00.log.gz: Read 5255 lines in 15.296847ms
2016/02/06 23:36:51 /tmp/f/conn.23:00:00-00:00:00.log.gz: Wrote 400 unique ips in 2.910344ms

Re-Index flows

./flow-indexer --dbpath /tmp/f/flows.db index /tmp/f/conn*
2016/02/06 23:37:36 /tmp/f/conn.00:00:00-01:00:00.log.gz Already indexed
2016/02/06 23:37:36 /tmp/f/conn.01:00:00-02:00:00.log.gz Already indexed
2016/02/06 23:37:36 /tmp/f/conn.02:00:00-03:00:00.log.gz Already indexed
2016/02/06 23:37:36 /tmp/f/conn.03:00:00-04:00:00.log.gz Already indexed
[...]
2016/02/06 23:37:36 /tmp/f/conn.20:00:00-21:00:00.log.gz Already indexed
2016/02/06 23:37:36 /tmp/f/conn.21:00:00-22:00:00.log.gz Already indexed
2016/02/06 23:37:36 /tmp/f/conn.22:00:00-23:00:00.log.gz Already indexed
2016/02/06 23:37:36 /tmp/f/conn.23:00:00-00:00:00.log.gz Already indexed

Expand CIDR Range

./flow-indexer --dbpath /tmp/f/flows.db expandcidr 192.30.252.0/24
192.30.252.86
192.30.252.87
192.30.252.92
192.30.252.124
192.30.252.125
192.30.252.126
192.30.252.127
192.30.252.128
192.30.252.129
192.30.252.130
192.30.252.131
192.30.252.141

Search

./flow-indexer --dbpath /tmp/f/flows.db search 192.30.252.0/24
/tmp/f/conn.03:00:00-04:00:00.log.gz
/tmp/f/conn.04:00:00-05:00:00.log.gz
/tmp/f/conn.06:00:00-07:00:00.log.gz
/tmp/f/conn.14:00:00-15:00:00.log.gz
/tmp/f/conn.18:00:00-19:00:00.log.gz
/tmp/f/conn.20:00:00-21:00:00.log.gz
/tmp/f/conn.22:00:00-23:00:00.log.gz

flow-indexer's People

Contributors

grigorescu, jonzeolla, justinazoff


flow-indexer's Issues

fix build on travis

The nfdump that comes with Ubuntu on Travis seems to be too old or broken to pass the tests. It's odd that it fails silently, too.

The easiest fix is probably to download+configure+make install the latest version from https://github.com/phaag/nfdump

search fails quietly

If you run a search against flow-indexer and one of the files returned by the query fails to match the format given in your filename_to_time_regex, flow-indexer stops outputting results without giving an error message. I found this because I appeared to be getting a limited number of files back from my queries: when I made a query that should match every log, I only got a subset of sequential logs that stopped at a specific date. I troubleshot this using stats queries, which do fail properly with a helpful error message.

Happy to provide more details, if necessary. Chatted with @JustinAzoff in the #bro IRC channel about this on 2017-07-07.

Make leveldb options configurable

opts.BlockSize = 32768                      // 32 KiB data blocks
opts.WriteBuffer = 67108864                 // 64 MiB write buffer
opts.BlockCacheCapacity = 524288000         // 500 MiB block cache
opts.OpenFilesCacheCapacity = 1024          // cached open file handles
opts.CompactionTableSize = 32 * 1024 * 1024 // 32 MiB tables
opts.WriteL0SlowdownTrigger = 16            // L0 files before writes slow down
opts.WriteL0PauseTrigger = 64               // L0 files before writes pause
opts.Filter = filter.NewBloomFilter(10)     // 10 bits per key
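
One possible shape for making these configurable, sketched against goleveldb's opt package; the struct, JSON tags, and helper name are hypothetical, not the project's API:

package store

import (
	"github.com/syndtr/goleveldb/leveldb/filter"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

// LevelDBOptions is a hypothetical config section that could live in
// config.json alongside the indexer settings.
type LevelDBOptions struct {
	BlockSize              int `json:"block_size"`
	WriteBuffer            int `json:"write_buffer"`
	BlockCacheCapacity     int `json:"block_cache_capacity"`
	OpenFilesCacheCapacity int `json:"open_files_cache_capacity"`
	CompactionTableSize    int `json:"compaction_table_size"`
	WriteL0SlowdownTrigger int `json:"write_l0_slowdown_trigger"`
	WriteL0PauseTrigger    int `json:"write_l0_pause_trigger"`
	BloomFilterBits        int `json:"bloom_filter_bits"`
}

// toOpt maps the config values onto the goleveldb options used above.
func (o LevelDBOptions) toOpt() *opt.Options {
	return &opt.Options{
		BlockSize:              o.BlockSize,
		WriteBuffer:            o.WriteBuffer,
		BlockCacheCapacity:     o.BlockCacheCapacity,
		OpenFilesCacheCapacity: o.OpenFilesCacheCapacity,
		CompactionTableSize:    o.CompactionTableSize,
		WriteL0SlowdownTrigger: o.WriteL0SlowdownTrigger,
		WriteL0PauseTrigger:    o.WriteL0PauseTrigger,
		Filter:                 filter.NewBloomFilter(o.BloomFilterBits),
	}
}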

nfdump output is broken for ipv6

By default, nfdump will truncate v6 addresses, which will cause errors such as:

Non fatal read error: Invalid IP Address "2610:a5..:c05::f"

I'm not 100% sure if it's related or not, but we're seeing a ton of defunct processes:

32683 ? Z 0:00 [nfdump]
32687 ? Z 0:00 [nfdump]
32692 ? Z 0:00 [nfdump]
32694 ? Z 0:00 [nfdump]
32697 ? Z 0:00 [nfdump]
32698 ? Z 0:00 [nfdump]
32702 ? Z 0:00 [nfdump]
32703 ? Z 0:00 [nfdump]
32704 ? Z 0:00 [nfdump]
32712 ? Z 0:00 [nfdump]
32721 ? Z 0:00 [nfdump]
32724 ? Z 0:00 [nfdump]

From nfdump's man page:

To make the output more readable, IPv6 addresses are shrinked down to 16 characters. The seven most and seven least digits connected with two dots '..' are displayed in any normal output formats. To display the full IPv6 address, use the appropriate long format, which is the format name followed by a 6.

Example: -o line displays an IPv6 address as 2001:23..80:d01e where as the format -o line6 displays the IPv6 address in full length 2001:234:aabb::211:24ff:fe80:d01e. The combination of -o line -6 is equivalent to -o line6.

help me

Can you give me an example_config.json that handles pcap? Please also include the Query API and search parts.

Search boundaries

Is there a way to limit the search or dump to a specific date/time range? This can be useful when working on specific incidents where we understand the timeline of the event. I did not see anything in the examples that would allow me to do that.

I tried to follow the source code, but I am not a Go programmer.

Thank you,

José.

update nfdump backend to not use -o csv

nfdump -o csv is sloooooow compared to nfdump -o pipe.

[netflow@netflow ~]$ time nfdump  -R netdata/2018/03/26  -o pipe|wc -l
7986537

real    0m13.181s
user    0m12.447s
sys     0m0.548s
[netflow@netflow ~]$ time nfdump  -R netdata/2018/03/26  -o csv|wc -l
7986537

real    2m22.035s
user    1m38.702s
sys     0m41.532s

Parsing nfdump files directly in Go would likely be fastest, but I don't want to have to support that code.
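
A sketch of what a pipe-based invocation could look like from Go, assuming nfdump is on PATH; the -R path is taken from the timing runs above, and the pipe field layout is deliberately not parsed by position here, since the exact layout should be checked against nfdump's documentation:

package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	// Stream nfdump's pipe-delimited output instead of buffering csv.
	cmd := exec.Command("nfdump", "-R", "netdata/2018/03/26", "-o", "pipe")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		panic(err)
	}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	records := 0
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		fields := strings.Split(scanner.Text(), "|")
		_ = fields // address fields would be picked out by position here
		records++
	}
	// Wait reaps the child process; skipping it leaves defunct nfdump
	// processes behind, like those reported in an issue above.
	if err := cmd.Wait(); err != nil {
		panic(err)
	}
	fmt.Println(records, "records")
}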

Simplify configuration

Minimally, now that I have:

        "file_glob": "/bro/logs/*/notice.*gz",
        "recent_file_glob": "/bro/logs/%Y-%m-%d/notice.*gz",

Currently, if recent_file_glob is missing, it can just default to file_glob. And if only recent_file_glob is present, one could just convert each %X specifier to *, giving /bro/logs/*-*-*/notice.*gz, which would also work in place of file_glob; so both are probably not needed. A sketch of that conversion follows.
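
As a sketch of the suggested conversion (the helper name is hypothetical), replacing each strftime specifier with a shell wildcard recovers a usable file_glob from recent_file_glob:

package main

import (
	"fmt"
	"regexp"
)

// strftimeRe matches strftime conversion specifiers such as %Y, %m, %d.
var strftimeRe = regexp.MustCompile(`%[A-Za-z]`)

// globFromRecent derives a file_glob from a recent_file_glob, as proposed.
func globFromRecent(recent string) string {
	return strftimeRe.ReplaceAllString(recent, "*")
}

func main() {
	fmt.Println(globFromRecent("/bro/logs/%Y-%m-%d/notice.*gz"))
	// Output: /bro/logs/*-*-*/notice.*gz
}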

Audit byte -> string conversions

I'm pretty sure I am converting between byte -> string -> byte a few times. The interfaces between backend/ipset/store could possibly be simplified to reduce conversions.
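
For context, each direction of such a conversion allocates and copies, so a byte -> string -> byte round trip does the work twice; an illustrative example of the pattern the audit would look for:

package main

func main() {
	b := []byte("192.0.2.1")
	s := string(b)  // copy #1: []byte -> string allocates
	b2 := []byte(s) // copy #2: string -> []byte allocates again
	_ = b2          // keeping []byte end to end would avoid both copies
}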
