brimdata / zed
A novel data lake based on super-structured data

Home Page: https://zed.brimdata.io/

License: BSD 3-Clause "New" or "Revised" License

Go 99.04% Makefile 0.19% Shell 0.40% Python 0.37%

zed's Introduction


Zed offers a new approach that makes it easier to manipulate and manage your data.

With Zed's new super-structured data model, messy JSON data can easily be given the fully-typed precision of relational tables without giving up JSON's uncanny ability to represent eclectic data.

Trying out Zed is easy: just install the command-line tool zq.

zq is a lot like jq but is built from the ground up as a search and analytics engine based on the Zed data model. Since Zed data is a proper superset of JSON, zq also works natively with JSON.

While zq and the Zed data formats are production quality, the Zed project's Zed data lake is a bit earlier in development.

For a non-technical user, Zed is as easy to use as web search, while for a technical user, Zed exposes its technical underpinnings in a gradual slope, providing as much detail as desired, packaged up in the easy-to-understand ZSON data format and Zed language.

Why?

We think data is hard and it should be much, much easier.

While schemas are a great way to model and organize your data, they often get in the way when you are just trying to store or transmit your semi-structured data.

Also, why should you have to set up one system for search and another completely different system for historical analytics? And the same unified search/analytics system that works at cloud scale should run easily as a lightweight command-line tool on your laptop.

And rather than having to set up complex ETL pipelines with brittle transformation logic, managing your data lake should be as easy as git.

Finally, we believe a lightweight data store that provides easy search and analytics would be a great place to store data sets for data science and data engineering experiments running in Python, with easy integration with your favorite Python libraries.

How?

Zed solves all these problems with a new foundational data format called ZSON, which is a superset of both JSON and the relational model. ZSON is syntax-compatible with JSON but it has a comprehensive type system that you can use as little or as much as you like. Zed types can be used as schemas.

The Zed language offers a gentle learning curve, which spans the gamut from simple keyword search to powerful data-transformation operators like lateral sub-queries and shaping.

Zed also has a cloud-based object design that was modeled after the git design pattern. Commits to the lake are transactional and consistent.

Quick Start

Check out the installation page for a quick and easy install.

Detailed documentation for the entire Zed system and language is available on the Zed docs site.

Zui

The Zui app is an Electron-based desktop app to explore, query, and shape data in your Zed lake.

We originally developed Zui for security-oriented use cases (having tight integration with Zeek, Suricata, and Wireshark), but we are actively extending Zui with UX for handling generic data sets to support data science, data engineering, and ETL use cases.

Contributing

See the contributing guide on how you can help improve Zed!

Join the Community

Join our public Slack workspace for announcements, Q&A, and to trade tips!

Acknowledgment

We modeled this README after Philip O'Toole's brilliantly succinct description of rqlite.

zed's People

Contributors

alfred-landrum, dianetc, henridf, iloveitaly, jameskerr, jamii, jasondavies, marktwallace, mason-fish, mattnibs, mccanne, mikesbrown, nwt, philrz, stevesmoot, zmajeed


zed's Issues

Improve sorting used during ingest

Brim v0 ensures records are sorted by timestamp, starting from the TSV logs generated by Zeek and then invoking the sort processor using a limit:

sort -r -limit 10000000 ts

We should use a different technique that's efficient and doesn't impose a hardcoded cap on record count.

Case-insensitive searches

The zq engine (and hence the searches performed in the Brim app that rely on it) has up until now treated all searches as case sensitive. So for example:

$ cat hostnames.ndjson 
{"hostname": "Facebook.com"}
{"hostname": "facebook.com"}

$ zq -f table "facebook" hostnames.ndjson 
HOSTNAME
facebook.com

We recognize that most search tools that users are already familiar with are case-insensitive by default, so this issue captures our intent to change our behavior to match that expectation.

Once we flip the default behavior, we should likely offer a way that users can specify if they want some portion of their search to still be matched in a case-sensitive way.

Whatever approach we offer users to do that should be designed to accommodate a related enhancement we may add in the future (doesn't have to be now, as the demand for it has not been as strong): Case-insensitive aggregations. Revisiting the example above:

$ zq -f table "count() by hostname" hostnames.ndjson 
HOSTNAME     COUNT
Facebook.com 1
facebook.com 1

We may want to similarly flip the default behavior to get a count of 2 here, and hopefully the syntax we come up with to invoke case sensitivity in the search context could be repurposed here when that becomes a priority.

Log ingest error/warning handling

Log ingest (the /space/{space}/log POST endpoint) needs to be changed to:

  • if an input file can't be read/autodetected, send a warning but keep going for the others
  • send warnings back from the per-file readers

This is driven by Brim app requirements. More design details will be added to this issue as it comes together.

add compression type code and framing support

In the bzng reader and writer, add support for in-line compressed data by using a type code to indicate a compressed section:
<comp-type-code><comp-kind><uvarint-len><len bytes of compressed zng data>

tool to create directory hierarchy of zng files

Following the zar prototype work in #482, we need a tool that would read zng records and distribute them into a directory hierarchy of bzng files. zar would then be able to create indices for each bzng file in its directory.

"index out of range" crash when reading bad input file

@henridf stumbled onto an example where zq crashed when he accidentally asked it to read a bad input file. The specific input file happened to be a prior revision of zq's own changelog. I can repro this currently with zq version 280f671 and the attached input file.

# zq -version
Version: v0.9.0-18-g280f671

# zq CHANGELOG.txt | zq "*" -
panic: runtime error: index out of range [0] with length 0

goroutine 1 [running]:
github.com/brimsec/zq/zio/zngio.(*Reader).ReadPayload(0xc00007caf0, 0x1858320, 0x20be108, 0x0, 0xc0001e1a50, 0x100e388, 0x30)
	/Users/phil/work/zq/zio/zngio/reader.go:92 +0x22f
github.com/brimsec/zq/zio/zngio.(*Reader).Read(0xc00007caf0, 0xc00007caf0, 0x10000, 0xc000258000)
	/Users/phil/work/zq/zio/zngio/reader.go:67 +0x32
github.com/brimsec/zq/zio/detector.match(...)
	/Users/phil/work/zq/zio/detector/reader.go:58
github.com/brimsec/zq/zio/detector.NewReader(0x152b900, 0xc0001ecf30, 0xc0001878f0, 0xc0001ecf30, 0xc0000100d8, 0x0, 0x0)
	/Users/phil/work/zq/zio/detector/reader.go:21 +0x27a
main.(*Command).inputReaders(0xc00019e640, 0xc00000e090, 0x1, 0x1, 0x0, 0x0, 0xc00007a980, 0x40, 0x40)
	/Users/phil/work/zq/cmd/zq/zq.go:275 +0x3be
main.(*Command).Run(0xc00019e640, 0xc00000e090, 0x1, 0x1, 0x0, 0x0)
	/Users/phil/work/zq/cmd/zq/zq.go:198 +0x24a
github.com/mccanne/charm.(*instance).run(0xc00000e960, 0xc00000e090, 0x1, 0x1, 0x0, 0x0)
	/Users/phil/.go/pkg/mod/github.com/mccanne/[email protected]/instance.go:53 +0x28c
github.com/mccanne/charm.(*Spec).ExecRoot(0x184ff20, 0xc00000e090, 0x1, 0x1, 0xc000157f78, 0x10074cf, 0xc00007e058, 0x0)
	/Users/phil/.go/pkg/mod/github.com/mccanne/[email protected]/charm.go:77 +0x96
main.main()
	/Users/phil/work/zq/cmd/zq/main.go:9 +0x78

CHANGELOG.txt

It's an elusive rascal, though. If you try this with the current rev of zq's CHANGELOG.md, you instead get the more appropriate "malformed input" error.

zqd: send scanner stats when running a flowgraph

zqd currently sends zero-valued api.ScannerStats when running a flowgraph. We should switch it to sending actual scanner stats at some point. TBD if these should also be sent during ingest or only during search (ingest also runs scanner(s) on log files). This would also be a good time to revisit the question raised in this thread: #549 (comment).

Consider a new zql operator for string pattern matching

Pattern matching of strings is currently written as field = /pattern/.
We should introduce a new operator for this, something like field =~ /pattern/. In this way, we can also make glob matching explicit: name = *.html would match the literal string *.html while name =~ *.html would do a glob.

Improve auto-detection errors

Autodetection is working as intended in terms of identifying formats. However, it isn't providing very helpful feedback upon failures. The reason for this is that it tries each format in turn and returns a generic "malformed input" error if none succeeds.

Here's an example of how autodetection occludes the useful error message:

19:03 ~/work/looky/zq(no-dupe-record-fields)
$ cat !$
cat ~/tmp/stuff.zson
#0:record[foo:record[bar:string,bar:string]]
0:[["1";"2";]]
19:03 ~/work/looky/zq(no-dupe-record-fields)
$ zq ~/tmp/stuff.zson 
malformed input
19:03 ~/work/looky/zq(no-dupe-record-fields)
$ zq -i zng ~/tmp/stuff.zson 
line 1: duplicate fields in record type

A possible solution here would be:

  • Each reader needs to distinguish between syntax errors (signaled via the SyntaxError err) and higher-level semantic errors such as duplicate fields (above), or a value not matching its record type, etc. We may already be doing this perfectly but should check as part of this issue.
  • The autodetector should keep each reader's error, and upon failure of all readers, if any error is a non-Syntax error, that one should be used.

Other options:

  • Collect all errors, and output a summary (“ndjson error: x, zeek error: y, …”)
  • Use the file extension as a hint of which candidate reader’s error to return. For example, if the filename is foo.zng, then use the zng reader’s error message. This works better for some formats than others, for example .json could be ndjson or zjson; .log could be ndjson or zeek.

These are non-exclusive: a good solution might combine elements of more than one of these. And there might of course be other ways to improve this.

Error running make install for zq

Hello,

echo $GOPATH
/home/xxx/go

$ make install or sudo make install

cmd/pcap/main.go:7:2: cannot find package "github.com/brimsec/zq/cmd/pcap/cut" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/pcap/cut (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/pcap/cut (from $GOPATH)
cmd/pcap/main.go:8:2: cannot find package "github.com/brimsec/zq/cmd/pcap/index" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/pcap/index (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/pcap/index (from $GOPATH)
cmd/pcap/main.go:9:2: cannot find package "github.com/brimsec/zq/cmd/pcap/root" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/pcap/root (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/pcap/root (from $GOPATH)
cmd/pcap/main.go:10:2: cannot find package "github.com/brimsec/zq/cmd/pcap/slice" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/pcap/slice (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/pcap/slice (from $GOPATH)
cmd/pcap/main.go:11:2: cannot find package "github.com/brimsec/zq/cmd/pcap/ts" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/pcap/ts (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/pcap/ts (from $GOPATH)
cmd/pcap/cut/command.go:14:2: cannot find package "github.com/brimsec/zq/pcap/pcapio" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/pcap/pcapio (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/pcap/pcapio (from $GOPATH)
cmd/pcap/cut/command.go:15:2: cannot find package "github.com/mccanne/charm" in any of:
/usr/lib/go-1.10/src/github.com/mccanne/charm (from $GOROOT)
/home/xxx/go/src/github.com/mccanne/charm (from $GOPATH)
cmd/pcap/index/command.go:12:2: cannot find package "github.com/brimsec/zq/pcap" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/pcap (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/pcap (from $GOPATH)
cmd/pcap/slice/command.go:15:2: cannot find package "github.com/brimsec/zq/pkg/nano" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/pkg/nano (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/pkg/nano (from $GOPATH)
cmd/zar/main.go:7:2: cannot find package "github.com/brimsec/zq/cmd/zar/chop" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zar/chop (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zar/chop (from $GOPATH)
cmd/zar/main.go:8:2: cannot find package "github.com/brimsec/zq/cmd/zar/find" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zar/find (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zar/find (from $GOPATH)
cmd/zar/main.go:9:2: cannot find package "github.com/brimsec/zq/cmd/zar/index" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zar/index (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zar/index (from $GOPATH)
cmd/zar/main.go:10:2: cannot find package "github.com/brimsec/zq/cmd/zar/root" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zar/root (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zar/root (from $GOPATH)
cmd/zar/chop/command.go:12:2: cannot find package "github.com/brimsec/zq/pkg/bufwriter" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/pkg/bufwriter (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/pkg/bufwriter (from $GOPATH)
cmd/zar/chop/command.go:14:2: cannot find package "github.com/brimsec/zq/zbuf" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zbuf (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zbuf (from $GOPATH)
cmd/zar/chop/command.go:15:2: cannot find package "github.com/brimsec/zq/zio" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zio (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zio (from $GOPATH)
cmd/zar/chop/command.go:16:2: cannot find package "github.com/brimsec/zq/zio/bzngio" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zio/bzngio (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zio/bzngio (from $GOPATH)
cmd/zar/chop/command.go:17:2: cannot find package "github.com/brimsec/zq/zng/resolver" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zng/resolver (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zng/resolver (from $GOPATH)
cmd/zar/find/command.go:9:2: cannot find package "github.com/brimsec/zq/archive" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/archive (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/archive (from $GOPATH)
cmd/zdx/main.go:9:2: cannot find package "github.com/brimsec/zq/cmd/zdx/create" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zdx/create (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zdx/create (from $GOPATH)
cmd/zdx/main.go:10:2: cannot find package "github.com/brimsec/zq/cmd/zdx/dump" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zdx/dump (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zdx/dump (from $GOPATH)
cmd/zdx/main.go:11:2: cannot find package "github.com/brimsec/zq/cmd/zdx/lookup" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zdx/lookup (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zdx/lookup (from $GOPATH)
cmd/zdx/main.go:12:2: cannot find package "github.com/brimsec/zq/cmd/zdx/merge" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zdx/merge (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zdx/merge (from $GOPATH)
cmd/zdx/main.go:13:2: cannot find package "github.com/brimsec/zq/cmd/zdx/root" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zdx/root (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zdx/root (from $GOPATH)
cmd/zdx/create/command.go:9:2: cannot find package "github.com/brimsec/zq/zdx" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zdx (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zdx (from $GOPATH)
cmd/zq/zq.go:14:2: cannot find package "github.com/brimsec/zq/ast" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/ast (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/ast (from $GOPATH)
cmd/zq/zq.go:15:2: cannot find package "github.com/brimsec/zq/driver" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/driver (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/driver (from $GOPATH)
cmd/zq/zq.go:16:2: cannot find package "github.com/brimsec/zq/emitter" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/emitter (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/emitter (from $GOPATH)
cmd/zq/zq.go:18:2: cannot find package "github.com/brimsec/zq/scanner" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/scanner (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/scanner (from $GOPATH)
cmd/zq/zq.go:21:2: cannot find package "github.com/brimsec/zq/zio/detector" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zio/detector (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zio/detector (from $GOPATH)
cmd/zq/zq.go:22:2: cannot find package "github.com/brimsec/zq/zio/ndjsonio" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zio/ndjsonio (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zio/ndjsonio (from $GOPATH)
cmd/zq/zq.go:24:2: cannot find package "github.com/brimsec/zq/zqd/ingest" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zqd/ingest (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zqd/ingest (from $GOPATH)
cmd/zq/zq.go:25:2: cannot find package "github.com/brimsec/zq/zql" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zql (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zql (from $GOPATH)
cmd/zq/zq.go:27:2: cannot find package "go.uber.org/zap" in any of:
/usr/lib/go-1.10/src/go.uber.org/zap (from $GOROOT)
/home/xxx/go/src/go.uber.org/zap (from $GOPATH)
cmd/zqd/main.go:7:2: cannot find package "github.com/brimsec/zq/cmd/zqd/listen" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zqd/listen (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zqd/listen (from $GOPATH)
cmd/zqd/main.go:8:2: cannot find package "github.com/brimsec/zq/cmd/zqd/root" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zqd/root (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zqd/root (from $GOPATH)
cmd/zqd/main.go:9:2: cannot find package "github.com/brimsec/zq/zqd" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zqd (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zqd (from $GOPATH)
cmd/zqd/listen/command.go:14:2: cannot find package "github.com/brimsec/zq/cmd/zqd/logger" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/cmd/zqd/logger (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/cmd/zqd/logger (from $GOPATH)
cmd/zqd/listen/command.go:16:2: cannot find package "github.com/brimsec/zq/pkg/httpd" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/pkg/httpd (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/pkg/httpd (from $GOPATH)
cmd/zqd/listen/command.go:18:2: cannot find package "github.com/brimsec/zq/zqd/zeek" in any of:
/usr/lib/go-1.10/src/github.com/brimsec/zq/zqd/zeek (from $GOROOT)
/home/xxx/go/src/github.com/brimsec/zq/zqd/zeek (from $GOPATH)
cmd/zqd/listen/command.go:21:2: cannot find package "gopkg.in/yaml.v3" in any of:
/usr/lib/go-1.10/src/gopkg.in/yaml.v3 (from $GOROOT)
/home/xxx/go/src/gopkg.in/yaml.v3 (from $GOPATH)
cmd/zqd/logger/waterfall.go:4:2: cannot find package "go.uber.org/multierr" in any of:
/usr/lib/go-1.10/src/go.uber.org/multierr (from $GOROOT)
/home/xxx/go/src/go.uber.org/multierr (from $GOPATH)
cmd/zqd/logger/file.go:9:2: cannot find package "go.uber.org/zap/zapcore" in any of:
/usr/lib/go-1.10/src/go.uber.org/zap/zapcore (from $GOROOT)
/home/xxx/go/src/go.uber.org/zap/zapcore (from $GOPATH)
cmd/zqd/logger/file.go:10:2: cannot find package "gopkg.in/natefinch/lumberjack.v2" in any of:
/usr/lib/go-1.10/src/gopkg.in/natefinch/lumberjack.v2 (from $GOROOT)
/home/xxx/go/src/gopkg.in/natefinch/lumberjack.v2 (from $GOPATH)
make: *** [install] Error 1

Best regards,

pcap SearchReader: Fix empty section headers

Currently the pcap.SearchReader will write every section header it finds when reading through a pcap file. Instead the SearchReader should only write the headers that are needed in order to generate a valid pcap file from the search result. In particular, calls to SearchReader.Read on "no results" searches should get an io.EOF on the first call.

Additionally, when creating a SearchReader, the reader should scan the pcap file it will search to ensure that packet capture data will be returned. If the reader finds the pcap file does not have any packets that match the search, NewSearchReader should return an error.

zq crashes when the zql char count exceeds 255 chars

I built a longish search that looked like uid=foo or uid=bar or ... or uid=baz. The search caused zq to crash. The crash occurs once the zql character count exceeds 255 characters.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x132d84b]

goroutine 1 [running]:
main.fileExists(0x7ffeefbff473, 0x100, 0xc0000000d4)
        zq/cmd/zq/zq.go:172 +0xab
main.(*Command).Run(0xc0000ea120, 0xc000020070, 0x2, 0x2, 0x0, 0x0)
        zq/cmd/zq/zq.go:216 +0x172
github.com/mccanne/charm.(*instance).run(0xc00000e6a0, 0xc000020070, 0x2, 0x2, 0x0, 0x0)
        go/pkg/mod/github.com/mccanne/[email protected]/instance.go:53 +0x28c
github.com/mccanne/charm.(*Spec).ExecRoot(0x16fef20, 0xc000020070, 0x2, 0x2, 0xc000115f50, 0x100741f, 0xc0000a6058, 0x0)
        go/pkg/mod/github.com/mccanne/[email protected]/charm.go:77 +0x96
main.main()
        zq/cmd/zq/main.go:9 +0x78

The trace above and the code snippet below is for zq v0.8.0.

164 func fileExists(path string) bool {
165     if path == "-" {
166         return true
167     }
168     info, err := os.Stat(path)
169     if os.IsNotExist(err) {
170         return false
171     }
172     return !info.IsDir()
173 }

The problem is that fileExists() isn't handling the case of err on L168 being anything other than "file does not exist". In my case I'm getting an effective ENAMETOOLONG, at which point L169 is false and the crash happens on L172.

Expressing non-numeric float values

The Go implementation of floating point used by zq (https://golang.org/pkg/math/) includes the concepts of +/- Infinity and NaN. Since ZNG uses this for its underlying storage, it's technically feasible for these non-numeric values to be written/read as ZNG today. However, due to lack of full support all the way into the upper layers of the Zed language, the following gaps remain:

  • There are no float literals in the Zed language that match the underlying Go constructs (i.e. no way for a user to check if a field contains one of these non-numeric values).
  • There are no conventions in ZSON for presenting these values (i.e. no way for data to take a "round trip" from binary ZNG to ZSON presentation and then back to being stored as binary ZNG again).

pcap tests with index file

We need to be able to do tests with index files. There is no test case for the bug fixed in commit #424.

In particular, we need to run a command that creates an index file, then run a subsequent command that uses the index file created by the first command.

We might be able to use the existing shell test framework to do this, but we have to look into it, and if not, figure out how to make it work.

Convert pkg/test.Internal tests to structured file tests

@nwt says:

Now that #342 is done, we should be able to convert tests that use pkg/test.Internal to the new structured file approach. Those tests happen today via a call to make test-system; as part of this effort, these tests should no longer need the "system" build tag to be executed.

@henridf adds:

I think the format tests (that use test.Exec) would also be a good candidate for porting over.

Improve "bad pcap file" error message

While we're certain to encounter truly corrupt pcap files in the wild, we've also had situations where we failed to extract flows successfully due to our own bugs or lack of full support for all pcap/pcapng variations. It'd therefore probably be appropriate to adjust the error message to something a little less definite, e.g. "unable to extract from pcap file" or whatever is most accurate based on what we're trying and failing to do, rather than pronouncing judgement on the quality of the file.

While I was looking at this, I noticed that in the code the ErrCorruptPcap that produces the "bad pcap file" message is used in several places. Would it make sense to break these out into separate, more granular errors that would help us narrow down the root cause if a user reports one of these problems to us?

Handle sorted vs unsorted inputs

zq currently has some half-baked pieces for dealing with sorted inputs:

  • when scanner.Combiner is merging multiple files it peeks at the timestamps of the next record from each file and returns the lowest one. This only produces a useful result when all inputs have ascending timestamps.
  • The proc.Context object includes a Reverse flag to indicate that records are sorted with descending timestamps rather than ascending. This is a global property of a query and it has no way to indicate that a particular stream is unsorted.
  • The groupby proc when it is time-binned assumes that timestamps are sorted (using the Reverse flag described above to distinguish ascending or descending). If this flag is wrong (either because the stream isn't sorted at all or because it isn't set correctly), groupby generates incorrect results.

This is working well enough right now in the app since all.bzng is always sorted by descending timestamps during ingest and the app always passes "dir":-1 in its queries (incidentally, we should remove that from the zqd api since any other value will produce incorrect results).

The zql spec includes ordering hints which are directives that can be inserted in a (b)zng file to indicate if/how the contents of the file/stream are sorted. This issue is to generalize the current "Reverse" flag into a more complete solution that handles streams that can be in one of 3 states: sorted ascending, sorted descending, or unsorted. Note that this property might be different at different points in a query graph (e.g., if points are sorted by timestamp and then they enter a sort foo proc, they are no longer sorted by timestamp). For now, it is probably sufficient to just implement this for the timestamp field and ignore zng sorting hints if they indicate sorting by any other field.

When this was discussed in the past, there was some disagreement about exactly how to implement it. The controversial aspect was how to handle procs that might change the sorting order as described above.

@aswan implemented a solution in which sorting information was communicated from readers to procs and between procs dynamically at runtime. @nwt, @henridf, and @mccanne all disliked this and proposed a different solution: the introduction of a separate static analysis phase that would analyze an entire proc graph and determine the sorting properties at each point in the graph. It is unclear how this would work when the sortedness of a (b)zng input is unknown until the stream has been read.

Incorrect formatting of negative times with text output

12:27 ~/work/brim/zq(master)
$ cat negtime.zng 
#0:record[_path:string,ts:time]
0:[conn;1425565514.419939;]
0:[conn;-1425565514.419939;]
12:27 ~/work/brim/zq(master)
$ zq -f text negtime.zng 
conn	2015-03-05T14:25:14.419939Z
conn	1924-10-29T09:34:45.580061Z

For context, we handle negative times because zeek has been observed to output them. It is unclear what a proper ISO-ish formatting should be; maybe just ISO-format the positive timestamp and prefix it with -? That would be -2015-03-05T14:25:14.419939Z in the above.

Handle search Dir parameters properly

The zqd search api takes a "Dir" parameter which is meant to specify whether results should be from oldest-to-newest or from newest-to-oldest. But the only value that works right now is -1 to specify newest-to-oldest. Passing 1 (to request oldest-to-newest) just silently breaks some queries (such as those that include time-binned groupby procs).

At the very least we should validate Dir and return an error if a value other than -1 is passed. We could also just remove this parameter; it would not be difficult to add it back if/when we implement the ability to generate results in either order. (see also #501)

zqd: http error handling issues in /search endpoint

The search handler calls http.Error() after having possibly written to its responsewriter, which triggers sending a 200. In that case, the subsequent http.Error() is a no-op and no error is sent to the client.

This could be addressed by having the handler do upfront request validation, returning an HTTP error upon failure. If the request is valid, then the handler launches the search and the HTTP request gets a 2xx back, after which api protocol messages can be sent to the client. If there is an error during search processing, it is sent back to the client in the final TaskEnd message.

Rename packet endpoints to pcap

With the decision to refer to packet captures as "pcaps", the names of these two endpoints need to change:

  • GET /space/:space/packet -> /space/:space/pcap
  • POST /space/:space/packet -> /space/:space/pcap

There are also some api payloads that need to change:

  • SpaceInfo.packet_path
  • SpaceInfo.packet_size
  • SpaceInfo.packet_support
  • PacketPostStatus.packet_total_size
  • PacketPostStatus.packet_read_size

We should also use this opportunity to change any additional struct/var/function names that refer to Packet when they should refer to pcap.

The api changes will break the brim api client; changes will need to be made there when updating to the latest version of zqd.

JSON type warnings vs. errors

@henridf notes:

Errors related to types.json type mapping (for example no _path, or no descriptor found) are printed on stderr with an appropriate line number to help the user dig in and figure out what is wrong.

However warnings are not. They should be.

To be specific, right now there's only one instance of "warning", which is when an ndjson record has fields that aren't present in the descriptor. It's a warning because we can still type the object correctly (at least, the parts of it that are referenced in the descriptor), but the user should be made aware of this mismatch between their data and the types.json they are using.

Commonize ZQL compilation code

The code to parse, compile, and drive a ZQL query to completion exists in a few places in slightly different forms. Most obviously it's copied between the zq command and the zqd search API endpoint, but it's also used in a few test calls. We should refactor as needed so the code lives in a common place.

Change standard file extensions for ZNG files

The ZNG spec defines binary and text formats for files. We currently use 'bzng' as the extension for binary and 'zng' as the extension for text. The binary format is more widely used; the text format is currently used for testing, debugging, or demos.

We discussed internally, and thought it would make sense to change the 'standard' extensions now: we'll use 'zng' as the extension for the binary format, and 'tzng' for the text format. We can continue to interpret an extension of 'bzng' as the binary format.
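The resulting mapping is small enough to state in code. A minimal sketch (formatForExtension is a hypothetical helper, not actual zq code):

```go
package main

// Proposed extension mapping: 'zng' and legacy 'bzng' mean the binary
// format, 'tzng' means the text format.

func formatForExtension(ext string) string {
	switch ext {
	case "zng", "bzng": // "bzng" remains accepted for compatibility
		return "binary"
	case "tzng":
		return "text"
	default:
		return "unknown"
	}
}

func main() {}
```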

master starts to error screen with existing space

If I try to run Brim off of current master (16e606a7) when I've previously created a space with the Brim 0.6.0 release, the application loads directly to an error screen, with this error:

TypeError: Invalid attempt to spread non-iterable instance
    at _nonIterableSpread (/Users/alfred/work/brim/dist/js/state/LogDetails/reducer.js:15:39)
    at _toConsumableArray (/Users/alfred/work/brim/dist/js/state/LogDetails/reducer.js:13:95)
    at toHistory (/Users/alfred/work/brim/dist/js/state/LogDetails/reducer.js:83:43)
    at /Users/alfred/work/brim/dist/js/state/LogDetails/selectors.js:32:33
    at /Users/alfred/work/brim/node_modules/reselect/lib/index.js:76:25
    at /Users/alfred/work/brim/node_modules/reselect/lib/index.js:36:25
    at /Users/alfred/work/brim/node_modules/reselect/lib/index.js:90:33
    at Object.getHistory (/Users/alfred/work/brim/node_modules/reselect/lib/index.js:36:25)
    at Function.stateToProps [as mapToProps] (/Users/alfred/work/brim/dist/js/components/RightPane.js:172:40)
    at mapToPropsProxy (/Users/alfred/work/brim/node_modules/react-redux/lib/connect/wrapMapToProps.js:53:92)

repro on windows & mac:

  • run 0.6.0 release, ingest a pcap, exit brim
  • run master (currently at 16e606a7)

Fix pcap search endpoint not found

There's some buggy behavior around how zqd handles searches that do not produce any results:

  • The 404 response that the handler tries to write is never received because a 200 response has already been sent.
  • It also attempts to write a string error message that doesn't match the pcap content-type header that has already been sent.
  • The pcap SearchReader will also stream back section headers for each pcap section on searches that return zero packets.

We should rework pcap.SearchReader to peek into the pcap file being searched and check whether the search has any results before the content type headers are written and data is copied to the response writer.

Pcap Search Endpoint: streaming errors

The pcap reader may return an error as it parses the underlying pcap file. Currently zqd responds by attempting to write a 500 status with a string error message, which doesn't make sense because the status headers have already been sent. Unlike the search endpoint, an error message can't be sent in-band either, since the content type has already been set to pcap.

Instead of trying to respond with an error message and status code, the handler should abort the request and log the error.

Bare searches should match on field names

Currently bare searches match on field values, but not field names. For example:

$ echo '{"foo": "bar"}' | zq "bar" -
#0:record[foo:string]
0:[bar;]

$ echo '{"foo": "bar"}' | zq "foo" -
[no output]

Other search tools that users are accustomed to often match in the second case as well. To meet user expectations, we should do the same.

Once we do this, we already have a syntax for users who want to match only on field values. In this example it would be:

$ echo '{"foo": "bar"}' | zq "*=bar" -
#0:record[foo:string]
0:[bar;]

json reader: Support multiple typed arrays

The json parser returns an error when parsing a json array with multiple types, e.g.:

{ "arr1": [1, "mystring", { "string1": "value1"} ] } 

It would be great if this worked, either through some kind of type coercion rules or maybe a type like vector[any].

Ambiguity between fractions & CIDR

In the era of put, it seems our relaxed parsing of CIDR notation can create ambiguity. Working with zq version 41f545a, consider the following:

# echo '{"foo": "bar"}' | zq "put a=22/7 | cut a" -
#0:record[a:net]
0:[22.0.0.0/7;]

It's entirely likely the user in this case was trying to populate field a with a calculated value close to pi, not a net. We need to find an intuitive way for the user to express each case appropriately.

There may be multiple ways to remove this ambiguity, but one possibility that comes to mind would be to require CIDR notation in ZQL to be written out fully, e.g. 22.0.0.0/7 in this case.

Let groupby use arbitrary expressions

The by ... clause in a groupby proc is currently a list of field names. We should generalize this to allow arbitrary expressions, which would enable lots of things that can't be easily expressed right now, such as:

count() by len(someset)
count() by String.tolower(something)
count() by 1000*Math.round(bytes/1000)

Zeek reader trims whitespace too aggressively

The code here trims any trailing whitespace: https://github.com/brimsec/zq/blob/master/zio/zeekio/reader.go#L46

A log whose last column is a string type and whose value consists only of whitespace is valid, but we fail to read it. @philrz came across an example in the wild that looks like this:

#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	unknown_mime_type_discovery
#open	2020-04-10-18-15-32
#fields	ts	fid	bof
#types	time	string	string
1425634708.454143	FTBt2P3wJBdHGKpZwc	 

It's not visible in the text display, but after the fid on the last line there is a tab character followed by a space character.
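The fix is presumably to trim only the line terminator rather than all trailing whitespace. A sketch (trimLine is a hypothetical replacement for the TrimRight call in zio/zeekio/reader.go):

```go
package main

// Strip only the trailing "\n" or "\r\n" so a whitespace-only value in a
// trailing string column survives.

import "strings"

func trimLine(line string) string {
	line = strings.TrimSuffix(line, "\n")
	return strings.TrimSuffix(line, "\r")
}

func main() {}
```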
