The data_model from aousterh

A. Setup tasks:

Search pipeline setup (Elasticsearch)
Analytics pipeline setup (Spark)
ZNG/ZST setup
Kafka setup
- Write a script to ingest all JSON files (e.g., from zq-sample-data) into Kafka, with one topic per log type
- Consume data from Kafka and write it to Elastic (e.g., using Logstash)
- Consume data from Kafka and write it to Parquet files (e.g., using Spark Streaming)
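
The ingest script from the first bullet could look roughly like the sketch below. It is only an illustration under several assumptions that are not in these notes: the kafka-python package, a broker at localhost:9092, uncompressed newline-delimited JSON files, and file names whose first component identifies the log type (e.g., conn.ndjson for the conn topic). All function names are hypothetical.

```python
import json
from pathlib import Path


def topic_for(path, prefix=""):
    """Derive a topic name from a file name, e.g. conn.ndjson -> conn (assumed naming)."""
    name = Path(path).name
    return prefix + name.split(".")[0]


def iter_records(path):
    """Yield one parsed JSON object per non-empty line of an NDJSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def ingest(data_dir, send, pattern="*.ndjson"):
    """Send every record under data_dir to its file's topic; return per-topic counts."""
    counts = {}
    for path in sorted(Path(data_dir).rglob(pattern)):
        topic = topic_for(path)
        for record in iter_records(path):
            send(topic, record)
            counts[topic] = counts.get(topic, 0) + 1
    return counts


def main(data_dir, bootstrap_servers="localhost:9092"):
    # kafka-python is imported lazily so the helpers above work without it.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    counts = ingest(data_dir, producer.send)
    producer.flush()  # block until all buffered messages are delivered
    return counts
```

Passing the send function in as a parameter keeps the file-to-topic logic testable without a running broker; the returned per-topic counts can later be compared against document counts in Elastic or row counts in Parquet.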

TBD: check with the Z people about their setups

B. Basic experiment tasks to check that everything works:

Feed JSON data into Kafka and check that it is automatically fed to Elastic and you can query it there
Feed JSON data into Kafka and check that it is automatically fed into Parquet files, which you can query
Check that we can query ZNG data
Issue 3-5 queries over Elasticsearch, write down how to issue each query in both Elastic and jq, and check that the results match
Issue 3-5 queries over Parquet data with Spark, write down how to issue each query in both Spark and jq, and check that the results match
Replicate the above queries using ZNG/ZST, write them down, and check that the results match
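
One possible way to do the "results match" checks above is to export each system's query output as newline-delimited JSON and compare the two outputs as order-insensitive multisets. The sketch below assumes that export step has already happened and that both outputs use the same field names and value types; the function names are illustrative only.

```python
import json
from collections import Counter


def canonical(record):
    """Serialize a record with sorted keys so key order does not affect equality."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def load_ndjson(text):
    """Parse newline-delimited JSON text into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def diff_results(left, right):
    """Return (only_in_left, only_in_right) as Counters of canonical records."""
    left_counts = Counter(canonical(rec) for rec in left)
    right_counts = Counter(canonical(rec) for rec in right)
    return left_counts - right_counts, right_counts - left_counts


def results_match(left, right):
    """True when both result sets hold the same records with the same multiplicity."""
    only_left, only_right = diff_results(left, right)
    return not only_left and not only_right
```

Row order and key order are ignored, but duplicates are counted. Values are compared as serialized, so 1 and 1.0, or differently formatted timestamps, would show up as mismatches and may need normalizing first.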

C. Lower-priority tasks:

Figure out how to take a snapshot of an Elastic instance and use it to populate a different Elastic instance (useful for making experiments reproducible)
Network data source setup (pub-sub the zq-sample-data)
IoT data source setup (TSDB)

Create an EBS volume to hold all sample datasets and their transformations:

zq-sample-data (link → z-dataset-sample-data volume)
zeek dataset (link: shared privately → z-dataset-zeek)
suricata dataset (link: shared privately → z-dataset-suricata)
baseball dataset (link → z-dataset-baseball); format lineage:
- csv → zng → ndjson, zst
- csv → parquet
- elastic

# HowTos:

Volume creation:

Use the volume:

Attach the volume to your instance
Run lsblk on the instance to identify the device
Create a directory on the instance (e.g., /data) and mount the device on it
(More details can be found in the EBS documentation)
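
As a rough illustration of the lsblk step above, the sketch below parses the output of `lsblk --json` and lists top-level block devices that have no partitions and no mount point, which is typically how a freshly attached volume appears. The helper names are hypothetical, and the handling of both `mountpoint` (older util-linux) and `mountpoints` (newer releases) is an assumption about which version is installed.

```python
import json
import subprocess


def mount_points(device):
    """Return the non-empty mount points of one lsblk device entry."""
    points = device.get("mountpoints")
    if points is None:
        points = [device.get("mountpoint")]
    return [p for p in points if p]


def unmounted_disks(lsblk_json):
    """Names of top-level devices with no children and no mount point."""
    devices = json.loads(lsblk_json).get("blockdevices", [])
    return [
        d["name"]
        for d in devices
        if not d.get("children") and not mount_points(d)
    ]


def run_lsblk():
    """Run lsblk on this machine and return its JSON output as text."""
    return subprocess.run(
        ["lsblk", "--json"], check=True, capture_output=True, text=True
    ).stdout
```

If `unmounted_disks(run_lsblk())` reports a device such as xvdf (a hypothetical name), the remaining steps would be `sudo mkdir -p /data` followed by `sudo mount /dev/xvdf /data`. A brand-new volume has no file system yet and needs one created (e.g., with mkfs) before it can be mounted.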

Note: remember to umount the volume (if it is mounted somewhere) before detaching it in the EC2 console!

Done!
