
A Model-Driven Telemetry collector based on the open-source tool pipeline

License: Apache License 2.0

Go 98.23% Dockerfile 0.18% Shell 0.47% Makefile 1.12%
gnmi mdt cisco ios-xr ios-xe nx-os grpc telemetry

pipeline-gnmi's Introduction

pipeline-gnmi

NOTE: For a more recently developed collector with more output flexibility and support, please evaluate usage of the following Telegraf plugins for your use case: cisco_telemetry_mdt and cisco_telemetry_gnmi.

A Model-Driven Telemetry collector based on the open-source tool pipeline including enhancements and bug fixes.

pipeline-gnmi is a Model-Driven Telemetry (MDT) collector based on the open-source tool pipeline, adding gNMI support and fixes for maintainability (e.g. Go modules) and compatibility (e.g. Kafka version support). It supports MDT from IOS XE, IOS XR, and NX-OS, enabling end-to-end Cisco MDT collection for DIY operators.

The original pipeline README is included here for reference.

Usage

pipeline-gnmi is written in Go and targets Go 1.11+. Windows and MacOS/Darwin support is experimental.

  1. Download pipeline-gnmi binaries from Releases.
  2. Build from source:
     git clone https://github.com/cisco-ie/pipeline-gnmi
     cd pipeline-gnmi
     make build
  3. Install via go get github.com/cisco-ie/pipeline-gnmi; the binary is placed in $GOPATH/bin.

Configuration

pipeline configuration support is maintained and detailed in the original README. Sample configuration is supplied as pipeline.conf.

gNMI Support

This project introduces support for gNMI. gNMI is a standardized and cross-platform protocol for network management and telemetry. gNMI does not require prior sensor path configuration on the target device, merely enabling gRPC/gNMI is enough. Sensor paths are requested by the collector (e.g. pipeline). Subscription type (interval, on-change, target-defined) can be specified per path.

Filtering of retrieved sensor values can be done directly at the input stage through selectors in the configuration file, by defining the sensor paths that should be stored in a TSDB or forwarded via Kafka. The usual metrics filtering through metrics.json files is not implemented for gNMI input and is ignored, due to the lack of user-friendliness of that configuration.

[mygnmirouter]
stage = xport_input
type = gnmi
server = 10.49.234.114:57777

# Sensor path to subscribe to. No configuration on the device is necessary.
# Appending @ and a parameter specifies the subscription type:
#   @x where x is a positive number indicates a fixed interval, e.g. @10 -> every 10 seconds
#   @change indicates only changes should be reported
#   omitting the @ parameter requests a target-defined subscription (not universally supported)
#
path1 = Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters@10
#path2 = /interfaces/interface/state@change

# Whitelist the actual sensor values we are interested in (1 per line) and drop the rest.
# This replaces metrics-based filtering for gNMI input - which is not implemented.
# Note: Specifying one or more selectors will drop all other sensor values and is applied for all paths.
#select1 = Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters/packets-sent
#select2 = Cisco-IOS-XR-infra-statsd-oper:infra-statistics/interfaces/interface/latest/generic-counters/packets-received

# Suppress redundant messages (minimum heartbeat interval)
# If set and 0 or positive, redundant messages should be suppressed by the server
# If greater than 0, the number of seconds after which a measurement is sent even if no change has occurred
#heartbeat_interval = 0

tls = false
username = cisco
password = ...

Kafka 2.x Support

This project supports Kafka 2.x by requiring the Kafka version (kafkaversion) to be specified in the config file stage. This is a requirement of the underlying Kafka library and ensures that the library communicates with the Kafka brokers using a compatible protocol version.

[kafkaconsumer]
topic=mdt
consumergroup=pipeline-gnmi
type=kafka
stage=xport_input
brokers=kafka-host:9092
encoding=gpb
datachanneldepth=1000
kafkaversion=2.1.0

Docker Environment Variables

This project has improved Docker support. The Dockerfile uses multi-stage builds and builds Pipeline from scratch. The configuration file can now be created from environment variables directly, e.g.

PIPELINE_default_id=pipeline
PIPELINE_mygnmirouter_stage=xport_input
PIPELINE_mygnmirouter_type=gnmi

is translated into a pipeline.conf with following contents:

[default]
id = pipeline

[mygnmirouter]
stage = xport_input
type = gnmi

If the special variable _password is used, the value is encrypted with the pipeline RSA key before being written to the password option. Similarly, if _secret is used, its value is treated as a file name; the file's contents are read, encrypted with the pipeline RSA key, and written as the password option. If the pipeline RSA key is not supplied or does not exist, it is created when the container starts.
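For example (hypothetical variable names, assuming the same PIPELINE_<section>_<option> pattern shown above; verify the exact naming against the entrypoint script):

# Value is encrypted with the pipeline RSA key and written as the password option:
PIPELINE_mygnmirouter__password=MySecretPassword
# Alternatively, name a file whose contents are read, encrypted, and written as the password option:
#PIPELINE_mygnmirouter__secret=/run/secrets/router-password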

Additionally, existing replays of sensor data can be fed in efficiently using xz-compressed files.

Licensing

pipeline-gnmi is licensed under the Apache License, Version 2.0, as is pipeline.

Help!

For support, please open a GitHub Issue or email [email protected].

Special Thanks

Chris Cassar for implementing pipeline, used by anyone interested in MDT; Steven Barth for gNMI plugin development; and the Cisco teams implementing MDT support in the platforms.

pipeline-gnmi's People

Contributors

benderscript, ccassar, fluffy, nleiva, remingtonc, sbyx


pipeline-gnmi's Issues

mdt_msg_samples carries >50MB of overhead

mdt_msg_samples/ contains items like dump.bin, which is ~52 MB. Ideally we should not keep a dump file of that size in the repo, or we should use Git LFS or something along those lines for large test data.
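A minimal sketch of how Git LFS could take over the large samples (paths are illustrative; already-committed files would additionally need a history migration):

git lfs install
git lfs track "mdt_msg_samples/*.bin"
git add .gitattributes mdt_msg_samples/
git commit -m "Track large MDT sample dumps with Git LFS"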

Bring your own Proto files

Now that the vendor folder is gone and we are using Go modules, we can import XR YANG proto files from any repo we want. I am still thinking about how to provide this as a feature to the user, but all we need to do to bring in a different set of proto files is modify the import paths in codec_gpb.go, xport_grpc_test.go, pipeline.go, etc. to point to a repo other than github.com/cisco/bigmuddy-network-telemetry-proto, then compile again.

import (
        ...
	telem "github.com/cisco/bigmuddy-network-telemetry-proto/proto_go"
	pdt "github.com/cisco/bigmuddy-network-telemetry-proto/proto_go/old/telemetry"
        ...
)

Go generate could do this for us, but I am not sure it is something we want to support.
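One possible alternative to editing the import paths, sketched under the assumption that a drop-in fork exposes the same package layout, is a replace directive in go.mod (the target module below is hypothetical):

replace github.com/cisco/bigmuddy-network-telemetry-proto => github.com/example/my-telemetry-proto v1.0.0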

Create makefile help

Create a makefile help target so that it is easy for new and old folks to understand the targets and be productive.
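A common self-documenting Makefile pattern could serve as a starting point; this is a sketch, not the repository's actual Makefile, and it assumes each target carries a trailing "## description" comment:

help: ## Show available targets and their descriptions
	@grep -E '^[a-zA-Z_-]+:.*## ' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*## "} {printf "  %-18s %s\n", $$1, $$2}'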

InfluxDB Error - metric being POST'd with missing measurement

It appears that InfluxDB metrics are being written with a missing measurement name.

The debug output below shows the missing measurement name alongside the correct values. This shows the gNMI plugin is successfully reading the path and sending the data, but something is failing to parse the path and create a measurement name.

/tmp # tail -f dump.txt_wkid0
(prec: [ms], consistency: [], retention: [])
	,Producer=10.2.1.3:3333,Target=gnmi /interfaces/interface/state/counters/in-errors=0 1586593538088000000
Server: [http://influxdb:8086], wkid 0, writing 1 points in db: telemetry
(prec: [ms], consistency: [], retention: [])
	,Producer=10.2.1.3:3333,Target=gnmi /interfaces/interface/state/counters/in-octets=0 1586593538088000000
Server: [http://influxdb:8086], wkid 0, writing 1 points in db: telemetry
(prec: [ms], consistency: [], retention: [])
	,Producer=10.2.1.3:3333,Target=gnmi /interfaces/interface/state/counters/in-multicast-pkts=0 1586593538088000000
Server: [http://influxdb:8086], wkid 0, writing 1 points in db: telemetry
(prec: [ms], consistency: [], retention: [])
	,Producer=10.2.1.3:3333,Target=gnmi /interfaces/interface/state/counters/out-discards=0 1586593538088000000
Server: [http://influxdb:8086], wkid 0, writing 1 points in db: telemetry
(prec: [ms], consistency: [], retention: [])

Metrics File:

[
	{
		"basepath" : "openconfig-interfaces:interfaces/interface/state/counters",
		"spec" : {
			"fields" : [
				{"name" : "in-octets"},
				{"name" : "out-octets"},
				{"name" : "in-errors"},
				{"name" : "out-errors"},
				{"name" : "in-discards"},
				{"name" : "out-discards"},
				{"name" : "in-broadcast-pkts"},
				{"name" : "out-broadcast-pkts"},
				{"name" : "in-multicast-pkts"},
                                {"name" : "out-multicast-pkts"},
				{"name" : "in-unicast-pkts"},
                                {"name" : "out-unicast-pkts"}
			]
		}
	}
]

Using the current master branch and the docker-compose file within.

Evaluate pushing [u]int/float64 to output plugins

Per #6 we replicate/customize (originally a patch) jsonpb, which emits uint64 values as strings, so that it emits them as uint64s instead. This appears to have been done largely to maintain correctness with JavaScript and to follow the JSON specification, so it is a fine workaround. However, if we can easily push that deserialization to the output plugins, it would probably be a better solution and would explicitly acknowledge the intended behavior of jsonpb.
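A minimal Go sketch of what output-side handling could look like; normalizeUint64 is a hypothetical helper for illustration, not an existing pipeline function:

package main

import (
	"fmt"
	"strconv"
)

// normalizeUint64 converts string-encoded unsigned integers (as the stock jsonpb
// Marshaler emits for uint64 fields) back into numeric values before they reach
// an output plugin. Non-numeric strings are left untouched.
func normalizeUint64(fields map[string]interface{}) {
	for k, v := range fields {
		s, ok := v.(string)
		if !ok {
			continue
		}
		if n, err := strconv.ParseUint(s, 10, 64); err == nil {
			fields[k] = n
		}
	}
}

func main() {
	fields := map[string]interface{}{"packets-sent": "18446744073709551615", "state": "up"}
	normalizeUint64(fields)
	fmt.Println(fields) // packets-sent becomes a uint64, state stays a string
}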

Cleanup Test Output

make test spills out hundreds of lines under ===A=== and ===B===, amongst others. I have no idea what these are, but they certainly add to the confusion. IMHO tests should output the name of each test and pass/fail.

TestCodecGPBBasic fails

TestCodecGPBBasic fails due to differences between the produced and verification data. The produced data includes data_gpb:<nil> and data_gpbkv:[], whereas the verification data does not include these fields. This is likely because the samples are compared directly from pure JSON rather than serialized via the Telemetry proto, which does contain these fields.

569,571c569
< collection_end_time:1.471794103839e+12 collection_id:111814 collection_start_time:1.471794103827e+12 data_gpb:<nil> data_gpbkv:[] encoding_path:Cisco-IOS-XR-wdsysmon-fd-oper:system-monitoring/cpu-utilization msg_timestamp:1.471794103827e+12 node_id_str:uut subscription_id_str:test]
---
> collection_end_time:1.471794103839e+12 collection_id:111814 collection_start_time:1.471794103827e+12 encoding_path:Cisco-IOS-XR-wdsysmon-fd-oper:system-monitoring/cpu-utilization msg_timestamp:1.471794103827e+12 node_id_str:uut subscription_id_str:test]

Trying to determine which should be correct...

gNMI dialin: Requested encoding "PROTO" not supported

Hi,

I'm trying to get wireless telemetry data from a C9800 (Cisco IOS XE 17.1.1s) into Splunk via pipeline-gnmi (recent git clone) and Kafka. gRPC dial-out is working fine, but I'm unable to get gNMI dial-in to work.

Error message in pipeline.log:

time="2020-04-01 15:41:57.742095" level=info msg="gnmi: Connected" name=mygnmirouter server="10.15.20.1:50052" tag=pipeline type=gnmi username=splunk
time="2020-04-01 15:41:57.762555" level=info msg="gnmi: SubscribeClient running" name=mygnmirouter server="10.15.20.1:50052" tag=pipeline type=gnmi username=splunk
time="2020-04-01 15:41:57.764296" level=error msg="gnmi: server terminated sub" error="rpc error: code = Unimplemented desc = Requested encoding \"PROTO\" not supported" name=mygnmirouter server="10.15.20.1:50052" tag=pipeline type=gnmi username=splunk

config:

[mygnmirouter]
tls=false
username=splunk
password=[snipped]
stage=xport_input
type=gnmi
server=10.15.20.1:50052
path1=/interfaces-ios-xe-oper:interfaces@10 

Am I doing this wrong? I will be grateful for any help you can provide.

Rethink approach to test data

We have a ~50 MB dump.bin and more in mdt_msg_samples/, which really shouldn't be necessary to download for anyone cloning our repo. We should use git-lfs, a different repository, or...something that makes the project less heavyweight to download.

Remove bin/pipeline from repository

Shipping the pipeline binary in the repository is pretty ugly right now; it weighs in at 40-60 MB. We should instead compile and post binaries in the Releases tab, and rewrite the git history to not include the pipeline binary.

@sbyx or @repenno do you have any opposition to removing the bin/pipeline binary?
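One way to drop the binary from history, sketched with git filter-branch (a BFG-style tool would also work; try this on a clone first, since it rewrites all refs):

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch bin/pipeline' \
  --prune-empty -- --all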

Add support for Go modules

We need to migrate to Go modules (go.mod and go.sum). Pipeline is currently using Glide for dependency management, which is no longer maintained.

In order to do so, we first need to remove the changes go generate makes to jsonpb as depicted in vendor.patch. EmitUInt64Unquoted is not part of the official protobuf library, so we need to replicate the Marshaler struct of github.com/golang/protobuf/jsonpb with this field on it. "A little copying is better than a little dependency".

I will first attempt to do this before updating dependencies and bringing in Go 1.12 support as well.
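For the module migration itself, a minimal sequence might look like this (run from the repository root):

go mod init github.com/cisco-ie/pipeline-gnmi
go mod tidy
go build ./...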

unable to get data into influxDB

Hello,

I am collecting data from a Nexus 9000. I am able to send data to Kafka without an issue.
The inspector works well too (I see data in the dump file), but I seem to be completely unable to get any data into InfluxDB.

This is what I get in inspectordump.txt:

------- 2019-02-28 21:15:12.755654432 +0000 UTC m=+258.412984901 -------
Summary: GPB(common) Message [172.31.1.10:23105()//Cisco-NX-OS-device:System/procsys-items/sysload-items msg len: 360]
{
    "Source": "172.31.1.10:23105",
    "Telemetry": {
        "node_id_str": "leaf101-N93180YC-EX",
        "subscription_id_str": "1",
        "encoding_path": "/Cisco-NX-OS-device:System/procsys-items/sysload-items",
        "collection_id": 957,
        "collection_start_time": 0,
        "msg_timestamp": 1551388902456,
        "data_gpbkv": [],
        "data_gpb": null,
        "collection_end_time": 0
    },
    "Rows": [
        {
            "Timestamp": 0,
            "Keys": {
                "/Cisco-NX-OS-device:System/procsys-items/sysload-items": "/Cisco-NX-OS-device:System/procsys-items/sysload-items"
            },
            "Content": {
                "": {
                    "sysload-items": {
                        "": {
                            "loadAverage15m": "0.450000",
                            "loadAverage1m": "1.150000",
                            "loadAverage5m": "0.680000",
                            "name": "sysload",
                            "runProc": 1,
                            "totalProc": 360
                        }
                    }
                }
            }
        }
    ]
}

And this is what I have in my metrics.json:

[
        {
                "basepath" : "Cisco-NX-OS-device:System/procsys-items/sysload-items",
                "spec" : {
                        "fields" : [
                                {"name":"loadAverage15m"},
                                {"name":"loadAverage1m"},
                                {"name":"loadAverage5m"},
                                {"name":"name"},
                                {"name":"runProc"},
                                {"name":"totalProc"}
                        ]
                }
        }
]

In this file I tried the following basepaths, but neither makes any difference:

                "basepath" : "Cisco-NX-OS-device:System/procsys-items/sysload-items",

                "basepath" : "/Cisco-NX-OS-device:System/procsys-items/sysload-items",

Here is the conf file section for InfluxDB:

[metrics_influx]
stage=xport_output
type= metrics
file=/etc/pipeline/metrics.json
datachanneldepth=10000
output=influx
influx=http://influxdb:8086
database=telemetry
workers=10
dump=/etc/pipeline/metricsdump.txt
username=client
password=FiCElcS3e0D4HL+bSejk5eFymwrxB2IJVmK7AFgCJVkn9bdJ1RDfRL3diGCEqqjvAY7jn1ux1V9JtpI+PpJRza7KjTUz/8jjapymVIxpoC8alwpxpIIeau41vCiTRCWPC6cwKBvvFTYBYa2TUR3b3TOMyibOEJg9edbAcIRSraFiwzrAhtTq0O2LHMFEnNGiLuzJ/DNPo281xA0oVMQYuyy7wC9AFwCXmZvpk0pwJI9PT2UJ5TVdf0uom4tEQ/ay8YrPXmgCjvjWVp6+eG2eLJBTXHx+hL4+tcLVRz/3stogcQVyxJSrpjn5oLQEZgzJLvWHKjGbjFBChsCVkxPNVrFJH2vri7SUzzWas/4OGXNOZ+lqWXQel+ATA39LPWbjO81+6huVAsj4xFjqHWbEQ8m3NoRJVlR0Nsg9vKBHjaNhtkGV/AZmT6fWVFyeQy8IvEIpb5MOnCQ6rDzdZxgU0LkgkkAl99dMOBkuEdwMbI3vZzd7CCLDz8qALDccFIwA7kszyJFUzKaEf540mqffbWOJOK5tJ667ewarrjQpW+2nbt7HgVZj8kgU1B/cwwxv6qa2QKi/7yH7HN3nC1a8VJ1844Dsx8FG3Equ1n6U2/OeX3Z4ya/H0DazCXa1/fQHSHpNL+uyjroN9JLW5fAHRGySFVq4CdiAJpyF4B7Pw2I=

And finally, this is what I see in the logs:

time="2019-02-28 21:08:21.244952" level=info msg="Conductor says hello, loading config" config=/etc/pipeline/pipeline.conf debug=false fluentd= logfile=/etc/pipeline/pipeline.dump maxthreads=4 tag
=pipeline version=unspecified
time="2019-02-28 21:08:21.245668" level=info msg="Conductor starting up section" name=conductor section=mykafka stage=xport_output tag=pipeline
time="2019-02-28 21:08:21.245713" level=info msg="Conductor starting up section" name=conductor section=inspector stage=xport_output tag=pipeline
time="2019-02-28 21:08:21.245738" level=info msg="Conductor starting up section" name=conductor section=metrics_influx stage=xport_output tag=pipeline
time="2019-02-28 21:08:21.246676" level=info msg="Metamonitoring: serving pipeline metrics to prometheus" name=default resource=/metrics server=":8989" tag=pipeline
time="2019-02-28 21:08:21.250666" level=info msg="Starting up tap" countonly=false filename=/etc/pipeline/inpesctordump.txt name=inspector streamSpec="&{2 <nil>}" tag=pipeline
time="2019-02-28 21:08:21.265310" level=info msg="setup authentication" authenticator="http://influxdb:8086" name=metrics_influx pem=/etc/pipeline/pipeline_key tag=pipeline username=client
time="2019-02-28 21:08:21.265393" level=info msg="setup metrics collection" basepath="Cisco-NX-OS-device:System/procsys-items/sysload-items" name=metrics_influx tag=pipeline
time="2019-02-28 21:08:21.265911" level=info msg="Conductor starting up section" name=conductor section=grpcdialout stage=xport_input tag=pipeline
time="2019-02-28 21:08:21.267159" level=info msg="Setting up workers" database=telemetry influx="http://influxdb:8086" name=metrics_influx tag=pipeline workers=4 xport_type=influx
time="2019-02-28 21:08:21.267158" level=info msg="gRPC starting block" encap=gpb name=grpcdialout server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:21.267310" level=info msg="gRPC: Start accepting dialout sessions" encap=gpb name=grpcdialout server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:21.317736" level=info msg="kafka producer configured" brokers="[rldv0217.gcsc.att.com:9092]" name=mykafka requiredAcks=0 streamSpec="&{2 <nil>}" tag=pipeline topic=telemetry
time="2019-02-28 21:08:21.549587" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.10:22999" server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:22.554280" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.11:30387" server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:23.669858" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.10:22999" server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:24.675529" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.11:30387" server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:25.792665" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.10:22999" server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:26.796743" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.11:30387" server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:26.915330" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.11:30387" server=":57500" tag=pipeline type="pipeline is SERVER"
time="2019-02-28 21:08:27.913142" level=info msg="gRPC: Receiving dialout stream" encap=gpb name=grpcdialout peer="172.31.1.10:22999" server=":57500" tag=pipeline type="pipeline is SERVER"

Go module usage fails on jsonpb

When not using the vendor/ directory, e.g. with Go modules, we have test failures related to jsonpb.go. This was due to a careless merge in #6 and not properly testing with and without Go module usage.

Clean up Dockerfile

The current Dockerfile is convoluted. We need to move to multi-stage builds to keep the image size under control.
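A minimal multi-stage sketch (base images, paths, and build flags are illustrative, not the repository's actual Dockerfile):

FROM golang:1.12 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /pipeline .

FROM alpine:3.10
COPY --from=builder /pipeline /usr/local/bin/pipeline
ENTRYPOINT ["/usr/local/bin/pipeline"]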

Remove PDT support

Pipeline was built with some support for policy-driven telemetry (PDT); we should determine whether this code is still necessary and how to remove it, given its deprecation.

Tracking Testing Issues

  • go test ./... does not succeed, or at least produced no output after waiting 10 minutes.

I understand make test performs some prep work for integration tests, but an often-cited Go golden rule of testing is that go test ./... should run unit tests unencumbered right off the bat.

  • Tests create files in the root directory. We should move them to a test output directory.

  • make test itself fails; we need to dig deeper as to why this is the case.
