openconfig / gribigo

Go implementation of gRIBI.

License: Apache License 2.0

Go 99.82% Shell 0.17% Dockerfile 0.01%

gribigo's Issues

[gRIBI-6] Flush RPC.

Ensure Flush is honoured from the current master.
- Through Get
Ensure that Flush is not honoured from a non-master client.
- Through Get
Ensure that Flush with override is honoured.
Ensure that Flush is applied to a specific network instance.
Ensure that Flush is applied to all network instances.

AFT telemetry should be covered in functional tests.

Update Flush() logic per https://github.com/openconfig/gribi/pull/45

Update server and compliance to use updated response to unsupported redundancy.

openconfig/gribi#11 changes the code returned from FailedPrecondition to Unimplemented. Current compliance expects FailedPrecondition.

[gRIBI-2.1] Additional coverage of compliance tests (routing entries)

[Note: we decided that AFT telemetry is out of scope for these tests, since we'd do this in a wider set of functional tests outside of this repository.]

Compliance test FlushNIUnspecified() is expecting an error which differs from the gribi.proto file comments

From gribi.proto:

  // network_instance specifies the Network Instance that should be flushed. If
  // the client specifies neither name nor all the server should respond with
  // an error specifying the INVALID_ARGUMENT canonical error code, and the
  // UNSPECIFIED_NETWORK_INSTANCE error code.

However, the FlushNIUnspecified() test is expecting codes.FailedPrecondition. I believe it should instead expect codes.InvalidArgument based on what gribi.proto says.
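To make the behaviour the proto comment describes concrete, here is a minimal standalone Go sketch of the server-side check, plus the "specific network instance" and "all network instances" cases from the Flush RPC issue. It does not use gribigo or gRPC: `Code`, `FlushTarget`, `validateFlushTarget` and `flush` are hypothetical names, and the string constants only stand in for the real gRPC code and gRIBI error detail.

```go
package main

// Code is a hypothetical stand-in for a gRPC canonical status code.
type Code string

const (
	OK              Code = "OK"
	InvalidArgument Code = "INVALID_ARGUMENT"
)

// UnspecifiedNetworkInstance stands in for the gRIBI error detail reason.
const UnspecifiedNetworkInstance = "UNSPECIFIED_NETWORK_INSTANCE"

// FlushTarget models the FlushRequest network_instance choice: a specific
// Name, All, or (invalidly) neither.
type FlushTarget struct {
	Name string
	All  bool
}

// validateFlushTarget returns the code and reason the quoted proto comment
// asks for when the client specifies neither a name nor all.
func validateFlushTarget(t FlushTarget) (Code, string) {
	if t.Name == "" && !t.All {
		return InvalidArgument, UnspecifiedNetworkInstance
	}
	return OK, ""
}

// flush clears the named network instance, or every network instance when
// All is set. rib maps network instance name -> set of installed prefixes.
func flush(rib map[string]map[string]bool, t FlushTarget) (Code, string) {
	if code, reason := validateFlushTarget(t); code != OK {
		return code, reason
	}
	for ni := range rib {
		if t.All || ni == t.Name {
			rib[ni] = map[string]bool{}
		}
	}
	return OK, ""
}
```

Under this reading, a test such as FlushNIUnspecified() would assert on InvalidArgument rather than FailedPrecondition.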

Improve unit test coverage.

Ensure that we are not counting generated code in coverage metrics.
Add simple tests for public APIs to the specified package where they are only implemented in integration tests currently.
Add additional stories for test gaps discovered from the above.

Multiple tests attempt to program isolated next hops which are not used by a next hop group

Our gribi server implementation is unable to ack an isolated next hop. Once a next hop is used by at least one next hop group, we are able to ack that next hop group and the next hop(s) it references. Next hop groups are the smallest entry that we can program and given that, we have chosen not to ack isolated next hops.

But, there are multiple tests that attempt to program isolated next hops:

TestCompliance/Implicit_replace_NH_entry_-_RIB_ACK
TestCompliance/Implicit_replace_NHG_entry_-_RIB_ACK
TestCompliance/Get_for_installed_NH_-_RIB_ACK
TestCompliance/Benchmark_Get_for_next-hops
TestCompliance/Get_for_installed_NH_-_FIB_ACK

If these tests could ensure that each NH is always referenced by at least one NHG before it expects an ack from the server, that would be ideal.
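A test could guard against this by checking, before it expects an ACK, that every next hop it programs is referenced by some next-hop group. The sketch below is standalone Go and not part of the compliance package; `NHG` and `isolatedNextHops` are hypothetical, simplified names.

```go
package main

import "sort"

// NHG is a hypothetical, simplified next-hop group: an ID plus the indices
// of the next hops it references.
type NHG struct {
	ID       uint64
	NextHops []uint64
}

// isolatedNextHops returns, in ascending order, the indices in nhs that no
// next-hop group in nhgs references.
func isolatedNextHops(nhs []uint64, nhgs []NHG) []uint64 {
	used := map[uint64]bool{}
	for _, g := range nhgs {
		for _, nh := range g.NextHops {
			used[nh] = true
		}
	}
	var out []uint64
	for _, nh := range nhs {
		if !used[nh] {
			out = append(out, nh)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}
```

A non-empty result would mean the test is about to rely on an ACK for a next hop that servers like the one described may never acknowledge.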

Change current AFTOperation error handling

We should change current AFTOperation error handling per https://github.com/openconfig/gribi/pull/38/files

[gRIBI-4.1] Master election

Need fluent API for IPv6Entry()

featureprofile has a scenario where gRIBI should inject an IPv6Entry entry on the DUT. Enhancing the fluent API to support this would be helpful.

Support for Flush RPC.

Add support for gRIBI Flush.

Call Flush between test cases in compliance suite.

Once the Flush RPC is implemented, ensure that we call Flush between compliance test cases to remove any entries and start with an empty RIB. This avoids test failures on real implementations that do not reinitialise their RIB between transactions when running with PRESERVE persistence.

TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK fails due to mismatched operation IDs

I am working on implementing my own gRIBI server, and I am trying to use your client code and compliance tests to help verify my work. I am a Go neophyte, and it is possible that this problem is because I am doing something fundamentally wrong; if so, I would really appreciate it if you could point me in the right direction and I will be on my way.

That said, I am trying to run the compliance tests against my implementation using a command line which looks like this:

$ cd cmd/ccli
$ go test . -addr 172.18.0.3:57401 -v 10

Here is a subset of the output I get and the failure is the one I want to focus on here:

--- FAIL: TestCompliance (0.27s)
    --- PASS: TestCompliance/Modify_RPC_connection (0.03s)
    --- PASS: TestCompliance/Modify_RPC_Connection_with_Election_ID (0.02s)
    --- PASS: TestCompliance/Modify_RPC_Connection_with_invalid_persist/redundancy_parameters (0.04s)
    --- PASS: TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_RIB_ACK (0.03s)
    --- FAIL: TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK (0.04s)
        compliance.go:217: results did not contain a result of value <0 (0 nsec): AFTOperation { ID: 1, Status: FIB_PROGRAMMED }>, got: [<1628623964763337070 (0 nsec): SessionParameterResult: OK ()> <1628623964763337070 (0 nsec): ElectionID: 2> <1628623964763337070 (0 nsec): AFTOperation { ID: 4, Status: FIB_PROGRAMMED }> <1628623964763337070 (0 nsec): AFTOperation { ID: 5, Status: FIB_PROGRAMMED }> <1628623964763337070 (0 nsec): AFTOperation { ID: 6, Status: FIB_PROGRAMMED }>]

What I believe is happening is that the client instance holds an internal opCount value, which starts at 0 and is incremented to assign an operation ID to each message. TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_RIB_ACK uses operation IDs 1 through 3, so when TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK runs, its operations are sent with IDs 4 through 6. However, the addIPv4Internal() function expects operation IDs 1 through 3, which are hard-coded in its chk.HasResult() statements. The output above shows this: the compliance test expects IDs 1 through 3, but my server sends back 4 through 6.

As a test, I added a method on the client that returns the value of opCount, and then changed the chk.HasResult() statements to look for (opCount - 2), (opCount - 1) and opCount instead of 1, 2 and 3. With that change in place, TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK passes and the RIB variant still passes. But I don't think that is a clean fix for the problem, and I wouldn't necessarily recommend it (I am a Go neophyte after all).

I am assuming that the values sent by the client in AFTOperation.id are the same numbers that should be returned in AFTResult.id, and the proto file seems to say that this is the right behaviour:

message AFTResult {
  // The ID corresponds to the operation ID that was
  // specified in the AFTOperation.
  uint64 id = 1;
  ...
}

Thanks for your help with this!
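The workaround described above amounts to computing the expected IDs relative to how many operations the client has already sent, instead of hard-coding 1 through 3. A minimal standalone Go sketch of that idea, assuming (as the report describes) that IDs start at 1 and increment by one per operation; `expectedOpIDs` and `missingIDs` are hypothetical helpers, not gribigo functions.

```go
package main

// expectedOpIDs returns the n operation IDs a client would assign after it
// has already sent `sent` operations on the same connection.
func expectedOpIDs(sent uint64, n int) []uint64 {
	ids := make([]uint64, n)
	for i := range ids {
		ids[i] = sent + uint64(i) + 1
	}
	return ids
}

// missingIDs returns the expected IDs that do not appear among the IDs
// received in AFTResult messages.
func missingIDs(expected, received []uint64) []uint64 {
	seen := map[uint64]bool{}
	for _, id := range received {
		seen[id] = true
	}
	var missing []uint64
	for _, id := range expected {
		if !seen[id] {
			missing = append(missing, id)
		}
	}
	return missing
}
```

With three operations already sent by the RIB_ACK case, the FIB_ACK case would expect IDs 4, 5 and 6, which matches what the server returned in the failing output.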

Allow compliance suites to validate based on AFT telemetry.

Telemetry updates are sent for each AFT change; the compliance tests should be able to leverage this to determine that the update has happened after the relevant gRIBI transaction.

This depends on getting the fluent telemetry library usable in gribigo.

TestCompliance/Election_-_Ensure_that_election_ID_is_not_accepted_in_ALL_PRIMARY_mode does not send an election id

When running this test, this is the first modify request that we receive:

gribi::ModifyRequest params {
}

Note the lack of an election_id inside the modify request. And no further modify request is received by the server with an election_id. So, there is no reason for the server to send an error back to the client because it never receives an election_id from the client that requested all primary mode.

I am not too familiar with the code, but I think this may be due to the switch on the redundancy mode in the GRIBIClient Start function in fluent.go. In that switch, if we are in ALL_PRIMARY mode, the election ID is ignored; only in SINGLE_PRIMARY mode is the election ID considered at all. I wonder if the test infrastructure is preventing this invalid request from being made in the first place.

MODIFY should not fail if applied to a nonexistent entry

In testing, we encountered a case where MODIFY was sent to an Ipv4Entry that didn't exist. For safety purposes, we would like this to succeed (and create the entry) - this would match ADD semantics.
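The difference between the current behaviour and the requested one can be sketched as two standalone Go functions over a toy table. `ipv4Table`, `applyStrict` and `applyUpsert` are hypothetical names; the sketch assumes, as the issue states, that ADD creates the entry (or replaces it if present).

```go
package main

import "fmt"

// Op is a simplified AFT operation type.
type Op int

const (
	Add Op = iota
	Modify
)

// ipv4Table is a hypothetical, simplified IPv4 table: prefix -> NHG ID.
type ipv4Table map[string]uint64

// applyStrict rejects MODIFY of a missing prefix, which is the behaviour
// the issue reports running into.
func (t ipv4Table) applyStrict(op Op, prefix string, nhg uint64) error {
	if _, ok := t[prefix]; !ok && op == Modify {
		return fmt.Errorf("MODIFY of nonexistent entry %s", prefix)
	}
	t[prefix] = nhg
	return nil
}

// applyUpsert gives MODIFY the same create-or-replace semantics as ADD,
// which is what the issue asks for.
func (t ipv4Table) applyUpsert(op Op, prefix string, nhg uint64) error {
	t[prefix] = nhg // op is deliberately ignored: ADD and MODIFY behave alike
	return nil
}
```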

Server implementation should return both RIB_ACK and FIB_ACK in the session of RIB_AND_FIB_ACK

Current server.go implementation returns only FIB_ACK in a RIB_AND_FIB_ACK session.

Strictly, a server does not need to return both RIB_ACK and FIB_ACK in a RIB_AND_FIB_ACK session, but the specification does allow it. Different vendor hardware and software architectures might choose either way here.

Maybe we should consider making our server implementation return both RIB_ACK and FIB_ACK, mainly as an implementation reference and for the benefit of test coverage (e.g. issue #130).
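A minimal standalone Go sketch of what "return both" could look like when building results for a successfully programmed operation. `AckType`, `Result` and `resultsFor` are hypothetical, simplified names, and the `sendBoth` switch is an assumption that models the two choices a server can make.

```go
package main

// AckType is a simplified stand-in for the ACK type requested by the session.
type AckType int

const (
	RIBAck AckType = iota
	RIBAndFIBAck
)

// Status mirrors the two programming statuses an AFTResult can report.
type Status string

const (
	RIBProgrammed Status = "RIB_PROGRAMMED"
	FIBProgrammed Status = "FIB_PROGRAMMED"
)

// Result is a simplified AFTResult.
type Result struct {
	ID     uint64
	Status Status
}

// resultsFor returns the results a server could send for operation id.
// In a RIB_AND_FIB_ACK session, sendBoth selects RIB_PROGRAMMED followed by
// FIB_PROGRAMMED; otherwise only FIB_PROGRAMMED is sent.
func resultsFor(id uint64, ack AckType, sendBoth bool) []Result {
	switch {
	case ack == RIBAck:
		return []Result{{ID: id, Status: RIBProgrammed}}
	case sendBoth:
		return []Result{{ID: id, Status: RIBProgrammed}, {ID: id, Status: FIBProgrammed}}
	default:
		return []Result{{ID: id, Status: FIBProgrammed}}
	}
}
```

Exercising the `sendBoth` path is what would let a client that dequeues on the first result (as in issue #130) be caught by tests.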

Ensure compliance with licensing/copyright.

Ensure that all files have licensing and the correct copyright etc.

[gRIBI-4.2] Master Failover

With persistence == DELETE, ensure that entries are removed after a client disconnects.
With persistence == PRESERVE, ensure that entries are not removed after a client disconnects.
- Using Get

AFT telemetry should be tested outside of the compliance tests in the functional tests.
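The two persistence cases can be pictured with a toy store that remembers which client installed each entry. This is a standalone Go sketch under a simplifying assumption (entries are tracked per client ID); `entryStore`, `add` and `disconnect` are hypothetical names, not gribigo APIs.

```go
package main

// Persistence is a simplified stand-in for the session persistence mode.
type Persistence int

const (
	Delete Persistence = iota
	Preserve
)

// entryStore maps each installed prefix to the client that installed it.
type entryStore struct {
	owner map[string]string // prefix -> client ID
}

func newEntryStore() *entryStore { return &entryStore{owner: map[string]string{}} }

// add records that client installed prefix.
func (s *entryStore) add(client, prefix string) { s.owner[prefix] = client }

// disconnect applies the persistence mode when client goes away: with
// Delete its entries are removed, with Preserve they are kept.
func (s *entryStore) disconnect(client string, p Persistence) {
	if p == Preserve {
		return
	}
	for prefix, c := range s.owner {
		if c == client {
			delete(s.owner, prefix) // deleting during range is safe in Go
		}
	}
}
```

A compliance test would then check the result through Get, as the issue lists, rather than by inspecting server state directly.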

Issue with client connection order

I sometimes see the test "TestParamsDifferFromOtherClients" in election.go fail due to client connection order: client B sometimes sends its modify request first, and client A's modify request then fails. It would be better to validate client A before sending client B's modify request, which would avoid random failures due to timing issues.

Move the code below from line 164 to line 157:
```go
clientAErr := awaitTimeout(context.Background(), clientA, t, time.Minute)
if err := clientAErr; err != nil {
	t.Fatalf("did not expect error from server in client A, got: %v", err)
}
```

Compliance tests that randomize AFTOperation order or implicit replace

There are some compliance tests that randomize the AFTOperation order like this one:

gribigo/compliance/compliance.go, line 537 in 1a335e8:

chk.HasResult(t, res,

Or implicit replace:

gribigo/compliance/compliance.go, line 763 in 1a335e8:

chk.HasResult(t, res,

The only specification I could find regarding the order is this:

message ModifyRequest {
  // A group of requests to add/modify/remove a single AFT entry.
  //
  // A gRIBI server :
  //  * Should process AFTOperations per the received order.
  ...
  repeated AFTOperation operation = 1;
}

https://github.com/openconfig/gribi/blob/2aaba323cde619765adcb25aae1855b962a4f667/v1/proto/service/gribi.proto#L62

By the way, each AFTOperation also has an id field, and the wording in ModifyRequest suggests that the operation ordering should be as "received" rather than ordered by this id.

I vaguely recall that there are some assumptions that vendors have made:

Forward referencing is not allowed, i.e. NH needs to be defined before NHG, and NHG defined before IPv4.
Implicit replace is not allowed, i.e. if NH exists, add the same NH should fail.

Can we formalize these assumptions as rules in gribi.proto and remove the compliance tests that violate these rules?
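If the two assumptions became rules, a checker for them is short. The standalone Go sketch below processes operations in received order and treats every operation as an ADD; `AFTOp` and `checkRules` are hypothetical names, and the string keys such as "nh:1" are an illustrative encoding, not a gRIBI format.

```go
package main

import "fmt"

// AFTOp is a simplified AFTOperation (treated as an ADD): the key it defines
// and the keys it references, e.g. "nh:1", "nhg:1", "ipv4:1.1.1.1/32".
type AFTOp struct {
	ID   uint64
	Key  string
	Refs []string
}

// checkRules enforces, in received order, that no operation references an
// entry that is not yet defined and, unless allowReplace is set, that no
// operation re-adds an entry that already exists (implicit replace).
func checkRules(ops []AFTOp, allowReplace bool) error {
	defined := map[string]bool{}
	for _, op := range ops {
		for _, r := range op.Refs {
			if !defined[r] {
				return fmt.Errorf("operation %d references %q before it is defined", op.ID, r)
			}
		}
		if defined[op.Key] && !allowReplace {
			return fmt.Errorf("operation %d re-adds %q (implicit replace)", op.ID, op.Key)
		}
		defined[op.Key] = true
	}
	return nil
}
```

Running such a check over the operations a compliance test sends would flag the tests that depend on forward references or implicit replace.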

Compliance test AddIPv4ToMultipleNHsMultipleRequests() does not flush the server when done

Most tests start with a defer flushServer() but this function does not. That means that entries created by this test may still exist on the server for the next test. I suspect this is an oversight and should be fixed.
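The value of the `defer` is that cleanup runs even when a test returns early. A standalone Go sketch of the pattern, where `fakeServer`, its `flush` method and `runCase` are hypothetical stand-ins (the real flushServer helper's signature is not shown here):

```go
package main

import "errors"

// fakeServer is a hypothetical in-memory stand-in for a server's RIB.
type fakeServer struct {
	entries map[string]bool
}

// flush empties the RIB, playing the role of a flushServer-style helper.
func (s *fakeServer) flush() { s.entries = map[string]bool{} }

// runCase installs prefixes and defers a flush, so that the next case
// starts from an empty RIB even when this case returns early with an error.
func runCase(s *fakeServer, prefixes []string, fail bool) error {
	defer s.flush()
	for _, p := range prefixes {
		s.entries[p] = true
	}
	if fail {
		return errors.New("case failed")
	}
	return nil
}
```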

[gRIBI-3.5] ACK types

Compliance tests to cover:

RIB_ACK received when requested by client.
RIB_AND_FIB_ACK received when requested by client.
- When entry can be resolved wholly in gRIBI.
FIB_ACK not sent when system cannot resolve entry.
FIB_ACK received successfully when resolution is recursive inside gRIBI.
- 1-layer recursion
- 2-layer recursion

To be implemented in functional tests:

Successfully receive RIB_ACK when entry cannot be resolved within gRIBI.
FIB_ACK received successfully when resolution is recursive outside of gRIBI.
RIB_AND_FIB_ACK received when requested by client:
- When entry relies on external RIB to resolve the entry.

gRIBI client throws `cannot remove pending operation 1, could not dequeue operation 1, unknown operation`

If the FIB_ACK feature toggle is turned on, the client throws a `cannot remove pending operation 1, could not dequeue operation 1, unknown operation` error.

Reason:
The server sends both RIB_ACK and FIB_ACK responses to the client. On the client side, each operation has a unique operation ID, and a pending operation is removed from the pending operation queue by matching the response's operation ID. Normally, the server returns RIB_ACK and then FIB_ACK. Once the RIB_ACK response is received, the operation is removed from the pending operation queue, so when the FIB_ACK response arrives, the client cannot find the operation and throws this error.

This is also the reason why the client does not return responses with FIB_ACK.
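One way to avoid the error is to dequeue an operation only when its final expected result arrives. This is a standalone Go sketch, not a patch to the gribigo client; `pendingTracker` and its methods are hypothetical names. It also still works with a server that sends only FIB_PROGRAMMED in a RIB_AND_FIB_ACK session.

```go
package main

import "fmt"

// Status is a simplified AFTResult programming status.
type Status int

const (
	RIBProgrammed Status = iota
	FIBProgrammed
)

// pendingTracker keeps each operation pending until its final expected
// acknowledgement arrives.
type pendingTracker struct {
	wantFIB bool            // true when the session requested RIB_AND_FIB_ACK
	pending map[uint64]bool // operation ID -> still outstanding
}

func newPendingTracker(wantFIB bool) *pendingTracker {
	return &pendingTracker{wantFIB: wantFIB, pending: map[uint64]bool{}}
}

// send records an operation as outstanding.
func (p *pendingTracker) send(id uint64) { p.pending[id] = true }

// ack processes one result. In a RIB_AND_FIB_ACK session RIB_PROGRAMMED is
// treated as intermediate, and the operation is only dequeued on
// FIB_PROGRAMMED, so a FIB result following a RIB result is not "unknown".
func (p *pendingTracker) ack(id uint64, s Status) error {
	if !p.pending[id] {
		return fmt.Errorf("unknown operation %d", id)
	}
	if s == RIBProgrammed && p.wantFIB {
		return nil
	}
	delete(p.pending, id)
	return nil
}
```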

[gRIBI-4.1] Add test for PARAMS_DIFFER_FROM_OTHER_CLIENTS

Connect clientA with election ID of 10. Connect clientB specifying ALL_PRIMARY client redundancy, ensuring that the connection is rejected with PARAMS_DIFFER_FROM_OTHER_CLIENTS
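The check under test can be sketched as "the first client fixes the session parameters; a later client with different ones is rejected". This is a standalone Go sketch; `SessionParams`, `admission` and `admit` are hypothetical names, and real servers may compare more fields than the two modelled here. It also shows why client connection order matters for this test: whichever client is admitted first sets the parameters.

```go
package main

// Redundancy is a simplified client redundancy mode.
type Redundancy int

const (
	AllPrimary Redundancy = iota
	SinglePrimary
)

// Persistence is a simplified persistence mode.
type Persistence int

const (
	Delete Persistence = iota
	Preserve
)

// SessionParams is a simplified view of the ModifyRequest session parameters.
type SessionParams struct {
	Redundancy  Redundancy
	Persistence Persistence
}

// admission remembers the parameters of the first admitted client.
type admission struct {
	params *SessionParams
}

// admit accepts the first client and any later client with identical
// parameters; others are rejected with PARAMS_DIFFER_FROM_OTHER_CLIENTS.
func (a *admission) admit(p SessionParams) (bool, string) {
	if a.params == nil {
		a.params = &p
		return true, ""
	}
	if *a.params != p {
		return false, "PARAMS_DIFFER_FROM_OTHER_CLIENTS"
	}
	return true, ""
}
```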

TestCompliance/Add_IPv4_Entry_that_references_a_NHG_in_a_different_network_instance submits operations out of order

When running this test, it sends this operation first:

operation {
  id: 1
  network_instance: "NON-DEFAULT-VRF"
  op: ADD
  ipv4 {
    prefix: "1.1.1.1/32"
    ipv4_entry {
      next_hop_group_network_instance {
        value: "DEFAULT"
      }
      next_hop_group {
        value: 1
      }
    }
  }
  election_id {
    low: 4500
  }
}

Note that it is referencing a next_hop_group in network instance DEFAULT which has not been created yet. This operation fails on our gribi server due to that.

After that, the following operation is sent:

operation {
  id: 2
  op: ADD
  next_hop_group {
    id: 1
    next_hop_group {
      next_hop {
        index: 1
        next_hop {
          weight {
            value: 1
          }
        }
      }
    }
  }
  election_id {
    low: 4500
  }
}

Note that there is no network instance specified here. I believe network instance DEFAULT should be specified.

Finally, the following operation is sent:

operation {
  id: 3
  op: ADD
  next_hop {
    index: 1
    next_hop {
      ip_address {
        value: "2.2.2.2"
      }
    }
  }
  election_id {
    low: 4500
  }
}

Again, no network instance is provided in the modify request operation.

I think that the next hop should be created first in the DEFAULT network instance, followed by the next hop group, also in the DEFAULT network instance and finally, the IPv4 entry should be created.
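The suggested order can be produced mechanically by sorting on entry type and filling in the network instance where it was left empty. This is a standalone Go sketch under those assumptions; `Operation`, `Kind` and `orderForProgramming` are hypothetical names, and operation IDs are kept unchanged.

```go
package main

import "sort"

// Kind orders entry types by dependency: next hops, then next-hop groups,
// then IPv4 entries.
type Kind int

const (
	NextHop Kind = iota
	NextHopGroup
	IPv4Entry
)

// Operation is a simplified AFTOperation.
type Operation struct {
	ID              uint64
	Kind            Kind
	NetworkInstance string
}

// orderForProgramming returns a sorted copy of ops so that referenced
// entries are sent before the entries that use them, and sets defaultNI on
// operations that leave the network instance empty.
func orderForProgramming(ops []Operation, defaultNI string) []Operation {
	out := make([]Operation, len(ops))
	copy(out, ops)
	for i := range out {
		if out[i].NetworkInstance == "" {
			out[i].NetworkInstance = defaultNI
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Kind < out[j].Kind })
	return out
}
```

Applied to the three operations above, this yields the next hop (ID 3) and next-hop group (ID 2) in DEFAULT before the IPv4 entry (ID 1) in NON-DEFAULT-VRF.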

Consider whether RIB callbacks should be async/channel writes.

Today, we make a callback to the post-change function in the RIB when an operation has completed; this call is currently blocking.

As @alshabib rightly points out -- this could cause performance issues. We should consider one of two things:

calling the callback in a goroutine so that it doesn't block.
deprecating the callback approach, and rather just providing a channel that we write these events to that the consumer can listen on
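The channel option could look roughly like the standalone Go sketch below. `RIBEvent`, `ribNotifier` and its methods are hypothetical names. Note that a buffered channel only defers blocking: once the buffer is full, `publish` blocks until the consumer catches up, so buffer sizing (or a drop policy) would still need a decision.

```go
package main

import "sync"

// RIBEvent is a hypothetical notification emitted after an operation completes.
type RIBEvent struct {
	OpID uint64
	OK   bool
}

// ribNotifier delivers events on a buffered channel instead of invoking a
// blocking callback on the RIB's write path.
type ribNotifier struct {
	events chan RIBEvent
}

func newRIBNotifier(buf int) *ribNotifier {
	return &ribNotifier{events: make(chan RIBEvent, buf)}
}

// publish enqueues an event; it blocks only if the buffer is full.
func (n *ribNotifier) publish(e RIBEvent) { n.events <- e }

// shutdown closes the channel so that consumers can finish draining it.
func (n *ribNotifier) shutdown() { close(n.events) }

// consume drains events in its own goroutine, calling handle for each one.
// The returned WaitGroup completes once the channel is closed and drained.
func (n *ribNotifier) consume(handle func(RIBEvent)) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for e := range n.events {
			handle(e)
		}
	}()
	return &wg
}
```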

Install RIB entries into SysRIB.

Entries that have been accepted by the gRIBI RIB should be installed into a sysrib.RIB instance by the device package.

Add validation of received NextHopEntry (and associated entries) to ensure validity.

The current implementation will accept, for example, an empty next hop, even though some fields must be populated (e.g., an entry with only an index is not valid).
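A minimal standalone Go sketch of such validation. `NextHopEntry` here is a hypothetical, simplified struct, and treating "an IP address or an interface" as sufficient is an assumption for illustration, not a rule taken from the gRIBI specification.

```go
package main

import (
	"errors"
	"net/netip"
)

// NextHopEntry is a simplified view of a next hop: its index plus two of
// the fields that can make it usable.
type NextHopEntry struct {
	Index     uint64
	IPAddress string // optional
	Interface string // optional
}

// validateNextHop rejects entries that carry only an index, and entries
// whose IP address does not parse.
func validateNextHop(nh NextHopEntry) error {
	if nh.IPAddress == "" && nh.Interface == "" {
		return errors.New("next hop must specify at least an IP address or an interface")
	}
	if nh.IPAddress != "" {
		if _, err := netip.ParseAddr(nh.IPAddress); err != nil {
			return err
		}
	}
	return nil
}
```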

README for running gRIBIgo compliance test.

Election test TestUnsupportedElectionParams() is expecting the incorrect reason code

In this test, an election id is sent in all primary mode. According to the gribi.proto file:

    // If the server is currently operating in ALL_PRIMARY mode the server
    // should return an error specifying ELECTION_ID_IN_ALL_PRIMARY reason. The
    // status (canonical error code) should be FAILED_PRECONDITION.

But the test is currently expecting fluent.ParamsDifferFromOtherClients. Shouldn't this be fluent.ElectionIDNotAllowed?

Support for fake traffic.

sysrib is able to determine where an input packet would be forwarded based on the packet's destination address. Add support for the compliance tests to be able to create a flow and then hand this to sysrib to determine where it would be forwarded to.
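The core of "where would this packet go" is a longest-prefix match on the destination address. This standalone Go sketch uses the standard library's `net/netip`; `route`, `newRoute` and `lookup` are hypothetical names and do not reflect sysrib's actual API.

```go
package main

import "net/netip"

// route is a simplified forwarding entry: a prefix and where it forwards.
type route struct {
	prefix  netip.Prefix
	nextHop string
}

// newRoute builds a route, panicking on an invalid prefix.
func newRoute(prefix, nextHop string) route {
	return route{prefix: netip.MustParsePrefix(prefix), nextHop: nextHop}
}

// lookup returns the next hop of the most specific route containing dst,
// and false if dst does not parse or no route matches.
func lookup(routes []route, dst string) (string, bool) {
	addr, err := netip.ParseAddr(dst)
	if err != nil {
		return "", false
	}
	best := -1
	nh := ""
	for _, r := range routes {
		if r.prefix.Contains(addr) && r.prefix.Bits() > best {
			best = r.prefix.Bits()
			nh = r.nextHop
		}
	}
	return nh, best >= 0
}
```

For a flow to 1.1.1.9 with routes 0.0.0.0/0, 1.1.1.0/24 and 1.1.1.1/32 installed, the /24 wins.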

[gRIBI-5.1] Get RPC.

Things to be tested in functional tests.

Ensure that routes that are not injected via gRIBI are not returned via Get.

Support gNMI Set RPC.

Allow the device configuration to be updated during the lifecycle of the device with the Set RPC.

Implement Delete NHG and NH.

Add support for deleting NHG and NHs from the RIB via gRIBI.
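One question for delete support is what happens to entries that are still referenced. The standalone Go sketch below assumes that deleting a next hop still used by a next-hop group should be refused; that policy, and the names `refRIB`, `deleteNH` and `deleteNHG`, are hypothetical. References from IPv4 entries to next-hop groups are omitted for brevity.

```go
package main

import "fmt"

// refRIB is a simplified RIB that records which next hops each next-hop
// group references, so that deletes can refuse to orphan references.
type refRIB struct {
	nhs  map[uint64]bool
	nhgs map[uint64][]uint64 // NHG ID -> next-hop indices
}

func newRefRIB() *refRIB {
	return &refRIB{nhs: map[uint64]bool{}, nhgs: map[uint64][]uint64{}}
}

// deleteNH removes a next hop only if no next-hop group still references it.
func (r *refRIB) deleteNH(idx uint64) error {
	if !r.nhs[idx] {
		return fmt.Errorf("next hop %d does not exist", idx)
	}
	for id, members := range r.nhgs {
		for _, m := range members {
			if m == idx {
				return fmt.Errorf("next hop %d is referenced by NHG %d", idx, id)
			}
		}
	}
	delete(r.nhs, idx)
	return nil
}

// deleteNHG removes a next-hop group; its next hops are left in place.
func (r *refRIB) deleteNHG(id uint64) error {
	if _, ok := r.nhgs[id]; !ok {
		return fmt.Errorf("NHG %d does not exist", id)
	}
	delete(r.nhgs, id)
	return nil
}
```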
