gribigo's People

Contributors

akrentsel, alshabib, bstoll, deep-gajjar5, dependabot[bot], greg-dennis, liulk, marcushines, mingyangcisco, mojiiba, nflath, nhawke, robshakir, sthesayi, wenovus, xw-g, yushiyushi

gribigo's Issues

[gRIBI-6] Flush RPC.

  • Ensure Flush is honoured from the current master.
    • Through Get
  • Ensure that Flush is not honoured from a non-master client.
    • Through Get
  • Ensure that Flush with override is honoured.
  • Ensure that Flush is applied to a specific network instance.
  • Ensure that Flush is applied to all network instances.

AFT telemetry should be covered in functional tests.

[gRIBI-2.1] Additional coverage of compliance tests (routing entries)

  • Add IPv4->NHG-> multiple NH
    • All in same ModifyRequest
    • In separate ModifyRequest messages
  • Delete IPv4, NHG, NH
  • Add IPv4 -> different NI NHG -> multiple NH
  • Implicit replace of IPv4, NHG, NH.
  • Error cases of removing referenced NHG, NH
  • NHs accepted using combinations of mac_address, ip_address, and interface_ref
  • Support for IPv4Entry metadata
  • Replace fails for non-existing entry.
  • Add/Delete/Add prefix within same ModifyRequest ensure that ACK for all three is sent.

[Note: we decided that AFT telemetry is out of scope for these tests, since we'd do this in a wider set of functional tests outside of this repository.]

Compliance test FlushNIUnspecified() is expecting an error which differs from the gribi.proto file comments

From gribi.proto:

  // network_instance specifies the Network Instance that should be flushed. If
  // the client specifies neither name nor all the server should respond with
  // an error specifying the INVALID_ARGUMENT canonical error code, and the
  // UNSPECIFIED_NETWORK_INSTANCE error code.

However, the FlushNIUnspecified() test is expecting codes.FailedPrecondition. I believe it should instead expect codes.InvalidArgument based on what gribi.proto says.

Improve unit test coverage.

  • Ensure that we are not counting generated code in coverage metrics.
  • Add simple tests for public APIs to the specified package where they are only implemented in integration tests currently.
  • Add additional stories for test gaps discovered from the above.

Multiple tests attempt to program isolated next hops which are not used by a next hop group

Our gribi server implementation is unable to ack an isolated next hop. Once a next hop is used by at least one next hop group, we are able to ack that next hop group and the next hop(s) it references. Next hop groups are the smallest entry that we can program and given that, we have chosen not to ack isolated next hops.

But, there are multiple tests that attempt to program isolated next hops:

  • TestCompliance/Implicit_replace_NH_entry_-_RIB_ACK
  • TestCompliance/Implicit_replace_NHG_entry_-_RIB_ACK
  • TestCompliance/Get_for_installed_NH_-_RIB_ACK
  • TestCompliance/Benchmark_Get_for_next-hops
  • TestCompliance/Get_for_installed_NH_-_FIB_ACK

If these tests could ensure that each NH is always referenced by at least one NHG before it expects an ack from the server, that would be ideal.

[gRIBI-4.1] Master election

  • Connect client B when client A connected:
    • PARAMS_DIFFER_FROM_OTHER_CLIENTS when a new client with mismatched parameters connects.
    • Successful connection at both clients when matching parameters.
      • ACK type
      • Redundancy mode
      • Persistence mode
  • New client is not rejected when they connect with lower election ID, but is not master.
  • Entries do not become inactive when new master with higher election ID connects.
  • ModifyRequest with higher ID is rejected if client has not updated election ID.
  • After election ID update:
    • ModifyRequest is honoured with new ID.
    • ModifyRequest with old ID is not honoured.
  • Client updates election ID to a lower value, ensure new value is ignored.
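Several of the cases above depend on comparing election IDs. The following is a minimal sketch of such a comparison, assuming the election ID is a 128-bit value carried as `high` and `low` 64-bit halves. The `electionID` and `tracker` types are hypothetical and are not taken from gribigo, and treating an ID equal to the current highest as accepted is an assumption of this sketch.

```go
package main

import "fmt"

// electionID is a hypothetical local type holding a 128-bit ID as two halves.
type electionID struct {
	high, low uint64
}

// less reports whether a is strictly lower than b, comparing high first.
func (a electionID) less(b electionID) bool {
	if a.high != b.high {
		return a.high < b.high
	}
	return a.low < b.low
}

// tracker is a hypothetical helper that remembers the highest ID seen so far.
type tracker struct {
	highest electionID
	seen    bool
}

// observe records an advertised ID and reports whether it is now the highest.
// A lower ID is ignored and leaves the stored value unchanged.
func (t *tracker) observe(id electionID) bool {
	if t.seen && id.less(t.highest) {
		return false
	}
	t.highest, t.seen = id, true
	return true
}

func main() {
	t := &tracker{}
	fmt.Println(t.observe(electionID{low: 4500})) // first ID seen
	fmt.Println(t.observe(electionID{low: 4400})) // lower ID, ignored
	fmt.Println(t.observe(electionID{high: 1}))   // higher ID, takes over
}
```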

Call Flush between test cases in compliance suite.

Once the Flush RPC is implemented, ensure that we call Flush between compliance test cases to remove any entries and start with an empty RIB. This ensures that there are not test failures on real implementations that do not reinitialise their RIB between transactions when running with PRESERVE persistence.
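As a rough illustration of the idea, the sketch below flushes a fake in-memory RIB before every test case so that each case starts empty. The `fakeRIB`, `testCase`, and `runAll` names are hypothetical; a real suite would issue the Flush RPC to the target rather than clearing a local map.

```go
package main

import "fmt"

// fakeRIB is a hypothetical stand-in for the RIB of the server under test.
type fakeRIB struct {
	entries map[string]string
}

// flush removes every entry, mimicking a Flush of all network instances.
func (r *fakeRIB) flush() {
	r.entries = map[string]string{}
}

// testCase is a hypothetical compliance test case operating on the RIB.
type testCase struct {
	name string
	run  func(r *fakeRIB)
}

// runAll flushes before each case and records how many entries each case
// saw when it started. With the flush in place this is always zero.
func runAll(r *fakeRIB, cases []testCase) map[string]int {
	startSizes := map[string]int{}
	for _, tc := range cases {
		r.flush()
		startSizes[tc.name] = len(r.entries)
		tc.run(r)
	}
	return startSizes
}

func main() {
	addEntry := func(r *fakeRIB) { r.entries["1.1.1.1/32"] = "nhg-1" }
	cases := []testCase{
		{name: "first", run: addEntry},
		{name: "second", run: addEntry},
	}
	sizes := runAll(&fakeRIB{}, cases)
	fmt.Println(sizes["first"], sizes["second"])
}
```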

TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK fails due to mismatched operation IDs

I am working on implementing my own gRIBI server and I am trying to use your client code and compliance tests to help verify my work. I am a Go neophyte, so it is possible that this problem is caused by something I am doing fundamentally wrong. If so, I would really appreciate it if you could point me in the right direction and I will be on my way.

That said, I am trying to run the compliance tests against my implementation using a command line which looks like this:

$ cd cmd/ccli
$ go test . -addr 172.18.0.3:57401 -v 10

Here is a subset of the output I get and the failure is the one I want to focus on here:

--- FAIL: TestCompliance (0.27s)
    --- PASS: TestCompliance/Modify_RPC_connection (0.03s)
    --- PASS: TestCompliance/Modify_RPC_Connection_with_Election_ID (0.02s)
    --- PASS: TestCompliance/Modify_RPC_Connection_with_invalid_persist/redundancy_parameters (0.04s)
    --- PASS: TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_RIB_ACK (0.03s)
    --- FAIL: TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK (0.04s)
        compliance.go:217: results did not contain a result of value <0 (0 nsec): AFTOperation { ID: 1, Status: FIB_PROGRAMMED }>, got: [<1628623964763337070 (0 nsec): SessionParameterResult: OK ()> <1628623964763337070 (0 nsec): ElectionID: 2> <1628623964763337070 (0 nsec): AFTOperation { ID: 4, Status: FIB_PROGRAMMED }> <1628623964763337070 (0 nsec): AFTOperation { ID: 5, Status: FIB_PROGRAMMED }> <1628623964763337070 (0 nsec): AFTOperation { ID: 6, Status: FIB_PROGRAMMED }>]

What I believe is happening here is that internally, there is an opCount value in the client instance which starts at 0 and is incremented to assign an operation ID to each message. The TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_RIB_ACK test uses operation IDs 1 through 3. That means that once TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK runs, the operations are sent with IDs 4 through 6. However, the addIPv4Internal() function is expecting operation IDs 1 through 3, which are hard-coded in the chk.HasResult() statements. The output above also shows that the compliance test is expecting IDs 1 through 3 but my server is sending back 4 through 6.

As a test, I added a method on the client that returned the value of opCount and then changed the chk.HasResult() statements to look for (opCount - 2), (opCount - 1) and opCount instead of 1, 2 and 3. With that change in place, I can get TestCompliance/Add_IPv4_entry_that_can_be_programmed_on_the_server_-_with_FIB_ACK to pass and the RIB variant also still passes. But I don't think that is a clean fix for the problem and I wouldn't necessarily recommend it (I am a Go neophyte after all).
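One way to picture the workaround is to record the counter before a test case sends its operations and derive the expected IDs from that baseline. The sketch below is only an illustration under that assumption: `fakeClient`, `nextOpID`, and `expectedIDs` are hypothetical names and do not reflect the actual gribigo client.

```go
package main

import "fmt"

// fakeClient is a hypothetical stand-in for the gRIBI client. It only models
// the monotonically increasing operation counter described in the report.
type fakeClient struct {
	opCount uint64
}

// nextOpID increments the counter and returns the ID for the next operation.
func (c *fakeClient) nextOpID() uint64 {
	c.opCount++
	return c.opCount
}

// expectedIDs returns the n operation IDs a test should expect when base IDs
// have already been consumed by earlier test cases.
func expectedIDs(base uint64, n int) []uint64 {
	ids := make([]uint64, 0, n)
	for i := 1; i <= n; i++ {
		ids = append(ids, base+uint64(i))
	}
	return ids
}

func main() {
	c := &fakeClient{}
	// The RIB ACK test case consumes IDs 1 through 3.
	for i := 0; i < 3; i++ {
		c.nextOpID()
	}
	// The FIB ACK test case records the baseline before sending operations.
	base := c.opCount
	for i := 0; i < 3; i++ {
		c.nextOpID()
	}
	fmt.Println(expectedIDs(base, 3)) // prints [4 5 6]
}
```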

I am making the assumption that the values sent by the client in AFTOperation.id are the same numbers to be returned in AFTResult.id but the proto file seems to say that this is the right behaviour:

message AFTResult {
  // The ID corresponds to the operation ID that was
  // specified in the AFTOperation.
  uint64 id = 1;

Thanks for your help with this!

Allow compliance suites to validate based on AFT telemetry.

Telemetry updates are sent for each AFT change. The compliance tests should be able to leverage this to determine that the update has happened after the relevant gRIBI transaction.

This depends on getting the fluent telemetry library usable in gribigo.

TestCompliance/Election_-_Ensure_that_election_ID_is_not_accepted_in_ALL_PRIMARY_mode does not send an election id

When running this test, this is the first modify request that we receive:

gribi::ModifyRequest params {
}

Note the lack of an election_id inside the modify request. And no further modify request is received by the server with an election_id. So, there is no reason for the server to send an error back to the client because it never receives an election_id from the client that requested all primary mode.

I am not too familiar with the code, but I think it may be due to the switch on the redundancy mode in the GRIBIClient Start function in fluent.go. In that switch, if we are in all primary mode, the election ID is ignored. The election ID is only considered in single primary mode. I wonder if the test infrastructure is preventing this invalid request from being made in the first place.

Server implementation should return both RIB_ACK and FIB_ACK in the session of RIB_AND_FIB_ACK

The current server.go implementation returns only FIB_ACK in a RIB_AND_FIB_ACK session.

Strictly speaking, a server does not need to return both RIB_ACK and FIB_ACK in a RIB_AND_FIB_ACK session, but doing so is allowed. Different vendor hardware and software architectures might choose either behaviour.

Maybe we should consider making our server implementation return both RIB_ACK and FIB_ACK, mainly as an implementation reference and for the benefit of test coverage (e.g. issue #130).

[gRIBI-4.2] Master Failover

  • With persistence == DELETE, ensure that entries are removed after a client disconnects.
  • With persistence == PRESERVE, ensure that entries are not removed after a client disconnects.
    • Using Get

AFT telemetry should be tested outside of the compliance tests in the functional tests.

Issue with client connection order

The test "TestParamsDifferFromOtherClients" in election.go sometimes fails because of the order in which the clients connect. Occasionally client B sends its modify request first, and the client A modify request then fails. It would be better to validate client A before sending the client B modify request, which would avoid random failures due to timing issues.

Move the code below from line 164 to line 157:

clientAErr := awaitTimeout(context.Background(), clientA, t, time.Minute)
if err := clientAErr; err != nil {
	t.Fatalf("did not expect error from server in client A, got: %v", err)
}

Compliance tests that randomize AFTOperation order or implicit replace

There are some compliance tests that randomize the AFTOperation order like this one:

chk.HasResult(t, res,

Or implicit replace:

chk.HasResult(t, res,

The only specification I could find regarding the order is this:

message ModifyRequest {
  // A group of requests to add/modify/remove a single AFT entry.
  //
  // A gRIBI server :
  //  * Should process AFTOperations per the received order.
  ...
  repeated AFTOperation operation = 1;
}

https://github.com/openconfig/gribi/blob/2aaba323cde619765adcb25aae1855b962a4f667/v1/proto/service/gribi.proto#L62

By the way, each AFTOperation also has an id field, and the wording in ModifyRequest suggests that the operation ordering should be as "received" rather than ordered by this id.

I vaguely recall that there are some assumptions that vendors have made:

  • Forward referencing is not allowed, i.e. NH needs to be defined before NHG, and NHG defined before IPv4.
  • Implicit replace is not allowed, i.e. if NH exists, add the same NH should fail.

Can we formalize these assumptions as rules in gribi.proto and remove the compliance tests that violate these rules?

[gRIBI-3.5] ACK types

Compliance tests to cover:

  • RIB_ACK received when requested by client.
  • RIB_AND_FIB_ACK received when requested by client.
    • When entry can be resolved wholly in gRIBI.
  • FIB_ACK not sent when system cannot resolve entry.
  • FIB_ACK received successfully when resolution is recursive inside gRIBI.
    • 1-layer recursion
    • 2-layer recursion

To be implemented in functional tests:

  • Successfully receive RIB_ACK when entry cannot be resolved within gRIBI.
  • FIB_ACK received successfully when resolution is recursive outside of gRIBI.
  • RIB_AND_FIB_ACK received when requested by client:
    • When entry relies on external RIB to resolve the entry.

gRIBI client throws `cannot remove pending operation 1, could not dequeue operation 1, unknown operation`

If the FIB_ACK feature toggle is turned on, the client throws the error "cannot remove pending operation 1, could not dequeue operation 1, unknown operation".

Reason:
The server sends both a RIB_ACK and a FIB_ACK response to the client. On the client side, each operation has a unique operation ID, and a pending operation is removed from the pending operation queue by matching the operation ID in the response. Normally the server returns RIB_ACK first and then FIB_ACK. Once the RIB_ACK response is received, the operation is removed from the pending operation queue, so when the FIB_ACK response arrives the client cannot find the operation and throws this error.

This is also the reason why the client does not return responses with FIB_ACK.

TestCompliance/Add_IPv4_Entry_that_references_a_NHG_in_a_different_network_instance submits operations out of order

When running this test, it sends this operation first:

operation {
  id: 1
  network_instance: "NON-DEFAULT-VRF"
  op: ADD
  ipv4 {
    prefix: "1.1.1.1/32"
    ipv4_entry {
      next_hop_group_network_instance {
        value: "DEFAULT"
      }
      next_hop_group {
        value: 1
      }
    }
  }
  election_id {
    low: 4500
  }
}

Note that it is referencing a next_hop_group in network instance DEFAULT which has not been created yet. This operation fails on our gribi server due to that.

After that, the following operation is sent:

operation {
  id: 2
  op: ADD
  next_hop_group {
    id: 1
    next_hop_group {
      next_hop {
        index: 1
        next_hop {
          weight {
            value: 1
          }
        }
      }
    }
  }
  election_id {
    low: 4500
  }
}

Note that there is no network instance specified here. I believe network instance DEFAULT should be specified.

Finally, the following operation is sent:

operation {
  id: 3
  op: ADD
  next_hop {
    index: 1
    next_hop {
      ip_address {
        value: "2.2.2.2"
      }
    }
  }
  election_id {
    low: 4500
  }
}

Again, no network instance is provided in the modify request operation.

I think that the next hop should be created first in the DEFAULT network instance, followed by the next hop group, also in the DEFAULT network instance and finally, the IPv4 entry should be created.
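The ordering suggested above can be sketched as a stable sort on the entry type, so that next hops precede next hop groups, which precede IPv4 entries. The `op` type and `orderForProgramming` function are hypothetical and only model the three operations from this report. The explicit DEFAULT network instance on the NHG and NH follows the suggestion in the text.

```go
package main

import (
	"fmt"
	"sort"
)

// entryKind ranks entry types in the order they can be programmed
// without forward references.
type entryKind int

const (
	kindNextHop entryKind = iota
	kindNextHopGroup
	kindIPv4
)

// op is a hypothetical, trimmed-down view of an AFT operation.
type op struct {
	id              uint64
	kind            entryKind
	networkInstance string
}

// orderForProgramming returns a copy of ops sorted so that every entry
// appears after the entry types it may reference. The sort is stable, so
// operations of the same type keep their relative order.
func orderForProgramming(ops []op) []op {
	out := append([]op(nil), ops...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].kind < out[j].kind })
	return out
}

func main() {
	// The order in which the test sent the operations.
	sent := []op{
		{id: 1, kind: kindIPv4, networkInstance: "NON-DEFAULT-VRF"},
		{id: 2, kind: kindNextHopGroup, networkInstance: "DEFAULT"},
		{id: 3, kind: kindNextHop, networkInstance: "DEFAULT"},
	}
	for _, o := range orderForProgramming(sent) {
		fmt.Println(o.id, o.networkInstance)
	}
}
```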

Consider whether RIB callbacks should be async/channel writes.

Today, we make a callback to the post-change function in the RIB when an operation has completed. This call is currently blocking.

As @alshabib rightly points out -- this could cause performance issues. We should consider one of two things:

  • calling the callback in a goroutine so that it doesn't block.
  • deprecating the callback approach, and rather just providing a channel that we write these events to that the consumer can listen on
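The second option could look roughly like the sketch below, which writes events to a buffered channel without blocking the writer. The `ribEvent` and `notifier` types are hypothetical and are not part of the RIB package. Dropping events when the buffer is full is one possible policy chosen for this sketch; a real design might instead block with a timeout or grow the buffer.

```go
package main

import "fmt"

// ribEvent is a hypothetical description of a completed RIB operation.
type ribEvent struct {
	networkInstance string
	description     string
}

// notifier is a hypothetical replacement for the post-change callback.
type notifier struct {
	events chan ribEvent
}

// newNotifier creates a notifier whose channel buffers up to buf events.
func newNotifier(buf int) *notifier {
	return &notifier{events: make(chan ribEvent, buf)}
}

// publish writes the event without blocking. It reports false when the
// buffer is full and the event was dropped.
func (n *notifier) publish(ev ribEvent) bool {
	select {
	case n.events <- ev:
		return true
	default:
		return false
	}
}

func main() {
	n := newNotifier(1)
	fmt.Println(n.publish(ribEvent{"DEFAULT", "added 1.1.1.1/32"})) // buffered
	fmt.Println(n.publish(ribEvent{"DEFAULT", "added 2.2.2.2/32"})) // dropped
	// The consumer reads events at its own pace.
	ev := <-n.events
	fmt.Println(ev.networkInstance, ev.description)
}
```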

Election test TestUnsupportedElectionParams() is expecting the incorrect reason code

In this test, an election id is sent in all primary mode. According to the gribi.proto file:

    // If the server is currently operating in ALL_PRIMARY mode the server
    // should return an error specifying ELECTION_ID_IN_ALL_PRIMARY reason. The
    // status (canonical error code) should be FAILED_PRECONDITION.

But the test is currently expecting fluent.ParamsDifferFromOtherClients. Shouldn't this be fluent.ElectionIDNotAllowed?
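To make the behaviour described in the proto comment concrete, here is a small sketch of the check a server might perform. It uses only local, hypothetical types: `redundancyMode`, `rpcError`, and `validateElectionID` are illustrative names, and the error is modelled with plain strings rather than real gRPC status codes.

```go
package main

import "fmt"

// redundancyMode is a hypothetical local model of the session redundancy.
type redundancyMode int

const (
	allPrimary redundancyMode = iota
	singlePrimary
)

// rpcError is a hypothetical error carrying a canonical code and a reason.
type rpcError struct {
	code   string
	reason string
}

func (e *rpcError) Error() string { return e.code + ": " + e.reason }

// validateElectionID rejects an election ID sent in ALL_PRIMARY mode with
// the code and reason named in the gribi.proto comment quoted above.
func validateElectionID(mode redundancyMode, hasElectionID bool) error {
	if mode == allPrimary && hasElectionID {
		return &rpcError{code: "FAILED_PRECONDITION", reason: "ELECTION_ID_IN_ALL_PRIMARY"}
	}
	return nil
}

func main() {
	fmt.Println(validateElectionID(allPrimary, true))
	fmt.Println(validateElectionID(singlePrimary, true))
}
```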

Support for fake traffic.

sysrib is able to determine where an input packet would be forwarded based on the packet's destination address. Add support for the compliance tests to be able to create a flow and then hand this to sysrib to determine where it would be forwarded to.

[gRIBI-5.1] Get RPC.

  • Validate that installed entries are returned via Get.
    • With ACK mode as RIB_ACK
    • With ACK mode as RIB_AND_FIB_ACK
    • In default network instance
    • In non-default network instance
  • Measure latency of Get RPC on specific table sizes (benchmark). [for NHGs, NHs and prefixes]
    • n=100
    • n=1000
  • Ensure that routes that are not possible to install in the FIB are not returned via Get when ACK mode is FIB ACK.
  • Ensure that metadata is returned via Get

Things to be tested in functional tests.

  • Ensure that routes that are not injected via gRIBI are not returned via Get.

Support gNMI Set RPC.

Allow the device configuration to be updated during the lifecycle of the device with the Set RPC.
