
bgp / stayrtr


RPKI-To-Router server implementation in Go

License: BSD 3-Clause "New" or "Revised" License

Makefile 1.90% Go 97.17% Dockerfile 0.85% Shell 0.08%
bgp go rpki rtr

stayrtr's People

Contributors

afenioux, benjojo, cjeker, erikrozendaal, floatingstatic, haraldnordgren, hellt, int3l, jbampton, jejenone, job, lspgn, marenamat, mellowdrifter, natesales, netixx, netravnen, randomthingsandstuff, shimmerglass, ties, vincentbernat, waehlisch, zhaofengli

stayrtr's Issues

Segfaults when the source HTTP endpoint does not accept connections

./_build/bin/stayrtr -cache http://localhost:12345/
ERRO[0000] Error updating: Get "http://localhost:12345/": dial tcp [::1]:12345: connect: connection refused 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7b0ace]

goroutine 1 [running]:
main.(*state).updateFromNewState(0xc0001b4000)
	github.com/bgp/stayrtr/cmd/stayrtr/stayrtr.go:256 +0x4e
main.main()
	github.com/bgp/stayrtr/cmd/stayrtr/stayrtr.go:587 +0xaf5
[Exit 2]

And also when the input is somehow unexpected:

./_build/bin/stayrtr -cache http://localhost/
INFO[0000] new cache file: Updating sha256 hash  -> 65eaa1d99e3f824a4282a0ef9c752aaeb1ead15d7ec8ef39a11dc5a36f31f90f 
ERRO[0000] Error updating: invalid character '<' looking for beginning of value 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7b0ace]

goroutine 1 [running]:
main.(*state).updateFromNewState(0xc000206000)
	github.com/bgp/stayrtr/cmd/stayrtr/stayrtr.go:256 +0x4e
main.main()
	github.com/bgp/stayrtr/cmd/stayrtr/stayrtr.go:587 +0xaf5
[Exit 2]

A long-lived daemon should just pause and retry.
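
A minimal sketch of the "pause and retry" behaviour, assuming a hypothetical `vrpSet` type and a `fetch` callback (neither is taken from the stayrtr code base). The point is only that a failed fetch yields an error rather than a nil state that is dereferenced later. A real daemon would probably loop indefinitely instead of giving up after a fixed number of attempts.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// vrpSet is a hypothetical stand-in for the parsed cache contents.
type vrpSet struct {
	Prefixes []string
}

// fetchWithRetry calls fetch until it returns data or the attempts run out,
// pausing for delay between tries. A nil set is only ever returned with an error.
func fetchWithRetry(fetch func() (*vrpSet, error), attempts int, delay time.Duration) (*vrpSet, error) {
	lastErr := errors.New("no attempts made")
	for i := 0; i < attempts; i++ {
		set, err := fetch()
		if err == nil && set != nil {
			return set, nil
		}
		if err == nil {
			err = errors.New("fetch returned no data")
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

// flakyFetcher simulates an endpoint that refuses the first n requests.
func flakyFetcher(n int) func() (*vrpSet, error) {
	calls := 0
	return func() (*vrpSet, error) {
		calls++
		if calls <= n {
			return nil, errors.New("connection refused")
		}
		return &vrpSet{Prefixes: []string{"192.0.2.0/24"}}, nil
	}
}

func main() {
	set, err := fetchWithRetry(flakyFetcher(2), 5, 10*time.Millisecond)
	if err != nil {
		fmt.Println("still failing:", err)
		return
	}
	fmt.Println("loaded prefixes:", len(set.Prefixes))
}
```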

Please consider sending "Cache Reset" when the client Serial Number is too old (CSCvp8228716)

Dear developers,

We're running stayrtr to send ROAs to routers running IOS-XR versions affected by CSCvp82287.

Along with stayrtr, we also run routinator, and we have noticed that the effects of CSCvp82287 are much smaller for the sessions established with the latter. After long debugging, we discovered that routinator sends Cache Reset when the Serial Number requested by the client is too old, and that this mitigates the inconsistent states described by CSCvp82287.

Here's an example from a controlled environment with rtr_client and routinator running in a container.

(venv) marco@lilith:~/CSCvp8228713/rpki-rtr-client$ rtr_client -h 127.0.0.1 -p 323 -S 58407 -s 3
2023-01-27-171415: CONNECT localhost.323
+
2023-01-27-171415: NEW SESSION ID 58407
....withdraw(84.32.25.0/24, 207279, None) - failed
withdraw(45.166.128.0/22, 267957, 24) - failed
withdraw(2602:fb26:900::/48, 23470, None) - failed
withdraw(191.7.72.0/21, 53130, 24) - failed
withdraw(2804:141c::/32, 53130, None) - failed
withdraw(187.120.240.0/20, 53130, 24) - failed
withdraw(181.214.170.0/23, 205474, 24) - failed
withdraw(88.216.18.0/24, 50225, None) - failed
withdraw(2804:55cc::/32, 267957, 48) - failed
withdraw(5.105.131.0/24, 204384, None) - failed

2023-01-27-171415: SESSION 58407 NEW SERIAL 3->12     <--- New serial is 12 (routinator sends Cache Response + Data)


(venv) marco@lilith:~/CSCvp8228713/rpki-rtr-client$ rtr_client -h 127.0.0.1 -p 323 -S 58407 -s 2
2023-01-27-171416: CONNECT localhost.323
+
2023-01-27-171416: NEW SESSION ID 58407
.
2023-01-27-171416: SESSION 58407 NEW SERIAL 2->0       <--- New serial is 0 (routinator sends Cache Reset)

Even though I understand this is IOS-XR's fault rather than stayrtr's, I believe that sending Cache Reset when the serial is too old is much better behavior for the server. It is also in line with RFC 8210:

     If the Serial Numbers in the old
      and new sessions are different enough, the cache will respond to
      the router's Serial Query with a Cache Reset, which will solve the
      problem.
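
The decision described above can be sketched as follows. This is an illustration only: `answerSerialQuery` and the `kept` parameter (how many past serials the cache can still produce a diff for) are assumptions, not stayrtr or routinator code. The value 9 in the demo is picked so that the result matches the trace above, where serial 3 gets data and serial 2 gets a reset against current serial 12.

```go
package main

import "fmt"

// response names which RTR reply a cache would send to a Serial Query.
type response string

const (
	cacheResponse response = "Cache Response" // incremental data follows
	cacheReset    response = "Cache Reset"    // router has to start over with a Reset Query
)

// answerSerialQuery is hypothetical. current is the cache's serial and kept is
// the number of older serials for which a diff can still be computed.
func answerSerialQuery(clientSerial, current, kept uint32) response {
	// Unsigned subtraction gives the distance behind and keeps working
	// when the 32-bit serial wraps around.
	behind := current - clientSerial
	if behind > kept {
		return cacheReset
	}
	return cacheResponse
}

func main() {
	fmt.Println(answerSerialQuery(3, 12, 9))
	fmt.Println(answerSerialQuery(2, 12, 9))
}
```

A client that claims a serial ahead of the cache ends up with a very large distance after the wrap, so it also receives a Cache Reset.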

Thank you

dev branch does not accept JSON in "AS1234" format

I am running three instances of rtrmon to compare multiple JSON endpoints:

  • a reference endpoint for VRPs
  • versus rpki-client or routinator json

Yesterday, when rebasing my changes (#9) on top of dev, I found out that stayrtr no longer accepts the JSON from some of the endpoints.

 go build cmd/rtrmon/rtrmon.go   && ./rtrmon -secondary.host http://[url]:8888/objects/validated -primary.host https://[internal-endpoint]/[path]/roa-prefixes -secondary.refresh 30s -primary.refresh 30s 2>&1 | head -n 10  
time="2021-07-20T10:34:49+02:00" level=info msg="1: Fetching https://ba-apps.prepdev.ripe.net/certification/api/monitoring/roa-prefixes"
time="2021-07-20T10:34:49+02:00" level=info msg="2: Fetching http://alias.student.utwente.nl:8888/objects/validated"
time="2021-07-20T10:34:51+02:00" level=error msg="2: exploration error for {2.0.0.0/12 17 3215 ripe 1626832357} asn: Could not decode ASN: 3215 as part of VRP"
time="2021-07-20T10:34:51+02:00" level=error msg="2: exploration error for {2.9.0.0/17 24 3215 ripe 1626832357} asn: Could not decode ASN: 3215 as part of VRP"
time="2021-07-20T10:34:51+02:00" level=error msg="2: exploration error for {2.9.128.0/17 24 3215 ripe 1626832357} asn: Could not decode ASN: 3215 as part of VRP"

The root cause appears to be a difference in JSON format, which is now:

// internal endpoint
    {
      "asn": "AS0",
      "prefix": "95.214.130.0/24",
      "maxLength": 24
    },
// rpki-client
    {
      "asn": 0,
      "prefix": "95.214.130.0/24",
      "maxLength": 24,
      "ta": "ripe",
      "expires": 1626853914
    },
// routinator
    {
      "asn": "AS0",
      "prefix": "95.214.130.0/24",
      "maxLength": 24,
      "ta": "ripe"
    },

While I do agree that rpki-client matches the syntax from rfc8416 and the others use a legacy, implicitly documented format, I think that breaking compatibility here causes operational issues in mixed environments.

If it helps I'm willing to maintain the parsing code for this json :-).
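
One way to accept both spellings is a custom JSON decoder for the ASN field. The sketch below is not the parser used in stayrtr; the `ASN` and `VRP` types and `parseVRP` are hypothetical names chosen for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// ASN decodes from a JSON number (0) or from a string ("AS0" or "0").
type ASN uint32

// UnmarshalJSON strips optional quotes and an optional "AS" prefix before parsing.
func (a *ASN) UnmarshalJSON(b []byte) error {
	s := strings.Trim(string(b), `"`)
	s = strings.TrimPrefix(strings.ToUpper(s), "AS")
	n, err := strconv.ParseUint(s, 10, 32)
	if err != nil {
		return fmt.Errorf("could not decode ASN %s: %w", b, err)
	}
	*a = ASN(n)
	return nil
}

// VRP mirrors the fields shared by the three JSON variants.
type VRP struct {
	ASN       ASN    `json:"asn"`
	Prefix    string `json:"prefix"`
	MaxLength int    `json:"maxLength"`
}

// parseVRP decodes a single JSON object into a VRP.
func parseVRP(doc string) (VRP, error) {
	var v VRP
	err := json.Unmarshal([]byte(doc), &v)
	return v, err
}

func main() {
	for _, doc := range []string{
		`{"asn": "AS0", "prefix": "95.214.130.0/24", "maxLength": 24}`,
		`{"asn": 0, "prefix": "95.214.130.0/24", "maxLength": 24, "ta": "ripe"}`,
	} {
		v, err := parseVRP(doc)
		fmt.Println(v.ASN, v.Prefix, v.MaxLength, err)
	}
}
```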

Stayrtr throws rtrlib.SendableData is *rtrlib.BgpsecKey, not *rtrlib.VRP

Dear developers of stayrtr

Whenever I start stayrtr 8a3a71e0 in my network, I get the error below:

stayrtr_1      | panic: interface conversion: rtrlib.SendableData is *rtrlib.BgpsecKey, not *rtrlib.VRP
stayrtr_1      | 
stayrtr_1      | goroutine 1046 [running]:
stayrtr_1      | github.com/bgp/stayrtr/lib.(*Client).SendSDs.func1(0x1?, 0x0)
stayrtr_1      | 	github.com/bgp/stayrtr/lib/server.go:1059 +0x2f9
stayrtr_1      | sort.partition_func({0xc0031c2e80?, 0xc00733d470?}, 0x0, 0x21f63, 0xc008a77630?)
stayrtr_1      | 	sort/zsortfunc.go:142 +0xaf
stayrtr_1      | sort.pdqsort_func({0xc0031c2e80?, 0xc00733d470?}, 0x0?, 0x62c42?, 0x87d300?)
stayrtr_1      | 	sort/zsortfunc.go:114 +0x20f
stayrtr_1      | sort.pdqsort_func({0xc0031c2e80?, 0xc00733d470?}, 0x7f1e00deed28?, 0x18?, 0xc00006ec00?)
stayrtr_1      | 	sort/zsortfunc.go:121 +0x25a
stayrtr_1      | sort.Slice({0x87d300?, 0xc00733aca8?}, 0x6ec00?)
stayrtr_1      | 	sort/slice.go:26 +0xfa
stayrtr_1      | github.com/bgp/stayrtr/lib.(*Client).SendSDs(0xc00582cd20, 0x4ced, 0x0, {0xc008858000?, 0x62c42, 0x6ec00})
stayrtr_1      | 	github.com/bgp/stayrtr/lib/server.go:1035 +0xb8
stayrtr_1      | github.com/bgp/stayrtr/lib.(*DefaultRTREventHandler).RequestCache(0xc0001a4da0, 0xc00582cd20)
stayrtr_1      | 	github.com/bgp/stayrtr/lib/server.go:84 +0x109
stayrtr_1      | github.com/bgp/stayrtr/lib.(*Server).RequestCache(0xc00a7ce200?, 0x886e60?)
stayrtr_1      | 	github.com/bgp/stayrtr/lib/server.go:520 +0x2b
stayrtr_1      | github.com/bgp/stayrtr/lib.(*Client).passSimpleHandler(0xc00a7cc6f0?, {0x9e4668?, 0xc00733ea3c?})
stayrtr_1      | 	github.com/bgp/stayrtr/lib/server.go:802 +0x71
stayrtr_1      | github.com/bgp/stayrtr/lib.(*Client).Start(0xc00582cd20)
stayrtr_1      | 	github.com/bgp/stayrtr/lib/server.go:877 +0x377
stayrtr_1      | created by github.com/bgp/stayrtr/lib.(*Server).acceptClientTCP
stayrtr_1      | 	github.com/bgp/stayrtr/lib/server.go:554 +0x276

Here's how I am launching it:

  stayrtr:
    image: rpki/stayrtr:8a3a71e0
    depends_on:
      - rpki-client
    volumes:
     - "rpki-client-output:/var/lib/rpki-client/"
    command:
      - -cache
      - /var/lib/rpki-client/json
      - -checktime=false
      - -bind
      - 0.0.0.0:323
    expose:
      - 323
    ports:
      - 323:323

I believe it's worth noting that I have also tried rpki/stayrtr:v0.5.0 with similar results, and that everything was working with rpki/stayrtr:ad3ed83a.
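
The panic above comes from a sort comparator asserting that every element is a `*rtrlib.VRP`. As an illustration of a comparator that tolerates mixed element types, here is a sketch with stand-in types. `SendableData`, `VRP` and `BgpsecKey` below are simplified local definitions, not the rtrlib ones, and the ordering rule (VRPs first) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// SendableData is a simplified stand-in for the interface named in the panic.
type SendableData interface{ Kind() string }

type VRP struct{ Prefix string }
type BgpsecKey struct{ ASN uint32 }

func (*VRP) Kind() string       { return "vrp" }
func (*BgpsecKey) Kind() string { return "bgpsec-key" }

// less checks both operands before touching VRP fields, so a BgpsecKey in
// the slice cannot trigger an interface conversion panic.
func less(a, b SendableData) bool {
	va, aIsVRP := a.(*VRP)
	vb, bIsVRP := b.(*VRP)
	switch {
	case aIsVRP && bIsVRP:
		return va.Prefix < vb.Prefix
	case aIsVRP != bIsVRP:
		return aIsVRP // VRPs sort before everything else
	default:
		return false // non-VRP entries keep their relative order
	}
}

// sortSendable sorts in place; SliceStable preserves the order of equal elements.
func sortSendable(sds []SendableData) {
	sort.SliceStable(sds, func(i, j int) bool { return less(sds[i], sds[j]) })
}

func main() {
	sds := []SendableData{&BgpsecKey{ASN: 64496}, &VRP{Prefix: "b"}, &VRP{Prefix: "a"}}
	sortSendable(sds)
	for _, sd := range sds {
		fmt.Println(sd.Kind())
	}
}
```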

Could you please look into it?

Thank you

Regards

Expose better file fetch metrics

Breakout of #53


Have basic metrics for HTTP behaviour. We have part of this, but the last successful response for each URL, the response size, the duration, and the status code should be tracked. Some metrics can also be moved: RefreshStatusCode etc. could be tracked from the HTTP util.

Do not ignore unknown/leftover arguments

As a user I expect stayrtr/rtrmon/rtrdump to exit when there are leftover positional arguments.

Steps to reproduce:

# docker but also applies with the binaries
# note the missing dash before primary.host
$ docker run --rm rpki/rtrmon primary.host https://console.rpki-client.org/vrps.json
time="2021-08-10T14:38:28Z" level=info msg="1: Connecting with tcp to rtr.rpki.cloudflare.com:8282"
time="2021-08-10T14:38:28Z" level=info msg="2: Fetching https://rpki.cloudflare.com/rpki.json"
time="2021-08-10T14:38:29Z" level=info msg="1: Received: PDU Cache Response v1 (session: 45143)"
time="2021-08-10T14:38:29Z" level=info msg="Worker 2 finished: comparison"

The same happens with unknown flags to stayrtr (have not tested rtrdump)

Expected behaviour:

The program exits because there is an unused positional argument, instead of connecting to a different host and ignoring the argument.
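
With Go's standard `flag` package, leftover positional arguments are available through `NArg()` and `Args()` after parsing. The sketch below shows the check on a hypothetical `parseArgs` helper with a single `-primary.host` flag; the real tools define many more flags and other defaults.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// parseArgs is a hypothetical helper that rejects leftover positional arguments.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("rtrmon", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the way in this sketch
	host := fs.String("primary.host", "", "primary source")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if fs.NArg() > 0 {
		return "", fmt.Errorf("unexpected positional arguments: %q", fs.Args())
	}
	return *host, nil
}

func main() {
	if _, err := parseArgs(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
}
```

Flag parsing stops at the first non-flag argument, so a missing dash before `primary.host` leaves both the name and the URL in `Args()`.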

Duplicate VRP served out

Running StayRTR v0.5.0 and noticed a resetting RTR session today:

27076 2023/02/28 20:03:03.344 CET MINOR: RPKI #2001 management RPKI
"Rpki Session state on x.x.x.x changed to down due to fatalErrorCode"

27075 2023/02/28 20:03:03.344 CET CRITICAL: LOGGER #2002 Base A:RPKI:UNUSUAL_ERROR
"Slot A: rpkiValidatePrefix: x.x.x.x:Duplicate V6 prefix 2a02:88d:4004:: prefix len 48"

The output of rpki-client shows one entry for the mentioned prefix in the json file. When issuing a rtrdump (./rtrdump -file debug.json -connect 127.0.0.1:3323) towards StayRTR we can see a duplicate entry in the debug.json file:

{"prefix":"2a02:88d:4004::/48","maxLength":48,"asn":48695},{"prefix":"2a02:88d:4004::/48","maxLength":48,"asn":48695}

We have not been able to reproduce this issue so far.. :)
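
Since the cause is unknown, one defensive option is to drop exact duplicates before the set is served. This is a sketch under that assumption, with a hypothetical comparable `VRP` struct, and it does not explain how the duplicate got there.

```go
package main

import "fmt"

// VRP is a hypothetical comparable record, usable as a map key.
type VRP struct {
	Prefix    string
	MaxLength uint8
	ASN       uint32
}

// dedupe returns the input without exact duplicates, keeping first occurrences in order.
func dedupe(in []VRP) []VRP {
	seen := make(map[VRP]struct{}, len(in))
	out := make([]VRP, 0, len(in))
	for _, v := range in {
		if _, dup := seen[v]; dup {
			continue
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

func main() {
	in := []VRP{
		{Prefix: "2a02:88d:4004::/48", MaxLength: 48, ASN: 48695},
		{Prefix: "2a02:88d:4004::/48", MaxLength: 48, ASN: 48695},
	}
	fmt.Println(len(dedupe(in)))
}
```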

deb/rpm packages

It would be great to include production builds for linux as deb/rpm packages. I'd be happy to open a PR for this with goreleaser if that works for the maintainers. (This could also address #19 for docker builds)

Fix the Docker Hub container thingy

Yeah, it's been broken since we forked. I don't have a hub account and should probably think about getting one or something.

I want this done before we release the first (real) version... so let's see if we get it done within 5 versions :D

Use compression for transfer of JSON

The file with validated objects is growing. Depending on the output style, I see 43MB (current rpki-client output), 57MB (jq or new rpki-client builds, one attribute per line, no colour, indented), while compact JSON is ~37M (jq -c).

It compresses pretty well.

Because the client trusts the server (w.r.t. content), I do not think gzip bombs are an issue. So, as a user, I want the HTTP client to use HTTP compression when loading the prefixes.
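
For reference, Go's default HTTP transport already requests gzip and decompresses transparently, as long as the caller does not set Accept-Encoding itself. Once the header is set by hand, the body has to be decoded explicitly. A minimal sketch of that explicit path follows. `fetch`, `readBody`, `gzipped` and `fakeResponse` are hypothetical helpers, and the demo uses an in-memory response so it needs no network.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
)

// readBody returns the payload of resp, decompressing it when the server
// answered with Content-Encoding: gzip.
func readBody(resp *http.Response) ([]byte, error) {
	var r io.Reader = resp.Body
	if resp.Header.Get("Content-Encoding") == "gzip" {
		zr, err := gzip.NewReader(resp.Body)
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		r = zr
	}
	return io.ReadAll(r)
}

// fetch requests url and advertises gzip support explicitly.
func fetch(url string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept-Encoding", "gzip")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return readBody(resp)
}

// gzipped compresses s; it only exists to build demo input.
func gzipped(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s))
	zw.Close()
	return buf.Bytes()
}

// fakeResponse builds an in-memory response for the demo.
func fakeResponse(body []byte, encoding string) *http.Response {
	h := http.Header{}
	if encoding != "" {
		h.Set("Content-Encoding", encoding)
	}
	return &http.Response{Header: h, Body: io.NopCloser(bytes.NewReader(body))}
}

func main() {
	out, err := readBody(fakeResponse(gzipped(`{"roas":[]}`), "gzip"))
	fmt.Println(string(out), err)
}
```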

Multiple target support for rtrmon

As a user I want to monitor multiple pairs of sources with one rtrmon instance, so that I do not need to run n rtrmon instances to compare n RPs against a source of truth.

Consider this situation, where you have:

  • one source of truth
  • two rpki-client instances
  • two routinator instances

If you want to monitor that all are in sync, right now you would need to run four rtrmon instances. Running a single one would be easier (and would reduce the load on the source of truth endpoint, which would receive 4x fewer requests).

Handle update failure, retry more often

In routineUpdate(), an update can sometimes fail. If it does, the loop simply waits for the refresh interval (default 1 hour). Against OctoRPKI, this was seen owing to a date parsing issue that cleared up shortly after.

If an update fails, make the refresh interval short to try again soon.
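
A sketch of how the wait could be chosen. `nextWait` is a hypothetical helper, and the doubling per consecutive failure is only one possible policy (an assumption, not something the issue asks for). The wait never exceeds the normal refresh interval.

```go
package main

import (
	"fmt"
	"time"
)

// nextWait returns how long to sleep before the next update attempt.
// failures is the number of consecutive failed updates; base is the first retry delay.
func nextWait(refresh, base time.Duration, failures int) time.Duration {
	if failures <= 0 || base <= 0 {
		return refresh // last update worked (or no retry delay configured)
	}
	wait := base
	for i := 1; i < failures && wait < refresh; i++ {
		wait *= 2 // back off a little more after each consecutive failure
	}
	if wait > refresh {
		wait = refresh
	}
	return wait
}

func main() {
	for failures := 0; failures <= 8; failures++ {
		fmt.Println(failures, nextWait(time.Hour, time.Minute, failures))
	}
}
```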

Failing to run rtr

After recent OS package updates, stayrtr does not appear to be working as expected.

OS: Centos 7.9.2009 (yes.. I know it's old)
stayrtr: 0.3.1-1

Running this manually

# stayrtr -bind :8323 -cache /var/lib/rpki-client/json -checktime=false -metrics.addr :8081
INFO[0000] new cache file: Updating sha256 hash  -> 8838fea941c8c2c57e69a644c231b5f7cff50b2909752a8d83f78f00b3cefad1 
INFO[0001] New update (385902 uniques, 385902 total prefixes). 
# curl http://localhost:8081/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.6371e-05
go_gc_duration_seconds{quantile="0.25"} 2.6995e-05
go_gc_duration_seconds{quantile="0.5"} 4.5386e-05
go_gc_duration_seconds{quantile="0.75"} 7.1309e-05
go_gc_duration_seconds{quantile="1"} 0.000138547
go_gc_duration_seconds_sum 0.001206487
go_gc_duration_seconds_count 24
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 9
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.17.3"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.40855224e+08
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.123946376e+09
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 4259
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 1.0851318e+07
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.0009512647985787957
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 1.7992664e+07
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.40855224e+08
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 2.5690112e+08
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.45391616e+08
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 1.911108e+06
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 1.65691392e+08
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 4.02292736e+08
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.6695937986341834e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 1.2762426e+07
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 2400
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 625600
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 1.572864e+06
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 2.80704016e+08
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 666341
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 360448
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 360448
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 4.22905696e+08
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 5
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 4.11
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 10
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.51097088e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.66959366395e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.079205888e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 14
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
]# 
# rtrdump -connect 127.0.0.1:8323 -file debug.json
INFO[0000] Connecting with plain to 127.0.0.1:8323      
FATA[0000] dial tcp 127.0.0.1:8323: connect: connection refused 

No firewall running.

The same json file from rpki-client works fine on another box.

Add released version as a tag for the released docker image

Hi all,
currently container images are tagged with commits. It is good practice to have tags matching the released version of the package. This will make it easier for consumers to pin stable releases in their labs/infras.

PS. It is a bit sad to see goreleaser decommissioned, as it is a very powerful and low-maintenance Swiss Army knife for releases.

add serial back to json output

Hello @job

in 38d058e some metadata outputs (Generated, Valid and Serial) were removed from rtrdump JSON output.

I think rtrdump should retain this functionality for machine based troubleshooting and monitoring purposes. The RTR serial number can otherwise only be extracted by running rtrdump in debug mode.

My use-case is that I'm monitoring serial increase over time with rtrdump and alert when there is a hung rtr server: https://github.com/lukastribus/rtrcheck/blob/d072488ab9e0f9bf9afe16f62dcd5c07a311ac1e/rtrcheck#L101

Tracked in lukastribus/rtrcheck#1

I realize there is rtrmon, but its use case seems to be quite different:

  • checkrtr is a shell script that can be dropped into any monitoring systems compatible with nagios plugins and will provide monitoring for availability and hung/stuck rtr server. It requires the user to know how to access the RTR endpoint and a sufficiently large check interval, but doesn't require any other inputs from the user.

  • rtrmon collects data from RTR endpoints, compares the data and provides the data for external consumption. It does not make its own decision about whether a situation is good or bad; that is out of scope.

I don't see a lot of drawbacks to keeping the RTR serial in the metadata JSON output of rtrdump.

Thank you,
Lukas

Ensure session ID uniqueness

After the patch of #84 I realised that while the default is a random session ID, the argument that allows you to set the session ID causes potential indeterminate behaviour.

I can imagine cases where you want to set the session ID for debugging or development - but this is a nice footcannon.

https://www.rfc-editor.org/rfc/rfc8210#section-5.1

Should a cache erroneously reuse a Session ID so that a router
does not realize that the session has changed (old Session ID and
new Session ID have the same numeric value), the router may become
confused as to the content of the cache. The time it takes the
router to discover that it is confused will depend on whether the
Serial Numbers are also reused. If the Serial Numbers in the old
and new sessions are different enough, the cache will respond to
the router's Serial Query with a Cache Reset, which will solve the
problem. If, however, the Serial Numbers are close, the cache may
respond with a Cache Response, which may not be enough to bring
the router into sync. In such cases, it's likely but not certain
that the router will detect some discrepancy between the state
that the cache expects and its own state. For example, the Cache
Response may tell the router to drop a record which the router
does not hold or may tell the router to add a record which the
router already has. In such cases, a router will detect the error
and reset the session. The one case in which the router may stay
out of sync is when nothing in the Cache Response contradicts any
data currently held by the router.

Unable to create Debian Package

Hi

I followed the process to build a Debian package of StayRTR and it has not been successful.

I am using go version go1.20.3

:~/stayrtr$ sudo docker-compose -f docker-compose-pkg.yml up
Starting stayrtr_packager_1 ... done
Attaching to stayrtr_packager_1
packager_1  | fatal: detected dubious ownership in repository at '/work'
packager_1  | To add an exception for this directory, call:
packager_1  | 
packager_1  | 	git config --global --add safe.directory /work
packager_1  | mkdir -p dist/
packager_1  | go build -trimpath -ldflags '-X main.version= -X main.buildinfos=(2023-04-14T09:43:13+0000)' -o dist/stayrtr--linux-x86_64 cmd/stayrtr/stayrtr.go
packager_1  | # golang.org/x/crypto/ssh
packager_1  | /root/go/pkg/mod/golang.org/x/[email protected]/ssh/cipher.go:499:13: undefined: io.Discard
packager_1  | /root/go/pkg/mod/golang.org/x/[email protected]/ssh/session.go:508:14: undefined: io.Discard
packager_1  | /root/go/pkg/mod/golang.org/x/[email protected]/ssh/session.go:521:14: undefined: io.Discard
packager_1  | note: module requires Go 1.17
packager_1  | # github.com/bgp/stayrtr/utils
packager_1  | utils/utils.go:167:15: undefined: io.ReadAll
packager_1  | note: module requires Go 1.17
packager_1  | # golang.org/x/sys/unix
packager_1  | /root/go/pkg/mod/golang.org/x/[email protected]/unix/syscall.go:83:16: undefined: unsafe.Slice
packager_1  | /root/go/pkg/mod/golang.org/x/[email protected]/unix/syscall_linux.go:2271:9: undefined: unsafe.Slice
packager_1  | /root/go/pkg/mod/golang.org/x/[email protected]/unix/syscall_unix.go:118:7: undefined: unsafe.Slice
packager_1  | /root/go/pkg/mod/golang.org/x/[email protected]/unix/sysvshm_unix.go:33:7: undefined: unsafe.Slice
packager_1  | note: module requires Go 1.17
packager_1  | make: *** [Makefile:54: build-stayrtr] Error 2
stayrtr_packager_1 exited with code 2

I have followed the instructions to do

packager_1  | 	git config --global --add safe.directory /work
packager_1  | mkdir -p dist/
packager_1  | go build -trimpath -ldflags '-X main.version= -X main.buildinfos=(2023-04-14T09:43:13+0000)' -o dist/stayrtr--linux-x86_64 cmd/stayrtr/stayrtr.go

but this has not worked. I have also downgraded to Go 1.17 and this still does the same.

Could you please point out whether I am doing something wrong, or is this a genuine issue?

stayrtr-0.4.0-1.x86_64.rpm install failing on centos7.9

Hi,

The new portable version file stayrtr-0.4.0-1.x86_64.rpm is failing to install on two machines.

[root@sydnetdev03 pmawson]# rpm -ivh stayrtr-0.4.0-1.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:stayrtr-0.4.0-1 ################################# [100%]
error: unpacking of archive failed on file /dist/rtrdump-v0.4.0-linux-x86_64;63e0483a: cpio: link failed - No such file or directory
error: stayrtr-0.4.0-1.x86_64: install failed
[root@sydnetdev03 pmawson]# cd /dist/
[root@sydnetdev03 dist]# ls
stayrtr_0.4.0_amd64.deb
[root@sydnetdev03 dist]#

]# rpm2cpio stayrtr-0.4.0-1.x86_64.rpm | cpio -t
./dist/stayrtr_0.4.0_amd64.deb
./etc/default/stayrtr
./lib/systemd/system/stayrtr.service
./usr/share/stayrtr/.keep
./dist/rtrdump-v0.4.0-linux-x86_64
./usr/bin/rtrdump
./dist/rtrmon-v0.4.0-linux-x86_64
./usr/bin/rtrmon
./dist/stayrtr-v0.4.0-linux-x86_64
./usr/bin/stayrtr
94919 blocks

Centos 7.9, old version (0.3.0-1) works fine

docker-compose -f docker-compose-pkg.yml up - broke for me - Golang version and fpm options

I manually did this... Manually set STAYRTR_VERSION and removed --package dist/ in Makefile..

Using Pre-release Tag 0.3.0

vi Makefile
STAYRTR_VERSION := 0.3.0
REPLACE --package dist/ with --maintainer "Human Fixes"

debian bullseye

docker run -it --rm --name stayrtr_ruby -v "$PWD":/work ruby /bin/bash

apt-get update && apt-get install -y git make rpm golang \
  && gem install fpm && cd work && mkdir -p dist

make build-rtrmon
make build-rtrdump
make build-stayrtr
make package-rpm-stayrtr

mkdir -p dist/
fpm -s dir -t rpm -n stayrtr -v 0.3.0 \
  --description "StayRTR: a RPKI-to-Router server" \
  --url "https://github.com/bgp/stayrtr" \
  --architecture x86_64 \
  --license "BSD-3" \
  --maintainer "Human Fixes" \
  dist/stayrtr-0.3.0-linux-x86_64=/usr/bin/stayrtr \
  package/stayrtr.service=/lib/systemd/system/stayrtr.service \
  package/stayrtr.env=/etc/default/stayrtr \
  dist/rtrdump-0.3.0-linux-x86_64=/usr/bin/rtrdump \
  dist/rtrmon-0.3.0-linux-x86_64=/usr/bin/rtrmon

Master wanted GoLang 1.16+ not in Ruby

git clone https://github.com/bgp/stayrtr.git

docker pull ubuntu:22.04
docker run -it --rm --name stayrtr_ubuntu -v "$PWD":/work ubuntu:22.04 /bin/bash
apt-get update && apt-get install -y git make rpm golang ruby \
  && gem install fpm && cd work && mkdir -p dist

vi Makefile
STAYRTR_VERSION := 0.3.1
REPLACE --package dist/ with --maintainer "Human Fixes"

make build-rtrmon
make build-rtrdump
make build-stayrtr
make package-rpm-stayrtr

mkdir -p dist/
fpm -s dir -t rpm -n stayrtr -v 0.3.1 \
  --description "StayRTR: a RPKI-to-Router server" \
  --url "https://github.com/bgp/stayrtr" \
  --architecture x86_64 \
  --license "BSD-3" \
  --maintainer "Human Fixes" \
  dist/stayrtr-0.3.1-linux-x86_64=/usr/bin/stayrtr \
  package/stayrtr.service=/lib/systemd/system/stayrtr.service \
  package/stayrtr.env=/etc/default/stayrtr \
  dist/rtrdump-0.3.1-linux-x86_64=/usr/bin/rtrdump \
  dist/rtrmon-0.3.1-linux-x86_64=/usr/bin/rtrmon

Installed OK...
rpm2cpio stayrtr-0.3.1-1.x86_64.rpm | cpio -imdv

https://fpm.readthedocs.io/en/latest/getting-started.html

Does rtrdump actually downgrade the version?

We’re having an issue in Routinator (NLnetLabs/routinator#950) where rtrdump fails to produce any output when not started with -rtr.version 1. It does print “Downgrading to version 1” but then seems to just close the connection. I did test with the fixed RTR server code that reports the expected version in the error PDU.

Can you confirm that this is indeed because rtrdump doesn’t retry with the new version, or is there something that Routinator does wrong?

serial is updated but FetchFile returns 304 for objects and slurm is local and not changed

Hello,

I added the serial metric in #113 and then I saw that it was continuously increasing.

In the logs, I see that the objects are not updated:

time="2024-02-27T14:53:15Z" level=info msg="HTTP 304 Not modified for http://rpki-validator-rpki-client/objects/validated"
time="2024-02-27T14:53:16Z" level=info msg="New update (517256 uniques, 517256 total prefixes, 51 vaps, 2 router keys)."

However, I also see that a new "update" is triggered.

I have a static slurm.json file provided as a local path in the container (via -slurm /slurm.json option), and it's never updated.

I think the serial should only be updated if the VRPs are updated or the SLURM file is changed, and that seems to be what the code wants to do as well:

// Only process the first time after there is either a cache or SLURM
// update.
if cacheUpdated || slurmNotPresentOrUpdated {

So I think there is unexpected behavior in:

slurmNotPresentOrUpdated, err = s.updateSlurm(slurmFile)

Maybe the FetchConfig could maintain a hash of the file (when it's a local file) to see if it has been modified (emulating ETag behavior for local files)?
Or maybe it should be in the state object?
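
A minimal sketch of the hash idea, using a hypothetical `fileWatcher` type (it is not tied to FetchConfig or to the state object). It remembers the SHA-256 digest of the last content seen and reports a change only when the digest differs.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// fileWatcher remembers the digest of the last content it was shown.
type fileWatcher struct {
	last [sha256.Size]byte
	seen bool
}

// update reports whether data differs from the previous call.
// The very first call always counts as a change.
func (w *fileWatcher) update(data []byte) bool {
	sum := sha256.Sum256(data)
	if w.seen && sum == w.last {
		return false
	}
	w.last, w.seen = sum, true
	return true
}

// updateFromFile reads a local file and applies the same check.
func (w *fileWatcher) updateFromFile(path string) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	return w.update(data), nil
}

func main() {
	w := &fileWatcher{}
	slurm := []byte(`{"slurmVersion": 1}`)
	fmt.Println(w.update(slurm)) // first load
	fmt.Println(w.update(slurm)) // same content again
}
```

With a check like this, an unchanged local SLURM file would not count as an update on its own, so the serial would only move when the cache or the SLURM content really changed.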

If you provide guidance, I am happy to implement it :)

Add proper support for RFC 8210 (BGPSEC)

I think RFC 8210 section 6 was implemented, but Router Keys are not yet picked up from the JSON and converted into RTR PDUs (Section 5.10). An example Router Key is available under the RIPE TA. The pubkey field contains the SPKI in base64 encoded form.

-version doesn't expose version

/usr/bin/stayrtr -version
StayRTR

I know the package has a version number, but it would be nice to have this reflected in the version string of the build.
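
In Go this is commonly done by injecting a package-level string at link time with `-ldflags "-X main.version=..."`. A small sketch of that pattern follows. `versionString` and the fallback text are made up for the example and are not the project's actual output format.

```go
package main

import (
	"flag"
	"fmt"
)

// version stays empty unless the build sets it, for example with
//   go build -ldflags "-X main.version=v0.5.0"
var version string

// versionString is a hypothetical formatter with a fallback for unset versions.
func versionString(app, v string) string {
	if v == "" {
		v = "(unknown version)"
	}
	return app + " " + v
}

func main() {
	show := flag.Bool("version", false, "print version and exit")
	flag.Parse()
	if *show {
		fmt.Println(versionString("StayRTR", version))
	}
}
```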

Use inotify to reload validated payload JSON file

To supplement periodically checking whether the JSON file with the validated ROAs, ASPAs & BGPsec data changed, it might be worth implementing something using inotify to get fresh data into the pipeline faster

CGO build issue

The cgo tool is enabled by default for native builds on systems where it is expected to work. It is disabled by default when cross-compiling. You can control this by setting the CGO_ENABLED environment variable when running the go tool: set it to 1 to enable the use of cgo, and to 0 to disable it. The go tool will set the build constraint "cgo" if cgo is enabled. The special import "C" implies the "cgo" build constraint, as though the file also said "// +build cgo". Therefore, if cgo is disabled, files that import "C" will not be built by the go tool. (For more about build constraints see https://golang.org/pkg/go/build/#hdr-Build_Constraints).

https://github.com/bgp/stayrtr/runs/4011014697?check_suite_focus=true

Basically, some libs have // +build cgo and the default is to build with cgo enabled. Stuff should work in straight Go. Let's make the Makefile set CGO_ENABLED=0 for CI/Docker and test against that.

Load testing tool?

> > I would like it if there was a tool that could easily emulate a high number of connections,

I assume you are saying this because your day-to-day use case is a StayRTR with lots of client connections.

Can you give a ballpark on how many connections you are thinking?

My day job (RIPE NCC) does not involve running an RTR daemon. However, we do run rtrmon for end-to-end monitoring between the CA systems and what is visible to RPs.

I have seen a screenshot of a dashboard of an RTR server with ~500 clients. However, I can imagine those parties would not like to comment in public about the size of their networks.

Originally posted by @ties in #58 (comment)

Release management

Would like to discuss release management for the project.

Currently we use simple incrementing release, but I'd like to have the ability to qualify releases against our test suite and tag them appropriately. Not every release should go through full cycle IMO.

Open to thoughts, etc. If we don't come up with some consensus, I will simply formulate what I think works best and move on by ~Late October, per my current testing plans.

Error Response message not sent

There is a race in StayRTR that results in missing Error Response PDUs because the RTR connection is closed before the message is sent.

I noticed this while working on RTR ASPA support. Since my OpenBGPD opens the connection with version 2, an Error Response with Code 4 should be sent by StayRTR, but most of the time the connection is closed before this happens.

Unexpected Exchange:

rtr stayRTR: RTR RetryTimer triggered
rtr stayRTR: state change idle -> idle, reason: connection open
rtr stayRTR: state change idle -> closed, reason: connection closed

Expected exchange which happens from time to time:

rtr stayRTR: RTR RetryTimer triggered
rtr stayRTR: state change idle -> idle, reason: connection open
rtr stayRTR: received error: Unsupported Protocol Version: Bad protocol version
rtr stayRTR: state change idle -> closed, reason: connection closed with reset

Detect BGPsec Router Key corruption in JSON input

Although the SKI field in BGPSec Router Keys appears to be redundant, its presence can perhaps be used to detect data corruption in the pipeline.

Given the following example:

"bgpsec_keys": [
  { "asn": 15562, "ski": "5D4250E2D81D4448D8A29EFCE91D29FF075EC9E2", "pubkey": "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEgFcjQ/g//LAQerAH2Mpp+GucoDAGBbhIqD33wNPsXxnAGb+mtZ7XQrVO9DQ6UlAShtig5+QfEKpTtFgiqfiAFQ==", "ta": "ripe", "expires": 1699105676 }
]

The SKI can be confirmed by calculating the SHA-1 hash of the BIT STRING present in the base64-encoded DER-encoded SPKI.

$ echo MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEgFcjQ/g//LAQerAH2Mpp+GucoDAGBbhIqD33wNPsXxnAGb+mtZ7XQrVO9DQ6UlAShtig5+QfEKpTtFgiqfiAFQ== \
  | base64 -d \
  | dd bs=1 skip=26 2>/dev/null \
  | openssl sha1   # skip=26 drops the ASN.1 header in front of the key bits
(stdin)= 5d4250e2d81d4448d8a29efce91d29ff075ec9e2

Perhaps it is robust behavior to log a warning and ignore the Router Key entry if there is a mismatch between the calculated SKI and the listed SKI?

Add syslog flag to log to syslog

Greetings.

Grateful if the code can be updated to send logs to Syslog rather than to stdout, in cases where no logging options have been specified upon start-up of StayRTR. Thanks.

Mark.

stayrtr and Juniper 19.4

Hi team,

We are having issues with StayRTR and a Juniper router running 19.4. We are seeing lots of retransmissions, and the VRPs are not getting updated on the Juniper routers. In a Wireshark capture we also see "TCP window full" messages. There is no issue with a Cisco router running 7.7.x.

show validation session    
Session                                  State   Flaps     Uptime #IPv4/IPv6 records
xx.xx.xx.xx                           Connect     4            23/5 <--- Stay-rtr
xx.xx.xx.yy                       Up          0 4d 06:26:23 343532/187642 <--- routinator

regards,
Skanda

exponential backoff on fetch errors

A breakout of #53


My view is that since the default refresh is likely to be quite large, this is not as big of an issue from a load point of view. But there is a risk of "missing" an update: if your JSON endpoint is down at the time of the update, you will miss it for the next hour. We should likely try to fix that.

Interval of update loop varies and can slow down on slow backend

As a stayrtr operator, I want stayrtr to keep fetching updates if the backend system is slow or not responsive.

If I want updates every 10 minutes, and an update takes 5 minutes, I want the next update to run 10 minutes after the previous one started, not 15 minutes after (10 minutes after the previous one finished).

Context

When running stayrtr from a slow connection (4G was not cooperating) I noticed that the update loop does not have a set interval but has a set delay. If the SLURM or JSON responses are slow, the loop takes (much) longer.

Root cause

Handling slow responses is a hard problem. It ends up being a tradeoff between liveliness of the whole system or getting all information.

For example, in my rpki-client wrapper I found that some repositories were so slow that they prevented me from updating on time. I decided to add a utility to time out/abort fetching from slow repos. There I decided that finishing an update was more important than having all information.

Desired behaviour

first of all:

  • exponential backoff on errors
  • have basic metrics for HTTP behaviour. We have part of this, but last successful response for url/response size/duration/status code should be tracked. And some metrics can be moved: RefreshStatusCode etc. could be tracked from the HTTP util.
  • make both updates (slurm + vrp-json) asynchronous, they can be performed in parallel.

then:

  • abort the connection if retrieving the response takes longer than [timelimit]
  • schedule updates at a set interval: "an update happens every interval", not "interval after the previous update finishes"

Evict stale VRPs if (buildtime+24h < now() || expired)

The following snippet prevents us from loading a stale VRP file: https://github.com/bgp/stayrtr/blob/master/cmd/stayrtr/stayrtr.go#L269-L278

However, this check does not clear the previous state. The check helps prevent loading new stale data, but does not guard against VRPs in the current cache which became stale.

Two checks should probably be added to help prevent routing based on stale data:

  • All VRPs should be removed if metadata.buildtime + 24 hours lies in the past
  • Individual VRPs should be removed if their expires moment has been reached

stayrtr services crashes: panic: interface conversion: rtrlib.SendableData is *rtrlib.VAP, not *rtrlib.VRP

Running stayrtr 0.5.0-1 on Debian unstable (the latest versions available as I write this) we observe crashes of stayrtr:

root@rpki1:~# stayrtr -bind :8323 -checktime=false -cache /var/lib/rpki-client/json
INFO[0000] new cache file: Updating sha256 hash  -> 20b0ad2a81664fd9428005e867012272b54fd10d0014d654696b53aced35f578
INFO[0003] New update (409561 uniques, 409561 total prefixes).
INFO[0004] Updated added, new serial 0
INFO[0004] StayRTR Server started (sessionID:28188, refresh:3600, retry:600, expire:7200)
INFO[0012] Accepted tcp connection from 172.17.8.128:64739 (1/0)
INFO[0012] Accepted tcp connection from 172.17.8.27:54068 (2/0)
INFO[0012] Accepted tcp connection from 172.17.3.72:51164 (3/0)
INFO[0012] Accepted tcp connection from 172.17.3.76:55694 (4/0)
INFO[0605] File /var/lib/rpki-client/json is identical to the previous version
INFO[0606] New update to old state (409561 uniques, 409561 total prefixes). (old 409574 - new 409561)
INFO[0610] Updated added, new serial 1
INFO[1205] new cache file: Updating sha256 hash 20b0ad2a81664fd9428005e867012272b54fd10d0014d654696b53aced35f578 -> 560962ca54dd0808e2cc4192d9133a51bad03ed5dd1928baa6b89c2b606f9bfe
INFO[1208] New update (409560 uniques, 409560 total prefixes).
INFO[1212] Updated added, new serial 2
INFO[1804] new cache file: Updating sha256 hash 560962ca54dd0808e2cc4192d9133a51bad03ed5dd1928baa6b89c2b606f9bfe -> 39f3616a17e11dedea5a59c73e768152442df0b8ad3b9c12352e8d25b558e241
INFO[1807] New update (409563 uniques, 409563 total prefixes).
INFO[1810] Updated added, new serial 3
panic: interface conversion: rtrlib.SendableData is *rtrlib.VAP, not *rtrlib.VRP

goroutine 36 [running]:
github.com/bgp/stayrtr/lib.(*Client).SendSDs.func1(0x1e160?, 0x0)
	github.com/bgp/stayrtr/lib/server.go:1059 +0x305
sort.partition_func({0xc00855fe80?, 0xc00007e150?}, 0x0, 0x3c2c2, 0x9587a6?)
	sort/zsortfunc.go:142 +0xaf
sort.pdqsort_func({0xc00005de80?, 0xc00007e150?}, 0xc000012060?, 0x0?, 0x844274?)
	sort/zsortfunc.go:114 +0x254
sort.Slice({0x8a32e0, 0xc000012060}, 0x6ec00?)
	sort/slice.go:23 +0x97
github.com/bgp/stayrtr/lib.(*Client).SendSDs(0xc000122140, 0x6e1c, 0x3, {0xc00caa4000?, 0x64327, 0x6ec00})
	github.com/bgp/stayrtr/lib/server.go:1035 +0xaa
github.com/bgp/stayrtr/lib.(*DefaultRTREventHandler).RequestCache(0xc0000c2ec0, 0xc000122140)
	github.com/bgp/stayrtr/lib/server.go:84 +0x109
github.com/bgp/stayrtr/lib.(*Server).RequestCache(0xc0142482a0?, 0x8ad2c0?)
	github.com/bgp/stayrtr/lib/server.go:520 +0x2b
github.com/bgp/stayrtr/lib.(*Client).passSimpleHandler(0xc0000b6450?, {0xa265c8?, 0xc00f75a06c?})
	github.com/bgp/stayrtr/lib/server.go:802 +0x91
github.com/bgp/stayrtr/lib.(*Client).Start(0xc000122140)
	github.com/bgp/stayrtr/lib/server.go:877 +0x39f
created by github.com/bgp/stayrtr/lib.(*Server).acceptClientTCP
	github.com/bgp/stayrtr/lib/server.go:554 +0x276

stayrtr package used:

root@rpki1:~# apt-cache policy stayrtr
stayrtr:
  Installed: 0.5.0-1
  Candidate: 0.5.0-1
  Version table:
 *** 0.5.0-1 500
        500 http://httpredir.debian.org/debian bookworm/main amd64 Packages
        500 http://deb.debian.org/debian bookworm/main amd64 Packages
        100 /var/lib/dpkg/status

arguments used:

root@rpki1:~# cat /etc/default/stayrtr
STAYRTR_ARGS=-bind :8323 -checktime=false -cache /var/lib/rpki-client/json

This happens quite often (a few times per hour) on multiple stayrtr instances we have running (all using the same stayrtr version).
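The panic message indicates a type assertion to *rtrlib.VRP inside the sort comparison while the slice also holds *rtrlib.VAP values. The sketch below shows one way a comparison function can handle mixed element types without panicking. The types `sendable`, `vrp` and `vap` are simplified stand-ins, not the real rtrlib definitions, and this is not necessarily how the project resolved the crash.

```go
package main

import (
	"fmt"
	"sort"
)

// Simplified stand-ins for rtrlib.SendableData, rtrlib.VRP and rtrlib.VAP.
type sendable interface{}

type vrp struct {
	Prefix string
	ASN    uint32
}

type vap struct {
	CustomerASN uint32
}

// rank gives every concrete type a fixed position in the sort order.
func rank(s sendable) int {
	switch s.(type) {
	case *vrp:
		return 0
	case *vap:
		return 1
	default:
		return 2
	}
}

// sortSendables orders by type first and only then compares
// type-specific fields, so no assertion can hit the wrong type.
func sortSendables(sds []sendable) {
	sort.SliceStable(sds, func(i, j int) bool {
		ri, rj := rank(sds[i]), rank(sds[j])
		if ri != rj {
			return ri < rj
		}
		switch a := sds[i].(type) {
		case *vrp:
			b := sds[j].(*vrp) // safe: equal rank implies equal type
			if a.Prefix != b.Prefix {
				return a.Prefix < b.Prefix
			}
			return a.ASN < b.ASN
		case *vap:
			return a.CustomerASN < sds[j].(*vap).CustomerASN
		}
		return false
	})
}

func main() {
	sds := []sendable{
		&vap{CustomerASN: 64500},
		&vrp{Prefix: "192.0.2.0/24", ASN: 64496},
		&vrp{Prefix: "10.0.0.0/8", ASN: 64497},
	}
	sortSendables(sds)
	for _, s := range sds {
		fmt.Printf("%T %+v\n", s, s)
	}
}
```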

Crash when cache URL isn't present

./dist/stayrtr-0.1-62-gcc539c4-linux-x86_64 -metrics.addr "127.0.0.1:9847" -cache http://127.0.0.1:8081/output.json
ERRO[0000] Error updating: HTTP 503 Service Unavailable 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x7f7c4e]

goroutine 1 [running]:
main.(*state).updateFromNewState(0xc0000c6000)
        ./stayrtr.go:258 +0x4e
main.run()
        ./stayrtr.go:595 +0xaf5
main.main()
        ./stayrtr.go:488 +0x19
