cilium / ebpf

ebpf-go is a pure-Go library to read, modify and load eBPF programs and attach them to various hooks in the Linux kernel.

Home Page: https://ebpf-go.dev

License: MIT License

Go 99.05% Shell 0.30% Makefile 0.29% Smarty 0.28% Awk 0.07%
btf ebpf go golang linux

ebpf's Introduction

Cilium is a networking, observability, and security solution with an eBPF-based dataplane. It provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay mode. It is L7-protocol aware and can enforce network policies on L3-L7 using an identity based security model that is decoupled from network addressing.

Cilium implements distributed load balancing for traffic between pods and to external services, and is able to fully replace kube-proxy, using efficient hash tables in eBPF allowing for almost unlimited scale. It also supports advanced functionality like integrated ingress and egress gateway, bandwidth management and service mesh, and provides deep network and security visibility and monitoring.

A new Linux kernel technology called eBPF is at the foundation of Cilium. It supports dynamic insertion of eBPF bytecode into the Linux kernel at various integration points such as: network IO, application sockets, and tracepoints to implement security, networking and visibility logic. eBPF is highly efficient and flexible. To learn more about eBPF, visit eBPF.io.

Overview of Cilium features for networking, observability, service mesh, and runtime security

Stable Releases

The Cilium community maintains minor stable releases for the last three minor Cilium versions. Older Cilium stable versions from minor releases prior to that are considered EOL.

For upgrades to new minor releases please consult the Cilium Upgrade Guide.

Listed below are the actively maintained release branches along with their latest patch release, corresponding image pull tags and their release notes:

v1.15 2024-04-11 quay.io/cilium/cilium:v1.15.4 Release Notes
v1.14 2024-04-11 quay.io/cilium/cilium:v1.14.10 Release Notes
v1.13 2024-04-11 quay.io/cilium/cilium:v1.13.15 Release Notes

Architectures

Cilium images are distributed for AMD64 and AArch64 architectures.

Software Bill of Materials

Starting with Cilium version 1.13.0, all images include a Software Bill of Materials (SBOM). The SBOM is generated in SPDX format. More information on this is available on Cilium SBOM.

Development

For development and testing purposes, the Cilium community publishes snapshots, early release candidates (RC) and CI container images built from the main branch. These images are not for use in production.

For testing upgrades to new development releases please consult the latest development build of the Cilium Upgrade Guide.

Listed below are branches for testing along with their snapshots or RC releases, corresponding image pull tags and their release notes where applicable:

main daily quay.io/cilium/cilium-ci:latest N/A
v1.16.0-pre.2 2024-05-02 quay.io/cilium/cilium:v1.16.0-pre.2 Release Candidate Notes

Functionality Overview

Protect and secure APIs transparently

Ability to secure modern application protocols such as REST/HTTP, gRPC and Kafka. Traditional firewalls operate at Layer 3 and 4. A protocol running on a particular port is either completely trusted or blocked entirely. Cilium provides the ability to filter on individual application protocol requests such as:

  • Allow all HTTP requests with method GET and path /public/.*. Deny all other requests.
  • Allow service1 to produce on Kafka topic topic1 and service2 to consume on topic1. Reject all other Kafka messages.
  • Require the HTTP header X-Token: [0-9]+ to be present in all REST calls.

See the section Layer 7 Policy in our documentation for the latest list of supported protocols and examples on how to use it.

Secure service to service communication based on identities

Modern distributed applications rely on technologies such as application containers to facilitate agility in deployment and scale out on demand. This results in a large number of application containers being started in a short period of time. Typical container firewalls secure workloads by filtering on source IP addresses and destination ports. This concept requires the firewalls on all servers to be manipulated whenever a container is started anywhere in the cluster.

To avoid this situation, which limits scale, Cilium assigns a security identity to groups of application containers which share identical security policies. The identity is then associated with all network packets emitted by the application containers, allowing the identity to be validated at the receiving node. Security identity management is performed using a key-value store.

Secure access to and from external services

Label based security is the tool of choice for cluster internal access control. In order to secure access to and from external services, traditional CIDR based security policies for both ingress and egress are supported. This allows access to and from application containers to be limited to particular IP ranges.

Simple Networking

A simple flat Layer 3 network with the ability to span multiple clusters connects all application containers. IP allocation is kept simple by using host scope allocators. This means that each host can allocate IPs without any coordination between hosts.

The following multi node networking models are supported:

  • Overlay: Encapsulation-based virtual network spanning all hosts. Currently, VXLAN and Geneve are baked in but all encapsulation formats supported by Linux can be enabled.

    When to use this mode: This mode has minimal infrastructure and integration requirements. It works on almost any network infrastructure as the only requirement is IP connectivity between hosts which is typically already given.

  • Native Routing: Use of the regular routing table of the Linux host. The network must be capable of routing the IP addresses of the application containers.

    When to use this mode: This mode is for advanced users and requires some awareness of the underlying networking infrastructure. This mode works well with:

    • Native IPv6 networks
    • In conjunction with cloud network routers
    • If you are already running routing daemons

Load Balancing

Cilium implements distributed load balancing for traffic between application containers and to external services and is able to fully replace components such as kube-proxy. The load balancing is implemented in eBPF using efficient hashtables allowing for almost unlimited scale.

For north-south type load balancing, Cilium's eBPF implementation is optimized for maximum performance, can be attached to XDP (eXpress Data Path), and supports direct server return (DSR) as well as Maglev consistent hashing if the load balancing operation is not performed on the source host.

For east-west type load balancing, Cilium performs efficient service-to-backend translation right in the Linux kernel's socket layer (e.g. at TCP connect time) such that per-packet NAT operations overhead can be avoided in lower layers.

Bandwidth Management

Cilium implements bandwidth management through efficient EDT-based (Earliest Departure Time) rate-limiting with eBPF for container traffic that is egressing a node. This significantly reduces transmission tail latencies for applications and avoids locking under multi-queue NICs compared to traditional approaches such as HTB (Hierarchical Token Bucket) or TBF (Token Bucket Filter) as used in the bandwidth CNI plugin, for example.

Monitoring and Troubleshooting

The ability to gain visibility and troubleshoot issues is fundamental to the operation of any distributed system. While we learned to love tools like tcpdump and ping and while they will always find a special place in our hearts, we strive to provide better tooling for troubleshooting. This includes tooling to provide:

  • Event monitoring with metadata: When a packet is dropped, the tool doesn't just report the source and destination IP of the packet, the tool provides the full label information of both the sender and receiver among a lot of other information.
  • Metrics export via Prometheus: Key metrics are exported via Prometheus for integration with your existing dashboards.
  • Hubble: An observability platform specifically written for Cilium. It provides service dependency maps, operational monitoring and alerting, and application and security visibility based on flow logs.

Getting Started

What is eBPF and XDP?

Berkeley Packet Filter (BPF) is a Linux kernel bytecode interpreter originally introduced to filter network packets, e.g. for tcpdump and socket filters. The BPF instruction set and surrounding architecture have recently been significantly reworked with additional data structures such as hash tables and arrays for keeping state as well as additional actions to support packet mangling, forwarding, encapsulation, etc. Furthermore, a compiler back end for LLVM allows for programs to be written in C and compiled into BPF instructions. An in-kernel verifier ensures that BPF programs are safe to run and a JIT compiler converts the BPF bytecode to CPU architecture-specific instructions for native execution efficiency. BPF programs can be run at various hooking points in the kernel such as for incoming packets, outgoing packets, system calls, kprobes, uprobes, tracepoints, etc.

BPF continues to evolve and gain additional capabilities with each new Linux release. Cilium leverages BPF to perform core data path filtering, mangling, monitoring and redirection, and requires BPF capabilities that are in any Linux kernel version 4.8.0 or newer (the latest current stable Linux kernel is 4.14.x).

Many Linux distributions including CoreOS, Debian, Docker's LinuxKit, Fedora, openSUSE and Ubuntu already ship kernel versions >= 4.8.x. You can check your Linux kernel version by running uname -a. If you are not yet running a recent enough kernel, check the Documentation of your Linux distribution on how to run Linux kernel 4.9.x or later.

To read up on the necessary kernel versions to run the BPF runtime, see the section Prerequisites.

XDP is a further step in evolution and enables running a specific flavor of BPF programs from the network driver with direct access to the packet's DMA buffer. This is, by definition, the earliest possible point in the software stack, where programs can be attached to in order to allow for a programmable, high performance packet processor in the Linux kernel networking data path.

Further information about BPF and XDP targeted for developers can be found in the BPF and XDP Reference Guide.

To learn more about Cilium, its extensions, and use cases around Cilium and BPF, take a look at the Further Readings section.

Community

Slack

Join the Cilium Slack channel to chat with Cilium developers and other Cilium users. This is a good place to learn about Cilium, ask questions, and share your experiences.

Special Interest Groups (SIG)

See Special Interest groups for a list of all SIGs and their meeting times.

Developer meetings

The Cilium developer community hangs out on Zoom to chat. Everybody is welcome.

eBPF & Cilium Office Hours livestream

We host a weekly community YouTube livestream called eCHO which (very loosely!) stands for eBPF & Cilium Office Hours. Join us live, catch up with past episodes, or head over to the eCHO repo and let us know your ideas for topics we should cover.

Governance

The Cilium project is governed by a group of Maintainers and Committers. How they are selected and govern is outlined in our governance document.

Adopters

A list of adopters of the Cilium project who are deploying it in production, and of their use cases, can be found in file USERS.md.

Roadmap

Cilium maintains a public roadmap. It gives a high-level view of the main priorities for the project, the maturity of different features and projects, and how to influence the project direction.

License

The Cilium user space components are licensed under the Apache License, Version 2.0. The BPF code templates are dual-licensed under the General Public License, Version 2.0 (only) and the 2-Clause BSD License (you can use the terms of either license, at your option).

ebpf's People

Contributors

aditighag, aibor, alban, alxn, arthurfabre, brycekahle, chenhengqi, christarazi, danobi, dependabot[bot], dylandreimerink, eiffel-fl, florianl, hao-lee, kwakubiney, lmb, markpash, mehrdadrad, mmat11, mythi, nathanjsweet, olsajiri, paulcacheux, pippolo84, rgo3, thajeztah, ti-mo, tklauser, twpayne, zeffron

ebpf's Issues

bpf-to-bpf calls don't strip unused functions

Given a program like this:

int foo() { ... }
int bar() { ... }

__section("xdp") int xdp() { return bar(); }

Will fail to load with something like the following (assuming that bar isn't inlined):

invalid argument: unreachable insn XYZ

This is because both foo and bar are placed in the .text section, but foo isn't actually used by the program. When linking we don't remove foo, which leaves unreachable instructions that the verifier rejects.

A workaround is to pass -ffunction-sections to clang when compiling the eBPF. This places functions in their own sections, which means there is never any dead code. However, the linker isn't smart enough to link linked functions right now, so this breaks as well.
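One way to picture the missing step is a reachability pass over the call graph: start at the entry point and keep only the functions reached through bpf-to-bpf calls. The following is a minimal sketch under that assumption, not the library's linker; callGraph and reachable are hypothetical names, and the graph is written by hand instead of being derived from relocations.

```go
package main

import (
	"fmt"
	"sort"
)

// callGraph maps a function name to the functions it calls.
// Hypothetical stand-in for the call relocations found in .text.
type callGraph map[string][]string

// reachable returns the sorted names of all functions that can be
// reached from entry, including entry itself.
func reachable(g callGraph, entry string) []string {
	seen := map[string]bool{entry: true}
	work := []string{entry}
	for len(work) > 0 {
		fn := work[len(work)-1]
		work = work[:len(work)-1]
		for _, callee := range g[fn] {
			if !seen[callee] {
				seen[callee] = true
				work = append(work, callee)
			}
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// Mirrors the example program: xdp calls bar, foo is never called.
	g := callGraph{"xdp": {"bar"}, "bar": nil, "foo": nil}
	fmt.Println(reachable(g, "xdp"))
}
```

With the example program, foo does not appear in the result, so a linker built on this idea would leave it out of the final instruction stream.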

Upstream patched binary package from cilium

Currently, marshaling a 24-byte value via binary allocates quite a lot:

name                      time/op
Marshalling/reflection-8  1.27µs ± 8%
Marshalling/custom-8       730ns ± 6%
Marshalling/unsafe-8       662ns ± 5%

name                      alloc/op
Marshalling/reflection-8    152B ± 0%
Marshalling/custom-8       32.0B ± 0%
Marshalling/unsafe-8       0.00B

name                      allocs/op
Marshalling/reflection-8    8.00 ± 0%
Marshalling/custom-8        1.00 ± 0%
Marshalling/unsafe-8        0.00

Cilium proper has a patched binary package which reduces allocations: https://github.com/cilium/cilium/tree/master/pkg/bpf/binary
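To illustrate why a hand-written encoder can beat reflection, here is a minimal sketch of a custom marshaller. The entry type and its three uint64 fields are assumptions made up for the example (the issue does not give the layout), and little-endian order is assumed; this is not the patched package from Cilium.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// entry is a hypothetical 24-byte value; the real layout is not given.
type entry struct {
	A, B, C uint64
}

// marshal encodes e in little-endian order into a fixed-size array.
// No reflection is involved, and the array can stay on the caller's stack.
func marshal(e entry) [24]byte {
	var buf [24]byte
	binary.LittleEndian.PutUint64(buf[0:8], e.A)
	binary.LittleEndian.PutUint64(buf[8:16], e.B)
	binary.LittleEndian.PutUint64(buf[16:24], e.C)
	return buf
}

func main() {
	buf := marshal(entry{A: 1, B: 2, C: 3})
	fmt.Printf("% x\n", buf[:])
}
```

Whether the result actually avoids heap allocation depends on escape analysis at the call site, so it is worth confirming with a benchmark like the one above.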

Figure out map element vs. map batch APIs

The kernel APIs provide multiple implementations for map interactions:

  • Per-element APIs (Since Linux v3.18)
    • Lookup
    • Update
    • Delete
    • GetNextKey (iteration)
    • Lookup + Delete (Linux v4.20, stack/queue maps only)
  • Batched APIs (Linux v5.6):
    • Lookup batch
    • Lookup + Delete
    • Update batch
    • Delete batch

Do we want to export both in the library API, or export a batched API and if batch API is unavailable, simulate the batched behaviour using per-element APIs?

Things to keep an eye out for: per-element APIs might not provide the exact same behaviour so we'll have to pay close attention when defining semantics on these map interactions.
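The fallback option could look roughly like the sketch below, which emulates a batched lookup with one per-element call per key. Everything here is hypothetical: elementMap, memMap, errKeyNotExist and lookupBatch are illustration names, not the library's API, and taking an explicit key list is a simplification of how the kernel's batch commands iterate.

```go
package main

import (
	"errors"
	"fmt"
)

// errKeyNotExist is a hypothetical sentinel for a missing key.
var errKeyNotExist = errors.New("key does not exist")

// elementMap is a hypothetical per-element interface.
type elementMap interface {
	Lookup(key uint32) (uint64, error)
}

// memMap is an in-memory stand-in for a kernel map.
type memMap map[uint32]uint64

func (m memMap) Lookup(key uint32) (uint64, error) {
	v, ok := m[key]
	if !ok {
		return 0, errKeyNotExist
	}
	return v, nil
}

// lookupBatch emulates a batched lookup using per-element calls.
// It stops at the first failure and reports how many values were
// filled in. Unlike a real batch operation, the map may change
// between the individual lookups.
func lookupBatch(m elementMap, keys []uint32, values []uint64) (int, error) {
	if len(values) < len(keys) {
		return 0, errors.New("values is shorter than keys")
	}
	for i, k := range keys {
		v, err := m.Lookup(k)
		if err != nil {
			return i, fmt.Errorf("lookup key %d: %w", k, err)
		}
		values[i] = v
	}
	return len(keys), nil
}

func main() {
	m := memMap{1: 10, 2: 20}
	values := make([]uint64, 3)
	n, err := lookupBatch(m, []uint32{1, 2, 3}, values)
	fmt.Println(n, values[:n], err)
}
```

The partial-result return value is one place where semantics need to be pinned down: a caller has to know whether a failure halfway through leaves earlier results valid.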

Can't be compiled for 32-bit ARM

opencontainers/runc#2145 (comment)

CGO_ENABLED=1 GOARCH=arm GOARM=6 CC=arm-linux-gnueabi-gcc go build -buildmode=pie  -ldflags "-X main.gitCommit="bf9519326d3dcc4a78f3cddbc54ac7a78a0aa948" -X main.version=1.0.0-rc9+dev " -tags "seccomp apparmor selinux ambient" -o runc-armel .
# github.com/opencontainers/runc/vendor/github.com/cilium/ebpf
vendor/github.com/cilium/ebpf/syscalls.go:285:17: constant 3405662737 overflows int32
Makefile:125: recipe for target 'localcross' failed

This line does not compile for GOARCH=arm GOARM=6

if statfs.Type != bpfFSType {
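The compiler error shows that statfs.Type is an int32 on this target, so the untyped constant 3405662737 (0xcafe4a11) cannot be converted to it. One possible way to write the comparison portably is sketched below; isBPFFS and bpfFSMagic are hypothetical names, and this is not necessarily the fix the library adopted.

```go
package main

import "fmt"

// bpfFSMagic is the value from the compiler error above (0xcafe4a11).
// As an untyped constant it does not fit into int32.
const bpfFSMagic = 3405662737

// isBPFFS compares a filesystem type against the magic number.
// Callers pass int64(statfs.Type), which compiles whether Type is
// int32 or int64. Truncating to uint32 undoes the sign extension
// that happens when Type is a negative int32.
func isBPFFS(fsType int64) bool {
	return uint32(fsType) == bpfFSMagic
}

func main() {
	var type32 int32 = -889304559 // bit pattern 0xcafe4a11 in 32 bits
	var type64 int64 = 3405662737
	fmt.Println(isBPFFS(int64(type32)), isBPFFS(type64))
}
```

The key point is that the comparison happens in an unsigned 32-bit domain, where the magic number fits on every architecture.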

Program.Benchmark test is flaky

Just saw the following error on CI:

=== RUN   TestProgramBenchmark
--- FAIL: TestProgramBenchmark (0.00s)
    prog_test.go:103: Expected non-zero duration

Populate inner maps from BTF .values

Hi. thanks for the great project! 👍

I've been developing using the library since newtools/ebpf.

I tried to switch to cilium/ebpf and found that support for inner maps is not completely implemented.

For example, this function doesn't seem to take into account that a map may be an inner map.

func (ec *elfCode) loadMaps(maps map[string]*MapSpec, mapSections map[elf.SectionIndex]*elf.Section) error {

Do you plan to support innermap in the future?
If no one else is planning to do it, I'd like to help with that myself:)

When I am deleting pod, I found some warning messages.

When deleting a pod, I see warning messages like this: level=warning msg="Unable to update ipcache map entry on pod add" error="empty PodIPs". Nothing actually fails, so why is this message logged, and could it be removed from the source code?

MapIterator doesn't terminate for heavily modified map

MapIterator.Next uses BPF_MAP_GET_NEXT_KEY to iterate a map. However, there is no guarantee that a lookup makes forward progress through a map. This is a problem with the kernel implementation, for which a couple of solutions were discussed at Linux Plumbers Conf '19.

We need a workaround until the solution is available. The Cilium project imposes a max # of iterations to protect against this behaviour. What is the right number for that? Maybe just use MaxElems as a hard limit? I'd like to avoid introducing user visible knobs, if possible.

FWIW, it's possible to do the following:

for i := 0; entries.Next(&key, &value) && i < 1000; i++ {
    // ...
}

But it requires every user to know about the problem.
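A wrapper could hide the bound from callers. The sketch below assumes a minimal iterator interface modelled on the loop above; iterator, stuckIterator, forEachBounded and errIterationAborted are hypothetical names, and the choice of maxSteps (for example the map's maximum number of entries) is left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// errIterationAborted is a hypothetical error for a runaway iteration.
var errIterationAborted = errors.New("iteration aborted: too many steps")

// iterator is a hypothetical minimal interface modelled on the loop above.
type iterator interface {
	Next(key, value *uint32) bool
}

// stuckIterator simulates an iterator that never makes forward
// progress: it yields the same entry forever.
type stuckIterator struct{}

func (stuckIterator) Next(key, value *uint32) bool {
	*key, *value = 1, 1
	return true
}

// forEachBounded calls fn for each entry, but gives up with an error
// once more than maxSteps entries have been produced.
func forEachBounded(it iterator, maxSteps int, fn func(key, value uint32)) error {
	var key, value uint32
	for i := 0; it.Next(&key, &value); i++ {
		if i >= maxSteps {
			return errIterationAborted
		}
		fn(key, value)
	}
	return nil
}

func main() {
	count := 0
	err := forEachBounded(stuckIterator{}, 1000, func(key, value uint32) { count++ })
	fmt.Println(count, err)
}
```

Returning an error rather than silently stopping lets the caller distinguish a complete walk from a truncated one.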

cc @joestringer

Split implementations into foo.go, foo_linux.go

It can be useful to split Linux-specific code into separate files, so that large programs which are written against the library can be compiled and unit-tested on other platforms, with stub implementations for the linux-specific portions. Otherwise as soon as some file in the user project imports the BPF library, it will cause compilation failures for developers on other platforms.

We have a few developers on Cilium who develop natively on OSX which this affects from time to time. Example commit which shifts the code around to achieve this:

cilium/cilium@1d28c26#diff-c2be647780ac9b1d4632bfb1b99a7c0a

Add bpf_link support

BPF has a new concept called bpf_link, which will unify attaching BPF to different hooks. We should support this in the library.

support compile once, run everywhere (CO-RE)

libbpf supports modifying BPF bytecode to adjust for differences in kernel structure layouts. This is extremely useful for tracing programs that want to introspect kernel state, since you can now compile them against one kernel version but run them against another one.

Constituent parts:

  • Field based relocations (existence, size, offset)
  • Field signedness
  • Type based relocations
  • Enum based relocations
  • #384

Perf event output not reading packet data

As per discussion on the channel, I'm trying to read packet data with a kernel program using bpf_perf_event_output, but I'm only seeing the first 4 bytes. (the metadata I actually send)

Output: r.RawSample has 4 bytes (cookie + pkt len)
Expectation: r.RawSample would contain 4 bytes of cookie + pkt len and the next (pkt len) bytes of the packet.

Code: https://github.com/yarochewsky/xdp-go

Integration tests with libbpf are broken

TestLibBPFCompat tries to parse BPF from the kernel's selftests/bpf. The test is currently broken, since there are features we don't support, and because some tests need additional data to load. We should:

  1. Fix loading the kernel selftests: unsupported files should be skipped (with an explanation why)
  2. Start running them on CI: for each tested kernel version, we should pull a tarball with the BPF ELFs from that kernel release. We'll need to update ci-kernels for this.
  3. Create tickets for unsupported features, maybe implement some of them.

For point 2, we can use the fact that BPF ELFs have an entry in their headers that identifies them as EM_BPF:

$ llvm-readelf-10 --file-headers testdata/loader-clang-6.0.elf 
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           EM_BPF
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          6216 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         28
  Section header string table index: 1
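The same check can be done from Go when picking files out of a kernel tree. The sketch below reads the e_machine field directly from the first bytes of a file; isBPFObject is a hypothetical helper, and the header in main is hand-written from the Magic, Type and Machine values in the dump above.

```go
package main

import (
	"debug/elf"
	"encoding/binary"
	"fmt"
)

// isBPFObject reports whether hdr starts with an ELF header whose
// e_machine field is EM_BPF. hdr must hold at least the first 20
// bytes of the file.
func isBPFObject(hdr []byte) bool {
	if len(hdr) < 20 || string(hdr[:4]) != elf.ELFMAG {
		return false
	}
	var order binary.ByteOrder
	switch elf.Data(hdr[elf.EI_DATA]) {
	case elf.ELFDATA2LSB:
		order = binary.LittleEndian
	case elf.ELFDATA2MSB:
		order = binary.BigEndian
	default:
		return false
	}
	// e_machine is the 16-bit field at offset 18 in ELF32 and ELF64.
	return elf.Machine(order.Uint16(hdr[18:20])) == elf.EM_BPF
}

func main() {
	hdr := []byte{
		0x7f, 'E', 'L', 'F', 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, // e_ident
		1, 0, // e_type: REL
		0xf7, 0, // e_machine: EM_BPF (247)
	}
	fmt.Println(isBPFObject(hdr))
}
```

In practice the 20 bytes would come from reading the start of each candidate file, which avoids parsing section headers for files that turn out not to be BPF objects.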

clang-8 binaries without debug info can't be loaded

When compiling BPF programs with clang-8 without debug info, LoadCollectionSpec fails with:

load programs: program no_relocation: no extended BTF info for section socket

To reproduce:

clang-8 -target bpf -O2 -Wall -Werror -c testdata/loader.c -o testdata/loader-clang-8-nodebug.elf
sudo -E $(which go) test -v -run "TestLoadCollectionSpec" ./...

Doc site ERR_SSL_VERSION_OR_CIPHER_MISMATCH

I'm using Mac OS X with Chrome.

Wireshark shows that Chrome offers these cipher suites, but the doc site still fails the handshake.

Cipher Suites (17 suites)
Cipher Suite: Unknown (0xeaea)
Cipher Suite: Unknown (0x1301)
Cipher Suite: Unknown (0x1302)
Cipher Suite: Unknown (0x1303)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (0xc02b)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f)
Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (0xc02c)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (0xc030)
Cipher Suite: Unknown (0xcca9)
Cipher Suite: Unknown (0xcca8)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (0xc013)
Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (0xc014)
Cipher Suite: TLS_RSA_WITH_AES_128_GCM_SHA256 (0x009c)
Cipher Suite: TLS_RSA_WITH_AES_256_GCM_SHA384 (0x009d)
Cipher Suite: TLS_RSA_WITH_AES_128_CBC_SHA (0x002f)
Cipher Suite: TLS_RSA_WITH_AES_256_CBC_SHA (0x0035)
Cipher Suite: TLS_RSA_WITH_3DES_EDE_CBC_SHA (0x000a)

Using libbpf-tools to generate ELFs

I'm a newbie to the eBPF / libbpf space so apologies for my potentially silly question!

My end goal is to be able to load & run eBPF programs using this golang eBPF library on our production clusters. To avoid compilation at runtime, I would like to use BPF CO-RE and compile my BPF programs ahead of time.

To test things out, I'm currently trying to use libbpf-tools for the build (since it's a simple make) & this ebpf golang library for the control app to load / read data.

Here's what I've tried so far:

  1. compile ELFs using libbpf-tools (just running make)
  2. verify that the binaries that the tool created run correctly (to make sure compilation was successful)
  3. use the output *.bpf.o files and pass them into the following test code:
spec, err := ebpf.LoadCollectionSpec(file)
if err != nil {
    panic(err)
}
coll, err := ebpf.NewCollection(spec)
if err != nil {
    panic(err)
}

However, I get a variety of errors when creating a new collection, depending on the BPF program. To show all of the errors at once, I ran my ELFs against the elf-reader-test file (as per #87), and here are the results:

=== RUN   TestLibBPFCompat
=== RUN   TestLibBPFCompat/test_cpudist.bpf.o
=== PAUSE TestLibBPFCompat/test_cpudist.bpf.o
=== RUN   TestLibBPFCompat/test_drsnoop.bpf.o
=== PAUSE TestLibBPFCompat/test_drsnoop.bpf.o
=== RUN   TestLibBPFCompat/test_execsnoop.bpf.o
=== PAUSE TestLibBPFCompat/test_execsnoop.bpf.o
=== RUN   TestLibBPFCompat/test_filelife.bpf.o
=== PAUSE TestLibBPFCompat/test_filelife.bpf.o
=== RUN   TestLibBPFCompat/test_opensnoop.bpf.o
=== PAUSE TestLibBPFCompat/test_opensnoop.bpf.o
=== RUN   TestLibBPFCompat/test_runqslower.bpf.o
=== PAUSE TestLibBPFCompat/test_runqslower.bpf.o
=== CONT  TestLibBPFCompat/test_cpudist.bpf.o
=== CONT  TestLibBPFCompat/test_filelife.bpf.o
    TestLibBPFCompat/test_filelife.bpf.o: elf_reader_test.go:244: Can't read tests/test_filelife.bpf.o: file tests/test_filelife.bpf.o: load BTF maps: map events: can't get BTF: map events: missing 'key' in type
=== CONT  TestLibBPFCompat/test_runqslower.bpf.o
    TestLibBPFCompat/test_runqslower.bpf.o: elf_reader_test.go:244: Can't read tests/test_runqslower.bpf.o: file tests/test_runqslower.bpf.o: load BTF maps: map events: can't get BTF: map events: missing 'key' in type
=== CONT  TestLibBPFCompat/test_opensnoop.bpf.o
    TestLibBPFCompat/test_opensnoop.bpf.o: elf_reader_test.go:244: Can't read tests/test_opensnoop.bpf.o: file tests/test_opensnoop.bpf.o: load BTF maps: map events: can't get BTF: map events: missing 'key' in type
=== CONT  TestLibBPFCompat/test_execsnoop.bpf.o
    TestLibBPFCompat/test_execsnoop.bpf.o: elf_reader_test.go:244: Can't read tests/test_execsnoop.bpf.o: file tests/test_execsnoop.bpf.o: load BTF maps: map events: can't get BTF: map events: missing 'key' in type
=== CONT  TestLibBPFCompat/test_drsnoop.bpf.o
    TestLibBPFCompat/test_drsnoop.bpf.o: elf_reader_test.go:244: Can't read tests/test_drsnoop.bpf.o: file tests/test_drsnoop.bpf.o: load BTF maps: map events: can't get BTF: map events: missing 'key' in type
    TestLibBPFCompat/test_cpudist.bpf.o: elf_reader_test.go:250: map start: map create: invalid argument
--- FAIL: TestLibBPFCompat (0.00s)
    --- FAIL: TestLibBPFCompat/test_filelife.bpf.o (0.00s)
    --- FAIL: TestLibBPFCompat/test_runqslower.bpf.o (0.00s)
    --- FAIL: TestLibBPFCompat/test_opensnoop.bpf.o (0.00s)
    --- FAIL: TestLibBPFCompat/test_execsnoop.bpf.o (0.00s)
    --- FAIL: TestLibBPFCompat/test_drsnoop.bpf.o (0.00s)
    --- FAIL: TestLibBPFCompat/test_cpudist.bpf.o (0.01s)
FAIL

Any ideas why this could be happening / is what I'm trying to achieve doable? Please let me know if you have any other recommendations on using pre-compiled ELFs!

Info on what I'm using:

  • Linux Kernel: version 5.6.0 with CONFIG_DEBUG_INFO_BTF=y (and /sys/kernel/btf/vmlinux is there)
  • Ubuntu 18.04
  • Clang: version 10.0.1
  • Pahole: v1.17

reference to missing symbol BIT

When loading my ELF object file with the ebpf library, it outputs this message:
ebpf.NewCollectionWithOptions: program cls_xrt_parse_ingress: instruction 343: reference to missing symbol BIT
What does this mean?

map, prog: allow access to bpfMapInfo, bpfProgInfo

Currently, after I load a map or program with LoadPinnedMap() / LoadPinnedProgram(), I can use Map.ABI() / Program.ABI() to get information about it. The information in Map.ABI is enough for me, but Program.ABI only has ProgramType. Why not store bpfMapInfo / bpfProgInfo as an attribute of Map / Program and add an API to access it?

Wrong behavior on s390x

We are currently facing an issue using this library on s390x. I tried to run the test TestProgramRun and got:

go test -run TestProgramRun
--- FAIL: TestProgramRun (0.00s)
    prog_test.go:50:     0: LdXMemW dst: r2 src: r1 off: 4 imm: 0
             1: LdXMemW dst: r1 src: r1 off: 0 imm: 0
             2: MovReg dst: r3 src: r1
             3: AddImm dst: r3 imm: 14
             4: JGTReg dst: r3 off: -1 src: r2 <out>
             5: StMemB dst: r1 src: r0 off: 0 imm: 222
             6: StMemB dst: r1 src: r0 off: 1 imm: 173
             7: StMemB dst: r1 src: r0 off: 2 imm: 190
             8: StMemB dst: r1 src: r0 off: 3 imm: 239
        out:
             9: LdImmDW dst: r0 imm: 42
            11: Exit
        
    prog_test.go:59: can't load program: permission denied: 0: (61) r1 = *(u32 *)(r2 +4)
        R2 !read_ok
        processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
FAIL
exit status 1
FAIL    github.com/cilium/ebpf  0.001s

Should we adjust the ETH_P_ALL definition in example_sock_elf_test.go?

ETH_P_ALL (0x03) doesn't work in my CentOS 8 virtual machine (ByteOrder: binary.LittleEndian). When I change ETH_P_ALL to 0x300, it works.

example_sock_elf_test.go

func openRawSock(index int) (int, error) {
	const ETH_P_ALL uint16 = 0x00<<8 | 0x03   // use syscall.ETH_P_ALL instead ?
	sock, err := syscall.Socket(syscall.AF_PACKET, syscall.SOCK_RAW|syscall.SOCK_NONBLOCK|syscall.SOCK_CLOEXEC, int(ETH_P_ALL))
        //...
}

Non-functioning strace logs:

socket(AF_PACKET, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, 3 /*0x03*/) = 3
bind(3, {sa_family=AF_PACKET, sll_protocol=htons(0x300 /* ETH_P_??? */), sll_ifindex=if_nametoindex("lo"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0

A working strace log:

socket(AF_PACKET, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, 768 /*0x300*/) = 3
bind(3, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("lo"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0

And I checked the same example In C samples/bpf/sock_example.h, the code is as follows.

static inline int open_raw_sock(const char *name)
{
        sock = socket(PF_PACKET, SOCK_RAW | SOCK_NONBLOCK | SOCK_CLOEXEC, htons(ETH_P_ALL));
        // htons converts ETH_P_ALL (0x03) to network (big-endian) byte order
        // ...
}

Should we add some helper function to make this example work on both ByteOrder:binary.LittleEndian and ByteOrder:binary.BigEndian, like the htons function?
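Such a helper could look like the sketch below. Both function names are hypothetical, and the host byte order is detected by inspecting memory with unsafe; this is one possible approach, not what the example in the repository does.

```go
package main

import (
	"fmt"
	"unsafe"
)

// hostIsLittleEndian inspects the in-memory layout of a known value.
func hostIsLittleEndian() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 1
}

// htons converts v from host to network (big-endian) byte order.
// littleEndian states which byte order the host uses.
func htons(v uint16, littleEndian bool) uint16 {
	if littleEndian {
		return v<<8 | v>>8
	}
	return v
}

func main() {
	const ethPAll = 0x0003 // value of ETH_P_ALL
	fmt.Printf("%#x\n", htons(ethPAll, hostIsLittleEndian()))
}
```

On a little-endian host htons(0x0003, true) yields 0x0300, which matches the value that works in the strace log above, while a big-endian host keeps 0x0003 unchanged.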

The difference between MapSpec and MapABI is unclear

See #30, #26 and probably others. The difference between MapABI and MapSpec isn't entirely obvious. Either we get rid of MapABI somehow, or we document the differences.

One possibility is to make the fields in MapABI public in Map, so users could access Map.Type, etc. This creates the opportunity for race conditions however, so I'm not a big fan.

Maybe there is also a better name for MapABI (and friends).

error when loading CollectionSpec with map definition has more than type, keysize, valuesize, maxentries, and flags fields

To be more specific, suppose a map definition contains extra fields, such as .pinning or .namespace, that are used when loading via tc filter, iproute2, or even iovisor/goebpf. Including those fields causes CollectionSpec loading to fail in loadMaps in elf_reader.go with an error about "unknown and non-zero fields in definition".

For my purposes, my BPF programs and maps are loaded exclusively using this repo, so it's not a big deal to me at the moment. If this could be relaxed in some way, I think that would be helpful for others. I'm not sure whether it's better to add non-mandatory fields to MapSpec or to add some switch/field/property that skips the aforementioned error.

error: unknown func bpf_l4_csum_replace#11

When running my test eBPF kernel code, the loader reports the following error in cilium/ebpf's debug output:

221: (85) call bpf_l4_csum_replace#11
unknown func bpf_l4_csum_replace#11

What does this mean? How can I solve it?

Document supported Go and Linux versions

For Go, I believe we should only explicitly support the last two Go versions, in accordance with https://golang.org/doc/devel/release.html#policy. This means that currently Go 1.12 and 1.13 are supported. It would also be nice to test against these on CI.

For Linux, Cilium's minimum requirement is 4.9. Cloudflare is always on either the ultimate or penultimate LTS release. I think we should only support mainline LTS kernels (and test against them on CI).

CPU profiling breaks BPF_PROG_TEST_RUN

It seems like Go CPU profiling uses signals to periodically interrupt the process and sample the program counter at that point. This has the side effect of aborting in-progress calls to BPF_PROG_TEST_RUN with EINTR.

The simplest fix is to retry on EINTR. However, this will skew benchmarks, since we in fact did more work.

Migrate to xerrors

We should migrate from github.com/pkg/errors to golang.org/x/xerrors. This probably means removing the various Is* functions in favour of sentinel errors. I'm also not entirely sure how to deal with internal.UnsupportedFeatureError yet. Possibly wrap it around a sentinel error?

By default, we should not wrap "foreign" errors originating from outside the package. Any functions calling other package functions should however do so. See https://blog.golang.org/go1.13-errors#TOC_3.4.

LLVM 10 emits incompatible BTF

When using LLVM 10 to compile to eBPF, the generated BTF seems invalid:

--- FAIL: TestLoadCollectionSpec (0.00s)
    --- FAIL: TestLoadCollectionSpec/loader-clang-10.elf (0.00s)
        elf_reader_test.go:76: invalid argument: magic: 0xeb9f
            version: 1
            flags: 0x0
            hdr_len: 24
            type_off: 0
            type_len: 536
            str_off: 536
            str_len: 582
            btf_total_size: 1142
            [1] STRUCT (anon) size=40 vlen=5
            	type type_id=2 bits_offset=0
            	key type_id=6 bits_offset=64
            	value type_id=6 bits_offset=128
            	max_entries type_id=2 bits_offset=192
            	map_flags type_id=2 bits_offset=256
            [2] PTR (anon) type_id=4
            [3] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
            [4] ARRAY (anon) type_id=3 index_type_id=5 nr_elems=1
            [5] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)
            [6] PTR (anon) type_id=7
            [7] TYPEDEF uint32_t type_id=8
            [8] INT unsigned int size=4 bits_offset=0 nr_bits=32 encoding=(none)
            [9] VAR btf_map type_id=1 linkage=1
            [10] FUNC_PROTO (anon) return=3 args=(3 arg)
            [11] FUNC helper_func2 type_id=10 vlen != 0

(The error message is from the kernel.)

According to the BTF spec, it is indeed invalid for a FUNC to have a vlen != 0. Maybe this is some kind of new extension?
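
For reference, a small sketch of how kind and vlen can be pulled out of the 32-bit info word of a BTF type and checked. The bit layout (vlen in the low 16 bits, kind starting at bit 24) and the value 12 for BTF_KIND_FUNC follow my reading of the kernel's BTF documentation; the 5-bit kind mask and the helper names are assumptions.

```go
package main

import "fmt"

const btfKindFunc = 12 // BTF_KIND_FUNC

// btfKind extracts the kind from a btf_type info word.
func btfKind(info uint32) uint32 { return (info >> 24) & 0x1f }

// btfVlen extracts vlen from a btf_type info word.
func btfVlen(info uint32) uint32 { return info & 0xffff }

// checkFunc mirrors the check that produced "vlen != 0" above:
// a FUNC entry is rejected if its vlen is non-zero.
func checkFunc(info uint32) error {
	if btfKind(info) == btfKindFunc && btfVlen(info) != 0 {
		return fmt.Errorf("FUNC has vlen %d, expected 0", btfVlen(info))
	}
	return nil
}

func main() {
	ok := uint32(btfKindFunc) << 24
	bad := ok | 1 // same kind, but vlen set to 1

	fmt.Println(checkFunc(ok))
	fmt.Println(checkFunc(bad))
}
```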

Formalize contributing guide

We had an interested developer join the weekly Cilium community meeting looking to understand the contribution process for submitting changes into this library.

GitHub recognizes a file named CONTRIBUTING.md in the root of any repo and adjusts the UI to highlight that process. We could add such a file to clarify the process for new contributors.

There are probably better examples elsewhere, but at least for Cilium it points towards the main Cilium documentation page on contributing (https://github.com/cilium/cilium/blob/master/CONTRIBUTING.md -> https://docs.cilium.io/en/stable/contributing/development/contributing_guide/). I don't believe it makes sense to inherit these into the process for this library as the needs of contributors and the development requirements are vastly different.

when pinning map, possible to fail pinning or have extra chars on end of filename because of non-null terminated string

This happened more often with maps that have longer filenames, for some reason. For example, if I were to pin something like
/sys/fs/bpf/aaaaa/cachedConnectionTbl
there was a low chance that the pin would randomly fail with "No such file or directory", or a somewhat higher chance that the file would be created with a name like
/sys/fs/bpf/aaaaa/cachedConnectionTbl[&%&%&%
with random characters attached to the end. I'm guessing it's because bpfPinObject directly passes a pointer to the first character of the filename without checking that the string is null-terminated, so there's a chance of stray bytes being appended.

In my case, passing a string to .Pin(fileName) with a null terminator deliberately added to the end seems to fix the issue.
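
As a sketch of how the terminator can be added in one place rather than by every caller, the standard library's syscall.ByteSliceFromString returns a copy of the string with a trailing NUL byte and rejects strings that already contain one. The wrapper name cPath is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// cPath returns path as a NUL-terminated byte slice suitable for
// handing to the kernel as a C string. Hypothetical helper.
func cPath(path string) ([]byte, error) {
	return syscall.ByteSliceFromString(path)
}

func main() {
	path := "/sys/fs/bpf/aaaaa/cachedConnectionTbl"
	buf, err := cPath(path)
	if err != nil {
		fmt.Println("invalid path:", err)
		return
	}
	// The slice is one byte longer than the string and ends in 0.
	fmt.Println(len(path), len(buf), buf[len(buf)-1])
}
```

A pointer to the first element of the returned slice can then be passed to the syscall, and the kernel will stop reading at the terminator.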

MapABI should contain Flags

Cilium propagates a user option to determine whether to pre-allocate the memory for entries in maps that support it, using the bpf_attr.map_flags field. To support this option in Cilium, we would need to allow this to be configured somehow (presumably in MapABI?)

Related comment on prior PR: #27 (comment)

Support nested maps in LoadPinnedMap / NewMapFromFD

The kernel currently doesn't have a way to do BPF_OBJ_GET_INFO on the "inner map" of a nested map. This means we can't populate MapABI.Inner when loading a map from an fd or from bpffs.

We should do two things:

  1. Don't rely on MapABI.Inner in the map code. This might already be the case. However, we wouldn't be able to return the correct value from Map.ABI(). This suggests that Map.ABI has to become func() (*MapABI, error).
  2. We should contribute a patch upstream which allows us to get the nested map info.

On kernels > 4.13 we can add a workaround by iterating the nested map. If at least one slot is populated we can call BPF_OBJ_GET_INFO on it. On kernels < 4.13 we'd still have to return an error.

support attaching to tracepoints

cilium/ebpf does not seem to provide functions for attaching BPF programs to tracepoints. Would you be open to accepting changes that help to do that?

Support .values in BTF map definition

Specifying a values field allows defining nested maps in BTF. Data may also be initialized.

m_array_of_maps specifies both value and values; that may be a bug?

// tools/testing/selftests/bpf/progs/test_ringbuf_multi.c
struct ringbuf_map {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1 << 12);
} ringbuf1 SEC(".maps"),
  ringbuf2 SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 4);
	__type(key, int);
	__array(values, struct ringbuf_map);
} ringbuf_arr SEC(".maps") = {
	.values = {
		[0] = &ringbuf1,
		[2] = &ringbuf2,
	},
};

// tools/testing/selftests/bpf/progs/map_ptr_kern.c
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, MAX_ENTRIES);
	__type(key, __u32);
	__type(value, __u32);
	__array(values, struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, __u32);
		__type(value, __u32);
	});
} m_array_of_maps SEC(".maps") = {
	.values = { (void *)&inner_map, 0, 0, 0, 0, 0, 0, 0, 0 },
};

non-static global variables return error "missing map <var name>"

Originally #196, it turns out that kernel selftests now use global variables without a static qualifier:

const char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string_to_stress_byte_loop";
// instead of
static const char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string_to_stress_byte_loop";

This breaks our ELF relocation logic, since it assumes that a relocation for a GLOBAL OBJECT always references a map.
