Comments (9)

dylandreimerink commented on July 22, 2024

I have been playing around with this a bit. The flaky behavior seems to originate in the kernel's WakeupEvents logic. I have not looked into the kernel code yet, but the current test fails from time to time unless I always send WakeupEvents + 1 events, at which point it passes consistently.

	// send followup events
	for i := 1; i < numEvents+1; i++ {
		_, _, err = prog.Test(internal.EmptyBPFContext)
		if err != nil {
			t.Fatal(err)
		}
	}

So perhaps this has to do with memory alignment of the map or something like that. I have tried varying numEvents and sampleSize, but that doesn't seem to change anything.

from ebpf.

dylandreimerink commented on July 22, 2024

I think I found the cause. The WakeupEvents limit applies per ring, and there is one ring per CPU. When we execute BPF_PROG_RUN multiple times, the 2 messages are sometimes written to different rings. If I log the CPU ID of the first and the follow-up events I see:

=== RUN   TestPerfReaderWakeupEvents
ret 7
ret 7
--- PASS: TestPerfReaderWakeupEvents (0.01s)
=== RUN   TestPerfReaderWakeupEvents
ret 7
ret 7
--- PASS: TestPerfReaderWakeupEvents (0.01s)
=== RUN   TestPerfReaderWakeupEvents
ret 7
ret 7
--- PASS: TestPerfReaderWakeupEvents (0.01s)
=== RUN   TestPerfReaderWakeupEvents
ret 7
ret 7
--- PASS: TestPerfReaderWakeupEvents (0.01s)
=== RUN   TestPerfReaderWakeupEvents
ret 7
ret 0
panic: test timed out after 1s

The numbers change from run to run, and it seems to be pure luck that the +1 I mentioned earlier happens to land on the same CPU as one of the ones before.
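To illustrate the effect, here is a minimal simulation of per-ring counting. It is only a sketch of the behavior described above, not the kernel's implementation: the function name wokenRings, the 8 CPUs and the threshold of 2 are assumptions chosen for the example.

```go
package main

import "fmt"

// wokenRings models one event counter per CPU ring (an assumption based on
// the description above, not on kernel code). cpus lists the CPU each sample
// landed on, in order. It returns the CPUs whose ring reached wakeupEvents.
func wokenRings(numCPUs, wakeupEvents int, cpus []int) []int {
	counts := make([]int, numCPUs)
	var woken []int
	for _, cpu := range cpus {
		counts[cpu]++
		if counts[cpu] == wakeupEvents {
			woken = append(woken, cpu)
		}
	}
	return woken
}

func main() {
	// Both samples land on CPU 7: that ring reaches the threshold of 2.
	fmt.Println(wokenRings(8, 2, []int{7, 7})) // prints [7]
	// Samples split across CPU 7 and CPU 0: no ring reaches the threshold.
	fmt.Println(wokenRings(8, 2, []int{7, 0})) // prints []
}
```

The second call corresponds to the "ret 7 / ret 0" run in the log: two events were sent in total, but no single ring saw two of them, so the reader is never woken and the test times out.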

A potential fix would be to add the following to the start of the test:

import extUnix "golang.org/x/sys/unix"

...

func TestPerfReaderWakeupEvents(t *testing.T) {
	// Lock goroutine to thread
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Save CPU affinity
	var set extUnix.CPUSet
	err := extUnix.SchedGetaffinity(0, &set)
	qt.Assert(t, qt.IsNil(err))
	// Pin the test to CPU 0 only (CPUSet{1} sets bit 0 of the mask, i.e. CPU 0)
	err = extUnix.SchedSetaffinity(0, &extUnix.CPUSet{1})
	qt.Assert(t, qt.IsNil(err))
	// Restore CPU affinity
	defer extUnix.SchedSetaffinity(0, &set)

Perhaps there are other alternatives (this doesn't win any beauty awards).

brycekahle commented on July 22, 2024

Could we send numCPUs * WakeupEvents events to ensure that at least one CPU gets woken up?

dylandreimerink commented on July 22, 2024

Yeah, that should also work, but I don't know if that defeats the purpose of the test. In my case you would be enqueueing 16 events to test a 2-event limit.
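For reference, the worst case can be worked out with the pigeonhole principle. The sketch below assumes each per-CPU ring counts its own events independently, as described earlier in the thread. The helper name minEventsToGuaranteeWakeup and the 8-CPU figure (implied by 16 events for a 2-event limit) are assumptions made for the example.

```go
package main

import "fmt"

// minEventsToGuaranteeWakeup returns the smallest number of events that
// forces at least one of numCPUs rings to hold wakeupEvents events, no matter
// how the events are spread across CPUs (pigeonhole principle).
// It assumes numCPUs >= 1 and wakeupEvents >= 1.
func minEventsToGuaranteeWakeup(numCPUs, wakeupEvents int) int {
	return numCPUs*(wakeupEvents-1) + 1
}

func main() {
	numCPUs, wakeupEvents := 8, 2
	// Tight bound: one event fewer could leave every ring below the limit.
	fmt.Println(minEventsToGuaranteeWakeup(numCPUs, wakeupEvents)) // prints 9
	// The simpler numCPUs * WakeupEvents bound is also sufficient.
	fmt.Println(numCPUs * wakeupEvents) // prints 16
}
```

Either bound guarantees a wakeup under that assumption. Neither verifies that the wakeup happens after exactly WakeupEvents events on a given ring.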

brycekahle commented on July 22, 2024

The test was more for making sure it didn't wake up after 1 event.

brycekahle commented on July 22, 2024

I'm not sure we can control the CPU the eBPF program actually runs on by controlling the affinity of the userspace program.

dylandreimerink commented on July 22, 2024

> I'm not sure we can control the CPU the eBPF program actually runs on by controlling the affinity of the userspace program.

I tested the code I showed and it seems to work, at least locally. By default the BPF program executes on the CPU making the syscall, although that isn't official, so it's not guaranteed.

Program.Run also has a parameter to pick a CPU to run on, but looking at the kernel, it only works for raw tracepoint programs. If we can change the program type for our sample prog, that might be an option (torvalds/linux@1b4d60e).

brycekahle commented on July 22, 2024

> it only works for raw tracepoint programs

That would constrain what kernel versions we can test on though.

lmb commented on July 22, 2024

I'd be fine with both solutions. I remember that we have the same problem (samples submitted on the "wrong" CPU) in other places as well. Maybe we could reuse the user space code.

I think it's also fine to constrain this to a smaller number of kernel versions: we're testing that the plumbing we have ~ works. We don't need to / want to assert that the kernel isn't doing dodgy things (as we'd never see the end of it 😆 ).
