
linux-kernel-programming's Issues

Ch 10 Question 8

Hello Kaiwan, I was confused by Ch 10 Question 8. Based on my reading of the book (p. 514), there can be any number of runqueues, and as far as I could find, waitqueues were not discussed in the chapter.

The question and listed answer are:

8. On a modern Linux system with 4 CPU cores, there will be ___ runqueue(s) for
SMP scalability; how many waitqueues can it have: __
 1. 1, 4
 2. 4, any number
 3. 4, 1
 4. any number of, any number

Answer: option 2

Having trouble booting the 5.4 kernel on Ubuntu Jammy 22.04

Hello Kaiwan.

I've been using your book to revamp my rusty Linux kernel programming knowledge. I had been out of the loop since the 3.x days.
I'm on Ubuntu 22.04 (Jammy), running kernel version 5.15. I cloned the v5.4 tag of the upstream stock kernel with git and followed the steps in your book. First, I ran into the following problem and fixed it as described here:

arch/x86/entry/thunk_64.o: warning: objtool: missing symbol table
make[2]: *** [scripts/Makefile.build:357: arch/x86/entry/thunk_64.o] Error 1
make[2]: *** Deleting file 'arch/x86/entry/thunk_64.o'
make[1]: *** [scripts/Makefile.build:509: arch/x86/entry] Error 2
make[1]: *** Waiting for unfinished jobs....

Then I ran into the following problem and fixed it as described here:

LD  arch/x86/boot/compressed/vmlinux
ld: arch/x86/boot/compressed/pgtable_64.o:(.bss+0x0): multiple definition of `__force_order';
arch/x86/boot/compressed/kaslr_64.o:(.bss+0x0): first defined here

I was finally able to build and install the modules and the kernel. But then I ran into this:

[screenshot]
After some googling, I resorted to manually editing the /etc/initramfs-tools/initramfs.conf file to replace the ZSTD compression algorithm with gzip. Now the initramfs gets decompressed, but it drops to the initramfs prompt since it cannot boot into the rootfs. As the picture below shows, the initramfs was not able to find where the rootfs is located:

[screenshot]

It feels like I'm going in circles just getting this thing up and running. Do you have any suggestions?

Fix for 5.7+ Linux kernel and example code for ch13 percpu_var.c

Hi Kaiwan,

I was working through the ch13 "percpu_var" example module and noticed that, since the 5.7 kernel, kallsyms_lookup_name() is no longer exported (I'm using the latest stable kernel, 6.4.9, for all the examples in the book). Anyway, I came up with a workaround that solves the issue (it's kind of hacky), but thought you might get a kick out of it. :-)

    /* HACK: we can't easily use kallsyms_lookup_name() on 5.7+ Linux kernels,
     * since this function no longer has EXPORT_SYMBOL() support, so an
     * alternate workaround was needed (using a module parameter seemed to be
     * the quickest solution).
     */
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 7, 0)
    static unsigned long kallsyms_sched_setaffinity;
    module_param(kallsyms_sched_setaffinity, ulong, 0660);
    MODULE_PARM_DESC(kallsyms_sched_setaffinity,
        "On 5.7+ Linux kernels, kallsyms_lookup_name() can't be used to obtain a"
        " pointer to sched_setaffinity(), so pass its address in via this module"
        " parameter when loading the module:\n"
        " $ sudo insmod percpu_var.ko kallsyms_sched_setaffinity=0x$(sudo grep -w 'sched_setaffinity$' /proc/kallsyms | cut -d' ' -f1)\n");
    #endif

Then, in the module init function init_percpu_var(), I added a kernel version check:

    /* The following line won't work on 5.7+ kernels, since this function no
     * longer has EXPORT_SYMBOL() support:
     *      ptr_sched_setaffinity = (void *)kallsyms_lookup_name("sched_setaffinity");
     * A possible workaround is in the code below; see the module parameter at
     * the beginning of this file (or run 'modinfo percpu_var.ko') for its use.
     */
    ret = -ENOSYS;

    #if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 7, 0)
    ptr_sched_setaffinity = (void *)kallsyms_sched_setaffinity;
    #else
    ptr_sched_setaffinity = (void *)kallsyms_lookup_name("sched_setaffinity");
    #endif
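For illustration, here is a minimal userspace C sketch of the lookup that the grep/cut pipeline in the parameter description performs: it scans /proc/kallsyms for an exact symbol name and prints the address as 0x..., a form the ulong module parameter should accept. The helper names (kallsyms_line_match(), kallsyms_find(), print_kallsyms_addr()) are hypothetical, and the sketch assumes the usual "<address> <type> <name> [module]" line format. Run it as root, since /proc/kallsyms typically shows zeroed addresses to unprivileged users (subject to kernel.kptr_restrict).

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/kallsyms line ("<hex addr> <type> <name> [\t[module]]").
 * Returns 1 and stores the address in *addr if <name> equals sym exactly. */
int kallsyms_line_match(const char *line, const char *sym, unsigned long *addr)
{
    unsigned long a;
    char name[128];

    /* %*c skips the one-letter symbol type without storing it */
    if (sscanf(line, "%lx %*c %127s", &a, name) != 2)
        return 0;
    if (strcmp(name, sym) != 0)   /* require an exact name match */
        return 0;
    *addr = a;
    return 1;
}

/* Scan a kallsyms-format stream; returns 1 on the first exact match. */
int kallsyms_find(FILE *fp, const char *sym, unsigned long *addr)
{
    char line[512];

    while (fgets(line, sizeof(line), fp))
        if (kallsyms_line_match(line, sym, addr))
            return 1;
    return 0;
}

/* Print "0x<addr>" for sym from the live /proc/kallsyms; 0 on success. */
int print_kallsyms_addr(const char *sym)
{
    unsigned long addr = 0;
    FILE *fp = fopen("/proc/kallsyms", "r");

    if (!fp) {
        perror("fopen /proc/kallsyms");
        return -1;
    }
    if (!kallsyms_find(fp, sym, &addr)) {
        fprintf(stderr, "%s: not found\n", sym);
        fclose(fp);
        return -1;
    }
    fclose(fp);
    if (addr == 0)   /* addresses are usually hidden from non-root readers */
        fprintf(stderr, "warning: address is 0; run as root?\n");
    printf("0x%lx\n", addr);
    return 0;
}
```

With a small main() that calls print_kallsyms_addr("sched_setaffinity") built into a hypothetical ./ksym binary, loading would look like `sudo insmod percpu_var.ko kallsyms_sched_setaffinity=$(sudo ./ksym)`; warnings go to stderr, so only the address is captured.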

With these changes, I'm able to run the module as expected and get the dmesg output below (I did alter the tx/rx increment values a bit, just so I could see the changes more easily):

dmesg of insmod (compared to the screenshot in the book, the output on my 3-CPU virtual machine seems to be interleaved :-) ):
[ 390.116111] percpu_var:init_percpu_var(): inserted
[ 390.116584] percpu_var:thrd_work(): *** kthread PID 11503 on cpu 0 now ***
[ 390.116868] percpu_var:thrd_work(): thrd_0/cpu0: pcpa = +1
[ 390.117101] percpu_var:thrd_work(): thrd_0/cpu0: pcp ctx: tx = 100, rx = 10
[ 390.117408] percpu_var:thrd_work(): thrd_0/cpu0: pcpa = +2
[ 390.117556] percpu_var:thrd_work(): *** kthread PID 11504 on cpu 1 now ***
[ 390.117653] percpu_var:thrd_work(): thrd_0/cpu0: pcp ctx: tx = 200, rx = 20
[ 390.117660] percpu_var:thrd_work(): thrd_0/cpu0: pcpa = +3
[ 390.117664] percpu_var:thrd_work(): thrd_0/cpu0: pcp ctx: tx = 300, rx = 30
[ 390.117824] percpu_var:thrd_work(): thrd_1/cpu1: pcpa = -1
[ 390.117828] percpu_var:disp_vars(): 000) [thrd_0/0]:11503 | .N.0 /* disp_vars() */
[ 390.118002] percpu_var:thrd_work(): thrd_1/cpu1: pcp ctx: tx = 20, rx = 200
[ 390.118007] percpu_var:disp_vars(): cpu 0: pcpa = +3, rx = 30, tx = 300
[ 390.118138] percpu_var:thrd_work(): thrd_1/cpu1: pcpa = -2
[ 390.118140] percpu_var:thrd_work(): thrd_1/cpu1: pcp ctx: tx = 40, rx = 400
[ 390.118143] percpu_var:thrd_work(): thrd_1/cpu1: pcpa = -3
[ 390.118432] percpu_var:disp_vars(): cpu 1: pcpa = -3, rx = 400, tx = 40
[ 390.118440] percpu_var:disp_vars(): cpu 2: pcpa = +0, rx = 0, tx = 0
[ 390.118446] percpu_var:thrd_work(): Our kernel thread #0 exiting now...
[ 390.120060] percpu_var:thrd_work(): thrd_1/cpu1: pcp ctx: tx = 60, rx = 600
[ 390.120238] percpu_var:disp_vars(): 001) [thrd_1/1]:11504 | .N.0 /* disp_vars() */
[ 390.120427] percpu_var:disp_vars(): cpu 0: pcpa = +3, rx = 30, tx = 300
[ 390.120597] percpu_var:disp_vars(): cpu 1: pcpa = -3, rx = 600, tx = 60
[ 390.120767] percpu_var:disp_vars(): cpu 2: pcpa = +0, rx = 0, tx = 0
[ 390.120937] percpu_var:thrd_work(): Our kernel thread #1 exiting now...

dmesg of rmmod:
[ 501.720553] percpu_var:exit_percpu_var(): kthread #0 stopped
[ 501.720703] percpu_var:exit_percpu_var(): kthread #1 stopped
[ 501.720839] percpu_var:disp_vars(): 001) rmmod :14779 | ...0 /* disp_vars() */
[ 501.721023] percpu_var:disp_vars(): cpu 0: pcpa = +3, rx = 30, tx = 300
[ 501.721194] percpu_var:disp_vars(): cpu 1: pcpa = -3, rx = 600, tx = 60
[ 501.721661] percpu_var:disp_vars(): cpu 2: pcpa = +0, rx = 0, tx = 0
[ 501.721862] percpu_var:exit_percpu_var(): removed, bye

One other thing I noticed when compiling was a stack frame size warning from the compiler (both before and after the changes above):
warning: the frame size of 1232 bytes is larger than 1024 bytes [-Wframe-larger-than=]

I noticed that if I commented out the following lines in the function set_cpuaffinity(), the warning went away:

    // cpumask_clear(&mask);
    // cpumask_set_cpu(cpu, &mask); // 1st param is the CPU number, not bitmask
    /* !HACK! sched_setaffinity() is NOT exported, we can't call it
     *   sched_setaffinity(0, &mask); // 0 => on self
     * so we invoke it via its function pointer
     */
    // ret = (*ptr_sched_setaffinity)(0, &mask); // 0 => on self

Is the quick explanation that this struct (the cpumask) is so large that, when it's placed in the local stack frame, the frame exceeds the 1024-byte limit the compiler warns about?
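For a rough sense of scale: a struct cpumask is a bitmap with one bit per possible CPU, so its size is fixed at build time by CONFIG_NR_CPUS. The userspace sketch below mirrors that layout with a hypothetical struct sketch_cpumask, assuming CONFIG_NR_CPUS=8192 (a common value in distro x86_64 configs; check your own .config). Under that assumption the mask alone is 1024 bytes, so one on-stack mask plus the function's other locals could plausibly reach a 1232-byte frame. The usual in-kernel way around this is cpumask_var_t with alloc_cpumask_var()/free_cpumask_var(), which moves the mask off the stack when CONFIG_CPUMASK_OFFSTACK=y.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: CONFIG_NR_CPUS=8192; adjust to match your kernel's .config. */
#define SKETCH_NR_CPUS 8192u

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
    (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Userspace stand-in for struct cpumask: a bitmap with one bit per CPU. */
struct sketch_cpumask {
    unsigned long bits[SKETCH_BITS_TO_LONGS(SKETCH_NR_CPUS)];
};

/* Bytes needed for a cpumask sized for nr_cpus possible CPUs. */
size_t cpumask_bytes_for(unsigned int nr_cpus)
{
    return SKETCH_BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
}
```

With these definitions, sizeof(struct sketch_cpumask) and cpumask_bytes_for(8192) are both 1024, while a 64-CPU mask would need only 8 bytes.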

Thanks,

Brian W.

P.S. I'm definitely going to give this book an excellent write-up on Amazon. :-)

Possible memory leak in ch12 mutex/spinlock examples

Hi Kaiwan,

While working through the chapter 12 example code (both the mutex and spinlock C files), I noticed the following comment and code in the miscdrv_init_* functions, which appear (I may be wrong) to leak memory, since the allocation is never freed on module exit:

    /*
     * A 'managed' kzalloc(): use the 'devres' API devm_kzalloc() for mem
     * alloc; why? as the underlying kernel devres framework will take care of
     * freeing the memory automatically upon driver 'detach' or when the driver
     * is unloaded from memory
     */
    ctx = kzalloc(sizeof(struct drv_ctx), GFP_KERNEL);

Then in the exit function I didn't see a kfree() call:

    static void __exit miscdrv_exit_mutexlock(void)
    {
        mutex_destroy(&lock1);
        mutex_destroy(&ctx->lock);
        misc_deregister(&llkd_miscdev);
        pr_info("LLKD misc driver %s deregistered, bye\n", llkd_miscdev.name);
    }

Should the initial allocation look like this instead, so we don't need to free the memory ourselves (letting the kernel handle deallocation)?

    ctx = devm_kzalloc(llkd_miscdev.this_device, sizeof(struct drv_ctx), GFP_KERNEL);

Or maybe the kfree()/kzfree() call in the exit function was just left out? :-)
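To make the 'managed' idea in that comment concrete, here is a small userspace model of what devres does: each managed allocation is recorded against its owning device and released in one sweep, newest first, when that device's resources are torn down. That is why a devm_kzalloc()'d ctx would need no kfree() in the exit path. Everything here (struct sketch_device, sketch_devm_zalloc(), sketch_device_detach()) is hypothetical and far simpler than the kernel's devres framework; in the kernel the call is devm_kzalloc(dev, size, gfp), as in the line above.

```c
#include <stdlib.h>

/* Hypothetical model only: this is not the kernel's devres API. */
struct sketch_devres {
    struct sketch_devres *next;   /* older managed allocation */
    void *mem;                    /* the managed memory itself */
};

struct sketch_device {
    struct sketch_devres *res;    /* newest managed allocation first */
    size_t nr_live;               /* managed allocations not yet released */
};

/* Zeroed allocation tied to dev; released by sketch_device_detach(). */
void *sketch_devm_zalloc(struct sketch_device *dev, size_t size)
{
    struct sketch_devres *dr = malloc(sizeof(*dr));

    if (!dr)
        return NULL;
    dr->mem = calloc(1, size);
    if (!dr->mem) {
        free(dr);
        return NULL;
    }
    dr->next = dev->res;
    dev->res = dr;
    dev->nr_live++;
    return dr->mem;
}

/* Release every managed allocation, newest first; callers never free(). */
void sketch_device_detach(struct sketch_device *dev)
{
    while (dev->res) {
        struct sketch_devres *dr = dev->res;

        dev->res = dr->next;
        free(dr->mem);
        free(dr);
        dev->nr_live--;
    }
}
```

The design point the model shows is that the free is driven by the device's lifetime rather than by an explicit call in the driver's exit function.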

This has been one of the best kernel books I've read to date; I'm looking forward to finishing the last chapter and moving on to book 2.

Brian W.

Book Translation

Hi, Kaiwan,
I am a kernel developer and am really interested in this book. Has Linux-Kernel-Programming been translated into Chinese and published in China (I haven't seen it)? If not, is there any plan? I would like to try translating it, if possible.
Good luck!!!

ch6: viewing userspace stack for bash using gdb

Hello @kaiwan

I was reading the ch6 section entitled "Viewing the user space stack of a given thread or process", in which you mention that the pstack utility on Ubuntu doesn't work as expected and that it's better to use gdb to view the user-space backtrace. So I attempted this exercise, but I am unable to make it work.

I'll describe my steps here:

  1. Attach gdb to the running kernel (loaded into memory) using the /proc/kcore file, like so:
    cd $LinuxSrcDir
    sudo gdb vmlinux /proc/kcore
  2. Attach to the thread running bash (in my case, TID 1283), like so:
    (gdb) attach 1283

After a warning about a 'build ID' mismatch between vmlinux and /usr/bin/bash, it gives me this:

[screenshot]

Issuing bt gives me this:
[screenshot]

I can understand the missing function names for the user-space portion of the process. However, the kernel-space portion of that process should have shown something, right? Reading the procfs entry for TID 1283, however, makes a lot more sense:
[screenshot]

Am I doing this incorrectly, or is this to be expected? By the way, this kernel was compiled with the -ggdb flag passed to the top-level Makefile via KCFLAGS, so I know the symbols are intact.

Some typos in this book.

Hi @kaiwan, Thanks for writing this excellent book.
After reading this book thoroughly, I found some typos that you may want to correct.

  1. On page 155, a double 'w' is used in the line "// ch4/helloworld_lkm/hellowworld_lkm.c" (marked with a red line below).
    [screenshot]

  2. On page 307, the line "first and second columns in the preceding code block that represent the TGID and PID respectively" should, judging from the figure below, read "first and second columns in the preceding code block that represent the PID and TGID respectively".
    [screenshot]

  3. On page 656, the last line reads "In place of atomic64_dec_if_positive(), use
    atomic64_dec_if_positive()." It should probably be "In place of atomic_dec_if_positive(), use
    atomic64_dec_if_positive()."
    [screenshot]

Thanks
Hengwei

Not all recommended packages available in Ubuntu 18.04 LTS repos

Running the following command from Chapter 2, Kernel Workspace Setup, I get some errors that I thought I would report:

$ sudo apt install git fakeroot build-essential tar ncurses-dev tar xz-utils libssl-dev bc stress python3-distutils libelf-dev linux-headers-$(uname -r) bison flex libncurses5-dev util-linux net-tools linux-tools-$(uname -r) exuberant-ctags cscope sysfsutils gnome-system-monitor curl perf-tools-unstable gnuplot rt-tests indent tree pstree smem libnuma-dev numactl hwloc bpfcc-tools sparse flawfinder cppcheck tuna hexdump openjdk-14-jre trace-cmd virt-what
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Note, selecting 'libncurses5-dev' instead of 'ncurses-dev'

E: Unable to locate package pstree
E: Unable to locate package tuna
E: Unable to locate package hexdump
E: Unable to locate package openjdk-14-jre

I'm using the 64-bit Ubuntu 18.04.6 release, as recommended.

Missing sample code for Part 2

I cloned this repo, but it looks like it doesn't contain the sample code for Part 2, only the e-book. Please check and update it.
Thanks.

Possible free_pages() issue in sample code (book text example code is correct)

Hi Kaiwan,

I was just working on the example code for chapter 8's lowlevel_mem.c and noticed that line 126 makes an order-5 allocation (2^5 = 32 pages) for the gptr5 pointer:

    gptr5 = page_address(alloc_pages(GFP_KERNEL, 5));

But the exit function appears to free it with order 3 (2^3 = 8 pages):

    free_pages((unsigned long)gptr5, 3);

I've confirmed that the code snippet shown in the current book is correct (it uses order 3 for both the allocation and the free). Maybe the name gptr'5' caused '5' to be used instead of '3' for the allocation in the sample file. :-)
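The second argument to alloc_pages() and free_pages() is an allocation order, not a page count: an order-n request covers 2^n physically contiguous pages, and the same order should be passed back when freeing. The small userspace C sketch below (hypothetical helpers pages_for_order() and pages_not_covered()) only does that arithmetic for the mismatch described above: an order-5 block is 32 pages, and an order-3 free covers just 8 of them, leaving 24 pages the free does not account for.

```c
#include <stddef.h>

/* Pages in an order-'order' buddy allocation: 2^order. */
size_t pages_for_order(unsigned int order)
{
    return (size_t)1 << order;
}

/* Pages of an order-'alloc_order' block that a free at 'free_order' would
 * not cover. Only meaningful when free_order <= alloc_order; returns 0
 * otherwise (freeing with a larger order than allocated is a different bug). */
size_t pages_not_covered(unsigned int alloc_order, unsigned int free_order)
{
    if (free_order >= alloc_order)
        return 0;
    return pages_for_order(alloc_order) - pages_for_order(free_order);
}
```

Keeping the order in a single place, for example a hypothetical GPTR5_ORDER macro used by both the allocation and the free_pages() call, would make this kind of mismatch harder to introduce.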

I have to say yours is one of the best books I've read on kernel development (I also loved your kernel debugging book); I can't wait to read the updated version of this book when it's released in January 2024.

All the best,

Brian
