mikaku / fiwix

A UNIX-like kernel for the i386 architecture

Home Page: https://www.fiwix.org

License: Other

Makefile 0.52% C 98.16% Assembly 1.30% Nix 0.01%

kernel operating-system c posix unix-like i386 os

fiwix's Issues

man page segfaults

Displaying any man page with man segfaults. The cause is not yet known.

[EDIT: Running mandocdb /usr/share/man produces the same result].

Passing a long argument to execve crashes the kernel

The following test program should crash Fiwix:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

char longarg[128 * 4096 + 1];

int main() {
        char *argv[] = { "cat", NULL, NULL };
        char *envp[] = { NULL };
        memset(longarg, 'a', 128 * 4096);
        argv[1] = longarg;
        int err = execve("cat", argv, envp);
        printf("errno is %d\n", errno);
        printf("err is %d\n", err);
}

While there is argument and environment length checking in Fiwix, the checking is done too late to prevent the problem.

The GNU autoconf tool can perform checks that try to determine the maximum argument length that can be passed to a program, which triggers this crash.

A PR with a suggested fix is forthcoming.

Missing return value from read_inode

Commit e48c263 removed the return statement from read_inode in fs/inode.c.

Please revise to this:

static int read_inode(struct inode *i)
{
        int errno;

        inode_lock(i);
        errno = i->sb->fsop->read_inode(i);
        inode_unlock(i);
        return errno;
}

Incorrect passing of e820 memory map to Linux kexec guests

The following 2 excerpts are from the same live-bootstrap qemu session.

Fiwix reported the memory map as follows:

memory    0x0000000000000000-0x000000000009fbff available
          0x000000000009fc00-0x000000000009ffff reserved
          0x00000000000f0000-0x00000000000fffff reserved
          0x0000000000100000-0x00000000bffdffff available
          0x00000000bffe0000-0x00000000bfffffff reserved
          0x00000000feffc000-0x00000000feffffff reserved
          0x00000000fffc0000-0x00000000ffffffff reserved
          0x0000000100000000-0x000000013fffffff available

Then, after kexec, Linux reports:

[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009efff] reserved
[    0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009fbfe] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009fffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000ffffe] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdfffe] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bffffffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000fefffffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000fffffffe] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013ffffffe] usable

Two oddities are visible:

- Every memory region reported by Fiwix ends one byte short when Linux reports it again.
- An additional reserved range appears at 9d000-9efff. For some reason, this block does not have the off-by-one ending seen in the other blocks.

Recent floating point math not compiling with tcc

Fiwix doesn't compile with tcc now, ending with this error message:

tcc: error: undefined symbol '__fixdfdi'
make: *** [Makefile:70: all] Error 1

This is a result of adding floating point math recently using PAGE_HASH_PERCENTAGE.
The floating point math can be avoided with a simple change like I've pasted below. It's not the greatest patch so if you have any suggestions let me know. This could also be changed with #ifdef __TINYC__. If you have a preference and/or would like me to submit a PR please let me know.

diff --git a/include/fiwix/config.h b/include/fiwix/config.h
index 9c55834..ff87afb 100644
--- a/include/fiwix/config.h
+++ b/include/fiwix/config.h
@@ -16,7 +16,7 @@
 #define NR_FLOCKS              (NR_PROCS * 5)  /* max. number of flocks */

 #define FREE_PAGES_RATIO       5       /* % minimum of free memory pages */
-#define PAGE_HASH_PERCENTAGE   0.1     /* % of hash buckets relative to the
+#define PAGE_HASH_PER_10K       10     /* % of % of hash buckets relative to the
                                           number of physical pages */
 #define BUFFER_PERCENTAGE      100     /* % of memory for buffer cache */
 #define BUFFER_HASH_PERCENTAGE 10      /* % of hash buckets relative to the
diff --git a/mm/memory.c b/mm/memory.c
index 2addf61..56a3a00 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -450,7 +450,7 @@ void mem_init(void)
 #endif /* CONFIG_KEXEC */

        /* the last one must be the page_table structure */
-       n = (kstat.physical_pages * PAGE_HASH_PERCENTAGE) / 100;
+       n = (kstat.physical_pages * PAGE_HASH_PER_10K) / 10000;
        n = MAX(n, 1);  /* 1 page for the hash table as minimum */
        n = MIN(n, 16); /* 16 pages for the hash table as maximum */
        page_hash_table_size = n * PAGE_SIZE;

Memory mapped files not written to disk correctly

The contents of memory mapped files created using mmap are not written correctly.

The following test program (adapted from https://stackoverflow.com/questions/26259421/use-mmap-in-c-to-write-into-memory) can reproduce the problem:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h> /* mmap() is defined in this header */
#include <fcntl.h>
#include <string.h>

void err_quit(char *msg)
{
    puts(msg);
    exit(1);
}
int main (int argc, char *argv[])
{
 int fdin, fdout;
 char *src, *dst;
 struct stat statbuf;
 int mode = 0777; /* octal permission bits */

 if (argc != 3)
   err_quit ("usage: a.out <fromfile> <tofile>");

 /* open the input file */
 if ((fdin = open (argv[1], O_RDONLY)) < 0)
   {printf("can't open %s for reading", argv[1]);
    return 0;
   }

 /* open/create the output file */
 if ((fdout = open (argv[2], O_RDWR | O_CREAT | O_TRUNC, mode )) < 0)//edited here
   {printf ("can't create %s for writing", argv[2]);
    return 0;
   }

 /* find size of input file */
 if (fstat (fdin,&statbuf) < 0)
   {printf ("fstat error");
    return 0;
   }

 /* go to the location corresponding to the last byte */
 if (lseek (fdout, statbuf.st_size - 1, SEEK_SET) == -1)
   {printf ("lseek error");
    return 0;
   }

 /* write a dummy byte at the last location */
 if (write (fdout, "", 1) != 1)
   {printf ("write error");
     return 0;
   }

 /* mmap the input file */
 if ((src = mmap (0, statbuf.st_size, PROT_READ, MAP_SHARED, fdin, 0))
   == (caddr_t) -1)
   {printf ("mmap error for input");
    return 0;
   }

 /* mmap the output file */
 if ((dst = mmap (0, statbuf.st_size, PROT_READ | PROT_WRITE,
   MAP_SHARED, fdout, 0)) == (caddr_t) -1)
   {printf ("mmap error for output");
    return 0;
   }

 /* this copies the input file to the output file */
 memcpy (dst, src, statbuf.st_size);
 return 0;

} /* main */

The following commands will run the test.

cc mmaptest.c -o mmaptest
dd if=/dev/zero of=testin bs=1 count=10037
./mmaptest testin testout
diff testin testout

The result should be that testin and testout are the same.
Instead testout is mostly filled with "random" data.

The root cause of the problem is this line of code:

Fiwix/mm/mmap.c

Line 288 in abc14a0

write_page(pg, vma->inode, addr, length);

The code above writes the full length of the file from each and every page. So, even though a page is at most 4096 bytes, the write_page routine is trying to copy data from each page far past the end of the page boundary. This puts random data into the file.

A PR with a suggested fix is forthcoming.

Kexec implementation

Fiwix kexec implementation

Kexec is a mechanism that lets you switch to a different kernel without rebooting your machine. It differs from the Linux kexec implementation because it needs neither a system call nor specific user-space tools, and your current system will shut down completely before jumping to the new kernel.

The new kernel can be another Fiwix kernel or any other ELF binary kernel. It currently supports only the Multiboot 1 Specification for booting the new kernel, but support for other boot methods may be added.

How it works

Your system needs to know at boot-time that it might switch to a different kernel. That is, you need to pass some special parameters in your kernel command line during the system boot. The following are the three parameters that you need to specify:

kexec_proto=
This is the boot method of the new kernel. Currently the only supported value is multiboot1.
kexec_size=
Size in KB of the memory space to be reserved to allocate the new kernel.
kexec_cmdline=
Command line to be passed to the new kernel (enclosed in double quotes).

Example of a kernel command line:

/boot/fiwix ro root=/dev/hda2 kexec_proto=multiboot1 kexec_size=500 kexec_cmdline="fiwix ro root=/dev/hda2"

The RAMdisk drives play an important role here. You already know that they can be used to allocate an initrd image specified during the boot, or to allocate one or more all-purpose RAMdisk drives and, of course, they are also used for kexec to allocate the new kernel.

Kexec always uses the first unused RAMdisk drive. Which drive that is depends on whether you specified an initrd image, all-purpose RAMdisk drives, or both at boot-time.

Fiwix is configured by default to have the following possible RAMdisk drives layout:

If an initrd image was specified at boot-time then the first RAMdisk drive is used for it (i.e. /dev/ram0).
If ramdisksize= was specified at boot-time then the first unused RAMDISK_DRIVES RAMdisk drives will be used.
If kexec_size= was specified at boot-time then the first unused RAMdisk drive will be used.

Let's see some examples to better understand these rules:

Example 1:

Kernel cmdline: ... ramdisksize=49152 initrd=/initrd.img
Resulting RAMdisk drives layout:
- /dev/ram0 -> initrd RAMdisk drive of 48MB (/ filesystem).
- if RAMDISK_DRIVES == 1
  - /dev/ram1 -> all-purpose RAMdisk drive of 48MB.
- if RAMDISK_DRIVES == 2
  - /dev/ram1 -> all-purpose RAMdisk drive of 48MB.
  - /dev/ram2 -> all-purpose RAMdisk drive of 48MB.
    ...

Example 2:

Kernel cmdline: ... ramdisksize=16384
Resulting RAMdisk drives layout:
- if RAMDISK_DRIVES == 1
  - /dev/ram0 -> all-purpose RAMdisk drive of 16MB.
- if RAMDISK_DRIVES == 2
  - /dev/ram0 -> all-purpose RAMdisk drive of 16MB.
  - /dev/ram1 -> all-purpose RAMdisk drive of 16MB.
    ...

Example 3:

Kernel cmdline: ... ramdisksize=16384 kexec_size=500 ...
Resulting RAMdisk drives layout:
- if RAMDISK_DRIVES == 1
  - /dev/ram0 -> all-purpose RAMdisk drive of 16MB.
  - /dev/ram1 -> kexec RAMdisk drive of 500KB.
- if RAMDISK_DRIVES == 2
  - /dev/ram0 -> all-purpose RAMdisk drive of 16MB.
  - /dev/ram1 -> all-purpose RAMdisk drive of 16MB.
  - /dev/ram2 -> kexec RAMdisk drive of 500KB.
    ...

Once you know which RAMdisk drive will be used to allocate the new kernel, you can copy the ELF binary kernel into that RAMdisk drive by using any user-space tool like cp or dd.

Finally, when you are ready, just do a normal shutdown to switch automatically to the new kernel.

Example:

# cp fiwix /dev/ram1
# shutdown -h 0

Boot with ram drive not working

Booting with the initrd.img ram drive from media_setup does not work. The screen output after the startup messages is:

PANIC: in init_init()
can't find /sbin/init

Fiwix version: current git code a542c56

How to reproduce:

qemu-system-x86_64 \
    -kernel fiwix \
    -initrd initrd.img \
    -append "root=/dev/ram0 ramdisksize=1024 initrd=initrd.img"

Qemu places the initrd image right after the kernel, as a multiboot module. The problem is that get_last_boot_addr is not accounting for the initrd module. Well it is, but then some newer code from April 29th can move the last address back down to right after the kernel, just below the initrd module. Then, kernel stack and data structures will overwrite the initrd image.

If I move that code above the multiboot module adjustments, the kernel boots ok:

--- a/kernel/multiboot.c
+++ b/kernel/multiboot.c
@@ -177,6 +177,11 @@ unsigned int get_last_boot_addr(unsigned int info)

        addr = strtab->sh_addr + strtab->sh_size;

+       /* no ELF header tables */
+       if(!(mbi->flags & MULTIBOOT_INFO_ELF_SHDR)) {
+               addr = (unsigned int)_end + PAGE_SIZE;
+       }
+
        /*
         * https://www.gnu.org/software/grub/manual/multiboot/multiboot.html
         *
@@ -190,11 +195,6 @@ unsigned int get_last_boot_addr(unsigned int info)
                }
        }

-       /* no ELF header tables */
-       if(!(mbi->flags & MULTIBOOT_INFO_ELF_SHDR)) {
-               addr = (unsigned int)_end + PAGE_SIZE;
-       }
-
        return P2V(addr);
 }

Support syscall chown32

This syscall is needed for musl support because musl defaults to using this version of this system call rather than SYS_chown.
This syscall supports 32-bit user and group ids (versus 16-bit for the SYS_chown syscall).

Test program is attached. Note that this requires super user permissions to change the file ownership.

# cc -m32 testchown32.c -o testchown32
# touch testfile
# ./testchown32
stat64: uid is 80000
stat64: gid is 80001
# rm testfile

testchown32.c.gz

Support syscalls ftruncate64, stat64, lstat64, fstat64, and fcntl64

These syscalls are needed for musl support because musl defaults to using 64-bit versions of these system calls.

The definition of the stat64 structure is derived from here:
https://git.musl-libc.org/cgit/musl/tree/arch/i386/bits/stat.h

The size of underlying types are determined from here:
https://git.musl-libc.org/cgit/musl/tree/include/alltypes.h.in
https://git.musl-libc.org/cgit/musl/tree/arch/i386/bits/alltypes.h.in

Test programs are attached. The first test program uses ftruncate64 to truncate a file to 100 bytes and then calls the three new stat calls to get the size. The second program calls fcntl64 to set close-on-exec for a file handle, then spawns checkfd to verify that the handle is closed.

This is the expected output:

[(root) ~]# tar xf test64.tar.gz
[(root) ~]# ./runtest.sh
1000+0 records in
1000+0 records out
1000 bytes (1.0 kB) copied, 0.003208 s, 312 kB/s
stat64: file size is 100
lstat64: file size is 100
fstat64: file size is 100
-rw-r--r-- 1 root root 100 Nov 28 16:48 testfile.bin
myfile = 3
mydupfd = 4
mycheck result = -1

test64.tar.gz

Allow removing directory in use

Linux allows removing a directory that is the current directory for a running process:

$ mkdir testdir
$ cd testdir
$ rmdir ../testdir
$

Fiwix does not:

[(root) ~]# mkdir testdir
[(root) ~]# cd testdir
[(root) ~/testdir]# rmdir ../testdir
rmdir: failed to remove '../testdir': Device or resource busy
[(root) ~/testdir]#

This issue causes live-bootstrap to fail:

flex-2.5.11: postprocess binaries.
flex-2.5.11: creating package.
flex-2.5.11: cleaning up.
rm: cannot remove directory `/steps/flex-2.5.11/build/flex-2.5.11'
Subprocess error 1
ABORTING HARD
Subprocess error 1
ABORTING HARD

WARNING: the last user process has exited. The kernel will stop itself.

It appears that programs are run in the background as part of the build process for many different packages.
These programs don't necessarily finish before the package build finishes.
I was not able to find a simple way to avoid this behavior.

I have a patch I have been applying that disables the check in Fiwix, which is in a forthcoming PR.
I haven't seen any bad side effects of removing this check but I haven't explored all the possible issues.
If there is an issue with this change perhaps it could be configurable.

An initrd of 1700K or higher will crash the kernel on boot

An initrd of 1700K or larger will push low kernel memory over the 4MB mark, but the memory over 4MB is not mapped by setup_minmem so the kernel crashes when trying to access it.

Details on how to reproduce this are described later.

This issue can be divided into two separate issues:

1. An initrd that is too large should produce an appropriate error and fail more gracefully.
2. The kernel should support larger initrd drives.

At least #1 should be fixed. This can be done by determining how large an initrd can be accommodated and checking on startup that the provided one fits. If it does not fit, display an error message. Currently the kernel just hangs or segfaults.

For #2, currently Fiwix supports an initrd large enough to boot the kernel and mount a root partition. So, opinions may vary on how important it is to support a larger initrd drive. However, the Fiwix documentation in docs/kexec.txt seems to anticipate a 48MB initrd so it seems like larger drives were intended to be supported. With a modest number of changes a larger initrd can be supported.

The following explains the cause of the problem.

Very early in the boot process, the kernel calls setup_minmem to map enough memory to boot the kernel. Later, the kernel places critical data structures such as page tables in low memory, after the kernel image and multiboot modules. And importantly, the stack (%esp) is set to a location after the multiboot modules:

Fiwix/kernel/boot.S

Lines 140 to 146 in e4c1e7e

 call get_last_boot_addr
 popl %ecx /* restore Multiboot magic value */
 popl %ebx /* restore Multiboot info structure */
 andl $0xFFFFF000, %eax /* page aligned */
 addl $0x3000, %eax /* 2 whole pages for kernel stack */
 subl $4, %eax
 movl %eax, %esp /* set kernel stack */

However, if an initrd module is large enough, it will push the stack pages beyond the 4MB of low mapped memory and the kernel will crash when the first value is pushed on the stack here:

Fiwix/kernel/boot.S

Lines 146 to 148 in e4c1e7e

 movl %eax, %esp /* set kernel stack */
 pushl %esp /* save kernel stack address */

Even if the stack fits within the first 4MB, the page tables which are allocated after the stack may not fit in 4MB and so the kernel may crash later when accessing page table memory in map_kaddr.

A proper solution seems to require mapping more low memory in the boot process to fit the multiboot modules + page_tables + kernel data structures, etc. But how much memory must be mapped is not calculated exactly until much later in the boot process (by mem_init). But we cannot wait until this late stage to map the stack page because the stack is used earlier to pass local variables to start_kernel.

Perhaps the amount of low memory to be mapped could be estimated and mapped earlier in the boot (perhaps in setup_minmem). But consider that setup_minmem would not have access to kstat which holds the size of physical memory because that structure has not been populated yet. So calculating the size of the page tables would be difficult. There might be a way to make this work but it would be difficult and would require ongoing maintenance when the size of kernel data structures change.

For these reasons, I am proposing a different idea: simply leave the stack where it was initially set on boot, below the kernel, rather than moving it above the kernel, where it would take significant work to map it properly. Then, the boot process can stay the same for the most part. After the kernel starts and determines how much low memory needs to be mapped (in the mem_init function), the kernel can map additional memory pages as needed in the normal page tables for the initrd module and kernel data structures.
Note that my PR only reserves one stack page because the kernel doesn't appear to need more than that but more pages could easily be reserved if necessary.

The forthcoming PR will allow mapping the initrd ram disk into a much larger memory area. However, at some point the initrd module may bump into another problem: it will start to conflict with memory mapped by user processes. User processes typically start their mapping at 0x08048000. If the initrd is large enough, its mapping will overlap process memory mappings. So, without changing the kernel initrd memory mapping design, an initrd must be limited to approximately 120MB. But this is still much better than 1.6MB.

The role of ramdisksize= kernel parameter

In my PR I have stopped using ramdisksize to control the size of the initrd ram drive. Using the ramdisksize parameter for this purpose is unnecessary and can lead to either an error (if too small) or wasted memory (if too large). Instead, the ram drive is sized to match the size of the provided initrd file system image. This allows a small initrd to be combined with a large ram drive (or the reverse). If ramdisksize controlled the initrd size then ramdisksize could not exceed 120MB as previously explained, and so all ram drives would need to be limited to 120MB.

I can't think of a reason to give your initrd ram drive a size other than the size of the initrd file system anyway. In general, requiring all ram drives, including initrd, to be the same size is inflexible and while this change does not resolve that completely, it is a significant improvement, in my opinion.

Notes on Reproducing the Issue and Testing the PR

How to reproduce:

qemu-system-i386 \
    -kernel fiwix \
    -initrd initrd-2048K.img \
    -append "root=/dev/ram0 ramdisksize=2048 initrd=initrd-2048K.img"

To simplify testing, I have created initrd.img files of various sizes that can be booted standalone with qemu. In other words, they do not require a hard drive.

I have started with an initrd.img originating from fiwix.org/downloads.html. There is an initrd.img inside the FiwixOS-3.2-initrd-i386.img file system. However, this initrd.img has an fstab that mounts root from a hard drive. So, I modify the initrd /etc/fstab file to mount root from /dev/ram0.

Here is a script I used to extract initrd.img from FiwixOS-3.2-initrd-i386.img:

#!/usr/bin/env bash
set -euo pipefail

rm -rf BUILD
mkdir BUILD
cd BUILD

IMG=FiwixOS-3.2-initrd-i386.img

cp ../$IMG .
mkdir mnt
sudo mount -t ext2 $IMG mnt
cp mnt/boot/initrd.img ..
sudo umount mnt

cd ..
rm -rf BUILD

Here is a script I used to create initrd-${SIZE}.img:

#!/usr/bin/env bash
set -euo pipefail

SIZE=$1
IMGSRC=initrd.img
IMGDST=initrd-${SIZE}.img

# Prepare mounted source image
mkdir mnt.src
sudo mount -t ext2 -o loop,rw ${IMGSRC} mnt.src

# Prepare mounted destination image
qemu-img create ${IMGDST} ${SIZE}
# block size 1024, num inodes 512, revision 0, label "FIWIX"
mkfs.ext2 -b 1024 -N 512 -r 0 -L "FIWIX" ${IMGDST}
mkdir mnt.dst
sudo mount -t ext2 -o loop,rw ${IMGDST} mnt.dst

# Use tar to copy and restore device nodes
(cd mnt.src;sudo tar -c -z -p -f ../src.tgz .)
(cd mnt.dst;sudo tar xzpf ../src.tgz)
rm -f src.tgz

# Change initrd to mount itself as root rather than a separate hard drive
cat << "EOF" > fstab
/dev/ram0       /       ext2    defaults        1 1
none            /proc   proc    defaults,noauto 0 0
EOF
sudo mv fstab mnt.dst/etc/fstab

sudo umount mnt.src
rmdir mnt.src
sudo umount mnt.dst
rmdir mnt.dst

Example usage:

./create-initrd-sized.sh 2048K

Note that these scripts use sudo, so if sudo is not set up properly you may need to run the scripts as root using su or another method.

tar can fail due to incorrect error code for missing directory

Extracting a tar file can fail on Fiwix because tar expects ENOENT for missing intermediate directories but it is getting ENOTDIR instead.

The relevant source code for tar:

https://git.savannah.gnu.org/cgit/tar.git/tree/src/extract.c#n883

This code should return -ENOENT:

Fiwix/fs/namei.c

Lines 129 to 131 in 6e036aa

 /* that's an non-existent directory */
 *d_res = NULL;
 errno = -ENOTDIR;

ENOTDIR is the error code for a path component that exists but is not a directory. A component that does not exist at all should produce ENOENT.

Support F_DUPFD_CLOEXEC for fcntl

Please support F_DUPFD_CLOEXEC for fcntl.

It is used by musl-1.1.24 library (which is used by live-bootstrap) to implement popen:
https://git.musl-libc.org/cgit/musl/tree/src/stdio/popen.c?h=v1.1.24&id=ea9525c8bcf6170df59364c4bcd616de1acf8703

Here is a test program fdcloexec.c:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>

#ifndef F_DUPFD_CLOEXEC
#define F_DUPFD_CLOEXEC        1030    /* duplicate file descriptor with close-on-exec*/
#endif

int main() {
        char cmd[100];

        int myfile = open("mytestfile", O_WRONLY | O_CREAT, 0644);
        int mydupfd = fcntl(myfile, F_DUPFD_CLOEXEC, 0);
        //int mydupfd = fcntl(myfile, F_DUPFD, 0);

        printf("mydupfd = %d\n", mydupfd);
        sprintf(cmd, "./checkfd %d", mydupfd);
        system(cmd);
}

Here is the second test program checkfd.c:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
        if  (argc < 2) {
                exit(1);
        }
        int result = lseek(atoi(argv[1]), 0, SEEK_SET);
        printf("checkfd result = %d\n", result);
}

To test:

# cc fdcloexec.c -o fdcloexec
# cc checkfd.c -o checkfd
# ./fdcloexec
mydupfd = 4
checkfd result = -1

The first program opens a file, duplicates its descriptor, and passes the duplicate's number as an argument to the checkfd program, which it executes with system.

If F_DUPFD_CLOEXEC is used to duplicate the file number then the checkfd result should be -1. This indicates the file number passed to checkfd was closed when the program checkfd was executed.

If a plain F_DUPFD is used the checkfd result will be 0.

A PR is forthcoming which implements the F_DUPFD_CLOEXEC option.

lseek past end of file does not zero fill hole

If you lseek past the end of a file and write then the hole that was skipped should be filled with zeros but is not.

Here is documentation of that expected behavior:
https://www.gnu.org/software/libc/manual/html_node/File-Position-Primitive.html

You can set the file position past the current end of the file. 
This does not by itself make the file longer; lseek never changes the file. 
But subsequent output at that position will extend the file. 
Characters between the previous end of file and the new position are filled with zeros.

This test program illustrates the problem:

#include <unistd.h>
#include <fcntl.h>

int main() {
        char buf[] = "testdata";
        int myfile = open("mytest", O_WRONLY | O_CREAT, 0644);
        lseek (myfile, 16, SEEK_SET);
        write(myfile, buf, 8);
        close(myfile);
}

$ cc lseekhole.c -o lseekhole
$ ./lseekhole
$ od -tx1 -Ax mytest
000000 09 2e 66 69 6c 65 09 22 6c 73 65 65 6b 68 6f 6c
000000 74 65 73 74 64 61 74 61

Expected output (which linux/gcc produces):

000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000000 74 65 73 74 64 61 74 61

Overlapping VMA regions not merged correctly

If a program calls mmap on memory that is already mapped, the segments are not split/merged correctly. This can result in parts of an existing segment being lost. In the case where the segment was the heap, attempts to access that part of the heap result in a segfault. (See test case 3 below).

This is partly a regression introduced by c786079.

The following is a list of various kinds of segment overlaps that can occur. ("old" refers to an existing segment).

1:
old: xxxx
new:     xxxx

2:
old: xxxxxxxx
new:     xxxx

3:
old: xxxxxxxx
new: xxxx

4:
old: xxxx
new: xxxxxxxx

5:
old: xxxxxxxx
new:   xxxx

6:
old: xxxxxxxx
new:   xxxxxxxx

7:
new: xxxxxxxxx
old:   xxxxxxxxx

The new segment should be mapped fully and any overlapping part of the old segment should be discarded. For case 1, if the segments are compatible, they should be merged into one.

Background information that describes the expected behavior is provided here: https://stackoverflow.com/questions/14943990/overlapping-pages-with-mmap-map-fixed

Prior to c786079 cases 2, 3, and 5 worked properly while 1, 4, 6, and 7 caused reboots.

After c786079 case 1 did not merge, 2 and 4 worked, while 3, 5, 6 and 7 left overlapping segments. Also 4 leaks memory and 1 subjects vma memory to a double-free.

With the PR provided, 1, 2, 3, 4, 5, and 6 work properly but 7 still leaves an overlapping segment.

The following are test programs for cases 1 through 7.

Case 1:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

#define PAGE_SIZE            4096
#define PAGE_MASK            ~(PAGE_SIZE - 1)        /* 0xFFFFF000 */

int main (int argc, char *argv[])
{
 char *mem;
 char *heap_start = malloc(1);
 unsigned int aligned = (((unsigned int) heap_start) + 8192) & PAGE_MASK;
 char inbuff[100];

 /* expand heap to more than one page */
 char *expansion = malloc(8192);

 /* mmap memory at the start of the heap, but only part of it. */
 if ((mem = mmap ((void *)aligned, 16384, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 aligned += 16384;
 /* mmap memory right after previous mmap segment */
 if ((mem = mmap ((void *)aligned, 16384, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 return 0;
}

Case 2:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

#define PAGE_SIZE            4096
#define PAGE_MASK            ~(PAGE_SIZE - 1)        /* 0xFFFFF000 */

int main (int argc, char *argv[])
{
 char *mem;
 char *heap_start = malloc(1);
 unsigned int aligned = (((unsigned int) heap_start) + 4096) & PAGE_MASK;

 /* expand heap to more than one page */
 char *expansion = malloc(16384);

 /* mmap memory to the end of the heap. */
 if ((mem = mmap ((void *)aligned, 20480, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 return 0;
}

Case 3:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

#define PAGE_SIZE            4096
#define PAGE_MASK            ~(PAGE_SIZE - 1)        /* 0xFFFFF000 */

int main (int argc, char *argv[])
{
 char *mem;
 char *heap_start = malloc(1);
 unsigned int aligned = ((unsigned int) heap_start) & PAGE_MASK;

 /* expand heap to more than one page */
 char *expansion = malloc(16384);

 /* mmap memory at the start of the heap, but only part of it. */
 if ((mem = mmap ((void *)aligned, 4096, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 /* test accessing the end of the heap */
 expansion[16000] = 0;
 return 0;
}

Case 4:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

#define PAGE_SIZE            4096
#define PAGE_MASK            ~(PAGE_SIZE - 1)        /* 0xFFFFF000 */

int main (int argc, char *argv[])
{
 char *mem;
 char *heap_start = malloc(1);
 unsigned int aligned = ((unsigned int) heap_start) & PAGE_MASK;

 /* mmap memory past the end of the heap. */
 if ((mem = mmap ((void *)aligned, 16384, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 return 0;
}

Case 5:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

#define PAGE_SIZE            4096
#define PAGE_MASK            ~(PAGE_SIZE - 1)        /* 0xFFFFF000 */

int main (int argc, char *argv[])
{
 char *mem;
 char *heap_start = malloc(1);
 unsigned int aligned = (((unsigned int) heap_start) + 4096) & PAGE_MASK;

 /* expand heap to more than one page */
 char *expansion = malloc(16384);

 /* mmap memory in the middle of the heap. */
 if ((mem = mmap ((void *)aligned, 4096, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 /* test accessing the end of the heap */
 expansion[16000] = 0;
 return 0;
}

Case 6:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

#define PAGE_SIZE            4096
#define PAGE_MASK            ~(PAGE_SIZE - 1)        /* 0xFFFFF000 */

int main (int argc, char *argv[])
{
 char *mem;
 char *heap_start = malloc(1);
 unsigned int aligned = (((unsigned int) heap_start) + 4096) & PAGE_MASK;

 /* expand heap to more than one page */
 char *expansion = malloc(8192);

 /* mmap memory past the start of the heap and extending beyond the end of the heap. */
 if ((mem = mmap ((void *)aligned, 16384, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 return 0;
}

Case 7:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

#define PAGE_SIZE            4096
#define PAGE_MASK            ~(PAGE_SIZE - 1)        /* 0xFFFFF000 */

int main (int argc, char *argv[])
{
 char *mem;
 char *heap_start = malloc(1);
 unsigned int aligned = (((unsigned int) heap_start) + 8192) & PAGE_MASK;
 char inbuff[100];

 /* expand heap to more than one page */
 char *expansion = malloc(8192);

 /* first mmap: 16 KiB starting two pages above the page containing heap_start */
 if ((mem = mmap ((void *)aligned, 16384, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 aligned -= 4096;
 /* mmap memory starting before previous mmap and overlapping it  */
 if ((mem = mmap ((void *)aligned, 16384, 0, MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0)) == (caddr_t) -1) {
    puts ("mmap error!");
    return 1;
 }

 return 0;
}

It can be helpful to print the vma regions before and after every mmap syscall by applying this patch to the kernel:

diff --git a/kernel/syscalls.c b/kernel/syscalls.c
index 5dece15..ebc65eb 100644
--- a/kernel/syscalls.c
+++ b/kernel/syscalls.c
@@ -8,6 +8,7 @@
 #include <fiwix/types.h>
 #include <fiwix/syscalls.h>
 #include <fiwix/mm.h>
+#include <fiwix/mman.h>
 #include <fiwix/stat.h>
 #include <fiwix/errno.h>
 #include <fiwix/string.h>
@@ -406,9 +407,17 @@ int do_syscall(unsigned int num, int arg1, int arg2, int arg3, int arg4, int arg
                return -ENOSYS;
        }
        current->sp = (unsigned int)&sc;
+       int retval;
+       if (num == 90) {
+               show_vma_regions(current);
+       }
 #ifdef CONFIG_SYSCALL_6TH_ARG
-       return sys_func(arg1, arg2, arg3, arg4, arg5, arg6, &sc);
+       retval = sys_func(arg1, arg2, arg3, arg4, arg5, arg6, &sc);
 #else
-       return sys_func(arg1, arg2, arg3, arg4, arg5, &sc);
+       retval = sys_func(arg1, arg2, arg3, arg4, arg5, &sc);
 #endif /* CONFIG_SYSCALL_6TH_ARG */
+       if (num == 90) {
+               show_vma_regions(current);
+       }
+       return retval;
 }
diff --git a/kernel/syscalls/old_mmap.c b/kernel/syscalls/old_mmap.c
index de28f0f..1e804f6 100644
--- a/kernel/syscalls/old_mmap.c
+++ b/kernel/syscalls/old_mmap.c
@@ -12,6 +12,7 @@
 #include <fiwix/errno.h>
 #include <fiwix/string.h>

+#define __DEBUG__
 #ifdef __DEBUG__
 #include <fiwix/stdio.h>
 #include <fiwix/process.h>

[coreboot/AMD] Page Fault at 0x00400000 (writing) with error code 0x00000002 (0b10)

@mikaku after #1 happened in QEMU, today I tested Fiwix v1.0.1 as a virtual floppy inside a coreboot+SeaBIOS build for a quad-core AMD A10-5750M Lenovo G505S laptop with 16GB RAM installed (a great, powerful coreboot-supported laptop by the way; it doesn't have Intel ME / AMD PSP hardware backdoors inside its CPU and can be found online for $100-$150 in good condition. If you have any questions I'll be happy to answer.)
However, I also got an error, this time a different, earlier one: cb-amd.txt
Memory map lines marked with <- ??? are questionable: after the first screen (ending with console) it waits for a few seconds but then dumps a lot of info very quickly, and despite my best efforts I couldn't take a clear photo of the middle part because it scrolls past too fast. (Is there a way to slow down the log printing?)

Multiboot not working due to kernel stack being overwritten

Using qemu to boot Fiwix (using multiboot) does not currently work for me. There is no output from the kernel.

The change d164d40 moved the kernel stack after _end but get_last_boot_addr is still using _end to calculate the start of available kernel memory. It should use _kstack now or it will cause the kernel stack to be overwritten.

To reproduce, I've built fiwix in the fiwix-issues subdirectory and run qemu like this:

qemu-system-i386 \
  -nographic \
  -drive file=FiwixOS-3.2-i386.raw,format=raw,if=ide,cache=writeback,index=0 \
  -machine pc \
  -cpu 486 \
  -enable-kvm \
  -kernel fiwix-issues/fiwix \
  -append "console=/dev/ttyS0 root=/dev/hda2" \
  -m size=4G

With the forthcoming PR the kernel boots properly.

Please create interim source tarball release for live-bootstrap project

Live-bootstrap is currently pulling a patched source archive for Fiwix from github.com:rick-masters/Fiwix.
They prefer to use the code from the origin rather than using a fork.
Now that all patches have been incorporated into Fiwix this should be possible.

Therefore, it would be better if you could release a gzipped tar file of source code from your github or website.
Note that currently live-bootstrap has a problem with extracting files from a tarball that start with a period.
So, it is currently necessary to remove the .gitignore file and .git directory before creating the tar file.

I understand that you only do formal releases once in a while. And your releases probably get more stabilization, documentation, and testing. However, live-bootstrap is a faster moving project that tolerates a bit more instability. So, I'd like to suggest that you provide "interim" releases as needed for live-bootstrap. I have been appending -lb1, -lb2, etc. (for live-bootstrap 1, 2, etc) to the current Fiwix release number to identify interim releases supporting live-bootstrap.

It appears you have done a release on GitHub before, but for your convenience, here are the exact steps that can produce a release from GitHub:

Clone the repo and delete .git and .gitignore
Create a tarball with .tar.gz extension such as fiwix-1.5.0-lb1.tar.gz
* Note that live-bootstrap does not support bzip2 until after booting Fiwix.
Create another tarball with .bz2 extension if you'd like.
Have the tarballs ready so you can drag and drop into github
Click on "Releases" on the right and then click on "Draft a new release" near the top
Click "Choose Tag" and enter a new tag name like v1.5.0-lb1 in the text box
Enter a title such as "Fiwix 1.5.0 for Live Bootstrap v1"
Enter a description such as "This is an interim release with changes for Live Bootstrap."
Drag and drop the tarballs that you created earlier
Click "Publish Release"

Once you have done this I will submit a change to the live-bootstrap project to reflect the new version and source and all the build changes since the last version.

You deserve tremendous gratitude from me and the live-bootstrap project for working over the past year to get through the 27 PRs that I submitted to Fiwix. Thank you!

FPU software emulation

A question, not an issue, if you please. I see "i386 processor (with floating-point processor)" in the minimum requirements. Are you planning to implement FPU software emulation at some point? I'd be interested in reviving my 386sx machine with a modern, yet lightweight OS.

Support dual boot

I would like to be able to dual boot Fiwix with other operating systems.
This will involve changing the installer to allow installing to already existing partitions
and keeping /boot on the same partition.
Also, instructions for the GRUB entry would be helpful.

Support syscall utimes

The utimes syscall is needed for musl support because musl defaults to using the utimes system call rather than utime.
The musl library maps utime library call to the utimes system call.

The utimes syscall uses timeval structures, which support microseconds. This differs from utime, which only supports seconds.

The forthcoming PR is mostly a copy of utime with small modifications.
Since the current Fiwix file systems do not support microseconds that field is currently ignored.
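
The seconds-only behaviour described above can be sketched as follows. This is a minimal illustration and not the code from the PR: the helper name timeval_to_utimbuf is hypothetical, and the only assumption is that the microsecond field is dropped when the two timeval entries are reduced to the form that utime uses.

```c
#include <sys/time.h>
#include <utime.h>

/*
 * Hypothetical helper: collapse the two timeval entries passed to
 * utimes() (access time, then modification time) into the seconds-only
 * utimbuf used by utime(). The tv_usec fields are deliberately ignored,
 * mirroring a file system that cannot store microseconds.
 */
static void timeval_to_utimbuf(const struct timeval tv[2], struct utimbuf *out)
{
        out->actime = tv[0].tv_sec;
        out->modtime = tv[1].tv_sec;
}
```

With this shape, a utimes call whose timestamps differ only in microseconds produces the same stored times as the equivalent utime call.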

A test program is attached. The following output shows the file time being updated with the test program.

[(root) ~]# cc -m32 testutimes.c -o testutimes
[(root) ~]# rm testfile
rm: remove regular empty file 'testfile'? y
[(root) ~]# touch testfile;date
Tue Dec 26 19:25:43 GMT 2023
[(root) ~]# sleep 5   
[(root) ~]# date;./testutimes;stat testfile
Tue Dec 26 19:26:27 GMT 2023
  File: testfile
  Size: 0               Blocks: 1          IO Block: 1024   regular empty file
Device: 302h/770d       Inode: 17978       Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-12-26 19:26:27.000000000 +0000
Modify: 2023-12-26 19:26:27.000000000 +0000
Change: 2023-12-26 19:26:27.000000000 +0000
 Birth: -
[(root) ~]#

testutimes.c.gz

Compiling Fiwix with some versions of gcc requires -fno-pie

Depending on the version of gcc or linux distribution, gcc may build with the -fPIE (position independent executable) option enabled by default. PIE code is incompatible with Fiwix for several reasons. Therefore, the -fno-pie option must be added to the CFLAGS in the Makefile in order to build Fiwix properly for some builds or versions of gcc.

For me, I built Fiwix on Ubuntu 20.04 and Fiwix would not boot. There was no obvious indication what the problem was and it took me several days of difficult debugging to fully understand the issue and the solution. My version of gcc is "gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0" and it builds with -fPIE enabled by default. This can be verified by building with the gcc options "-Q -v". In contrast, the version of gcc (4.8.5) on Centos 7 does not enable -fPIE by default.

I tried to make Fiwix work with my gcc, as is, but I was unable. First, gcc with -fPIE produces several new ELF sections such as ".data.rel" which are not accounted for in the fiwix.ld linker script. After solving that, I found that gcc's PIE code populates the ebx register with the current instruction pointer and locates data structures relative to that register. However, routines in core386.S such as tlbinfo modify the ebx register without saving it, which is catastrophic. (Not consistently saving registers is a separate issue which is probably worth fixing). After solving that, I found that Fiwix copies the init_trampoline function to a different location, so data structures cannot be found in the same place, relative to the instruction pointer. Fixing that could be done by porting init_trampoline to assembly, but I eventually figured out that simply adding the -fno-pie option would get gcc to build Fiwix the way it was intended.

Support stub for syscall madvise

This feature request is to support a stub system call for madvise that always returns success.

madvise is a system call to "advise" the kernel about how you intend to use memory and is intended to increase performance.

Commonly, an application will operate correctly even if the madvise system call is ignored by the kernel. However, there are many options available and I haven't reviewed all of them and so it may not be generally a good idea to return success unless the call is actually implemented. Therefore, I have implemented this behind a config option which is disabled by default.
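
As a rough sketch of the shape of such a stub (not the code from the PR): the option name SKETCH_CONFIG_MADVISE_STUB and the function name sketch_sys_madvise are hypothetical, and the disabled branch assumes an unimplemented call is reported as -ENOSYS.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical option name; set to 0 to model the stub being disabled. */
#ifndef SKETCH_CONFIG_MADVISE_STUB
#define SKETCH_CONFIG_MADVISE_STUB 1
#endif

/*
 * Stub madvise: when the option is enabled the advice is accepted and
 * ignored, so callers see success. When disabled, the call reports
 * that it is not implemented.
 */
static int sketch_sys_madvise(void *addr, size_t length, int advice)
{
        (void)addr;
        (void)length;
        (void)advice;
#if SKETCH_CONFIG_MADVISE_STUB
        return 0;
#else
        return -ENOSYS;
#endif
}
```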

Note that the definition for SYS_madvise in unistd.h is not behind the config option. That's because it didn't seem like an application should require a special config option to access the SYS_madvise constant. Maybe an application or library would like to use that constant to make the system call and can handle it being unimplemented. Anyway, it doesn't matter to me if you want to wrap that with an #ifdef instead.

The reason this stub is necessary is that Live-bootstrap starts with an older version of musl (1.1.24) which calls madvise internally (in some malloc related functions) to advise the kernel it is no longer using memory and unfortunately leaves the errno variable set to an error if madvise returns an error due to not being implemented. This nonzero errno propagates to the application which can fail as a result. In the case of live-bootstrap I believe the find program is failing as part of building the linux kernel package. (It appears that a later version of musl (1.2.4) doesn't let the result of madvise affect the errno but live-bootstrap is using musl-1.1.24 at the beginning because it is simpler to build.)

I have not included a test program because there is no new functionality to test other than the return code, which is obvious from the code and also I have tested the forthcoming PR in live-bootstrap for months.

Support building with tcc compiler

The live-bootstrap project must compile Fiwix with tcc because gcc is not available until much later.
Note that the tcc used to build Fiwix must be patched to handle the physical / virtual address scheme used by Fiwix.
With gcc, the address scheme is handled by a linker script but tcc does not support linker scripts.

In the forthcoming PR, documentation is provided in docs/tcc.txt which explains where to get tcc, how to patch it, and how to build Fiwix.

The following is an explanation of the various changes to support tcc.
Some of these changes are significant and so I am open to discussing better alternatives.

Makefile:

Created new CCEXE variable to specify the gcc or tcc compiler
Moved CONFFLAGS to CC so that all -D flags in the same place
Created compiler-specific flags for CC, LD, and LDFLAGS
tcc does not use CPP because it does not use linker script, see fiwix.ld below
tcc does not have a VERSION variable by default, so I created one
tcc does not have a separate linker so we use tcc (which requires ARCH) for LD
tcc does not support elf_386, nostartfiles, or nodefaultlibs
tcc does not support linker script, so specify text address manually

drivers/block/ata.c:

tcc does not support 64-bit division, so replace with bit shifts

fiwix.ld:

tcc must use _start instead of start so I changed linker script to use _start
tcc does not support custom sections or the linker script at all therefore:
The _kstack section is not available, so I eliminate changing the kernel stack.
The kernel stack is kept at 0xF000 to 0x10000 as specified originally in setup_kernel.
I hesitated to remove all the _kstack related code but honestly, changing the kernel
stack does not appear to have a clear purpose and Fiwix appears to work fine without
doing that.

include/fiwix/asm.h:

tcc does not allow specifying register order so hard code the order in assembly

include/fiwix/config.h:

Make support for 64-bit printk types optional because tcc does not support 64-bit division

kernel/boot.S:

Since tcc does not support linker script custom addresses, physical addresses in the .setup section must be computed manually
SAVE_ALL - preserve ebx
tcc does not recognize pushal. Both tcc and gcc recognize pusha so that is used instead.
An .align 4 was removed because it didn't appear necessary and with tcc it pads code with nulls which caused problems.
do_switch: preserve ebx
tlbinfo: preserve ebx

kernel/init.c:

INIT_TRAMPOLINE needed to be bigger because tcc produced larger code

lib/printk.c:

Make support for 64-bit printk types optional because tcc does not support 64-bit division

mm/memory.c:

memmove is normally included into compiled code (I believe) for copying structures but tcc excludes it with our compile options. (gcc still includes it). So, I added in an explicit memmove function for tcc.
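
Two of the workarounds listed above can be illustrated with a short sketch. It is not taken from the PR: the names are hypothetical, the shift only replaces a 64-bit division when the divisor is a power of two (a 512-byte sector size is assumed here), and the memmove is a plain byte-wise version that handles overlapping ranges.

```c
#include <stddef.h>

#define SKETCH_SECTOR_SHIFT 9   /* assumed 512-byte sectors: 1 << 9 == 512 */

/* Replace "bytes / 512" on a 64-bit value with a shift, avoiding the
 * 64-bit division helper that the compiler would otherwise need. */
static unsigned long long bytes_to_sectors(unsigned long long bytes)
{
        return bytes >> SKETCH_SECTOR_SHIFT;
}

/* Byte-wise memmove: copy forwards when the destination is below the
 * source, backwards when it is above, so overlapping ranges are safe. */
static void *sketch_memmove(void *dest, const void *src, size_t n)
{
        unsigned char *d = dest;
        const unsigned char *s = src;

        if (d < s) {
                while (n--)
                        *d++ = *s++;
        } else if (d > s) {
                d += n;
                s += n;
                while (n--)
                        *--d = *--s;
        }
        return dest;
}
```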

Memory overwrite regression in sys_getcwd

The recent change 58ca771 on March 22 changed the memory allocated for dirent_buf but did not change the size (PAGE_SIZE) passed to readdir later in the function, so readdir can write past the allocated memory. This results in a page fault when testing live-bootstrap.

You should be able to reproduce it by using the branch I provided in #27 and just applying 58ca771 to it.

The patch provided below fixes it. However, given that the buffer is being used to read multiple entries, I think it might be better to also revert 58ca771 because using a larger buffer is probably more efficient.

diff --git a/kernel/syscalls/getcwd.c b/kernel/syscalls/getcwd.c
index 75838f4..3f31403 100644
--- a/kernel/syscalls/getcwd.c
+++ b/kernel/syscalls/getcwd.c
@@ -80,7 +80,7 @@ int sys_getcwd(char *buf, __size_t size)
                }
                do {
                        done = 0;
-                       bytes_read = up->fsop->readdir(up, &fd_table[tmp_fd], dirent_buf, PAGE_SIZE);
+                       bytes_read = up->fsop->readdir(up, &fd_table[tmp_fd], dirent_buf, sizeof(dirent_buf));
                        if(bytes_read < 0) {
                                release_fd(tmp_fd);
                                iput(up);

Sync drives on kernel stop

With live-bootstrap running on Fiwix there are occasionally errors that cause the build process to exit with an error.
Since live-bootstrap runs as init, and will be the last process to exit, the kernel will stop.
For debugging purposes, sometimes I extract the initrd file system from memory and examine it.
This can be done from the qemu console using the pmemsave command.

The problem is that the file system will not be sync'd and so there can be missing files or output.
It would be helpful if the file systems were sync'd before stop_kernel().

I have been using the forthcoming PR in my own testing for many months and it works well for me.

Cannot persist large files to ext2 disk correctly

There is a bug that prevents writing large files to an ext2 disk correctly.

This can be demonstrated with the following commands:

dd if=/dev/random of=bigfile bs=1024 count=150000
sha256sum bigfile
sync
reboot
# After login:
sha256sum bigfile

Result: you will either get an I/O error or a different checksum.

The problem is in the calculation of a block index for a triple-indirect block in fs/ext2/inode.c:

Fiwix/fs/ext2/inode.c

Lines 326 to 333 in 6e036aa

        if(level == EXT2_TIND_BLOCK) {
                if(!(buf3 = bread(i->dev, indblock[block], blksize))) {
                        printk("%s(): returning -EIO\n", __FUNCTION__);
                        brelse(buf);
                        return -EIO;
                }
                tindblock = (__blk_t *)buf3->data;
                block = tindblock[tblock / BLOCKS_PER_IND_BLOCK(i->sb)];

Here, tblock has not been adjusted to account for the number of blocks skipped by the last traversal.

I believe this is the appropriate code to adjust tblock before calculating the block index:

tindblock = (__blk_t *)buf3->data;
tblock -= BLOCKS_PER_DIND_BLOCK(i->sb) * block;
block = tindblock[tblock / BLOCKS_PER_IND_BLOCK(i->sb)];

Without this adjustment, tblock / BLOCKS_PER_IND_BLOCK(i->sb) will exceed the bounds (0..255) for tindblock and will write into memory beyond the size of the disk block. The block numbers stored in these indexes may be readable while in memory, but they cannot be persisted to disk through a reboot because the disk blocks only hold 256 entries. So, after rebooting and reloading the blocks from disk, indexing beyond 255 will produce an invalid block number.
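
The arithmetic can be checked with a small standalone sketch. It assumes 1 KiB blocks (256 four-byte entries per indirect block, matching the 0..255 bounds above) and that block holds the first-level index, i.e. tblock divided by the number of blocks covered by one double-indirect block. The function and macro names are illustrative and are not the kernel's.

```c
#define SKETCH_IND_BLOCKS   256u                    /* blocks per indirect block */
#define SKETCH_DIND_BLOCKS  (256u * 256u)           /* blocks per double-indirect block */

/* Index into the first-level (triple-indirect) block. */
static unsigned int first_level_index(unsigned int tblock)
{
        return tblock / SKETCH_DIND_BLOCKS;
}

/* Second-level index as computed without the adjustment. */
static unsigned int second_level_unadjusted(unsigned int tblock)
{
        return tblock / SKETCH_IND_BLOCKS;
}

/* Second-level index after subtracting the blocks already skipped
 * by the first-level traversal. */
static unsigned int second_level_adjusted(unsigned int tblock)
{
        unsigned int block = first_level_index(tblock);

        tblock -= SKETCH_DIND_BLOCKS * block;
        return tblock / SKETCH_IND_BLOCKS;
}
```

For a block offset of 70000 within the triple-indirect range, the first-level index is 1, the unadjusted second-level index is 273 (outside 0..255), and the adjusted index is 17.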

Thank you, introduction, and licensing

First, this is a great project and I'd like to thank you for sharing it! It appears you started it around 25 years ago and it's incredible to see the work you've put in and that you continue to make it better after all this time.

I'd like to introduce myself, my projects, and the community I work with. I'm (currently) a retired software developer who has been programming for around 40 years. This year, I joined the bootstrapping community and have contributed a primitive operating system written in hex which currently supports bootstrapping the tcc compiler using source code from stage0-posix and live-bootstrap.

The kernel I wrote is designed to be compatible with Linux and is under 4KB. However, now that it has been used to build tcc I have been working on transitioning to a more capable kernel written in C in order to proceed further with the bootstrapping effort, which requires pipes and a better file system. After an aborted effort to build Linux 2.6, I was referred to Fiwix.

I have spent the last three weeks working with Fiwix and I believe it fits our requirements for the next kernel which can carry us the rest of the way to building Linux. There is a lot of work I plan to do in order to integrate it into our bootstrap projects, but I'm pretty hopeful it can be done.

With regards to Licensing, many of the source files refer to the "Fiwix License" and I'm assuming that refers to the MIT license file in the root of the repo. If there are any files which are not under that license I would appreciate if you could point those out. I plan on filing a few issues that I ran into during my evaluation (with solutions) soon. For the record, all of my contributions will be under the same license.

Thanks again for Fiwix. I'm hopeful it will be an important component in the effort to bootstrap open source software.

About potential networking support

Hi there -- before I start this issue/suggestion, I would like to say that yes, I understand that networking is not implemented at this point in time.

Now with that out of the way; I would like to put in a potential "suggestion" of sorts for if/when networking becomes a feature or planned feature.

I have used networking over SLIP (Serial Line IP) before, and I believe that before trying to implement any kind of NIC it would be helpful to do loopback first, then SLIP, and then an ethernet-based NIC later. My reasoning is that any two computers with a serial port can do SLIP, so it's hardware-agnostic.

I also know that you can do SLIP with QEMU (albeit, it's a bit of a cruddy way that I know of).

The QEMU way I know of doing would be specifying -serial pty as a qemu-system-i386 argument and then installing net-tools, and using sudo slattach /dev/pts/<whatever QEMU is using>.

I hope I'm not creating clutter. Just wanted to point out an idea for if/when TCP/IP becomes a feature.

do_divide_error() Booting problem with Fiwix v1.0.1 floppy at QEMU

Good day, @mikaku ! When I try to boot QEMU with the Fiwix v1.0.1 floppy inside its coreboot/SeaBIOS image as a virtual floppy, I get this error log with do_divide_error()
(retyped from screen by hand but I hope there are no errors) - QEMU.txt

QEMU command line that I used:
qemu-system-x86_64 -L . -m 256 -localtime -vga vmware -net nic,model=rtl8139 -net user -soundhw ac97 -usb -usbdevice tablet -bios ./build/coreboot.rom -serial stdio
where coreboot.rom (inside this coreboot.zip) is a build of coreboot+SeaBIOS for QEMU with this coreboot-config.txt as .config , to which I've added fiwix-1.0.1-i386.img as a virtual floppy with this command after the build completion: ./build/cbfstool ./build/coreboot.rom add -f ./fiwix-1.0.1-i386.img -n floppyimg/fiwix.lzma -t raw -c lzma

However, if I use almost the same QEMU command line, but with an extra option for Fiwix physical floppy plugged into QEMU's floppy drive -fda ./fiwix-1.0.1-i386.img - although still booting from that "virtual floppy" aka Ramdisk - then it detects this physical floppy as fd0 0x03F0-0x03F7 6 1.44 MB 3.5" (Intel 82078) and boots okay

P.S. Tested some other OS with floppy as a bootable media - e.g. MikeOS - added the same way as a virtual floppy, and they are working good. So this weird problem seems to be Fiwix exclusive, and I could help you to debug it

FiwixOS 3 Live Floppy (1.44MB) with initrd using RAMdisk driver - is booting fine :)

Hi there @mikaku , just wanted to tell you that I've tested a "FiwixOS 2 Live Floppy (1.44MB) with initrd using RAMdisk driver" floppy and it's still booting fine as a part of coreboot+SeaBIOS. Wish you the happy and productive times ahead ;-)

Uninitialized next pointer for pci device may crash kernel

When a pci device structure is added to the new linked list implementation its next pointer may be uninitialized. It should be set to NULL.

A new struct pci_device is initialized from a passed in variable pci_dev:

Fiwix/drivers/pci/pci.c

Line 99 in d1a53ab

*pdt = *pci_dev;

However, that structure is allocated on the stack in scan_bus so some members may be uninitialized:

Fiwix/drivers/pci/pci.c

Lines 112 to 117 in d1a53ab

static void scan_bus(void)
{
        int b, d, f;
        unsigned int vendor_id, device_id, class;
        unsigned char header, irq, prog_if;
        struct pci_device pci_dev;

If pci_dev has a non-NULL next pointer, the kernel may attempt to access that memory which may be invalid and may crash the kernel.

Whether the crash is reproducible depends on the contents of stack memory which depends on many factors, so I have no easy way of providing a test case that triggers the problem. It always crashes for me and setting next to NULL fixes it.
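
A minimal userspace sketch of the fix, assuming a singly linked list and a struct copy from a caller-owned temporary. The struct layout and the names sketch_pci_device and sketch_pci_add are hypothetical stand-ins and do not reproduce the code in drivers/pci/pci.c.

```c
#include <stdlib.h>

/* Simplified stand-in for the kernel's device structure. */
struct sketch_pci_device {
        unsigned int vendor_id;
        unsigned int device_id;
        struct sketch_pci_device *next;
};

/*
 * Copy *src into a newly allocated node and append it to the list.
 * src may live on the caller's stack with a garbage next pointer,
 * so next is reset explicitly after the struct copy.
 */
static struct sketch_pci_device *sketch_pci_add(struct sketch_pci_device **head,
                                                const struct sketch_pci_device *src)
{
        struct sketch_pci_device *node = malloc(sizeof(*node));

        if (!node)
                return NULL;
        *node = *src;
        node->next = NULL;      /* the fix: never inherit src->next */

        while (*head)
                head = &(*head)->next;
        *head = node;
        return node;
}
```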

Read from start of file right after opening with O_APPEND

In order to be consistent with Linux behavior and expectations of some software, the file position for reads should be set to zero when a file is opened with O_APPEND.

This may seem odd, but O_APPEND only specifies that writes will start at the end of the file.
The position of reads, however, is not specified in the POSIX standard and is left up to the implementation.
Linux for whatever reason chose to leave the read position at zero.
This is not well documented, but you can find anecdotes confirming this, for example:
https://cygwin.cygwin.narkive.com/6x8daLJe/bug-fopen-a-does-not-seek-to-end-of-file-until-some-write-operation
JuliaLang/julia#3374 (comment)

This can be verified empirically with the following program:

#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
        FILE* testfile = fopen("testfile", "w");
        fprintf(testfile, "hello\n");
        fclose(testfile);

/* Two different variations are provided here */

#if 1
        int testfd = open("testfile", O_RDWR|O_APPEND);
        int pos = lseek(testfd, 0, SEEK_CUR);
        printf("position: %d\n", pos);
        close(testfd);
#else
        testfile = fopen("testfile", "a+");
        int pos = ftell(testfile);
        printf("position: %d\n", pos);
        fclose(testfile);
#endif
}

On Linux, this will output:

position: 0

On Fiwix, this will output:

position: 6

Certain versions of the m4 macro processor, which is part of autotools, depend on the read position being at the start of the file.
Version 1.4.10 of m4 opens "diversion" temporary files with "a+" append mode when re-inserting diverted content.
(See m4_tmpopen called by insert_diversion_helper.) It then reads the file and assumes it will be from the beginning.

I think m4 changed this in later versions, but to avoid any problems like this in the future, it would be preferable if Fiwix worked the same way as Linux.

Questions about Fiwix

Hi @mikaku,

This isn't an issue, but rather some quick questions about your project. First off, wow, what a nice project!! Very impressive how you've built such a straightforwardly-coded UNIX-like kernel, I find the sources easy to read and understand, which isn't always the case for complex kernels :)

I'm the maintainer over at ELKS, which is a Linux kernel, C library and applications for 8086 and compatible CPUs in real mode, running a segmented architecture. I was thinking it might be fun to jump in and contribute to Fiwix. However, my development environment is macOS. Do you think that might be much of an issue, substituting say x86_64-linux-musl-gcc for CC? I thought to ask before diving in to see what you think might be other gotchas in the kernel build. We had the same issue at ELKS 3 years ago, but now the entire kernel, C lib, applications and images can all be built on macOS.

It's pretty cool how you've got a framebuffer console and /dev/fb running. I was thinking it might be fun to port over Microwindows or Nano-X, which, now that Fiwix has UNIX sockets, should run easily directly on top of a 16/24/32bpp framebuffer, and could use a serial mouse rather than a dedicated kernel mouse driver. Any interest in that?

At ELKS, we have a nicely-working TCP/IP stack and application set, although in our case it runs in userland due to size constraints, which I wouldn't recommend here. But it has a nice state machine, which could possibly be somewhat easily inserted under the Fiwix socket code. Looking closer, I see it probably has the wrong license though.

Finally, is there a build script for the FiwixOS binaries, or is that all magic for the time being?

Thank you for your Fiwix project, I'm having fun reading the kernel code. Nicely done!

Support override of configuration definitions

Please support making it easier to customize the configuration of Fiwix from the command line.

The configuration of Fiwix for use with live-bootstrap requires customizing 12 different parameters so it would be helpful to allow specifying this in the build scripts instead of changing the source code.

For example, to use the 2/2 memory split it should be possible to compile like so:

gcc -DCONFIG_VM_SPLIT22 ...

To use the qemu debug console, it should be possible to compile like so:

gcc -DCONFIG_QEMU_DEBUGCON ...

To expand a limit, it should be possible to compile like so:

gcc -DOPEN_MAX=1536

However, these variables cannot be specified at the command line.
Since the latest definition "wins" a definition in a source file takes precedence over a command line option.

A common way to allow command line definitions is to wrap all definitions with #ifndef.
https://stackoverflow.com/questions/41437799/overwrite-a-macro-constant-from-a-makefile

Here is an example config file for a project that does it that way:
https://github.com/gkostka/lwext4/blob/master/include/ext4_config.h
Many examples of this pattern can be found in large C programs.

There are also many variables in Fiwix which have #undef before them. I understand this is to document their existence but an #undef also prevents defining a variable at the command line. A more common method in my experience is to comment out the definition to show that the definition is available but is not defined by default.
For example see FLUSH_ALL_TLBS in this file:
https://github.com/qemu/qemu/blob/master/target/ppc/mmu_helper.c
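
Both patterns can be sketched in a few lines of C. The macro names SKETCH_OPEN_MAX and SKETCH_DEBUGCON and the default value are placeholders chosen for illustration; they are not Fiwix's actual definitions or defaults.

```c
/* Overridable limit: -DSKETCH_OPEN_MAX=1536 on the command line wins,
 * because the header only supplies a value when none was given. */
#ifndef SKETCH_OPEN_MAX
#define SKETCH_OPEN_MAX 256     /* placeholder default */
#endif

/* Optional feature: documented by a commented-out definition instead of
 * an #undef, so -DSKETCH_DEBUGCON on the command line still takes effect. */
/* #define SKETCH_DEBUGCON */

static int sketch_open_max(void)
{
        return SKETCH_OPEN_MAX;
}

static int sketch_debugcon_enabled(void)
{
#ifdef SKETCH_DEBUGCON
        return 1;
#else
        return 0;
#endif
}
```

Compiled without any -D options, the limit falls back to the placeholder default and the optional feature stays off; adding the flags changes both without touching the source.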

Finally, it would be helpful if the Makefiles supported passing parameters from the command line to the compiler like so:

make CONFFLAGS="-DCONFIG_VM_SPLIT22 -DOPEN_MAX=1536"

Support O_DIRECTORY flag to the open syscall

The O_DIRECTORY option signifies that an open call should only succeed if the path is a directory.
This is used, for example, by opendir in the musl library:

https://git.musl-libc.org/cgit/musl/tree/src/dirent/opendir.c

Some programs use opendir to determine if a path is a directory and so it is important to support.

The following is a test program which must be named opendir.c and run in the same directory as the source. However, any path to an existing file can be used to test whether O_DIRECTORY is working.

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main() {
        int dirfd = open("opendir.c", O_DIRECTORY);
        printf("dirfd is %d\n", dirfd);
}

If O_DIRECTORY is working, then the path will not open and dirfd will be -1.
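
For context, this is roughly how a program might rely on the flag to test whether a path is a directory. It is a sketch only: the helper name sketch_is_directory is hypothetical, and it treats any failure to open (including permission errors) as "not a directory".

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Return 1 if path can be opened as a directory, 0 otherwise.
 * On a kernel that honours O_DIRECTORY, opening a regular file this
 * way fails (with ENOTDIR), so the helper returns 0 for it.
 */
static int sketch_is_directory(const char *path)
{
        int fd = open(path, O_RDONLY | O_DIRECTORY);

        if (fd < 0)
                return 0;
        close(fd);
        return 1;
}
```

If the kernel ignores O_DIRECTORY, the helper wrongly reports regular files as directories, which is the failure mode this issue is about.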

A PR is forthcoming that implements O_DIRECTORY in Fiwix.

Question, Not Issue Regarding Port to 32-bit M68K architecture?

This is a beautiful OS! Holy wow!

Just wondering if there have been any thoughts about porting it to the M68K 32-bit CPUs, so that it would work on hobbyist boards, early Macs, SHARP systems, Sun2, Sun3 and more of course.

32-bit CPUs would be:

MC68020 with FPU chip
MC68EC20 with FPU chip
MC68030
MC68EC030
MC68040
MC68EC040
MC68LC040
MC68060
MC68EC060
MC68LC060
Freescale/NXP Dragonball CPUs

And more not totally listed

Support readv and writev system calls

Please support readv and writev system calls.
These are used by musl internally to implement common libc I/O routines.

I am attaching a program for testing on Fiwix.

You can test it like so:

cc test-readv.c -o test-readv
rm -f testout.txt
./test-readv

The correct output is:

fd: 3
wrote 12
read return: 12
read str: /hello world!/

test-readv.c.gz

Support linux boot protocol for kexec

The live-bootstrap project boots Linux from Fiwix which requires support for the linux boot protocol.

The implementation in the forthcoming PR is similar to the multiboot1 solution:

The kexec_proto parameter with value linux must be specified on the kernel command line.
The kexec_size parameter is used to configure a ram drive which holds the kernel (and initrd ram drive).
The kexec is performed when the kernel halts.

There are several design points that should be explained.

The kexec process for live-bootstrap uses both a linux kernel and an initial ram file system.
These files are built under Fiwix and then must be passed to the Fiwix kernel for kexec.
Similar to multiboot, the files are copied to a ram drive device. However, since there are two files the process is a little more complicated as explained later.

Assuming you boot Fiwix from a hard drive (and so there is no initrd ram drive device) then the kexec ram drive will be /dev/ram0.

By the way...

Note that live-bootstrap boots with a very large initrd ram drive, which is configured with the Fiwix kernel parameter ramdisksize=1179648 (approx. 1.2G). This is a problem because Fiwix compiles by default with one (additional) "general purpose" ram drive which uses the same parameter for sizing. But there is not enough memory for that.

Therefore, live-bootstrap compiles Fiwix with:

#define RAMDISK_DRIVES  0

In order to configure this setting, RAMDISK_DRIVES has been moved to include/fiwix/config.h.

A major way that linux boot differs from multiboot is the way that the size of the kernel and ram drive are conveyed to the Fiwix kernel for kexec. This is done by writing the length of the linux kernel in bytes to the ram drive as a 32-bit little-endian number followed by the length of the ram drive as another 32-bit little-endian number. Then the kernel and ram drive are written to the ram drive, sequentially, starting right after the lengths.

I am open to feedback on this design. An alternative is to configure multiple ram drives but that would require additional kexec kernel parameters. Moreover, the size of the linux kernel and initramfs must be specified exactly in the linux boot protocol but those sizes will not be known until after Fiwix is booted so we cannot use boot parameters to tell Fiwix those sizes. So, those sizes need to be conveyed to the Fiwix kernel somehow as part of initiating the kexec. The design I came up with allows using the same kexec kernel parameters that multiboot uses.

The Linux boot parameters take more than one page so the size of the boot memory was increased to two pages and the location adjusted accordingly.

Note that disabling IRQs does not work for booting Linux so I moved that code to after the kexec code.

Note that some of the code to prepare the linux boot parameters was adapted from the Limine project and so the copyright notices reflect that.

Test files

The following files are used to aid in testing:

linux-4.9.10
kexec-linux.c
initramfs.cpio

Unfortunately I could not upload initramfs.cpio.gz because at 133MB it was too large for github. I'm not sure how to send it to you but you can email me at grick23 at gmail.com and perhaps we can arrange something. Or you can follow the instructions below to generate the file yourself. In the meantime it is possible to test kexec without an initramfs as explained later.

Note, for future reference, the following section describes how these test files were produced.

Producing test files

Clone and get ready to run the live-bootstrap project:

git clone https://github.com/fosslinux/live-bootstrap
cd live-bootstrap

In the file steps/jump/linux.sh replace this line:

    kexec-linux "/dev/ram1" "/boot/linux-4.9.10" "!$(command -v gen_init_cpio) /initramfs.list"

With this:

    $(command -v gen_init_cpio) /initramfs.list > /initramfs.cpio || true
    sync; read line
    kexec-linux "/dev/ram1" "/boot/linux-4.9.10" "!$(command -v gen_init_cpio) /initramfs.list"

Start live-bootstrap:

./rootfs.py --qemu

After 60 minutes or so live-bootstrap will hit the read line command and stop for input.
You will see the following on the screen:

linux-4.9.10: install to fakeroot.
linux-4.9.10: postprocess binaries.
linux-4.9.10: creating package.
linux-4.9.10: cleaning up.
linux-4.9.10: installing package.
linux-4.9.10: build successful
Unrecognized nod format '/dev/sda 600 0 0 b Hr Lr' line 7071
Unrecognized nod format '/dev/sda1 600 0 0 b Hr Lr' line 7072
Unrecognized nod format '/dev/sda2 600 0 0 b Hr Lr' line 7073
Unrecognized nod format '/dev/sda3 600 0 0 b Hr Lr' line 7074
Unrecognized nod format '/dev/sdb 600 0 0 b Hr Lr' line 7075
Unrecognized nod format '/dev/sdb1 600 0 0 b Hr Lr' line 7076
Unrecognized nod format '/dev/sdb2 600 0 0 b Hr Lr' line 7077
Unrecognized nod format '/dev/sdc 600 0 0 b Hr Lr' line 7078
Unrecognized nod format '/dev/sdc1 600 0 0 b Hr Lr' line 7079
Unrecognized nod format '/dev/sdc2 600 0 0 b Hr Lr' line 7080
Unrecognized nod format '/dev/sdc3 600 0 0 b Hr Lr' line 7081

At this point, you must go to the qemu console by pressing ctrl-a followed by c.
At the qemu prompt type the following commands, which will copy the ram disk to a file:

QEMU 4.2.1 monitor - type 'help' for more information
(qemu) pmemsave 0x001c6000 0x48000000 initrd.suspend
(qemu) quit

Now you can mount the ram disk and extract the linux kernel image, the ram drive cpio file used by linux, and the kexec-linux.c launching code:

cd ~/live-bootstrap
mkdir mnt
sudo mount -t ext2 initrd.suspend mnt
cp mnt/boot/linux-4.9.10 ~/
cp mnt/initramfs.cpio ~/
sudo umount mnt
cp steps/kexec-linux-1.0.0/files/kexec-linux.c ~/

Note that for testing on FiwixOS you'll need to change the following line in kexec-linux.c from this:

        reboot(RB_HALT_SYSTEM);

to this:

        system("halt");

The kexec-linux.c program I am providing here is also different in that it does not require providing an initial ram drive image.

Testing kexec linux

Once you have the test files, you can try to kexec Linux from Fiwix.

You'll need to build the Fiwix kernel with the CONFIG_KEXEC parameter.

echo "#define CONFIG_KEXEC" > include/fiwix/custom_config.h
make clean
make CONFFLAGS="-DCUSTOM_CONFIG_H"

You'll need to copy the test files to the Fiwix hard drive.

Note: I'm using the same FiwixOS-3.2-i386.raw downloaded from Fiwix.org except that the S0 terminal has been uncommented near the bottom of /etc/inittab to enable /dev/ttyS0 so text / serial console works with qemu.

BEWARE: RUNNING THIS TEST WILL REFORMAT THE HARD DRIVE!
BEWARE: RUNNING THIS TEST WILL REFORMAT THE HARD DRIVE!
BEWARE: RUNNING THIS TEST WILL REFORMAT THE HARD DRIVE!

You'll need to use the correct qemu command line to launch Fiwix:

qemu-system-i386 \
    -enable-kvm \
    -nographic \
    -machine pc \
    -cpu 486 \
    -drive file=FiwixOS-3.2-i386.raw,format=raw,if=ide,cache=writeback,index=0 \
    -nic user,ipv6=off,model=e1000 \
    -kernel fiwix \
    -append "console=/dev/ttyS0 root=/dev/hda2 kexec_proto=linux kexec_size=280000 kexec_cmdline=\"init=/init console=ttyS0\"" \
    -m size=4G

cc -m32 kexec-linux.c -o kexec-linux
./kexec-linux /dev/ram0 ./linux-4.9.10 ./initramfs.cpio
# After about 30 seconds linux should boot and start running gcc commands

Note that an initramfs image is optional. If not used, a size of zero should be written to the ram drive.
I used this to boot Linux directly with a hard drive root using the FiwixOS 3.2 image.
Yes, Linux can boot FiwixOS!

The steps required for this are to change the qemu append parameter:

    -append "console=/dev/ttyS0 root=/dev/hda2 kexec_proto=linux kexec_size=280000 kexec_cmdline=\"root=/dev/sda2 init=/sbin/init console=ttyS0\"" \

After booting Fiwix, change device from hda to sda:

# sed -i 's/hda/sda/g' /etc/rc.d/rc.sysinit

Then run kexec-linux without an initramfs file name:

# ./kexec-linux /dev/ram0 ./linux-4.9.10

kexec-linux.c.gz
linux-4.9.10.gz

Bad interprocess communication when using select() on UNIX sockets

The implementation of select() on UNIX sockets seems a bit buggy. After #83 things were improved but there is still some bad communication between two processes using the select() system call.

The following are two programs downloaded from here, that help to test and see the problem.

Define the SOCKETNAME to just "mysocket" in both programs, and compile them.

How to test

Boot your FiwixOS and, after logging in on the console, execute the server program. Then, from two serial ttys or two console ttys, execute the client program on each tty. Once you have the three programs running, go to one of the clients, type hello and press ENTER; you should see hello in the other client. Try the same multiple times and also from the other client. You'll see that what you type does not always appear on the other side.

If you want to have two serial lines under QEMU (ttyS0 and ttyS1) add the following lines:

       -chardev pty,id=pciserial \
       -device pci-serial,chardev=pciserial \
       -serial pty

Support syscall mmap2

Please support syscall mmap2.
mmap2 is similar to mmap except that the offset argument is multiplied by 4096 to compute the effective offset.
This supports very large offsets for very large files.
The mmap2 system call is used by the musl C library to implement mmap, so mmap2 is needed for musl even if large offsets are not needed.

mmap2 takes 6 arguments which requires CONFIG_SYSCALL_6TH_ARG to be set.
For this reason, mmap2 is defined behind CONFIG_MMAP2.

For testing, both CONFIG_SYSCALL_6TH_ARG and CONFIG_MMAP2 should be set.
A custom_config.h can be used for this purpose like so:

echo '#define CONFIG_SYSCALL_6TH_ARG' > include/fiwix/custom_config.h
echo '#define CONFIG_MMAP2' >> include/fiwix/custom_config.h
make clean
make CONFFLAGS="-DCUSTOM_CONFIG_H"

A test program for mmap2 and a test script to build and run the test program is attached.
The test script produces an input file with random data and then runs the test program.
The test program mmap's the input file and output file with an offset of 4096.
The test program then copies data from the input file to the output file using memory.

The test script then extracts the contents of the input and output file starting with offset 4096 and verifies that they are the same.
mmap2-test.c.gz
mmap2.sh.gz

Create a .gitignore?

Hey there, I built Fiwix but I noticed that Git was tracking files that shouldn't be getting into the source tree, such as object files.
Could you look into creating a .gitignore file to ensure in the future nothing unwanted gets into the source tree? I ran make clean and it seems to only delete *.o files, fiwix and System.map.gz. You could add these files to a gitignore, newline-delimited.

ex.

*.o
fiwix
System.map.gz

Support syscall getdents64

testgetdents64.c.gz
The getdents64 syscall is needed for musl support because musl defaults to using the 64-bit versions of this system call.

A test program is attached. The following will output information about the files in the current directory.

cc -m32 testgetdents64.c -o testgetdents64
./testgetdents64

Fiwix does not build with newest gcc cross-compiler

Following the instructions here, I am trying to build Fiwix with the newest gcc cross-compiler but it fails with many errors like this:

ld: kernel/init.o:(.bss+0x0): multiple definition of `vcbuf'; kernel/gdt.o:(.bss+0x30): first defined here

Starting in version 10, gcc has adopted -fno-common as a default option. See here for details.

There is quite a bit of discussion on whether common variables are allowed by "the standards" and it appears to me that supporting them is "optional" for a C compiler. I have my own opinion on the use of common variables which I'll share if asked but I'll defer to your judgment on how to resolve this.

If you think gcc is being too strict, the "-fcommon" option can be added to the CFLAGS and the problem is resolved.

Alternatively, the variables can be declared once in .c files and declared extern in headers. Also, the "-fno-common" flag could be added to CFLAGS to ensure the problem is flagged in the future for older versions of gcc. This is substantially more work as there are over a dozen variables to address.

I'm happy to submit a PR for either option.

Pass through 64-bit PAE memory entries as part of kexec

Fiwix passes on a memory table as part of the kexec process.
These memory entries should include 64-bit ranges to support PAE.
This will allow Linux to use more memory in the live-bootstrap project.

The forthcoming PR was originally submitted to the rick-masters fork of Fiwix by @Googulator:
rick-masters#1

Ideas for memory management for large initrd ramdisks

Hello @mikaku and @rick-masters,

I'm opening a new issue to continue discussion from #78 (comment) so that the topic is more easily followed. I'm repeating below that last post for completeness here:

Can you explain in a bit more detail what the problem is here? It seems to me that MultiBoot first loads the optional initrd into physical RAM just after the kernel text/data (just before kpage_dir). Then mem_init remaps that initrd RAM above PAGE_OFFSET before loading the final kernel GDT, right?
So how is initrd taking up user space virtual addresses? Aren't all the initrd addresses in KVA?

Initially the Fiwix kernel only had the virtual memory split 3/1 (3GB user / 1GB kernel) and so the maximum available physical memory was 1GB, regardless if your PC had more memory. In this case the RAMdisk drives or the initrd were limited to a maximum of around 950MB, but if you tried to use all the memory, you cannot even login to the system because there would not be enough memory for the user applications.

In #34, @rick-masters suggested a patch to increase the limit of the size in the initrd images to be more than 1GB, which is a requirement for their Bootstrappable project. But somehow we agreed that the patch was a bit tricky.

So, in my attempt to fix this, I presented a new path: support for a 2/2 virtual memory split (2GB user / 2GB kernel) with the new kernel option CONFIG_VM_SPLIT22 (disabled by default). This way you can easily have an initrd of 1.2GB in size and still have around 750MB for the user space.

Recently though, people in the Bootstrappable project complained that with very large initrd files (let's say 1.5GB) they only have around 450MB for user, even when the system has 4GB of physical memory. They ask if there would be a chance to use the unused 2GB of physical memory for the initrd files and the rest for the user space.

Thanks for the background information, very interesting.

I keep thinking about how I can implement this in a decent (not tricky) way, but so far I'm failing at it.

If I understand the problem correctly, Fiwix can be configured for 2/2 (2GB user / 2GB kernel) or 3/1 (3GB user / 1 GB kernel) but the kernel virtual address space size is the limiting factor for the amount of physical memory managed, as currently all physical memory is permanently mapped into kernel VA, including initrd/ramdisks.

Thus, with a 2/2 system and 1.5G initrd, the max physical memory allowed and mappable is 2GB, which then leaves only .5GB left for user space (less kernel text/data). Likewise, with a 3/1 system, max physical RAM is 1GB with even less ramdisk and user space available.

Why do RAMdisks have to be permanently mapped into KVA? They're a normal block driver, and accessed only through the FS ops table. It would seem that the ramdisk driver could be easily updated to use a pre-specified set of PTEs, say a single PT managing 4MB of memory, that could be updated by the ramdisk driver routine to access ramdisk data outside of otherwise mapped kernel memory. This memory would be allocated directly above _last_data_addr, and not be included in kstat.physical_pages.

Depending on the total actual physical RAM available, the kernel could allocate what is needed or specified for user space in kstat.physical_pages, then use addresses just above that for RAMdisks, using a single PT to access the entire ramdisk addressable range for caching purposes. Of course, any Multiboot initrd would have to be copied from lower memory once, but this could be done by the ramdisk init routine, leaving the lower memory available for user space and not marked PAGE_RESERVED.

The ramdisk PTEs would operate very much like how the framebuffer mmap now works, by containing an otherwise unknown-to-kernel physical memory address (along with the PAGE_NOALLOC bit, although not strictly needed).

This design removes all ramdisk contents from the physical memory previously managed by Fiwix, at the cost of possibly updating PTE entries outside the "cached" (or should I call it "bank-switched") window into ramdisk contents. Since most frequently accessed ramdisk contents would be cached in the buffer system anyways, the benefit of having much more memory available for user space outweighs any speed decrease, it would seem.

Does this idea sound worth pursuing? What do you think?

Check diff_dev variable before accessing tmp_inode

The following code is problematic:

Fiwix/kernel/syscalls/getcwd.c

Lines 99 to 106 in 74753ab

        diff_dev = up->dev != cur->dev;
        if(diff_dev) {
                if(parse_namei(d_ptr->d_name, up, &tmp_ino, 0, FOLLOW_LINKS)) {
                        /* keep going if sibling dirents fail */
                        break;
                }
        }
        if((d_ptr->d_ino == cur->inode && !diff_dev) || (tmp_ino->inode == cur->inode && diff_dev)) {

Consider the situation in which diff_dev is zero (false). In this case, tmp_ino will NOT be set by parse_namei (nor any other previous code).
However, in the if statement at the bottom, tmp_ino is accessed before checking diff_dev and in the case where diff_dev is zero, tmp_ino will be uninitialized, resulting in a possible fault.

This issue causes page fault failures in live-bootstrap like so:

 +> tcc -c putenv_stub.c                                                    
 +> tcc -static -o /usr/bin/make getopt.o getopt1.o ar.o arscan.o commands.o default.o dir.o expand.o file.o function.o implicit.o job.o main.o misc.o  
 +> make --version                                                          
                                                                            
Page Fault at 0x000002b6 (reading) with error code 0x00000000 (0b0)

Note that the specific scenario in which this occurs is not consistent due to timing issues and is difficult to create a simple test case for. It is also possible that gcc might optimize this code to check diff_dev first but tcc may not perform the same optimization. So, unfortunately I can only provide a PR which fixes the issue but I don't have a test case other than running live-bootstrap.

There is obviously no downside to checking diff_dev first before checking the other criteria.
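A && B->C gives the same result as B->C && A whenever both can be evaluated, but because && short-circuits, the first form never dereferences B when A is false. It can only be safer, so this improvement should be self-evident.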

Live-bootstrap does not work on hard drive but does on ram drive

I am using a kernel which can be built this way:

git clone https://github.com/rick-masters/fiwix fiwix-1.4.0-lb-73f4c28
cd fiwix-1.4.0-lb-73f4c28
git checkout fiwix-1.4.0-lb-73f4c28
make CONFFLAGS="-DCONFIG_MMAP2 -DCONFIG_64BIT_SYSCALLS -DNR_PROCS=4096 -DCHILD_MAX=4096 -DOPEN_MAX=1536 -DNR_OPENS=1536 -DDEFAULT_ROOT_FLAGS=0 -DINIT_PROGRAM=\"\\\"/init\\\"\" -DUTS_SYSNAME=\"\\\"Linux\\\"\""

By the way, this is a fork based on your commit 73f4c28 from March 22, 2023. This is the most recent commit that I was able to get working with my changes. If I include the next commit 58ca771 then live-bootstrap fails with page faults in the cp command. I'm looking into that but for now I am unable to provide a fork based on the latest code. However, most of your recent ATA/PCI changes are in the fork I have provided.

With that, testing with hard drive does not work:

git clone https://github.com/rick-masters/live-bootstrap
cd live-bootstrap
git checkout kernel-bootstrap-v2-fiwix
git submodule update --init --recursive
cp ~/fiwix-1.4.0-lb-73f4c28/fiwix ./kernel
./rootfs.py --qemu --kernel kernel --kernel-fiwix-hd

The hard drive variation fails in various ways. Usually it is a lockup but it may also be a page fault or qemu may simply exit for an unknown reason.

Testing with ram drive does work:

git clone https://github.com/rick-masters/live-bootstrap
cd live-bootstrap
git checkout kernel-bootstrap-v2-fiwix
git submodule update --init --recursive
cp ~/fiwix-1.4.0-lb-73f4c28/fiwix ./kernel
./rootfs.py --qemu --kernel kernel --kernel-fiwix-rd

A successful run ends after Linux has been compiled and there is a mount error "Only root can do that". This is a normal ending for Fiwix because Fiwix cannot perform a Linux kexec.

I should note that for both cases there may be a long delay (about a minute) after the "Booting from ROM..." message is displayed. I am not sure why there is a delay and it may depend on your version of qemu but it does start eventually. This delay is an unrelated problem.

For hard drive testing, a difference in behavior can be produced by changing include/fiwix/ata.h:

From this:

#define WAIT_FOR_DISK   (1 * HZ)

To this:

#define WAIT_FOR_DISK   (100 * HZ)

This change reduces the failure rate from 100% to 15% for me. It may appear to work but it does fail regularly so this is not a full solution. For me this change also causes a startup delay for Fiwix of over 6 minutes!

Some information on live-bootstrap may be helpful. Fundamentally live-bootstrap is just creating a drive image and then launching qemu. The scripts that launch qemu are named bootfiwix-rd and bootfiwix-hd and can be found in the root of the live-bootstrap repository. The ram drive and hard drive variations use the exact same ext2 file system. The only difference is that with the hard drive method an MBR with partition table and some padding sectors are prepended to the file system and the qemu options are different.

Fault on accessing minimal user stack with syscall

If a user process tries to access a stack page for the first time using a system call, it may cause a page fault.

This problem was seen with the make program while building linux but the setup to reproduce the problem would be impractical, so I created a small test program. It took quite a bit of investigation and testing to isolate the specific problem.

The following code can be used to trigger the problem. Admittedly, this test is not something anyone would run into casually, but I do believe this is "valid" code that should work and represents what I think make was doing when it faulted.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>
#include <errno.h>

int main() {
        char filename[3670] = "stacktest";
        struct stat statinfo;

        /*
        puts("hello");
        __asm__ (
                        "mov $0x6A,%eax\t\n"
                        "mov $0xbffff00a,%ebx\t\n"
                        "mov $0xbfffefc8,%ecx\t\n"
                        "int $0x80"
                );
        printf("my size is %u\n", statinfo.st_size);
        */
        printf("filename is 0x%08x\n", (unsigned int) filename);
        printf("statinfo is 0x%08x\n", (unsigned int) &statinfo);
}

You may need to perform some manual adjustments to make this work.
First, compile and run the program as is:

cc stacktest.c -o stacktest
./stacktest
filename is 0xbffff00a
statinfo is 0xbfffefc8

Note the address of filename and statinfo. It is important that filename and statinfo are in different memory pages. You may need to adjust the size of filename in order to make this happen.

Now change the assembly code (if necessary) to match those addresses for setting %ebx and %ecx, respectively, and uncomment that code.

You should be able to run the program now and get the correct size of the file:

cc stacktest.c -o stacktest
./stacktest
hello
my size is 586769
filename is 0xbffff00a
statinfo is 0xbfffefc8

The assembly code is executing a stat system call and is the equivalent of stat(filename, &statinfo);.

Now, edit the code and remove the line puts("hello");. Compile and run again. For me, this produces a segfault.

The issue is that the stack page for statinfo is not mapped in.
Normally, any access of an unmapped stack page from user land would cause the page to be mapped in by a page fault. The puts("hello"); call triggers this mechanism when it pushes a call frame onto the stack. However, when that is removed, the program is using a system call to write to that memory for the first time. In this case, the fault occurs in the kernel and it appears the normal mechanism for growing the stack does not trigger and a page fault terminates the program instead.

It seems there are a couple of ways to resolve this. At first, I thought the problem was simply that a lack of sufficient stack space was being mapped for the user program. I developed a solution with that in mind. I wrote a change in which the user program is mapped a significant chunk of free stack from the start of execution. (See attached PR). This indeed resolves the problem.

After trying to create a test case I came to understand the current mechanism for growing the user stack dynamically on a fault, which examines the user process %esp register. It might be possible to create a similar mechanism for growing the stack when a fault occurs during a system call. On the other hand, I don't see a downside to just setting up more stack memory in the vma to begin with.

	call get_last_boot_addr
	popl %ecx /* restore Multiboot magic value */
	popl %ebx /* restore Multiboot info structure */
	andl $0xFFFFF000, %eax /* page aligned */
	addl $0x3000, %eax /* 2 whole pages for kernel stack */
	subl $4, %eax
	movl %eax, %esp /* set kernel stack */

	movl %eax, %esp /* set kernel stack */

	pushl %esp /* save kernel stack address */

	/* that's an non-existent directory */
	*d_res = NULL;
	errno = -ENOTDIR;

	if(level == EXT2_TIND_BLOCK) {
	if(!(buf3 = bread(i->dev, indblock[block], blksize))) {
	printk("%s(): returning -EIO\n", __FUNCTION__);
	brelse(buf);
	return -EIO;
	}
	tindblock = (__blk_t *)buf3->data;
	block = tindblock[tblock / BLOCKS_PER_IND_BLOCK(i->sb)];

	static void scan_bus(void)
	{
	int b, d, f;
	unsigned int vendor_id, device_id, class;
	unsigned char header, irq, prog_if;
	struct pci_device pci_dev;
