fuse-backend-rs's Introduction

1. What is Cloud Hypervisor?

Cloud Hypervisor is an open source Virtual Machine Monitor (VMM) that runs on top of the KVM hypervisor and the Microsoft Hypervisor (MSHV).

The project focuses on running modern Cloud Workloads on specific, common hardware architectures. In this case Cloud Workloads refers to those that are run by customers inside a Cloud Service Provider. This means modern operating systems with most I/O handled by paravirtualised devices (e.g. virtio), no requirement for legacy devices, and 64-bit CPUs.

Cloud Hypervisor is implemented in Rust and is based on the Rust VMM crates.

Objectives

High Level

  • Runs on KVM or MSHV
  • Minimal emulation
  • Low latency
  • Low memory footprint
  • Low complexity
  • High performance
  • Small attack surface
  • 64-bit support only
  • CPU, memory, PCI hotplug
  • Machine to machine migration

Architectures

Cloud Hypervisor supports the x86-64 and AArch64 architectures. There are minor differences in functionality between the two architectures (see #1125).

Guest OS

Cloud Hypervisor supports 64-bit Linux and Windows 10/Windows Server 2019.

2. Getting Started

The following sections describe how to build and run Cloud Hypervisor.

Prerequisites for AArch64

  • AArch64 servers (recommended) or development boards equipped with the GICv3 interrupt controller.

Host OS

For the required KVM functionality and adequate performance, the recommended host kernel version is 5.13. The majority of the CI currently tests with kernel version 5.15.

Use Pre-built Binaries

The recommended approach to getting started with Cloud Hypervisor is by using a pre-built binary. Binaries are available for the latest release. Use cloud-hypervisor-static for x86-64 or cloud-hypervisor-static-aarch64 for the AArch64 platform.

Packages

For convenience, packages are also available targeting some popular Linux distributions. This is thanks to the Open Build Service. The OBS README explains how to enable the repository in a supported Linux distribution and install Cloud Hypervisor and accompanying packages. Please report any packaging issues in the obs-packaging repository.

Building from Source

Please see the instructions for building from source if you do not wish to use the pre-built binaries.

Booting Linux

Cloud Hypervisor supports direct kernel boot (for x86-64 the kernel must be built with PVH support, or be a bzImage) or booting via firmware (either Rust Hypervisor Firmware or an edk2 UEFI firmware called CLOUDHV / CLOUDHV_EFI).

Binary builds of the firmware files are available for the latest release of Rust Hypervisor Firmware and our edk2 repository.

The choice of firmware depends on your guest OS choice; some experimentation may be required.

Firmware Booting

Cloud Hypervisor supports booting disk images containing all needed components to run cloud workloads, a.k.a. cloud images.

The following sample commands download an Ubuntu cloud image, convert it into a format that Cloud Hypervisor can use, and fetch a firmware to boot the image with.

$ wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img
$ qemu-img convert -p -f qcow2 -O raw focal-server-cloudimg-amd64.img focal-server-cloudimg-amd64.raw
$ wget https://github.com/cloud-hypervisor/rust-hypervisor-firmware/releases/download/0.4.2/hypervisor-fw

The Ubuntu cloud images do not ship with a default password, so it is necessary to use a cloud-init disk image to customise the image on the first boot. A basic cloud-init image is generated by this script. This seeds the image with a default username/password of cloud/cloud123. It is only necessary to add this disk image on the first boot. The script also assigns a default IP address using the details in test_data/cloud-init/ubuntu/local/network-config, to be used together with the --net "mac=12:34:56:78:90:ab,tap=" option; the interface with the matching MAC address is then configured as per the network-config details.

$ sudo setcap cap_net_admin+ep ./cloud-hypervisor
$ ./create-cloud-init.sh
$ ./cloud-hypervisor \
	--kernel ./hypervisor-fw \
	--disk path=focal-server-cloudimg-amd64.raw path=/tmp/ubuntu-cloudinit.img \
	--cpus boot=4 \
	--memory size=1024M \
	--net "tap=,mac=,ip=,mask="

If access to the firmware messages or interaction with the boot loader (e.g. GRUB) is required then it is necessary to switch to the serial console instead of virtio-console.

$ ./cloud-hypervisor \
	--kernel ./hypervisor-fw \
	--disk path=focal-server-cloudimg-amd64.raw path=/tmp/ubuntu-cloudinit.img \
	--cpus boot=4 \
	--memory size=1024M \
	--net "tap=,mac=,ip=,mask=" \
	--serial tty \
	--console off

Custom Kernel and Disk Image

Building your Kernel

Cloud Hypervisor also supports direct kernel boot. For x86-64, a vmlinux ELF kernel (compiled with PVH support) or a regular bzImage is supported. To support development there is a custom branch; however, provided the required options are enabled, any recent kernel will suffice.

To build the kernel:

# Clone the Cloud Hypervisor Linux branch
$ git clone --depth 1 https://github.com/cloud-hypervisor/linux.git -b ch-6.2 linux-cloud-hypervisor
$ pushd linux-cloud-hypervisor
# Use the x86-64 cloud-hypervisor kernel config to build your kernel for x86-64
$ wget https://raw.githubusercontent.com/cloud-hypervisor/cloud-hypervisor/main/resources/linux-config-x86_64
# Use the AArch64 cloud-hypervisor kernel config to build your kernel for AArch64
$ wget https://raw.githubusercontent.com/cloud-hypervisor/cloud-hypervisor/main/resources/linux-config-aarch64
$ cp linux-config-x86_64 .config  # x86-64
$ cp linux-config-aarch64 .config # AArch64
# Do native build of the x86-64 kernel
$ KCFLAGS="-Wa,-mx86-used-note=no" make bzImage -j `nproc`
# Do native build of the AArch64 kernel
$ make -j `nproc`
$ popd

For x86-64, the vmlinux kernel image will then be located at linux-cloud-hypervisor/arch/x86/boot/compressed/vmlinux.bin. For AArch64, the Image kernel image will then be located at linux-cloud-hypervisor/arch/arm64/boot/Image.

Disk image

For the disk image the same Ubuntu image as before can be used. This contains an ext4 root filesystem.

$ wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img # x86-64
$ wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-arm64.img # AArch64
$ qemu-img convert -p -f qcow2 -O raw focal-server-cloudimg-amd64.img focal-server-cloudimg-amd64.raw # x86-64
$ qemu-img convert -p -f qcow2 -O raw focal-server-cloudimg-arm64.img focal-server-cloudimg-arm64.raw # AArch64

Booting the guest VM

These sample commands boot the disk image using the custom kernel whilst also supplying the desired kernel command line.

  • x86-64
$ sudo setcap cap_net_admin+ep ./cloud-hypervisor
$ ./create-cloud-init.sh
$ ./cloud-hypervisor \
	--kernel ./linux-cloud-hypervisor/arch/x86/boot/compressed/vmlinux.bin \
	--disk path=focal-server-cloudimg-amd64.raw path=/tmp/ubuntu-cloudinit.img \
	--cmdline "console=hvc0 root=/dev/vda1 rw" \
	--cpus boot=4 \
	--memory size=1024M \
	--net "tap=,mac=,ip=,mask="
  • AArch64
$ sudo setcap cap_net_admin+ep ./cloud-hypervisor
$ ./create-cloud-init.sh
$ ./cloud-hypervisor \
	--kernel ./linux-cloud-hypervisor/arch/arm64/boot/Image \
	--disk path=focal-server-cloudimg-arm64.raw path=/tmp/ubuntu-cloudinit.img \
	--cmdline "console=hvc0 root=/dev/vda1 rw" \
	--cpus boot=4 \
	--memory size=1024M \
	--net "tap=,mac=,ip=,mask="

If earlier kernel messages are required the serial console should be used instead of virtio-console.

  • x86-64
$ ./cloud-hypervisor \
	--kernel ./linux-cloud-hypervisor/arch/x86/boot/compressed/vmlinux.bin \
	--console off \
	--serial tty \
	--disk path=focal-server-cloudimg-amd64.raw \
	--cmdline "console=ttyS0 root=/dev/vda1 rw" \
	--cpus boot=4 \
	--memory size=1024M \
	--net "tap=,mac=,ip=,mask="
  • AArch64
$ ./cloud-hypervisor \
	--kernel ./linux-cloud-hypervisor/arch/arm64/boot/Image \
	--console off \
	--serial tty \
	--disk path=focal-server-cloudimg-arm64.raw \
	--cmdline "console=ttyAMA0 root=/dev/vda1 rw" \
	--cpus boot=4 \
	--memory size=1024M \
	--net "tap=,mac=,ip=,mask="

3. Status

Cloud Hypervisor is under active development. The following stability guarantees are currently made:

  • The API (including command line options) will not be removed or changed in a breaking way without a minimum of two major releases' notice. Where possible, warnings will be given about the use of deprecated functionality, and the deprecations will be documented in the release notes.

  • Point releases will be made between individual releases where there are substantial bug fixes or security issues that need to be fixed. These point releases will only include bug fixes.

Currently the following items are not guaranteed across updates:

  • Snapshot/restore is not supported across different versions
  • Live migration is not supported across different versions
  • The following features are considered experimental and may change substantially between releases: TDX, vfio-user, vDPA.

Further details can be found in the release documentation.

As of 2023-01-03, the following cloud images are supported:

Direct kernel boot to userspace should work with a rootfs from most distributions, although you may need to enable exotic filesystem types in the reference kernel configuration (e.g. XFS or btrfs).

Hot Plug

Cloud Hypervisor supports hotplug of CPUs, passthrough devices (VFIO), virtio-{net,block,pmem,fs,vsock} and memory resizing. This document details how to add devices to a running VM.

Device Model

Details of the device model can be found in this documentation.

Roadmap

The project roadmap is tracked through a GitHub project.

4. Relationship with Rust VMM Project

In order to satisfy the design goal of having a high-performance, security-focused hypervisor the decision was made to use the Rust programming language. The language's strong focus on memory and thread safety makes it an ideal candidate for implementing VMMs.

Instead of implementing the VMM components from scratch, Cloud Hypervisor imports the Rust VMM crates and shares code and architecture with other VMMs such as Amazon's Firecracker and Google's crosvm.

Cloud Hypervisor embraces the Rust VMM project's goal, which is to share and re-use as many virtualization crates as possible.

Differences with Firecracker and crosvm

A large part of the Cloud Hypervisor code is based on either the Firecracker or the crosvm project's implementations. Both of these are VMMs written in Rust with a focus on safety and security, like Cloud Hypervisor.

The goal of the Cloud Hypervisor project differs from the aforementioned projects in that it aims to be a general purpose VMM for Cloud Workloads and not limited to container/serverless or client workloads.

The Cloud Hypervisor community thanks the communities of both the Firecracker and crosvm projects for their excellent work.

5. Community

The Cloud Hypervisor project follows the governance and community guidelines described in the Community repository.

Contribute

The project strongly believes in building a global, diverse and collaborative community around the Cloud Hypervisor project. Anyone who is interested in contributing to the project is welcome to participate.

Contributing to an open source project like Cloud Hypervisor covers a lot more than just sending code. Testing, documentation, pull request reviews, bug reports, feature requests, project improvement suggestions, etc., are all equally welcome means of contribution. See the CONTRIBUTING document for more details.

Slack

Get an invite to our Slack channel, join us on Slack, and participate in our community activities.

Mailing list

Please report bugs using the GitHub issue tracker but for broader community discussions you may use our mailing list.

Security issues

Please contact the maintainers listed in the MAINTAINERS.md file with security issues.

fuse-backend-rs's People

Contributors

00xc, adamqqqplay, akitasummer, bergwolf, cbrewster, champ-goblem, changweige, eryugey, griff, h56983577, imeoer, jiangliu, justxuewei, killagu, liubogithub, loheagn, matthiasgoergens, mofishzz, sctb512, tim-zhang, uran0sh, weizhang555, wllenyj, xuejun-xj, xujihui1985, yyyeerbo, zhangjaycee, zizhengbian, zyfjeff

fuse-backend-rs's Issues

Conflicting PERFILE_DAX flag

I only became aware of this project because Vivek Goyal pointed me to it on the fsdevel list, and I checked the flags it uses because of this patch: https://marc.info/?l=linux-fsdevel&m=165002361802294&w=2
I then noticed that the PERFILE_DAX flag conflicts with FUSE_INIT_EXT.

Looks like you are on a non-upstream kernel with patches?

linux master include/uapi/linux/fuse.h

...
#define FUSE_INIT_EXT (1 << 30)
#define FUSE_INIT_RESERVED (1 << 31)
/* bits 32..63 get shifted down 32 bits into the flags2 field */
#define FUSE_SECURITY_CTX (1ULL << 32)
#define FUSE_HAS_INODE_DAX (1ULL << 33)

Btw, is there any reason you are not using 1 << number for the flags? In my personal opinion it is so much easier to read...
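
For illustration only (the constant names and the PERFILE_DAX value below are taken from the conflict described above, not from the crate's source), the overlap boils down to two flags claiming the same bit:

fn main() {
    // Upstream include/uapi/linux/fuse.h assigns bit 30 to FUSE_INIT_EXT.
    const FUSE_INIT_EXT: u64 = 1 << 30;
    // Assumed local definition that ended up on the same bit.
    const FUSE_PERFILE_DAX: u64 = 1 << 30;

    // Prints "collision: true" while the two flags share a bit; once PERFILE_DAX
    // moves to a free bit, the same expression could become a const assertion.
    println!("collision: {}", (FUSE_INIT_EXT & FUSE_PERFILE_DAX) != 0);
}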

XFSTests to Validate Functionality

We recently came across the xfstest suite used by the Linux kernel to test and verify filesystem patches.

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/

This suite supports various filesystem types, including fuse and virtiofs, and provides a wide range of tests to validate a number of conditions. We think it would be beneficial for this crate to integrate it as part of the testing regime, as another step to help avoid regressions and bugs that could make it into releases.

For example, we have run this set of tests in a containerised environment making use of nydus 2.2.0 (which is using fuse-backend-rs version 1.10.0) provisioned with Kata 3.0.2. In total 18 out of 589 tests failed:

Failures: generic/007 generic/013 generic/088 generic/245 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639
Failed 18 of 589 tests

Nydus 2.1.0 (fuse-backend-rs 0.9) fails 21 out of 589 tests:

Failures: generic/007 generic/013 generic/088 generic/131 generic/245 generic/247 generic/257 generic/258 generic/263 generic/430 generic/431 generic/432 generic/433 generic/434 generic/478 generic/504 generic/564 generic/571 generic/632 generic/637 generic/639
Failed 21 of 589 tests

Provisioning this is fairly straightforward and can be repeated with the following steps:

  • Start a new VM with a virtiofs device attached, one which uses this crate for its functionality
  • Log in to the VM and clone the above repo and build it
  • Mount the virtiofs device into the VM
    • e.g. mount -t virtiofs sharedFS /mnt
  • In the source directory start the tests against the virtiofs device
    • TEST_DIR=/mnt TEST_DEV=sharedFS ./check -virtiofs
  • The tests should run and output the failure results, you can check the test contents and expected output by checking the files under ./tests/**
    • The results after running the tests can be found under ./results/**

For context we found that the golang fuse library has run these tests in order to verify its functionality:

https://github.com/hanwen/go-fuse/issues?q=is%3Aissue+xfstest

On a side note, we have noticed with more recent versions of nydus that there have been some problems with stateful workloads; for example, MySQL and Minio have issues starting which look to be filesystem related. We are hoping that these tests will pick up any potential edge cases as, understandably, filesystems are very complex.

Customize the permissions of the VFS mountpoint directory

The current permissions of the root directory are a default value; this cannot be customized.

    fn get_entry(&self, ino: u64) -> Entry {
        let mut attr = Attr {
            ..Default::default()
        };
        attr.ino = ino;
        #[cfg(target_os = "linux")]
        {
            attr.mode = libc::S_IFDIR | libc::S_IRWXU | libc::S_IRWXG | libc::S_IRWXO;
        }
        #[cfg(target_os = "macos")]
        {
            attr.mode = (libc::S_IFDIR | libc::S_IRWXU | libc::S_IRWXG | libc::S_IRWXO) as u32;
        }

The root inode of the VFS is a pseudo-inode, and its current implementation always returns a default permission that cannot be customized. We want the VFS to expose an interface to set the default Attr.
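
A hypothetical sketch of what such an interface could look like (the types and field below are placeholders, not the crate's actual VfsOptions API):

// Hypothetical sketch only: not the crate's API. It just illustrates the shape
// of "an interface to set the default Attr" for the pseudo root inode.
#[derive(Clone, Copy, Default)]
pub struct RootAttr {
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
}

pub struct VfsOptionsSketch {
    /// When set, get_entry() for the pseudo root would use these values
    /// instead of the hard-coded S_IFDIR | S_IRWXU | S_IRWXG | S_IRWXO default shown above.
    pub root_attr: Option<RootAttr>,
}

impl VfsOptionsSketch {
    pub fn set_root_attr(&mut self, attr: RootAttr) {
        self.root_attr = Some(attr);
    }
}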

Bug in MacOS-CI

The macOS CI reports an error in macos_session.rs which blocks all pull requests.

Should we fix the disk type in the FuseSession struct?

error: usage of an `Arc` that is not `Send` or `Sync`
   --> src/transport/fusedev/macos_session.rs:117:19
    |
117 |             disk: Arc::new(Mutex::new(None)),
    |                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: the trait `Send` is not implemented for `Mutex<Option<*const __DADisk>>`
    = note: the trait `Sync` is not implemented for `Mutex<Option<*const __DADisk>>`
    = note: required for `Arc<Mutex<Option<*const __DADisk>>>` to implement `Send` and `Sync`
    = help: consider using an `Rc` instead or wrapping the inner type with a `Mutex`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#arc_with_non_send_sync
    = note: `-D clippy::arc-with-non-send-sync` implied by `-D warnings`
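
One possible pattern for this kind of lint, sketched here with a dummy opaque type standing in for *const __DADisk (so this is only an illustration of the approach, not the actual fix for FuseSession), is to wrap the raw pointer in a newtype and assert Send/Sync explicitly after reviewing the thread-safety of the underlying object:

use std::sync::{Arc, Mutex};

// Stand-in for the DiskArbitration handle; illustration only.
#[repr(C)]
struct OpaqueDisk {
    _private: [u8; 0],
}

/// Newtype around the raw pointer so Send/Sync can be asserted explicitly.
struct DiskHandle(*const OpaqueDisk);

// SAFETY: only valid if access to the underlying object is externally
// synchronized (here, by the surrounding Mutex) and the pointee is not
// thread-affine. This must be reviewed for the real __DADisk type.
unsafe impl Send for DiskHandle {}
unsafe impl Sync for DiskHandle {}

struct SessionSketch {
    // No longer triggers clippy::arc_with_non_send_sync, because the
    // wrapped type is now Send + Sync.
    disk: Arc<Mutex<Option<DiskHandle>>>,
}

fn main() {
    let s = SessionSketch {
        disk: Arc::new(Mutex::new(None)),
    };
    assert!(s.disk.lock().unwrap().is_none());
}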

M2 async-io compilation issues

Hey team,

I'm trying to compile the project with the async-io feature enabled on an M2 (macOS) and am getting the following error:

/registry/src/index.crates.io-6f17d22bba15001f/io-uring-0.5.13/src/util.rs:19:42

19   |                 libc::MAP_SHARED | libc::MAP_POPULATE,
     |                                          ^^^^^^^^^^^^ help: a constant with a similar name exists: `MAP_PRIVATE`
     |
    ::: /Users/phristov/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libc-0.2.154/src/unix/bsd/apple/mod.rs:3334:1
     |
3334 | pub const MAP_PRIVATE: ::c_int = 0x0002;

Cargo.toml

[package]
name = "my-project"
version = "0.1.0"
edition = "2021"
build = "build.rs"

[dependencies]
fuse-backend-rs = { version = "0.12.0", features = ["async-io", "fuse-t"] }
futures = "0.3.30"
libc = "0.2.68"
signal-hook = "0.3.17"
async-trait = "0.1.80"

main.rs

use std::io::{Result as StdResult};
use fuse_backend_rs::api::filesystem::{AsyncFileSystem};

#[tokio::main]
async fn main() -> StdResult<()> {
  println!("hello wordl!");

    Ok(())
}

Any ideas on the best way to work around the issue?

proposal: use mio poll instead of epoll to support macfuse

Currently fuse-backend-rs uses epoll to poll events from the fuse device, and epoll is platform specific. In order to support macfuse we need to use kqueue when polling events from the underlying fd; mio is a mature option for cross-platform support.
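
As a rough sketch of that direction (not code from the crate; it assumes mio with the "os-poll" and "os-ext" features and a fuse device fd obtained elsewhere), mio's SourceFd can wait for readability on the fd and maps to epoll on Linux and kqueue on macOS:

use std::io;
use std::os::unix::io::RawFd;

use mio::unix::SourceFd;
use mio::{Events, Interest, Poll, Token};

const FUSE_DEV: Token = Token(0);

/// Wait until `fuse_fd` becomes readable, using mio's cross-platform poller
/// instead of calling epoll directly.
fn wait_readable(fuse_fd: RawFd) -> io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(16);

    poll.registry()
        .register(&mut SourceFd(&fuse_fd), FUSE_DEV, Interest::READABLE)?;

    loop {
        poll.poll(&mut events, None)?;
        for event in events.iter() {
            if event.token() == FUSE_DEV && event.is_readable() {
                return Ok(());
            }
        }
    }
}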

How to efficiently implement a drop file system

I'm defining a drop file system as a read-only pass-through file system that silently ignores write operations.

My plan: for open/read, we do the open and forward the reads. For create/write, we behave like /dev/null.
For read directory, if the directory exists, read it normally; otherwise treat it like an empty directory.

Reading from or writing to a file descriptor that doesn't exist is undefined; I'm thinking that it should not throw an error and instead behave like /dev/null, unless it's faster to throw an error.

Getting information for files that don't exist acts like /dev/null.

It would be a lot easier if pass-through were a trait with a default implementation that actually does pass through.

I was thinking I would have a pass-through in my struct, forward valid operations, and mask failures. Also, always disable write caching. My understanding is that there is only one trait I need to implement, and that's BackendFileSystem or FileSystem.

I was looking at the pass through file system defined here, as well as https://github.com/libfuse/libfuse/blob/master/example/passthrough_hp.cc My understanding is that passthrough_hp.cc doesn't do any unnecessary copies.

  1. Does the pass through do unnecessary copies?
  2. Do I always need synchronous io?
  3. Is there anything like memory mapping that I have to worry about? I really don't want the user to be able to write to the underlying FS under any circumstance.
  4. Should I disable write caching?
  5. Would it be more performant to implement a black hole (acts like an empty directory and ignores all writes and creations) and then set up an overlay?
  6. BackendFileSystem vs FileSystem
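
For reference, here is a minimal sketch of the wrapping idea (forward reads, silently absorb writes). It deliberately uses a simplified stand-in trait rather than the crate's FileSystem/BackendFileSystem signatures, so it only shows the structure; a real implementation would forward the corresponding fuse-backend-rs methods instead:

use std::io;

// Simplified stand-in for the real filesystem trait; signatures here are
// illustrative only and do not match fuse-backend-rs.
trait SimpleFs {
    fn read(&self, path: &str, offset: u64, size: usize) -> io::Result<Vec<u8>>;
    fn write(&self, path: &str, offset: u64, data: &[u8]) -> io::Result<usize>;
}

/// Wraps an inner (pass-through style) filesystem, forwarding reads and
/// pretending writes succeeded, similar to writing to /dev/null.
struct DropFs<T: SimpleFs> {
    inner: T,
}

impl<T: SimpleFs> SimpleFs for DropFs<T> {
    fn read(&self, path: &str, offset: u64, size: usize) -> io::Result<Vec<u8>> {
        // Forward reads; a missing file could also be masked as empty here.
        self.inner.read(path, offset, size)
    }

    fn write(&self, _path: &str, _offset: u64, data: &[u8]) -> io::Result<usize> {
        // Report success without touching the inner filesystem.
        Ok(data.len())
    }
}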

Status of async support

Hello, I am working on a virtual filesystem using fuse-backend-rs for tvix, a Rust implementation of Nix. Most of the underlying filesystem is built on top of async Rust + tokio, so it would be great to be able to use the AsyncFileSystem trait. I've tried doing this but I've hit some issues figuring out how to drive the filesystem with both FUSE and virtiofs.

For FUSE it looks like there is some code to support driving async FUSE tasks:

#[cfg(feature = "async_io")]
pub use asyncio::FuseDevTask;

#[cfg(feature = "async_io")]
/// Task context to handle fuse request in asynchronous mode.
mod asyncio {
    use std::os::unix::io::RawFd;
    use std::sync::Arc;

    use crate::api::filesystem::AsyncFileSystem;
    use crate::api::server::Server;
    use crate::transport::{FuseBuf, Reader, Writer};

    /// Task context to handle fuse request in asynchronous mode.
    ///
    /// This structure provides a context to handle fuse request in asynchronous mode, including
    /// the fuse fd, a internal buffer and a `Server` instance to serve requests.
    ///
    /// ## Examples
    /// ```ignore
    /// let buf_size = 0x1_0000;
    /// let state = AsyncExecutorState::new();
    /// let mut task = FuseDevTask::new(buf_size, fuse_dev_fd, fs_server, state.clone());
    ///
    /// // Run the task
    /// executor.spawn(async move { task.poll_handler().await });
    ///
    /// // Stop the task
    /// state.quiesce();
    /// ```
    pub struct FuseDevTask<F: AsyncFileSystem + Sync> {
        fd: RawFd,
        buf: Vec<u8>,
        state: AsyncExecutorState,
        server: Arc<Server<F>>,
    }

    impl<F: AsyncFileSystem + Sync> FuseDevTask<F> {
        /// Create a new fuse task context for asynchronous IO.
        ///
        /// # Parameters
        /// - buf_size: size of buffer to receive requests from/send reply to the fuse fd
        /// - fd: fuse device file descriptor
        /// - server: `Server` instance to serve requests from the fuse fd
        /// - state: shared state object to control the task object
        ///
        /// # Safety
        /// The caller must ensure `fd` is valid during the lifetime of the returned task object.
        pub fn new(
            buf_size: usize,
            fd: RawFd,
            server: Arc<Server<F>>,
            state: AsyncExecutorState,
        ) -> Self {
            FuseDevTask {
                fd,
                server,
                state,
                buf: vec![0x0u8; buf_size],
            }
        }

        /// Handler to process fuse requests in asynchronous mode.
        ///
        /// An async fn to handle requests from the fuse fd. It works in asynchronous IO mode when:
        /// - receiving request from fuse fd
        /// - handling requests by calling Server::async_handle_requests()
        /// - sending reply to fuse fd
        ///
        /// The async fn repeatedly return Poll::Pending when polled until the state has been set
        /// to quiesce mode.
        pub async fn poll_handler(&mut self) {
            // TODO: register self.buf as io uring buffers.
            let drive = AsyncDriver::default();

            while !self.state.quiescing() {
                let result = AsyncUtil::read(drive.clone(), self.fd, &mut self.buf, 0).await;
                match result {
                    Ok(len) => {
                        // ###############################################
                        // Note: it's a heavy hack to reuse the same underlying data
                        // buffer for both Reader and Writer, in order to reduce memory
                        // consumption. Here we assume Reader won't be used anymore once
                        // we start to write to the Writer. To get rid of this hack,
                        // just allocate a dedicated data buffer for Writer.
                        let buf = unsafe {
                            std::slice::from_raw_parts_mut(self.buf.as_mut_ptr(), self.buf.len())
                        };
                        // Reader::new() and Writer::new() should always return success.
                        let reader =
                            Reader::<()>::new(FuseBuf::new(&mut self.buf[0..len])).unwrap();
                        let writer = Writer::new(self.fd, buf).unwrap();
                        let result = unsafe {
                            self.server
                                .async_handle_message(drive.clone(), reader, writer, None, None)
                                .await
                        };

                        if let Err(e) = result {
                            // TODO: error handling
                            error!("failed to handle fuse request, {}", e);
                        }
                    }
                    Err(e) => {
                        // TODO: error handling
                        error!("failed to read request from fuse device fd, {}", e);
                    }
                }
            }

            // TODO: unregister self.buf as io uring buffers.

            // Report that the task has been quiesced.
            self.state.report();
        }
    }

    impl<F: AsyncFileSystem + Sync> Clone for FuseDevTask<F> {
        fn clone(&self) -> Self {
            FuseDevTask {
                fd: self.fd,
                server: self.server.clone(),
                state: self.state.clone(),
                buf: vec![0x0u8; self.buf.capacity()],
            }
        }
    }

    #[cfg(test)]
    mod tests {
        use std::os::unix::io::AsRawFd;

        use super::*;
        use crate::api::{Vfs, VfsOptions};
        use crate::async_util::{AsyncDriver, AsyncExecutor};

        #[test]
        fn test_fuse_task() {
            let state = AsyncExecutorState::new();
            let fs = Vfs::<AsyncDriver, ()>::new(VfsOptions::default());
            let _server = Arc::new(Server::<Vfs<AsyncDriver, ()>, AsyncDriver, ()>::new(fs));
            let file = vmm_sys_util::tempfile::TempFile::new().unwrap();
            let _fd = file.as_file().as_raw_fd();

            let mut executor = AsyncExecutor::new(32);
            executor.setup().unwrap();

            /*
            // Create three tasks, which could handle three concurrent fuse requests.
            let mut task = FuseDevTask::new(0x1000, fd, server.clone(), state.clone());
            executor
                .spawn(async move { task.poll_handler().await })
                .unwrap();
            let mut task = FuseDevTask::new(0x1000, fd, server.clone(), state.clone());
            executor
                .spawn(async move { task.poll_handler().await })
                .unwrap();
            let mut task = FuseDevTask::new(0x1000, fd, server.clone(), state.clone());
            executor
                .spawn(async move { task.poll_handler().await })
                .unwrap();
            */

            for _i in 0..10 {
                executor.run_once(false).unwrap();
            }

            // Set existing flag
            state.quiesce();
            // Close the fusedev fd, so all pending async io requests will be aborted.
            drop(file);

            for _i in 0..10 {
                executor.run_once(false).unwrap();
            }
        }
    }
}

However, this code is behind the async_io feature flag, even though the real feature flag is async-io. The code here also seems to refer to things that have been deleted like use crate::async_util::{AsyncDriver, AsyncExecutor}.

I was wondering if async is something that is supported, or if it's currently in a broken state and needs some more help to become functional again?

Large number of open files causes issues

We have a number of issues caused by the number of files that this crate opens in the context of running in Nydus.

The first issue is with workloads that perform a large number of filesystem operations: the longer the pod runs, the more file descriptors accumulate. Nydus sets the rlimit on the host, but on some systems this is capped at 2^20 (1048576) and can't go above this value. We have seen this cause issues with a workload that enters a constant crash loop and is unable to recover unless the pod is deleted and recreated. The pod constantly complains about OSError: [Errno 24] Too many open files, yet the actual workload inside the VM is not reaching the descriptor limit.

When inspecting the nr-open count and comparing this to the ulimit within the Linux namespace for the pod on the host node, we see that nr-open is maxed out at the ulimit value. The majority of these files are currently in an open state under the Nydus process.

Having so many files in a constant open state also causes the kubelet CPU usage to increase drastically. This is because the kubelet runs cadvisor which collects metrics on the open file descriptors and the type of file descriptor (eg if it's a socket or a file). We recently opened an issue with cadvisor about this metric stat collection, which can be found here (google/cadvisor#3233), but it would be good to try and solve the issue at the source.

I assume the reason the open file descriptors are “cached” is to reduce the overhead of executing the open syscall? If this is the case, is there a way to automatically close a file descriptor if it's not used often? Something like a timeout on the descriptor, so that if it hasn't been accessed after x amount of time it gets closed; it can then be reopened when it's needed again.

Any thoughts or ideas would be greatly appreciated.
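
To make the timeout idea concrete, here is a small generic sketch (not code from fuse-backend-rs or Nydus) of a descriptor cache that closes entries not accessed within a configurable idle period; dropping the std::fs::File closes the underlying descriptor:

use std::collections::HashMap;
use std::fs::File;
use std::time::{Duration, Instant};

/// Minimal idle-timeout cache for open files, keyed by inode number.
struct FdCache {
    idle_timeout: Duration,
    entries: HashMap<u64, (File, Instant)>,
}

impl FdCache {
    fn new(idle_timeout: Duration) -> Self {
        FdCache {
            idle_timeout,
            entries: HashMap::new(),
        }
    }

    /// Return the cached file for `ino`, refreshing its last-access time.
    fn get(&mut self, ino: u64) -> Option<&File> {
        match self.entries.get_mut(&ino) {
            Some((file, last_used)) => {
                *last_used = Instant::now();
                Some(&*file)
            }
            None => None,
        }
    }

    fn insert(&mut self, ino: u64, file: File) {
        self.entries.insert(ino, (file, Instant::now()));
    }

    /// Close descriptors that have been idle longer than the timeout;
    /// intended to be called periodically from a housekeeping thread.
    fn evict_idle(&mut self) {
        let now = Instant::now();
        let timeout = self.idle_timeout;
        self.entries
            .retain(|_, (_, last_used)| now.duration_since(*last_used) < timeout);
    }
}

A real implementation would probably also want an upper bound on the number of cached descriptors (LRU-style eviction) in addition to the idle timeout.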

Support non-privileged Users

Thank you for providing fuse-backend-rs!

I'm currently transitioning from fuse_mt to fuse-backend-rs and ran into a really annoying issue. fuse_mt makes use of libfuse, which allows it to mount fuse file systems without being root. This, unfortunately, is not the case for fuse-backend-rs (the root requirement, that is; not depending on libfuse is actually great, as pure Rust code simplifies compilation a lot).

The secret sauce behind not requiring root permission is a set-uid program called fusermount, which mounts the fuse FS on behalf of the user.

I have a working prototype of fuse-backend-rs using the fusermount mechanism for mounting in https://github.com/fzgregor/fuse-backend-rs. It's still very messy, though, and I wanted to touch base with you before going forward. At the moment it entirely replaces the mount system call with fusermount, which might not be great in certain environments... I thought one could check whether the appropriate permissions are available and then use either mechanism.

So, what are your thoughts about this?

The prototype currently requires polachok/passfd#10 to land.
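
A minimal sketch of the "check permissions, then pick a mechanism" idea (placeholder function bodies, assuming the libc crate; the real fusermount path additionally has to receive the opened /dev/fuse fd back over a unix socket via the _FUSE_COMMFD protocol, which is what passfd is needed for):

use std::io;
use std::path::Path;

/// True if we can call mount(2) ourselves; a real check would rather probe
/// for CAP_SYS_ADMIN in the current user namespace than just euid 0.
fn can_mount_directly() -> bool {
    unsafe { libc::geteuid() == 0 }
}

fn mount_fuse(mountpoint: &Path) -> io::Result<()> {
    if can_mount_directly() {
        mount_via_syscall(mountpoint)
    } else {
        mount_via_fusermount(mountpoint)
    }
}

// Placeholder for the existing mount(2)-based code path.
fn mount_via_syscall(_mountpoint: &Path) -> io::Result<()> {
    unimplemented!("existing direct-mount path")
}

// Placeholder for spawning the set-uid fusermount helper and receiving the
// opened /dev/fuse fd over the _FUSE_COMMFD socket (SCM_RIGHTS).
fn mount_via_fusermount(_mountpoint: &Path) -> io::Result<()> {
    unimplemented!("fusermount-based path, see the linked prototype")
}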
