In order to start making our push to go end-to-end via <a href="https://github.com/yes

Note that in the cmap example above <a href="https://github.com/yeslogic/allsorts/blob

Yup. The Cmap type is defined as: <div class="hig

Another alternative might be to do something like: <div class="highlight highlight

Implement Prototype Rust Backend about fathom HOT 9 CLOSED

yeslogic commented on April 28, 2024

Implement Prototype Rust Backend

from fathom.

Comments (9)

mikeday commented on April 28, 2024 1

I'm questioning whether generating an actual Rust struct is necessary at all, although it could be helpful for relatively simple structs with fixed field layouts. Complex structs could be represented by an abstract slice with custom accessor methods.

from fathom.

mikeday commented on April 28, 2024 1

Even simple structs will need accessors anyway for endian swapping if we want zero-copy access.

from fathom.

brendanzab commented on April 28, 2024

Yeah, that may indeed be the case. We also want the output Rust API to be reasonably predicable, so changing the specification slightly should not greatly change the resulting code.

from fathom.

brendanzab commented on April 28, 2024

Note that in the cmap example above I do just that. Note that in Rust struct fields are private by default, so the only way to interact with this API is via the accessors.

from fathom.

mikeday commented on April 28, 2024

Right, so the public API doesn't hinge on there being an actual struct, just a type with the appropriate name. (Zero copy would require some lifetime trickery to make references live as long as the main buffer, I suppose).

from fathom.

brendanzab commented on April 28, 2024

Yup. The Cmap type is defined as:

#[repr(C)]
pub struct CMap {
    version: u16,
    num_tables: u16,
    encoding_records: [EncodingRecord],
}

The [EncodingRecord] means that this struct is 'dynamically sized' - ie. it can only be accessed via a pointer indirection. The only way it can be constructed is via pub fn from_buf(buf: &[u8]) -> Result<&CMap, ()>, so it will be forever forced to be immutable.

from fathom.

brendanzab commented on April 28, 2024

Another alternative might be to do something like:

pub struct CMap<'data> {
    data: &'data [u8],
}

pub fn from_buf<'data>(data: &'data [u8]) -> Result<CMap<'data>, ()> {
    ...
}

Would be interested to see how Nom does deserialization, seeing as it claims to be zero-copy too.

Edit: Seems like nom is zero copy in the sense of it does not allocate copies of data from the input iterator. But it does require you copy data into new structs in order to make use of the parsed data. For example, the gif decoder: https://github.com/Geal/gif.rs/blob/master/src/parser.rs

from fathom.

brendanzab commented on April 28, 2024

Here are some different methods I've been exploring for the by-ref output:

use std::{mem, ptr, slice};

pub struct Bmp {
    w: u64,
    h: u64,
    data: [u8],
}

impl Bmp {
    pub fn new(buf: &[u8]) -> Result<&Bmp, ()> {
        if buf.len() < Bmp::min_size() {
            Err(())
        } else {
            let bmp = unsafe { mem::transmute::<_, &Bmp>(buf) };

            if buf.len() != bmp.exact_size() {
                Err(())
            } else {
                Ok(bmp)
            }
        }
    }

    fn min_size() -> usize {
        mem::size_of::<u64>() + // w
        mem::size_of::<u64>() // h
    }

    fn exact_size(&self) -> usize {
        Bmp::min_size() +
        mem::size_of::<u8>() * self.w() as usize * self.h() as usize // data
    }

    pub fn w(&self) -> u64 {
        u64::from_be(self.w)
    }

    pub fn h(&self) -> u64 {
        u64::from_be(self.h)
    }

    pub fn data(&self) -> &[u8] {
        unsafe { slice::from_raw_parts(self.data.as_ptr(), self.w() as usize * self.h() as usize) }
    }
}

use std::{mem, ptr, slice};

pub struct Bmp {
    data: [u8],
}

impl Bmp {
    pub fn new(buf: &[u8]) -> Result<&Bmp, ()> {
        if buf.len() < Bmp::min_size() {
            Err(())
        } else {
            let bmp = unsafe { mem::transmute::<_, &Bmp>(buf) };

            if buf.len() != bmp.exact_size() {
                Err(())
            } else {
                Ok(bmp)
            }
        }
    }

    fn min_size() -> usize {
        Bmp::w_size() + Bmp::h_size()
    }

    fn exact_size(&self) -> usize {
        Bmp::min_size() + self.data_size()
    }

    fn w_size() -> usize { mem::size_of::<u64>() }
    fn h_size() -> usize { mem::size_of::<u64>() }
    fn data_size(&self) -> usize { mem::size_of::<u8>() * self.w() as usize * self.h() as usize }

    fn w_offset() -> isize { 0 }
    fn h_offset() -> isize { Bmp::w_offset() + Bmp::w_size() as isize }
    fn data_offset() -> isize { Bmp::h_offset() + Bmp::h_size() as isize }

    pub fn w(&self) -> u64 {
        unsafe {
          let ptr = self.data.as_ptr().offset(Bmp::w_offset()) as *const u64;
          u64::from_be(ptr::read(ptr))
        }
    }

    pub fn h(&self) -> u64 {
        unsafe {
          let ptr = self.data.as_ptr().offset(Bmp::h_offset()) as *const u64;
          u64::from_be(ptr::read(ptr))
        }
    }

    pub fn data(&self) -> &[u8] {
        unsafe {
          let ptr = self.data.as_ptr().offset(Bmp::data_offset()) as *const u8;
          slice::from_raw_parts(ptr, self.data_size())
        }
    }
}

Note that the generated assembly code is identical. This is very similar to what Harfbuzz is doing.

Advantages:

all bounds checking is done up-front
no reallocation occurs because it is reusing the same bytes as the buffer

Disadvantages:

requires the full buffer to be persisted in memory
requires a great deal of unsafe code under the hood to work which could be tricky to audit
have to execute 'interp' types by need, without memoization (eg. for endian-conversions)
users might want to persist part of the struct beyond the lifetime of the buffer
not compatible with streaming

from fathom.

brendanzab commented on April 28, 2024

I wanted to experiment with the above unsafe APIs because they followed Harfbuzz's technique, but using unsafe in generated code gives me the heebyjeebies. Don’t want to hit something like that Ragel bug that hit Cloudflare…

Here are some more ideas for the 'spidery api', but this time using byte slices as they are intended top be used:

//! Pixel = {
//!     r : u8,
//!     g : u8,
//!     b : u8,
//! };
//!
//! Bmp = struct {
//!     w : u64be,
//!     h : u64be,
//!     data : [Pixel; w * h],
//!     trailer_len : u32be,
//!     trailer_data : [u32be; trailer_len],
//! };

extern crate byteorder;

use byteorder::{BigEndian, ReadBytesExt};
use std::io::Cursor;
use std::mem;

pub struct BmpDataRef<'a> {
    buf: &'a [u8],
}

impl<'a> BmpDataRef<'a> {
    pub fn new(buf: &[u8]) -> BmpDataRef {
        BmpDataRef { buf }
    }
}

pub struct BmpRef<'a> {
    buf: &'a [u8],
}

impl<'a> BmpRef<'a> {
    pub fn new(buf: &[u8]) -> BmpRef {
        BmpRef { buf }
    }

    pub fn w(&self) -> u64 {
        let offset = 0;
        Cursor::new(&self.buf[offset..])
            .read_u64::<BigEndian>()
            .unwrap()
    }

    pub fn h(&self) -> u64 {
        let offset = mem::size_of::<u64>();
        Cursor::new(&self.buf[offset..])
            .read_u64::<BigEndian>()
            .unwrap()
    }

    pub fn data(&self) -> BmpDataRef {
        let offset = mem::size_of::<u64>() * 2;
        let size = mem::size_of::<u8>() * 3 * self.w() as usize * self.h() as usize;
        BmpDataRef::new(&self.buf[offset..offset + size])
    }
}

This one is a spidery api with 'staged copying'. Advantage is that it does up-front verification in the constructor, and caches the results:

//! Pixel = {
//!     r : u8,
//!     g : u8,
//!     b : u8,
//! };
//!
//! Bmp = struct {
//!     w : u64be,
//!     h : u64be,
//!     data : [Pixel; w * h],
//!     trailer_len : u32be,
//!     trailer_data : [u32be; trailer_len],
//! };

extern crate byteorder;

use byteorder::{BigEndian, ReadBytesExt};
use std::io::{self, Cursor};
use std::mem;

#[derive(Copy, Clone)]
pub struct BmpRef<'a> {
    w: u64,
    h: u64,
    data: BmpDataCursor<'a>,
}

impl<'a> BmpRef<'a> {
    pub fn new(buf: &[u8]) -> Result<BmpRef, ()> {
        let w = Cursor::new(&buf.get(0..).ok_or(())?)
            .read_u64::<BigEndian>()
            .map_err(|_| ())?;
        let h = Cursor::new(&buf.get(mem::size_of_val(&w)..).ok_or(())?)
            .read_u64::<BigEndian>()
            .map_err(|_| ())?;

        let data = BmpDataCursor::new(&buf
            .get(mem::size_of_val(&w) + mem::size_of_val(&h)..)
            .ok_or(())?);

        Ok(BmpRef { w, h, data })
    }

    pub fn w(&self) -> u64 {
        self.w
    }

    pub fn h(&self) -> u64 {
        self.h
    }

    pub fn data(&self) -> BmpDataCursor<'a> {
        self.data
    }
}

#[derive(Copy, Clone)]
pub struct BmpDataCursor<'a> {
    buf: &'a [u8],
}

impl<'a> BmpDataCursor<'a> {
    pub fn new(buf: &[u8]) -> BmpDataCursor {
        BmpDataCursor { buf }
    }
    
    pub fn get(&self, index: usize) -> Option<Pixel> {
        unimplemented!()
    }
    
    pub fn iter(&self) -> BmpDataIter {
        unimplemented!()
    }
}

pub struct BmpDataIter {
    // TODO
}

impl Iterator for BmpDataIter {
    type Item = Pixel;
    
    fn next(&mut self) -> Option<Pixel> {
        unimplemented!()
    }
}

pub struct Pixel {
    r: u8,
    g: u8,
    b: u8,
}

from fathom.

Implement Prototype Rust Backend about fathom HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs