An idiomatic Go wrapper for Rust crate `lol-html` (Low Output Latency streaming HTML parser/rewriter)
Home Page: https://pkg.go.dev/github.com/coolspring8/go-lolhtml
License: BSD 3-Clause "New" or "Revised" License
go-lolhtml/build/include/lol_html.h
Lines 222 to 224 in dac674c
However, combining it with cgo callback functions turns out to be tricky.
cgo has several rules about passing pointers; also see this wiki page. To pass pointers to Go callback functions into C, one option is https://github.com/mattn/go-pointer, which uses a map[unsafe.Pointer]interface{} (guarded by a mutex), where each key is a C pointer and the value is the actual Go pointer.
That means that when a rewriter has ended and is freed, there is one more cleanup task: deleting that entry from the map. Otherwise the map grows constantly, though slowly; in benchmarks that construct a lot of rewriters, this becomes obvious. But wait, who holds a reference to that pointer? The builder, and possibly several rewriters. Tracking whether all of them have finished and will never use the pointer again gets a bit messy.
I came up with an idea: count how many times the pointer is copied, that is, how many C objects hold it (I just realized while writing this issue that this is reference counting). This way no extra mental load is added for users of the library (in particular, nothing like Free() and FreePointers()). But my implementation turned out to be error-prone and slow. Here are some fragments:
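To make the cleanup problem concrete, here is a minimal sketch of a mutex-guarded registry of the kind described above. It is not go-pointer's actual API: the `registry` type, the `handle` type, and the `save`/`restore`/`unref` names are hypothetical, and it uses integer handles instead of C-allocated pointers so that it runs without cgo.

```go
package main

import (
	"fmt"
	"sync"
)

// handle is an opaque token that could be handed to C in place of a Go pointer.
// (Hypothetical: go-pointer itself uses unsafe.Pointer keys.)
type handle uintptr

// registry maps handles to Go values and is safe for concurrent use.
type registry struct {
	mu     sync.Mutex
	next   handle
	values map[handle]interface{}
}

func newRegistry() *registry {
	return &registry{values: make(map[handle]interface{})}
}

// save stores v and returns a fresh, non-zero handle for it.
func (r *registry) save(v interface{}) handle {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.values[r.next] = v
	return r.next
}

// restore returns the value stored under h, or nil if h is unknown.
func (r *registry) restore(h handle) interface{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.values[h]
}

// unref removes the entry for h. If nobody calls it, the map only grows.
func (r *registry) unref(h handle) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.values, h)
}

// size reports how many entries are currently stored.
func (r *registry) size() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.values)
}

func main() {
	reg := newRegistry()
	h := reg.save(func() string { return "called" })

	// A C callback would receive h and look the Go function up again.
	cb := reg.restore(h).(func() string)
	fmt.Println(cb(), reg.size())

	// The step that is easy to forget once several owners share h.
	reg.unref(h)
	fmt.Println(reg.size())
}
```

The difficulty discussed here is deciding when `unref` is safe to call if both a builder and several rewriters may still pass the same handle to C.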
// Will an extra map become a performance issue?
var (
	mutex        sync.Mutex
	counterStore = map[unsafe.Pointer]int{}
)

func initCount(ptrs []unsafe.Pointer) {
	mutex.Lock()
	for _, ptr := range ptrs {
		if ptr != nil {
			counterStore[ptr] = 1
		}
	}
	mutex.Unlock()
}

func increaseCount(ptrs []unsafe.Pointer) {
	mutex.Lock()
	for _, ptr := range ptrs {
		counterStore[ptr] += 1
	}
	mutex.Unlock()
}

func decreaseCount(ptrs []unsafe.Pointer) {
	mutex.Lock()
	for _, ptr := range ptrs {
		c := counterStore[ptr]
		if c <= 1 {
			unrefPointer(ptr)
		} else {
			counterStore[ptr] = c - 1
		}
	}
	mutex.Unlock()
}
decreaseCount(rb.pointers)
initCount([]unsafe.Pointer{doctypeHandlerPointer, commentHandlerPointer, textChunkHandlerPointer, documentEndHandlerPointer})
initCount([]unsafe.Pointer{elementHandlerPointer, commentHandlerPointer, textChunkHandlerPointer})
initCount([]unsafe.Pointer{p})
rb.pointers = append(rb.pointers, p)
increaseCount(rb.pointers)
decreaseCount(r.pointers)
Therefore, I have decided not to let users build multiple rewriters from one builder in the near future (unless someone helps me out with a nice solution! :p). Here is a benchmark showing that the cost of building one rewriter from scratch is generally acceptable:
goos: windows
goarch: amd64
pkg: github.com/coolspring8/go-lolhtml
BenchmarkNewRewriterBuilder
BenchmarkNewRewriterBuilder/Builder
BenchmarkNewRewriterBuilder/Builder-4 3625416 297 ns/op
BenchmarkNewRewriterBuilder/BuilderWithEmptyDocumentHandler
BenchmarkNewRewriterBuilder/BuilderWithEmptyDocumentHandler-4 387134 3140 ns/op
BenchmarkNewRewriterBuilder/BuilderWithEmptyElementHandler
BenchmarkNewRewriterBuilder/BuilderWithEmptyElementHandler-4 342513 3607 ns/op
BenchmarkNewRewriterBuilder/BuilderWithElementHandler
BenchmarkNewRewriterBuilder/BuilderWithElementHandler-4 323997 3504 ns/op
BenchmarkNewRewriterBuilder/BuilderWithElementHandlerAndBuild
BenchmarkNewRewriterBuilder/BuilderWithElementHandlerAndBuild-4 161566 8348 ns/op
BenchmarkNewRewriterBuilder/Writer
BenchmarkNewRewriterBuilder/Writer-4 150004 9706 ns/op
BenchmarkNewRewriterBuilder/BuildMultipleRewriterFromOneBuilder
BenchmarkNewRewriterBuilder/BuildMultipleRewriterFromOneBuilder-4 166668 8697 ns/op
PASS
Writer is a thin wrapper around BuilderWithElementHandlerAndBuild that does a little more work in Go. Note that the rewriters in the BuildMultipleRewriterFromOneBuilder case are not freed (I bundled the C free function together with the deletion of pointer entries, so it was impossible to even call the free function without causing a panic). Some time may have been spent on memory allocation, GC, and so on.
As I remember, before I made that change, BuildMultipleRewriterFromOneBuilder ran at about 7000 ns/op while Writer ran at about 10000 ns/op.
So, well, it does seem acceptable.
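For anyone who wants to revisit the reference-counting idea, one possible shape is a single atomic counter per bundle of handler pointers, rather than one global map entry per pointer. This is only a sketch under assumptions: `sharedHandlers`, `retain`, and `free` are hypothetical names, the `release` callback stands in for whatever unreferences the stored pointers, and whether this fits go-lolhtml's internals is untested.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sharedHandlers is a hypothetical bundle of callback pointers shared by
// one builder and any rewriters built from it.
type sharedHandlers struct {
	refs    int32
	release func() // runs once, when the last owner lets go
}

// newSharedHandlers starts with one reference, owned by the builder.
func newSharedHandlers(release func()) *sharedHandlers {
	return &sharedHandlers{refs: 1, release: release}
}

// retain records one more owner, for example a newly built rewriter.
func (s *sharedHandlers) retain() {
	atomic.AddInt32(&s.refs, 1)
}

// free drops one owner and runs release when no owners remain.
func (s *sharedHandlers) free() {
	if atomic.AddInt32(&s.refs, -1) == 0 {
		s.release()
	}
}

func main() {
	released := 0
	s := newSharedHandlers(func() { released++ })

	s.retain() // first rewriter built from the builder
	s.retain() // second rewriter built from the builder

	s.free() // builder freed
	s.free() // first rewriter freed
	fmt.Println("released after two frees:", released)

	s.free() // last rewriter freed
	fmt.Println("released after three frees:", released)
}
```

Keeping the counter on the shared bundle avoids taking a global lock on every build and free, at the cost of requiring every owner to call `free` exactly once.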
I tested on macOS and it works fine. I plan to try it on ARM, specifically ARMv7.
In the build folder I don't see any library for ARM. Can you create one, or provide a guide so I can build it myself?
I hope it works on ARM.
The current version of lol-html is 0.3.3: https://github.com/cloudflare/lol-html/releases/tag/v0.3.3
Is there a plan to support it?
DocEndHandler and TextChunkHandler in particular