GithubHelp home page GithubHelp logo

coolspring8 / go-lolhtml Goto Github PK

View Code? Open in Web Editor NEW
26.0 3.0 4.0 6.84 MB

An idiomatic Go wrapper for Rust crate `lol-html` (Low Output Latency streaming HTML parser/rewriter)

Home Page: https://pkg.go.dev/github.com/coolspring8/go-lolhtml

License: BSD 3-Clause "New" or "Revised" License

Go 100.00%
html html-parser css-selector cgo bindings rewriting

go-lolhtml's Introduction

Hi there, I'm CoolSpring8👋

Blog

About Me
  • Passionate about exploring innovative designs
  • Thrilled by challenges that push the boundaries of the web
Fun fact 1: Often make statements long and complicated ; }
Fun fact 2:                                            ^ statement ends here

go-lolhtml's People

Contributors

coolspring8 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

go-lolhtml's Issues

construct multiple rewriters from one builder

// Builds HTML-rewriter out of the provided builder. Can be called
// multiple times to construct different rewriters from the same
// builder.

It is possible to "construct different rewriters from the same builder." And if we want to get the best performance, we should do this indeed.

However, when it is combined with cgo callback functions, I find that things are getting tricky.

In cgo there are multiple rules in passing pointers. Also see this wiki page. To pass Go callback functions' pointers to C, one option is https://github.com/mattn/go-pointer, which uses a map[unsafe.Pointer]interface{} (with mutex), where a key is a C pointer and the value is the actual Go pointer.

That means, when one rewriter has ended and is freed, there's one more cleanup task to do: to delete that entry in map (or it will grow constantly, though slowly. In benchmarks which constructs a lot of rewriters, this becomes obvious.). But wait! who has a reference to that pointer? The builder, and possibly rewriters. To track if all of them has finished and will not use the pointer again becomes a bit messy.

I came up with an idea, counting the times that the pointer is copied (how many C objects holds the pointer——just realized it is like reference counting when writing this issue). In this way no extra mental load is added to users of the lib (in particular, something like Free() and FreePointers()). But my implementation turns out to be error-prone and slow. Here are some fragments:

// Will an extra map become a performance issue?
var (
	mutex        sync.Mutex
	counterStore = map[unsafe.Pointer]int{}
)

func initCount(ptrs []unsafe.Pointer) {
	mutex.Lock()
	for _, ptr := range ptrs {
		if ptr != nil {
			counterStore[ptr] = 1
		}
	}
	mutex.Unlock()
}

func increaseCount(ptrs []unsafe.Pointer) {
	mutex.Lock()
	for _, ptr := range ptrs {
		counterStore[ptr] += 1
	}
	mutex.Unlock()
}

func decreaseCount(ptrs []unsafe.Pointer) {
	mutex.Lock()
	for _, ptr := range ptrs {
		c := counterStore[ptr]
		if c <= 1 {
			unrefPointer(ptr)
		} else {
			counterStore[ptr] = c - 1
		}
	}
	mutex.Unlock()
}
		decreaseCount(rb.pointers)

	initCount([]unsafe.Pointer{doctypeHandlerPointer, commentHandlerPointer, textChunkHandlerPointer, documentEndHandlerPointer})

	initCount([]unsafe.Pointer{elementHandlerPointer, commentHandlerPointer, textChunkHandlerPointer})

	initCount([]unsafe.Pointer{p})

		rb.pointers = append(rb.pointers, p)
		increaseCount(rb.pointers)

		decreaseCount(r.pointers)

Therefore, I have decided not to give users the ability to build multiple rewriters from one builder in the near future (unless someone helps me out with a nice solution! :p). And there's one benchmark, which shows that building one rewriter from scratch can be generally considered acceptable:

goos: windows
goarch: amd64
pkg: github.com/coolspring8/go-lolhtml
BenchmarkNewRewriterBuilder
BenchmarkNewRewriterBuilder/Builder
BenchmarkNewRewriterBuilder/Builder-4         	 3625416	       297 ns/op
BenchmarkNewRewriterBuilder/BuilderWithEmptyDocumentHandler
BenchmarkNewRewriterBuilder/BuilderWithEmptyDocumentHandler-4         	  387134	      3140 ns/op
BenchmarkNewRewriterBuilder/BuilderWithEmptyElementHandler
BenchmarkNewRewriterBuilder/BuilderWithEmptyElementHandler-4          	  342513	      3607 ns/op
BenchmarkNewRewriterBuilder/BuilderWithElementHandler
BenchmarkNewRewriterBuilder/BuilderWithElementHandler-4               	  323997	      3504 ns/op
BenchmarkNewRewriterBuilder/BuilderWithElementHandlerAndBuild
BenchmarkNewRewriterBuilder/BuilderWithElementHandlerAndBuild-4       	  161566	      8348 ns/op
BenchmarkNewRewriterBuilder/Writer
BenchmarkNewRewriterBuilder/Writer-4                                  	  150004	      9706 ns/op
BenchmarkNewRewriterBuilder/BuildMultipleRewriterFromOneBuilder
BenchmarkNewRewriterBuilder/BuildMultipleRewriterFromOneBuilder-4     	  166668	      8697 ns/op
PASS

Writer is a slight wrapper function around BuilderWithElementHandlerAndBuild, doing a little more work in Go. Note that rewriters in the BuildMultipleRewriterFromOneBuilder case are not freed (I bundled the free function in C and the job of delete pointer entries, so it was impossible to even call free function without causing panic). Some time might have been wasted in memory allocation, GC or so.

I remember before I made that change, BuildMultipleRewriterFromOneBuilder had a 7000~ ns/op performance while Writer's is 10000~ ns/op.

So, well, it does seem acceptable.

lolhtml for arm?

I tested on macos and it works fine, I plan to try it on arm specifically armv7
In the build folder I don't see any library for arm, can you create a library for arm, or guide so I can build?
I hope it works on arm

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.