
nytimes / gziphandler


Go middleware to gzip HTTP responses

Home Page: https://godoc.org/github.com/NYTimes/gziphandler

License: Apache License 2.0



gziphandler's Issues

Gzip based on response content type

I see that GzipResponseWriter can now decide whether to gzip the response based on response size.

I suggest also filtering on content types. The types could be user-configured or hardcoded to sane defaults.

if _, ok := w.Header()[contentType]; !ok {
	// Infer it from the uncompressed body.
	w.Header().Set(contentType, http.DetectContentType(b))
}
// Use the content-type information to decide between `w.startGzip()` and `w.startNoGzip()`.

If handleContentType fails, the entire content is buffered

If I'm reading line 166 correctly:

if len(w.buf) >= w.minSize && handleContentType(w.contentTypes, w) && w.Header().Get(contentEncoding) == "" {

Then even if the buffer grows past minSize, if the content-type check fails or a Content-Encoding is already set, the entire output stream is buffered and only written out at Close.

The handleContentType and Header().Get(contentEncoding) checks should be the very first tests on the first Write. If they fail, every subsequent write should be flushed directly to the real response writer with no further buffering, since these settings can't change once writing has started.

Whitelist by mime type

The ContentTypes(types []string) option whitelists by the full content type. In my case, the encoding and boundary don't matter to me; all I care about is the MIME type. I'd rather not have to declare every encoding for each MIME type.

Is there an interest in accepting a PR which adds something like one of these:

  • ContentTypesMimeOnly(types []string) option
  • MimeTypes(types []string) option

This would use the standard method for parsing the mime type.

(@jprobinson, @adammck do you have thoughts on this?)
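For illustration, a hypothetical MimeTypes matcher could use the standard mime.ParseMediaType to discard parameters such as charset or boundary (a sketch, not a proposed API):

```go
package main

import (
	"fmt"
	"mime"
)

// matchesMimeType compares only the media type of a Content-Type header,
// ignoring parameters such as charset or boundary.
func matchesMimeType(allowed []string, contentType string) bool {
	mt, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	for _, a := range allowed {
		if a == mt {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"text/html", "application/json"}
	fmt.Println(matchesMimeType(allowed, "text/html; charset=utf-8")) // true
	fmt.Println(matchesMimeType(allowed, "image/png"))                // false
}
```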

make acceptsGzip public

It would make my use of this library much easier if my code can see whether the library will consider a request for possible compression.

Would a pull request renaming acceptsGzip to AcceptsGzip be welcome?

Show the error of gw.Close()

What version of Go are you using (go version)?

go version go1.12.9 windows/amd64

What operating system and processor architecture are you using?

set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\hmc\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\hmc\go
set GOPROXY=
set GORACE=
set GOROOT=C:\go
set GOTMPDIR=
set GOTOOLDIR=C:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\hmc~1\AppData\Local\Temp\go-build408318050=/tmp/go-build -gno-record-gcc-switches

What did you do?

I started an HTTP server with the gzip handler, with the server's WriteTimeout parameter set to a certain value. Then I made a request whose handling time was longer than the WriteTimeout.

What did you expect to see?

Some logs or errors about the WriteTimeout.

What did you see instead?

Nothing. I only saw that the connection was closed and no response was sent.

This issue is similar to net/http: ResponseWriter.Write does not error after WriteTimeout nor is ErrorLog used. The difference is that gw.Close() is called in the gzip handler and returns an error; maybe there is a way to surface that error or pass it to users? In my scenario, gw.Close() returned an internal/poll.TimeoutError, which helped me find my server's issue. So I think surfacing the error from gw.Close() would help users.

io.Copy error when minSize > 0

When a Write is performed and minSize has not been reached yet (in essence, when the check at [1] is false), the number of written bytes returned by the method is 0. This means that when using io.Copy, an ErrShortWrite is returned and the copy fails (see [2]). Perhaps this behavior should be documented, or Write should pretend len(b) bytes have been written in that case (as they eventually will be)?

[1] https://github.com/NYTimes/gziphandler/blob/master/gzip.go#L107
[2] https://golang.org/src/io/io.go#L400

The NYTimes GitHub org will be renamed to nytimes

Users / Importers of gziphandler, please note that the github.com/NYTimes org (where this repo resides) will be renamed to all lowercase github.com/nytimes. This branding change is for readability and to conform with the majority of other GitHub entities and open source projects.

When the organization name changes, users will still be able to download, import, and use any of our NYTimes libraries with the old casing (NYTimes). Therefore, there shouldn’t be any action taken on the user’s side as long as you have your dependencies managed through Go Modules or Dep.

However, once the GitHub name changes, we will also be updating the import paths in the actual code of those libraries. Once that code is committed, we will tag a major version release. This way, users will know that this update is a breaking change and that they will have to update their own import paths too.

In Go Modules, you will need to change the import path from NYTimes to lowercase nytimes and also ensure the import path includes the major version, per Semantic Import Versioning rules.

The rename is proposed to happen during the week of March 4, 2019.

Tagged releases?

Hi! I notice that this project doesn't have any tagged releases. Would you mind adding some SemVer-compatible release tags? It would really, really help those of us using dep and similar tools.

Align Go package casing with VCS URL casing

Can we have the Go package name match casing with the version control URL casing? This subtle difference makes it harder to manage dependencies.

If backwards compatibility is a concern, it would actually be easier to rename the GitHub org, which users reference just once, than to rename every downstream import.

Handling HEAD requests

Right now there is no way to respond to a HEAD request with the correct gzip headers: the headers aren't added until Write, a writer is initialized inside Write, the gzip headers are only written upon Close, and a HEAD response cannot have a body.

"identity" Content-Encoding should also be compressed

At the moment, gziphandler only compresses responses if there is no Content-Encoding header already set in the response:

https://github.com/nytimes/gziphandler/blob/master/gzip.go#L120-L141

This is fine; however, the Content-Encoding specification contains an identity directive [1]:

identity
Indicates the identity function (i.e., no compression or modification).
This token, except if explicitly specified, is always deemed acceptable.

This directive essentially means a no-op encoding and should be treated the same as an empty Content-Encoding in gziphandler, i.e. as a candidate for on-the-fly compression.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding#Directives
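The suggested change reduces to a one-line predicate; a minimal sketch (shouldCompress is a hypothetical name, not the library's):

```go
package main

import "fmt"

// shouldCompress treats an "identity" Content-Encoding the same as no
// Content-Encoding at all, per the suggestion above.
func shouldCompress(contentEncoding string) bool {
	return contentEncoding == "" || contentEncoding == "identity"
}

func main() {
	fmt.Println(shouldCompress(""))         // true
	fmt.Println(shouldCompress("identity")) // true
	fmt.Println(shouldCompress("br"))       // false
}
```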

Should Content-Length be set on gzipped responses?

I see that the Content-Length header is read when a response passes through the middleware, and that it is later deleted before writing the gzipped response, to prevent a premature EOF.

Should a Content-Length header be set after the size of the compressed content is known? What are the reasons for omitting Content-Length on gzipped responses?

Static files from ServeFile

I'm wondering whether static files served via the http.ServeFile method are also supported. In particular:

It would also be nice to have http.ServeFile check for the presence of xxx.gz and serve it when present and the Accept-Encoding header includes gzip. This is a trick that nginx does, and it greatly accelerates serving of static CSS, JS, etc. files because they can be compressed just once during the build process. There's the added benefit that they are smaller files to read, as well as needing fewer bytes on the wire.

thanks

http: multiple response.WriteHeader calls

How would one use the gzip handler with non-200 responses? The following example seems like it should work but logs http: multiple response.WriteHeader calls and the response isn't compressed.

package main

import (
    "fmt"
    "net/http"
    "os"

    "github.com/gorilla/mux"
    "github.com/nytimes/gziphandler"
)

func main() {
    r := mux.NewRouter()
    r.HandleFunc("/", homeHandler)
    r.NotFoundHandler = http.HandlerFunc(notFoundHandler)
    http.Handle("/", gziphandler.GzipHandler(r))
    http.ListenAndServe(":"+os.Getenv("PORT"), nil)
}

func homeHandler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello, World!\n")
}

func notFoundHandler(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusNotFound)
    fmt.Fprintf(w, "404 - Not Found!\n")
}

This might be related to #5, but I'm mentioning it because I thought #16 might have fixed it. It also seems related to gorilla/handlers#83, although that handler just logs the error and still returns the response successfully (albeit uncompressed).

Here's some debug information if this helps:

$GOPATH/src/github.com/nytimes/gziphandler master
❯ git rev-parse HEAD
44668d75e46f05932cf7c1c7a375d0765b324a0b
❯ go version
go version go1.7.1 darwin/amd64

Invalid number of bytes written reported

When I have a buffer of a known length and write it to the gzip handler, I get more bytes written than I originally sent. I think the latest commit is causing that.

b := f.buf[f.off:]
l := len(b)           // l is 104870
n, err := w.Write(b)  // w is my gziphandler (writer)
                      // n is 105079

Swappable gzip implementation?

Hi!

We at @wongnai have forked gziphandler internally to add a swappable gzip implementation. In production, we swap compress/gzip with our fork of yasushi-saito/cloudflare-zlib, which results in 43% less CPU used in the middleware.

We haven't open sourced anything in this project yet, as it requires extensive modifications to make it work (e.g. un-forking the Go module name). I'd like to check with upstream first whether this is something you'd be willing to merge before starting the open-sourcing work.


The changes we made are:

  • Split gzipWriterPools, poolIndex, addLevelPool and their tests into another submodule
  • Add an interface for gzip implementation:
type GzipWriter interface {
	Close() error
	Flush() error
	Write(p []byte) (int, error)
}
  • The interface doesn't directly pool the underlying GzipWriter. The pooling is expected to be done transparently by the implementor of the interface. In the existing gzip implementation, the returned gzip.Writer is wrapped in a struct that, when closed, also returns the writer to the pool.
  • Implementations are swapped by build tag. The default build still uses compress/gzip to avoid an extra non-Go dependency.

We may open source the forked cloudflare-zlib and its gziphandler integration later, after the PR here is merged. We removed its built-in zlib code and simply link against the installed library.

export acceptsGzip?

Is there any particular reason for not exporting the acceptsGzip function?

I'm running a static file handler that serves pre-gzipped files, and it would be really useful to have access to the logic of this function for that use case.

More of a suggestion than an issue, apologies.

Great package btw! 👍
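For reference, the check essentially boils down to scanning Accept-Encoding for a gzip entry with a non-zero q-value; a simplified, hypothetical version (the library's real parsing is stricter):

```go
package main

import (
	"fmt"
	"strings"
)

// acceptsGzip reports whether an Accept-Encoding header value allows gzip.
// Simplified sketch: full q-value parsing is more involved.
func acceptsGzip(acceptEncoding string) bool {
	for _, part := range strings.Split(acceptEncoding, ",") {
		coding, q := part, ""
		if i := strings.Index(part, ";"); i >= 0 {
			coding, q = part[:i], strings.TrimSpace(part[i+1:])
		}
		coding = strings.TrimSpace(coding)
		if coding != "gzip" && coding != "*" {
			continue
		}
		if q == "q=0" || q == "q=0.0" { // explicitly refused
			continue
		}
		return true
	}
	return false
}

func main() {
	fmt.Println(acceptsGzip("gzip, deflate")) // true
	fmt.Println(acceptsGzip("gzip;q=0"))      // false
	fmt.Println(acceptsGzip("identity"))      // false
}
```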

HTTP/2 Push

Hi,

I'm trying to make the HTTP/2 push mechanism work with gzip compression.

Do you know what needs to be done to make these two mechanisms work together?

Thanks

State of the package?

There are several open issues and no update to the source for years.

What is the state of the package?

Is there a maintained alternative?

Minimum size not compatible with template package

  • go 1.8
  • Linux 64
  • I'm using the package with templates
  • I expect to see the template rendered as it should be 😃
  • But I get the template with some special characters; it looks like bad decoding

The problem comes from the size limitation.

If the template package is used, it can call Write multiple times with varying lengths.
In any case, the response writer must be the same for the whole response within the same request, so the package as-is doesn't work.

I made some modifications to prevent using different response writers on successive calls.

But the size limitation becomes pretty pointless with templates or other handlers that call the Write function multiple times.

go mod depends on 1.12

The Go module currently depends on an unreleased version of Go. Can we bump the required Go version down to a released one?

Range-Requests aren't properly handled

When wrapping a handler that supports HTTP Range-Requests (e.g. http.FileServer), gziphandler relays them as-is, thus violating the HTTP standard and breaking clients.

That means, currently, gziphandler compresses ranges returned by the wrapped handler instead of returning ranges of the compressed output (of the complete wrapped content).

The HTTP standard basically says that Content-Length is the size of the Content-Encoding output and that range requests specify ranges into the Content-Encoding encoded output, as well.

Expected behavior: Either (a) gziphandler filters out the Accept-Ranges: bytes headers in wrapped handler responses (and any Range: headers in passed requests), or, (b) it handles Range-Requests on its own and doesn't pass them down to the wrapped handler.

Note that implementing (b) would be kind of complicated, e.g. a range that is smaller than the configured minSize would have to trigger a compression up to the range end.

How to reproduce:

Create a handler like this:

gz_handler, err := gziphandler.NewGzipLevelAndMinSize(gzip.BestCompression, 100)
if err != nil {
    log.Fatal(err)
}
http.Handle("/static/", http.StripPrefix("/static/",
        gz_handler(http.FileServer(http.Dir("static")))))

Request a full compressed page:

$ curl --header 'Accept-Encoding: gzip' -v -o f http://localhost:8080/static/page.css
[..]
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Content-Encoding: gzip
< Content-Type: text/css; charset=utf-8
< Vary: Accept-Encoding
< Content-Length: 373
[..]
$ file f
f: gzip compressed data, max compression, original size 804

Note how the Content-Length is the size of the compressed content, as expected.

Getting a range:

$ curl --header 'Accept-Encoding: gzip' --header 'Range: bytes=0-300' -v -o a http://localhost:8080/static/page.css
[..]
< HTTP/1.1 206 Partial Content
< Accept-Ranges: bytes
< Content-Encoding: gzip
< Content-Range: bytes 0-300/804
< Content-Type: text/css; charset=utf-8
< Content-Length: 201
[..]

Note the following issues:

  1. The Content-Length is wrong: 201 vs. 301
  2. The Content-Range information doesn't match the Content-Length in the previous request: 373 vs. 804 (the size of the uncompressed content)

And the range isn't a prefix of the previous result:

$ cmp f a
f a differ: byte 11, line 1

Another range request:

$ curl --header 'Accept-Encoding: gzip' --header 'Range: bytes=301-372' -v -o b http://localhost:8080/static/page.css
[..]
< HTTP/1.1 206 Partial Content
< Accept-Ranges: bytes
< Content-Length: 72
< Content-Range: bytes 301-372/804
< Content-Type: text/css; charset=utf-8
[... no Content-Encoding header ...]

Similar issues as before and the range isn't gzip-compressed.

net::ERR_CONTENT_LENGTH_MISMATCH

Hey, if I take the example in the readme and try to use it with a static directory server I get a net::ERR_CONTENT_LENGTH_MISMATCH error in the browser. Here's the example with the test code I've added:

package main

import (
    "io"
    "net/http"
    "os"

    "github.com/nytimes/gziphandler"
)

func main() {
    withoutGz := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "text/plain")
        io.WriteString(w, "Hello, World")
    })

    withGz := gziphandler.GzipHandler(withoutGz)

    dirWithoutGz := http.FileServer(http.Dir("./"))
    dirWithGz := gziphandler.GzipHandler(dirWithoutGz)

    http.Handle("/", withGz)
    http.Handle("/test.jpg", dirWithGz)
    http.ListenAndServe(":"+os.Getenv("PORT"), nil)
}

If you use the dirWithoutGz handler then the image returns just fine, but the dirWithGz handler errors. Am I misunderstanding something?

Here's some debug information if this helps:

$GOPATH/src/github.com/nytimes/gziphandler master
❯ git rev-parse HEAD
44668d75e46f05932cf7c1c7a375d0765b324a0b
❯ go version
go version go1.7.1 darwin/amd64

Invalid Content-Type for small initial write.

The current method of inferring the MIME type of the uncompressed data is broken when multiple calls are made to Write and the first block of data is small: only the first call to Write is considered, not subsequent calls. As the data is already being buffered for the minSize test, it makes sense to detect the MIME type across the whole buffer rather than just the first fragment.

I noticed this in my fork which has diverged significantly, so my fix and test case won't apply cleanly, but they should provide someone a solid basis for fixing it in this repository, if someone thinks it worthwhile.

Test case was added in: tmthrgd/gziphandler@9855883
Fix was added in: tmthrgd/gziphandler@4324668

http.DetectContentType considers at most 512 bytes which is also the default minSize. So detecting the mime type over the minSize buffer provides much nicer behaviour here.

A test case that applies to this repository is below:

func TestInferContentType(t *testing.T) {
	wrapper, _ := NewGzipLevelAndMinSize(gzip.DefaultCompression, len("<!doctype html"))
	handler := wrapper(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "<!doc")
		io.WriteString(w, "type html>")
	}))

	req1, _ := http.NewRequest("GET", "/whatever", nil)
	req1.Header.Add("Accept-Encoding", "gzip")
	resp1 := httptest.NewRecorder()
	handler.ServeHTTP(resp1, req1)
	res1 := resp1.Result()

	const expect = "text/html; charset=utf-8"
	if ct := res1.Header.Get("Content-Type"); ct != expect {
		t.Error("Inferring Content-Type failed for buffered response")
		t.Logf("Expected: %s", expect)
		t.Logf("Got:      %s", ct)
	}
}

Vary: Accept-encoding header is duplicated if inner handler sets it

If the inner HTTP handler sets a Vary: Accept-Encoding header, then as the gzip middleware will always add in the same header, the output will have two identical headers.

Here is a failing test case:

func TestEnsureVaryHeaderNoDuplicate(t *testing.T) {
	handler := GzipHandler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add(vary, acceptEncoding)
		w.Write([]byte("test"))
		w.(io.Closer).Close()
	}))

	req := httptest.NewRequest("GET", "/", nil)
	req.Header.Set(acceptEncoding, "gzip")
	w := httptest.NewRecorder()
	handler.ServeHTTP(w, req)
	assert.Equal(t, w.Header()[vary], []string{acceptEncoding})
}

I don't think the HTTP spec explicitly disallows the same key/value pair appearing twice in the headers, but it feels tidier to have only one instance of it.

gzip ignores HTTP error codes

I am using go version go1.8 linux/amd64, and I have been trying to send some data through gzip. The problem is that even if I send an error code using w.WriteHeader, this library simply ignores it and sends HTTP 200 no matter what. Is there any way you could fix this?

Implement http.Hijacker to support interaction with Websockets

Hey there,

You wrote a fantastic library and I've been using it with pleasure so far.

I have run into an issue when using the GzipHandler and gorilla/websocket (and I saw reports of similar failures for code.google.com/p/go.net/websocket):

I run into this:

websocket: response does not implement http.Hijacker

It seems your library's response writer does not implement http.Hijacker, which is needed to use WebSockets.

The fix appears relatively easy and several other libraries have implemented a fix. For example:

caddyserver/caddy@05957b4

https://github.com/gin-gonic/gin/pull/105/files

Other useful thread:
gin-gonic/gin#51

I can probably work around my problem by not using the GzipHandler in bulk for all my routes, but I thought you'd be interested in the problem and might want to adopt the fixes above.
