Split a pmtiles file (pmtiles issue, CLOSED)

msbarry commented on August 27, 2024
Split a pmtiles file

from pmtiles.

Comments (6)

bdon commented on August 27, 2024

Thought about this for a bit and here's what I think are the benefits/drawbacks:

  • if we were to allow split archives, we'd either want an entire leaf directory and its tiles to be in the 2nd archive, or all directories remain in the 1st archive with pointers into the 2nd for tiles. Either way, it makes writing more complicated (you can't just stream all tiles into one file in one shot) and storing filenames or file indexes in the directory entries would break the fixed-width entry design.
  • One of the value propositions of PMTiles is to hold and manage the entire archive in a single file, instead of managing thousands of different directories/files. This is obviously nice for UX reasons, like uploading and versioning data, but also has the technical benefit of working with ETag semantics over HTTP. Multiple archives would then have multiple ETags, so a single version of a resource would no longer have a single identifier. (see #24)
  • The primary target of the PMTiles design is commodity storage platforms like S3 which support an effectively unlimited object size.
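To make the ETag point concrete, here is a minimal sketch, assuming a client that remembers the ETag from its first response and sends it back as `If-Match` on later range requests. The function names are hypothetical and this is not taken from any PMTiles client; it only illustrates why a single file gives the client a single identifier to check.

```python
from typing import Dict, Optional


def range_headers(offset: int, length: int, etag: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for reading `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    headers = {"Range": f"bytes={offset}-{offset + length - 1}"}
    if etag is not None:
        # Ask the server to reject the request if the object has changed.
        headers["If-Match"] = etag
    return headers


def archive_changed(status: int) -> bool:
    """412 Precondition Failed means the remembered ETag no longer matches."""
    return status == 412
```

With a split archive, the client would have to track one ETag per piece and decide what to do when only some of them change.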

GitHub Pages seems to be meant for versioned code/docs and some associated assets, so I don't think it's a great fit as a primary target for tile archive hosting, though being free+fast is nice, and you can accomplish the same thing by expanding to directories/archives. Are there other examples out there where we need to split archives to a max piece size? 32-bit systems might be one, but I'd rather not consider that in scope.

msbarry commented on August 27, 2024

Agree that since the goal of pmtiles is to combine many files into one, it may not make sense to split them back out again... According to https://github.com/phiresky/sql.js-httpvfs, the benefits of splitting a large file that the client makes byte-range requests against are:

> This is needed if your hoster has a maximum file size. It can also be a good idea generally depending on your CDN since it allows selective CDN caching of the chunks your users actually use and reduces cache eviction.
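As a rough illustration of what reading from a split file involves, the sketch below maps a byte range in the logical file onto fixed-size chunks. It shows the general technique only, not the actual layout used by sql.js-httpvfs; the function name and the fixed chunk size are assumptions.

```python
from typing import List, Tuple


def chunks_for_range(offset: int, length: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Map a logical byte range to (chunk_index, start, end) pieces.

    `start` and `end` are offsets inside the chunk; `end` is exclusive.
    """
    if offset < 0 or length <= 0 or chunk_size <= 0:
        raise ValueError("offset must be >= 0; length and chunk_size must be > 0")
    pieces = []
    pos = offset
    end = offset + length
    while pos < end:
        index = pos // chunk_size
        start = pos % chunk_size
        # Stop at the chunk boundary or the end of the request, whichever comes first.
        stop = min(chunk_size, start + (end - pos))
        pieces.append((index, start, stop))
        pos += stop - start
    return pieces
```

A read that straddles a chunk boundary turns into two requests, which is part of the extra complexity discussed above.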

Also, with something like S3, is it possible to allow only range requests? A concern with hosting a tileset in S3 is that a client sends a request without a Range header and accidentally starts downloading the whole thing, which could run up bandwidth costs quickly. A split archive would partially mitigate that concern, but maybe it's not really an issue in practice?

bdon commented on August 27, 2024

It's not possible on raw S3 to allow only range requests. That concern is somewhat mitigated by having clients implement a rudimentary check as shown on this line: https://github.com/protomaps/PMTiles/blob/master/js/index.src.mjs#L71
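The linked line is not reproduced here. The following is a sketch of the kind of check a client could make, assuming it can inspect the status code and `Content-Length` header before reading the response body; the function name and error messages are hypothetical.

```python
from typing import Optional


def check_range_response(status: int, content_length: Optional[int], requested: int) -> None:
    """Raise if the server appears to have ignored the Range header."""
    if status == 200:
        # 200 OK to a range request means the server is sending the whole object.
        raise RuntimeError("server returned the full object instead of a byte range")
    if status != 206:
        raise RuntimeError(f"unexpected HTTP status {status}")
    if content_length is None or content_length > requested:
        raise RuntimeError("missing or oversized Content-Length for a range request")
```

The check only helps if the client aborts the request when it fails, before the body has been downloaded.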

In practice, it can be an issue, but it's not unique to PMTiles; the other cloud-optimized formats like COG have the same drawback. The best solution for now is to run a proxy in front of your bucket such as https://github.com/protomaps/go-pmtiles , but of course that's no longer just S3 :)

msbarry commented on August 27, 2024

> It's not possible on raw S3 to allow only range requests. That concern is somewhat mitigated by having clients implement a rudimentary check as shown on this line: https://github.com/protomaps/PMTiles/blob/master/js/index.src.mjs#L71

OK thanks, that check helps prevent accidental full downloads, but there's still the issue of intentional full downloads, which could start to be an issue with a 100 GB full-planet tileset hosted on S3, since each full download would cost the owner $10 in egress fees.
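The arithmetic behind that figure, as a small sketch: the rate of $0.10 per GB is an assumption implied by the numbers above (100 GB for $10), and actual egress pricing varies by provider, region, and volume.

```python
def egress_cost_usd(size_gb: float, rate_per_gb: float = 0.10) -> float:
    """Estimated cost of one full download at an assumed flat per-GB rate."""
    return size_gb * rate_per_gb


# One full download of a 100 GB planet tileset at the assumed rate.
planet_cost = egress_cost_usd(100)
```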

I was thinking of using pmtiles for the planetiler demo site (~500MB mbtiles file on github pages) but if splitting a pmtiles archive doesn't make sense then I can stick with the current approach of extracting all of the tiles to individual files.

bdon commented on August 27, 2024

> OK thanks, that check helps prevent accidental full downloads, but there's still the issue of intentional full downloads, which could start to be an issue with a 100 GB full-planet tileset hosted on S3, since each full download would cost the owner $10 in egress fees.

Yeah, I agree the intentional linking/leeching is a concern - the basemap downloads I offer at http://protomaps.com/downloads are limited to at most a hundred or so megabytes, and my stopgap solution for larger maps is proxy-based like above. I'm optimistic that the long-term fix here will be downward market pressure on bandwidth prices in the next few years, for example if/when Cloudflare R2 becomes available.

bdon commented on August 27, 2024

I'm going to close this issue about archive splitting for now; I think the ETag features enabled by a single file take precedence over working around max file size limits. For the planetiler demo site, I've spun up a demo tile server using https://github.com/protomaps/go-pmtiles on an unmetered bandwidth server:

https://bdon.github.io/planetiler-demo/ (endpoint http://free-tiles.protomaps.com/planetiler/{z}/{x}/{y}.pbf)

Open to suggestions on how to organize the URL structures or metadata, or access for hosting regular updates.
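For anyone trying the endpoint, here is a minimal sketch of filling in the `{z}/{x}/{y}` template for a given longitude and latitude. It assumes the endpoint follows the usual Web Mercator XYZ tiling scheme, which the thread does not state explicitly; the function names are hypothetical, and the code only builds the URL without fetching it.

```python
import math
from typing import Tuple

ENDPOINT = "http://free-tiles.protomaps.com/planetiler/{z}/{x}/{y}.pbf"


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Convert WGS84 lon/lat to XYZ tile coordinates (assumed Web Mercator scheme)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(lon: float, lat: float, zoom: int) -> str:
    """Fill in the endpoint template for the tile containing the given point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return ENDPOINT.format(z=zoom, x=x, y=y)
```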
