simonw / datasette-media

Datasette plugin for serving media based on a SQL query

License: Apache License 2.0

Python 100.00%
datasette datasette-plugin datasette-io

datasette-media's Introduction

datasette-media

Datasette plugin for serving media based on a SQL query.

Use this when you have a database table containing references to files on disk - or binary content stored in BLOB columns - that you would like to be able to serve to your users.

Installation

Install this plugin in the same environment as Datasette.

$ pip install datasette-media

HEIC image support

Modern iPhones save their photos using the HEIC image format. Processing these images requires an additional dependency, pyheif. You can include this dependency by running:

$ pip install datasette-media[heif]

Usage

You can use this plugin to configure Datasette to serve static media based on SQL queries to an underlying database table.

Media will be served from URLs that start with /-/media/. The full URL to each media asset will look like this:

/-/media/type-of-media/media-key

type-of-media will correspond to a configured SQL query, and might be something like photo. media-key will be an identifier that is used as part of the underlying SQL query to find which file should be served.

Serving static files from disk

The following metadata.json configuration will cause this plugin to serve files from disk, based on queries to a database table called apple_photos.

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key"
            }
        }
    }
}

A request to /-/media/photo/CF972D33-5324-44F2-8DAE-22CB3182CD31 will execute the following SQL query:

select filepath from apple_photos where uuid=:key

The value from the URL - in this case CF972D33-5324-44F2-8DAE-22CB3182CD31 - will be passed as the :key parameter to the query.

The query returns a filepath value that has been read from the table. The plugin will then read that file from disk and serve it in response to the request.
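
The lookup described above can be sketched with Python's standard sqlite3 module. This is only an illustration, not the plugin's own code: the split_media_path and lookup_filepath helpers, the in-memory table and the sample file path are all assumptions made for the example.

```python
import sqlite3

MEDIA_PREFIX = "/-/media/"


def split_media_path(path):
    """Split /-/media/<type-of-media>/<media-key> into (media_type, key).

    Returns None if the path does not have that shape.
    """
    if not path.startswith(MEDIA_PREFIX):
        return None
    media_type, sep, key = path[len(MEDIA_PREFIX):].partition("/")
    if not sep or not media_type or not key:
        return None
    return media_type, key


def lookup_filepath(conn, sql, key):
    """Run the configured SQL with the URL value bound as the :key parameter."""
    row = conn.execute(sql, {"key": key}).fetchone()
    return row[0] if row else None


# Hypothetical data standing in for the apple_photos table
conn = sqlite3.connect(":memory:")
conn.execute("create table apple_photos (uuid text primary key, filepath text)")
conn.execute(
    "insert into apple_photos values (?, ?)",
    ("CF972D33-5324-44F2-8DAE-22CB3182CD31", "/photos/IMG_0001.jpeg"),
)

# Mirrors the "photo" entry from the metadata.json example
config = {"photo": {"sql": "select filepath from apple_photos where uuid=:key"}}

media_type, key = split_media_path(
    "/-/media/photo/CF972D33-5324-44F2-8DAE-22CB3182CD31"
)
filepath = lookup_filepath(conn, config[media_type]["sql"], key)
```

Binding the URL value as a named parameter, rather than formatting it into the SQL string, keeps the key from being interpreted as SQL.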

SQL queries default to running against the first connected database. You can specify a different database to execute the query against using "database": "name_of_db". To execute against photos.db, use this:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "database": "photos"
            }
        }
    }
}

See dogsheep-photos for an example of an application that can benefit from this plugin.

Serving binary content from BLOB columns

If your SQL query returns a content column, this will be served directly to the user:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select thumbnail as content from photos where uuid=:key",
                "database": "thumbs"
            }
        }
    }
}

You can also return a content_type column which will be used as the Content-Type header served to the user:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select body as content, 'text/html;charset=utf-8' as content_type from documents where id=:key",
                "database": "documents"
            }
        }
    }
}

If you do not specify a content_type the default of application/octet-stream will be used.

Serving content proxied from a URL

To serve content that is itself fetched from elsewhere, return a content_url column. This can be particularly useful when combined with the ability to resize images (described in the next section).

{
    "plugins": {
        "datasette-media": {
            "photos": {
                "sql": "select photo_url as content_url from photos where id=:key",
                "database": "photos",
                "enable_transform": true
            }
        }
    }
}

Now you can access resized versions of images from that URL like so:

/-/media/photos/13?w=200

Setting a download file name

The content_filename column can be returned to force browsers to download the content using a specific file name.

{
    "plugins": {
        "datasette-media": {
            "hello": {
                "sql": "select 'Hello ' || :key as content, 'hello.txt' as content_filename"
            }
        }
    }
}

Visiting /-/media/hello/Groot will cause your browser to download a file called hello.txt containing the text Hello Groot.
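
As a rough sketch of how the content_type and content_filename columns could map onto response headers, the helper below applies the application/octet-stream default and adds a Content-Disposition header only when a file name is present. The build_response_headers name is hypothetical, and the plugin's real header handling may differ.

```python
def build_response_headers(content_type=None, content_filename=None):
    """Build response headers from the optional SQL columns.

    Hypothetical helper: falls back to application/octet-stream and only
    sets content-disposition when a download file name was returned.
    """
    headers = {"content-type": content_type or "application/octet-stream"}
    if content_filename:
        headers["content-disposition"] = 'attachment; filename="{}"'.format(
            content_filename
        )
    return headers


# The "hello" example above returns content_filename but no content_type
hello_headers = build_response_headers(content_filename="hello.txt")
```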

Resizing or transforming images

Your SQL query can specify that an image should be resized and/or converted to another format by returning additional columns. All three are optional.

  • resize_width - the width to resize the image to
  • resize_height - the height to resize the image to
  • output_format - the output format to use (e.g. jpeg or png) - any output format supported by Pillow is allowed here.

If you specify one but not the other of resize_width or resize_height the unspecified one will be calculated automatically to maintain the aspect ratio of the image.
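
The aspect-ratio calculation can be sketched as follows. The fill_missing_dimension helper is hypothetical, and rounding to the nearest pixel is an assumption; the plugin and Pillow may round differently.

```python
def fill_missing_dimension(width, height, resize_width=None, resize_height=None):
    """Return the (width, height) to resize to.

    When only one target dimension is given, the other is scaled by the
    same factor so the aspect ratio of the original image is preserved.
    """
    if resize_width is None and resize_height is None:
        return width, height
    if resize_width is not None and resize_height is not None:
        return resize_width, resize_height
    if resize_width is not None:
        # Assumption: round to the nearest pixel, never below 1
        return resize_width, max(1, round(height * resize_width / width))
    return max(1, round(width * resize_height / height)), resize_height


# A 4000x3000 image resized to a width of 200 keeps its 4:3 ratio
thumbnail_size = fill_missing_dimension(4000, 3000, resize_width=200)
```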

Here's an example configuration that will resize all images to be JPEGs that are 200 pixels in height:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath, 200 as resize_height, 'jpeg' as output_format from apple_photos where uuid=:key",
                "database": "photos"
            }
        }
    }
}

If you enable the enable_transform configuration option you can instead specify transform parameters at runtime using querystring parameters. For example:

  • /-/media/photo/CF972D33?w=200 to resize to a fixed width
  • /-/media/photo/CF972D33?h=200 to resize to a fixed height
  • /-/media/photo/CF972D33?format=jpeg to convert to JPEG

That option is added like so:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "database": "photos",
                "enable_transform": true
            }
        }
    }
}

The maximum allowed height or width is 4000 pixels. You can change this limit using the "max_width_height" option:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "database": "photos",
                "enable_transform": true,
                "max_width_height": 1000
            }
        }
    }
}
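
A minimal sketch of how the ?w=, ?h= and ?format= parameters might be parsed and checked against the size limit is shown below. The parse_transform helper is hypothetical, and raising ValueError for an out-of-range value is an assumption: the README does not say how the plugin reports a request that exceeds the limit.

```python
from urllib.parse import parse_qs

DEFAULT_MAX_WIDTH_HEIGHT = 4000  # default limit stated in the README


def parse_transform(query_string, max_width_height=DEFAULT_MAX_WIDTH_HEIGHT):
    """Translate ?w=, ?h= and ?format= into resize/reformat settings."""
    params = parse_qs(query_string)
    transform = {}
    for name, column in (("w", "resize_width"), ("h", "resize_height")):
        if name in params:
            value = int(params[name][0])  # ValueError if not an integer
            if not 0 < value <= max_width_height:
                # Assumption: reject rather than clamp oversized requests
                raise ValueError(
                    "{} must be between 1 and {}".format(name, max_width_height)
                )
            transform[column] = value
    if "format" in params:
        transform["output_format"] = params["format"][0]
    return transform


settings = parse_transform("h=200&format=jpeg")
```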

Configuration

In addition to the different named content types, the following special plugin configuration setting is available:

  • transform_threads - number of threads to use for running transformations (e.g. resizing). Defaults to 4.

This can be used like this:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "database": "photos"
            },
            "transform_threads": 8
        }
    }
}
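
One way a setting like transform_threads could be applied is with a thread pool from Python's concurrent.futures, so that CPU-heavy transformations do not block the event loop. This is a sketch under that assumption, not the plugin's implementation; transform_thread_count, slow_transform and transform_in_thread are hypothetical names, and slow_transform is a stand-in for a real resize.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def transform_thread_count(plugin_config):
    """Read transform_threads from the plugin configuration, defaulting to 4."""
    return int(plugin_config.get("transform_threads", 4))


def slow_transform(data):
    # Placeholder for a blocking operation such as resizing an image
    return data[::-1]


async def transform_in_thread(executor, data):
    """Run the blocking transform on the pool instead of the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, slow_transform, data)


plugin_config = {"transform_threads": 8}
executor = ThreadPoolExecutor(max_workers=transform_thread_count(plugin_config))
result = asyncio.run(transform_in_thread(executor, b"abc"))
executor.shutdown()
```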

datasette-media's Issues

Feature idea: configurable media serving domain

Serving raw data out of the database could inadvertently lead to XSS attacks, if a site allows users to insert content that is later served up raw by this plugin.

These could be avoided by configuring a separate "media serving" domain - e.g. if the plugin was running on datasette.io but the media serving domain was datasette-user-content.io.

Both domains would point at the same instance. The datasette-media plugin could be configured to only serve assets on datasette-user-content.io based on the incoming Host header.
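
The Host header check proposed here might look something like the following sketch. Nothing in it exists in the plugin: the is_media_host helper and the idea of passing the media domain as a setting are assumptions, and the port handling is deliberately simple (it does not cover IPv6 literals).

```python
def is_media_host(headers, media_host):
    """Return True when the request's Host header matches the media domain.

    headers is a dict of lower-cased header names to values.
    """
    host = headers.get("host", "")
    hostname = host.split(":")[0]  # drop an optional :port suffix
    return hostname.lower() == media_host.lower()


# Hypothetical configured media serving domain
MEDIA_HOST = "datasette-user-content.io"

allowed = is_media_host({"host": "datasette-user-content.io"}, MEDIA_HOST)
blocked = is_media_host({"host": "datasette.io"}, MEDIA_HOST)
```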

"enable_transform" in plugin configuration

Other features will be controlled by metadata settings:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "enable_transform": true
            }
        }
    }
}

So enable_transform turns on the ability to resize and reformat using the ?w=, ?h= and ?format= querystring parameters.

Split from #3

default_convert plugins metadata setting

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key",
                "default_convert": {
                    "heic": "jpeg"
                }
            }
        }
    }
}

default_convert can be used to convert certain formats by default, for example serving HEIC files as JPEG. Split from #6

500 error if content is blank

The 500 error is caused by a bug in datasette-media: if content is blank, the plugin attempts to return a non-existent file instead:

# Non-image files are returned directly
if content:
    return Response(
        content,
        content_type=content_type or "application/octet-stream",
        headers={
            "content-disposition": 'attachment; filename="{}"'.format(
                content_filename
            )
        }
        if content_filename
        else None,
    )
else:
    print(
        """
    asgi_send_file(
        send={},
        filepath={},
        filename={},
        content_type={} or guess_type(filepath)[0],
    )

Originally posted by @simonw in simonw/til#62 (comment)

Get libheif and pyheif to install correctly in Circle CI

I'm running into the issue from this thread: carsales/pyheif#1 (comment)

https://app.circleci.com/pipelines/github/simonw/datasette-media/14/workflows/d5adc812-7681-44b0-a3bd-740bbc19bada/jobs/36

    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include -I/home/circleci/project/venv/include -I/usr/local/include/python3.6m -c build/temp.linux-x86_64-3.6/_libheif_cffi.c -o build/temp.linux-x86_64-3.6/build/temp.linux-x86_64-3.6/_libheif_cffi.o
    build/temp.linux-x86_64-3.6/_libheif_cffi.c:1109:37: error: return type is an incomplete type
     static enum heif_color_profile_type _cffi_d_heif_image_handle_get_color_profile_type(struct heif_image_handle const * x0)
                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    build/temp.linux-x86_64-3.6/_libheif_cffi.c: In function ‘_cffi_d_heif_image_handle_get_color_profile_type’:
    build/temp.linux-x86_64-3.6/_libheif_cffi.c:1111:10: warning: implicit declaration of function ‘heif_image_handle_get_color_profile_type’; did you mean ‘_cffi_d_heif_image_handle_get_color_profile_type’? [-Wimplicit-function-declaration]
       return heif_image_handle_get_color_profile_type(x0);
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ...

Design the plugin

The goal of this plugin is to support serving files (most likely images) from disk based on an incoming path, via a SQL query.

E.g. a hit to /-/media/photo/19EB8081-79E2-4018-8473-E46DDCC6B180 could be configured to execute select path from apple_photos where uuid=? and then serve up the contents of the file listed in the path column - along with the correct detected content-type.

It can support basic image resizing too via ?w=/?h=, since a common use-case will be serving up thumbnails of images.

Default to stripping EXIF data

I should default to stripping out EXIF data, because leaking latitude/longitude in GPS tags is a potential privacy violation. To allow EXIF through I can support this option:

    "strip_exif": false

Originally posted by @simonw in #3 (comment)

SQL columns controlling resize/reformat

Resizing that's specified by columns returned from the SQL query will always be respected - no additional configuration needed. Those columns will be:

  • resize_width - a width to resize to
  • resize_height - a height to resize to
  • output_format - the format to convert to and output, e.g. jpeg or png

Split from #3

Make compatible with Datasette 0.45

  File "/Users/simon/.local/share/virtualenvs/latest-datasette-with-all-plugins-PJL_Xy9e/lib/python3.8/site-packages/uvicorn/lifespan/on.py", line 48, in main
    await app(scope, self.receive, self.send)
  File "/Users/simon/.local/share/virtualenvs/latest-datasette-with-all-plugins-PJL_Xy9e/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/Users/simon/.local/share/virtualenvs/latest-datasette-with-all-plugins-PJL_Xy9e/lib/python3.8/site-packages/starlette/middleware/errors.py", line 146, in __call__
    await self.app(scope, receive, send)
  File "/Users/simon/.local/share/virtualenvs/latest-datasette-with-all-plugins-PJL_Xy9e/lib/python3.8/site-packages/datasette_media/__init__.py", line 9, in wrapped_app
    path = scope["path"]
KeyError: 'path'
ERROR:    Application startup failed. Exiting.

Switch to register_routes to fix this.

Serve mp4 video that can be viewed in a browser

I ran into problems serving mp4 files such that they could be embedded in an HTML <video> element:

<video controls width="600">
    <source src="/media/video.mp4" type="video/mp4">
</video>

I think this is due to a lack of support for HTTP range requests. I filed a bug with Starlette here: encode/starlette#950

MVP: serving static files off disk

Based on design in #1. The first release just needs to be able to handle this:

{
    "plugins": {
        "datasette-media": {
            "photo": {
                "sql": "select filepath from apple_photos where uuid=:key"
            }
        }
    }
}

Do something smart with etags

If a database column stores an MD5 hash, could I combine that with the resize query string parameters and support conditional GET, hence avoiding a resize operation?

Are CDNs smart enough to make conditional GET requests?

Could I avoid even loading the content BLOB in the SQL query if an incoming ETag is present, unless it's a miss?
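
To make the idea concrete, here is a minimal sketch of deriving an ETag from a stored content hash plus the resize parameters, and of checking it against an If-None-Match request header. The make_etag and is_not_modified helpers are hypothetical, SHA-256 is used for the combined tag purely as an example choice, and weak validators (W/"...") are not handled.

```python
import hashlib


def make_etag(content_hash, width=None, height=None, output_format=None):
    """Combine the stored content hash with the transform parameters.

    The same source image resized differently gets a different ETag.
    """
    raw = "{}:{}:{}:{}".format(content_hash, width, height, output_format)
    return '"{}"'.format(hashlib.sha256(raw.encode("utf-8")).hexdigest())


def is_not_modified(request_headers, etag):
    """Return True if the client already holds this representation."""
    header = request_headers.get("if-none-match", "")
    candidates = [value.strip() for value in header.split(",")]
    return etag in candidates or "*" in candidates


# Hypothetical hash value read from a database column
etag = make_etag("9e107d9d372bb6826bd81d3542a419d6", width=200)
can_skip_resize = is_not_modified({"if-none-match": etag}, etag)
```

When is_not_modified returns True the server could answer with a 304 response without resizing the image or loading the BLOB.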
