paginate-json
CLI tool for retrieving JSON from paginated APIs.
Currently works against APIs that use the HTTP Link header for pagination. The GitHub API is the most obvious example.
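To illustrate what Link header pagination involves, here is a minimal Python sketch that pulls the rel="next" URL out of a Link header value. The function name next_page_url and the example URLs in the usage note are hypothetical, and this is not the tool's own implementation. It assumes a simple header without commas inside quoted parameter values.

```python
import re

# Matches one Link header entry: <URL> followed by zero or more ;param=value parts
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)')


def next_page_url(link_header):
    """Return the URL marked rel="next" in a Link header, or None if absent."""
    if not link_header:
        return None
    for url, params in _LINK_RE.findall(link_header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # rel may hold several space-separated relation types
            if name.lower() == "rel" and "next" in value.strip('"').split():
                return url
    return None
```

A client would keep requesting the returned URL until the function returns None, which signals the last page.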
Usage: paginate-json [OPTIONS] URL

  Fetch paginated JSON from a URL

Options:
  --version                Show the version and exit.
  --nl                     Output newline-delimited JSON
  --jq TEXT                jq transformation to run on each page
  --accept TEXT            Accept header to send
  --sleep INTEGER          Seconds to delay between requests
  --silent                 Don't show progress on stderr
  --show-headers           Dump response headers out to stderr
  --header <TEXT TEXT>...  Send custom request headers
  --help                   Show this message and exit.
The --jq option only works if you install the optional pyjq dependency.
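For readers unfamiliar with the newline-delimited JSON format that --nl produces, the following Python sketch shows the general shape: one JSON object per line. The helper to_ndjson is hypothetical, and the assumption that items from every page are flattened into a single stream is an illustration rather than a description of the tool's internals.

```python
import json


def to_ndjson(pages):
    """Flatten pages (each a list of objects) into newline-delimited JSON text."""
    lines = []
    for page in pages:
        for item in page:
            # One compact JSON document per line, no enclosing array
            lines.append(json.dumps(item))
    return "\n".join(lines)
```

Because each line is a complete JSON document, a downstream tool can process records one at a time without loading the full result set.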
paginate-json works well in conjunction with sqlite-utils. For example, here's how to load all of the GitHub issues for a project into a local SQLite database:
paginate-json \
"https://api.github.com/repos/simonw/datasette/issues?state=all&filter=all" \
--nl | \
sqlite-utils upsert /tmp/issues.db issues - --nl --pk=id
You can then use other features of sqlite-utils to enhance the resulting database. For example, to enable full-text search on the issue title and body columns:
sqlite-utils enable-fts /tmp/issues.db issues title body
You can use the --header option to send additional request headers. For example, if you have a GitHub OAuth token you can pass it like this:
paginate-json https://api.github.com/users/simonw/events \
--header Authorization "bearer e94d9e404d86..."
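As a rough Python equivalent of passing name and value pairs with --header, the sketch below attaches each pair to a standard-library request object without sending it. The helper build_request is hypothetical and "YOUR_TOKEN" is a placeholder, not a real token. This is not how the tool itself is implemented.

```python
import urllib.request


def build_request(url, header_pairs):
    """Build a request that carries each (name, value) pair as a header."""
    request = urllib.request.Request(url)
    for name, value in header_pairs:
        request.add_header(name, value)
    return request


# "YOUR_TOKEN" is a placeholder; substitute your own OAuth token
request = build_request(
    "https://api.github.com/users/simonw/events",
    [("Authorization", "bearer YOUR_TOKEN")],
)
```

Nothing is sent over the network here. Passing the request to urllib.request.urlopen would perform the actual call.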