I wasn't actually done with my review on #3. I threw out my back and wasn't able to get to it on Friday. Here's the rest of it:
I would like to see the command logged to the end user. Above, it's hard-coded to "Running pip install". The logged command should be exactly what we run for the user, so if they copy and paste it (and of course modify any paths) they get the exact same result.
It also looks like the results rely on the `PYTHONUSERBASE` env var, so I would like to see that in the output as well.
In addition to local debugging, it can also help buildpack maintainers spot accidental discrepancies.
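Roughly what I have in mind, as a minimal sketch: the function name, flags, and `log_info` helper here are all made up (the helper just mimics the buildpack's logging style), but the key point is that the logged string is the same one we execute, `PYTHONUSERBASE` included.

```rust
use std::path::Path;

// Sketch only: `log_pip_install` and the exact flags are assumptions,
// not the buildpack's real invocation. The logged string is built once
// and is exactly what gets run, PYTHONUSERBASE included.
fn log_pip_install(user_base: &Path, requirements: &Path) {
    let command = format!(
        "PYTHONUSERBASE={} pip install --user --requirement {}",
        user_base.display(),
        requirements.display()
    );
    log_info(format!("Running: {command}"));
}

fn log_info(message: impl AsRef<str>) {
    println!("{}", message.as_ref());
}
```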
Same as above. We're streaming this command, but not announcing the (exact) command that's being streamed. If we don't want to announce this specific command (since it seems to be a helper command rather than something a user might expect to be run against their code), then perhaps we move to only emitting the command string in the error message.
I want the exact command run to be in the logs or the error (or both).
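A rough sketch of the error-message option (this helper and its messages are hypothetical, not the buildpack's real error handling):

```rust
use std::process::Command;

// Hypothetical helper: render the exact invocation once, then reuse it
// in every failure message so the command is never lost, even if we
// choose not to announce it up front.
fn run_helper(mut command: Command) -> Result<(), String> {
    let rendered = format!("{command:?}"); // Debug shows the program and args
    let status = command
        .status()
        .map_err(|error| format!("Failed to start `{rendered}`: {error}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("`{rendered}` exited with {status}"))
    }
}
```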
The function name is `log_io_error`, but the function body says "unexpected error", which might not always be true. Or rather, if we're saying "unexpected error", I read that as synonymous with "not my fault" if I'm a customer reading it.
I imagine someone copying and pasting that function without looking too closely, thinking it's for handling all IO errors. For example, a new file format is added and reading it generates a `std::io::Error` due to a permissions problem or a bad symlink the customer checked in. In that case it wouldn't be so unexpected. Rename for clarity? `unexpected_io_error`?
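Roughly what I have in mind (the doc comment and message text are just illustrative):

```rust
use std::io;

/// Logs an `std::io::Error` we genuinely did not anticipate.
///
/// Not for expected failures (permissions problems, bad symlinks the
/// customer checked in, etc.); those deserve their own messages.
fn unexpected_io_error(error: &io::Error) {
    eprintln!("An unexpected I/O error occurred: {error}");
}
```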
Mentioned in Colin's PR. I would like to avoid early returns when there are only two branches. We can if/else this and eliminate the early return.
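For example (names invented, just showing the shape):

```rust
// With only two branches, a single if/else reads straighter than an
// early return. `restore_cache`/`clear_cache` are stand-ins.
fn handle_cache(cache_is_valid: bool) -> String {
    if cache_is_valid {
        restore_cache()
    } else {
        clear_cache()
    }
}

fn restore_cache() -> String {
    "restoring cache".to_string()
}

fn clear_cache() -> String {
    "clearing cache".to_string()
}
```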
I think we should `log_warning` instead of `log_info`.
In a bit of a surprise to even me, I'm going to advocate for less testing. I think testing build logic inside of `main.rs` should be enough here. One fewer docker boot at CI test time.
Also, we're effectively testing the output of `pack` here, which is subject to change. If you do want to keep this test, I would scope it to only the strings you control.
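A sketch of what I mean, with a stand-in assertion helper and placeholder output (treat the names and log lines as assumptions, not the real test code):

```rust
// Assert only on lines the buildpack itself emits, never on pack's own
// formatting. `output` is a placeholder for the combined pack output.
fn assert_contains(output: &str, expected: &str) {
    assert!(
        output.contains(expected),
        "expected output to contain {expected:?}\nfull output:\n{output}"
    );
}

fn main() {
    let output = "...\nRunning pip install\n..."; // placeholder
    assert_contains(output, "Running pip install");
}
```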
Same as above. Doesn't need to be a docker test.
Regarding the error message (it's very good): when we say the user is missing files, it would be nice for us to give them an `ls` of the files we see in that directory. So at a glance they could see in one window that we're looking for `requirements.txt` but they have `REQUIREMENTS.txt` (or something).
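A rough sketch, assuming we still have the app dir handy when building the error message (the function name is made up):

```rust
use std::io;
use std::path::Path;

// Build a sorted listing of the directory so casing mistakes like
// REQUIREMENTS.txt vs requirements.txt jump out in the error message.
fn directory_listing(app_dir: &Path) -> io::Result<String> {
    let mut names: Vec<String> = std::fs::read_dir(app_dir)?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.file_name().to_string_lossy().into_owned())
        .collect();
    names.sort();
    Ok(names.join("\n"))
}
```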
Comment: You could manually combine the streams yourself: `let output = format!("{}{}", context.pack_stdout, context.pack_stderr)`. I'm assuming the goal is to get all of the pack output on test failure.
Also I think you could get rid of this test. Essentially it's testing that a bad `requirements.txt` file triggers a non-zero `pip install`. The git test above it seems more useful as an integration test.
Testing caching. We can make these less brittle by asserting only individual lines (or even just parts of lines). I think asserting for "Using cached Python" and "Using cached pip" without the version numbers would be enough to convince me. Maybe a "Using cached typing_extensions" for good measure. All the other values and numbers will cause churn on this file and possibly failures on otherwise unrelated changes (if `libherokubuildpack` updates its logging style, for example).
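Something like this (the log lines are my guesses at the stable parts, not the real output):

```rust
// Version-agnostic cache assertions: check the stable prefixes only, so
// a Python or pip version bump doesn't churn the test.
fn main() {
    let output = "Using cached Python 3.11.2\n\
                  Using cached pip 23.0.1\n\
                  Using cached typing_extensions 4.5.0"; // placeholder
    for needle in [
        "Using cached Python",
        "Using cached pip",
        "Using cached typing_extensions",
    ] {
        assert!(output.contains(needle), "missing {needle:?} in output");
    }
}
```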
That comment applies to all the integration tests. They're really nice and easy to review in this format, but I don't want to have to update 10 files every time I add an Oxford comma (for example).
If we know that one change invalidates things, I would bet multiple changes would as well. I think this is covered in your unit tests.
Unit test should be okay.
Comment: This is a good idea.
Unit test should be fine.
Time to test
Not that it's a race, but time to run CI only ever goes up, and integration tests are historically one of the last things developers are willing to delete. Right now Python is ~6 min for integration tests while Ruby is ~3 min.
Ideally I would like to aim for <5 min for CI completion, with a maximum of around 10 min. Once you hit 10 min and a random glitch causes the tests to need to be re-run, you're pushing nearly half an hour for a single change, and it absolutely kills (my) productivity.
I think we should be aggressive about testing and safety. I also think we should consider pruning some of these tests now, as this is otherwise the fastest this CI suite will ever execute.