
quicken

PyPI Version Documentation Build Status Python Versions

Make Python tools start fast.

When a quickened script is executed for the first time, it starts a server in the background, paying a one-time cost to speed up every subsequent execution.

Quicken only speeds up applications on Linux, but transparently falls back to executing scripts directly on unsupported platforms, with minimal overhead.

Generally, an application can benefit if:

  1. It takes more than 100ms to start on an average machine
  2. python -X importtime shows that the startup time is related to module importing
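For example, import timings can be inspected programmatically (here importing `json` as a stand-in for your application's modules):

```python
import subprocess
import sys

# Run an import under -X importtime; timings are written to stderr,
# one line per imported module, with self and cumulative microseconds.
result = subprocess.run(
    [sys.executable, '-X', 'importtime', '-c', 'import json'],
    capture_output=True,
    text=True,
)
for line in result.stderr.splitlines()[-3:]:
    print(line)
```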

To see how fast an app can be, check out the latest benchmark results in CI and their interpretation in the wiki.

Usage

quicken CLI

The quicken command can be used to quicken plain Python scripts that look like:

# script.py
...

def main():
    pass


if __name__ == '__main__':
    main()

Running quicken run --file script.py followed by arguments will start the application server and run all code before the if __name__ == '__main__' block. For the first and all subsequent commands, only the code inside if __name__ == '__main__' is executed per invocation. If the script is updated, a new server is started.

To see the status of the server: quicken status --file script.py

To stop the server: quicken stop --file script.py

The server is identified using the full path to the script.

Note

  1. __file__ is set to the full, resolved path of the file provided to --file, unlike Python, which sets it to the path given on the command line. This ensures the code before if __name__ == '__main__' and the code after it see the same path, even if the working directory or the path passed to the command changes.

quicken.script

quicken.script can wrap console_scripts as supported by several Python packaging tools.

The console_script/entrypoint format is quicken.script:module.path._.function.path. For example, if our console script is hello=hello.cli:main, then we would use helloc=quicken.script:hello.cli._.main.

Once set up, we can use helloc just like hello, but it should be faster after the first time.

Since quicken is new, it would be wise to provide a second command for testing as above, instead of only having a quicken-based command. We use a c suffix since it's a client.

If using setuptools (setup.py):

setup(
    # ...
    entry_points={
        'console_scripts': [
            'hello=hello.cli:main',
            # With quicken
            'helloc=quicken.script:hello.cli._.main',
        ],
    },
    # ...
)

If using poetry:

[tool.poetry.scripts]
hello = "hello.cli:main"
# With quicken
helloc = "quicken.script:hello.cli._.main"

If using flit:

[tool.flit.scripts]
hello = "hello.cli:main"
# With quicken
helloc = "quicken.script:hello.cli._.main"

quicken.ctl_script

Similar to the above, using quicken.ctl_script provides a CLI to stop and check the status of a quicken server.

Setuptools example:

setup(
    # ...
    entry_points={
        'console_scripts': [
            'hello=hello.cli:main',
            # With quicken
            'helloc=quicken.script:hello.cli._.main',
            # Server control command
            'helloctl=quicken.ctl_script:hello.cli._.main',
        ],
    },
    # ...
)

Then we can use helloctl status to see the server status information and helloctl stop to stop the application server.

Options

Quicken has several options regardless of how it is invoked:

  • logging - set QUICKEN_LOG_FILE to an absolute file path and debug logs will be written to it. Note that server logs will only be written if this environment variable is set for the command that starts the server.
  • idle timeout - by default any quicken server will shut down after 24 hours of inactivity. This can be changed by setting QUICKEN_IDLE_TIMEOUT to the desired time (in seconds). This will only take effect if this environment variable is set for the command that starts the server.
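The timeout handling can be sketched as follows (the environment variable name comes from the docs above; the helper itself is illustrative, not quicken's code):

```python
import os

DEFAULT_IDLE_TIMEOUT = 24 * 60 * 60  # seconds; matches the 24-hour default

def idle_timeout():
    # QUICKEN_IDLE_TIMEOUT must be set for the command that starts the
    # server; later invocations cannot change a running server's timeout.
    value = os.environ.get('QUICKEN_IDLE_TIMEOUT')
    return float(value) if value else DEFAULT_IDLE_TIMEOUT
```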

Why

Python command-line tools can feel slow. There are tricks that can be used to speed up startup, but implementing them in individual packages is not scalable, and can slow development. The purpose of this project is:

  1. provide one way to speed up app startup, with a focus on strategies that can apply across a large number of applications using normal Python development conventions
  2. find areas of improvement that can be folded back into Python itself
  3. make it easier to focus on application logic and not startup time concerns

Limitations

  • Unix only.
  • Debugging may be less obvious for end users or contributors.
  • Access to the socket file implies access to the server and the ability to run commands. The library tries to ensure that the directory used for runtime files is owned only by the user; for best results, use XDG_RUNTIME_DIR as provided by pam_systemd or the equivalent for your distribution.

Tips

  • Profile import time with -X importtime and see whether startup is actually the problem. If it's not, then this package will not help you.
  • Ensure your package can be built as a wheel, even if it's not distributed as one. When wheels are installed they create scripts that do not import pkg_resources, which can save 60ms+ depending on disk speed and caching.

Development

poetry install
poetry run pytest -ra


quicken's Issues

Shutdown server if runtime directory is removed

Since a conforming XDG_RUNTIME_DIR will get removed on logout (but our server will not, by default), we should be good citizens and shut down the server if possible since no more clients would be able to connect.

Gracefully handle platforms that can't send fds over socket

Or determine that no such platforms are common enough to worry about for now.

The checks (from multiprocessing) are:

import socket
import sys

HAVE_SEND_HANDLE = (sys.platform == 'win32' or
                    (hasattr(socket, 'CMSG_LEN') and
                     hasattr(socket, 'SCM_RIGHTS') and
                     hasattr(socket.socket, 'sendmsg')))

Support minimal user data in runtime directory

The most likely use case for the reload_daemon parameter is restarting the server if the utility has been updated, but there is currently no mechanism for determining how long the server has been up or what utility version was in effect when it was brought up.

We should support the user providing 'user data' to be serialized at server start which will be provided back to the reload_daemon function so it is easier to determine whether a reload should be done.

This also implies that the reload_daemon parameter should only be invoked after we have determined that the server is already up.

Support --from-path for quicken CLI

Currently it is not straightforward to do anything with quicken but quickly test a single file. --from-path would search PATH for the applicable script and run it, so an introduction could look like alias qpip="quicken run --from-path pip --". This would probably be more approachable than the less common --module suggested in #56.
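A `--from-path` lookup could be sketched with `shutil.which` (illustrative only; this is a proposal, not existing quicken behavior, and `resolve_from_path` is a hypothetical helper name):

```python
import shutil

def resolve_from_path(name):
    """Locate `name` on PATH, the way `quicken run --from-path` might."""
    path = shutil.which(name)
    if path is None:
        raise SystemExit(f'{name}: not found on PATH')
    return path
```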

Forward signals from client to server

Currently, signals sent to the CLI client are not forwarded to the server. The expected client behavior is the same as for any normal terminal application:

  1. Expected signals should be forwarded: those the system sends automatically (e.g. SIGHUP) and those that a user would be expected to send (e.g. SIGTERM, SIGINT)
  2. If the client is killed (exits unexpectedly, can be indicated by broken connection) then the handling process should also be killed
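A forwarding handler might look like this sketch (the injectable `kill` parameter exists only to make the helper testable; quicken's internals differ):

```python
import os
import signal

# Signals a terminal application is normally expected to receive.
FORWARDED = (signal.SIGHUP, signal.SIGINT, signal.SIGTERM)

def make_forwarder(handler_pid, kill=os.kill):
    """Return a signal handler that relays the received signal."""
    def forward(signum, frame):
        kill(handler_pid, signum)
    return forward

def install_forwarding(handler_pid):
    for sig in FORWARDED:
        signal.signal(sig, make_forwarder(handler_pid))
```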

Introductory docs

  • Straightforward README introduction (with clear limitations)
  • Explicit documentation link on PyPI package
  • Docs build check on PRs?

Add CLI

There's no reason we can't provide a CLI to speed up existing applications for end users. For example

# file path
$ quicken ./example.py
# modules
$ quicken -m pytest

For argument parsing, proceed as if all arguments after the path or -m arg are meant for the underlying command, for example:

$ quicken ./example.py arg1
# ->
$ ./example.py arg1
$ quicken -m pytest -s -ra
# ->
$ pytest -s -ra

For scripts:

  1. Input is normalized to path then resolved to real_path
  2. name is quicken.cli.file.{digest}, where digest = sha256(path)
  3. metadata:
    1. path
    2. real_path (reload criteria)
    3. ctime of real_path (reload criteria)
    4. mtime of real_path (reload criteria)
  4. function:
    1. parse the file at real_path, splitting it into a prelude and the if __name__ == '__main__' section. Execute the prelude as preparation and return the conditional section as the main function.
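Step 4 above (splitting a script at the guard) can be sketched with `ast` (a hypothetical helper, not quicken's code):

```python
import ast

def split_script(source, filename='<script>'):
    """Split source into compiled (prelude, main) code objects at the
    `if __name__ == '__main__':` guard."""
    tree = ast.parse(source, filename)
    for i, node in enumerate(tree.body):
        # Look for a top-level `if` whose test compares __name__.
        if (isinstance(node, ast.If)
                and isinstance(node.test, ast.Compare)
                and isinstance(node.test.left, ast.Name)
                and node.test.left.id == '__name__'):
            prelude = ast.Module(body=tree.body[:i], type_ignores=[])
            main = ast.Module(body=node.body, type_ignores=[])
            return (compile(prelude, filename, 'exec'),
                    compile(main, filename, 'exec'))
    raise ValueError("no `if __name__ == '__main__'` block found")
```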

For modules:

  1. resolve module
  2. name is quicken.cli.module.{digest}, where digest = sha256(module_path)
  3. metadata:
    1. module_path
    2. module
    3. last modified time of module_path (reload criteria)
  4. function:
    1. parse module, getting imports manually
    2. return function that executes the rest of the module body of __main__.py

Client should restart server if process user/group ids have changed

Processes have:

  • effective group id
  • supplementary group ids
  • real group id
  • saved set-group-id
  • real user id
  • effective user id
  • saved set-user-id

Currently these are disregarded, except when creating and checking the runtime directory, which compares the owner to getuid (the real user id).

The expected behavior is that the runner process will execute the command with the same attributes as a given client process (or be indistinguishable).

Changes to supplementary and real group id can happen easily and must be accounted for.

This would only impact effective user/group id if the script itself is setgid/setuid. Since scripts cannot be setuid/setgid this would have to be set on:

  1. Python itself - not likely
  2. on an executable embedding Python - not likely to need this library
  3. on an executable generated from a Python application packaging utility (e.g. pyInstaller) - reasonable use case, so we should at least have a path to support it in the future

This may also happen if an app using this library calls e.g. setgid during import.

Given the above:

  1. if the real/effective/saved uid/gid for a process are different then raise an exception - better safe than leaving it undefined - this can be revisited later if there is a need for supporting it
  2. if the supplementary group ids or real group id for a client differ from the server's, then the current server should be requested to stop and the command should be run with a newly-started server
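A credential snapshot for the comparison in point 2 might be gathered like this (a sketch; `os.getresuid`/`os.getresgid` are Unix-only, and the helper names are hypothetical):

```python
import os

def process_credentials():
    """Collect the ids a server/client comparison would need."""
    ruid, euid, suid = os.getresuid()
    rgid, egid, sgid = os.getresgid()
    return {
        'uids': (ruid, euid, suid),
        'gids': (rgid, egid, sgid),
        'groups': tuple(sorted(os.getgroups())),
    }

def needs_restart(server_creds, client_creds):
    # Point 2: differing real gid or supplementary groups => restart.
    return (server_creds['gids'][0] != client_creds['gids'][0]
            or server_creds['groups'] != client_creds['groups'])
```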

Set environment variable when in use

The incentive to make CLI applications faster can lead to lazy loading and initialization - but these are counterproductive when running under quicken. To facilitate different strategies, we should set an environment variable so applications that already optimize for startup time can toggle e.g. lazy/eager imports based on whether quicken is enabled.
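For example, with a hypothetical variable name (quicken does not define one yet), an application could pick its import strategy:

```python
import os

# Hypothetical name -- not an environment variable quicken actually sets.
QUICKEN_ENV_VAR = 'QUICKEN_ACTIVE'

def under_quicken():
    return os.environ.get(QUICKEN_ENV_VAR) == '1'

def import_strategy():
    # Under quicken the server pays the import cost once, so eager imports
    # are fine; otherwise defer heavy imports to keep cold start fast.
    return 'eager' if under_quicken() else 'lazy'
```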

Support server auto-reload

One common use case for auto-reload is to bring in any changes to files as part of development. The problem with the current mechanism of reload is that knowing whether any of the relevant files are updated at client start would take so long that it's not worth it. Instead we should support auto-reload on changed files (or on callback from a user-provided function). For an autoreload example see here.

Add coverage

  • Must account for code touched by forked processes
  • Coverage badge on README (codecov?)
  • Integrated with CI
  • Coverage HTML should be generated as an artifact in Azure pipeline
  • See advice here for example

Update LICENSE

Removing [fullname] and moving to LICENSE.txt (less hassle for Windows users)

ASCIInema intro

For example:

time pip --help
time pip --help
time pip --help
alias qpip="quicken run --from-path pip"
time qpip --help
time qpip --help
time qpip --help
alias qpipctl="quicken status --from-path pip"
qpipctl
qpipctl --json

Handle exceptions in server handler

We do not currently catch any exceptions from the callback function. We should emulate the Python interpreter behavior:

  1. If SystemExit is thrown then its code attribute determines the exit code:
    1. If code is an integer then it is used as-is
    2. If code is None then the exit code is 0
    3. Otherwise the exit code is 1
  2. If an unhandled exception is thrown then the exit code is 1

Consider using PEP-0554

PEP-0554 in 3.8 provides Python-level support for sub-interpreters. This can allow running isolated code while pre-seeding the module cache with the required imports.

One downside would be our current signal propagation would need to be made part of the application protocol and the resulting behavior would be emulated by actions on the sub-interpreter instead of actually delivering a signal to the process.

Provide interface for server status and control

It is simpler to have a mechanism for external server control than to build in everything that might be needed (e.g. #6).

If a command app provides an app-c (app client, wrapped with this library), then it's not too much to also have an app-ctl that would allow control/query if needed.

Use black coding style

Also integrate with CI (fail PRs that don't conform) and put instructions in README. Better sooner than later.

Rewrite command-lines visible in ps to align with process roles

There are three processes involved in handling of requests: client, server, and handler. The current command lines are:

  1. client: command line matching how it was invoked
  2. server: command line of the client that started the server
  3. handler: same as server

This is pretty unexpected, since most users would assume that the command-line reflects the actual work being done. A more friendly approach would be:

  1. client: (cli client) {command line} - clearly showing the role of the process which can make it easier to identify the correct process to kill (i.e. not this one)
  2. server: (cli server) {server name} - where {server name} is the name passed to the cli_factory decorator, also helps to identify the process
  3. handler: {command line} - this reflects the actual work being done by the process, and is one of the first things that users of a cli would look for if grepping ps output or using pkill. Actually killing the handler process directly would also lead to the most intuitive outcome - the client that is running in the foreground of a shell would exit with the expected return code

Implementation notes:

  1. OpenBSD: unix.stackexchange
  2. Linux: unix.stackexchange - requires root or kernel 3.18 (which is higher than our target RedHat 7.x which has kernel 3.10).

Integrate with packaging tools to allow console_scripts-like usage

The current implementation is generic enough that it can be used in place of a setuptools-generated script and be used to run packages directly.

The solution should work for (to start):

  1. setup.py with setuptools
  2. flit
  3. poetry

Examples of current script designation:

setuptools console_scripts entry point:

setup(
    # ...
    entry_points={
        'console_scripts': ['funniest-joke=funniest.command_line:main'],
    },
    # ...
)

flit scripts:

[tool.flit.scripts]
flit = "flit:main"

poetry scripts:

[tool.poetry.scripts]
poetry = 'poetry:console.run'

Usage should be about that easy and should work with both source distributions and wheels.

Document explicit requirements for applications

The following is unsupported before execution of the main entrypoint function (the function returned by the @quicken-decorated method):

  • threads
  • reading configuration from environment/args
  • using cwd
  • setting signal handlers
  • sub-processes
  • reading plugin information (from e.g. pkg_resources)

The following is unsupported:

  • atexit handlers

Support tools like python-hunter

Currently we do not do anything to try and propagate handlers registered with sys.settrace, and the .pth-file trick used by python-hunter also wouldn't be triggered by updating os.environ.

Normalize logging

Currently logging is configured within the library itself. This isn't a good practice.

We can have a central logging configuration to be used in scripts, cli, and tests, but the setup of that logging should not infect the rest of our application.

Requirements:

  • for tests:
    • there should be a log file per test in the logs directory and the log level should be set to DEBUG
  • for scripts (script, ctl_script, _cli):
    • set log level to debug based on environment variable
    • create file handler and default formatter
  • for library
    • do not propagate by default
    • null handler

For now, don't worry about loading time for the logging module, or anything like request-specific logging (server using the logging configured by the client for the duration of the request handling).

Gracefully handle unsupported platforms

Our dependency on psutil means that dependent utilities will fail to install on unsupported platforms. For example, on HP-UX attempting to install results in:

Complete output from command python setup.py egg_info:
platform hp-ux11 is not supported

because pip attempts to install psutil from its source distribution.

Degrade gracefully on Windows

We depend heavily on Unix-only features (socket files, fork), but user code should not have to accommodate our limitations - it should look the same on both Unix and Windows.

"test mode" for quicken CLI

There are several restrictions on scripts, as mentioned in the docs. If a script uses any of these then its behavior will be inconsistent. We should have a "test mode" which intercepts the most common calls used to access these during the import phase and outputs warnings. For example:

  • starting threads - threading.Thread.start() (instantiation should be OK)
  • reading environment - os.environ, os.getenv, os.putenv
  • reading arguments - sys.argv
  • cwd - os.getcwd(), Path.cwd()
  • several more methods in os
  • atexit

Multiprocessing waits for child process exit on server start

When running a command for the first time it hangs at supposed "exit". Sending SIGINT results in

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/chris/.pyenv/versions/3.7.1/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

We should disable this behavior in the parent for the server process that gets spawned.

Add library version file in runtime directory

To avoid having to worry about backward compatibility in the client/server protocol, we should write the necessary information into a file so that the library can determine whether it needs to stop and restart the server due to an older library version.

For example, in $RUNTIME_DIR/state.json:

{
  "pid": 12345,
  "timestamp": 1547000603.123456,
  "version": "0.1.0"
}

This can be used by the client to identify the current server process and kill it.

More explicit list of limitations/restrictions on the underlying processes

From the README it's not clear what the differences are for a process running normally and under quicken.

What is different:

  • the environment, stdin/stdout/stderr, umask, cwd, sys.argv, pid will be different between the execution of the top-level code (e.g., imports, function definitions, global variable definitions) and the execution of the entrypoint
  • signals to a quickened process are proxied to the actual process (source)
  • the actual app being executed will be a child of the command server, not the user's shell
  • there may be a long time between execution of the top-level code and execution of the entrypoint (for example, state retrieved at the top level of an application may be stale by the time it is used)

What isn't different:

  • global changes made from the application entrypoint onward will only exist in that instance of the application

TBD:

  • atexit callbacks

Add tests for import time

No point to the library if import isn't fast. We need reproducible benchmarks for:

  • import and bypassing client/server
  • client start when server is up
  • client start server not up
  • no import, run test code directly

Support pytest without -s

Currently if pytest is executed without -s, then many tests fail due to e.g.:

___________________________________________________________________________________ test_runner_reloads_server_on_different_gid ____________________________________________________________________________________

    def test_runner_reloads_server_on_different_gid():
        # Given the server has been started with real gid 1
        # And the decorated function is executed with real gid 2
        # Then the server should be reloaded, and the decorated function executed
        @cli_factory(current_test_name())
        def runner():
            def inner():
                output_file.write_text(
                    f'{os.getpid()} {os.getppid()}', encoding='utf-8')
                return 0
            return inner

        with isolated_filesystem() as path:
            output_file = path / 'test.txt'

            with contained_children():
                with patch('os.getgid', lambda: 1):
>                   assert runner() == 0

tests/cli_wrapper/test_cli_wrapper.py:1114:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
quicken/lib/_decorator.py:81: in wrapper
    user_data=user_data,
quicken/lib/_lib.py:141: in server_runner_wrapper
    response = client.send(req)
quicken/lib/_client.py:18: in send
    self._client.send(request)
/home/chris/.pyenv/versions/3.7.2/lib/python3.7/multiprocessing/connection.py:206: in send
    self._send_bytes(_ForkingPickler.dumps(obj))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'multiprocessing.reduction.ForkingPickler'>, obj = <quicken.lib._protocol.Request object at 0x7f392dd83860>, protocol = None

    @classmethod
    def dumps(cls, obj, protocol=None):
        buf = io.BytesIO()
        #cls(buf, protocol).dump(obj)
>       cls(buf, protocol).dump(obj)
E       TypeError: cannot serialize '_io.FileIO' object

The reason is that sys.stdout and sys.stderr are replaced with EncodedFile by pytest, and these are not recognized as io.TextIOWrapper objects. It shows up as _io.FileIO because that is the member of the input object (EncodedFile) that failed to serialize.

Options:

  1. Support transfer of FileIO objects (or the containing EncodedFile)
    • Should be easier to implement
  2. Wrap objects with io.TextIOWrapper

Since our use case is sending these entities over an fd, and there would be no way to distinguish them on the other side except by type, 1 is probably the best option.
