
quicken

PyPI Version Documentation Build Status Python Versions

Make Python tools start fast.

When a quickened script is executed for the first time, it starts a server in the background, paying a one-time cost to speed up every subsequent execution.

Quicken only speeds up applications on Linux, but transparently falls back to executing scripts directly on unsupported platforms, with minimal overhead.

Generally, an application can benefit if:

  1. It takes more than 100ms to start on an average machine
  2. python -X importtime shows that the startup time is related to module importing
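For example, import timings can be inspected programmatically (here importing `json` as a stand-in for your application's modules):

```python
import subprocess
import sys

# Run an import under -X importtime; timings are written to stderr,
# one line per imported module, with self and cumulative microseconds.
result = subprocess.run(
    [sys.executable, '-X', 'importtime', '-c', 'import json'],
    capture_output=True,
    text=True,
)
for line in result.stderr.splitlines()[-3:]:
    print(line)
```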

To see how fast an app can be, check out the latest benchmark results in CI and their interpretation in the wiki.

Usage

quicken CLI

The quicken command can be used to quicken plain Python scripts that look like:

# script.py
...

def main():
    pass


if __name__ == '__main__':
    main()

Running quicken run --file script.py followed by arguments will start the application server and run all code before the if __name__ == '__main__' block. For the first and all subsequent commands, only the code inside if __name__ == '__main__' is executed per invocation. If the script is updated, a new server is started.

To see the status of the server: quicken status --file script.py

To stop the server: quicken stop --file script.py

The server is identified using the full path to the script.

Note

  1. __file__ is set to the full, resolved path of the file provided to --file, unlike Python, which sets it to the path given on the command line. This ensures the code before if __name__ == '__main__' and the code after it see the same path, even if the working directory or the path passed to the command changes.

quicken.script

quicken.script can wrap console_scripts as supported by several Python packaging tools.

The console_script/entrypoint format is quicken.script:module.path._.function.path. For example, if our console script is hello=hello.cli:main, then we would use helloc=quicken.script:hello.cli._.main.

Once set up, we can use helloc just like hello, but it should be faster after the first time.

Since quicken is new, it would be wise to provide a second command for testing as above, instead of only having a quicken-based command. We use a c suffix since it's a client.

If using setuptools (setup.py):

setup(
    # ...
    entry_points={
        'console_scripts': [
            'hello=hello.cli:main',
            # With quicken
            'helloc=quicken.script:hello.cli._.main',
        ],
    },
    # ...
)

If using poetry:

[tool.poetry.scripts]
hello = "hello.cli:main"
# With quicken
helloc = "quicken.script:hello.cli._.main"

If using flit:

[tool.flit.scripts]
hello = "hello.cli:main"
# With quicken
helloc = "quicken.script:hello.cli._.main"

quicken.ctl_script

Similar to the above, using quicken.ctl_script provides a CLI to stop and check the status of a quicken server.

Setuptools example:

setup(
    # ...
    entry_points={
        'console_scripts': [
            'hello=hello.cli:main',
            # With quicken
            'helloc=quicken.script:hello.cli._.main',
            # Server control command
            'helloctl=quicken.ctl_script:hello.cli._.main',
        ],
    },
    # ...
)

Then we can use helloctl status to see the server status information and helloctl stop to stop the application server.

Options

Quicken has several options regardless of how it is invoked:

  • logging - set QUICKEN_LOG_FILE to an absolute file path and debug logs will be written to it. Note that server logs will only be written if this environment variable is set for the command that starts the server.
  • idle timeout - by default any quicken server will shut down after 24 hours of inactivity. This can be changed by setting QUICKEN_IDLE_TIMEOUT to the desired time (in seconds). This will only take effect if this environment variable is set for the command that starts the server.
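The timeout handling can be sketched as follows (the environment variable name comes from the docs above; the helper itself is illustrative, not quicken's code):

```python
import os

DEFAULT_IDLE_TIMEOUT = 24 * 60 * 60  # seconds; matches the 24-hour default

def idle_timeout():
    # QUICKEN_IDLE_TIMEOUT must be set for the command that starts the
    # server; later invocations cannot change a running server's timeout.
    value = os.environ.get('QUICKEN_IDLE_TIMEOUT')
    return float(value) if value else DEFAULT_IDLE_TIMEOUT
```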

Why

Python command-line tools can feel slow. There are tricks that can be used to speed up startup, but implementing them in individual packages is not scalable, and can slow development. The purpose of this project is:

  1. provide one way to speed up app startup, with a focus on strategies that can apply across a large number of applications using normal Python development conventions
  2. find areas of improvement that can be folded back into Python itself
  3. make it easier to focus on application logic and not startup time concerns

Limitations

  • Unix only.
  • Debugging may be less obvious for end users or contributors.
  • Access to the socket file implies access to the server and the ability to run commands. The library tries to ensure that the directory used for runtime files is owned only by the user; for best results, use XDG_RUNTIME_DIR as provided by pam_systemd or the equivalent for your distribution.

Tips

  • Profile import time with -X importtime and see whether startup is actually the problem. If it's not, then this package will not help you.
  • Ensure your package can be built as a wheel, even if it's not distributed as one. When wheels are installed they create scripts that do not import pkg_resources, which can save 60ms+ depending on disk speed and caching.

Development

poetry install
poetry run pytest -ra


quicken's Issues

Shutdown server if runtime directory is removed

Since a conforming XDG_RUNTIME_DIR will get removed on logout (but our server will not, by default), we should be good citizens and shut down the server if possible since no more clients would be able to connect.

Gracefully handle platforms that can't send fds over socket

Or determine that no such platforms are common enough to worry about for now.

The checks (from multiprocessing) are:

import socket
import sys

HAVE_SEND_HANDLE = (sys.platform == 'win32' or
                    (hasattr(socket, 'CMSG_LEN') and
                     hasattr(socket, 'SCM_RIGHTS') and
                     hasattr(socket.socket, 'sendmsg')))

Support minimal user data in runtime directory

The most likely use case for the reload_daemon parameter is restarting the server if the utility has been updated, but there is currently no mechanism for determining how long the server has been up or what utility version was in effect when it was brought up.

We should support the user providing 'user data' to be serialized at server start which will be provided back to the reload_daemon function so it is easier to determine whether a reload should be done.

This also implies that the reload_daemon parameter should only be invoked after we have determined that the server is already up.

Support --from-path for quicken CLI

Currently it is not straightforward to do anything with quicken but quickly test a single file. --from-path would search PATH for the applicable script and run it, so an introduction could look like alias qpip="quicken run --from-path pip --". This would probably be more approachable than the less common --module suggested in #56.
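A `--from-path` lookup could be sketched with `shutil.which` (illustrative only; this is a proposal, not existing quicken behavior, and `resolve_from_path` is a hypothetical helper name):

```python
import shutil

def resolve_from_path(name):
    """Locate `name` on PATH, the way `quicken run --from-path` might."""
    path = shutil.which(name)
    if path is None:
        raise SystemExit(f'{name}: not found on PATH')
    return path
```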

Forward signals from client to server

Currently, signals sent to the CLI client are not forwarded to the server. The expected client behavior is the same as for any normal terminal application:

  1. Expected signals should be forwarded: those the system sends automatically (e.g. SIGHUP) and those that a user would be expected to send (e.g. SIGTERM, SIGINT)
  2. If the client is killed (exits unexpectedly, can be indicated by broken connection) then the handling process should also be killed
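A forwarding handler might look like this sketch (the injectable `kill` parameter exists only to make the helper testable; quicken's internals differ):

```python
import os
import signal

# Signals a terminal application is normally expected to receive.
FORWARDED = (signal.SIGHUP, signal.SIGINT, signal.SIGTERM)

def make_forwarder(handler_pid, kill=os.kill):
    """Return a signal handler that relays the received signal."""
    def forward(signum, frame):
        kill(handler_pid, signum)
    return forward

def install_forwarding(handler_pid):
    for sig in FORWARDED:
        signal.signal(sig, make_forwarder(handler_pid))
```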

Introductory docs

  • Straightforward README introduction (with clear limitations)
  • Explicit documentation link on PyPI package
  • Docs build check on PRs?

Add CLI

There's no reason we can't provide a CLI to speed up existing applications for end users. For example

# file path
$ quicken ./example.py
# modules
$ quicken -m pytest

For argument parsing, proceed as if all arguments after the path or -m arg are meant for the underlying command, for example:

$ quicken ./example.py arg1
# ->
$ ./example.py arg1
$ quicken -m pytest -s -ra
# ->
$ pytest -s -ra

For scripts:

  1. Input is normalized to path then resolved to real_path
  2. name is quicken.cli.file.{digest}, where digest = sha256(path)
  3. metadata:
    1. path
    2. real_path (reload criteria)
    3. ctime of real_path (reload criteria)
    4. mtime of real_path (reload criteria)
  4. function:
    1. parse the file at real_path, splitting it into a prelude and the if __name__ == '__main__' section. Execute the prelude as preparation and return the conditional section as the main function.
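Step 4 above (splitting a script at the guard) can be sketched with `ast` (a hypothetical helper, not quicken's code):

```python
import ast

def split_script(source, filename='<script>'):
    """Split source into compiled (prelude, main) code objects at the
    `if __name__ == '__main__':` guard."""
    tree = ast.parse(source, filename)
    for i, node in enumerate(tree.body):
        # Look for a top-level `if` whose test compares __name__.
        if (isinstance(node, ast.If)
                and isinstance(node.test, ast.Compare)
                and isinstance(node.test.left, ast.Name)
                and node.test.left.id == '__name__'):
            prelude = ast.Module(body=tree.body[:i], type_ignores=[])
            main = ast.Module(body=node.body, type_ignores=[])
            return (compile(prelude, filename, 'exec'),
                    compile(main, filename, 'exec'))
    raise ValueError("no `if __name__ == '__main__'` block found")
```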

For modules:

  1. resolve module
  2. name is quicken.cli.module.{digest}, where digest = sha256(module_path)
  3. metadata:
    1. module_path
    2. module
    3. last modified time of module_path (reload criteria)
  4. function:
    1. parse module, getting imports manually
    2. return function that executes the rest of the module body of __main__.py

Client should restart server if process user/group ids have changed

Processes have:

  • effective group id
  • supplementary group ids
  • real group id
  • saved set-group-id
  • real user id
  • effective user id
  • saved set-user-id

Currently these are disregarded, except when creating and checking the runtime directory, which compares the owner to getuid (the real user id).

The expected behavior is that the runner process will execute the command with the same attributes as a given client process (or be indistinguishable).

Changes to supplementary and real group id can happen easily and must be accounted for.

This would only impact effective user/group id if the script itself is setgid/setuid. Since scripts cannot be setuid/setgid this would have to be set on:

  1. Python itself - not likely
  2. on an executable embedding Python - not likely to need this library
  3. on an executable generated from a Python application packaging utility (e.g. pyInstaller) - reasonable use case, so we should at least have a path to support it in the future

This may also happen if an app using this library calls e.g. setgid during import.

Given the above:

  1. if the real/effective/saved uid/gid for a process are different then raise an exception - better safe than leaving it undefined - this can be revisited later if there is a need for supporting it
  2. if the supplementary group ids or real group id for a client differ from the server's, then the current server should be requested to stop and the command should be run with a newly-started server
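A credential snapshot for the comparison in point 2 might be gathered like this (a sketch; `os.getresuid`/`os.getresgid` are Unix-only, and the helper names are hypothetical):

```python
import os

def process_credentials():
    """Collect the ids a server/client comparison would need."""
    ruid, euid, suid = os.getresuid()
    rgid, egid, sgid = os.getresgid()
    return {
        'uids': (ruid, euid, suid),
        'gids': (rgid, egid, sgid),
        'groups': tuple(sorted(os.getgroups())),
    }

def needs_restart(server_creds, client_creds):
    # Point 2: differing real gid or supplementary groups => restart.
    return (server_creds['gids'][0] != client_creds['gids'][0]
            or server_creds['groups'] != client_creds['groups'])
```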

Set environment variable when in use

The incentive to make CLI applications faster can lead to lazy loading and initialization - but these are counterproductive when running under quicken. To facilitate different strategies, we should set an environment variable so applications that already optimize for startup time can toggle e.g. lazy/eager imports based on whether quicken is enabled.
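For example, with a hypothetical variable name (quicken does not define one yet), an application could pick its import strategy:

```python
import os

# Hypothetical name -- not an environment variable quicken actually sets.
QUICKEN_ENV_VAR = 'QUICKEN_ACTIVE'

def under_quicken():
    return os.environ.get(QUICKEN_ENV_VAR) == '1'

def import_strategy():
    # Under quicken the server pays the import cost once, so eager imports
    # are fine; otherwise defer heavy imports to keep cold start fast.
    return 'eager' if under_quicken() else 'lazy'
```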

Support server auto-reload

One common use case for auto-reload is to bring in any changes to files as part of development. The problem with the current mechanism of reload is that knowing whether any of the relevant files are updated at client start would take so long that it's not worth it. Instead we should support auto-reload on changed files (or on callback from a user-provided function). For an autoreload example see here.

Add coverage

  • Must account for code touched by forked processes
  • Coverage badge on README (codecov?)
  • Integrated with CI
  • Coverage HTML should be generated as an artifact in Azure pipeline
  • See advice here for example

Update LICENSE

Removing [fullname] and moving to LICENSE.txt (less hassle for Windows users)

ASCIInema intro

For example:

time pip --help
time pip --help
time pip --help
alias qpip="quicken run --from-path pip"
time qpip --help
time qpip --help
time qpip --help
alias qpipctl="quicken status --from-path pip"
qpipctl
qpipctl --json

Handle exceptions in server handler

We do not currently catch any exceptions from the callback function. We should emulate the Python interpreter behavior:

  1. If SystemExit is thrown then its code attribute determines the exit code:
    1. If code is an integer then it is used as-is
    2. If code is None then the exit code is 0
    3. Otherwise the exit code is 1
  2. If an unhandled exception is thrown then the exit code is 1

Consider using PEP-0554

PEP-0554 in 3.8 provides Python-level support for sub-interpreters. This can allow running isolated code while pre-seeding the module cache with the required imports.

One downside would be our current signal propagation would need to be made part of the application protocol and the resulting behavior would be emulated by actions on the sub-interpreter instead of actually delivering a signal to the process.

Provide interface for server status and control

It is simpler to have a mechanism for external server control than to build in everything that might be needed (e.g. #6).

If a command app provides an app-c (app client, wrapped with this library), then it's not too much to also have an app-ctl that would allow control/query if needed.

Use black coding style

Also integrate with CI (fail PRs that don't conform) and put instructions in README. Better sooner than later.

Rewrite command-lines visible in ps to align with process roles

There are three processes involved in handling of requests: client, server, and handler. The current command lines are:

  1. client: command line matching how it was invoked
  2. server: command line of the client that started the server
  3. handler: same as server

This is pretty unexpected, since most users would assume that the command-line reflects the actual work being done. A more friendly approach would be:

  1. client: (cli client) {command line} - clearly showing the role of the process which can make it easier to identify the correct process to kill (i.e. not this one)
  2. server: (cli server) {server name} - where {server name} is the name passed to the cli_factory decorator, also helps to identify the process
  3. handler: {command line} - this reflects the actual work being done by the process, and is one of the first things that users of a cli would look for if grepping ps output or using pkill. Actually killing the handler process directly would also lead to the most intuitive outcome - the client that is running in the foreground of a shell would exit with the expected return code

Implementation notes:

  1. OpenBSD: unix.stackexchange
  2. Linux: unix.stackexchange - requires root or kernel 3.18 (which is higher than our target RedHat 7.x which has kernel 3.10).

Integrate with packaging tools to allow console_scripts-like usage

The current implementation is generic enough that it can be used in place of a setuptools-generated script and be used to run packages directly.

The solution should work for (to start):

  1. setup.py with setuptools
  2. flit
  3. poetry

Examples of current script designation:

setuptools console_scripts entry point:

setup(
    # ...
    entry_points={
        'console_scripts': ['funniest-joke=funniest.command_line:main'],
    },
    # ...
)

flit scripts:

[tool.flit.scripts]
flit = "flit:main"

poetry scripts:

[tool.poetry.scripts]
poetry = 'poetry:console.run'

Usage should be about that easy and should work with both source distributions and wheels.

Document explicit requirements for applications

The following is unsupported before execution of the main entrypoint function (the function returned by the @quicken-decorated method):

  • threads
  • reading configuration from environment/args
  • using cwd
  • setting signal handlers
  • sub-processes
  • reading plugin information (from e.g. pkg_resources)

The following is unsupported:

  • atexit handlers

Support tools like python-hunter

Currently we do not do anything to try and propagate handlers registered with sys.settrace, and the .pth-file trick used by python-hunter also wouldn't be triggered by updating os.environ.

Normalize logging

Currently logging is configured within the library itself. This isn't a good practice.

We can have a central logging configuration to be used in scripts, cli, and tests, but the setup of that logging should not infect the rest of our application.

Requirements:

  • for tests:
    • there should be a log file per test in the logs directory and the log level should be set to DEBUG
  • for scripts (script, ctl_script, _cli):
    • set log level to debug based on environment variable
    • create file handler and default formatter
  • for library
    • do not propagate by default
    • null handler

For now, don't worry about loading time for the logging module, or anything like request-specific logging (server using the logging configured by the client for the duration of the request handling).

Gracefully handle unsupported platforms

Our dependency on psutil means that dependent utilities will fail to install on unsupported platforms. For example, on HP-UX attempting to install results in:

Complete output from command python setup.py egg_info:
platform hp-ux11 is not supported

because pip attempts to install psutil from its source distribution.

Degrade gracefully on Windows

We depend heavily on Unix-only features (socket files, fork), but user code should not have to accommodate our limitations - it should look the same on both Unix and Windows.

"test mode" for quicken CLI

There are several restrictions on scripts, as mentioned in the docs. If a script uses any of these then its behavior will be inconsistent. We should have a "test mode" which intercepts the most common calls used to access these during the import phase and outputs warnings. For example:

  • starting threads - threading.Thread.start() (instantiation should be OK)
  • reading environment - os.environ, os.getenv, os.putenv
  • reading arguments - sys.argv
  • cwd - os.getcwd(), Path.cwd()
  • several more methods in os
  • atexit

Multiprocessing waits for child process exit on server start

When running a command for the first time it hangs at supposed "exit". Sending SIGINT results in

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/chris/.pyenv/versions/3.7.1/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

We should disable this behavior in the parent for the server process that gets spawned.

Add library version file in runtime directory

To avoid having to worry about backward compatibility in the client/server protocol, we should write the necessary information into a file so that the library can determine whether it needs to stop and restart the server due to an older library version.

For example, in $RUNTIME_DIR/state.json:

{
  "pid": 12345,
  "timestamp": 1547000603.123456,
  "version": "0.1.0"
}

This can be used by the client to identify the current server process and kill it.

More explicit list of limitations/restrictions on the underlying processes

From the README it's not clear what the differences are for a process running normally and under quicken.

What is different:

  • the environment, stdin/stdout/stderr, umask, cwd, sys.argv, pid will be different between the execution of the top-level code (e.g., imports, function definitions, global variable definitions) and the execution of the entrypoint
  • signals to a quickened process are proxied to the actual process (source)
  • the actual app being executed will be a child of the command server, not the user's shell
  • there may be a long time between execution of the top-level code and execution of the entrypoint (for example, state retrieved at the top level of an application may be stale by the time it is used)

What isn't different:

  • global changes made from the application entrypoint onward will only exist in that instance of the application

TBD:

  • atexit callbacks

Add tests for import time

No point to the library if import isn't fast. We need reproducible benchmarks for:

  • import and bypassing client/server
  • client start when server is up
  • client start server not up
  • no import, run test code directly

Support pytest without -s

Currently if pytest is executed without -s, then many tests fail due to e.g.:

___________________________________________________________________________________ test_runner_reloads_server_on_different_gid ____________________________________________________________________________________

    def test_runner_reloads_server_on_different_gid():
        # Given the server has been started with real gid 1
        # And the decorated function is executed with real gid 2
        # Then the server should be reloaded, and the decorated function executed
        @cli_factory(current_test_name())
        def runner():
            def inner():
                output_file.write_text(
                    f'{os.getpid()} {os.getppid()}', encoding='utf-8')
                return 0
            return inner

        with isolated_filesystem() as path:
            output_file = path / 'test.txt'

            with contained_children():
                with patch('os.getgid', lambda: 1):
>                   assert runner() == 0

tests/cli_wrapper/test_cli_wrapper.py:1114:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
quicken/lib/_decorator.py:81: in wrapper
    user_data=user_data,
quicken/lib/_lib.py:141: in server_runner_wrapper
    response = client.send(req)
quicken/lib/_client.py:18: in send
    self._client.send(request)
/home/chris/.pyenv/versions/3.7.2/lib/python3.7/multiprocessing/connection.py:206: in send
    self._send_bytes(_ForkingPickler.dumps(obj))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cls = <class 'multiprocessing.reduction.ForkingPickler'>, obj = <quicken.lib._protocol.Request object at 0x7f392dd83860>, protocol = None

    @classmethod
    def dumps(cls, obj, protocol=None):
        buf = io.BytesIO()
        #cls(buf, protocol).dump(obj)
>       cls(buf, protocol).dump(obj)
E       TypeError: cannot serialize '_io.FileIO' object

The reason is that sys.stdout and sys.stderr are replaced with EncodedFile by pytest, and these are not recognized as io.TextIOWrapper objects. It shows up as _io.FileIO because that is the member of the input object (EncodedFile) that failed to serialize.

Options:

  1. Support transfer of FileIO objects (or the containing EncodedFile)
    • Should be easier to implement
  2. Wrap objects with io.TextIOWrapper

Since our use case is sending these entities over an fd, and there would be no way to distinguish them on the other side except by type, 1 is probably the best option.
