facebook / dotslash

Simplified executable deployment

Home Page: https://dotslash-cli.com

License: Apache License 2.0

Rust 83.67% Python 11.18% JavaScript 4.04% CSS 1.11%

dotslash's Introduction

DotSlash: simplified executable deployment

DotSlash (dotslash) is a command-line tool that lets you represent a set of platform-specific, heavyweight executables with an equivalent small, easy-to-read text file. In turn, this makes it efficient to store executables in source control without hurting repository size. This paves the way for checking build toolchains and other tools directly into the repo, reducing dependencies on the host environment and thereby facilitating reproducible builds.

We will illustrate this with an example taken from the DotSlash website. Traditionally, if you want to vendor a specific version of Node.js into your project and you want to support both macOS and Linux, you likely need at least two binaries (one for macOS and one for Linux) as well as a shell script like this:

#!/bin/bash

# Copied from https://stackoverflow.com/a/246128.
DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"

if [ "$(uname)" == "Darwin" ]; then
  # In this example, assume node-mac-v18.16.0 is a universal macOS binary.
  "$DIR/node-mac-v18.16.0" "$@"
else
  "$DIR/node-linux-v18.16.0" "$@"
fi

exit $?

With DotSlash, the shell script and the binaries can be replaced with a single file named node:

#!/usr/bin/env dotslash

// The URLs in this file were taken from https://nodejs.org/dist/v18.19.0/

{
  "name": "node-v18.19.0",
  "platforms": {
    "macos-aarch64": {
      "size": 40660307,
      "hash": "blake3",
      "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",
      "format": "tar.gz",
      "path": "node-v18.19.0-darwin-arm64/bin/node",
      "providers": [
        {
          "url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-arm64.tar.gz"
        }
      ]
    },
    // Note that with DotSlash, it is straightforward to specify separate
    // binaries for different platforms, such as x86 vs. arm64 on macOS.
    "macos-x86_64": {
      "size": 42202872,
      "hash": "blake3",
      "digest": "37521058114e7f71e0de3fe8042c8fa7908305e9115488c6c29b514f9cd2a24c",
      "format": "tar.gz",
      "path": "node-v18.19.0-darwin-x64/bin/node",
      "providers": [
        {
          "url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-darwin-x64.tar.gz"
        }
      ]
    },
    "linux-x86_64": {
      "size": 44694523,
      "hash": "blake3",
      "digest": "72b81fc3a30b7bedc1a09a3fafc4478a1b02e5ebf0ad04ea15d23b3e9dc89212",
      "format": "tar.gz",
      "path": "node-v18.19.0-linux-x64/bin/node",
      "providers": [
        {
          "url": "https://nodejs.org/dist/v18.19.0/node-v18.19.0-linux-x64.tar.gz"
        }
      ]
    }
  }
}

Assuming dotslash is on your $PATH and you remembered to chmod +x node to mark it as executable, you can now run your Node.js wrapper exactly as you did before:

$ ./node --version
v18.19.0

The first time you run ./node --version, you will likely experience a small delay while DotSlash fetches, decompresses, and verifies the appropriate .tar.gz, but subsequent invocations should be instantaneous.
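
As a rough illustration of how the keys under "platforms" line up with a host, here is a minimal sketch. It assumes the key is simply `<os>-<arch>` using the names Rust reports in `std::env::consts`; the helper names `platform_key` and `select_platform` are made up for this example and are not taken from the DotSlash source.

```rust
use std::env::consts::{ARCH, OS};

/// Build a platform key in the `<os>-<arch>` shape used by the "platforms"
/// map in the example above (for instance "macos-aarch64").
/// Assumption: the real tool may derive its keys differently.
fn platform_key(os: &str, arch: &str) -> String {
    format!("{os}-{arch}")
}

/// Return the first key in `available` that matches the given host, if any.
fn select_platform<'a>(available: &[&'a str], os: &str, arch: &str) -> Option<&'a str> {
    let wanted = platform_key(os, arch);
    available.iter().copied().find(|key| *key == wanted)
}

fn main() {
    // The three keys defined in the example `node` file.
    let available = ["macos-aarch64", "macos-x86_64", "linux-x86_64"];
    match select_platform(&available, OS, ARCH) {
        Some(key) => println!("this host would use the {key} entry"),
        None => println!("no entry for {}", platform_key(OS, ARCH)),
    }
}
```

A host with no matching key gets `None`, which is the case where a wrapper like this has nothing it can run.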

To understand what is happening under the hood, read the article on how DotSlash works.

Installing DotSlash

See the installation instructions on the DotSlash website.

License

DotSlash is licensed under both the MIT license and Apache-2.0 license; the exact terms can be found in the LICENSE-MIT and LICENSE-APACHE files, respectively.

dotslash's People

Contributors

abhinav, amyreese, andrewjcg, bigfootjon, bolinfest, cookiecomputing, danielocfb-test, diliop, facebook-github-bot, jagill, keito, ndmitchell, rodrigodesalvobraz, sargun, stepancheg, zertosh

dotslash's Issues

Homebrew release

It would be nice to be able to manage dotslash using Homebrew on macOS.

Most of Meta's other devtools are released on Homebrew.

Some tar.gz archives don't decompress correctly

Take for example - https://github.com/astral-sh/ruff/releases/download/v0.4.2/ruff-0.4.2-aarch64-apple-darwin.tar.gz

When decompressing with tar (tar -xvf ruff-0.4.2-aarch64-apple-darwin.tar.gz), it outputs ruff

But with dotslash v0.41.0, it outputs DOTSLASH_CACHE/23/d6c44ba88ccffacf9de63082a597f3d2ad77f7/GNUSparseFile.0/ruff

It seems this entry type variant might need special handling:

https://docs.rs/tar/latest/tar/enum.EntryType.html#variant.GNUSparse

Relevant config for dotslash:

"macos-aarch64": {
      "size": 8144818,
      "hash": "blake3",
      "digest": "02b131cb0da1e157ddf1ab96cc100c25b519d052a4cf6c55602614425b2fa37d",
      "format": "tar.gz",
      "path": "GNUSparseFile.0/ruff",
      "providers": [
        {
          "url": "https://github.com/astral-sh/ruff/releases/download/v0.4.2/ruff-0.4.2-aarch64-apple-darwin.tar.gz"
        }
      ]
    }

Interestingly, running GNUSparseFile.0/ruff via dotslash fails, but running the directly downloaded binary works as expected.

Fix Homebrew release

I believe the Homebrew release is not being done correctly.

For macOS, we release a universal binary:

macos:
  needs: create-release
  runs-on: macos-13
  timeout-minutes: 30
  steps:
    - uses: actions/checkout@v4
    - uses: dtolnay/rust-toolchain@stable
      with:
        targets: aarch64-apple-darwin,x86_64-apple-darwin
    - run: cargo test
    - run: cargo clippy
    - run: cargo build --release --target aarch64-apple-darwin
    - run: cargo build --release --target x86_64-apple-darwin
    - run: lipo -create -output dotslash target/aarch64-apple-darwin/release/dotslash target/x86_64-apple-darwin/release/dotslash
    - run: tar -czvf "dotslash-macos.${GITHUB_REF#refs/tags/}.tar.gz" dotslash
      shell: bash
    - name: upload release
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      shell: bash
      run: gh release upload "${GITHUB_REF#refs/tags/}" "dotslash-macos.${GITHUB_REF#refs/tags/}.tar.gz"

As explained here:

:::note
On macOS, we **strongly** recommend running DotSlash as a
[Universal Binary](https://developer.apple.com/documentation/apple-silicon/building-a-universal-macos-binary)
rather than an x86 or ARM64 binary. If an x86 binary is running under
[Rosetta](https://developer.apple.com/documentation/apple-silicon/about-the-rosetta-translation-environment)
on Apple Silicon and ends up spawning `dotslash`, then for consistency with the
parent process, this will ensure that the `macos-x86_64` artifact will be run.
:::

But based on my read of how the Brew formula was created:

Homebrew/homebrew-core@8e53cba

and the subsequent changes to that file:

https://github.com/Homebrew/homebrew-core/commits/master/Formula/d/dotslash.rb

It looks like it is doing a simple cargo install, so it is not creating a universal binary.

Can we work with the Homebrew maintainers to fix this?

Writing a dotslash file for a python toolchain

Given this dotslash file:

#!/usr/bin/env dotslash

{
  "name": "python-standalone",
  "platforms": {
    "macos-aarch64": {
      "size": 26705084,
      "hash": "blake3",
      "digest": "03555c515b0b59c9a8bc15386343228767f3c452c474cddc4cd8949473c30c27",
      "format": "tar.zst",
      "path": "python/install/bin/python",
      "providers": [
        {
          "url": "https://github.com/indygreg/python-build-standalone/releases/download/20240224/cpython-3.11.8+20240224-aarch64-apple-darwin-pgo-full.tar.zst"
        }
      ]
    },
    "macos-x86_64": {
      "size": 26292710,
      "hash": "blake3",
      "digest": "e7a824fdba50916674045b4d64dc07c1d172ec84d438f4cc6ba3c01e39992f56",
      "format": "tar.zst",
      "path": "python/install/bin/python",
      "providers": [
        {
          "url": "https://github.com/indygreg/python-build-standalone/releases/download/20240224/cpython-3.11.8+20240224-x86_64-apple-darwin-pgo-full.tar.zst"
        }
      ]
    },
    "linux-x86_64": {
      "size": 35135207,
      "hash": "blake3",
      "digest": "1edbb8cbde2be264dda8c531c928ff3740a377d8398584dcac7cfeac3b5e190e",
      "format": "tar.zst",
      "path": "python/install/bin/python",
      "providers": [
        {
          "url": "https://github.com/indygreg/python-build-standalone/releases/download/20240224/cpython-3.11.8+20240224-x86_64-unknown-linux-gnu-pgo-full.tar.zst"
        }
      ]
    }
  }
}

I would expect executing it with no arguments to drop me into a Python REPL. Instead I get this error:

$ ./scripts/bin/python
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = './scripts/bin/python'
  isolated = 0
  environment = 1
  user site = 1
  safe_path = 0
  import site = 1
  is in build tree = 0
  stdlib dir = '/install/lib/python3.11'
  sys._base_executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.base_prefix = '/install'
  sys.base_exec_prefix = '/install'
  sys.platlibdir = 'lib'
  sys.executable = '/Users/dan/devel/backend2/scripts/bin/python'
  sys.prefix = '/install'
  sys.exec_prefix = '/install'
  sys.path = [
    '/install/lib/python311.zip',
    '/install/lib/python3.11',
    '/install/lib/python3.11/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00000001e8f13ac0 (most recent call first):
  <no Python frame>

I think this means that python can't find the various libraries that it wants to link against.

If I run this same binary directly from the dotslash cache it works:

~/Library/Caches/dotslash/f0/d51d6feaa418f63e844885ba229db6c8815c74/python/install/bin/python3
Python 3.11.8 (main, Feb 25 2024, 03:37:49) [Clang 17.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

For toolchains like this, is it necessary to modify them to work with dotslash? I read through #6, but as far as I can tell this Python archive should be the first, straightforward case described there. Curious to learn how to handle this. :)

Thanks for open sourcing dotslash! It has made my life a lot easier recently.

Provider randomization

It is currently possible to supply multiple providers for the same executable.
dotslash will try these in order, top-to-bottom.

I'd like to propose a way for a DotSlash file to override the selection strategy, e.g. "random" to pick a provider at random.

Use case:
There are N mirrors of an executable.
The load for them should be distributed somewhat randomly instead of overloading one.
The mirrors are maintained by different parties so putting a load balancer in front is not an option.
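
To make the proposal concrete, here is a minimal sketch of one way a "random" strategy could behave: pick a random starting provider and keep the others as fallbacks in their original relative order. This is not an existing DotSlash feature; `rotate_providers`, `clock_seed`, and the mirror URLs are hypothetical, and the clock-based seed is only a stand-in for a real random source.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Rotate the provider list so that the entry at `seed % len` is tried first.
/// The remaining entries keep their relative order and act as fallbacks.
fn rotate_providers<T: Clone>(providers: &[T], seed: u64) -> Vec<T> {
    if providers.is_empty() {
        return Vec::new();
    }
    let start = (seed % providers.len() as u64) as usize;
    let mut ordered = providers[start..].to_vec();
    ordered.extend_from_slice(&providers[..start]);
    ordered
}

/// Cheap, non-cryptographic seed taken from the clock (hypothetical choice).
fn clock_seed() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| u64::from(elapsed.subsec_nanos()))
        .unwrap_or(0)
}

fn main() {
    // Hypothetical mirrors maintained by different parties.
    let mirrors = [
        "https://mirror-a.example/tool.tar.gz",
        "https://mirror-b.example/tool.tar.gz",
        "https://mirror-c.example/tool.tar.gz",
    ];
    for url in rotate_providers(&mirrors, clock_seed()) {
        println!("would try {url}");
    }
}
```

Rotating rather than fully shuffling keeps the fallback behaviour predictable: every mirror is still tried exactly once if earlier ones fail.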

[RFC] configuration file?

Currently, DotSlash does not support any sort of configuration file. The only thing that is really "configurable" is the location of the DotSlash cache folder, which defaults to the dotslash subfolder under the operating system's cache dir, but can be overridden via the DOTSLASH_CACHE environment variable.

An important advantage of this approach is that the "fast path" in the DotSlash execution flow (i.e., a cache hit):

https://dotslash-cli.com/docs/execution/

does not have to read any files other than the DotSlash file itself. Ideally, in adding support for a DotSlash config file, we would maintain this invariant. That is, the "fast path" should not have to read a config file, but the "slow path" is allowed to.
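
For reference, the one setting that exists today can be resolved without touching any file. The sketch below shows that lookup under stated assumptions: `resolve_cache_dir` is a hypothetical helper, `~/.cache` merely stands in for the operating system's cache dir, and treating an empty `DOTSLASH_CACHE` as unset is a choice made for this example.

```rust
use std::env;
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Resolve the cache folder: an explicit override wins, otherwise use the
/// `dotslash` subfolder of the operating system's cache dir.
fn resolve_cache_dir(override_dir: Option<OsString>, os_cache_dir: &Path) -> PathBuf {
    match override_dir {
        // Assumption: an empty value is treated the same as an unset one.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => os_cache_dir.join("dotslash"),
    }
}

fn main() {
    // Assumption: ~/.cache stands in for the OS cache dir in this sketch.
    let home = env::var_os("HOME").unwrap_or_default();
    let os_cache_dir = Path::new(&home).join(".cache");
    let cache = resolve_cache_dir(env::var_os("DOTSLASH_CACHE"), &os_cache_dir);
    println!("{}", cache.display());
}
```

Because only an environment variable is consulted, a lookup like this stays compatible with the "fast path" invariant described above.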

Config Options

Ideas we have kicked around in the past include:

  • general curl configuration (which curl executable to use, proxy info, etc.)
  • configuration for existing providers (for example, perhaps the URLs for the HTTP provider should be subject to an allowlist/blocklist)
  • how to add custom providers (nothing is decided here yet: could be via shared libraries, though maybe also shelling out to another executable that takes a single JSON param)
  • whether to enable UI to show any sort of "progress bar" when fetching an artifact
  • garbage collection (which does not exist yet, but will likely need to be configurable when it does)

Location

We probably want some sort of "cascading set" of config files where one can override another, such as:

  • .dotslashconfig in a parent folder / repo root?
  • <CONFIG DIR>/dotslash
  • /etc/dotslash/config
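
One possible reading of that cascade, sketched in Rust. Nothing here is decided: the precedence order (nearest `.dotslashconfig` first, system-wide file last) and the helper name `config_candidates` are assumptions made for this example.

```rust
use std::path::{Path, PathBuf};

/// List candidate config files from highest to lowest precedence:
/// `.dotslashconfig` in `start_dir` and each of its ancestors, then
/// `<config_dir>/dotslash`, then `/etc/dotslash/config`.
fn config_candidates(start_dir: &Path, config_dir: &Path) -> Vec<PathBuf> {
    let mut candidates: Vec<PathBuf> = start_dir
        .ancestors()
        .map(|dir| dir.join(".dotslashconfig"))
        .collect();
    candidates.push(config_dir.join("dotslash"));
    candidates.push(PathBuf::from("/etc/dotslash/config"));
    candidates
}

fn main() {
    // Hypothetical locations: a DotSlash file under /repo/tools/bin and a
    // per-user config dir of /home/u/.config.
    let candidates = config_candidates(Path::new("/repo/tools/bin"), Path::new("/home/u/.config"));
    for path in &candidates {
        println!("{}", path.display());
    }
}
```

A loader built on this would only run on the "slow path", reading the candidates that exist and letting earlier entries override later ones.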

File Format

To avoid an undue increase on the size of the DotSlash binary, we should use "jsonrc" as the config file format.

Feedback on migrating to a full dotslash dev env

First of all, love the project! It has become my first install for non-nix projects. Here is some feedback from my experience so far.

  1. Tools working as expected: docker cli, node, ruff, shfmt, uv. Still waiting for .tar.xz for shellcheck and .zip for dprint, rain.
    Btw, to use other commands bundled with node (e.g. npm/npx), I either need to duplicate the node file and replace node with these commands, or use a workaround file for npm/npx/pnpm/pnpx/yarn (this supports symlinks too):
#!/usr/bin/env node
require(process.execPath.replace("/bin/node", "/lib/node_modules/corepack/dist/lib/corepack.cjs"))
    .runMain([require("path").basename(process.argv[1]), ...process.argv.slice(2)])

  2. Tested dotslash in WSL, macOS, a container (Debian 12), GitHub Actions, AWS Cloud Shell (Fedora), and Google Cloud Shell (Debian 11). It only failed on Debian 11, which has the older glibc 2.31. If you are open to publishing dotslash to pip later, building for manylinux or with cross will lower the glibc version requirement and make dotslash more portable.

  3. Support a one-line install in a Dockerfile or cloud shell without curl | sh:

curl -LSfs https://github.com/facebook/dotslash/releases/latest/download/dotslash-$(uname | tr DL dl)-$(uname -m).tar.gz | tar fxz - -C ~/.local/bin/

or in PowerShell:

cmd /c "curl.exe -LSfs https://github.com/facebook/dotslash/releases/latest/download/dotslash-windows.tar.gz | tar fxz - -C .local\bin"

To support an up-to-date one-liner, the release filenames will need to drop the version and replace arm64 with aarch64. Rename ubuntu-22.04 to linux, since it works on Fedora/Debian with a newer glibc too.
https://github.com/facebook/dotslash/releases/latest/download/dotslash-ubuntu-22.04.arm64.v0.2.0.tar.gz becomes https://github.com/facebook/dotslash/releases/latest/download/dotslash-linux-aarch64.tar.gz.
If a pinned version is desired, it is still available at the URL https://github.com/facebook/dotslash/releases/download/v0.2.0/dotslash-linux-aarch64.tar.gz

  4. It would be nice to have an experimental command update-version, which replaces the version, updates the metadata, and formats the JSON:
dotslash -- update-version node 18.19.0 20.11.1
Support Google Cloud Storage provider

Rust has good support for GCS, including application-default credentials etc.

I would be open to making a PR; I just wanted to check in advance whether it would be accepted.

Please don't retag releases

Downstream packager here. Every time an upstream project retags something that already went out we have to do a lot of extra work to verify what changed and why and that it isn't malicious. Versioned tags are meant to be immutable. Please just bump the patch version and try again if/when something goes wrong with the release process or a bug is found early on.

Add aarch64 to ubuntu release

Would you please consider adding an aarch64 binary release for Linux VMs running on Apple Silicon without Rosetta 2? Thanks

Misplaced "format" entry not flagged as error

I accidentally put the "format" config entry in the provider section, instead of the next level up inside "platforms". This was not detected as a config error; it just failed at runtime when the file wasn't unpacked.
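
A check along these lines could catch the mistake at parse time. This is only a sketch: `misplaced_platform_keys` is a hypothetical helper, and the list of platform-level keys is taken from the example files in this document, not from the DotSlash schema. A provider that legitimately uses one of these names would need an exception.

```rust
/// Keys that sit on a platform entry in the example files in this document.
const PLATFORM_LEVEL_KEYS: [&str; 5] = ["size", "hash", "digest", "format", "path"];

/// Return the keys of a provider object that look like misplaced
/// platform-level keys, in the order they were given.
fn misplaced_platform_keys<'a>(provider_keys: &[&'a str]) -> Vec<&'a str> {
    provider_keys
        .iter()
        .copied()
        .filter(|key| PLATFORM_LEVEL_KEYS.contains(key))
        .collect()
}

fn main() {
    // The keys of a provider object where "format" was put by mistake.
    let provider_keys = ["url", "format"];
    for key in misplaced_platform_keys(&provider_keys) {
        eprintln!("warning: \"{key}\" belongs on the platform entry, not on a provider");
    }
}
```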

wildcards/latest for github release?

Congrats on the release!

Any interest in contributions supporting wildcard/latest matching for github releases? e.g. specify a pattern like 7.0.* for bazel, call the github api to see what is available, then download the metadata (hash etc) and content for the latest one matching (currently 7.0.2)

This would allow using dotslash to replace "get and run latest matching X" usages of bazelisk and similar downloader/caching tools.

Essentially, this would add a "find url and other metadata" step for cases where people trust the release stream to do the right thing for them given the pattern.
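
The matching step itself is small. Below is a minimal sketch of picking the latest release for a pattern like 7.0.*, assuming purely numeric dotted versions and that "*" stands for exactly one component. The function names are hypothetical, and the GitHub API call that would supply the list of versions is left out.

```rust
/// Parse "7.0.2" into [7, 0, 2]; returns None if any part is not a number.
fn parse_version(version: &str) -> Option<Vec<u64>> {
    version.split('.').map(|part| part.parse::<u64>().ok()).collect()
}

/// Check a version against a pattern such as "7.0.*", where "*" matches any
/// single numeric component and the number of components must agree.
fn matches_pattern(pattern: &str, version: &[u64]) -> bool {
    let parts: Vec<&str> = pattern.split('.').collect();
    parts.len() == version.len()
        && parts
            .iter()
            .zip(version)
            .all(|(part, n)| *part == "*" || part.parse::<u64>().ok() == Some(*n))
}

/// Pick the numerically highest version in `available` that matches `pattern`.
fn latest_matching<'a>(pattern: &str, available: &[&'a str]) -> Option<&'a str> {
    available
        .iter()
        .copied()
        .filter_map(|v| parse_version(v).map(|parsed| (parsed, v)))
        .filter(|(parsed, _)| matches_pattern(pattern, parsed))
        .max_by(|a, b| a.0.cmp(&b.0))
        .map(|(_, v)| v)
}

fn main() {
    // Hypothetical list of release tags, as an API query might return them.
    let releases = ["6.5.0", "7.0.0", "7.0.2", "7.0.10", "7.1.0"];
    println!("{:?}", latest_matching("7.0.*", &releases));
}
```

Comparing parsed components rather than strings matters here: as text, "7.0.2" sorts after "7.0.10".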

Feature request: a "check" command that verifies the config for all platforms

It would be nice to have a “check” command that verifies the download, hash etc for every defined platform. Ideally I’d have some CI that checks this, but for interactive dev tools that’s not always the case and regardless would be nice to have a quick check before waiting for CI.

[RFC] Custom Providers

Proposal for Custom Providers

Today, the docs state:

At the time of this writing, there is no way to add custom providers without forking DotSlash.

But we have already seen interest in custom providers, so it seems like we should start discussing possible solutions. Note this will likely require some sort of configuration file, which, as a reminder, we would like to avoid having to read in the case of a cache hit.

While the design for the configuration file is still under discussion, let's assume for the moment that at least two locations for provider-specific data are supported:

  • $XDG_STATE_HOME/dotslash/provider/ where providers installed by the user live
  • /etc/dotslash or some location for system-wide configuration. In practice, an entity might push enterprise-wide providers to this folder with the expectation that end-users should not write this folder directly.

Provider is an executable

Today, the things a provider needs to know are:

  • the (platform, artifact-entry) pair being requested
  • the path where the fetched artifact should be written
  • [optional] what UI, if any, is the provider allowed to present to the user to communicate fetch progress

One option would be to pass everything the provider needs to the executable via a single JSON argument and then stream the stdout from the provider invocation directly to the path where the artifact should be written. This way, the provider does not get any direct knowledge about the layout of the $DOTSLASH_CACHE.
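
A minimal sketch of that invocation, using only the Rust standard library. The request shape, the helper name `fetch_with_provider`, and the use of `echo` as a stand-in provider are all assumptions for illustration; a real implementation would also need to verify the artifact and clean up after a failed fetch.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::process::{Command, ExitStatus, Stdio};

/// Run `provider` with `request_json` as its only argument and stream its
/// stdout straight into `dest`. The provider is never told where `dest` is.
fn fetch_with_provider(provider: &Path, request_json: &str, dest: &Path) -> io::Result<ExitStatus> {
    let output_file = File::create(dest)?;
    Command::new(provider)
        .arg(request_json)
        .stdin(Stdio::null())
        .stdout(Stdio::from(output_file))
        .stderr(Stdio::inherit())
        .status()
}

fn main() -> io::Result<()> {
    // Hypothetical request shape; `echo` stands in for a real provider and
    // simply writes the request back out (Unix-like systems only).
    let request = r#"{"platform":"linux-x86_64","id":"example-id"}"#;
    let dest = std::env::temp_dir().join("dotslash-provider-sketch.out");
    let status = fetch_with_provider(Path::new("echo"), request, &dest)?;
    println!("provider exited with {status}; wrote {}", dest.display());
    Ok(())
}
```

Leaving stderr inherited gives the provider a channel for progress output that stays separate from the artifact bytes on stdout.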

Because the provider can be an executable, it makes sense for the provider to be a DotSlash file. For example, we could have:

$XDG_STATE_HOME/dotslash/provider/<provider-name>

where <provider-name> is the name of the DotSlash file, which must also match the "type" used in the "providers" section of a DotSlash file. (The "name" in the DotSlash file should probably also be required to match.) Note that this file will always be executed by DotSlash itself, so there is no need for any special Windows stuff.

How to install a provider

A simple option is to support a subcommand like dotslash -- install-provider URL_TO_PROVIDER that would fetch the specified URL, verify it contains a DotSlash file, and then write it to $XDG_STATE_HOME/dotslash/provider/<provider-name>, as appropriate.

Another option (we'll call this the "DotSlash Inception" option) would be to enable a DotSlash file to include metadata about how to obtain a provider referenced in the file. Example:

{
  "name": "example-cli",
  "providers": {
    "my-custom-cas": {
      "size": 40660307,
      "hash": "blake3",
      "digest": "6e2ca33951e586e7670016dd9e503d028454bf9249d5ff556347c3d98c347c34",
      // Must be a single DotSlash file?
      "format": "gz",
      // No need to specify path because it must be my-custom-cas?
      "providers": [
        {
          "url": "https://example.com/my-custom-cas"
        }
      ]
    }
  },
  "platforms": {
    "linux-x86_64": {
      /* size, hash, digest, format, path */
      "providers": [
        {
          "type": "my-custom-cas",
          "id": "72b81fc3a30b7bedc1a09a3fafc4478a1b02e5ebf0ad04ea15d23b3e9dc89212"
        }
      ]
    }
  }
}

The idea is that when example-cli is run for the first time, DotSlash sees that it should use the my-custom-cas provider. If the user does not have it installed, DotSlash can use the information in the providers section to install the provider first and then use it to fetch example-cli.

There are a lot of questions on how strict we might be on the requirements for a provider. There are also questions around how to know when to install a new version of a provider, or what to do if multiple DotSlash files try to provide different implementations of a provider (particularly with respect to defending against attackers).

Suggestion: allow leading `#` comments in dotslash files

Sometimes you need to hand-craft a dotslash config file and sometimes it's nice to add some commentary about said file.

For example I was authoring a config for gofmt and wanted to document how I got all the values to make it easy for the next person to update it in future.

Currently when I added a # line immediately after the hashbang, dotslash emitted the following:

dotslash error: problem with `/path/to/gofmt`
caused by: failed to parse DotSlash file
caused by: expected value at line 1 column 1
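
The error suggests the parser is handed the `#` line as the start of the JSON value. One way the suggestion could work is to skip every leading line that starts with `#`, shebang included, before parsing. The sketch below is an assumption about a possible fix, not a description of how DotSlash currently reads files, and `skip_leading_hash_lines` is a hypothetical name.

```rust
/// Return the part of `content` that follows any leading lines starting with
/// `#` (the `#!` shebang included).
fn skip_leading_hash_lines(content: &str) -> &str {
    let mut rest = content;
    while rest.starts_with('#') {
        rest = match rest.find('\n') {
            // Drop everything up to and including the newline.
            Some(newline) => &rest[newline + 1..],
            // The last line was a comment with no trailing newline.
            None => "",
        };
    }
    rest
}

fn main() {
    let file = "#!/usr/bin/env dotslash\n# Values taken from the release page.\n{\"name\": \"gofmt\"}\n";
    // Only the JSON payload remains for the parser.
    print!("{}", skip_leading_hash_lines(file));
}
```

One trade-off: dropping the lines shifts the line numbers in later parse errors, so an implementation might prefer to blank them out instead.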

Latest release and docs should match

👋 It looks like the docs match main, but not the latest version.

This means that the docs show that dotslash supports .tar.xz however 0.2.0 does not. If I build against main, I'm able to use tar.xz.

Or could main be released so I can have people pull dotslash and use my tar.xz packages?

Bootstrapping dotslash

Hello! Congrats on open sourcing the project.
I just discovered it, and I have a question around best practices with bootstrapping.

From my point of view, dotslash helps make a monorepo (or even a regular repo) more self-contained: commit dependency executable scripts to the repository, and let dotslash manage them.
However, there's still one issue: you have to have dotslash installed to get the rest.

I'm starting this issue to discuss that:
How could a repository bootstrap dotslash so that it can use it, without having every contributor install it.

"That's a minimum requirement" is a completely valid position here.
Feel free to close the issue if that's the case.

An alternative that I can see this working is to use a shell script to bootstrap (similar to pantsw, buckw, bazelisk):
Provide a script that will download or build a specific version of dotslash, and cache it somewhere.
Basically a lightweight version of dotslash's own functionality.
(This is also close to the approach used by a similar tool, Hermit FWIW.)

I'm curious about the maintainers' thoughts about what their preferred approach would be here.

S3 Provider

For my use-case, I have internal artifacts that are in S3 and need auth to fetch. Would y'all accept a contribution to add S3 as a provider?

Add guide for writing a `Provider`?

It looks like new providers could be added in the following way:

  1. Define a provider type which implements the Provider trait (see GitHubReleaseProvider)
  2. Wire it up in the get_provider impl for DefaultProviderFactory in main.rs.

It would be nice to have this reflected in the documentation!

If y'all are open to contributions, I'd be happy to take a crack at it. Looks like the docs themselves are built with Docusaurus, which shouldn't be difficult to work with.

My only question to validate at the moment is the proper handling of the provider config. It looks like the intent is that the provider config is given as a valid "loosely parsed" JSON value, which the provider itself is then able to deserialize into any specific structure it likes. Do I have that right?

Case study: rustc as a dotslash executable, with sysroots

@ashleygwilliams was interested in seeing this written up.

The straightforward way is for a single dotslash artifact to package both the toolchain binaries and all the sysroots:

  • rustc (dotslash executable)
    • bin/
      • rustc.linux.x86_64 (entry point)
    • lib/
      • rustlib/
        • x86_64-unknown-linux-gnu/
          • lib/
            • libstd.rlib
            • ...
        • aarch64-apple-ios/
          • ...
        • ... many others

When this dotslash-based rustc is run, dotslash would download and unpack all of those sysroots, both of which are slow operations. But it works. Rustc knows to look in ../lib/rustlib for sysroots.

A better way avoids having to download and unpack unused sysroots, while still having sysroots downloaded/cached/synchronized through dotslash, and available for a large range of target platforms.

  • rustc (dotslash executable)
    • sysroot-multiplexer.linux.x86_64 (entry point)
    • bin/
      • rustc.linux.x86_64
    • lib/
      • rustlib/
        • x86_64-unknown-linux-gnu/
          • lib/
            • libstd.rlib
            • ...
      • aarch64-apple-ios (dotslash executable)
        • sysroot-multiplexer.linux.x86_64 (entry point)
        • lib/
          • rustlib/
            • aarch64-apple-ios/
              • lib/
                • libstd.rlib
                • ...
      • thumbv7em-none-eabihf (dotslash executable)
        • ...
      • ... many others

In this approach, the native sysroot is always available by default (for proc macros), but sysroots for cross-compilation are managed as follows. When the dotslash-based rustc is run, instead of the real rustc being the entry point ("path" in the JSON), there is a tiny program for which the source is below. It parses the rustc command line to find what --target you are building for, spawns a subprocess corresponding to the dotslash executable responsible for the sysroot for that target, waits for the subprocess to print a directory path on its stdout, and uses that directory path as the --sysroot argument for invoking the real rustc. The sysroot subprocess behaves as follows: when dotslash has downloaded and unpacked and executed it, it just prints out its own location, which is some path within the dotslash cache, then blocks until stdin is closed by the parent process to indicate the sysroot is done being accessed. The sysroot subprocess must remain running the entire time the parent process is accessing the sysroot, so that the sysroot does not get evicted by dotslash gc.


// main.rs for `sysroot-multiplexer`. Cross-compiling with `-Zbuild-std` and Zig's
// linker produces Linux, macOS, and Windows binaries that are 77K-156K.

use std::env;
use std::env::consts::{ARCH, EXE_EXTENSION, OS};
use std::ffi::{OsStr, OsString};
use std::fs;
use std::io;
use std::io::{BufRead, BufReader, ErrorKind, Read, Write};
use std::path::Path;
use std::process::{self, Command, Stdio};

fn main() {
    if let Err(err) = try_main() {
        let _ = writeln!(io::stderr(), "sysroot-multiplexer error: {err}");
        process::exit(1);
    }
}

fn try_main() -> io::Result<()> {
    let current_exe = env::current_exe()?;
    let dir = current_exe.parent().unwrap();

    let bin_rustc = dir.join("bin").join("rustc").with_extension(EXE_EXTENSION);
    if bin_rustc.exists() {
        return invoke_rustc_with_sysroot(dir, &bin_rustc);
    }

    let rustlib = dir.join("lib").join("rustlib");
    if rustlib.exists() {
        return print_sysroot_and_wait(dir);
    }

    let msg = format!(
        "neither bin/rustc nor lib/rustlib exist in {}",
        dir.display()
    );
    Err(io::Error::new(ErrorKind::Other, msg))
}

// Guess a sysroot based on what *this* sysroot-multiplexer executable was
// compiled for.
fn guess_host_for_target_triple() -> &'static str {
    match (OS, ARCH) {
        ("linux", "x86_64") => "x86_64-unknown-linux-gnu",
        ("linux", "aarch64") => "aarch64-unknown-linux-gnu",
        ("macos", "x86_64") => "x86_64-apple-darwin",
        ("macos", "aarch64") => "aarch64-apple-darwin",
        ("windows", "x86_64") => "x86_64-pc-windows-msvc",
        _ => panic!("what kind of computer is this... {OS}/{ARCH}"),
    }
}

// Check if something looks like a path.
//
// We blindly use target triples as path components to the sysroot executable.
// This check is to avoid executing something that would give a cryptic error
// message.
//
// Rust does not define what characters are valid for target triples, so we just
// concern ourselves with basic path parts. See
// <https://rust-lang.github.io/rfcs/0131-target-specification.html>
fn is_path_like(triple: &OsStr) -> bool {
    let mut comps = Path::new(triple).components();
    // A "Normal" component is not a ".", ".." or "/".
    !(matches!(comps.next(), Some(std::path::Component::Normal(_))) && comps.next().is_none())
}

// A more sophisticated parser would be more correct. There's code in
// <https://crates.io/crates/rustflags> to fully parse a rustc command line.
//
// Rustc does not do recursive argsfile expansion, despite the original PR
// (rust-lang/rust#63175) implying so. Also, the [code][] as it exists today
// doesn't look like it does that, and experimentation confirms it:
//
//   $ rustc --version
//   rustc 1.65.0 (897e37553 2022-11-02)
//
//   $ rustc @<(echo '--version')
//   rustc 1.65.0 (897e37553 2022-11-02)
//
//   $ rustc @<(echo @<(echo '--version'))
//   error: couldn't read @/dev/fd/10: No such file or directory (os error 2)
//
//   error: aborting due to previous error
//
// [code]: https://github.com/rust-lang/rust/blob/19423b59440f/compiler/rustc_driver/src/args.rs
fn parse_rustc_args<I>(args: I, cmd: &mut Command) -> (bool, Option<OsString>)
where
    I: IntoIterator,
    I::Item: Into<OsString>,
{
    let mut expect_target = false;
    let mut has_sysroot = false;
    let mut target = None;

    let mut parse_arg = |arg: &OsStr, arg_str: Option<&str>| {
        if expect_target {
            target = Some(arg.to_owned());
        }
        if let Some(arg) = arg_str {
            if arg == "--sysroot" || arg.starts_with("--sysroot=") {
                has_sysroot = true;
            }
            if let Some(found_target) = arg.strip_prefix("--target=") {
                target = Some(OsString::from(found_target));
            }
            expect_target = arg == "--target";
        }
    };

    for arg in args {
        let arg = arg.into();
        let arg_str = arg.to_str();
        if let Some(argsfile) = arg_str.and_then(|x| x.strip_prefix('@')) {
            // Let rustc itself complain that an argsfile can't be read.
            if let Ok(content) = fs::read_to_string(argsfile) {
                for line in content.lines() {
                    parse_arg(OsStr::new(line), Some(line));
                }
            }
        } else {
            parse_arg(&arg, arg_str);
        }
        cmd.arg(arg);
    }

    (has_sysroot, target)
}

fn invoke_rustc_with_sysroot(dir: &Path, rustc: &Path) -> io::Result<()> {
    let mut cmd = Command::new(rustc);

    let (has_sysroot, target) = parse_rustc_args(env::args_os().skip(1), &mut cmd);

    // TODO: Are there other cases we can skip a --sysroot flag? Stuff like
    // `rustc --version` and `rustc --help` do not need it. Various of the
    // `rustc --print` options probably also do not.
    let mut sysroot_child = None;
    if !has_sysroot {
        let target = match target.as_deref() {
            Some(target) if !is_path_like(target) => target,
            Some(_) | None => OsStr::new(guess_host_for_target_triple()),
        };

        // Some sysroots are included by default (the host one for example),
        // check if it exists first before invoking `sysroot_exe`.
        let target_lib_dir = dir.join("lib").join("rustlib").join(target).join("lib");
        if !target_lib_dir.is_dir() {
            let sysroot_exe = dir.join("lib").join(target).with_extension(EXE_EXTENSION);

            let mut child = Command::new(sysroot_exe)
                .stdin(Stdio::piped())
                .stdout(Stdio::piped())
                .stderr(Stdio::inherit())
                .spawn()
                .map_err(|spawn_error| {
                    if spawn_error.kind() == ErrorKind::NotFound {
                        let msg = format!("no sysroot found for target {target:?}");
                        io::Error::new(ErrorKind::NotFound, msg)
                    } else {
                        spawn_error
                    }
                })?;

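            // The helper prints the sysroot path as its first line of stdout.
            let mut buf_read = BufReader::new(child.stdout.as_mut().unwrap());
            let mut line = String::new();
            buf_read.read_line(&mut line)?;
            if line.is_empty() {
                // EOF before any output: the helper closed stdout without
                // printing a sysroot. Its stderr is inherited, so anything it
                // reported is already visible.
                child.wait()?;
                process::exit(1);
            }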

            let sysroot_value = line.trim();
            cmd.arg("--sysroot");
            cmd.arg(sysroot_value);

            sysroot_child = Some(child);
        }
    }

    let exit_status = cmd.spawn()?.wait()?;
    if let Some(mut sysroot_child) = sysroot_child {
        // Closing the helper's stdin unblocks its read loop so it can exit.
        drop(sysroot_child.stdin.take());
    }

    process::exit(exit_status.code().unwrap_or(1));
}

/// Writes `dir` followed by a newline to stdout, then blocks until stdin
/// reaches EOF.
fn print_sysroot_and_wait(dir: &Path) -> io::Result<()> {
    #[cfg(unix)]
    let dir = std::os::unix::ffi::OsStrExt::as_bytes(dir.as_os_str());
    #[cfg(not(unix))]
    let dir = dir.to_str().expect("sysroot path is not valid UTF-8").as_bytes();

    let mut stdout = io::stdout().lock();
    stdout.write_all(dir)?;
    stdout.write_all(b"\n")?;
    stdout.flush()?;

    // Block until someone closes our stdin.
    let mut stdin = io::stdin().lock();
    let mut buf = [0u8; 1024];
    while stdin.read(&mut buf)? > 0 {}

    Ok(())
}
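
The flag detection in `parse_rustc_args` can be tried out on its own. The following is a minimal sketch, not the project's code: `Scan` and `scan_args` are hypothetical names introduced here, and the sketch assumes the arguments are already expanded (no `@argsfile` handling) and does not forward anything to a `Command`.

```rust
use std::ffi::OsString;

/// Hypothetical result type: what one pass over the arguments found.
#[derive(Debug, Default, PartialEq)]
struct Scan {
    has_sysroot: bool,
    target: Option<OsString>,
}

/// Hypothetical helper: looks for `--sysroot`, `--sysroot=...`,
/// `--target <value>` and `--target=<value>` in already-expanded arguments.
fn scan_args<I>(args: I) -> Scan
where
    I: IntoIterator,
    I::Item: Into<OsString>,
{
    let mut scan = Scan::default();
    let mut expect_target = false;

    for arg in args {
        let arg: OsString = arg.into();

        // The previous argument was a bare `--target`, so this is its value.
        if expect_target {
            scan.target = Some(arg.clone());
            expect_target = false;
        }

        // Flags are only recognized when the argument is valid UTF-8.
        if let Some(s) = arg.to_str() {
            if s == "--sysroot" || s.starts_with("--sysroot=") {
                scan.has_sysroot = true;
            }
            if let Some(value) = s.strip_prefix("--target=") {
                scan.target = Some(OsString::from(value));
            }
            expect_target = s == "--target";
        }
    }

    scan
}
```

Calling `scan_args(["--target", "wasm32-unknown-unknown", "main.rs"])` yields a `Scan` whose `target` is `wasm32-unknown-unknown` and whose `has_sysroot` is `false`, which is the case where the wrapper above would go on to look up a sysroot.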
