mamba-org / rattler

Rust crates to work with the Conda ecosystem.

License: BSD 3-Clause "New" or "Revised" License

Rust 88.35% Python 11.29% HTML 0.24% CSS 0.10% Batchfile 0.01% Shell 0.01%
conda rust

rattler's Introduction

Rattler: Rust crates for fast handling of conda packages

Rattler is a library that provides common functionality used within the conda ecosystem (what is conda & conda-forge?). The goal of the library is to enable programs and other libraries to easily interact with the conda ecosystem without being dependent on Python. Its primary use case is as a library that you can use to provide conda-related workflows in your own tools.

Rattler is written in Rust and tries to provide a clean API to its functionality (see: Components). With that primary goal in mind, we aim to provide bindings to different languages to make it easy to integrate Rattler into non-Rust projects.

Rattler is actively used by pixi, rattler-build, and the https://prefix.dev backend.

Showcase

This repository also contains a binary (use cargo run to try) that shows some of the capabilities of the library. This is an example of installing an environment containing cowpy and all its dependencies from scratch (including Python!):

(Image: installing an environment)

Give it a try!

Before you begin, make sure you have the following prerequisites:

  • A recent version of git
  • A recent version of pixi

Follow these steps to clone, compile, and run the rattler project:

# Clone the rattler repository along with its submodules:
git clone --recursive https://github.com/mamba-org/rattler.git
cd rattler

# Compile and execute rattler to create a JupyterLab instance:
pixi run rattler create jupyterlab

The above command will execute the rattler executable in release mode. It will download and install an environment into the .prefix folder that contains jupyterlab and all the dependencies required to run it (like Python).

Run the following command to start jupyterlab:

# on windows
.\.prefix\Scripts\jupyter-lab.exe

# on linux or macOS
./.prefix/bin/jupyter-lab

Voila! You have a working installation of jupyterlab installed on your system! You can of course install any package you want this way. Try it!

Contributing

We would love to have you contribute! See the CONTRIBUTION.md for more info. For questions, requests or a casual chat, join us on our Discord server, where we are very active.

Components

Rattler consists of several crates that provide different functionalities.

  • rattler_conda_types: foundational types for all data structures used within the conda ecosystem.
  • rattler_package_streaming: provides functionality to download, extract and create conda package archives.
  • rattler_repodata_gateway: downloads, reads and processes information about existing conda packages from an index.
  • rattler_shell: code to activate an existing environment and run programs in it.
  • rattler_solve: a backend agnostic library to solve the package satisfiability problem.
  • rattler_virtual_packages: a crate to detect system capabilities.
  • rattler_index: create local conda channels from local packages.
  • rattler: functionality to create complete environments from scratch using the crates above.
  • rattler-lock: a library to create and parse lockfiles for conda environments.
  • rattler-networking: common functionality for networking, like authentication, mirroring and more.
  • rattler-bin: an example of a package manager using all the crates above (see: showcase)

You can find these crates in the crates folder.

Additionally, we provide Python bindings for most of the functionality provided by the above crates. A Python package, py-rattler, is available on conda-forge and PyPI. Documentation for the Python bindings can be found here.

What is conda & conda-forge?

The conda ecosystem provides cross-platform, binary packages that you can use with any programming language. conda is an open-source package management system and environment management system that can install and manage multiple versions of software packages and their dependencies. conda is written in Python. The aim of Rattler is to provide all functionality required to work with the conda ecosystem from Rust. Rattler is not a reimplementation of conda. conda is a package management tool. Rattler is a library to work with the conda ecosystem from different languages and applications. For example, it powers the backend of https://prefix.dev.

conda-forge is a community-driven effort to bring new and existing software into the conda ecosystem. It provides tens of thousands of up-to-date packages that are maintained by a community of contributors. For an overview of available packages see https://prefix.dev.

rattler's People

Contributors

0xbe7a, aochagavia, baszalmstra, beckermr, benjaminlowry, benmoss, clement-chaneching, dependabot[bot], dholth, github-actions[bot], hadim, hofer-julian, iamthebot, jaimergp, johnhany97, johnwillliam, manulpatel, mariusvniekerk, nichmor, orhun, pavelzw, ruben-arts, schmelczer, tdejager, travishathaway, tusharsadhwani, vlad-ivanov-name, wackyator, wolfv, yeungonion

rattler's Issues

jlap improvements

When executing JLAP, one can notice that the patching step runs synchronously. We should wrap the deserialization, patching and serialization steps in tokio::task::spawn_blocking.

We prototyped this yesterday and one of the challenges is the ownership & lifetime of some of the variables.

One workaround would be to use the https://docs.rs/async-scoped/latest/async_scoped/ crate

rattler_solve, NUL characters and robustness

We are currently using libsolv as solver backend in rattler_solve through FFI calls. When passing the information about the available packages to libsolv, parts of the data (such as a package's build string or license) need to be converted from a Rust string to a NUL-terminated string (using the CString Rust type). This assumes that the original string does not contain NUL characters, which is not always guaranteed.

In its current state, rattler is unable to solve an environment if the repodata.json for a particular channel includes strings with \u0000. Adding the following package to test-data/channels/dummy/linux-64/repodata.json, for instance, causes all tests related to that file to fail:

"baz-1.0-unix_py36h1af98f8_2\u0000.tar.bz2": {
  "build": "unix_py36h1af98f8_2\u0000",
  "build_number": 1,
  "depends": [
    "__unix"
  ],
  "license": "MIT",
  "license_family": "MIT",
  "md5": "bc13aa58e2092bcb0b97c561373d3905",
  "name": "bar",
  "sha256": "97ec377d2ad83dfef1194b7aa31b0c9076194e10d995a6e696c9d07dd782b14a",
  "size": 414494,
  "subdir": "linux-64",
  "timestamp": 1605110689658,
  "version": "1.2.3"
}

I see two ways to go about this:

  1. Ignore the issue. Other C-based tools in the conda ecosystem would probably break as well if there are NUL escapes in repodata.json, and maybe we can just trust the maintainers of a channel to take that into consideration.
  2. Use a custom method to construct CStrings that are always valid, by replacing NUL characters with something else (similar to String::from_utf8_lossy). I will push a PR for this shortly, so you can see what it would look like.
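
As a rough illustration of the second alternative, the sketch below builds a CString from arbitrary input by replacing interior NUL characters with U+FFFD, in the spirit of String::from_utf8_lossy. The helper name cstring_lossy and the choice of replacement character are assumptions made for this example; the actual PR may look different.

```rust
use std::ffi::CString;

/// Hypothetical helper (not rattler's actual API): converts any Rust string
/// into a `CString`, replacing interior NUL characters instead of failing.
pub fn cstring_lossy(input: &str) -> CString {
    // Replace every NUL with U+FFFD so that `CString::new` cannot fail.
    let sanitized = input.replace('\0', "\u{FFFD}");
    CString::new(sanitized).expect("all NUL characters were replaced")
}
```

With this helper, the build string from the repodata entry above would be passed to libsolv as "unix_py36h1af98f8_2" followed by the replacement character, rather than causing an error.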

Use `rstest` for extraction

Instead of iterating over all archives in the package extraction tests, we can use the rstest crate to create individual test cases for all archives. When a test fails, this will give us better insight into exactly which package caused the issue. It will likely also give us a better overview of which test cases take a long time.

By using cases from rstest we do lose the ability to use a glob pattern to find all test cases, but I think it would also be good not to test all package archives contained in this repository per se, and instead use more targeted tests with some problematic cases.

So instead of something like this (current implementation):

#[test]
fn test_extract_conda() {
    let temp_dir = Path::new(env!("CARGO_TARGET_TMPDIR"));
    println!("Target dir: {}", temp_dir.display());

    for file_path in
        find_all_archives().filter(|path| ArchiveType::try_from(path) == Some(ArchiveType::Conda))
    {
        // ..
    }
}

We do something like this:

#[rstest]
#[case("conda-22.11.1-py38haa244fe_1.conda")]
#[case("mamba-1.1.0-py39hb3d9227_2.conda")]
fn test_extract_conda(#[case] input: &str) {
    // ..
}

We can also use rstest_reuse to reuse some of the test cases.

Implement layered package caches

Conda and Mamba both support having multiple cache directories. Rattler currently only supports a single directory. The "layered" cache can have multiple readable directories and only one writable directory. Conda/Mamba tries to write a magical file to all these directories at startup to determine which one is writable.

Rattler should also facilitate "layered" package caches. We could introduce a trait for PackageCache that is implemented for both the current implementation as well as a layered version.
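
To make the trait idea a bit more concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical: the trait methods, the SingleCache and LayeredCache names, and the ".write-probe" marker file are assumptions for illustration and do not reflect rattler's existing PackageCache or the exact file Conda/Mamba write.

```rust
use std::fs;
use std::path::PathBuf;

/// Hypothetical trait shared by a single-directory and a layered cache.
pub trait PackageCache {
    /// Returns the directory of a cached package, if it exists.
    fn lookup(&self, package: &str) -> Option<PathBuf>;
    /// Returns the directory that new packages should be written to.
    fn writable_dir(&self) -> Option<PathBuf>;
}

/// A cache backed by exactly one directory.
pub struct SingleCache {
    pub dir: PathBuf,
}

impl PackageCache for SingleCache {
    fn lookup(&self, package: &str) -> Option<PathBuf> {
        let path = self.dir.join(package);
        path.is_dir().then_some(path)
    }

    fn writable_dir(&self) -> Option<PathBuf> {
        // Probe writability by creating and then removing a marker file.
        let probe = self.dir.join(".write-probe");
        fs::write(&probe, b"").ok()?;
        let _ = fs::remove_file(&probe);
        Some(self.dir.clone())
    }
}

/// Several readable layers; the first writable layer receives new packages.
pub struct LayeredCache {
    pub layers: Vec<SingleCache>,
}

impl PackageCache for LayeredCache {
    fn lookup(&self, package: &str) -> Option<PathBuf> {
        self.layers.iter().find_map(|layer| layer.lookup(package))
    }

    fn writable_dir(&self) -> Option<PathBuf> {
        self.layers.iter().find_map(|layer| layer.writable_dir())
    }
}
```

Code that only needs to find or store packages could then accept any PackageCache implementation without caring whether one or several directories sit behind it.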

`jinja2 >2.10*` results in a crash

Unfortunately, there is at least one instance of a MatchSpec that currently crashes rattler:

jinja2 >2.10* in jupyter-server.

We could either parse this as 2.10* or patch the repodata.

version: access element range / bump elements range

In rattler-build we have the pin_subpackage and pin_compatible functions. They work by specifying a max_pin and min_pin with x.x.x syntax (where each x stands for one version element).

If we have a package A with version 1.2.3.4 and we use

pin_subpackage(A, max_pin="x.x", min_pin="x.x")

we are expected to convert this to A >=1.2,<1.3

For this it would be ideal if we could extract sub-versions based on an index (e.g. version[..2] to get the first two elements, and then also version[..2].bump() to create the upper bound).
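
A minimal sketch of that idea, assuming a version is just a slice of numeric elements (real conda versions are richer than that). The function names bump and pin are hypothetical, and min_pin / max_pin are given as element counts instead of the x.x syntax.

```rust
/// Joins version elements with dots, e.g. [1, 2] -> "1.2".
fn join(elements: &[u64]) -> String {
    elements.iter().map(|e| e.to_string()).collect::<Vec<_>>().join(".")
}

/// Returns a copy of the elements with the last one incremented.
pub fn bump(elements: &[u64]) -> Vec<u64> {
    let mut bumped = elements.to_vec();
    if let Some(last) = bumped.last_mut() {
        *last += 1;
    }
    bumped
}

/// Builds a pin like `A >=1.2,<1.3` from the first `min_pin` elements
/// (lower bound) and the bumped first `max_pin` elements (upper bound).
pub fn pin(name: &str, version: &[u64], min_pin: usize, max_pin: usize) -> String {
    let lower = &version[..min_pin.min(version.len())];
    let upper = bump(&version[..max_pin.min(version.len())]);
    format!("{name} >={},<{}", join(lower), join(&upper))
}
```

For the example above, pin("A", &[1, 2, 3, 4], 2, 2) yields "A >=1.2,<1.3".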

Implement the `add_pip_to_python` weirdness

conda / mamba have a flag to add pip as a python dependency. This isn't done in the repodata to prevent a circular dependency situation (I think) but the package manager does add it "on the fly".

This is useful because most people / developers expect pip to be installed alongside Python and it's also good if it's pulled in if another package that depends on Python is installed.

E.g. installing some noarch package like rich should also automatically install pip.

Use default_cache_dir in create

Currently a rattler user has to choose the cache dir, but I think it would be convenient if we exported the default value that is used (dirs::cache_dir().join("rattler/cache")).

Double check that libabseil matchspec matches a package

I thought that we had to change the parsing of matchspecs but after double checking it seems to be correct. However, we had some resolve issues with libabseil and constraints of the following sort:

libabseil 20230125.0 cxx17* should match libabseil-20230125.0-cxx17_hb7217d7_0.

I'll try to add a test for this
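
Such a test could check something like the following simplified sketch. It assumes a spec of the form "name version build", compares name and version literally, and treats * in the build string as a wildcard. The function names are hypothetical, and rattler's real MatchSpec handling is considerably more involved.

```rust
/// Matches `text` against a pattern in which `*` matches any (possibly empty) run.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let mut parts = pattern.split('*');
    let first = parts.next().unwrap_or("");
    // The text must start with everything before the first `*`.
    let Some(mut rest) = text.strip_prefix(first) else {
        return false;
    };
    let remaining: Vec<&str> = parts.collect();
    // No `*` in the pattern at all: require an exact match.
    let Some((last, middle)) = remaining.split_last() else {
        return rest.is_empty();
    };
    // Every middle piece must occur, in order, in the remaining text.
    for part in middle {
        match rest.find(part) {
            Some(index) => rest = &rest[index + part.len()..],
            None => return false,
        }
    }
    // The text must end with everything after the last `*`.
    rest.ends_with(last)
}

/// Checks a "name version build" spec against one package's fields.
pub fn spec_matches(spec: &str, name: &str, version: &str, build: &str) -> bool {
    let mut fields = spec.split_whitespace();
    let (Some(n), Some(v), Some(b)) = (fields.next(), fields.next(), fields.next()) else {
        return false;
    };
    n == name && v == version && glob_match(b, build)
}
```

Under these assumptions, the spec libabseil 20230125.0 cxx17* matches a package with name libabseil, version 20230125.0 and build cxx17_hb7217d7_0.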

libsolv-rs is choosing slightly older libgfortran5 (on macos-arm64)

When installing numpy on macOS ARM64, the new libsolv implementation picks libgfortran5-12.2.0-h0eea778_32.

Conda and mamba choose libgfortran5-12.3.0-ha3a6a3e_0.

I wonder if that has to do with libgfortran-5.0.0-12_3_0_hd922786_0 vs. libgfortran-5.0.0-12_2_0_hd922786_32. If we compare / sort by build number vs. build string we might pick the older one (because it has a much higher build number).

register environments globally?

Since environments link to files from a central package cache, we need a mechanism to clean that cache from time to time. For that, it would be great to register existing environments in a json file or similar somewhere. Maybe we should have this mechanism in rattler (if we expect the rattler default cache directory to be used often).

Remove dependency on `extendhash`

Some parts of rattler use the extendhash crate. However, the same functionality is also provided by the md-5 and sha2 crates which are also used. Since md-5 and sha2 seem more complete we should get rid of extendhash.

Use the `#[sorted]` macro on more conda json types

For reproducibility reasons we want to ensure that fields are serialized in a deterministic (alphabetical) order.

To achieve this, we have a macro in rattler_macros that ensures that all struct fields are sorted alphabetically.

We need to apply this to more structs, such as

  • IndexJson
  • AboutJson
  • possibly more

Channel in `RepoDataRecord` is a simple String

Should we create a Channel type that we can use in RepoDataRecord, and store either the channel itself or a smart pointer to a shared / cached channel? Or should we type it as a URL?

It would be nice to be able to call a .url() and .name() method on the channel to get the short or expanded versions.

Invalidate `>=3.8<3.9`

What I meant to write was >=3.8,<3.9, but omitting the , gives really weird behavior: only the >=3.8 part is used.

Conda invalidates this request, as should rattler.
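
One way to reject such input is sketched below, under the assumption that constraints are separated by commas only (the | operator and other matchspec syntax are ignored here). The function names are hypothetical and this is not rattler's actual parser.

```rust
/// Characters that can start a comparison operator such as `>=`, `<`, `==`, `!=` or `~=`.
fn is_operator_char(c: char) -> bool {
    matches!(c, '<' | '>' | '=' | '!' | '~')
}

/// Splits a version spec on `,` and rejects constraints that contain a second
/// operator, which usually means a `,` is missing.
pub fn split_constraints(spec: &str) -> Result<Vec<&str>, String> {
    let mut constraints = Vec::new();
    for part in spec.split(',') {
        let part = part.trim();
        // Strip the leading operator; what remains should be a plain version.
        let version = part.trim_start_matches(is_operator_char);
        if version.is_empty() {
            return Err(format!("missing version in constraint `{part}`"));
        }
        if version.contains(is_operator_char) {
            return Err(format!("unexpected operator inside `{part}`; is a `,` missing?"));
        }
        constraints.push(part);
    }
    Ok(constraints)
}
```

With this check, >=3.8<3.9 produces an error, while >=3.8,<3.9 is split into the two constraints >=3.8 and <3.9.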

Implement the `jlap` format

It would be nice to implement the jlap format for partial repodata updates (using a specific sequence of JSON patch applications).

That makes interactive usage faster because only a subset of the entire data needs to be fetched over the network.

Implement parsing key value pairs of matchspec without quotes

Currently, when using key-value pairs in matchspecs, we require quotes around the values. This shouldn't actually be necessary. It would be nice if we could refactor the matchspec parsing a little so that quotes are not required.

So our parser currently only accepts:

foo[build="py2*"]

but it should also support

foo[build=py2*]

This can be a little tricky in the case of version specs, because there it's ambiguous whether the comma indicates that another key follows or whether it's part of the version spec. We should check how conda handles this.

foo[version=1.3,2.0]

Handle "file://" URLs in package cache

The default reqwest client doesn't seem to handle file URLs. I think we could fix that in the package cache implementation (or should we make a client wrapper?)

Package cache locking

Multiple processes are able to write to the package cache at the same time. At this point, they are completely unaware of each other. This could cause problems when multiple processes try to write to the same cache.

To work around this issue we want to introduce file locking. Conda and Mamba both already have a system in place to facilitate this. We should mimic their behavior for compatibility to ensure that when both Rattler and Conda/Mamba write to the cache there are no issues.

@wolfv about the mamba implementation:

The different operating systems (UNIX and Windows) support methods to lock files (e.g. on unix it's fcntl whatever). We started out with the idea of writing the PID of the locking process into the file but that was pretty brittle. We can just rely on the OS to make sure a file is locked or unlocked.

There are a few crates for this but I don't know how good they are.

Network drives don't support file locking so we would need a different solution there.

RunExportsJson - parse into MatchSpec?

I wonder if we should parse the contents of RunExportsJson as Vec<MatchSpec>?

The only issue I have is that we can't guarantee roundtrip parsing (e.g. I think the output of matchspec to str is sometimes different, but equivalent, from what it parses).

Support conda-lock v2 file format

The file format for conda-lock seems to have changed slightly. Instead of having all locked packages under packages, it's now first grouped by platform.

Make PythonFormatter pub?

We also need to dump a JSON struct in the rattler-build part, and ideally it looks the same as the Python JSON dumps. Could be useful to reuse the PythonFormatter there, too.

Test extraction using an expected hash

Currently, the tests for package extraction just check if the process doesn't error out. This doesn't really test if a package is properly extracted.

The tests for validating packages do use the same code to extract a package and validate its content but it would be nice if we could compute a sha256 hash of an extracted package and validate that that is indeed what we would expect.

We can use the rstest crate to create test cases with the package to extract and an expected hash as an input. e.g.:

#[rstest]
#[case("some_package.conda", "somecomplexshahashnumber")]
fn validate_extracted_packages(#[case] path: &str, #[case] hash: &str) {
 // ...
}

When developing this feature some care is required to ensure that symlinks are properly hashed. I think it would be better to hash the link itself instead of the content it points to. This is because ../a and ./../a refer to the same file but are different symlinks.

Test: Check invalidation of matchspecs that are formatted incorrect

We're looking to bolster the robustness of Rattler's matchspec implementation, particularly in the area of input parsing. While we currently have some testing mechanisms in place, we'd like to ensure we are fully covered, especially with respect to certain error scenarios.

Two primary errors that need to be checked for are:

  • StringMatcherParseError
  • ParseMatchSpecError

We recommend taking a look at Conda's tests for some inspiration:
https://github.com/conda/conda/blob/9e8425844a28ffad0c4a3adcf28a2e769f965947/tests/models/test_match_spec.py

This is a fantastic opportunity for anyone looking to make their first contribution, as this issue is primarily about adding more tests. We welcome incremental progress, so don't feel pressured to craft a massive Pull Request - a single test at a time is perfectly fine!

If you have any questions or need any assistance, feel free to ask here or join our conversation on discord. We're excited to see your contributions!

Parse `platform` into `Option<Platform>` instead of `Option<String>`

In the IndexJson struct we could parse platform (or rather the subdir field) into the Platform enum (with an escape hatch for Platform::Other(String), I would argue).

Similarly I would say that we can remove arch and subdir fields, and only reconstruct them for serialization?

Basically, the relationship is as follows: subdir is <platform>-<arch>, with the special case that a suffix of 64 corresponds to arch x86_64 and a suffix of 32 corresponds to arch x86.
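
The relationship described above could be sketched as follows. The function name split_subdir is hypothetical, and subdirs without a dash (such as noarch) are simply reported as None in this sketch.

```rust
/// Splits a subdir such as "linux-64" into its platform and arch parts,
/// expanding the short "64" / "32" suffixes to "x86_64" / "x86".
pub fn split_subdir(subdir: &str) -> Option<(String, String)> {
    let (platform, arch) = subdir.split_once('-')?;
    let arch = match arch {
        "64" => "x86_64",
        "32" => "x86",
        other => other,
    };
    Some((platform.to_string(), arch.to_string()))
}
```

Reconstructing the fields for serialization would then be a matter of calling this function on the stored subdir.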

force docs on public functions and structs.

All public methods in rattler crates should have proper docs. We can force this with #![warn(missing_docs)] or even #![deny(missing_docs)]. Some crates already have this but we need to play catchup for some of the others.

PathsJson, make sure that we serialize in alphabetical order

To improve the reproducibility of packages, we should serialize the paths in alphabetical order. This means we need to sort the Vec<PathsEntry> alphabetically on the _path attribute before serializing.

Similarly, the has_prefix and files text files should always be sorted deterministically (alphabetically).
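
A minimal sketch of the sorting step, using a stand-in PathsEntry struct with a single path field. The real struct has more fields and its field name may differ; both the struct and the helper name are assumptions for this example.

```rust
/// Stand-in for the real `PathsEntry`; only the path is modelled here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PathsEntry {
    pub path: String,
}

/// Sorts entries by path so that serialization order is deterministic.
pub fn sort_paths(entries: &mut [PathsEntry]) {
    entries.sort_by(|a, b| a.path.cmp(&b.path));
}
```

Note that this compares paths byte-wise, so uppercase ASCII letters sort before lowercase ones; whether that is the desired notion of "alphabetical" is left open here.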

Bindings for Python

We also want to expose the awesomeness of this library to Python so that it becomes easy to use from within the Python ecosystem as well.

  • We can start by binding small library parts to Python like version, matchspec, etc.
  • It would be nice to use Python async for the async parts (like downloading).
  • We can use pyo3 to generate the bindings.

Typed checksums

Currently we're storing SHA256 and MD5 hashes as strings. I am wondering if it would be better to store them as typed byte arrays and de-/serialize them at the serde level only?
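
To illustrate what a typed representation might look like, the sketch below converts between a hex string and a fixed-size byte array using only the standard library. The Sha256Hash alias and both function names are hypothetical; in practice this conversion would be hooked into serde (de)serialization, which is not shown here.

```rust
/// Hypothetical typed representation of a SHA256 digest.
pub type Sha256Hash = [u8; 32];

/// Parses a 64-character hex string into a typed hash.
pub fn parse_sha256(hex: &str) -> Option<Sha256Hash> {
    // Reject wrong lengths and non-hex characters up front.
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Formats a typed hash back into lowercase hex.
pub fn to_hex(hash: &Sha256Hash) -> String {
    hash.iter().map(|b| format!("{b:02x}")).collect()
}
```

A typed array makes invalid hashes unrepresentable after parsing and lets comparisons ignore differences in hex casing.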

`~=2.4,<4` is evaluated wrongly

The MatchSpec currently also matches 3.1 or other versions. In fact, the <4 should be completely ignored, and the matchspec should match only 2.4.*.

Remove `usize` from types in conda_package_types

I think usize is platform dependent and will be a u32 on 32-bit systems and a u64 on 64-bit systems.

I think that doesn't make much sense for structs that are parsed or written to from JSON since it'd be better to strictly encode the maximum size for these integers regardless of the host CPU architecture.

WDYT @baszalmstra ?
