oxen-ai / oxen-release

Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.

Home Page: https://oxen.ai

License: Apache License 2.0

Languages: Python 71.53%, Rust 28.47%

Topics: artificial-intelligence, data-science, machine-learning, python, rust, version-control

oxen-release's Introduction

🐂 What is Oxen?

Oxen is a lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.

The interface mirrors git, but shines in many areas where git and git-lfs fall short. Oxen is built from the ground up for data and is optimized to handle large datasets and large files.

oxen init
oxen add images/
oxen add annotations/*.parquet
oxen commit -m "Adding 200k images and their corresponding annotations"
oxen push origin main
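
The same sequence can be driven from a script. The following is a minimal sketch, not part of Oxen itself: it only assembles the commands shown above as argument lists and, if asked, runs them with `subprocess`. The helper names `oxen_workflow` and `run_workflow` are hypothetical, and the sketch assumes the commit message is passed with `-m`, as in git. Paths are passed explicitly because a glob such as `annotations/*.parquet` is not expanded when no shell is involved.

```python
import shlex
import subprocess


def oxen_workflow(paths, message, remote="origin", branch="main"):
    """Return the init/add/commit/push commands as argument lists."""
    commands = [["oxen", "init"]]
    commands += [["oxen", "add", path] for path in paths]
    # Assumption: the commit message is passed with -m, as in git.
    commands.append(["oxen", "commit", "-m", message])
    commands.append(["oxen", "push", remote, branch])
    return commands


def run_workflow(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmds = oxen_workflow(["images/", "annotations/"], "Adding images and annotations")
    # Dry run by default, so nothing happens unless you opt in.
    run_workflow(cmds, dry_run=True)
```

Keeping command construction separate from execution makes it possible to review or log the exact commands before anything touches the repository.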

Oxen comprises a command line interface as well as bindings for Rust 🦀, Python 🐍, and HTTP interfaces 🌎, making it easy to integrate into your workflow.

🌾 What kind of data?

Oxen is designed to efficiently manage large datasets, including those with large individual files, for example CSV files with millions of rows. It also handles datasets comprising millions of individual files and directories such as the complete collection of ImageNet images.

🚀 Built for speed

One of the main reasons datasets are hard to maintain is the pure performance of indexing the data and transferring the data over the network. We wanted to be able to index hundreds of thousands of images, videos, audio files, and text files in seconds.

Watch below as we version hundreds of thousands of images in seconds 🔥

[Image: oxen cli demo]

But speed is only the beginning.

✅ Features

Oxen is built to be ergonomic, easy to use, and easy to learn. If you know how to use git, you know how to use Oxen.

  • 🔥 Fast (efficient indexing and syncing of data)
  • 🧠 Easy to learn (same commands as git)
  • 💪 Handles large files (images, videos, audio, text, parquet, arrow, json, models, etc)
  • 🗄️ Index lots of files (millions of images? no problem)
  • 📊 Native DataFrame processing (index, compare and serve up DataFrames)
  • 📈 Tracks changes over time (never worry about losing the state of your data)
  • 🤝 Collaborate with your team (sync to an oxen-server)
  • 🌎 Remote Workspaces to interact with the data without downloading it
  • 👀 Better data visualization on OxenHub

🐮 Learn The Basics

To learn everything Oxen can do, see the full documentation at https://docs.oxen.ai.

🧑‍💻 Getting Started

You can install Oxen through Homebrew, through pip, or from our releases page.

🐂 Install Command Line Tool

brew tap Oxen-AI/oxen
brew install oxen

🐍 Install Python Library

pip install oxenai

⬇️ Clone Dataset

Clone your first Oxen repository from OxenHub.

oxen clone https://hub.oxen.ai/ox/CatDogBBox

⭐️ Every GitHub star gives an ox its wings

No really.

We hooked up the GitHub webhook for stars to an OxenHub Repository. Learn how we did it and go find your own in our ox/FlyingOxen repository.

[Image: oxen repo with wings]

🤝 Support

If you have any questions, comments, suggestions, or just want to get in contact with the team, feel free to email us at [email protected]

👥 Contributing

This repository contains the Python library that wraps the core Rust codebase. We would love help extending the Python interfaces, the documentation, or the core Rust library.

Code bases to contribute to:

If you are building anything with Oxen.ai or have any questions, we would love to hear from you on our Discord.

Build 🔨

Set up virtual environment:

# Set up your python virtual environment
$ python -m venv ~/.venv_oxen # could be python3 
$ source ~/.venv_oxen/bin/activate
$ pip install maturin
# Install rust
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Run maturin
$ maturin develop
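
Before running `maturin develop`, it can help to confirm that the tools from the steps above are actually on your PATH. The following is a small sketch under stated assumptions: the helper `missing_tools` and the `REQUIRED_TOOLS` tuple are hypothetical names, `cargo` is assumed to be the command that the rustup install provides, and `python` may need to be `python3` on your system.

```python
import shutil

# Tools used by the build steps; adjust "python" to "python3" if needed.
REQUIRED_TOOLS = ("python", "pip", "maturin", "cargo")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All build tools found.")
```

Run it inside the activated virtual environment, since `maturin` is installed there by the `pip install maturin` step.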

Why build Oxen?

Oxen was built by a team of machine learning engineers who have spent countless hours in their careers managing datasets. We have used many different tools, but none of them were as easy to use and as ergonomic as we would like.

If you have ever tried git-lfs to version large datasets and become frustrated, we feel your pain. Solutions like git-lfs are too slow when it comes to the scale of data we need for machine learning.

If you have ever uploaded a large dataset of images, audio, video, or text to a cloud storage bucket with a name like:

s3://data/images_july_2022_final_2_no_really_final.tar.gz

then you know the problem. We built Oxen to be the tool we wish we had.

Why the name Oxen?

"Oxen" 🐂 comes from the fact that the tooling will plow, maintain, and version your data like a good farmer tends to their fields 🌾. Let Oxen take care of the grunt work of your infrastructure so you can focus on the higher-level ML problems that matter to your product.

oxen-release's People

Contributors

albertvillanova, benartuso, eric-laurence, gschoeni, jcelliott, lilianxr, mathi0750, welpo

oxen-release's Issues

Bad CPU type in executable

I just installed it on macOS using Homebrew, and when I try to use it, I get this error message.

$ oxen
-bash: /usr/local/bin/oxen: Bad CPU type in executable

My Mac is an Intel Core i7 from 2018.
I'm running macOS version 13.0.1 (22A400).

Let me know if you need any other information
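
For context on the error above: on macOS, "Bad CPU type in executable" generally means the binary was built for a different CPU architecture than the machine running it. The sketch below is one way to check that. It is not part of Oxen, and the function name `macho_architectures` is hypothetical. It reads the Mach-O header of a binary and reports the architectures it contains, which can then be compared with `platform.machine()`. Only 64-bit thin binaries and universal (fat) binaries are handled.

```python
import platform
import struct
import sys

# CPU type constants from the Mach-O format (<mach/machine.h>).
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}
MH_MAGIC_64 = 0xFEEDFACF  # thin 64-bit Mach-O, little-endian on disk
FAT_MAGIC = 0xCAFEBABE    # universal binary, header is big-endian


def macho_architectures(data):
    """Return the architecture names found in a Mach-O header."""
    if len(data) < 8:
        return []
    (magic_be,) = struct.unpack(">I", data[:4])
    if magic_be == FAT_MAGIC:
        (count,) = struct.unpack(">I", data[4:8])
        archs = []
        for i in range(count):
            start = 8 + 20 * i  # each fat_arch entry is five 32-bit fields
            entry = data[start:start + 4]
            if len(entry) < 4:
                break
            (cputype,) = struct.unpack(">I", entry)
            archs.append(CPU_TYPES.get(cputype, hex(cputype)))
        return archs
    magic_le, cputype = struct.unpack("<II", data[:8])
    if magic_le == MH_MAGIC_64:
        return [CPU_TYPES.get(cputype, hex(cputype))]
    return []  # not a recognized Mach-O header


if __name__ == "__main__":
    # Example invocation (script name is illustrative):
    #   python check_arch.py /usr/local/bin/oxen
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            header = f.read(4096)
        print("binary architectures:", macho_architectures(header))
        print("this machine:", platform.machine())
```

If the binary reports only `arm64` while the machine reports `x86_64`, the installed build does not match the hardware.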

Not clear how to unstage a directory

Reproduction steps

Add a directory to oxen

$ mkdir images
$ oxen add images
$ oxen status
On branch main -> d4927984fd596e95

Directories to be committed
  added: images

Remove the directory. Oxen still thinks something is modified.

$ rm -r images
$ oxen status
On branch main -> d4927984fd596e95

Directories to be committed
  added: images

I tried oxen reset <file> and that didn't work.
