nux-ai / vectors

Toolkit designed for developers to evaluate, select, and deploy embedding models. It streamlines the lifecycle from model evaluation to data embedding and querying.

Home Page: http://nux.ai

Language: Python 100.00%
Topics: atlas-search, embedding-models, vector-search

Embedding Model Evaluation & Integration Toolkit

Welcome to the Embedding Model Evaluation & Integration Toolkit, an open-source project that streamlines the end-to-end lifecycle of working with embedding models: evaluating models, creating indices, and querying embeddings.

Our goal is to give developers a simple, dependable interface for using embedding models across applications, from natural language processing to vector search databases.

🚀 Purpose

The toolkit aims to empower developers by simplifying the process of:

  • Evaluating different embedding models to find the best fit for your specific dataset and query patterns.
  • Creating efficient indices for fast retrieval.
  • Querying embeddings to unlock insights and patterns within your data.

📘 Lifecycle Walkthrough

1. Evaluate and Select a Model

Jumpstart your project by evaluating potential models against your data and criteria.

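# /utilities/evaluate.py
# Candidate model and a small testing set to rank against the query.
evaluate_instance = Evaluate(
    model="all-MiniLM-L6-v2",
    testing_set=[{'text': 'Sample text 1'}, {'text': 'Sample text 2'}]
)

# acceptance_criteria lists the expected results for the query;
# order="specific" presumably requires that exact ranking.
evaluate_instance.evaluate(
    query="Example query",
    acceptance_criteria=['Sample text 2', 'Sample text 1'],
    order="specific"
)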
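How `Evaluate` scores a model internally is not shown here. As a minimal sketch, assuming the check ranks the testing set by cosine similarity to the query and compares that ranking with `acceptance_criteria`, the logic might look like the following. The helper names (`cosine_similarity`, `rank_texts`, `meets_criteria`) and the toy 2-dimensional vectors are hypothetical; a real run would use embeddings produced by the model under evaluation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_texts(query_vector, embedded_texts):
    """Return the texts sorted from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in embedded_texts.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


def meets_criteria(ranked, acceptance_criteria, order="specific"):
    """Check the top results against the expected ones.

    order="specific": the top results must match in exactly this order.
    any other value:  the top results must match as a set (assumed behavior).
    """
    top = ranked[:len(acceptance_criteria)]
    if order == "specific":
        return top == list(acceptance_criteria)
    return set(top) == set(acceptance_criteria)


# Toy embeddings standing in for real model output.
embedded = {"Sample text 1": [0.6, 0.8], "Sample text 2": [0.8, 0.6]}
ranked = rank_texts([1.0, 0.0], embedded)
passed = meets_criteria(ranked, ["Sample text 2", "Sample text 1"], order="specific")
```

Running the same check across several candidate models, with the same testing set and criteria, gives a simple pass/fail basis for choosing between them.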

2. Mount the Selected Model(s) via HTTP

Easily integrate models into your workflow with HTTP endpoints.

Mount the model:

curl -X POST http://localhost:5000/mount_model \
     -H "Content-Type: application/json" \
     -d '{"model_name": "all-MiniLM-L6-v2"}'

Retrieve an embedding:

curl -X POST http://localhost:5000/get_embedding \
     -H "Content-Type: application/json" \
     -d '{"text": "Example text for embedding."}'
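The same two calls can be made from Python. This is a sketch using only the standard library; the endpoint paths and payloads mirror the curl commands, while the helper names (`build_request`, `post_json`) are hypothetical and the response is assumed to be JSON, since its shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # same host and port as the curl examples


def build_request(path, payload):
    """Build a JSON POST request for one of the endpoints."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, timeout=30):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Requests equivalent to the two curl commands; nothing is sent yet.
mount_request = build_request("/mount_model", {"model_name": "all-MiniLM-L6-v2"})
embedding_request = build_request("/get_embedding", {"text": "Example text for embedding."})
```

With the server running, `post_json("/mount_model", {"model_name": "all-MiniLM-L6-v2"})` followed by `post_json("/get_embedding", {"text": "..."})` would perform the calls.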

3. Create Your Index

Optimize data retrieval with custom indices tailored to your model's embeddings.

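# /cloud/mongodb.py
# field_names_and_dims must be defined beforehand; its exact shape is not shown here.
# The second argument is the name of the index to create.
atlas = Atlas(field_names_and_dims, "index_keyword_map_test")
atlas.create_index()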
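What `Atlas.create_index()` sends to MongoDB is not shown. As a sketch, assuming `field_names_and_dims` is a dict mapping each embedding field to its dimension count, an Atlas Vector Search index definition could be assembled as below. The helper name `build_vector_index_definition` and the default `cosine` similarity are assumptions; the 384 dimensions match the output size of `all-MiniLM-L6-v2`.

```python
def build_vector_index_definition(field_names_and_dims, similarity="cosine"):
    """Build an Atlas Vector Search index definition with one vector field per entry."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dims,
                "similarity": similarity,
            }
            for path, dims in field_names_and_dims.items()
        ]
    }


# Hypothetical mapping: one embedding field with 384 dimensions.
definition = build_vector_index_definition({"plot_embedding_384": 384})
```

The `numDimensions` value has to match the length of the vectors the selected model produces, so changing models generally means rebuilding the index.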

4. Load Data Using Selected Model

Embed and store your data efficiently using the model of your choice.

# /utilities/load.py
# data_mapping must be defined beforehand; its exact shape is not shown here.
data_loader = DataLoader("your_db_name", "your_collection_name")
data_loader.load(data_mapping)
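The internals of `DataLoader.load` and the shape of `data_mapping` are not shown. The sketch below assumes the mapping pairs each source text field with the field that should hold its embedding, and that an `embed` callable turns text into a vector. `attach_embeddings` and `toy_embed` are hypothetical names, and `toy_embed` is only a deterministic stand-in, not a real model.

```python
def attach_embeddings(documents, data_mapping, embed):
    """Return copies of the documents with an embedding added per mapped text field."""
    enriched = []
    for doc in documents:
        new_doc = dict(doc)  # leave the caller's documents untouched
        for source_field, target_field in data_mapping.items():
            if source_field in doc:
                new_doc[target_field] = embed(doc[source_field])
        enriched.append(new_doc)
    return enriched


def toy_embed(text):
    """Stand-in embedder: a fixed 3-number vector derived from the text."""
    return [
        float(len(text)),
        float(text.count(" ")),
        float(sum(map(ord, text)) % 97),
    ]


docs = [{"title": "Example", "plot": "A short plot"}]
enriched_docs = attach_embeddings(docs, {"plot": "plot_embedding"}, toy_embed)
```

In practice `embed` would call the mounted model, and the enriched documents would then be written to the collection.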

5. Design Query and Evaluate Results

Query the index with a `$vectorSearch` aggregation stage, then review the returned documents to judge how well the selected model performs.

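# query_vector must be a list of floats (the embedding of the query text,
# presumably 384 dimensions for 'plot_embedding_384'), not the raw query string.
client.collection.aggregate([
  {
    '$vectorSearch': {
      'index': 'default',
      'path': 'plot_embedding_384',
      'queryVector': query_vector,
      'numCandidates': 150,
      'limit': 10
    }
  },
  {
    '$project': {
      'plot': 1,
      'title': 1
    }
  }
])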
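`$vectorSearch` expects `queryVector` to be an array of numbers, so the query text has to be embedded before the pipeline is built. The sketch below wraps pipeline construction in a hypothetical helper, `build_vector_search_pipeline`, that rejects raw text and checks that `numCandidates` is at least `limit`. The defaults mirror the values used in this walkthrough.

```python
def build_vector_search_pipeline(query_vector, path, index="default",
                                 num_candidates=150, limit=10,
                                 fields=("plot", "title")):
    """Build a $vectorSearch + $project aggregation pipeline."""
    if isinstance(query_vector, str) or not all(
        isinstance(value, (int, float)) for value in query_vector
    ):
        raise TypeError("queryVector must be a list of numbers, not text")
    if num_candidates < limit:
        raise ValueError("numCandidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {"$project": {field: 1 for field in fields}},
    ]


# Toy 3-number vector; a real query vector would have the index's dimension count.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], path="plot_embedding_384")
```

The resulting list can be passed to a collection's `aggregate` call in place of a hand-written pipeline.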

🗺 Library Roadmap

We're constantly looking to expand the toolkit's capabilities, with plans to include:

  • Each time an embedding model is changed:
    • A Spark job to parallelize re-embedding
    • Migration of previous vectors to S3
  • Migration of vectors from other stores (e.g., Pinecone to MongoDB)
  • Federated KNN querying capabilities
  • Containerization of embedding models for ease of deployment

🌟 Why Contribute?

Contributing to this toolkit helps improve an open-source project for embedding model evaluation and integration, and connects you with a community of like-minded developers. It is a good fit if you're looking to:

  • Enhance your understanding of embedding models and their applications.
  • Share your expertise and learn from others in the field.
  • Drive innovation in embedding model evaluation and integration.

We welcome contributions of all forms, from code improvements and feature additions to documentation and examples!

🛠 How to Contribute

  1. Fork the repository: Start with a personal copy of the project.
  2. Pick an issue or propose a feature: Look for open issues or suggest new ideas.
  3. Submit a pull request: Implement your changes and submit a PR for review.

Contributors

esteininger
