kagisearch / vectordb

A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.

Home Page: https://vectordb.com

License: MIT License

Python 100.00%
ai artificial-intelligence llm llms machine-learning

vectordb's People

Contributors

awas666, bkiat1123, ofek, radare, tamamushi-iro, unixfreaxjp, vprelovac

vectordb's Issues

Running on m1 Mac

I ran into issues when trying to install vectordb on an m1 Mac. Here is my solution in case future humans run into something similar.

# the dependencies required Python 3.9
conda create -n myenv python=3.9
conda activate myenv

# tensorflow_text is not officially built for m1 Macs yet: https://github.com/tensorflow/text/issues/89
pip install https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon/releases/download/v2.12/tensorflow_text-2.12.0-cp39-cp39-macosx_11_0_arm64.whl
pip install vectordb2

Vector distance

Is there a way for the query to return the distances of the vectors found by the search, and maybe even to accept a minimum threshold?
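One way to picture what is being asked for, as a minimal sketch in plain NumPy rather than vectordb's own API: compute the cosine distance from the query to every stored vector, sort, and drop anything past a cutoff. The function name `search_with_distances` and the `max_distance` parameter are hypothetical, and the sketch assumes no stored vector has zero length. Since smaller distance means a closer match, the threshold here is an upper bound on distance (equivalently, a minimum similarity).

```python
import numpy as np

def search_with_distances(query, vectors, top_n=5, max_distance=None):
    """Return (index, cosine_distance) pairs, nearest first.

    Hypothetical helper for illustration; not part of vectordb.
    """
    query = np.asarray(query, dtype=np.float32)
    vectors = np.asarray(vectors, dtype=np.float32)
    if vectors.ndim != 2 or vectors.shape[0] == 0:
        return []
    # Cosine distance = 1 - cosine similarity of the normalized vectors.
    q_norm = query / np.linalg.norm(query)
    v_norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    distances = 1.0 - v_norm @ q_norm
    order = np.argsort(distances)[:top_n]
    results = [(int(i), float(distances[i])) for i in order]
    if max_distance is not None:
        # Keep only matches at or below the distance cutoff.
        results = [(i, d) for i, d in results if d <= max_distance]
    return results

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = search_with_distances([1.0, 0.0], vectors, top_n=3, max_distance=0.5)
```

Here `hits` keeps the identical vector and the diagonal one, and filters out the orthogonal vector, whose cosine distance is 1.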

Memory file broken, no metadata

Ever since the change that split the metadata from the embeddings, the memory can't be loaded from disk anymore, as only the embeddings get saved and not the metadata.

Everything works just fine when using it from RAM, but it breaks completely as soon as you try to load from storage.

Not sure what a clean solution is here, as saving two files would also be a hassle...

We could merge the two into a single dictionary when saving and split them again once loaded?
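The single-dictionary idea could look roughly like this. It is a sketch using the standard `pickle` module, not the project's actual save format; the helper names `save_memory` and `load_memory` and the dictionary keys are made up for illustration.

```python
import os
import pickle
import tempfile

def save_memory(path, embeddings, metadata):
    """Bundle embeddings and metadata into one dict and pickle it to one file."""
    with open(path, "wb") as f:
        pickle.dump({"embeddings": embeddings, "metadata": metadata}, f)

def load_memory(path):
    """Load the bundle and split it back into (embeddings, metadata)."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["embeddings"], bundle["metadata"]

# Round trip through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "memory.pkl")
save_memory(path, [[0.1, 0.2], [0.3, 0.4]], [{"title": "a"}, {"title": "b"}])
embeddings, metadata = load_memory(path)
```

This keeps a single file on disk while the in-memory split stays as it is. Note that pickle files should only be loaded from trusted sources.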

tensorflow_text not available

(venv) 0$ pip install .
Processing /Users/pancake/prg/vectordb
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch>=1.9.0 in /Users/pancake/prg/r2ai/venv/lib/python3.12/site-packages (from vectordb2==0.1.9) (2.2.2)
Requirement already satisfied: transformers>=4.10.0 in /Users/pancake/prg/r2ai/venv/lib/python3.12/site-packages (from vectordb2==0.1.9) (4.39.3)
Requirement already satisfied: numpy>=1.21.0 in /Users/pancake/prg/r2ai/venv/lib/python3.12/site-packages (from vectordb2==0.1.9) (1.26.4)
Requirement already satisfied: scikit-learn>=0.24.0 in /Users/pancake/prg/r2ai/venv/lib/python3.12/site-packages (from vectordb2==0.1.9) (1.4.2)
Requirement already satisfied: scipy>=1.7.0 in /Users/pancake/prg/r2ai/venv/lib/python3.12/site-packages (from vectordb2==0.1.9) (1.13.0)
Requirement already satisfied: sentence-transformers in /Users/pancake/prg/r2ai/venv/lib/python3.12/site-packages (from vectordb2==0.1.9) (2.6.1)
Requirement already satisfied: faiss-cpu in /Users/pancake/prg/r2ai/venv/lib/python3.12/site-packages (from vectordb2==0.1.9) (1.8.0)
INFO: pip is looking at multiple versions of vectordb2 to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement tensorflow-text (from vectordb2) (from versions: none)
ERROR: No matching distribution found for tensorflow-text
(venv) 1$

TypeError

While executing:
memory = Memory(chunking_strategy={"mode": "sliding_window", "window_size": 128, "overlap": 16}, embeddings='TaylorAI/bge-micro-v2')

Got:
TypeError: Memory.__init__() got an unexpected keyword argument 'embeddings'

ValueError: not enough values to unpack (expected 2, got 1)

I'm using vectordb to index documentation from different sources, but sometimes I get the backtrace below. x.shape contains only one element, (0,). This happens only when nothing has been saved (or the saved data is too small).

146$ ./venv/bin/python
Python 3.11.6 (main, Oct  2 2023, 13:45:54) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import vectordb
Warning: mprt could not be imported. Install with 'pip install git+https://github.com/vioshyvo/mrpt/'. Falling back to Faiss.
>>> a=vectordb.Memory()
>>> a.search("riscv", top_n=5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/memory.py", line 68, in search
    indices = self.vector_search.search_vectors(query_embedding, embeddings, top_n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/vector_search.py", line 60, in search_vectors
    indices = call_search(query_embedding, embeddings, top_n)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/vectordb/vector_search.py", line 26, in run_faiss
    index.add(vectors)
  File "/Users/pancake/prg/r2ai/venv/lib/python3.11/site-packages/faiss/class_wrappers.py", line 226, in replacement_add
    n, d = x.shape
    ^^^^
ValueError: not enough values to unpack (expected 2, got 1)
>>>
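A possible guard for this case, sketched under assumptions: the traceback shows the index receiving an array of shape (0,), so checking that the embeddings form a non-empty 2-D array before searching would avoid the unpacking error. The wrapper `safe_search` and the stand-in `brute_force` backend are hypothetical and are not vectordb or Faiss functions.

```python
import numpy as np

def safe_search(search_fn, query_embedding, embeddings, top_n):
    """Call search_fn only when embeddings is a non-empty 2-D array."""
    vectors = np.asarray(embeddings, dtype=np.float32)
    if vectors.ndim != 2 or vectors.shape[0] == 0:
        # Nothing stored yet: return no results instead of raising.
        return []
    # Never ask for more neighbours than there are stored vectors.
    return search_fn(query_embedding, vectors, min(top_n, vectors.shape[0]))

def brute_force(query, vectors, top_n):
    """Stand-in backend: indices of the nearest vectors by Euclidean distance."""
    query = np.asarray(query, dtype=np.float32)
    distances = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(distances)[:top_n].tolist()

empty_result = safe_search(brute_force, [1.0, 0.0], [], top_n=5)
found = safe_search(brute_force, [1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], top_n=5)
```

With an empty store the call returns an empty list, and with two stored vectors it returns both indices, nearest first.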
