Comments (3)
Hi @harshakhmk, we support these formats. Any new dataset must be in one of those formats, and then you just need to add it to datasets.json.
Btw, how are you downloading the datasets? Can you share links?
from vector-db-benchmark.
I have experimented with the MNIST dataset: https://www.kaggle.com/datasets/oddrationale/mnist-in-csv
Okay. You need to do some processing on the dataset and make it aligned with the formats I mentioned above. You can refer to my implementation for the OpenAI 1M embeddings dataset on HuggingFace.
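As a rough illustration of the kind of processing meant here, the sketch below covers only the format-independent part: reading a CSV of vectors and computing exact nearest neighbours to serve as ground truth. It assumes a header row and a `label` column followed by numeric columns (an assumption about the CSV layout, not something confirmed in this thread). The function names `load_csv_vectors` and `exact_neighbors` are hypothetical and are not part of vector-db-benchmark. Writing the result into one of the supported formats and registering it in datasets.json is left out, since those formats are not spelled out in this thread.

```python
import csv
import io
import math


def load_csv_vectors(handle, label_column="label"):
    """Read rows of `label, v1, v2, ...` into (labels, vectors).

    Assumes the first row is a header containing `label_column`.
    """
    reader = csv.reader(handle)
    header = next(reader)
    label_idx = header.index(label_column)
    labels, vectors = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        labels.append(row[label_idx])
        vectors.append([float(v) for i, v in enumerate(row) if i != label_idx])
    return labels, vectors


def exact_neighbors(train, queries, k):
    """Brute-force ground truth: for each query, the indices of the k
    closest train vectors by Euclidean distance."""
    result = []
    for q in queries:
        order = sorted(range(len(train)), key=lambda i: math.dist(q, train[i]))
        result.append(order[:k])
    return result


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real CSV file.
    sample = io.StringIO("label,p1,p2\n7,0,0\n1,3,4\n7,1,0\n")
    labels, vectors = load_csv_vectors(sample)
    print(exact_neighbors(vectors, [[0.0, 0.0]], k=2))
```

The brute-force search is quadratic in the number of vectors, so it is only practical for small samples. For a real dataset you would likely compute the ground truth with a vectorized library instead.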
Also, it doesn't make sense to run it directly on MNIST image features. It's not a valid dataset for running any kind of search benchmarks.
I'm closing this issue for now. Let us know if you have any other issues :)