Comments (16)
Can I get your Discord ID so that I can share the data with you?
from qdrant-client.
Hey
This can be caused by two things:
- you have a very large payload
- you are not batching

The former can be solved either by reducing the payload size or by increasing the allowed JSON size limit, while the latter can be solved by introducing batching.
Setting prefer_grpc=True when instantiating the Qdrant client would probably also make the error go away, but that would only treat the symptom rather than the cause.
Is there a way to increase the JSON payload limit in Qdrant somehow? I have tried batching as well, but I am still getting the same error.
Increasing the limit is not recommended. Could you please show how you are trying to upload the data?
Which methods do you use?
How do you do the batching, and what is the batch size?
I loaded documents using llama_index document loaders and then converted them to nodes.
I think the problem is that the size of the node metadata (the payload for the Qdrant vectors) is greater than the allowed limit. I implemented the same thing with the raw documents and it works, but not with the nodes.
```python
# Import paths as of llama_index 0.9.x; `documents`, `client`, and `llm`
# are defined earlier in the script.
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.vector_stores import QdrantVectorStore
from langchain.embeddings import HuggingFaceEmbeddings

sentence_node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = sentence_node_parser.get_nodes_from_documents(documents)

# Here the batch size is specified
vector_store = QdrantVectorStore(
    client=client, collection_name="collection_name", batch_size=20
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embeddings, chunk_size=512, chunk_overlap=50
)
index = VectorStoreIndex(
    nodes, storage_context=storage_context, service_context=service_context
)
```
Could you maybe try it with batch_size=1?
If it succeeds, could you check what is actually stored in the payload? Maybe this code stores the original document and not only the chunks.
Tried that, it isn't working.
Any breakthrough?
Hi @arnavroh45, sorry for the delay; I haven't had time to look deeper into it yet.
@Anush008, maybe you could take a look at it, please?
I am not that familiar with llama_index, but this error usually occurs when there is a problem with either the batch size or the payload contents.
Hi @arnavroh45.
I rechecked the batching implementation; it looks fine.
I'll try reproducing the issue.
Okay
Hey @arnavroh45. I tried reproducing this with SentenceWindowNodeParser as per your snippet.
The upload worked fine for me, so I assume it has to do with your specific documents. Could you give some info about the data?
I am performing web scraping using Selenium and then storing the data in the following format. I then converted the documents into nodes using the provided code and added the nodes to the vector DB, which produced the error.
Document format:
```python
Document(id_='dac4c09b-5e50-4c8e-a3de-01ac624ab8e4', embedding=None, metadata={'title': 'title', 'source': 'sourcet'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='dd623d76410ed65006c454c4fbf93511baa6bb1bef1936d5a352191b62a0c1bc', text='text', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')
```
Conversion from documents to nodes:
```python
sentence_node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = sentence_node_parser.get_nodes_from_documents(documents)
```
Adding into the vector DB:
```python
client = QdrantClient(url="url")
vector_store = QdrantVectorStore(
    client=client, collection_name="collection_name", batch_size=10
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
```
Error:
```
UnexpectedResponse: Unexpected Response: 400 (Bad Request)
Raw response content:
b'{"status":{"error":"Payload error: JSON payload (41900140 bytes) is larger than allowed (limit: 33554432 bytes)."},"time":0.0}'
```
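The error reports one request body of 41,900,140 bytes against the 33,554,432-byte limit. A quick, dependency-free way to find oversized payloads before uploading is to serialize them and measure; the payloads below are made-up stand-ins, and with real nodes each `node.metadata` dict would take their place:

```python
import json

LIMIT = 33_554_432  # the JSON payload limit reported in the error above

# Stand-ins for node metadata. SentenceWindowNodeParser stores the whole
# surrounding sentence window in metadata, so each payload can be far
# larger than the chunk text itself.
payloads = [
    {"window": "a sentence. " * 1000, "original_text": "a sentence."},
    {"window": "another sentence. " * 10, "original_text": "another sentence."},
]

# Per-payload sizes, to spot individual offenders.
for i, payload in enumerate(payloads):
    size = len(json.dumps(payload).encode("utf-8"))
    print(i, size, "over limit" if size > LIMIT else "ok")

# The whole batch must also fit in a single request body.
batch_bytes = len(json.dumps(payloads).encode("utf-8"))
print("batch:", batch_bytes, "over limit" if batch_bytes > LIMIT else "ok")
```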
@arnavroh45, the document schema is very similar to what I had in my reproduction attempt.
But this would be almost impossible to debug without the actual data, which seems to be the cause: the batch upload itself works fine, and you even tried a batch size of 1.
Could it be that the size of my metadata exceeds the allowed payload limit?
@Anush008