Comments (16)

arnavroh45 commented on August 16, 2024

Can I get your Discord ID so that I can share the data with you?


joein commented on August 16, 2024

Hey,

This can be caused by two things:

  1. you have a very large payload
  2. you don't do batching

The former can be solved either by reducing the size of the payload or by increasing the allowed JSON size limit, while the latter can be solved by introducing batching.

Also, setting prefer_grpc=True when instantiating a Qdrant client would probably make this error go away, but that would only treat the symptom rather than the underlying problem.
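
For illustration, a minimal sketch of that option (the host is a placeholder):

from qdrant_client import QdrantClient

# prefer_grpc routes uploads over gRPC instead of REST, so the REST JSON
# request-size check no longer applies; the payload itself stays just as large
client = QdrantClient(host="localhost", prefer_grpc=True)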


arnavroh45 commented on August 16, 2024

Is there a way to increase the JSON payload limit in Qdrant somehow? I have tried batching as well, but I still get the same error.
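
For reference, a hedged sketch of where this limit lives for a self-hosted Qdrant instance (the default is 32 MB):

# config.yaml of a self-hosted Qdrant instance
service:
  max_request_size_mb: 64  # default is 32

# or via environment variable, e.g. when running the Docker image:
# QDRANT__SERVICE__MAX_REQUEST_SIZE_MB=64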


joein commented on August 16, 2024

Increasing the limit is not recommended. Could you please show how you are trying to upload the data?

Which methods do you use?
How do you do the batching, and what is the batch size?


arnavroh45 commented on August 16, 2024

I loaded documents using llama_index document loaders and then converted them to nodes.
I think the problem is that the size of the nodes' metadata (the payload for the Qdrant vectors) exceeds the allocated limit. I implemented the same pipeline with the plain documents and it works, but not with the nodes.

# Imports for the legacy llama_index API used here; `documents`, `client`, and `llm` are assumed to be defined earlier
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.vector_stores import QdrantVectorStore
from langchain.embeddings import HuggingFaceEmbeddings

sentence_node_parser = SentenceWindowNodeParser.from_defaults(window_size=3, window_metadata_key="window", original_text_metadata_key="original_text")
nodes = sentence_node_parser.get_nodes_from_documents(documents)
vector_store = QdrantVectorStore(client=client, collection_name="collection_name", batch_size=20)  # batch size specified here
storage_context = StorageContext.from_defaults(vector_store=vector_store)
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2')
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embeddings, chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)


joein commented on August 16, 2024

Could you maybe try it with batch_size=1?

If it succeeds, could you check what is actually stored in the payload? Maybe this code stores the original document and not only the chunks?
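
For example, a quick check along these lines (a sketch, assuming the nodes variable from the snippet above):

import json

# Serialize each node's metadata and look at the largest blobs; a whole source
# document hiding in the payload shows up immediately
sizes = sorted((len(json.dumps(node.metadata, default=str).encode("utf-8")) for node in nodes), reverse=True)
print(sizes[:5])  # approximate byte sizes of the five largest metadata payloads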


arnavroh45 commented on August 16, 2024

Tried that; it isn't working.


arnavroh45 commented on August 16, 2024

Any breakthrough?


joein commented on August 16, 2024

Hi @arnavroh45, sorry for the delay; I haven't had time to look deeper into this yet.

@Anush008, maybe you could take a look at it, please?

I am not that familiar with llama_index, but this error usually occurs when there is a problem either with the batch size or with how the payload is loaded.


Anush008 commented on August 16, 2024

Hi @arnavroh45.
I rechecked the batching implementation. Looks fine.
I'll try reproducing the issue.


arnavroh45 commented on August 16, 2024

Okay


Anush008 commented on August 16, 2024

Hey @arnavroh45. I tried reproducing this with SentenceWindowNodeParser as per your snippet.
The upload worked fine for me, so I assume the issue has to do with your data specifically. Could you give some info about the data?


arnavroh45 commented on August 16, 2024

I am performing web scraping using Selenium and then storing the data in the format below. I then converted the documents into nodes using the code provided, and adding the nodes to the vector DB raised the error.

Document Format:
Document(id_='dac4c09b-5e50-4c8e-a3de-01ac624ab8e4', embedding=None, metadata={'title': 'title', 'source': 'sourcet'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='dd623d76410ed65006c454c4fbf93511baa6bb1bef1936d5a352191b62a0c1bc', text='text', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')

Conversion from documents to nodes:
sentence_node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = sentence_node_parser.get_nodes_from_documents(documents)

Adding the nodes to the vector DB:
client = QdrantClient(url="url")
vector_store = QdrantVectorStore(client=client, collection_name="collection_name", batch_size=10)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)

Error:
UnexpectedResponse: Unexpected Response: 400 (Bad Request)
Raw response content:
b'{"status":{"error":"Payload error: JSON payload (41900140 bytes) is larger than allowed (limit: 33554432 bytes)."},"time":0.0}'


Anush008 commented on August 16, 2024

@arnavroh45, the document schema is very similar to what I had in my attempt at reproducing this.

But this is nearly impossible to debug without the actual data, which seems to be the cause, since the batch upload appears to work fine and you even tried uploading a single point per batch.


arnavroh45 commented on August 16, 2024

Could it be that the size of my metadata exceeds the limit allowed for the payload?
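
If so, one possible mitigation (a sketch, not something prescribed in this thread; MAX_META_BYTES is an arbitrary illustrative cap) would be to trim oversized metadata values before building the index:

# Truncate very large string values, such as an accumulated "window",
# before the nodes are uploaded to Qdrant
MAX_META_BYTES = 10_000  # illustrative cap, not a Qdrant constant

for node in nodes:
    for key, value in node.metadata.items():
        if isinstance(value, str) and len(value.encode("utf-8")) > MAX_META_BYTES:
            node.metadata[key] = value[:MAX_META_BYTES]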


