Comments (2)
The unhelpful error happens while adding field properties in the backend. The mapping for "my_field_1" is set to normal text, like
{'my_field_1': {'type': 'text'}}
so the code looks up the non-existent key "properties" when it tries to set the field to a dict.
Because of this, I am weighing 2 possible solutions here, and I would like input on which is better @pandu-k @tomhamer @wanliAlex @jn2clark :
Solution 1. Let the bad mapping (field 1 is both dict and text) get to OpenSearch, then pass down the error from there:
We can do this by using:
new_index_properties["my_field_1"]["properties"] = dict()
to bypass the key error caused by the non-existent new_index_properties["my_field_1"]["properties"]
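As a rough illustration of the bypass above, here is a minimal sketch. The contents of `new_index_properties` and the child field `image` with type `keyword` are assumptions made up for the example, and `setdefault` is used instead of a plain assignment so that any existing "properties" entry is not overwritten:

```python
# Hypothetical mapping: my_field_1 was indexed as plain text.
new_index_properties = {"my_field_1": {"type": "text"}}

# Nested access fails because a text mapping has no "properties" key.
try:
    new_index_properties["my_field_1"]["properties"]["image"] = {"type": "keyword"}
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Creating the key first avoids the KeyError. The mapping is still
# contradictory (text and object), so the rejection is left to OpenSearch.
new_index_properties["my_field_1"].setdefault("properties", {})
new_index_properties["my_field_1"]["properties"]["image"] = {"type": "keyword"}
```

With this change the contradictory mapping is built without raising locally, and the per-document error comes back from OpenSearch instead.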
Pros:
- Only documents affected by this bad mapping fail and return errors.
- Same implementation as our other type mismatch errors: OS determines the error message.
- Returns a 400
Cons:
- The error message is not intuitively helpful, but it might be fine. It says:
{'errors': True,
 'index_name': 'my-multimodal-index',
 'items': [{'_id': '1',
            'error': {'reason': 'object mapping for [__chunks.my_field_1] tried to '
                                'parse field [image] as object, but found a '
                                'concrete value',
                      'type': 'mapper_parsing_exception'},
            'status': 400},
           {'_id': '2', 'result': 'created', 'status': 201}],
 'processingTimeMs': 10380.51678500051}
Solution 2. Catch the problem in backend -> add_customer_field_properties and return our own error:
We can do this simply by making sure there is no overlap between normal and multimodal mappings:
for field in customer_field_names:
    if field in multimodal_combination_fields:
        raise InvalidArgError()
Pros:
- We can make a much more helpful error message:
Status code: 400
Your add_documents call contains documents with field `my_field_1` as both text and multimodal object. A field may only be 1 of these. For more info on using multimodal indexing, see: https://marqo.pages.dev/0.0.17/#creating-and-searching-indexes-with-multimodal-combination-fields
- We can ensure no OS complications by rejecting the mistake on our end
Cons:
- The docs would never reach OpenSearch; the add docs call will fail.
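The overlap check in Solution 2 could look roughly like the sketch below. `InvalidArgError` is redefined here only as a stand-in so the example is self-contained, and `check_no_field_overlap` is a hypothetical helper name, not an existing Marqo function:

```python
class InvalidArgError(Exception):
    """Stand-in for Marqo's InvalidArgError, defined so the sketch runs on its own."""


def check_no_field_overlap(customer_field_names, multimodal_combination_fields):
    """Raise if a field is given both as normal content and as a multimodal object."""
    overlap = sorted(set(customer_field_names) & set(multimodal_combination_fields))
    if overlap:
        names = ", ".join(f"`{name}`" for name in overlap)
        raise InvalidArgError(
            f"Your add_documents call contains documents with field(s) {names} "
            "as both text and multimodal object. A field may only be one of these."
        )
```

Collecting every overlapping field before raising lets the user fix all conflicts in one pass rather than one per request.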
Feedback.
Option 2:
Because Marqo is eventually consistent, we can't guarantee that the index info is up-to-date. For example, another user deletes the index and recreates it with different mappings.
We can still provide this nice error, but we also need to handle cases where we don't detect the problem up front and have to propagate the Marqo-OS error (like option 1).
For Option 1, we can still translate the Marqo-OS error into something more helpful (__chunks is an internal concept that users may be unfamiliar with). For example:
Field `image` is given as an object, but it is already defined as another type.
Note: the Marqo-OS error parsing logic needs to be very resilient (catch all KeyErrors and TypeErrors while parsing), otherwise the parsing logic itself may result in an annoying-to-debug 5xx error.
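A minimal sketch of such resilient translation, assuming the per-item error shape shown in the Solution 1 output. The function name `friendly_reason`, the regex, and the fallback text are illustrative assumptions, not existing Marqo code:

```python
import re

# Matches the part of the OpenSearch reason that names the offending field.
_OBJECT_MISMATCH = re.compile(r"tried to parse field \[(?P<field>[^\]]+)\] as object")

FALLBACK = "Document could not be indexed."


def friendly_reason(item):
    """Translate one bulk-response item into a user-facing message.

    Any KeyError or TypeError while parsing yields the fallback text, so a
    malformed response cannot surface as a 5xx from the parsing step.
    """
    try:
        error = item["error"]
        reason = error["reason"]
        if not isinstance(reason, str):
            return FALLBACK
        if error["type"] == "mapper_parsing_exception":
            match = _OBJECT_MISMATCH.search(reason)
            if match:
                return (
                    f"Field `{match.group('field')}` is given as an object, "
                    "but it is already defined as another type."
                )
        # Unrecognised errors pass through unchanged.
        return reason
    except (KeyError, TypeError):
        return FALLBACK
```

Falling back to the raw reason for unrecognised errors keeps the behaviour of Option 1 for anything the translation does not cover.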