Comments (5)
Let's reconsider a Jina DocumentStore, and possibly a "JinaRetriever" on top of it later.
Some rough steps:
- Check out the Jina CRUD REST API endpoints: https://api.jina.ai/rest/#operation/search_api_search_post
- Implement a JinaDocumentStore with the basic methods:
- write_documents()
- get_document_by_id()
- query()
- query_by_embedding()
- get_all_documents()
- delete_all_documents()
(see also the BaseDocumentStore for expected signatures)
- Supply a simple snippet to start a Jina test instance in a Docker container and test the Haystack integration
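To make the method list above concrete, here is a rough skeleton. It is only a sketch: the `Document` dataclass, the argument names, and the in-memory dict are assumptions for illustration, the dict stands in for the actual calls to a Jina instance, and the real signatures should be taken from BaseDocumentStore.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for Haystack's document class."""
    id: str
    text: str
    embedding: Optional[List[float]] = None
    meta: Dict[str, str] = field(default_factory=dict)


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class JinaDocumentStore:
    """Skeleton only: a dict replaces the requests to a Jina backend."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def write_documents(self, documents: List[Document]) -> None:
        for doc in documents:
            self._docs[doc.id] = doc

    def get_document_by_id(self, id: str) -> Optional[Document]:
        return self._docs.get(id)

    def get_all_documents(self) -> List[Document]:
        return list(self._docs.values())

    def delete_all_documents(self) -> None:
        self._docs.clear()

    def query(self, query: str, top_k: int = 10) -> List[Document]:
        # Placeholder ranking: number of query terms found in the text.
        terms = query.lower().split()
        scored = [
            (sum(t in d.text.lower().split() for t in terms), d)
            for d in self._docs.values()
        ]
        hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
        return [d for _, d in hits[:top_k]]

    def query_by_embedding(self, query_emb: List[float], top_k: int = 10) -> List[Document]:
        # Rank documents that have an embedding by cosine similarity.
        with_emb = [d for d in self._docs.values() if d.embedding is not None]
        with_emb.sort(key=lambda d: _cosine(query_emb, d.embedding), reverse=True)
        return with_emb[:top_k]
```

In a real implementation each method body would be replaced by a request to the Jina instance; the point here is only the shape of the class.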
Contributions very welcome :)
from haystack.
Hey @hanxiao ,
Sure, happy to explore some synergies.
One idea could be to combine the QA functionality of Haystack with the efficient backend implemented in Jina (incl. DB, pipelines, deployment ...).
Two options come to mind:
A. Add Jina as an alternative to Elasticsearch in Haystack
- Implement a JinaDocumentStore in Haystack (to index text documents / embeddings / ...)
- Implement a JinaRetriever to find candidate documents via Jina's encoders etc.
- Combine it with Haystack's Reader to get a Finder
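As a rough illustration of how the pieces in option A would fit together, here is a minimal sketch. The `Finder` class below, its `get_answers` method, and the toy retrieve/read functions are hypothetical stand-ins, not Haystack's or Jina's actual interfaces.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callables: a retriever maps (question, top_k) to candidate texts,
# a reader maps (question, candidates) to an (answer, score) pair or None.
Retrieve = Callable[[str, int], List[str]]
Read = Callable[[str, List[str]], Optional[Tuple[str, float]]]


class Finder:
    """Glue object: retrieve candidates first, then let the reader pick an answer."""

    def __init__(self, retrieve: Retrieve, read: Read) -> None:
        self.retrieve = retrieve
        self.read = read

    def get_answers(self, question: str, top_k_retriever: int = 10) -> Optional[Tuple[str, float]]:
        candidates = self.retrieve(question, top_k_retriever)
        return self.read(question, candidates)


CORPUS = ["jina is a search framework", "haystack answers questions"]


def toy_retrieve(question: str, top_k: int) -> List[str]:
    # Stand-in for a JinaRetriever: keep texts sharing a word with the question.
    words = set(question.lower().split())
    return [text for text in CORPUS if words & set(text.split())][:top_k]


def toy_read(question: str, candidates: List[str]) -> Optional[Tuple[str, float]]:
    # Stand-in for a Reader: a real one would extract an answer span with a QA model.
    return (candidates[0], 1.0) if candidates else None


finder = Finder(toy_retrieve, toy_read)
answer = finder.get_answers("what is jina")
```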
B. Add Haystack to Jina as "Encoders"
This is less clear to me, as I haven't investigated Jina in detail yet. From our discussion, I understood that you would first need to extend the pipeline in Jina to allow an "extra step" after retrieval of the search results that executes our Reader to extract the granular span answer. A second modification might be to support two encoders (one for questions, one for documents). A rough sketch could be:
- Use Haystack model(s) as encoders in Jina (one for questions, one for docs)
- Retrieve search results "as usual" via Jina
- Add an extra container with one of Haystack's Readers that gets the retrieved results and extracts the span answer
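The flow in option B could look roughly like this. Everything here is a toy assumption for illustration: the bag-of-words encoder, the `retrieve` and `reader_step` functions, and the fixed vocabulary are not Jina or Haystack code, and the same encoder is reused for questions and documents only to keep the sketch short.

```python
from typing import Callable, List, Optional

Vector = List[float]
Encoder = Callable[[str], Vector]

VOCAB = ["jina", "haystack", "search", "answer"]  # toy vocabulary, illustration only


def bag_of_words(text: str) -> Vector:
    """Toy encoder: counts of each vocabulary word in the text."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(question: str, docs: List[str],
             question_encoder: Encoder, doc_encoder: Encoder,
             top_k: int = 2) -> List[str]:
    # Two separate encoders: one for the question, one for the documents.
    q = question_encoder(question)
    ranked = sorted(docs, key=lambda d: dot(q, doc_encoder(d)), reverse=True)
    return ranked[:top_k]


def reader_step(question: str, hits: List[str]) -> Optional[str]:
    # Placeholder for the extra Reader container: a real Reader would run
    # a QA model over the hits and return an answer span.
    return hits[0] if hits else None


docs = ["jina handles search", "haystack extracts the answer"]
hits = retrieve("which tool gives the answer haystack", docs,
                bag_of_words, bag_of_words, top_k=1)
result = reader_step("which tool gives the answer haystack", hits)
```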
What do you think?
from haystack.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs.
from haystack.
perfect, let me create a mirror ticket in our repo as well: jina-ai/jina#2128
from haystack.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.
from haystack.