GithubHelp home page GithubHelp logo

Comments (2)

julian-risch avatar julian-risch commented on June 8, 2024 1

@nickprock In Haystack 2.0, you can just set the id of the document. So you could have a custom component in an indexing pipeline just before the DocumentWriter or you could create a wrapper around one of the other components, such as a FileConverter.

A component could look like the following

from copy import deepcopy
from typing import List
from haystack import Document, component

@component
class DocumentIDCreator:
    # This is a pass-through component that returns the same Documents but with different ids
    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]):
        updated_documents = []
        for doc in documents:
            id = calculate_id(doc, id_hash_keys) # using your code above
            updated_documents.append(Document(id=id, content=doc.content, meta=deepcopy(doc.meta), embedding=doc.embedding))

        return {"documents": updated_documents}

Please let us know whether that can solve your problem so that we can close the issue. 🙂

from haystack.

nickprock avatar nickprock commented on June 8, 2024 1

Thanks @julian-risch

from haystack.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.