
talkdai / dialog


RAG LLM Ops App for easy deployment and testing

Home Page: https://dialog.talkd.ai

License: MIT License

Dockerfile 2.38% Makefile 0.83% Python 93.18% Shell 2.60% Mako 1.01%
Topics: chatgpt, langchain, llm, nlp, nltk

dialog's Introduction

talkd/dialog logo

discord badge

talkd/dialog

For programmers interested in AI who are deploying RAGs without API-development experience, Dialog is an app that simplifies RAG deployment. It uses modern frameworks for the web layer and LLM interaction, letting you spend less time coding and more time training your model.

This repository serves as an API focused on letting you deploy any LLM you want, based on the structure provided by dialog-lib.

We started out focused on humanizing RAGs (keeping the answer scope tightly delimited and human-sounding), but we are expanding toward broader approaches to improving RAG deployment and maintenance for everyone. Check out our current architecture below and, for more information, check our documentation!

Running the project for the first time

We assume you are familiar with Docker; if you are not, this amazing video tutorial will help you get started. For a more detailed setup, follow the Quick Start section in our docs.

To run the project for the first time, you need to have Docker and Docker Compose installed on your machine. If you don't have them, follow the instructions on the Docker website.

After installing Docker and Docker Compose, clone the repository and run the following command:

cp .env.sample .env

Inside the .env file, set the OPENAI_API_KEY variable with your OpenAI API key.

Then, run the following command:

docker-compose up

It will start two services:

  • db: where the PostgreSQL database runs to support chat history and document retrieval for RAG;

  • dialog: the service with the API.

Tutorials

We've written some tutorials to help you get started with the project:

Also, you can check our documentation for more information.

Our Sponsors

We are thankful for all the support we receive from our sponsors, who help us keep the project running and improving. If you want to become a sponsor, check out our Sponsors Page.

Current Sponsors:

GitHub Accelerator · Buser

Using Open-WebUI as front-end

In partnership with Open-WebUI, we made their chat interface available as our own front-end as well. If you want to use it in your own application, run Docker Compose with the docker-compose-open-webui.yml file:

docker-compose -f docker-compose-open-webui.yml up

Maintainers

We are thankful for all of the contributions we receive, mostly reviewed by our awesome team of maintainers:

made with 💜 by talkd.ai

dialog's People

Contributors

vmesel, avelino, lgabs, walison17, dependabot[bot], mcbianconi, cosmastech, leogregianin, arthurmor4is


dialog's Issues

Don't externalise that it's a bot

The bot must respond like a human; it's fine for it not to know an answer, but it should not externalise that it's an AI model.


{"message":"Desculpe, mas como sou um modelo de linguagem de IA, não tenho acesso a informações específicas como números de telefone. Recomendo que você verifique o site da empresa ou entre em contato com o serviço de atendimento ao cliente para obter o número correto do WhatsApp."}

WhatsApp support

WhatsApp documentation

EPIC: #37

{
  "object": "whatsapp_business_account",
  "entry": [{
      "id": "WHATSAPP_BUSINESS_ACCOUNT_ID",
      "changes": [{
          "value": {
              "messaging_product": "whatsapp",
              "metadata": {
                  "display_phone_number": "PHONE_NUMBER",
                  "phone_number_id": "PHONE_NUMBER_ID"
              },
              # specific Webhooks payload            
          },
          "field": "messages"
        }]
    }]
}

Webhooks payloads can be up to 3MB.

More examples
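
For reference, here is a minimal sketch of how dialog could receive a payload in this shape. It is only an assumption of how the integration might look: the /webhooks/whatsapp path and the process_user_message hook are hypothetical, not part of dialog today.

# Sketch: FastAPI receiver for a WhatsApp Cloud API webhook payload.
# The route path and the process_user_message hook are hypothetical.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhooks/whatsapp")
async def whatsapp_webhook(request: Request):
    payload = await request.json()
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            if change.get("field") != "messages":
                continue
            value = change.get("value", {})
            for message in value.get("messages", []):
                if message.get("type") == "text":
                    sender = message.get("from")
                    text = message["text"]["body"]
                    # process_user_message(sender, text)  # hypothetical hook into dialog
    # WhatsApp expects a quick 200 response; heavy work should be deferred to a background task.
    return {"status": "received"}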

Extensible LLM template

I'm currently working on a way to extend and deploy any LLM on the Dialog API without having to reimplement or touch the existing code base.

The approach I'm aiming for is basically an abstract class with some default methods. Example:

from typing import List, Dict

from langchain.chains.llm import LLMChain
from langchain.memory.chat_memory import BaseChatMemory
from langchain.prompts import ChatPromptTemplate

class AbstractLLM:
    def __init__(self, config):
        """
        :param config: Configuration dictionary

        The constructor of the AbstractLLM class allows users to pass
        a configuration dictionary to the LLM. This configuration dictionary
        can be used to configure the LLM temperature, prompt and other
        necessities.
        """
        if config is None or not isinstance(config, dict):
            raise ValueError("Config must be a dictionary")

        self.config = config
        self.prompt = None

    def get_prompt(self, input) -> ChatPromptTemplate:
        """
        Function that generates the prompt for the LLM.
        """
        raise NotImplementedError("Prompt must be implemented")

    @property
    def memory(self) -> BaseChatMemory:
        """
        Returns the memory instance
        """
        raise NotImplementedError("Memory must be implemented")

    @property
    def llm(self) -> LLMChain:
        """
        Returns the LLM instance. If a memory instance is provided,
        the LLM instance should be initialized with the memory instance.

        If no memory instance is provided, the LLM instance should be
        initialized without a memory instance.
        """
        raise NotImplementedError("LLM must be implemented")

    def preprocess(self, input: str) -> str:
        """
        Function that pre-process the LLM input, enabling users
        to modify the input before it is processed by the LLM.

        This can be used to add context or prefixes to the LLM.
        """
        return input

    def generate_prompt(self, input: str) -> str:
        """
        Function that generates the prompt using PromptTemplate for the LLM.
        """
        return input

    def postprocess(self, input: str) -> str:
        """
        Function that post-process the LLM output, enabling users
        to modify the output before it is returned to the user.
        """
        return input

    def process(self, input: str):
        """
        Function that encapsulates the pre-processing, processing and post-processing
        of the LLM.
        """
        processed_input = self.preprocess(input)
        output = self.llm({
            "user_message": processed_input,
        })
        processed_output = self.postprocess(output)
        return processed_output

And our LLM as an example (it's not yet tested):

from typing import List

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.memory.chat_memory import BaseChatMemory
from langchain.prompts import (ChatPromptTemplate, HumanMessagePromptTemplate,
                               MessagesPlaceholder,
                               SystemMessagePromptTemplate)
from sqlalchemy import select

from learn.idf import categorize_conversation_history
from llm.llm import AbstractLLM
from llm.memory import generate_memory_instance
from models import CompanyContent
from models.db import session
from settings import PROMPT, EMBEDDINGS_LLM, FALLBACK_PROMPT


def generate_embeddings(documents: List[str]):
    """
    Generate embeddings for a list of documents
    """
    return EMBEDDINGS_LLM.embed_documents(documents)

def generate_embedding(document: str):
    """
    Generate embeddings for a single instance of document
    """
    return EMBEDDINGS_LLM.embed_query(document)

def get_most_relevant_contents_from_message(message, top=5):
    message_embedding = generate_embedding(message)
    possible_contents = session.scalars(
        select(CompanyContent)
        .filter(CompanyContent.embedding.l2_distance(message_embedding) < 1)
        .order_by(CompanyContent.embedding.l2_distance(message_embedding).asc())
        .limit(top)
    ).all()
    return possible_contents


class DialogLLM(AbstractLLM):
    @property
    def memory(self) -> BaseChatMemory:
        if self.config.get("session_id"):
            return generate_memory_instance(
                session_id=self.config.get("session_id"),
                parent_session_id=self.config.get("parent_session_id")
            )
        return None

    def preprocess(self, input: str) -> str:
        """
        Function that pre-process the LLM input, enabling users
        to modify the input before it is processed by the LLM.

        This can be used to add context or prefixes to the LLM.
        """
        return input


    def generate_prompt(self, input: str):
        relevant_contents = get_most_relevant_contents_from_message(input, top=1)

        if len(relevant_contents) == 0:
            prompt_templating = [
                SystemMessagePromptTemplate.from_template(FALLBACK_PROMPT),
                HumanMessagePromptTemplate.from_template("{user_message}"),
            ]
            relevant_contents = []
        else:
            suggested_content = "Contexto: \n".join(
                [f"{c.question}\n{c.content}\n" for c in relevant_contents]
            )

            prompt_templating = [
                SystemMessagePromptTemplate.from_template(PROMPT.get("header")),
                MessagesPlaceholder(variable_name="chat_history"),
            ]

        if len(relevant_contents) > 0:
            prompt_templating.append(
                SystemMessagePromptTemplate.from_template(
                    f"{PROMPT.get('suggested')}. {suggested_content}"
                )
            )

        question_text = PROMPT.get("question_signalizer")
        prompt_templating.append(HumanMessagePromptTemplate.from_template(f"{question_text}" + ":\n{user_message}"))

        self.prompt = ChatPromptTemplate(messages=prompt_templating)

    @property
    def llm_chain(self) -> LLMChain:
        llm_config = {
            k: v
            for k, v in self.config.items()
            if k in ["openai_api_key", "model_name", "temperature"]
        }
        conversation_options = {
            "llm": ChatOpenAI(
                **llm_config
            ),
            "prompt": self.prompt,
            "verbose": self.config.get("verbose", False)
        }

        if self.memory:
            buffer_config = {
                "chat_memory": self.memory,
                "memory_key": "chat_history",
                "return_messages": True,
                "k": self.config.get("memory_size", 5)
            }
            conversation_options["memory"] = ConversationBufferWindowMemory(
                **buffer_config
            )

        return LLMChain(**conversation_options)

    def postprocess(self, input: str) -> str:
        """
        Function that post-process the LLM output, enabling users
        to modify the output before it is returned to the user.
        """
        return input

    def process(self, input: str):
        """
        Function that encapsulates the pre-processing, processing and post-processing
        of the LLM.
        """
        processed_input = self.preprocess(input)
        self.generate_prompt(processed_input)
        output = self.llm_chain({
            "user_message": processed_input,
        })
        processed_output = self.postprocess(output)
        return processed_output
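
For context, a short usage sketch of the proposed class (untested, like the class itself; the config keys simply mirror the ones read above):

# Usage sketch for the proposed DialogLLM (untested).
config = {
    "openai_api_key": "...",         # forwarded to ChatOpenAI
    "model_name": "gpt-3.5-turbo",
    "temperature": 0.2,
    "session_id": "abc-123",         # enables the memory property
    "memory_size": 5,
    "verbose": False,
}

llm = DialogLLM(config)
answer = llm.process("How do I change the date of my trip?")
print(answer)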

temperature in toml

The temperature is hard-coded to 0.2; we need to move it to the configuration (in TOML), making the project customizable:

temperature=0.2,

[prompt]
temperature = 0.2

If not set, the temperature should default to 0.2.
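
A minimal sketch of how the lookup with a default could work, assuming the [prompt] table above and a tomllib-style loader (the config file name is illustrative):

# Sketch: read the temperature from the TOML config, falling back to 0.2.
import tomllib  # Python 3.11+

with open("project.toml", "rb") as f:  # file name is illustrative
    config = tomllib.load(f)

temperature = config.get("prompt", {}).get("temperature", 0.2)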

Add response time into logs

Large prompts or extensive documents (knowledge base) can significantly lengthen the LLM's response time for certain tasks, and it would be helpful to observe this response time in the terminal while running uvicorn. It will be common for a developer to try different prompts and check the response time; a logging sketch follows.
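
A sketch of one way to surface this while running uvicorn, using a FastAPI middleware (the logger name is illustrative):

# Sketch: log the response time of every request.
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("dialog.timing")  # illustrative logger name
app = FastAPI()

@app.middleware("http")
async def log_response_time(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed = time.perf_counter() - start
    logger.info("%s %s took %.2fs", request.method, request.url.path, elapsed)
    return response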

SAWarning: Did not recognize type 'vector' of column 'embedding'

/Users/avelino/projects/avelino/talkd/dialog/src/dialog/models/__init__.py:34: SAWarning: Did not recognize type 'vector' of column 'embedding'
  __table__ = Table("contents", metadata, autoload_with=engine)
Traceback (most recent call last):
  File "/Users/avelino/projects/avelino/talkd/dialog/.venv/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3790, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'primary_key'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/avelino/projects/avelino/talkd/dialog/src/load_csv.py", line 65, in <module>
    load_csv_and_generate_embeddings(args.path)
  File "/Users/avelino/projects/avelino/talkd/dialog/src/load_csv.py", line 44, in load_csv_and_generate_embeddings
    new_keys = set(df["primary_key"]) - set(df_in_db["primary_key"])
                                            ~~~~~~~~^^^^^^^^^^^^^^^
  File "/Users/avelino/projects/avelino/talkd/dialog/.venv/lib/python3.11/site-packages/pandas/core/frame.py", line 3893, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/avelino/projects/avelino/talkd/dialog/.venv/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3797, in get_loc
    raise KeyError(key) from err
KeyError: 'primary_key'

plugin: error on initial load

We're having a problem starting up dialog (the API); it seems to me that our "documentation" (README.md, TOML configuration topic) is out of date.

prompt:

[prompt]
header = """Você é um operador de atendimento chamada Lerolero"""

suggested = "Aqui está um possível conteúdo que pode ajudar o usuário de uma melhor forma."

backtrace:

  File "/usr/local/Cellar/[email protected]/3.11.7_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.p
y", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/avelino/projects/avelino/talkd/dialog/src/main.py", line 6, in <module>
    from dialog.llm import get_llm_class
  File "/Users/avelino/projects/avelino/talkd/dialog/src/dialog/llm/__init__.py", line 6, in <module>
    from .default import DialogLLM
  File "/Users/avelino/projects/avelino/talkd/dialog/src/dialog/llm/default.py", line 6, in <module>
    from dialog.llm.memory import generate_memory_instance
  File "/Users/avelino/projects/avelino/talkd/dialog/src/dialog/llm/memory.py", line 4, in <module>
    from dialog.models import Chat, ChatMessages
  File "/Users/avelino/projects/avelino/talkd/dialog/src/dialog/models/__init__.py", line 3, in <module>
    from .db import engine, Base
  File "/Users/avelino/projects/avelino/talkd/dialog/src/dialog/models/db.py", line 1, in <module>
    from dialog.settings import DATABASE_URL
  File "/Users/avelino/projects/avelino/talkd/dialog/src/dialog/settings.py", line 16, in <module>
    PLUGINS = config("PLUGINS", cast=Csv(), default=[])
...
...
...
AttributeError: 'list' object has no attribute 'read'

add timestamp on messages table

Currently we are saving messages without their datetime. We should figure out how to add this field to langchain's table; a migration sketch is below.
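
A sketch of what the change could look like as an Alembic migration, assuming langchain's default message_store table (both the table and column names here are assumptions):

# Sketch: add a created_at column to langchain's message table.
import sqlalchemy as sa
from alembic import op

def upgrade():
    op.add_column(
        "message_store",  # langchain's default table name (assumption)
        sa.Column(
            "created_at",
            sa.DateTime(timezone=True),
            server_default=sa.func.now(),
            nullable=False,
        ),
    )

def downgrade():
    op.drop_column("message_store", "created_at")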

Broken run.sh

When I run docker compose up I receive the following error:

etc/run.sh: line 6: : command not found
etc/run.sh: line 11: : command not found

The problem happens because the condition if "${ENVVAR}"; then attempts to execute the content of the variable ENVVAR as a command instead of testing it (something like [ -n "${ENVVAR}" ] would test it instead).

langchain version upgrade, to `+v0.1.7`

Today we are using version ^0.0.3333 of LangChain, and we need to update to the latest version. This update may break many implementations made in dialog.

It is a "risk" that comes with using a library developed by a large community.

send memory to LLM

We're not sending the memory to the LLM, i.e. the LLM doesn't have the conversation history, so it can't respond in a personalized way based on that history.

reference https://python.langchain.com/docs/modules/memory/adding_memory

The PostgresChatMessageHistory class is not an instance of BaseMemory; it inherits from ABC (abstract class).

Possible solution

Implement conversion of the message list into an instance of a BaseMemory class, as sketched below.
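
A sketch of that conversion, based on langchain's documented pattern of wrapping a chat message history in a memory class (the connection string and session id are placeholders):

# Sketch: wrap PostgresChatMessageHistory in a memory implementation so the
# chain receives the conversation history.
from langchain.memory import ConversationBufferMemory, PostgresChatMessageHistory

history = PostgresChatMessageHistory(
    connection_string="postgresql://user:pass@localhost/dialog",  # placeholder
    session_id="session-id-here",                                 # placeholder
)

memory = ConversationBufferMemory(
    chat_memory=history,       # the stored message list backs the memory
    memory_key="chat_history",
    return_messages=True,
)
# memory can now be passed to LLMChain(..., memory=memory).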

model in toml

The model is hard-coded to gpt-3.5-turbo; we need to move it to the configuration (in TOML), making the project customizable:

model_name="gpt-3.5-turbo",

[prompt]
model_name = "gpt-3.5-turbo"

If not set, the model should default to gpt-3.5-turbo.
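
The same default-handling as the temperature issue applies; a small sketch of where the configured value would be consumed (the helper name is illustrative):

# Sketch: build the chat model from the config, with a default model name.
from langchain.chat_models import ChatOpenAI

def build_chat_model(config: dict) -> ChatOpenAI:
    return ChatOpenAI(
        model_name=config.get("model_name", "gpt-3.5-turbo"),
        temperature=config.get("temperature", 0.2),
    )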

empty database is not populating the db

Traceback (most recent call last):
  File "/Users/avelino/projects/avelino/talkd/dialog/.venv/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3790, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'primary_key'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/avelino/projects/avelino/talkd/dialog/src/load_csv.py", line 69, in <module>
    load_csv_and_generate_embeddings(args.path)
  File "/Users/avelino/projects/avelino/talkd/dialog/src/load_csv.py", line 48, in load_csv_and_generate_embeddings
    new_keys = set(df["primary_key"]) - set(df_in_db["primary_key"])
                                            ~~~~~~~~^^^^^^^^^^^^^^^
  File "/Users/avelino/projects/avelino/talkd/dialog/.venv/lib/python3.11/site-packages/pandas/core/frame.py", line 3893, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/avelino/projects/avelino/talkd/dialog/.venv/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3797, in get_loc
    raise KeyError(key) from err
KeyError: 'primary_key'
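
When the contents table is empty, the DataFrame read back from the database has no columns, so indexing df_in_db["primary_key"] raises the KeyError above. A sketch of a guard for that case (variable names mirror the traceback):

# Sketch: compute the keys to insert, tolerating an empty database table.
import pandas as pd

def keys_to_insert(df: pd.DataFrame, df_in_db: pd.DataFrame) -> set:
    if df_in_db.empty or "primary_key" not in df_in_db.columns:
        # Empty table: everything in the CSV is new.
        return set(df["primary_key"])
    return set(df["primary_key"]) - set(df_in_db["primary_key"])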

Adds API Token support

We need to add some auth to our API so we can keep it secure. Maybe a JWT would work here, or some other sort of bearer token.
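
A sketch of a simple bearer-token check with FastAPI (the API_TOKEN setting is an assumption; JWT validation could replace the plain comparison later):

# Sketch: protect routes with a static bearer token read from the environment.
import os

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

API_TOKEN = os.environ.get("API_TOKEN", "")  # assumed setting name
bearer_scheme = HTTPBearer()
app = FastAPI()

def require_token(credentials: HTTPAuthorizationCredentials = Depends(bearer_scheme)):
    if not API_TOKEN or credentials.credentials != API_TOKEN:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid token")

@app.post("/chat/{chat_id}", dependencies=[Depends(require_token)])
async def post_message(chat_id: str, message: dict):
    ...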

Add hot reload for environment variables

Add hot reload for environment variables, making them dynamic: if a variable changes, the running system must pick up the newly assigned value.

This will enable plugin hot-swapping and other projects.

Broken docker image

When I try to run a container with the latest Docker image available on ghcr.io, I get the following error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/app/etc/run.sh": stat /app/etc/run.sh: no such file or directory: unknown.

prompt per sub-category

  • load the category and sub-category fields
  • create a node for the sub-category header in the configuration file (TOML)

Adds OpenAI API support

Basically, we need to add support for the OpenAI request and response formats, so that we emulate their API; a sketch is below.
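
A sketch of what emulating the chat completions endpoint could look like (the response fields follow OpenAI's public chat format; the internal call into dialog is only hinted at):

# Sketch: an OpenAI-style /v1/chat/completions endpoint in front of dialog.
import time
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    model: str
    messages: list[ChatMessage]

@app.post("/v1/chat/completions")
async def chat_completions(request: ChatCompletionRequest):
    user_message = request.messages[-1].content
    answer = f"(dialog answer for: {user_message})"  # would call dialog's LLM here
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": answer},
                "finish_reason": "stop",
            }
        ],
    }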

message bucket: collect messages and send in a single prompt

The POST /chat/{chat_id} endpoint receives one message at a time; when it receives one, it processes it with the LLM and returns a reply. Example:

  • QA 1: hello how are you?
  • QA 2: I need help, can you help me, please?
  • QA 3: how do I perform operation XYZ on the web page?

This conversation flow is common in a chat environment (e.g. WhatsApp), where the user splits their message across several lines. What the user actually wants is only the answer to QA 3; the other messages are introductory ("presentation").

As implemented today, we answer one message at a time:

  • Reply from QA1: Hi, how can I help?
  • Reply from QA2: It would be a pleasure to help you, how can I help you?
  • Reply from QA3: You should access, ... the answer to the question

The answer that matters is QA 3; QA 1 and QA 2 are "duplicated".

solution

Parameter in the endpoint (POST, create message) called message bucket, which activates intelligence to collect messages in the backend and make a single call to the LLM sending the collection of messages.

I can think of a solution that collects requests and, if no new message arrives within X seconds of the last one, calls the LLM with all of the pending messages aggregated; see the sketch below.

It's not necessarily the best solution, just the first that comes to mind; this issue is for discussing alternatives.
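
A sketch of the "wait X seconds after the last message" idea using asyncio (the delay value and the call_llm callback are illustrative):

# Sketch: buffer incoming messages per chat and flush them to the LLM only
# after a period of silence.
import asyncio
from collections import defaultdict

FLUSH_AFTER_SECONDS = 10  # illustrative value for X

buffers: dict[str, list[str]] = defaultdict(list)
flush_tasks: dict[str, asyncio.Task] = {}

async def call_llm(chat_id: str, messages: list[str]) -> None:
    combined = "\n".join(messages)
    print(f"LLM call for {chat_id}: {combined}")  # would call dialog's LLM here

async def on_message(chat_id: str, text: str) -> None:
    buffers[chat_id].append(text)
    # Restart the timer: only the last message of a burst triggers the LLM call.
    if chat_id in flush_tasks:
        flush_tasks[chat_id].cancel()
    flush_tasks[chat_id] = asyncio.create_task(flush_later(chat_id))

async def flush_later(chat_id: str) -> None:
    try:
        await asyncio.sleep(FLUSH_AFTER_SECONDS)
    except asyncio.CancelledError:
        return  # a newer message arrived; its task will do the flush
    pending, buffers[chat_id] = buffers[chat_id], []
    await call_llm(chat_id, pending)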

Broken PLUGINS default config

When I try to run the application with default PLUGINS config I receive the following error:

 File "/app/src/settings.py", line 16, in <module>
    PLUGINS = config("PLUGINS", cast=Csv(), default=[])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/decouple.py", line 248, in __call__
    return self.config(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/decouple.py", line 107, in __call__
    return self.get(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/decouple.py", line 101, in get
    return cast(value)
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/decouple.py", line 286, in __call__
    return self.post_process(transform(s) for s in splitter)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/decouple.py", line 286, in <genexpr>
    return self.post_process(transform(s) for s in splitter)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/shlex.py", line 300, in __next__
    token = self.get_token()
            ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/shlex.py", line 109, in get_token
    raw = self.read_token()
          ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/shlex.py", line 140, in read_token
    nextchar = self.instream.read(1)
               ^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'read'
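
The traceback shows decouple applying the Csv() cast to the default value itself, and shlex can only read strings, so a list default crashes. A sketch of the likely fix, keeping python-decouple and changing only the default:

# Sketch: an empty-string default is cast by Csv() into an empty list.
from decouple import Csv, config

PLUGINS = config("PLUGINS", cast=Csv(), default="")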

Classify conversations with LLMs

Currently, we generate keywords from the conversation as a starting point for classifying it. The classification of conversations can have several downstream uses, such as support allocation, prioritization of emergencies, and understanding which topics drive your support.

LLMs are also good at classification, and we can take advantage of langchain's implementation for tagging documents. The idea is to allow the user to specify a prompt and tag descriptions, and then, on each request, an async task will tag the conversation as we already do here.
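
A sketch of a prompt-driven classifier built from the same langchain pieces dialog already uses (the tag list and prompt wording are illustrative):

# Sketch: classify a conversation with a plain LLMChain; an async task could
# call this after each request.
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

TAGS = ["billing", "trip_change", "emergency", "other"]  # illustrative tags

prompt = ChatPromptTemplate.from_template(
    "Classify the conversation below with exactly one of these tags: "
    + ", ".join(TAGS)
    + ".\nConversation:\n{conversation}\nTag:"
)

classifier = LLMChain(llm=ChatOpenAI(temperature=0), prompt=prompt)

def classify_conversation(conversation: str) -> str:
    return classifier.run(conversation=conversation).strip()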

slow request for gpt when prompt is large

When the prompt is very large, the request to GPT becomes very slow.

example of what I'm calling a large prompt (2674 characters):

You are a virtual assistant who answers any question about traveling with XXX. 
You receive a question and a set of information related to XXX.
If the user's question requires you to provide specific information from the information provided, give your answer based only on the examples below (Documents section). Do NOT generate an answer that is NOT written in the examples (documents) provided.
If you do not find the answer to the user's question with the examples provided below, reply that you did not find the answer in the information and propose that they reformulate their query with more details or contact human support. In these cases, inject the text '__ANSWER_NOT_FOUND__' at the end of your answer, to indicate to the backend that this is a case in which you were unable to answer.
If you notice that the question deals with critical cases during the trip, such as accidents, mechanical failure of the bus (when the bus "breaks down"), harassment, fights or road checks, regardless of whether you were able to answer or not, inject the term '__EMERGENCY__' at the end of the text for our backend to capture this case.
Use bullet points if you need to make a list, only if necessary.

Recalling the business model: XXX connects travelers with executive charter companies ('XXX Passage' sales model or charter model) to sell trips. 
We also have a second business model where we resell tickets from bus companies (Ticket Resale model), so be aware of the different rules when this is the case.
Therefore, whenever the answer has different rules for these two business models, explain both rules clearly, preferably using bullet points. In the charter model, 
some partners accept that we 'plot' the bus with XXX's branding (in pink with our logo), but other partners use buses with their own existing branding.

Use this XXX manifesto as inspiration for Buser's role in society:
'We believe that life is better when we have choices. 
In XXX, the vast majority of cities are connected by just one bus company, and our airports and railways are insufficient. As a result, we are forced to pay dearly for the lack of choice. Many Brazilians take their chances with clandestine transportation because they can't afford traditional options.
XXX is a safe and modern alternative. We connect people who want to travel to the same destination with executive charter companies. Our shared and sustainable technology promotes mobility in Brazil, creating a new option for safe, quality transportation at fair prices.
As has already happened with private app-based transportation in cities, it's time for intercity transportation to change for the better. And for good.

Customizable CORS configurations

I think we can have some CORS configurations in settings.py, such as CORS_ALLOW_ORIGINS, CORS_ALLOW_METHODS and CORS_ALLOW_HEADERS; a sketch is below.
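
A sketch of how those settings could be wired in (setting names follow this issue; the permissive defaults are an assumption):

# Sketch: configurable CORS read from the environment via decouple.
from decouple import Csv, config
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

CORS_ALLOW_ORIGINS = config("CORS_ALLOW_ORIGINS", cast=Csv(), default="*")
CORS_ALLOW_METHODS = config("CORS_ALLOW_METHODS", cast=Csv(), default="*")
CORS_ALLOW_HEADERS = config("CORS_ALLOW_HEADERS", cast=Csv(), default="*")

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=CORS_ALLOW_ORIGINS,
    allow_methods=CORS_ALLOW_METHODS,
    allow_headers=CORS_ALLOW_HEADERS,
)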

Add support for multi-dataset data

Currently, if you want to load data into dialog, you must use a single CSV that is the source of truth for the table you are loading. We could add a new approach enabling multiple CSVs, i.e., so we can split datasets into FAQ, Technical Questions, Historical Questions, etc.
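
A sketch of one way to accept several CSVs while keeping a dataset label per row (the dataset column and directory layout are assumptions):

# Sketch: load several CSVs, tagging each row with its dataset name.
import glob
import os

import pandas as pd

def load_datasets(directory: str) -> pd.DataFrame:
    frames = []
    for path in glob.glob(os.path.join(directory, "*.csv")):
        df = pd.read_csv(path)
        # e.g. faq.csv -> dataset "faq"
        df["dataset"] = os.path.splitext(os.path.basename(path))[0]
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# load_datasets("data/") could then feed the existing embedding pipeline.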

webhook support

In order for talkd/dialog to be a simple-to-use solution (API), we need to implement webhook support for "market standard" solutions, i.e. support the format of the main tools, for example:

  • whatsapp
  • openai

This could make adopting talkd/dialog as simple as changing a webhook URL and tokens, so that everything "magically" continues to work.

hallucination: when asking a question outside the content base, we must limit the answer

QA: how to make carrot cake?
answer: To make a carrot cake, you will need the following ingredients: grated carrots, flour, sugar, eggs, vegetable oil, baking powder, baking soda, cinnamon, and salt. Mix all the dry ingredients together in one bowl, and the wet ingredients in another. Then, gradually add the dry mixture to the wet mixture, stirring until well combined. Pour the batter into a greased cake pan and bake in a preheated oven at 350°F (175°C) for about 30-35 minutes, or until a toothpick inserted into the center comes out clean. Let it cool before frosting with cream cheese frosting. Enjoy!

If the knowledge base covers how to prepare a cake with that ingredient, it's OK to give a cake recipe using it; but if this is a bus company and the knowledge base has nothing related to carrot cake, I'd expect an answer along these lines:

oops, I think you're asking a question that's way out of the context of a service, is there anything else I can help you with?

Create migrations for DB

We need a database migration service so we can deploy changes to tables without dropping them.

Add link to data

We need to support links in the embeddings table so we can send the user a reference where they can read more about an issue/article; a sketch is below.
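
A sketch of the column addition on the contents table (Alembic-style; making the column nullable is an assumption):

# Sketch: add a nullable link column to the contents table so answers can
# cite a source URL.
import sqlalchemy as sa
from alembic import op

def upgrade():
    op.add_column("contents", sa.Column("link", sa.String(), nullable=True))

def downgrade():
    op.drop_column("contents", "link")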

OpenAI sensitive data is putting `XXXXXXXXXX`

QA: do you have WhatsApp?
answer: I apologize for the confusion. Buser's WhatsApp number is XXXXXXXXXX. Feel free to contact us there. We're here to help!


docker: build error

errors:

~/p/a/t/dialog ❯❯❯ docker compose up                                                                                                                                                                                        ...
...
...
 => CACHED [web stage-1 3/6] COPY --from=dependencies /dependencies/requirements.txt ./requirements.txt                                                                                                                        0.0s
 => ERROR [web stage-1 4/6] COPY /bundle/ /bundle/                                                                                                                                                                             0.0s
------
 > [web stage-1 4/6] COPY /bundle/ /bundle/:
------
failed to solve: failed to compute cache key: failed to calculate checksum of ref b9ec112a-a178-474a-9e8e-ed1fa40ad3bb::dq7mjpoo4myi052qz0z310xmq: "/bundle": not found
~/p/a/t/dialog ❯❯❯                                                                                                                                                     
dialog-web-1  |   File "/app/main.py", line 6, in <module>
dialog-web-1  |     from .models.db import session, engine
dialog-web-1  | ImportError: attempted relative import with no known parent package
dialog-web-1 exited with code 1

Needed:

  • when the container comes up, it must run load_data (from the Makefile) in the Docker ENTRYPOINT
    • update the database with the new embeddings
    • if there is an "index", it should be updated as well
