cassioml / cassio
A framework-agnostic Python library to seamlessly integrate Cassandra with ML/LLM/genAI workloads
License: Apache License 2.0
Walkthrough of user experience - https://drive.google.com/file/d/1w5otdPItPGA9tDo-j282P1UCdDT6z2og/view
When developing the NoSQL assistant, the prompt template that was used did not work well: directives such as "Use information from the vector search results to answer the question; otherwise answer 'I don't know'" were not followed, because the template did not properly delimit the vector search results with ''' quoting.
This is not a straightforward task, because the prompt template needs to work under various scenarios, not just Q&A (the user might simply be chatting with the bot and not asking a question at all).
The prompt template will also have to take into account items such as chat history and the ability to perform caching.
This should be straightforward: a key-value store backed by Cassandra, added to the kvstore implementations in LangChain.
Why should the session_id (e.g. the user identity) be specified only at class instantiation time?
If my web app serves thousands of users, I have to instantiate thousands of these classes.
Consider adding a session_id parameter to the methods (get_messages, put, etc.) so that a single instance will "statelessly" work on the whole table and serve all users, no?
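A minimal sketch of what such a "stateless" interface could look like (the class name, method names, and table schema below are all hypothetical, assuming a table partitioned by session_id):

```python
from typing import List


class StatelessChatHistory:
    """One instance serves all users: session_id is a method parameter,
    not fixed at construction time (hypothetical sketch)."""

    def __init__(self, session, keyspace: str, table: str) -> None:
        self.session = session
        self.keyspace = keyspace
        self.table = table

    def get_messages(self, session_id: str) -> List[str]:
        # Each call targets one user's partition; the instance holds no user state.
        rows = self.session.execute(
            f"SELECT body FROM {self.keyspace}.{self.table} WHERE session_id = %s",
            (session_id,),
        )
        return [row.body for row in rows]

    def put(self, session_id: str, message: str) -> None:
        # message_id as a timeuuid (via now()) is an assumption of this sketch.
        self.session.execute(
            f"INSERT INTO {self.keyspace}.{self.table} "
            f"(session_id, message_id, body) VALUES (%s, now(), %s)",
            (session_id, message),
        )
```

A single such instance could then back a whole web app, with the per-request session_id threaded through each call.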
Add simple validation code to prevent people from passing arbitrary values there, or even attempting CQL injection!
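One possible shape for such a guard (a sketch, not the actual cassIO code; the 48-character limit matches Cassandra's identifier length limit, and the accepted pattern is an assumption):

```python
import re

# Accept only plain identifiers (letters, digits, underscores, up to 48 chars),
# so that values interpolated into CQL statements cannot smuggle in clauses.
_IDENTIFIER_RE = re.compile(r"^[a-zA-Z0-9_]{1,48}$")


def validate_identifier(value: str, what: str = "identifier") -> str:
    """Raise ValueError unless `value` is a safe, plain identifier."""
    if not _IDENTIFIER_RE.match(value):
        raise ValueError(f"Invalid {what}: {value!r}")
    return value
```

For example, `validate_identifier("chat_history")` passes it through unchanged, while an attempted injection like `"x; DROP TABLE y"` raises.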
Map the space of metadata exact search combined with ANN and redesign the vector class (or a variation thereof) so that it will support that kind of search.
Possibly compare with the metadata capabilities other vector DBs offer and try to make them available at cassIO level.
Currently, LangChain out of the box only allows inserting a single column with a single embedding.
Add the ability to store multiple columns of data into a vector store at the same time.
Currently: each table abstraction class requires session and keyspace.
Proposal: make them optional and have them default to a cassio-global session & keyspace.
This would be set with cassio.init(DB parameters); this init method would have various forms, essentially being a friction-removal utility function (both for Cassandra clusters and cloud connections).
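A sketch of the fallback pattern this proposal describes (function and variable names here are illustrative, not the final API):

```python
# Module-level defaults, set once via init() and used as fallbacks
# by every table abstraction that omits session/keyspace.
_global_session = None
_global_keyspace = None


def init(session=None, keyspace=None):
    """Store a process-wide default session and keyspace."""
    global _global_session, _global_keyspace
    _global_session = session
    _global_keyspace = keyspace


def resolve_session_and_keyspace(session=None, keyspace=None):
    """Explicit arguments win; otherwise fall back to the init() globals."""
    s = session if session is not None else _global_session
    k = keyspace if keyspace is not None else _global_keyspace
    if s is None or k is None:
        raise ValueError(
            "No session/keyspace available: pass them explicitly or call init() first."
        )
    return s, k
```

Table classes would then call `resolve_session_and_keyspace(...)` in their constructors, so `session` and `keyspace` become optional everywhere.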
Vectors are large and don't change often; the overhead of doing 2x the work to see if the replicas agree is not a good tradeoff.
If we can easily make it configurable, great; if not, just use LOCAL_ONE across the board.
Classes abstracting table access are currently ad-hoc, designed after LangChain's needs. This task is about capturing the generalizations and sharing the code in a system of mixins/subclasses (TBD) with hierarchical responsibilities regarding CQL generation and method parameters, e.g. vector/nonvector, clustering/nonclustering, etc.
Langchain uses some of these, but there is a "rectangle to complete" conceptually:
Draft for a class system (conceptually):
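One way the draft could be sketched in code (all class names, column names, and the vector dimension are hypothetical, purely to illustrate the mixin idea):

```python
class BaseTable:
    """Plain key/value table: row_id is the whole primary key."""

    def _columns(self):
        return ["row_id TEXT", "body_blob TEXT"]

    def _primary_key(self):
        return "PRIMARY KEY (row_id)"


class ClusteredMixin:
    """Adds a partition column; row_id becomes a clustering column."""

    def _columns(self):
        return ["partition_id TEXT"] + super()._columns()

    def _primary_key(self):
        return "PRIMARY KEY ((partition_id), row_id)"


class VectorMixin:
    """Adds an embedding column (dimension is an assumption of this sketch)."""

    vector_dimension = 1536

    def _columns(self):
        return super()._columns() + [
            f"embedding VECTOR<FLOAT, {self.vector_dimension}>"
        ]


class ClusteredVectorTable(ClusteredMixin, VectorMixin, BaseTable):
    """Composes mixins along the MRO to generate its CREATE TABLE CQL."""

    def create_cql(self, keyspace, table):
        parts = self._columns() + [self._primary_key()]
        return f"CREATE TABLE {keyspace}.{table} ({', '.join(parts)});"
```

Each cell of the "rectangle" (vector/nonvector × clustering/nonclustering) is then a one-line class composing the right mixins.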
Anant.us and DataStax are hosting LLM bootcamps using CassIO: https://kono.io/bootcamp/ . We should make sure that this is mentioned on the website.
A delete_many method accepting a list of IDs, that internally performs the deletes concurrently.
Or (perhaps not ideal) at least make the delete async and have the langchain layer (or equivalent) handle the concurrency (compare #14 for the same discussion). At the moment there's a loop at the langchain level and deletes are serialized.
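A sketch of the concurrent delete_many (the DataStax driver's cassandra.concurrent.execute_concurrent_with_args would be the idiomatic helper here; a plain thread pool is used below only to show the idea without assuming driver internals, and all names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor


def delete_many(session, keyspace, table, row_ids, concurrency=16):
    """Delete many rows by ID, issuing the per-ID deletes concurrently
    instead of in a serial loop (hypothetical sketch)."""
    query = f"DELETE FROM {keyspace}.{table} WHERE row_id = %s"
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # One DELETE per ID, fanned out over the pool; list() waits for all.
        list(pool.map(lambda rid: session.execute(query, (rid,)), row_ids))
```

The langchain layer would then call this once with the whole ID list rather than looping.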
DataStax is currently working on a NoSql Assistant that is representative of the canonical chatbot. DataStax should package up the application into a demo that can demonstrate the power of CassIO. The demo should include the following:
The key trick fits different-arity choices of the "key" (as abstract concepts) into a single table, i.e.
abstract key = [['name', 'city', 'age'], ['John', 'Rome', 123]]
becomes the (always 2-)tuple of two strings
key_desc = "name/city/age"
cache_key = "['John', 'Rome', 123]"
Possibly cumbersome and/or confusing.
Pro: fits heterogeneous stuff on the same table
Con: essentially repeats what the C* partitioner does
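The flattening described above can be sketched in a few lines (a minimal illustration, not the actual cassIO code):

```python
def flatten_key(names, values):
    """Collapse an abstract key of any arity into the fixed pair
    (key_desc, cache_key) of two strings, so heterogeneous caches
    can share a single table schema."""
    assert len(names) == len(values)
    key_desc = "/".join(names)      # e.g. "name/city/age"
    cache_key = str(list(values))   # e.g. "['John', 'Rome', 123]"
    return key_desc, cache_key
```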
Additional comments (by @jbellis; the first one is not entirely clear to me):
PRIMARY KEY (( key_desc, cache_key ))
This is fine; however, if you have many smaller caches you're better off allowing key_desc to be the only partition key. (Since then Cassandra can restrict the queries to just the replicas owning that partition.) This is fine for 1.0, but we may end up wanting to expose this either directly (a partitioned boolean) or indirectly (an expected-cache-size parameter that lets us make the decision under the hood).
self.keyDesc = '/'.join(self.keys)
IMO we'd be better served by just providing a cache name parameter and letting the caller decide how to build it.
CassIO expects the Session to have the named-tuple row factory (i.e. rows are returned as Row objects from CQL queries).
Sometimes, however, for other reasons, users stray off the default and set the row factory to e.g. dict_factory. Then, when passing the session to cassIO: boom.
At least check and give an error, or work around this by either:
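The "check and give an error" option could look roughly like this (a sketch; the check by factory name is an assumption, and real code would import the factories from cassandra.query):

```python
def check_row_factory(session):
    """Fail fast if the session's row factory is not the named-tuple
    default that cassIO relies on (hypothetical guard)."""
    factory = getattr(session, "row_factory", None)
    factory_name = getattr(factory, "__name__", "")
    if factory_name != "named_tuple_factory":
        raise ValueError(
            f"cassIO requires the named-tuple row factory, got {factory_name!r}. "
            "Set session.row_factory = cassandra.query.named_tuple_factory."
        )
```

Called once at table-class construction, this would turn the deferred "boom" into an immediate, explanatory error.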
We cannot assume that passed iterables have a len(), nor that they are indexable. So, batched iterators to the rescue.
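A batched iterator needing neither len() nor indexing can be written with islice (Python 3.12's itertools.batched does the same thing natively; this sketch works on older versions too):

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to batch_size items from any iterable,
    using only iteration (no len(), no indexing)."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```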
At the time of implementation, something was not yet on DB-side. Things have changed.
CassIO currently supports Langchain. We should add support for LlamaIndex next.
Integration options:
Vector Store - https://gpt-index.readthedocs.io/en/latest/how_to/integrations/vector_stores.html
Index Stores - https://gpt-index.readthedocs.io/en/latest/how_to/storage/index_stores.html
A data connector (i.e. Reader) ingests data from different data sources and data formats into a simple Document representation (text and simple metadata).
https://gpt-index.readthedocs.io/en/latest/api_reference/readers.html
This would enable a broad class of "reading from Cassandra" use cases.
Currently, there is no easy way to determine the distance thresholds for relevancy in nearest-neighbor search. We need a tool that can at least "visually" help determine a good cutoff for relevancy (see: https://towardsdatascience.com/k-nearest-neighbors-knn-for-anomaly-detection-fdf8ee160d13)
It might make sense for the cassIO VectorTable class to offer its own MMR implementation.
(Currently for the langchain integration case this is done at langchain level, but arguably the right place is cassIO).
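For reference, a compact MMR (maximal marginal relevance) implementation over pre-fetched ANN hits could look like the sketch below: greedily pick items similar to the query but dissimilar to what was already selected, with lambda_mult trading off relevance against diversity. This is a generic illustration, not cassIO's actual code.

```python
import math


def _cos(a, b):
    """Cosine similarity of two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def mmr(query_vec, candidate_vecs, k, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance."""
    chosen = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(chosen) < k:
        best_i, best_score = None, None
        for i in remaining:
            relevance = _cos(query_vec, candidate_vecs[i])
            # Penalize similarity to anything already selected.
            redundancy = max(
                (_cos(candidate_vecs[i], candidate_vecs[j]) for j in chosen),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if best_score is None or score > best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
        remaining.remove(best_i)
    return chosen  # indices into candidate_vecs, in selection order
```

With lambda_mult near 1 this degenerates to plain similarity ranking; lower values favor diversity among the returned hits.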
I followed the setup in the readme with a local Cassandra instance and tried to run the integration tests. It's easy to create the keyspace, but I think with a local Cassandra setup we should create the keyspace named in the CASSANDRA_KEYSPACE environment variable. Also, if we do a "CREATE KEYSPACE IF NOT EXISTS", we could just do that in all cases.
It's not a huge deal but would make it simpler for first time users to not run into errors on the happy path.
Dot product is about 40% faster than the default cosine, but we can only use it if the embedding vectors are normalized.
If we know what the embeddings provider is we can make an intelligent default. (OpenAI and Google's are both normalized, for instance. OpenAI's are probably overwhelmingly the most popular.)
Currently: cassIO does nothing, and DB exceptions bubble up to the caller.
(At the integration level, e.g. the langchain code using cassIO, the same happens).
Is there a change in philosophy needed here? Pro: users who don't want to bother have an easier life (in a sense). Con: error swallowing is generally bad.
For every LLM prompt and response, it would also be useful to track which pieces of context data were used for generating the prompt. For example, if 10 different entities were retrieved from the database, store the keyspace, table, column, and id for each entity in the chat history.
This is useful for data lineage and data tracking. Data tracking can be used to find bad data, or to help find the sources of the data that was used to generate answers.
I.e. the insertions normalize all vectors to norm one, and then internally the dot product is used for the cosine.
This saves ~50% cpu time on ANN searches.
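The trick rests on a simple identity: for unit vectors, the dot product equals the cosine similarity, so normalizing once at insertion time removes the per-comparison norm computations. A minimal illustration:

```python
import math


def normalize(vector):
    """Scale a vector to unit (L2) norm; done once at insertion time."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("Cannot normalize the zero vector")
    return [x / norm for x in vector]


def dot(a, b):
    """For unit vectors, this equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))
```

After `normalize` at write time, ANN searches can use the cheaper dot product while returning cosine-equivalent rankings.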
This error is thrown in a call to VectorstoreIndexCreator.from_loaders. It was working a few days ago.
Vector search - specifically k-nearest-neighbor - is very sensitive to outliers. Filtered vector search gives the ability to reduce the search space prior to performing vector search. This is advertised significantly in Pinecone's marketing materials, so we need to figure out how to perform it. Tooling such as langchain actually doesn't make it possible to do filtered vector search.
Standardize the code and the flow with these elements.
This includes type hints everywhere.
(and will also expose the leaky abstraction around the current vector mixin, eeeh)
LangChain:
In the current implementation of the Summary Buffer Memory, the summary is never persisted (always in memory).
Investigate whether it pays off / is feasible to use Cassandra for that.
When retrieving data via ANN from Cassandra, a light-weight re-ranking is necessary for the purpose of determining which vector search results to pass to the LLM.
Much work needed on the "Data extractor" facility.
Optimize queries (each table queried at most once)
For multiple-row returning, some thinking is needed (perhaps even just another extractor altogether?)
LangChain has no specific "semantic chat memory": that stems, instead, from a certain usage of the VectorStore.
(see here on cassio.org and here for a howto on LangChain site).
In practice, once you have a vectorstore, first a "retriever" is created out of it (a langchain standard construct) and then the latter is wrapped by a VectorStoreRetrieverMemory class (another langchain standard). Relevant steps:
vectorstore = whatever-your-backend.init(...)
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)
# now "memory" can be used e.g. in a chat
So in realistic usage you don't want to pull relevant chat snippets from the whole store, but rather from the conversation with that user, of course. In Cassandra terms, this means clustering rows by user_id.
Hence we need a parameter in CassIO's VectorTable init that controls whether we have a primary key (( document_id )) or (( session_id ), document_id), in Cassandra terminology. This is not implemented yet: at the moment we only have the first choice and no control.
(Note: I assume we don't want to have a different table per user id !)
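In CQL terms, the two primary-key choices could look like this (table and column names are illustrative, and the vector dimension is an assumption):

```sql
-- (a) current: one global partition space, no per-session clustering
CREATE TABLE ks.vector_table_flat (
    document_id TEXT PRIMARY KEY,
    body_blob   TEXT,
    embedding   VECTOR<FLOAT, 1536>
);

-- (b) proposed: rows clustered by session, so ANN lookups can be
-- restricted to a single user's partition
CREATE TABLE ks.vector_table_by_session (
    session_id  TEXT,
    document_id TEXT,
    body_blob   TEXT,
    embedding   VECTOR<FLOAT, 1536>,
    PRIMARY KEY ((session_id), document_id)
);
```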
Once the above is addressed, LangChain also will have to change slightly.
The search_kwargs parameter used when spawning the "retriever" will be the place to specify the user_id (i.e. session_id, i.e. the partition to use for the subsequent lookup). These end up in the kwargs of the similarity_search_with_score_id_by_vector method of the Cassandra vector store, which will be able to pass this partition key to the cassIO search.
Pro: less proliferation of vector store instances.
Con: might involve more kwargs, as this param gets to the Cassandra vector store through several routes (whether it's mmr, similarity, etc., different functions are called; see the as_retriever method of the base VectorStore class).
In this case one creates as many VectorStore instances as there are session_ids, each with the partition key as an instance property, and this gets injected into each search() call within that instance. Much less intrusive, though perhaps a bit heavier resource-wise.
There are the following "objects" in Langchain that can be managed by data stored elsewhere, and not necessarily "hardcoded" in code.
In the beginning these can just be Python objects, or rather JSON configs; then we can move to tables.
AgentTypes (stores the registry of Agent Types)
LLMTypes
ToolTypes
DocumentLoaderTypes
Can start with JSON and then move to Tables
LLMConfiguration
Agent
Tools
DocumentLoaderConfiguration
Index
This is just the overall spec of what goes into an agent at a high level. Recommendation is to first implement with 100% JSON driven agent -- then implement in Schema -- since it will just be an optimization of where to store the config rather than functionality. This is applicable to future agent frameworks.
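Purely to illustrate the "100% JSON driven agent" idea, a config could take a shape like the following (every field name and value below is hypothetical):

```json
{
  "agent": {
    "agent_type": "zero-shot-react-description",
    "llm": {
      "llm_type": "openai",
      "model": "gpt-3.5-turbo",
      "temperature": 0.0
    },
    "tools": [
      {"tool_type": "vector-search", "index": "product_docs", "top_k": 4}
    ],
    "document_loaders": [
      {"loader_type": "web", "urls": ["https://example.com/docs"]}
    ]
  }
}
```

Moving this same structure into tables later is then only a storage change, as the recommendation notes.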
Different chunking (text-splitting) strategies affect how well embedding models are able to embed data into vector spaces. Right now, the out-of-the-box methods being used cut sentences off midway. This task should:
Currently, the docs of the langchain integration look very basic and don't reflect vector search or CassIO:
https://python.langchain.com/docs/modules/memory/integrations/cassandra_chat_message_history
https://python.langchain.com/docs/ecosystem/integrations/cassandra
https://github.com/hwchase17/langchain/blob/master/langchain/memory/chat_message_histories/cassandra.py
It is also possible to create a version of the chat that discusses the managed version of Cassandra (AstraDB)
https://python.langchain.com/docs/modules/memory/integrations/motorhead_memory_managed
Other places that documentation & integration is missing:
https://python.langchain.com/docs/modules/data_connection/retrievers/
https://python.langchain.com/docs/modules/data_connection/text_embedding/
A suggestion is to drop a link to the CassIO website.
A retry strategy (essentially, it could be three parameters: num_retries, retry_timeout, retry_sleep_seconds) for CQL operations.
This might have interplay with a batching strategy at the cassIO level. However, given the current design of the cassIO/langchain integration (e.g. the vector store's insert-many), one should move the whole insert-many into cassIO (which could even make sense after all).
So cassIO would expose a put_many method that internally also handles batching.
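The three-parameter strategy could be sketched as a generic wrapper (the parameter names come from the proposal above; catching bare Exception is a simplification, and real code would catch the driver's timeout/unavailable exceptions specifically):

```python
import time


def execute_with_retry(
    operation,
    num_retries=3,
    retry_timeout=10.0,
    retry_sleep_seconds=0.5,
):
    """Run a zero-argument callable, retrying on failure up to num_retries
    extra times, sleeping between attempts, and giving up once the overall
    retry_timeout deadline has passed (hypothetical sketch)."""
    deadline = time.monotonic() + retry_timeout
    last_error = None
    for attempt in range(num_retries + 1):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            if attempt == num_retries or time.monotonic() >= deadline:
                break
            time.sleep(retry_sleep_seconds)
    raise last_error
```

A put_many could then wrap each batch execution in this, making the retry knobs uniform across CQL operations.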
Prompting the LLM with the right prompt template is important for the LLM to make sense to the content from Vector search. Important items include:
Passing None as metadata results in the column containing the literal string null (which is valid JSON).
Either allow null metadata (currently the idea is to have at least an empty dict) or forbid it (e.g. by normalizing to {} when writing).
Ensure that, when reading using the vector index, we don't require ALLOW FILTERING, as that would cause performance degradation.
In the add_texts method, the intent is to have an optional TTL which defaults to the class-level one. This is done via:
ttl_seconds = ttl_seconds or self.ttl_seconds
Suppose the class default is 10 seconds and one explicitly passes 0 to the method: the insertions then get a 10-second TTL, contrary to user expectations.
Find a better interface (this seems like a general problem with TTL, where zero and None have ambiguous meanings).
Suggestion: a symbolic NOT_PASSED default, which is neither None nor zero, checked for in the code.
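The sentinel suggestion could look like this (a sketch; the class here only mimics the described add_texts signature, it is not the real implementation):

```python
# A unique sentinel: distinguishes "argument omitted" from explicit 0 or None.
NOT_PASSED = object()


class VectorTableSketch:
    def __init__(self, ttl_seconds=None):
        self.ttl_seconds = ttl_seconds

    def add_texts(self, texts, ttl_seconds=NOT_PASSED):
        # Fall back to the class default only when the caller passed nothing;
        # explicit 0 and explicit None both survive intact.
        if ttl_seconds is NOT_PASSED:
            ttl_seconds = self.ttl_seconds
        return ttl_seconds  # the TTL that would be applied to the insertions
```

With this, `add_texts(texts)` uses the class default, while `add_texts(texts, ttl_seconds=0)` genuinely means "no TTL fallback to 10 seconds".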
https://gptcache.readthedocs.io/en/latest/index.html#roadmap