Comments (8)
@luismanez assuming the results are the same and the PR only improves the query (no new features), I plan to run a few tests and make sure I fully understand the new syntax. Then yes, the PR should be merged and released soon, I think 2 weeks max 👍
from kernel-memory.
Interesting problem. Do you know how SharePoint and Active Directory scale access control in similar scenarios? It might be a reverse lookup: after fetching a list of records, filter out on the client side those that are not accessible. In KM that would mean fetching all relevant records, regardless of user access, and filtering them on the client side before the user can consume them.
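To make the client-side idea concrete, here is a minimal sketch of trimming results after the search has returned. The record shape (an `id` plus an `authorized` list of principal IDs) and the function name `trim_results` are assumptions for illustration only, not part of KM.

```python
def trim_results(records, user_principals):
    """Keep only records that at least one of the user's principals may access.

    records: list of dicts with a hypothetical "authorized" list of principal IDs.
    user_principals: the user's own ID plus the IDs of the groups they belong to.
    """
    principals = set(user_principals)
    return [r for r in records if not principals.isdisjoint(r["authorized"])]


# Hypothetical search results, fetched regardless of user access.
records = [
    {"id": "doc1", "authorized": ["g1", "u7"]},
    {"id": "doc2", "authorized": ["g2"]},
]
visible = trim_results(records, ["u7", "g9"])
```

One trade-off with this approach: if the search returns the top N records and some are trimmed afterwards, the user may end up with fewer than N results, so the caller would need to over-fetch or page.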
from kernel-memory.
As far as I know, SharePoint indexes permissions too, so the search engine applies security trimming on the server side. I can confirm this: say Bob has access to Document1, runs a search query, and Document1 is returned. Right after that, an admin changes the permissions for Document1 and removes Bob's access. Bob will still see Document1 in search results for a while, until the incremental crawl re-indexes Document1's permissions.
Actually, our approach is working fine, but we had to download the KM source code and edit the BuildSearchFilters method to compose a search.in query when we find multiple filters with the same key:
tags/any(s: search.in(s, '.....
We're happy to do a PR but wondering if there's something better.
Very curious to know how M365 Copilot solves this same problem 😄 I don't think it fetches all the documents and then calls MS Graph to check whether the current user has permissions on each one.
from kernel-memory.
Happy to take the PR if you can work on it.
Thinking about how one would store "this document is accessible to user1, 2, 3.... 100000", there might be multiple approaches. For example, one could use virtual groups stored in meta-tables, auto-clustering users to reduce the cardinality of those filters. Something like:
doc1 is accessible to u1,u7,u8,u100,u102,u103
doc2 is accessible to u1,u7,u8,u100,u102,u888
vgroup1=u1,u7,u8,u100,u102
vgroup2=u103
vgroup3=u888
and so on...
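A rough sketch of how such virtual groups could be derived: users who can access exactly the same set of documents are placed in the same group, and each document is then tagged with group names instead of individual users. The function name `build_virtual_groups` and the data shapes are hypothetical, and this is only one possible clustering rule.

```python
def build_virtual_groups(doc_acl):
    """Cluster users that share an identical set of accessible documents.

    doc_acl: dict mapping a document ID to the list of users allowed to read it.
    Returns (groups, doc_groups): group name -> users, and document -> group names.
    """
    # Invert the ACL: for each user, collect the documents they can access.
    access = {}
    for doc, users in doc_acl.items():
        for user in users:
            access.setdefault(user, set()).add(doc)

    # Users with the same "access signature" end up in the same virtual group.
    by_signature = {}
    for user, docs in access.items():
        by_signature.setdefault(frozenset(docs), []).append(user)

    groups = {}
    doc_groups = {doc: [] for doc in doc_acl}
    for index, (signature, users) in enumerate(by_signature.items(), start=1):
        name = f"vgroup{index}"
        groups[name] = users
        for doc in signature:
            doc_groups[doc].append(name)
    return groups, doc_groups


groups, doc_groups = build_virtual_groups({
    "doc1": ["u1", "u7", "u8", "u100", "u102", "u103"],
    "doc2": ["u1", "u7", "u8", "u100", "u102", "u888"],
})
# groups     -> vgroup1: u1,u7,u8,u100,u102; vgroup2: u103; vgroup3: u888
# doc_groups -> doc1: vgroup1,vgroup2; doc2: vgroup1,vgroup3
```

With this layout a query only needs the virtual groups the current user belongs to, rather than every user ID, but the groups have to be recomputed whenever permissions change.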
from kernel-memory.
Thanks @dluc
I'll do the PR in the next few days.
If I understand correctly, in our scenario the meta-tables are Azure AD, and the virtual groups are AAD Groups / M365 Groups. The document is indexed with a custom tag "PrincipalsAuthorized", where we store the groups that have access to the document (and also user IDs, if only specific users are configured). In most cases, a document is indexed with fewer than 20 IDs.
However, the problem appears when you want to query only the documents the current user has permissions on. In this case, if a user is a member of 500 groups (pretty common in M365, as every Team is an M365 Group in Azure AD), the search query will have 500 "conditions":
(tags/any(s: s eq 'PrincipalsAuthorized:xxxxxxx')) or (tags/any(s: s eq 'PrincipalsAuthorized:xxxxxx')) or .......
This query fails in Azure Search with the error "Invalid expression: Recursion depth exceeded allowed limit. Parameter name: $filter".
The search.in query works fine for these scenarios, but KM must also keep the ability to combine multiple MemoryFilters (this is what we're working on now, before sending the PR) ...
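As an illustration of the idea (not the actual KM code or the PR), the sketch below builds an OData `$filter` string from several filters. It assumes each filter is a list of (key, value) conditions that are ANDed together, that the filters themselves are ORed, and that tags are stored as `key:value` strings as in the query above. Single-condition filters sharing a key are collapsed into one `search.in` call; the `|` delimiter and the name `build_filter` are assumptions.

```python
from collections import defaultdict

DELIMITER = "|"  # assumed never to appear in tag keys or values


def _quote(text):
    """Wrap text in an OData string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_filter(filters):
    """filters: list of filters, each a list of (key, value) conditions."""
    grouped = defaultdict(list)  # key -> tags coming from single-condition filters
    and_clauses = []             # filters that AND several conditions together
    for conditions in filters:
        if not conditions:
            continue
        tags = [f"{key}:{value}" for key, value in conditions]
        if len(conditions) == 1:
            grouped[conditions[0][0]].append(tags[0])
        else:
            parts = [f"tags/any(s: s eq {_quote(tag)})" for tag in tags]
            and_clauses.append("(" + " and ".join(parts) + ")")

    clauses = []
    for tags in grouped.values():
        if any(DELIMITER in tag for tag in tags):
            raise ValueError("tag contains the search.in delimiter")
        if len(tags) == 1:
            clauses.append(f"tags/any(s: s eq {_quote(tags[0])})")
        else:
            # One flat search.in instead of a long chain of "or" conditions.
            values = _quote(DELIMITER.join(tags))
            clauses.append(f"tags/any(s: search.in(s, {values}, '{DELIMITER}'))")
    return " or ".join(clauses + and_clauses)


query = build_filter([
    [("PrincipalsAuthorized", "g1")],
    [("PrincipalsAuthorized", "g2")],
])
```

Because the 500 group IDs end up inside a single `search.in` value list, the expression stays flat regardless of how many groups the user belongs to, while multi-condition filters are still emitted as separate clauses.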
from kernel-memory.
Hey @dluc, I've sent a PR with our solution to this issue. Please give it a try: although it is working for us, we're only using one tag and might be missing something.
Many thanks!
from kernel-memory.
Hi @dluc
Sorry to bother you, but we are upgrading our (big) solution to .NET 8, and we'd love to have our PR merged so we can get rid of our (old) copy of the KM code. I know you are busy, but can you at least let me know if the PR looks good and is likely to be merged in 2-3 weeks? Otherwise, I will copy the latest KM source code and add my changes, but that's not cool 😄
Many thanks!
from kernel-memory.
Closing this one, as it has been addressed in package 0.27.240207.1.
Thanks for your help @dluc!
from kernel-memory.