
Comments (8)

dluc commented on June 6, 2024

@luismanez assuming the results are the same and the PR only improves the query (no new features), I plan to run a few tests and make sure I fully understand the new syntax. After that, yes, the PR should be merged and released soon, I think 2 weeks max 👍

from kernel-memory.

dluc commented on June 6, 2024

Interesting problem. Do you know how SharePoint and Active Directory scale access control in similar scenarios? It might be a reverse lookup, e.g. after fetching a list of records, filter out those that are not accessible, client side. In KM that would mean fetching all relevant records, regardless of user access, and filtering out the inaccessible ones on the client side before the user can consume them.
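
The client-side trimming idea above could be sketched roughly as follows. This is only an illustration, not KM code: the `Record` type, its `allowed_principals` field, and the `trim_by_access` helper are hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical search result carrying the principals allowed to read it."""
    doc_id: str
    allowed_principals: set = field(default_factory=set)


def trim_by_access(records, user_principals):
    """Keep only records that share at least one principal with the user."""
    user_principals = set(user_principals)
    return [r for r in records if r.allowed_principals & user_principals]


# Records fetched regardless of user access, then trimmed before use.
fetched = [
    Record("doc1", {"group-a", "user-bob"}),
    Record("doc2", {"group-b"}),
    Record("doc3", {"group-a"}),
]
visible = trim_by_access(fetched, {"group-a"})
print([r.doc_id for r in visible])  # ['doc1', 'doc3']
```

The trade-off is that the search has to over-fetch: if most of the top results are not accessible to the user, few records survive the trimming.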


luismanez commented on June 6, 2024

As far as I know, SharePoint indexes permissions too, so the search engine applies the security trimming on the server side. I can confirm this from the following behaviour: say Bob has access to Document1, Bob runs a search query, and Document1 is returned. Right then, an admin changes the permissions for Document1 and removes Bob's access. Bob will still see Document1 in search results for a while, until the incremental crawl re-indexes Document1's permissions.

Actually, our approach is working fine, but we had to download the KM source code and edit the BuildSearchFilters method to compose a search.in query when we find multiple filters with the same key:
tags/any(s: search.in(s, '.....
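
A minimal sketch of composing such an expression, written in Python for illustration rather than as the actual BuildSearchFilters change. It assumes tags are stored as `key:value` strings in a `tags` collection field (as in the filters quoted in this thread); the helper name `build_search_in_filter` and the `|` delimiter are assumptions made for the example.

```python
def build_search_in_filter(key, values, delimiter="|"):
    """Compose one OData search.in clause matching any of the given tag values.

    Assumes each tag is indexed as a "key:value" string in a collection
    field named `tags`. The delimiter must not occur in any tag.
    """
    if not values:
        raise ValueError("at least one value is required")
    items = []
    for value in values:
        item = f"{key}:{value}"
        if delimiter in item:
            raise ValueError(f"delimiter {delimiter!r} found in tag {item!r}")
        # OData string literals escape a single quote by doubling it.
        items.append(item.replace("'", "''"))
    joined = delimiter.join(items)
    return f"tags/any(s: search.in(s, '{joined}', '{delimiter}'))"


print(build_search_in_filter("PrincipalsAuthorized", ["g1", "g2", "g3"]))
```

All values end up in a single string literal, so the expression stays one clause deep however many values are passed.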

We're happy to do a PR but wondering if there's something better.

Very curious to know how M365 Copilot solves this same problem 😄 ? I don't think it gets back all the documents, and then starts calling MS Graph to check if the current user has permissions on each document.


dluc commented on June 6, 2024

Happy to take the PR if you can work on it.

Trying to think about how one would store "this document is accessible to user1, 2, 3.... 100000": there might be multiple approaches. E.g. one could use virtual groups stored in meta-tables, auto-clustering users to reduce the cardinality of those filters. Something like:

doc1 is accessible to u1,u7,u8,u100,u102,u103
doc2 is accessible to u1,u7,u8,u100,u102,u888

vgroup1=u1,u7,u8,u100,u102
vgroup2=u103
vgroup3=u888

and so on...
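
One possible way to derive such virtual groups, sketched under the assumption that users with exactly the same set of accessible documents can share a group. The function name `cluster_virtual_groups` and the `vgroupN` naming are illustrative only; this is not an existing KM feature.

```python
from collections import defaultdict


def cluster_virtual_groups(doc_acl):
    """Cluster users that have identical document access into virtual groups.

    doc_acl maps a document id to the set of users allowed to read it.
    Returns (groups, doc_groups): group name -> users, and
    document id -> group names to index in place of individual users.
    """
    # Invert the ACL: which documents can each user read?
    user_docs = defaultdict(set)
    for doc, users in doc_acl.items():
        for user in users:
            user_docs[user].add(doc)

    # Users with the same access signature belong to the same group.
    by_signature = defaultdict(set)
    for user, docs in user_docs.items():
        by_signature[frozenset(docs)].add(user)

    # Number the groups deterministically: largest first, then by documents.
    ordered = sorted(by_signature.items(),
                     key=lambda kv: (-len(kv[1]), sorted(kv[0])))
    groups, doc_groups = {}, defaultdict(set)
    for index, (signature, users) in enumerate(ordered, start=1):
        name = f"vgroup{index}"
        groups[name] = users
        for doc in signature:
            doc_groups[doc].add(name)
    return groups, dict(doc_groups)


acl = {
    "doc1": {"u1", "u7", "u8", "u100", "u102", "u103"},
    "doc2": {"u1", "u7", "u8", "u100", "u102", "u888"},
}
groups, doc_groups = cluster_virtual_groups(acl)
for name, users in groups.items():
    print(name, sorted(users))
```

With this grouping each document is tagged with two virtual groups instead of six users, and each user belongs to exactly one group. The groups would need to be recomputed whenever permissions change.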


luismanez commented on June 6, 2024

Thanks @dluc
I'll do the PR in the next few days.

If I understand correctly, in our scenario the meta-tables are Azure AD, and the virtual groups are AAD Groups / M365 Groups. The document is indexed with a custom tag "PrincipalsAuthorized", where we store the groups that have access to the document (and also user IDs, if only specific users are configured). As a result, a document is indexed with fewer than 20 IDs in most cases.

However, the problem comes when you want to query only documents the current user has permissions on. In this case, if a user is a member of 500 groups (pretty common in M365, as every Team is an M365 Group in Azure AD), the search query will have 500 "conditions":

(tags/any(s: s eq 'PrincipalsAuthorized:xxxxxxx')) or (tags/any(s: s eq 'PrincipalsAuthorized:xxxxxx')) or .......

This query fails in Azure Search (Invalid expression: Recursion depth exceeded allowed limit.\r\nParameter name: $filter)

The search.in query works fine for these scenarios, but KM must also keep the possibility of combining multiple MemoryFilters (this is what we're working on now, before sending the PR) ...
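
A rough sketch of how merging and combining could coexist, in Python for illustration only. It assumes the semantics that conditions inside one filter are ANDed and separate filters are ORed, and it models each filter as a list of `(key, value)` pairs; the function names and the `|` delimiter are hypothetical, and the actual PR may work differently.

```python
from collections import defaultdict


def tag_literal(key, value):
    """Render a "key:value" tag, doubling single quotes for OData."""
    return f"{key}:{value}".replace("'", "''")


def build_filter(filters, delimiter="|"):
    """Combine filters into one OData expression.

    Single-condition filters on the same key are merged into one search.in
    clause; multi-condition filters keep their `and` chain. All resulting
    clauses are ORed together.
    """
    single = defaultdict(list)  # key -> values from one-condition filters
    compound = []               # rendered multi-condition filters
    for conditions in filters:
        if not conditions:
            continue
        if len(conditions) == 1:
            key, value = conditions[0]
            single[key].append(value)
        else:
            parts = [f"tags/any(s: s eq '{tag_literal(k, v)}')"
                     for k, v in conditions]
            compound.append("(" + " and ".join(parts) + ")")

    merged = []
    for key, values in single.items():
        tags = [tag_literal(key, v) for v in values]
        if any(delimiter in t for t in tags):
            raise ValueError("delimiter occurs in a tag value")
        if len(tags) == 1:
            merged.append(f"tags/any(s: s eq '{tags[0]}')")
        else:
            joined = delimiter.join(tags)
            merged.append(
                f"tags/any(s: search.in(s, '{joined}', '{delimiter}'))")
    return " or ".join(merged + compound)


# 500 group memberships collapse into one clause; the other filter is kept.
group_filters = [[("PrincipalsAuthorized", f"g{i}")] for i in range(500)]
other = [[("type", "news"), ("lang", "en")]]
expression = build_filter(group_filters + other)
print(expression.count("search.in"), expression.count(" or "))  # 1 1
```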


luismanez commented on June 6, 2024

Hey @dluc, I've sent a PR with our solution to this issue. Please give it a try: although it is working for us, we're only using a single tag, so we might be missing something.

Many thanks!


luismanez commented on June 6, 2024

Hi @dluc
Sorry to bother you, but we are upgrading our (big) solution to .NET 8, and we'd love to have our PR merged so we can get rid of our (old) copy of the KM code. I know you are busy, but can you at least let me know if the PR looks good and is likely to be merged in 2-3 weeks? Otherwise, I will copy the latest KM source code and add my changes, which is not cool 😄

Many thanks!


luismanez commented on June 6, 2024

Closing this one, as it has been addressed in package 0.27.240207.1.
Thanks for your help @dluc!

