BLUF: We are looking to add the ability to add labels to a decision log from within po

Hello <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-ur

You could use decision log <a href="https://www.openpolicyagent.org/docs/latest/manage

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Add the ability to create decision labels within a policy,about open-policy-agent/opa

Comments (28)

aholmis commented on May 8, 2024 1

My current project also have similar requirements, we want to have in our logs the reason for the decision, using a common component, without having it added to the result of all (many) rules.
I don't really see why not, but I'm biased 😄
If the --server thing is a blocker, then maybe also add it to the output for use with --explain or kindof what print does?

from opa.

srenatus commented on May 8, 2024 1

Ah, so this would allow us to side-step the problem that ctx isn't forwarded through iter? I.e. we don't have to do the more idiomatic

ctx = logs.WithKV(ctx, "key", "val")
return iter(ctx, ...)

...which would (probably) require deeper changes in the eval code.

from opa.

ashutosh-narkar commented on May 8, 2024

Hello @david-hamilton-bah thanks for the detailed issue and also for providing info on your use-case. From what I understand the app developers will author some policies and use the common helper packages y'all provide to come up with the final decision. Is this correct?

Create a built-in function that allows the ability to add a Map of data to the decision context, then output that data with the decision log api on a new field decision_labels or similar.

What's not clear to me why these extra "labels" that you need for filtering purposes cannot be part of the policy decision (ie result field). I would have imagined the policy decision is an object that would contain all the necessary context which the decision log server could use for filtering, analytics etc. It seems very specific to your use-case that label data should be separate from the result. Something to consider would be to define a policy decision format which the app developers need to adhere to and in this way you know how the policy decision will look. I don't know much about your use-case to tell whether this is feasible but happy to expand if you provide more details. Thanks!

from opa.

david-hamilton-bah commented on May 8, 2024

From what I understand the app developers will author some policies and use the common helper packages y'all provide to come up with the final decision. Is this correct?

Correct.

What's not clear to me why these extra "labels" that you need for filtering purposes cannot be part of the policy decision (ie result field). ... Something to consider would be to define a policy decision format which the app developers need to adhere to and in this way you know how the policy decision will look.

Defining a standard policy decision format is something I am attempting to avoid. This is a very large project with many teams distributed across the country. If we had to make changes to that standard format, we would need to have each team apply any new changes there. Many will be using istio envoy proxies that are expecting a simple true/false value on the response. Even with mandated response format, we could open the door for misuse and missing reporting in these cases. Putting the data in response also goes against Zero Trust principles in that it sends additional data to the consuming application that it does need. That data may end up being logged by the consuming application, potentially creating a leak.

from opa.

anderseknert commented on May 8, 2024

You could use decision log masking to add data to the result (or the logged input if you so prefer) without it being part of the actual result returned to clients.

package system.log

import rego.v1

mask contains {"op": "upsert", "path": "/input/result/labels", "value": labels} if {
    # some logic to determine what labels to apply here
    labels := #...
}

I think that would be preferable over adding a new built-in to OPA which was essentially a no-op in any context but opa run --server.

from opa.

david-hamilton-bah commented on May 8, 2024

@anderseknert
That's an interesting thought, we are already using log masking to hide PII. However, how would I get the data from my common policy into the decision context in order to add it to the result in the log masking rego?

from opa.

anderseknert commented on May 8, 2024

You can import and evaluate any rules you normally would from your decision log policy. I guess it could be a problem if something was really expensive to calculate in the first place, and you now would have to do it once for the client and another time in the masking policy (possibly with some mocked input attribute indicating that you'd want extra data returned in the result). The masking policy runs asynchronously, so the user isn't kept waiting, but it obviously adds some load to the OPA instance nonetheless. I could think of a few ways that could be improved if necessary, but it's a bit early to expand on that before knowing if/where it is a problem in the first place :)

Adding built-in functions that have a hard coupling to the way OPA is run (like various logging functions suggested in the past), and that wouldn't work with opa eval, opa test, etc... is something at least I'd like to avoid if possible.

from opa.

david-hamilton-bah commented on May 8, 2024

I can understand not wanting to introduce built-ins that do not have a purpose for multiple methods of running opa. Could this functionality of context labelling serve a purpose to also be logged using --explain similar or enhancing trace functionality?

I know for opa test usage I would like to see the data added to the context within the trace

from opa.

anderseknert commented on May 8, 2024

The system.log policies could/should be unit tested as other code. Something like rule level tracing would indeed be a nice addition to OPA, and probably necessary if it's meant to be always on.

from opa.

david-hamilton-bah commented on May 8, 2024

Sorry, but I'm still not seeing how the desired behavior could be handled in the log masking. Say our tenant application has an endpoint GET /api/widgets/123 and they are using an envoy proxy ext_authz filter. In their policy they first check user roles (RBAC) then they use the data to determine to whom this widget belongs. Let's say participant id 444. Then they call data.common.has_read_access_to_participant_by_id(user_id, participant_id) and it returns false. The policy evaluates to false and the proxy returns 401.

Now asynchronously in the log policy we have the envoy input and the response output. Neither of those things provides us, in the common functionality, the ability to re-call the has_read_access with the participant id. We would have to translate from the widget id to the participant id, and that would require our common policy to have domain knowledge of widgets.

from opa.

ashutosh-narkar commented on May 8, 2024

Now asynchronously in the log policy we have the envoy input and the response output. Neither of those things provides us, in the common functionality, the ability to re-call the has_read_access with the participant id.

In the first policy eval, how was the participant_id determined? You mentioned some RBAC roles had to be looked up to get the participant_id. One option is to make participant_id part of the result and have that available in the masking policy.

Alternatively how are the RBAC roles provided? Are they part of a OPA bundle? If yes, the masking policy will have access to the OPA store which means it can access the RBAC data as well to lookup participant_id.

Another option is to make a http.send call from the masking policy but you've found this to not be performant iirc.

Since you mentioned envoy proxy ext_authz filter, I imagine you're using the OPA-Envoy plugin. There are ways to inject data in the result object using that plugin but we can explore that later.

from opa.

david-hamilton-bah commented on May 8, 2024

i probably shouldn't have mentioned the RBAC, that was just an example of what the tenant application may do. They may do some RBAC, then they have to determine read access to this particular resource. The data in OPA will have the mapping of widget->participant, but that is tenant data. Where the data comes from is irrelevant, the point is that it belongs to the tenant domain, and the common policy has no concept of a widget, only participant. I understand the possibility of enforcing that the tenant put the participant id in the response, but again it puts more onus on the tenant to ensure that they are compliant and produces the chance of this mandated reporting being missed. It also comes back to forcing the policy to be re-run in order to add the data to the decision log. This all seems very hacky compared to being able to label the decision context.

To me this is a very natural functionality. I would compare it to the Mapped Diagnostic Context (MDC) used in Log4j, which allows you to set data on the context that is then used for logging. See here for example. Our legacy applications using Sun XACML have access to the EvaluationCtx object and can store data there that is later processed after the evaluation is complete. Does the opa.runtime() or any other method accessible from policies have a way to access the context within a policy?

from opa.

ashutosh-narkar commented on May 8, 2024

It also comes back to forcing the policy to be re-run in order to add the data to the decision log.

If the participant id is part of the result then there is no need to re-run the policy to add this in. You either do it as part of the tenant policy or in the masking policy. The former as you said puts the onus on the tenant. The latter is controlled by the OPA admin which may be separate from the policy author. So this approach is allowing you to customize the additional data that you want to insert in the log post policy eval. Also since the masking policy uses the same in-memory store as the tenant, I would image that could be leveraged as well.

Does the opa.runtime() or any other method accessible from policies have a way to access the context within a policy?

Can you elaborate a bit on what you mean by "context within a policy"?

from opa.

david-hamilton-bah commented on May 8, 2024

If the participant id is part of the result then there is no need to re-run the policy to add this in.

The participant ID is not the data, the participant id is the minimum needed in order to evaluate whether the user has access. All of the policies would need to be re-evaluated to determine if the user can access the participant and then put that data in the log. Just making up a sample where users can't view users from participants belonging to another organization, the data that may be logged would be:

{
  user: {
    name: "Bob", 
    organization: "ABC"
    ...
  }, 
  participant: {
    name: "Alice", 
    organization: "DEF"
    ...
  }, 
  action: "View Participant Widgets", 
  decision: "Denied", 
  textReason: "Users belong to different organizations",
  ...
}

This is determined by a policy that would have had to run before the decision, and again in order to log the data.

Can you elaborate a bit on what you mean by "context within a policy"?

The Context in which the policy has run. An object that holds the information like the request, response (which obviously would not exist yet), NDBuiltinCache, trace ids, etc.

from opa.

ashutosh-narkar commented on May 8, 2024

Thanks for providing the above sample. In your original issue description, your plan was to Create a built-in function that allows the ability to add a Map of data to the decision context. Now let's say we have that, so in your use-case would this builtin be invoked in the tenant policy?

from opa.

david-hamilton-bah commented on May 8, 2024

No, the common policy would invoke the built-in to add the data.

from opa.

stale commented on May 8, 2024

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

from opa.

srenatus commented on May 8, 2024

I'm not sure if the notion of a builtin that's used for its side effects only is a good one in Rego. We've explicitly stated in the docs that custom builtins shouldn't do that, and that http.send should not be used to trigger side-effects in other systems. And now we'd introduce a builtin for just that, a side effect? 🤔

The result of that would make for a difficult-to-use feature, I'm afraid. People would need to know enough about the Rego evaluation semantics to be able to use it correctly; and even so, future optimizations could break things by no longer calling the builtin.

Are there any other things we could consider? Some clever scheme using rule annotations, perhaps?

from opa.

xico42 commented on May 8, 2024

I have a similar need as well.

For example, I want to add integration with a feature flag system (say unleash) directly in policies as a custom builtin. And it would be great to have the feature flag result in decision logs.

In this case if the builtin implementation had access to append to this context in go directly, would be enough.

Maybe some KV pair data structure that gets propagated from the builtin context to the context that goes to the decision log plug-in?

This way we don't add a new default builtin to bloat the rego api, but it is handled internally by custom builtin implementations.

Also, it might be easier than handling custom metadata.

By reading @david-hamilton-bah request it seems that the custom function that handles participant data may call a custom builtin already, or that at least could be solved this way.

from opa.

srenatus commented on May 8, 2024

In this case if the builtin implementation had access to append to this context in go directly, would be enough.

🤔 So if the evaluator (topdown) passed the context along while giving the builtins a chance to append key/val pairs by the usual context.WithValue machinery; then a custom decision log plugin could extract and process them. @xico42 is what what you had in mind?

So this would become less of an enhancement request and more a smaller-sized internal thing that would enable power-users to do stuff they hadn't (easily) been able to do before.

from opa.

xico42 commented on May 8, 2024

Exactly @srenatus. Although I was thinking about something coded into the default decision log, if this was possible by using a custom one, I would be happy with it either way.

from opa.

xico42 commented on May 8, 2024

Just to make my self a bit clearer, that's what I had in mind:

// Package logs implements decision log buffering and uploading.
package logs

type ctxKey struct{}

type KV struct {
	m map[string]interface{}
	mu *sync.RWMutex
}

func (kv *KV) Get(key string) (interface{}, bool) {
	kv.mu.RLock()
	val, ok := kv.m[key]
	kv.mu.RUnlock()
	return val, ok
}

func (kv *KV) Set(key string, val interface{}) {
	kv.mu.Lock()
	kv.m[key] = val
	kv.mu.Unlock()
}

func WithContext(ctx context.Context) context.Context {
	return context.WithValue(ctx, ctxKey{}, &KV{})
}

func Set(ctx context.Context, key string, val interface{}) {
	kv, ok := ctx.Value(ctxKey{}).(*KV)
	if !ok {
		return
	}
	
	kv.Set(key, val)
}

Then right before policy evaluation, the context would be filled with the KV datastructure:

ctx := logs.WithContext(ctx)

And in custom builtin implementations, one could do the following:

// implements topdown.BuiltinFunc
func builtinRepeat(bctx BuiltinContext, operands []*ast.Term, iter func(*ast.Term) error) error {
    // custom builtin logic...
    logs.Set(bctx.Context, "someKeyThatGetsAppendendToDecisionLogs", "some random value that makes sense to me")
    return iter(ast.StringTerm(strings.Repeat(string(str), count)))
}

Later on, the KV map would be extracted and rendered into an extra key in the decision logs.

from opa.

xico42 commented on May 8, 2024

Does it make sense? I could provide a PR with such implementation, as it seems to have low impact in the rest of the code.

from opa.

srenatus commented on May 8, 2024

Oh I think it makes sense, thank you!

The overall design question kind of remains, though -- how do you provide the extra keys? With what you're proposing, it would be up to the user to create the builtin (simple enough), and use it properly. And that would be the bar -- you'd need to be able to come up with (or find) the code, and to use it properly. As opposed to providing the simple builtin as part of mainline OPA, with the same caveats. So, I feel like this is still a bit of a well-designed pitfall, and would only be one small step further away 😅

from opa.

xico42 commented on May 8, 2024

Even if a simple builtin as part of mainline OPA is desired, it could benefit from this solution anyway.

So it seems that it enables both approaches.

from opa.

tsidebottom commented on May 8, 2024

Alternative Approaches considered (as requested by @srenatus):

Using Regal to ensure return data includes decision result via policy enforcement
Calling http.send a second time specifically to add the desired information to the context so it will be present in the Decision Log
Embed OPA directly into consumer services Allowing consumers to add the data directly into the context via local policies

In the cases of Alternatives 1 and 3, this forces the responsibility to include the data onto the consumer which is not the intent of our use case. In the case of Alternative 2, this was our original solution, but when implemented in a test environment the latency experienced quickly became unacceptable as multiple service calls to OPA would "stack" the queries.

from opa.

srenatus commented on May 8, 2024

Calling http.send a second time specifically to add the desired information to the context so it will be present in the Decision Log

I'm afraid I don't understand this one. Do you call OPA via http.send from another OPA instance? 🤔 Or from the same OPA instance?

from opa.

david-hamilton-bah commented on May 8, 2024

Same instance, although it could also be a different instance if we chose to go that route. Basically in the function data.common.has_read_access_to_participant_by_id(user_id, participant_id) we check if we have this rest service enabled. If disabled, we perform the check via normal rego; if enabled then we POST a request to the common policy, for example v1/data/common/participant/read. This server endpoint is property that could be set to localhost or to a separate OPA instance.

What this does is ends up creating a new request, so we get a separate decision log with the result that we want in it, and the output will affect the result of original request. It makes it easy to find the decision logs because they all will have the same request path. However, the downside is the performance. On a lightly-used instance there is minimal lag, but a heavily-used instance will end up with large delays due to stopping mid-request and putting a new request "in the queue" that it needs to wait for.

from opa.

Add the ability to create decision labels within a policy about opa HOT 28 OPEN

Comments (28)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs