Question about the agents' attack surface about hvmi HOT 4 CLOSED

bitdefender commented on August 23, 2024

from hvmi.

Comments (4)

vlutas commented on August 23, 2024

Hello, and thanks for the question! 👍

To begin with, a small recap - there are two main types of agents:

process agents, which are regular processes injected inside the VM
special agents, which are used for performance optimizations (these are generally modules that run in the kernel)

In addition to these, there is the agent loader, which is a pseudo-agent used to load other agents.

> While browsing the code I noticed that a good chunk of operations rely on injected in-guest "agents"

Actually, HVMI doesn't need any kind of agent to function properly. Agents are used only for optimizations (such as the #VE filtering agent) or to gather information which would otherwise be unavailable and which is not useful to HVMI directly (for example, logs). HVMI is functionally identical with or without the agents. In fact, anyone can create their own agent that can be injected inside the guest - for example, you could create an application that gets injected inside the guest to gather some data that is specific to your own use-case! You could even inject existing applications! Also, the injected process agents don't necessarily have to communicate with HVMI - HVMI just injects an executable and runs it, and that's pretty much the end of the story.

> I was wondering what kind of protections are in place for the agents while they are being invoked by the hypervisor? What happens if a malicious agent is issuing the same hypercalls while introcore is also trying to inject? Is that multi-vCPU safe, are there no potential race-conditions? Or is the assumption that agents are only safe to deploy while the VM is uncompromised?

Our use-case is one-way only - HVMI injects the agent, and the agent is mostly on its own from there on. The agent can use a very limited set of hypercalls to ask for information from HVMI, and HVMI will ignore VMCALLs that don't originate from a known agent. Of course, no one prevents a potential attacker from jumping in the middle of an agent to invoke a hypercall, and that is why the hypercall interface is very limited, and subject to significant scrutiny in HVMI.
In addition, there is no efficient way for HVMI to ensure that the agent will indeed run (the attacker has a million ways to interfere with code inside the guest), and this is another reason why agents are optional and HVMI doesn't rely on them. HVMI will tell you if the agent has been started successfully or not, but it will not guarantee that it will run flawlessly without interference.
The only exception here is the #VE filtering agent - this agent is injected in an atomic manner, and it ensures that if the injection is successful, attackers won't be able to disable #VE filtering (the #VE agent is isolated in its own physical address space). If injection of the #VE agent is not successful, monitoring will work via EPT as usual. Even in this case you can see that HVMI will work without the agent (albeit perhaps with a higher performance overhead).
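
The hypercall filtering described above can be illustrated with a minimal sketch. This is not HVMI's actual code: the structure `agent_region_t`, the hypercall numbers, and the function names are all hypothetical, chosen only to show the idea of combining an "origin is a known agent" check with a very small allow-list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one injected agent: the guest address range
 * occupied by its code. Not an HVMI type. */
typedef struct {
    uint64_t code_start; /* first byte of agent code */
    uint64_t code_end;   /* one past the last byte of agent code */
} agent_region_t;

/* Hypothetical, deliberately tiny set of hypercalls an agent may issue. */
enum {
    HCALL_AGENT_STARTED = 1,
    HCALL_AGENT_DONE    = 2
};

static bool hypercall_is_allowed(uint64_t number)
{
    return number == HCALL_AGENT_STARTED || number == HCALL_AGENT_DONE;
}

static bool rip_in_known_agent(const agent_region_t *agents, size_t count,
                               uint64_t rip)
{
    assert(agents != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (rip >= agents[i].code_start && rip < agents[i].code_end) {
            return true;
        }
    }
    return false;
}

/* Returns true only if the VMCALL was issued from inside a known agent
 * AND asks for one of the few allowed operations. Everything else is
 * ignored by the caller. */
bool should_handle_vmcall(const agent_region_t *agents, size_t count,
                          uint64_t guest_rip, uint64_t hypercall_number)
{
    return rip_in_known_agent(agents, count, guest_rip) &&
           hypercall_is_allowed(hypercall_number);
}
```

Note that an attacker who jumps into the middle of an agent passes the origin check in this sketch, so the small allow-list is the only remaining barrier. That is the reason given above for keeping the hypercall interface very limited.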

> Since one feature of the agent is to collect logs, which to me sounds like something you would want to do after a compromise happened, I wonder what happens in that scenario when the VM is already compromised by an introcore-aware adversary. If the agent is executing in the same context now as the adversary, couldn't it mess with those logs while the agent is trying to gather them? I've read through https://hvmi.readthedocs.io/en/latest/chapters/7-agents-architecture.html but it's still unclear to me.

Indeed, HVMI can't ensure that any agent will run. For example, if we inject the log collector process, no one guarantees that the agent won't crash, or that someone won't terminate it prematurely. This, however, doesn't affect HVMI from a functional perspective, and whether the agent runs or not, HVMI will work the same.
This particular use-case with a VM that is already compromised is, in fact, the reason we created agent injection in the first place - if HVMI detects an attack inside the VM, it can inject a remediation tool (for example, legacy antivirus software) capable of reverting (to a certain extent) the modifications made to the system by the malware.

Hope this is helpful! :)

Cheers,
Andrei.

tklengyel commented on August 23, 2024

Thanks, that answered my question! :)

> The only exception here is the #VE filtering agent - this agent is injected in an atomic manner, and it ensures that if the injection is successful, attackers won't be able to disable #VE filtering (the #VE agent is isolated in its own physical address space). If injection of the #VE agent is not successful, monitoring will work via EPT as usual. Even in this case you can see that HVMI will work without the agent (albeit perhaps with a higher performance overhead).

To follow up on that, what prevents an attacker from issuing VMFUNC? I understand that the agent would be running in a protected EPT that's X only in the regular view, but AFAICT that doesn't prevent someone else from executing the instruction and interfering with the agent from some other place.

vlutas commented on August 23, 2024

With #VE, there are two EPTs:

The main EPT, where the guest usually runs and normal EPT permissions are established (possibly restricted by regular HVMI policies); the #VE agent is inaccessible inside this EPT, except for the VMFUNC trampoline page, which is X (executable)
The protected EPT, where the agent runs; guest memory is normally RW (there are no X pages) in this EPT, and the agent has normal permissions (RW, X depending on the section); this view is entered when a #VE takes place, via the VMFUNC in the trampoline page; even the stack used by the #VE agent is different from that of the regular OS, for obvious reasons
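
The two views can be summarized as a small permission table. The sketch below is only a model of the description above, not HVMI code: the type names, the split of the agent into "code" and "data" pages, and the trampoline being executable in the protected view are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PERM_NONE = 0, PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Hypothetical names for the two EPT views. */
typedef enum { VIEW_MAIN = 0, VIEW_PROTECTED = 1, VIEW_COUNT } ept_view_t;

/* Hypothetical page categories used only by this model. */
typedef enum {
    PAGE_GUEST = 0,   /* ordinary guest memory */
    PAGE_TRAMPOLINE,  /* page holding the VMFUNC trampoline */
    PAGE_AGENT_CODE,  /* #VE agent code section */
    PAGE_AGENT_DATA,  /* #VE agent data and stack */
    PAGE_KIND_COUNT
} page_kind_t;

static const uint8_t k_perms[VIEW_COUNT][PAGE_KIND_COUNT] = {
    [VIEW_MAIN] = {
        /* Normal permissions; HVMI policies may restrict them further. */
        [PAGE_GUEST]      = PERM_R | PERM_W | PERM_X,
        [PAGE_TRAMPOLINE] = PERM_X,    /* the only agent page visible here */
        [PAGE_AGENT_CODE] = PERM_NONE,
        [PAGE_AGENT_DATA] = PERM_NONE,
    },
    [VIEW_PROTECTED] = {
        [PAGE_GUEST]      = PERM_R | PERM_W,  /* no X pages for the guest */
        [PAGE_TRAMPOLINE] = PERM_X,           /* assumption, not stated above */
        [PAGE_AGENT_CODE] = PERM_R | PERM_X,  /* assumed per-section split */
        [PAGE_AGENT_DATA] = PERM_R | PERM_W,
    },
};

uint8_t ept_perms(ept_view_t view, page_kind_t kind)
{
    assert((int)view >= 0 && (int)view < VIEW_COUNT);
    assert((int)kind >= 0 && (int)kind < PAGE_KIND_COUNT);
    return k_perms[view][kind];
}

bool can_execute(ept_view_t view, page_kind_t kind)
{
    return (ept_perms(view, kind) & PERM_X) != 0;
}
```

In this model, `can_execute(VIEW_PROTECTED, PAGE_GUEST)` is false, which is the property that keeps unknown guest code from running alongside the agent.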

To get back to your question, nobody really prevents the guest from arbitrarily executing VMFUNC to switch EPTs, and one of the following outcomes would happen:

The guest tries to VMFUNC into an invalid EPT (only two are valid) - the result depends on how the HV handles this event; it could lead to a guest crash or a fault injected inside the VM, or maybe other behavior I'm not imagining right now
The guest tries to VMFUNC into the protected EPT using a VMFUNC instruction located at an arbitrary location - the switch would be successful, but the instruction immediately following VMFUNC would cause an EPT violation, because only the agent is X in the protected EPT; this would result in a DoS, most likely
The guest tries to VMFUNC into the protected EPT by maliciously branching to the VMFUNC inside the trampoline - execution would either continue normally, as if a #VE took place, or the guest would crash, depending on where exactly the attacker/malware would branch inside the VMFUNC trampoline
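
The three cases above can be written down as a small decision function. This is a sketch of the reasoning only, not hypervisor or HVMI code: the EPT indexes (0 for the main view, 1 for the protected view), the `range_t` type, and the outcome names are assumptions. The case of a VMFUNC that targets the main view is not discussed in this thread; the sketch assumes it simply leaves the guest in the main view.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    OUTCOME_HV_DEFINED,          /* invalid EPT index: up to the hypervisor */
    OUTCOME_EPT_VIOLATION,       /* next fetch is not X in the protected view */
    OUTCOME_RUNS_AGENT_OR_CRASH, /* entered via the trampoline page */
    OUTCOME_STAYS_IN_MAIN        /* assumption: target is the main view */
} vmfunc_outcome_t;

#define VALID_EPT_COUNT     2u /* only two views are valid */
#define MAIN_EPT_INDEX      0u /* assumed index */
#define PROTECTED_EPT_INDEX 1u /* assumed index */

/* Hypothetical address range of the VMFUNC trampoline page. */
typedef struct {
    uint64_t start;
    uint64_t end; /* one past the last byte */
} range_t;

vmfunc_outcome_t classify_vmfunc(uint32_t ept_index, uint64_t vmfunc_rip,
                                 range_t trampoline)
{
    assert(trampoline.start < trampoline.end);

    if (ept_index >= VALID_EPT_COUNT) {
        return OUTCOME_HV_DEFINED;
    }
    if (ept_index == MAIN_EPT_INDEX) {
        return OUTCOME_STAYS_IN_MAIN;
    }

    /* Target is the protected view: what matters is where the VMFUNC
     * instruction itself is located. */
    bool in_trampoline = vmfunc_rip >= trampoline.start &&
                         vmfunc_rip < trampoline.end;
    if (!in_trampoline) {
        /* The instruction after VMFUNC lives in guest memory, which has
         * no X permission in the protected view. */
        return OUTCOME_EPT_VIOLATION;
    }
    /* Whether execution continues normally or crashes depends on the
     * exact branch target inside the trampoline, which this model does
     * not try to distinguish. */
    return OUTCOME_RUNS_AGENT_OR_CRASH;
}
```

None of the modeled outcomes lets guest code execute inside the protected view, which matches the purpose stated in the reply.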

The main purpose of this is to prohibit unknown code from running together with the #VE agent. This, however, has the disadvantage that the #VE agent must be completely self-contained, with no dependencies whatsoever - once inside the protected view, there can be no API calls outside the agent. :)

Cheers,
Andrei.

tklengyel commented on August 23, 2024

Thank you, appreciate the in-depth answers! Makes sense now.
