Virtualization Monitoring & Incident Response

Overview of Virtualization

This article focuses on monitoring and incident response for Type 1 hypervisors, virtualization systems that interact directly with the underlying server hardware. These platforms have become standard in many organizations and, over the last several years, have been exploited more and more often by threat actors during intrusions.

These systems have unique considerations for cyber security monitoring and incident response.

Some common Type 1 hypervisor virtualization platforms are:

VMware
KVM "Kernel-Based Virtual Machine"
- Proxmox
Xen
- XCP-ng

This article focuses on cybersecurity detections specific to the VMware virtualization platform. However, many of the network detection strategies discussed are broadly applicable to cybersecurity monitoring across various virtualization platforms.

VMware Overview

VMware was the dominant virtualization platform for years, and effective incident response requires understanding its key components. The terms VMware, vSphere, and vCenter often get used interchangeably.

VMware - The company behind the technology.
vSphere - The platform that includes ESXi (hypervisor) and vCenter (management tool).
ESXi - The Type 1 hypervisor that runs directly on hardware, enabling the creation and management of virtual machines (VMs).
vCenter - The centralized management tool for ESXi hosts and VMs within the vSphere platform.

The VMware Type-1 hypervisor is called ESXi. It serves as the interface between the server hardware and virtual machines, running directly on the system hardware without relying on an operating system. Originally known as ESX (Elastic Sky X), the hypervisor was rebranded as ESXi (Elastic Sky X Integrated). While ESXi has a Linux-like command line interface and supports many standard Linux features and commands (e.g., SSH, user management, and logging), it is not a Linux-based system.

Another critical component is vCenter, VMware's centralized management platform. vCenter acts as the orchestration hub, enabling administrators to manage multiple ESXi servers from a single interface. vCenter operates as an application on a full-featured Linux operating system. While not mandatory, vCenter is often deployed as a virtual machine (VM) hosted on an ESXi server that it itself manages.

VMware - ESXi

Understanding access methods for VMware systems is crucial. Individual ESXi servers (that is, the ESXi OS) can be accessed directly via SSH, through the web interface (via the vSphere web client), or through vCenter.

An important security feature of VMware ESXi servers is Lockdown Mode. When enabled, Lockdown Mode restricts direct access to the ESXi server, allowing management only through vCenter. This centralization is logical, as the ability to add or modify virtual machines on individual ESXi servers is powerful and should be controlled through a single orchestrator like vCenter.

However, it’s important to note that not all VMware implementations are configured with lockdown mode, leaving room for potential misconfigurations and abuse. There are numerous documented cases where threat actors, including insiders, have exploited VMware compute resources to set up unauthorized servers, such as game servers, crypto miners, or backdoor servers. Often, this activity goes unnoticed by the organization that owns the servers, leading to significant resource misuse and potential security risks.

VMware - vCenter

Above the ESXi servers lies the vCenter Server, an application designed for centralized management of VMware environments. vCenter itself runs on top of a Linux-based operating system.

From an administrative perspective, the vCenter application is typically accessed through its web interface. However, the underlying Linux OS hosting vCenter can also be accessed directly via SSH, or via the web if the Linux OS has a web management utility such as Cockpit, providing additional management and troubleshooting capabilities.

Review of ways to manage VMware components

Direct Access to Individual ESXi Servers:

Via SSH.
Through the VMware Web App client interface, which also allows virtual machine (guest) console access.

Centralized Access via vCenter:

vCenter Web App interface.
- Also allows console access to VMs running on ESXi servers.
Configuration management for VMs hosted on ESXi servers.
PowerCLI commands executed against the vCenter environment.
Underlying vCenter OS access

PowerCLI Over HTTP:

Direct interaction with ESXi servers or vCenter via HTTP-based PowerCLI automation.

PowerCLI is a command-line interface tool for managing and automating VMware environments. Built on Microsoft PowerShell, it provides a set of cmdlets specifically designed to interact with VMware products like vSphere, vCenter, and ESXi. PowerCLI enables administrators to perform tasks such as creating, configuring, monitoring, and automating virtual machines, hosts, and network settings efficiently. It's particularly valued for its ability to streamline repetitive tasks and integrate VMware management into larger IT automation workflows.

How Do Threat Actors Compromise VMware?

Compromise through credential abuse

The primary vector for threat actor access to the VMware environment is through legitimate stolen credentials or weak credentials. For example, VMware credentials stored in a shared password file on platforms like SharePoint can become easy targets. Threat actors may also perform discovery scans to locate the VMware servers or identify them through improperly secured documentation.

Compromise through Vulnerability

If credentials cannot be obtained or cracked, attackers may enumerate the environment to determine if the victim’s VMware version is vulnerable to known CVEs. Unpatched VMware servers are often a risk, as ESXi and vCenter interfaces are typically part of a closed control plane without full internet access. This setup can give organizations a false sense of security about the need for continuous patching of these systems. Additionally, vCenter environments often host critical production workloads, making IT teams hesitant to take them offline for patching, leaving these systems vulnerable for extended periods.

Like any software, VMware products have vulnerabilities that require regular patching. Recently, vCenter vulnerabilities have been observed under active exploitation, including:

CVE-2024-38812 - A heap-overflow vulnerability in vCenter Server's implementation of the DCERPC protocol.
CVE-2024-38813 - A privilege escalation vulnerability.

For example, in a 2024 intrusion documented on Microsoft's security blog (https://www.microsoft.com/en-us/security/blog/2024/07/29/ransomware-operators-exploit-esxi-hypervisor-vulnerability-for-mass-encryption/), a threat actor exploited VMware CVE-2024-37085. This vulnerability affected VMware ESXi hypervisors joined to an Active Directory (AD) domain, allowing any member of a domain group named "ESX Admins" to gain full administrative access by default. Notably, this group is not a built-in AD group and does not exist by default.

The attackers took advantage of this vulnerability by creating an "ESX Admins" group within the domain and then adding compromised user accounts to the group, thereby gaining administrative control over the VMware environment with compromised regular user accounts. They used the following commands to execute their attack:

net group "ESX Admins" /domain /add
net group "ESX Admins" username /domain /add
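
As a rough illustration of how this activity might be detected, the Python sketch below scans process command lines (for example, from process-creation events already collected in a SIEM) for net group commands that create or populate an "ESX Admins" group. The function name and the input format are assumptions made for illustration, not part of any product.

```python
import re

# Matches "net group" (also net.exe / net1.exe) commands that reference the
# "ESX Admins" group, with straight or curly quotes, and include /add.
ESX_ADMINS_PATTERN = re.compile(
    r'\bnet1?(?:\.exe)?\s+group\s+["\u201c]ESX Admins["\u201d].*?/add\b',
    re.IGNORECASE,
)


def find_esx_admins_abuse(command_lines):
    """Return command lines that create or add members to "ESX Admins".

    command_lines is assumed to be an iterable of raw command-line strings.
    """
    return [cmd for cmd in command_lines if ESX_ADMINS_PATTERN.search(cmd)]
```

Because the group does not exist by default, any hit is worth reviewing, even in environments where administrators later created the group on purpose.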

What are the options for VMware instrumentation and telemetry directly from VMware systems?

Most organizations rely heavily on Endpoint Detection and Response (EDR) solutions for server-based detection and response. However, VMware ESXi servers typically lack EDR because they operate as stripped-down, dedicated systems treated more like appliances. This absence underscores the critical importance of robust security controls on ESXi platforms, as they do not benefit from the active attack-blocking capabilities that EDR provides on traditional servers.

Detecting attacks on ESXi servers primarily depends on analyzing ESXi logs, correlating data through Security Information and Event Management (SIEM) systems, and leveraging Network Detection and Response (NDR) solutions to monitor traffic to and from these platforms.

In contrast, the vCenter component runs as an application on a full-fledged operating system, such as Linux, which can support EDR. However, some organizations may mistakenly view the entire VMware implementation, including both ESXi servers and vCenter, as a unified system that doesn't accommodate EDR, leaving vCenter without this critical layer of protection. It is vital to recognize that while ESXi servers do not support EDR, the underlying OS hosting vCenter does, and EDR should be implemented there for enhanced security.

The diagram below highlights key telemetry and instrumentation points for VMware environments. While these data collection points enhance detection visibility, getting them all working takes a great deal of system administration effort, and, if not tuned correctly, they can overwhelm your SIEM with noisy and costly operational logs.

ESXi Servers - These servers support remote syslog, enabling them to forward logs directly to a SIEM.
vCenter Application - Application logs can be sent to a SIEM using syslog or retrieved via API integrations.
vCenter Host OS - The host operating system for the vCenter app can be configured to send logs via syslog or utilize a log shipper, such as Elastic Agent (e.g., Filebeat, Auditbeat), to forward logs to a SIEM.
Endpoint Detection and Response (EDR/XDR) - The host OS for the vCenter app can also integrate with EDR/XDR solutions, such as Microsoft Defender, for enhanced security monitoring.
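
To make the first collection point concrete, the sketch below builds the esxcli commands commonly used to point an ESXi host at a remote syslog collector. The helper function and the collector hostname in the usage note are hypothetical; the commands should be checked against the ESXi version in use before being run.

```python
def esxi_syslog_commands(collector_host, port=514, protocol="tcp"):
    """Build esxcli commands that forward ESXi logs to a remote collector.

    collector_host is a placeholder for the SIEM or syslog relay address.
    """
    if protocol not in {"udp", "tcp", "ssl"}:
        raise ValueError(f"unsupported syslog protocol: {protocol}")
    loghost = f"{protocol}://{collector_host}:{port}"
    return [
        # Set the remote log host, then reload the syslog service.
        f"esxcli system syslog config set --loghost='{loghost}'",
        "esxcli system syslog reload",
        # Allow outbound syslog through the ESXi host firewall.
        "esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true",
        "esxcli network firewall refresh",
    ]
```

For example, esxi_syslog_commands("siem.example.internal") returns four commands that an administrator could review and run in an ESXi shell or push through an automation tool.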

What bad things can be done with the compromised VMware platform?

The theft of compute resources has already been discussed, but threat actors have two additional key objectives depending on their goals:

Ransomware Against the ESXi Environment

One of the advantages of virtualization is the ease of restoring servers from backups. From a ransomware response perspective, having reliable backups serves as a powerful countermeasure. However, this also motivates ransomware actors to target the virtualization environment itself. By encrypting the ESXi environment, threat actors make it significantly harder for organizations to recover their servers from backups, increasing the likelihood of ransom payment for decryption.

Compromise Hosted Servers (VMs)

Gaining access to the virtualization environment grants threat actors low-level control over the virtual machines (VMs) running within it. Since VMs are represented as files, stealing these files is akin to physically removing a server from the server room. Once the stolen VM files are in the attacker’s possession, they can recreate the servers in their own controlled environment, free from the organization’s security measures. This allows attackers to conduct further attacks on the servers without detection.

Additionally, both the ESXi web interface and the vCenter platform provide console access to hosted servers. If these consoles lack proper security controls or timeouts, attackers can exploit them to access sensitive data. For instance:

Data theft via console access - An open console on a Linux server could allow attackers to copy the contents of /etc/shadow for offline password cracking.
Local password attacks - Attackers could reboot the VM through the console and bypass local passwords. While noisier, as it could disrupt production services, this tactic might still be effective in certain scenarios.

Attackers have also been known to create rogue virtual machines (VMs) as a method of maintaining persistent remote access. In a notable case, a sophisticated adversary transferred VM files to compromised ESXi servers using SFTP. They then executed these files directly on the ESXi servers using /bin/vmx, effectively bypassing detection by vCenter or the ESXi web interface.

In another instance, threat actors created a rogue VM by duplicating the virtual machine files of a Domain Controller (DC) running on the same ESXi server. By operating the duplicate DC VM, they were able to take an undetected snapshot of the server. Had they snapshotted the original, vCenter-managed VM instead, system admins would have had visibility into the snapshot. The threat actors then exfiltrated the DC server snapshot back to their lab for offline attacks. This allowed the attackers to perform extensive Domain Controller attacks, like DCSync, and compromise numerous user accounts without triggering alerts within the victim organization.
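
One way to hunt for rogue VMs like these is to compare what is actually running on a host with what is registered in its inventory. The sketch below assumes the configuration file paths of running VMs (for example, from esxcli vm process list) and of registered VMs (for example, from vim-cmd vmsvc/getallvms) have already been collected and normalized to the same absolute path format. The function name is hypothetical, and the collection and normalization steps are not shown.

```python
def find_unregistered_vms(running_vmx_paths, registered_vmx_paths):
    """Return .vmx paths that are running but absent from the inventory.

    Both arguments are assumed to be iterables of absolute .vmx paths in
    the same format, so they can be compared as plain strings.
    """
    registered = {path.strip() for path in registered_vmx_paths}
    running = {path.strip() for path in running_vmx_paths}
    return sorted(running - registered)
```

A running VM that the inventory does not know about is not proof of compromise, but it is unusual enough to justify a closer look at who started it and from where.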

ESXi Detections

If a threat actor gains access to a VMware platform's web interface (e.g., vCenter or vSphere client directly on an ESXi server) and can access open virtual machine consoles, mitigation options are limited. Organizations must prioritize foundational controls to secure authentication mechanisms for these web interfaces and enforce security measures, such as session timeouts and logout policies, on VM consoles.

Beyond these basic controls, detecting malicious activity in VMware environments is possible but depends heavily on the security architecture, instrumentation, and telemetry available within the virtualization deployment. Key detection opportunities include:

Control Plane Analysis & Detections - Monitoring and analyzing activity to and from the VMware control plane.
Log Analysis & Detections - Leveraging logs for suspicious activity or unauthorized access.
Configuration Analysis & Detections - Identifying misconfigurations or unauthorized changes.
Software Analysis & Detections - Analyzing software and patch levels for vulnerabilities or anomalies.

Control Plane Analysis & Detections

A dedicated management plane helps secure the VMware environment by isolating network management traffic, CLI or API traffic, and third-party software traffic to and from ESXi and vCenter servers from other network traffic. This management network should only be accessible to system, network, and security administrators, ideally through secure methods like jump boxes or VPNs.

Combining this architecture with host-based security controls, such as firewalls that restrict SSH and web access to specific management hosts, creates a robust defense against system compromise. Additionally, monitoring logs for remote access and attempted connections further enhances the security of the dedicated management plane.

Network Detection and Response (NDR) tools, such as Zeek, play a critical role in monitoring and detecting threats within sensitive VMware systems. Outbound traffic from these systems should be tightly controlled by firewalls, as Internet access is typically restricted to vendor updates and essential services like NTP. The diagram below illustrates this outbound monitoring point, emphasizing a Zeek-based NDR strategy.
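
As a minimal sketch of this outbound check, the function below reads Zeek conn.log records and reports connections from VMware systems to destinations that are not on an allowlist. It assumes Zeek is writing JSON logs (so the id.orig_h, id.resp_h, and id.resp_p fields appear as JSON keys); the function name, host set, and allowlist are illustrative assumptions.

```python
import json


def unexpected_outbound(conn_log_lines, vmware_hosts, allowed_destinations):
    """Return conn.log records where a VMware host contacts an unlisted peer.

    vmware_hosts: set of ESXi/vCenter IP addresses (strings).
    allowed_destinations: set of (ip, port) tuples, e.g. NTP or update hosts.
    """
    findings = []
    for line in conn_log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        source = record.get("id.orig_h")
        destination = (record.get("id.resp_h"), record.get("id.resp_p"))
        if source in vmware_hosts and destination not in allowed_destinations:
            findings.append(record)
    return findings
```

Because these systems should have very little legitimate outbound traffic, the allowlist can usually be kept short, which keeps false positives manageable.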

A well-architected, segmented network further enables additional internal NDR monitoring points. The diagram also highlights three key anomaly detection scenarios for unauthorized access in VMware environments:

Access from Non-Control Plane Systems - A threat actor attempting to access the VMware environment from an unauthorized system.
Lateral Movement to Control Plane Systems - A threat actor attempting to pivot to a control plane system.
Unauthorized Access via Control Plane Systems - A threat actor using a control plane system that is not an authorized jump host to access the VMware environment.
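
The three scenarios above can be sketched as a simple classification of observed connections. The example assumes the VMware hosts, the control plane subnet, and the authorized jump hosts are known in advance; all names, labels, and addresses are hypothetical, and traffic between VMware systems themselves (such as vCenter managing ESXi) is treated as expected.

```python
import ipaddress

AUTHORIZED = "authorized"
NON_CONTROL_PLANE = "access from non-control plane system"
LATERAL_MOVEMENT = "lateral movement to control plane system"
UNAUTHORIZED_CONTROL_PLANE = "unauthorized access via control plane system"
NOT_RELEVANT = "not relevant"


def classify_access(src_ip, dst_ip, vmware_hosts, control_plane_net, jump_hosts):
    """Classify one connection against the three anomaly scenarios."""
    network = ipaddress.ip_network(control_plane_net)
    src_in_plane = ipaddress.ip_address(src_ip) in network
    dst_in_plane = ipaddress.ip_address(dst_ip) in network

    if dst_ip in vmware_hosts:
        # Jump hosts and VMware-to-VMware management traffic are expected.
        if src_ip in jump_hosts or src_ip in vmware_hosts:
            return AUTHORIZED
        if src_in_plane:
            return UNAUTHORIZED_CONTROL_PLANE
        return NON_CONTROL_PLANE
    if dst_in_plane and not src_in_plane:
        return LATERAL_MOVEMENT
    return NOT_RELEVANT
```

In practice the inputs would come from NDR connection logs, and each non-authorized label would map to its own alert with its own severity.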

Log Analysis & Detections

Two critical log files to analyze on ESXi servers are auth.log and shell.log. The auth.log file tracks authentication events, such as SSH access, making it essential for monitoring and analyzing suspicious connection attempts or unauthorized access to the system.

ESXi Shell login via interactive SSH appears similar to:

YYYY-MM-DD HH:MM:SS sshd[12345]: Connection from 10.100.50.5 port 22
YYYY-MM-DD HH:MM:SS sshd[12345]: Accepted keyboard-interactive/pam for root from 10.100.50.5 port 22 ssh2
YYYY-MM-DD HH:MM:SS sshd[55787]: Session opened for 'root' on /dev/char/pty/t0
YYYY-MM-DD HH:MM:SS sshd[12345]: Session closed for 'root' on /dev/char/pty/t0
YYYY-MM-DD HH:MM:SS sshd[12345]: Session closed for 'root' 2

ESXi Shell login via SSH with public key appears similar to:

YYYY-MM-DD HH:MM:SS sshd[12345]: Connection from 10.100.50.5 port 22
YYYY-MM-DD HH:MM:SS sshd[12345]: Accepted publickey for root from 10.100.50.5 port 22 ssh2
YYYY-MM-DD HH:MM:SS sshd[55787]: Session opened for 'root' on /dev/char/pty/t0
YYYY-MM-DD HH:MM:SS sshd[12345]: Session closed for 'root' on /dev/char/pty/t0
YYYY-MM-DD HH:MM:SS sshd[12345]: Session closed for 'root'
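
A short parsing sketch shows how these entries might be turned into structured events. It assumes auth.log lines follow the format of the samples above; the function names and the allowed-source set are illustrative, and real logs may carry extra fields that require adjusting the pattern.

```python
import re

# Matches the "Accepted <method> for <user> from <ip> port <port>" entries.
ACCEPTED_LOGIN = re.compile(
    r"sshd\[(?P<pid>\d+)\]: Accepted (?P<method>\S+) for (?P<user>\S+) "
    r"from (?P<src_ip>\S+) port (?P<port>\d+)"
)


def parse_accepted_logins(lines):
    """Extract successful SSH logins as dictionaries of string fields."""
    logins = []
    for line in lines:
        match = ACCEPTED_LOGIN.search(line)
        if match:
            logins.append(match.groupdict())
    return logins


def logins_from_unexpected_sources(lines, allowed_sources):
    """Return successful logins whose source IP is not an approved host."""
    return [
        login
        for login in parse_accepted_logins(lines)
        if login["src_ip"] not in allowed_sources
    ]
```

Passing the set of authorized jump hosts as allowed_sources turns the parser into a basic detection for SSH logins that bypass the intended management path.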

When analyzing the shell.log, focus on the following:

Unusual or attempted commands.
Interactive shell sessions starting or stopping.
Gaps or inconsistencies in shell history.

It's also crucial to identify the shell log ID associated with a user login. This ID is essential for correlating actions taken in the shell with the suspicious login. The shell log ID is typically shown as a numerical value in the login message, often after the "login" keyword. For example, in the message login[55787], the log ID is 55787 for a root user console login.

YYYY-MM-DD HH:MM:SS shell[55787]: Interactive shell session started
YYYY-MM-DD HH:MM:SS shell[55787]: cd /var/log
YYYY-MM-DD HH:MM:SS shell[55787]: ls
YYYY-MM-DD HH:MM:SS shell[55787]: vmware -v
YYYY-MM-DD HH:MM:SS shell[55787]: exit
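
The correlation described above can be sketched by grouping shell.log entries by their session ID, so that every command can be tied back to the login that opened the session. The pattern assumes the simplified line format shown in the sample; production logs may include additional fields, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Captures the numeric ID in shell[<id>] and the text that follows it.
SHELL_ENTRY = re.compile(r"shell\[(?P<session_id>\d+)\]: (?P<entry>.*)$")


def commands_by_session(shell_log_lines):
    """Group shell.log entries by session ID, preserving their order."""
    sessions = defaultdict(list)
    for line in shell_log_lines:
        match = SHELL_ENTRY.search(line.rstrip("\n"))
        if match:
            sessions[match.group("session_id")].append(match.group("entry"))
    return dict(sessions)
```

Looking up the ID from a suspicious login in the returned dictionary gives the ordered list of commands typed during that session.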

Threat actors frequently exploit the same CLI commands that administrators rely on, underscoring the critical importance of monitoring and analyzing CLI activity. ESXi systems include a command-line interface called esxcli, which provides access to many administrative functions. "Living off the Land" (LOL) describes attackers abusing built-in system tools to compromise systems, and the same approach applied to ESXi is documented in the Living off the Land ESXi (LOLESXi) project: https://lolesxi-project.github.io/LOLESXi/lolesxi/Binaries/esxcli/

For example, attackers can use esxcli to manipulate the ESXi host-based firewall settings and ensure persistent remote access:

esxcli network firewall set --enabled false

Additionally, a ransomware group was observed abusing both esxcli and vim-cmd (another ESXi CLI tool) to terminate all virtual machines and delete their snapshots before encrypting the system. These examples highlight how attackers can leverage native ESXi tools for malicious purposes, making robust log analysis and proactive monitoring essential for securing the VMware environment.

esxcli vm process list
vim-cmd vmsvc/getallvms
esxcli vm process kill -t force -w %llu
vim-cmd vmsvc/snapshot.removeall %llu > /dev/null 2>&1
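
A simple way to start monitoring for this behavior is to match logged shell commands against patterns for the commands shown above. The sketch below is a starting point under that assumption, not a complete rule set: the labels and function name are illustrative, and enumeration commands in particular are also run by administrators, so they are better treated as context than as alerts on their own.

```python
import re

SUSPICIOUS_PATTERNS = {
    "firewall disabled": re.compile(
        r"esxcli\s+network\s+firewall\s+set\s+.*(--enabled|-e)[=\s]+(false|0)\b"
    ),
    "VM process killed": re.compile(r"esxcli\s+vm\s+process\s+kill\b"),
    "snapshots removed": re.compile(r"vim-cmd\s+vmsvc/snapshot\.removeall\b"),
    "VM enumeration": re.compile(
        r"(esxcli\s+vm\s+process\s+list|vim-cmd\s+vmsvc/getallvms)\b"
    ),
}


def flag_commands(commands):
    """Return (label, command) pairs for commands matching a pattern."""
    hits = []
    for command in commands:
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(command):
                hits.append((label, command))
    return hits
```

Enumeration followed closely by kill and snapshot-removal commands in the same session is a much stronger signal than any single command.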

Configuration Analysis & Detections

When threat actors gain access to an ESXi system, the most common configuration changes they make include enabling SSH (if it is disabled) and manipulating firewall rules to allow unauthorized access. Additional malicious activities often involve disabling logging and establishing persistence using ESXi's implementation of cron. While ESXi’s cron functionality is similar to that of most Linux distributions, it has unique differences that can be exploited.

To mitigate these risks, it is essential to harden ESXi configurations according to standards such as CIS Benchmarks or STIGs. Equally important is the continuous monitoring of these secure configuration settings to detect and respond to unauthorized changes.
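
Continuous monitoring of this kind often reduces to comparing a host's current settings against a hardened baseline. The sketch below assumes the settings have already been collected into dictionaries; the setting names are hypothetical placeholders and are not taken from CIS Benchmarks or STIGs.

```python
MISSING = "<missing>"


def config_drift(baseline, current):
    """Return {setting: (expected, actual)} for every setting that differs."""
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key, MISSING)
        actual = current.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift
```

For instance, a baseline with ssh_service_running set to False compared against a host where it is True would report that setting as drift, which lines up with the SSH-enabling behavior described above.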

Software Analysis & Detections

Unlike full-fledged operating systems, ESXi benefits from being a lightweight hypervisor without the overhead of hundreds of software packages that could potentially be exploited. However, critical system files in the /etc directory must still be monitored to ensure they remain untampered. Advanced threat actors have also compromised ESXi systems by exploiting vSphere Installation Bundles (VIBs).

A VIB is a package of files used to create or maintain an ESXi image, allowing administrators to control ESXi behavior, such as setting startup tasks, configuring firewall rules, or running binaries at startup. A VIB package typically includes:

Payload Files - An archive containing the files to be installed on the host.
XML Descriptor - Contains information about VIB requirements, dependencies, compatibility, payload details, name, and install date.
Signature File - Verifies the origin of the VIB and its trust level.

In a notable attack, an advanced threat actor, suspected to have ties to China, used malicious VIBs to deliver the VirtualPita and VirtualPie malware to ESXi servers.

Preventing and detecting unauthorized or unsigned VIBs is a critical defense mechanism for protecting ESXi systems. Useful commands include:

esxcli software vib list: Displays installed VIBs and their acceptance levels, helping identify discrepancies or unauthorized installations.
esxcli software vib signature verify: Verifies the signatures of installed VIB packages and provides detailed information such as:
- VIB Name
- Version
- Vendor
- Acceptance Level
- Signature Verification Result
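
As an illustration, the sketch below parses the column-aligned text produced by esxcli software vib list and reports VIBs whose acceptance level falls outside a trusted set. It assumes columns are separated by two or more spaces and that the fourth column is the acceptance level; the trusted set and function names are assumptions, and the listed level should still be read alongside the signature verification result rather than trusted on its own.

```python
import re

# Assumed policy: only CommunitySupported (or unknown) levels are flagged.
TRUSTED_LEVELS = {"VMwareCertified", "VMwareAccepted", "PartnerSupported"}


def parse_vib_list(output):
    """Parse `esxcli software vib list` text into a list of dictionaries."""
    vibs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header row, and the dashed separator row.
        if not line or line.startswith("Name") or set(line) <= {"-", " "}:
            continue
        fields = re.split(r"\s{2,}", line)
        if len(fields) < 4:
            continue
        vibs.append({
            "name": fields[0],
            "version": fields[1],
            "vendor": fields[2],
            "acceptance_level": fields[3],
        })
    return vibs


def untrusted_vibs(output, trusted_levels=TRUSTED_LEVELS):
    """Return parsed VIBs whose acceptance level is not in trusted_levels."""
    return [
        vib for vib in parse_vib_list(output)
        if vib["acceptance_level"] not in trusted_levels
    ]
```

Running this on a schedule and comparing the results with a known-good VIB inventory for each host makes newly installed or downgraded packages easier to spot.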