In simple terms, Zeek sensors capture traffic, generate protocol-specific log files for the captured session traffic, and can export these log files to external logging systems or flat file storage.
One of Zeek's most powerful features is its ability to parse protocols into protocol-specific log files, which custom scripts can then analyze further for deeper insight into the data. Zeek can also compare observed values against Cyber Threat Intelligence.
Zeek, and Corelight sensors specifically, divide the process of handling and analyzing data into four distinct areas, as illustrated in the diagram:
Inputs
Packages
Enrichments
Outputs
Much of the advanced analysis and detection in Zeek occurs through packages. These packages are collections of targeted detections, inferences, and data transformations designed to provide deeper visibility into adversary activities. The main categories of collections in Corelight NDR include:
Core
Encrypted Traffic
C2 (Command and Control)
Entity
Analyzers
ICS/OT (Industrial Control Systems/Operational Technology)
Custom
Below is a high-level overview of the core packages and the detections, inferences, and data transformations they provide.
Core Packages
Bitcoin Detector
Corelight provides detection for cryptomining traffic, which could indicate cryptojacking within your organization. Cryptojacking occurs when attackers compromise an organization's servers and use them to mine cryptocurrency. Mining requires significant compute resources, which are costly, so attackers run their mining operations on compromised servers without the organization's knowledge, leaving the organization to foot the bill.
Mining software uses specific protocols to join mining pools. Early protocols included getwork and getblocktemplate, carried as JSON-RPC over HTTP; today Stratum is the most widely used. Observing these protocols on a network can suggest that servers are performing cryptomining. The Zeek cryptomining package can detect mining traffic for cryptocurrencies such as Bitcoin and Litecoin over TCP or HTTP. However, it cannot detect the Bitcoin P2P protocol, which differs from the mining protocols.
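To make the protocol distinction concrete, here is a minimal Python sketch that classifies a reassembled payload by the JSON-RPC method it carries. It assumes Stratum v1, which exchanges newline-delimited JSON messages with methods such as mining.subscribe and mining.authorize, and HTTP JSON-RPC calls named getwork or getblocktemplate. The function name is hypothetical and this is not the package's actual logic; binary protocols such as Stratum V2 would not be recognized.

```python
import json
from typing import Optional

# Stratum v1 JSON-RPC methods commonly exchanged between miners and pools.
STRATUM_METHODS = {"mining.subscribe", "mining.authorize", "mining.submit",
                   "mining.notify", "mining.set_difficulty"}
# Older bitcoind-style JSON-RPC calls carried over HTTP.
HTTP_MINING_METHODS = {"getwork", "getblocktemplate"}


def classify_mining_payload(payload: bytes) -> Optional[str]:
    """Return 'stratum', 'getwork', or 'getblocktemplate' if the payload
    looks like a mining message, else None."""
    # Stratum sends one JSON object per line, so check each line separately.
    for line in payload.splitlines():
        try:
            msg = json.loads(line)
        except ValueError:  # covers JSON errors and undecodable bytes
            continue
        method = msg.get("method") if isinstance(msg, dict) else None
        if not isinstance(method, str):
            continue
        if method in STRATUM_METHODS:
            return "stratum"
        if method in HTTP_MINING_METHODS:
            return method
    return None
```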
Community ID
Community ID, also known as flow hashing, is a method by which systems observing network traffic calculate a hash of a flow's identifying fields: the source and destination IP addresses, the IP protocol, and the source and destination ports. The endpoints are put in a fixed order before hashing, so both directions of a session produce the same value. Community ID is not itself a detection mechanism, but it makes it easier to pivot between different cybersecurity systems to find the records for a specific session.
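As an illustration of how the hash is built, the following Python sketch follows the version 1 layout described in the Community ID specification for TCP and UDP flows: a 2-byte seed, the two ordered addresses, the protocol number, one zero padding byte, and the two ordered ports, hashed with SHA-1 and base64-encoded behind a "1:" prefix. ICMP, which the specification treats specially, is deliberately left out, and the function name is our own; for production use an existing, tested implementation is the safer choice.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(saddr: str, daddr: str, proto: int,
                    sport: int, dport: int, seed: int = 0) -> str:
    """Compute a Community ID v1 string for a TCP (6) or UDP (17) flow.

    ICMP and other protocols need extra handling in the spec and are not covered.
    """
    src = ipaddress.ip_address(saddr).packed  # 4 bytes (IPv4) or 16 bytes (IPv6)
    dst = ipaddress.ip_address(daddr).packed
    # Order the endpoints so both directions hash identically:
    # lower address first; if the addresses are equal, lower port first.
    if (src, sport) > (dst, dport):
        src, dst = dst, src
        sport, dport = dport, sport
    data = (struct.pack("!H", seed)            # 2-byte seed, network byte order
            + src + dst                        # ordered addresses
            + struct.pack("!BBHH", proto, 0,   # protocol and 1 padding byte
                          sport, dport))       # ordered ports
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")
```

Because the endpoints are ordered before hashing, calling the function with the source and destination swapped returns the same ID, which is what makes cross-tool pivoting work.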
The diagram below illustrates the usefulness of Community ID. In this hypothetical example, an analyst wants to pivot between a Zeek log and a firewall log for the same session to determine the firewall policy the session traversed. The firewall policy information is typically the only additional data provided by the firewall session log that is not already available in the Zeek data.
Even if the firewall or other cyber sensors do not support flow hashing as part of their log export, some logging systems, such as Elastic, can calculate the Community ID during data ingestion. This capability enables analysts to correlate and analyze session data across different logs more effectively.
The GitHub page for the Community ID specification lists the systems currently documented as supporting Community ID: https://github.com/corelight/community-id-spec
Data reduction
When conducting network traffic analysis, it is often necessary to reduce the number of logs collected and exported to manage data volume effectively. Networks frequently contain noisy systems that generate traffic with little to no analytical value. Mechanisms for reducing log volume include:
BPF Filters on Sensor Ingest: Apply Berkeley Packet Filter (BPF) filters to sensor data ingestion to exclude irrelevant traffic.
Selective Log Exporting: Stop exporting log types that offer minimal analytical value, such as repetitive syslog or SNMP logs.
Exporting Log Values Instead of Log Values + Keys: Export only the field values, omitting the field names (keys). The receiving logging system must then already know which key belongs to each value. This option is available only for Splunk.
Data Reduction: Enable data reduction to limit certain protocol field values in log files, saving storage space. The resulting Zeek log is tagged with 'red' to indicate reduction. Be cautious, however: enabling data reduction can actually increase data volume, because the sensor then produces both the original full Zeek log and an additional reduced log. To benefit fully from data reduction, configure the sensor to stop exporting the original full protocol log so that only the reduced log is exported.
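The filtering and field-trimming ideas above can be sketched in Python over Zeek JSON conn.log records. The noisy-host set and the kept-field list are hypothetical examples, and this post-processing is not how a Corelight sensor implements data reduction; on a sensor, excluding a host would typically happen earlier with a BPF expression such as "not host 10.0.0.53". The values_only helper shows why a values-only export requires the receiver to know the field order.

```python
import json
from typing import Iterable, Iterator

# Hypothetical examples: hosts treated as noise and conn.log fields worth keeping.
NOISY_HOSTS = {"10.0.0.53", "10.0.0.161"}
KEEP_FIELDS = ("ts", "uid", "id.orig_h", "id.resp_h", "id.resp_p", "proto", "service")


def reduce_conn_log(lines: Iterable[str]) -> Iterator[dict]:
    """Drop records involving noisy hosts and keep only a subset of fields."""
    for line in lines:
        rec = json.loads(line)
        if rec.get("id.orig_h") in NOISY_HOSTS or rec.get("id.resp_h") in NOISY_HOSTS:
            continue  # comparable to excluding the host with a BPF filter
        yield {k: rec[k] for k in KEEP_FIELDS if k in rec}


def values_only(rec: dict) -> str:
    """Serialize only the values, in KEEP_FIELDS order ('-' marks unset)."""
    # Without keys, the receiver can only interpret this if it knows the order.
    return "\t".join(str(rec.get(k, "-")) for k in KEEP_FIELDS)
```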
HASSH
The Zeek HASSH package (HASSH stands for Hashing SSH) is designed to hash specific characteristics of SSH sessions. When an SSH session is established, the client and server each send a plain-text key exchange initialization message listing, in order of preference, the key exchange methods, encryption algorithms, message authentication algorithms, and compression methods they support. Once they agree on these settings, they set up the encrypted session.
Different SSH clients and servers propose different key exchange methods and encryption algorithms, and the order of these proposals varies between SSH software packages. By hashing the values observed during this negotiation, it is possible to create a fairly distinctive signature for the SSH software being used.
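Following the Salesforce description of HASSH, the client fingerprint is the MD5 hash of four algorithm name-lists from the client's key exchange message (key exchange, encryption, MAC, and compression), joined by semicolons; HASSH Server applies the same construction to the server's lists. A minimal Python sketch, with a hypothetical function name:

```python
import hashlib


def hassh(kex: str, enc: str, mac: str, comp: str) -> str:
    """MD5 of the KEXINIT algorithm name-lists joined by ';'.

    Each argument is the comma-separated name-list exactly as sent on the
    wire. Pass the client's lists for HASSH or the server's for HASSH Server.
    """
    fingerprint = ";".join((kex, enc, mac, comp))
    return hashlib.md5(fingerprint.encode("ascii")).hexdigest()
```

Because the name-lists are hashed as sent, reordering the same algorithms produces a different hash, which is why the proposal order helps distinguish SSH implementations.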
The image below, from the Salesforce article "Open Sourcing HASSH" (https://engineering.salesforce.com/open-sourcing-hassh-abed3ae5044c/), displays the calculation of HASSH and HASSH Server values for an SSH session. This is overlaid against an example from the Zeek ssh.log file, as shown in the "Book of Zeek" (https://docs.zeek.org/en/master/logs/ssh.html), highlighting the additional metadata fields added for HASSH and HASSH Server in the log file.
HTTP Post Bodies
The HTTP POST method is crucial for client-to-server communications on the web. In the example provided, the client (left) submits data to the web server (right).
Zeek captures many data fields associated with the HTTP POST method, which aids in network traffic analysis. However, by default, the body of HTTP POST requests is not captured due to its potentially large size. Enabling the HTTP POST bodies package in Corelight adds a field in the http.log file to include the POST body data, though this data is still size-limited to manage volume. Analyzing HTTP POST bodies provides additional information for detections and threat hunting, enhancing the ability to characterize and investigate network traffic.
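The size limit can be illustrated with a short Python sketch that truncates a POST body and escapes non-printable bytes before logging, in the spirit of how Zeek escapes binary data in its logs. The 200-byte default and the function name are assumptions for illustration, not the package's documented settings.

```python
# Hypothetical limit for this sketch; the package's real default may differ.
MAX_POST_BODY = 200


def post_body_field(body: bytes, limit: int = MAX_POST_BODY) -> str:
    """Render at most `limit` bytes of a POST body as a log-safe string."""
    kept = body[:limit]  # the size limit keeps log volume manageable
    # Printable ASCII passes through; every other byte becomes \xNN.
    return "".join(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02x}" for b in kept)
```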
Darknet
The term 'Darknet' can be quite broad in its meaning. It typically refers to services reachable only through overlay networks such as Tor. In this context, however, the Darknet package provides a different kind of detection.
Network engineers often allocate IP address space in large blocks for easier administration and expansion, but organizations usually use only a small portion of the allocated addresses. The 'darknet' is the routed IP address space that contains no active servers or services, so it should see no traffic. Traffic to or from darknet addresses may be malicious, such as scanning, though more commonly it suggests a lack of thorough change management within your organization. This package can also be used to detect use of a defined honeynet address space.
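A minimal Python sketch of the check, assuming a hypothetical address plan in which 10.20.0.0/16 is routed internally but only two /24 subnets are in use; any connection touching the unused remainder is flagged.

```python
import ipaddress
from typing import Iterable, Iterator, Tuple

# Hypothetical address plan: a routed /16 of which only two /24s are in use.
ROUTED = [ipaddress.ip_network("10.20.0.0/16")]
IN_USE = [ipaddress.ip_network("10.20.1.0/24"), ipaddress.ip_network("10.20.2.0/24")]


def is_darknet(addr: str) -> bool:
    """True if addr is routed internally but falls in unused (dark) space."""
    ip = ipaddress.ip_address(addr)
    routed = any(ip in net for net in ROUTED)
    used = any(ip in net for net in IN_USE)
    return routed and not used


def darknet_hits(conns: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (orig, resp) pairs where either endpoint is a darknet address."""
    for orig, resp in conns:
        if is_darknet(orig) or is_darknet(resp):
            yield orig, resp
```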
JA3
The JA3 package is named after its developers, three Salesforce engineers whose initials are all J.A.: John Althouse, Jeff Atkinson, and Josh Atkins. Similar to what HASSH does for SSH, JA3 creates fingerprints for TLS encryption negotiations, including HTTPS. When a client negotiates encryption settings, these settings can be hashed to produce a distinctive fingerprint that identifies the way the client negotiates encryption.
The diagram below illustrates the extra metadata fields, JA3 and JA3S, integrated into the Zeek ssl.log, focusing on the encryption negotiation during a session. The top right displays the client initiating the encrypted session by proposing ciphers and encryption settings in clear text. Similarly, the bottom right depicts some server responses also in clear text. This unencrypted data can be aggregated and hashed to generate a distinctive signature for the software's encryption negotiation process.
One challenge with JA3 detection is that attackers can easily configure their software to mimic the encryption settings of legitimate software, allowing malicious traffic to go undetected (false negatives). Additionally, many JA3 hashes correspond to legitimate software, which can flood threat indicators with benign matches (false positives). To address this, JA3S was developed to hash the server side of the encryption negotiation. While JA3 and JA3S individually have many legitimate matches, combining the two can be a powerful technique for identifying malicious communications on a network.
The GitHub project "JA3 - A method for profiling SSL/TLS Clients" (https://github.com/salesforce/ja3) lists documented systems that support calculating JA3 from TLS sessions.
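The JA3 and JA3S strings can be sketched in Python following the field layout documented in the salesforce/ja3 repository. JA3 joins the TLS version, cipher suites, extensions, elliptic curves, and point formats from the ClientHello (comma-separated fields, dash-separated decimal values, GREASE values removed) and takes the MD5; JA3S does the same with the version, chosen cipher, and extensions from the ServerHello. The function names are our own, and parsing the handshake bytes is out of scope.

```python
import hashlib
from typing import Sequence


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Sequence[int]) -> str:
    """Dash-join decimal values, skipping GREASE."""
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3(version, ciphers, extensions, curves, point_formats) -> str:
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()


def ja3s(version: int, cipher: int, extensions: Sequence[int]) -> str:
    s = ",".join([str(version), str(cipher), _join(extensions)])
    return hashlib.md5(s.encode("ascii")).hexdigest()
```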
Log4Shell
Corelight offers a package designed to detect network exploitation of the Log4Shell vulnerability (CVE-2021-44228): https://github.com/corelight/cve-2021-44228
Unlike other Log4Shell detection methods, such as IPS signatures or server vulnerability scans, this package analyzes all HTTP headers for the JNDI lookup strings used in Log4Shell exploits.
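A simplified Python sketch of header inspection follows. It assumes headers arrive as a name-to-value mapping and collapses two common obfuscation tricks (${lower:j} and ${::-j}) before searching for "${jndi:". The patterns are illustrative only; real exploit strings use many more evasions, and this is not the Corelight package's logic.

```python
import re
from typing import List, Mapping

# ${lower:x} / ${upper:x} -> x
_CASE_LOOKUP = re.compile(r"\$\{(?:lower|upper):([^${}]*)\}", re.IGNORECASE)
# ${anything:-x} (default-value syntax, e.g. ${::-j}) -> x
_DEFAULT_LOOKUP = re.compile(r"\$\{[^${}]*?:-([^${}]*)\}")
_JNDI = re.compile(r"\$\{jndi:", re.IGNORECASE)


def normalize(value: str, max_rounds: int = 10) -> str:
    """Repeatedly collapse simple nested lookups until the string stops changing."""
    for _ in range(max_rounds):
        collapsed = _DEFAULT_LOOKUP.sub(r"\1", _CASE_LOOKUP.sub(r"\1", value))
        if collapsed == value:
            break
        value = collapsed
    return value


def jndi_headers(headers: Mapping[str, str]) -> List[str]:
    """Return the names of headers whose values contain a JNDI lookup."""
    return [name for name, value in headers.items()
            if _JNDI.search(normalize(value))]
```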
Long Connections
Zeek has a package designed to detect abnormally long connections, which can indicate problems such as persistent command-and-control channels. This package generates a separate log called conn_long and can also trigger a notice. More information about this package is available at https://github.com/corelight/zeek-long-connections
The top diagram illustrates the initiation of a TCP session: the initial SYN message is followed by SYN/ACK and ACK, opening the session for communication. When the session is closed, a FIN message is sent. Using Zeek logs, sessions can be monitored to determine whether they remain open for unusually long periods.
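The core idea can be sketched in Python: record when each connection started and, at a given moment, report those that have stayed open past a threshold. The one-hour threshold and the names below are hypothetical; the package itself is configurable and is described on its GitHub page.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical threshold: report connections open longer than one hour.
LONG_CONN_SECONDS = 3600.0


@dataclass
class OpenConn:
    uid: str         # Zeek connection UID
    start_ts: float  # epoch seconds when the connection began


def long_connections(open_conns: List[OpenConn], now: float,
                     threshold: float = LONG_CONN_SECONDS) -> Dict[str, float]:
    """Map uid -> current duration for still-open connections past the threshold."""
    return {c.uid: now - c.start_ts
            for c in open_conns if now - c.start_ts >= threshold}
```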
Mitre BZAR
BZAR stands for Bro/Zeek ATT&CK-based Analytics and Reporting. It is a set of Bro/Zeek scripts designed to analyze SMB and DCE-RPC protocols for suspicious behaviors. These protocols can be used for various malicious activities, including remote command execution, lateral movement, discovery, defense evasion, persistence, and credential access. A common example is ransomware actors using the SYSVOL share to distribute malware and schedule malicious tasks, because every domain-joined system reads from it. More information about the BZAR project is available at https://github.com/mitre-attack/bzar
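To illustrate the kind of matching BZAR performs, here is a Python sketch that flags Zeek dce_rpc.log records whose endpoint and operation indicate remote service creation or task scheduling. The operation list is a small illustrative subset chosen by us, not BZAR's actual configuration; the endpoint and operation strings are assumed to match how Zeek names them in dce_rpc.log, and the labels are descriptive rather than official ATT&CK technique names.

```python
from typing import Iterable, List, Tuple

# Illustrative (endpoint, operation) pairs, assumed to match dce_rpc.log naming.
SUSPICIOUS_RPC = {
    ("svcctl", "CreateServiceW"): "remote service creation",
    ("svcctl", "StartServiceW"): "remote service start",
    ("ITaskSchedulerService", "SchRpcRegisterTask"): "remote scheduled task",
    ("atsvc", "NetrJobAdd"): "remote AT job",
}


def flag_rpc(records: Iterable[dict]) -> List[Tuple[str, str, str]]:
    """Return (client, server, label) for DCE-RPC calls listed in SUSPICIOUS_RPC."""
    hits = []
    for rec in records:
        label = SUSPICIOUS_RPC.get((rec.get("endpoint"), rec.get("operation")))
        if label:
            hits.append((rec["id.orig_h"], rec["id.resp_h"], label))
    return hits
```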
Another important aspect of using NDR to examine your network for lateral movement is the addition of internal network traffic collection, commonly known as 'East-West Traffic'. While this is achievable, it requires careful consideration of traffic capture points, blind spots, and traffic volumes. Additionally, you must address the increase in false positives related to suspicious network traffic patterns.
Mitre HTTP/2
HTTP/2, defined in RFC 7540 (https://datatracker.ietf.org/doc/html/rfc7540, since superseded by RFC 9113), was designed to improve performance over HTTP/1.1 through features such as request prioritization, header compression, multiplexed streams, and server push, along with stricter security requirements when used over TLS. Although HTTP/2 supports encryption, not all systems use it. The term 'h2c' refers to HTTP/2 over cleartext TCP.
Whether HTTP/2 traffic is encrypted is crucial for the Zeek HTTP/2 analyzer: it cannot parse encrypted HTTP/2 (h2) unless SSL/TLS interception decrypts the traffic before it reaches Zeek. HTTP/2 also changes how messages are carried, replacing the text format of HTTP/1.x with a binary framing layer.
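The binary framing is what a cleartext (h2c) analyzer has to parse. Per RFC 7540, a client begins with a fixed 24-byte connection preface, and every frame starts with a 9-byte header: a 24-bit length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. A minimal Python sketch of those two steps, with function names of our own; over TLS (h2) these bytes are encrypted and unavailable without interception.

```python
import struct
from typing import NamedTuple, Optional

# RFC 7540 section 3.5: every HTTP/2 connection starts with this client preface.
CLIENT_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

# Frame type codes from RFC 7540 section 6.
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
               0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
               0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION"}


class FrameHeader(NamedTuple):
    length: int
    frame_type: str
    flags: int
    stream_id: int


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Decode the fixed 9-byte frame header (RFC 7540 section 4.1)."""
    if len(buf) < 9:
        raise ValueError("need at least 9 bytes")
    length = int.from_bytes(buf[0:3], "big")  # 24-bit payload length
    ftype, flags = buf[3], buf[4]
    stream_id = struct.unpack("!I", buf[5:9])[0] & 0x7FFFFFFF  # drop reserved bit
    return FrameHeader(length, FRAME_TYPES.get(ftype, f"UNKNOWN({ftype})"), flags, stream_id)


def first_frame_after_preface(stream: bytes) -> Optional[FrameHeader]:
    """Return the first frame header if the stream begins with the h2c preface."""
    if not stream.startswith(CLIENT_PREFACE):
        return None
    return parse_frame_header(stream[len(CLIENT_PREFACE):])
```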
The successor to HTTP/2 is HTTP/3, defined in RFC 9114 https://datatracker.ietf.org/doc/html/rfc9114
SHA-256 File Hashes
Zeek can extract ("carve") entire files from supported protocols such as HTTP, SMTP, FTP, and SMB. It can also calculate each file's hash and include it as metadata in the Zeek log file. These file hashes can then be checked against cyber threat intelligence databases and/or the Zeek Intelligence Framework.
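Hashing can be done incrementally as file data is reassembled, without buffering the whole file. The Python sketch below illustrates this with hashlib and a lookup against a set of indicator hashes; the intel set is hypothetical and stands in for a threat intelligence feed or the Zeek Intelligence Framework.

```python
import hashlib
from typing import Iterable, Set


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash a file incrementally as its chunks arrive from the network."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def intel_match(digest: str, intel_hashes: Set[str]) -> bool:
    """Check a digest against a (hypothetical) set of lowercase SHA-256 indicators."""
    return digest.lower() in intel_hashes
```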
Shunting Support
Certain models of Corelight appliances also offer shunting support, which improves performance by letting the sensor stop sending the remainder of large, low-value flows to Zeek once their useful context has been collected, so that Zeek's resources stay focused on the traffic most relevant to the organization. These large flows are often referred to as 'elephant flows'. For example, TLS traffic shunting can be configured to occur after the TLS handshake. This approach captures the context of the encrypted session without wasting resources on encrypted data that cannot be analyzed.
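The shunting decision can be sketched as a small per-flow state machine in Python. The policy values (shunt once the TLS handshake completes, or once a flow exceeds about 100 MB) and all names are hypothetical and only illustrate the idea; they are not Corelight's configuration options.

```python
from dataclasses import dataclass

# Hypothetical policy knobs for this sketch.
SHUNT_AFTER_TLS_HANDSHAKE = True
ELEPHANT_BYTES = 100 * 1024 * 1024  # treat flows above ~100 MB as elephant flows


@dataclass
class FlowState:
    bytes_seen: int = 0
    tls_established: bool = False
    shunted: bool = False


def should_shunt(flow: FlowState) -> bool:
    """Decide whether to stop sending this flow's packets to Zeek."""
    if flow.shunted:
        return True
    if SHUNT_AFTER_TLS_HANDSHAKE and flow.tls_established:
        return True  # handshake context captured; the rest is ciphertext
    return flow.bytes_seen >= ELEPHANT_BYTES


def on_packet(flow: FlowState, size: int) -> bool:
    """Account for a packet; return True if it should be delivered to Zeek."""
    if should_shunt(flow):
        flow.shunted = True
        return False
    flow.bytes_seen += size
    return True
```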
Simple Scan
In cybersecurity monitoring, both EDR and NDR sensors commonly generate alerts triggered by authorized vulnerability scanners, such as Nessus, Rapid7, and Qualys, operating on your network. This package reduces the protocol logging generated for sessions identified as associated with these scanners.
SMTP Links
This package extracts URLs from email messages carried over SMTP. The extracted URLs can then be checked against cyber threat intelligence databases and/or the Zeek Intelligence Framework.
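A minimal Python sketch of URL extraction, assuming the full message text has already been reassembled from the SMTP DATA phase. It uses the standard email package to walk MIME parts and a deliberately simple regular expression, so it will miss obfuscated links and may keep trailing punctuation; the function name is our own.

```python
import re
from email import message_from_string
from typing import List

# Simple pattern: http(s) scheme followed by non-space, non-quote characters.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_urls(raw_message: str) -> List[str]:
    """Return unique URLs found in the text parts of an email message."""
    msg = message_from_string(raw_message)
    urls: List[str] = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip attachments and multipart containers
        payload = part.get_payload(decode=True)  # undoes base64/quoted-printable
        if payload is None:
            continue
        text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
        for url in URL_RE.findall(text):
            if url not in urls:
                urls.append(url)
    return urls
```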
Traceroute
This package detects traceroute activity and logs it in a dedicated Zeek log file called traceroute.log. Traceroute sends a series of probe packets with incrementing TTL values (ICMP echo requests on Windows; UDP datagrams by default on most Unix-like systems). Each router along the path decrements the TTL, and the router at which it reaches zero discards the packet and returns an ICMP "time exceeded" message, thereby identifying each hop in the network path. The package identifies traceroute activity by detecting numerous ICMP "time exceeded" messages directed at a host that has been sending low-TTL packets.
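The detection heuristic can be sketched in Python over simplified packet records. The thresholds (a TTL of 10 or less, time-exceeded replies from at least three different routers) and the record fields are hypothetical, chosen for illustration rather than taken from the package.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical thresholds for this sketch.
LOW_TTL = 10
TIME_EXCEEDED_THRESHOLD = 3
ICMP_TIME_EXCEEDED = 11  # ICMPv4 type 11


def detect_traceroute(packets: Iterable[dict]) -> Set[Tuple[str, str]]:
    """Return (source, destination) pairs that look like traceroutes.

    Each packet has 'src', 'dst', and 'ttl'; ICMP errors also carry
    'icmp_type' and 'orig_dst' (the destination quoted in the error).
    """
    low_ttl: Set[Tuple[str, str]] = set()
    exceeded: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for p in packets:
        if p.get("icmp_type") == ICMP_TIME_EXCEEDED:
            # A router sends the error back to the probe's source.
            exceeded[(p["dst"], p["orig_dst"])].add(p["src"])
        elif p["ttl"] <= LOW_TTL:
            low_ttl.add((p["src"], p["dst"]))
    return {pair for pair, routers in exceeded.items()
            if pair in low_ttl and len(routers) >= TIME_EXCEEDED_THRESHOLD}
```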
Name Cache
When analysts examine network data involving IP addresses communicating with each other, the first question often asked is, "Is this normal?" A key part of determining normalcy is understanding what the communicating IP addresses represent. If the organization uses descriptive DNS and hostnames, these names can greatly aid in identifying the systems and assessing if the traffic pattern is normal.
When enabled, the Zeek Name Cache package monitors forward and reverse DNS lookups to establish address-name mappings. In the top diagram, a DNS lookup associates the IP address z.z.z.z with the hostname 'acmefileshare'. The conn.log entry for that IP address can then include both the name and the source of the name, in this case a DNS A record (IPv4 query).
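A Python sketch of the mapping, assuming DNS answers are available as (query, type, answer) tuples: A and AAAA answers map the returned address to the queried name, and IPv4 PTR answers map the address encoded in the in-addr.arpa name to the returned hostname. The enrichment field names (resp_name, resp_name_src, and so on) are hypothetical, not the package's actual conn.log fields.

```python
from typing import Dict, Iterable, Tuple

_REV4 = ".in-addr.arpa"


def build_name_cache(answers: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Map IP -> (name, source record type) from (query, qtype, answer) tuples."""
    cache: Dict[str, Tuple[str, str]] = {}
    for query, qtype, answer in answers:
        if qtype in ("A", "AAAA"):
            # Forward lookup: the answer is an address for the queried name.
            cache[answer] = (query, qtype)
        elif qtype == "PTR" and query.lower().endswith(_REV4):
            # Reverse lookup: 5.1.168.192.in-addr.arpa -> 192.168.1.5
            octets = query[: -len(_REV4)].split(".")
            cache[".".join(reversed(octets))] = (answer, "PTR")
    return cache


def enrich_conn(rec: dict, cache: Dict[str, Tuple[str, str]]) -> dict:
    """Return a copy of a conn record with hypothetical name fields added."""
    out = dict(rec)
    for side in ("orig", "resp"):
        hit = cache.get(rec.get(f"id.{side}_h"))
        if hit:
            out[f"{side}_name"], out[f"{side}_name_src"] = hit
    return out
```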
VLANs
This script adds fields to the conn.log file to capture both the outer and inner VLAN tags of an Ethernet frame.
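To show where these tags come from, here is a Python sketch that reads up to two 802.1Q or 802.1ad (QinQ) tags from a raw Ethernet frame. Each tag is a 2-byte TPID (0x8100 or 0x88A8) followed by a 2-byte tag control field whose low 12 bits are the VLAN ID. The function name is our own.

```python
import struct
from typing import List, Optional, Tuple

TPID_8021Q = 0x8100
TPID_8021AD = 0x88A8  # QinQ service tag


def vlan_tags(frame: bytes) -> Tuple[Optional[int], Optional[int]]:
    """Return (outer_vlan, inner_vlan) IDs from an Ethernet frame, if tagged."""
    tags: List[int] = []
    offset = 12  # skip destination and source MAC addresses
    while len(tags) < 2 and len(frame) >= offset + 4:
        (tpid,) = struct.unpack_from("!H", frame, offset)
        if tpid not in (TPID_8021Q, TPID_8021AD):
            break  # reached the real EtherType
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        tags.append(tci & 0x0FFF)  # low 12 bits are the VLAN ID
        offset += 4
    outer = tags[0] if tags else None
    inner = tags[1] if len(tags) > 1 else None
    return outer, inner
```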
MAC Addresses
This package adds the MAC addresses to the conn.log file as the orig_l2_addr and resp_l2_addr fields. An article from Corelight discusses an intriguing case from Black Hat 2023, where a researcher investigated suspicious network traffic. The researcher used the suspicious MAC address recorded in conn.log to identify both the communication protocol and the systems involved: https://corelight.com/blog/open-ndr-platform-for-netops
Windows Version Detector
Windows provides a CertVerifyCRLRevocation function that checks a certificate revocation list (CRL) to determine whether a subject's certificate has been revoked. Revocation checking can also reveal the version of the Windows Crypt32.dll in use, which can help identify hosts running older versions of Windows.
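One way this information can surface passively, and an assumption in this sketch, is the HTTP User-Agent that Windows sends when fetching revocation data, of the form Microsoft-CryptoAPI/<version>. The Python sketch below maps that version number to the corresponding Windows NT release; it is not the Corelight detector's logic, and the function name is hypothetical.

```python
import re
from typing import Optional

# Windows NT version numbers and the releases that use them.
NT_VERSIONS = {
    "5.1": "Windows XP",
    "5.2": "Windows XP x64 / Server 2003",
    "6.0": "Windows Vista / Server 2008",
    "6.1": "Windows 7 / Server 2008 R2",
    "6.2": "Windows 8 / Server 2012",
    "6.3": "Windows 8.1 / Server 2012 R2",
    "10.0": "Windows 10 or later / Server 2016 or later",
}

# Assumed User-Agent format for CryptoAPI revocation fetches.
UA_RE = re.compile(r"Microsoft-CryptoAPI/(\d+\.\d+)")


def windows_from_crl_fetch(user_agent: str) -> Optional[str]:
    """Guess the Windows release from a CryptoAPI User-Agent string."""
    m = UA_RE.search(user_agent)
    if not m:
        return None
    return NT_VERSIONS.get(m.group(1), f"unknown CryptoAPI {m.group(1)}")
```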