Zeek (formerly known as Bro) is an open-source project conceived by Vern Paxson. For an in-depth exploration of Bro's genesis and concepts, see Paxson's seminal paper: https://www.usenix.org/legacy/publications/library/proceedings/sec98/full_papers/paxson/paxson.pdf
Understanding Zeek's Functionality
Zeek operates by capturing network traffic through a Zeek sensor equipped with protocol parsers. These parsers dissect the data and organize it into protocol-specific log files. Each log record carries a unique ID (UID) that identifies the network session it belongs to.
Zeek offers an extensive array of protocol log files, and within these log files you can find key protocol fields.
Handy cheat sheets for these logs can be found on GitHub here:
They can also be downloaded from Corelight here: https://corelight.com/about-zeek/zeek-data
They can also be found under the open source Zeek documentation here:
Why is Zeek better than Netflow, firewall logs, and PCAP?
One of the first questions to ask yourself is why Zeek NSM data is any more useful than the other major types of NSM data: Netflow, firewall logs, and PCAP.
Zeek vs. Netflow: Unveiling the Distinctions
Netflow provides a high-level view of session metadata, including information such as source and destination IP addresses and port numbers. Zeek's conn.log encompasses a substantial portion of the data found in Netflow. However, Zeek goes beyond Netflow by capturing additional session-related data pertaining to various protocols involved in the session in separate protocol log files. These separate protocol log files are interconnected by a unique key value known as a UID, facilitating rapid searches between different protocol data within the same session.
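As a rough illustration of how the UID ties the logs together, the sketch below joins a few conn.log and dns.log records on their uid field. The records are made-up sample data trimmed to a handful of fields, and the helper names (index_by_uid, join_on_uid) are hypothetical; only the field names follow Zeek's log schema.

```python
import json
from collections import defaultdict

# Made-up sample records, trimmed to a few fields for illustration.
conn_lines = [
    '{"uid": "CaB1", "id.orig_h": "10.0.0.5", "id.resp_h": "10.0.0.53", "id.resp_p": 53, "proto": "udp"}',
    '{"uid": "CzX2", "id.orig_h": "10.0.0.7", "id.resp_h": "93.184.216.34", "id.resp_p": 443, "proto": "tcp"}',
]
dns_lines = [
    '{"uid": "CaB1", "query": "example.com", "qtype_name": "A"}',
]


def index_by_uid(lines):
    """Group parsed JSON records by their uid field."""
    index = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        index[record["uid"]].append(record)
    return index


def join_on_uid(conn_lines, proto_lines):
    """Pair each conn record with the protocol records that share its uid."""
    proto_index = index_by_uid(proto_lines)
    joined = []
    for line in conn_lines:
        conn = json.loads(line)
        for proto in proto_index.get(conn["uid"], []):
            joined.append((conn, proto))
    return joined


pairs = join_on_uid(conn_lines, dns_lines)
for conn, dns in pairs:
    print(conn["id.orig_h"], "->", conn["id.resp_h"], "queried", dns["query"])
```

In a SIEM the same pivot is usually a search on the uid value; the point here is only that one key links the session summary to its protocol detail.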
Below is a comparison of Netflow data and Zeek's conn.log. The primary data elements present in Netflow but absent from Zeek's conn.log include:
Input and Output Network Device Interfaces - Netflow captures information about the specific network interfaces through which data enters and exits the network.
Type of Service (ToS) Values - Netflow provides a field for the Type of Service value.
Routing Information - Netflow includes routing information that aids in understanding the path data takes through the network.
Netflow's Original Purpose and Its Applicability in Cybersecurity
Netflow wasn't initially conceived for cybersecurity; its primary purpose was to facilitate system accounting on routers, answering the questions service providers needed answered in order to bill customers based on their data usage. Nevertheless, because most networking devices, including routers, switches, and firewalls, can export Netflow data by default, it is a convenient option for basic network data collection.
If you're seeking a straightforward means to gather network data and you cannot direct traffic to a dedicated Zeek sensor, Netflow could be a suitable choice. However, it's important to exercise caution when dealing with Netflow logs. The sheer volume of sessions coursing through a network may render the generation of a Netflow record per session impractical. Consequently, many Netflow implementations resort to generating Netflow records on a sampling basis, such as 1 in every 50 sessions. This sampling approach aims to strike a balance between data volume and practicality.
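To get a feel for why sampling calls for caution, the short calculation below estimates how likely it is that an activity is missed entirely. It is a simplified model that assumes each session is recorded independently with probability 1/N; real Netflow implementations may sample differently, and miss_probability is a hypothetical helper name.

```python
def miss_probability(sessions: int, sample_rate: int) -> float:
    """Chance that none of `sessions` sessions is recorded under 1-in-`sample_rate` sampling.

    Assumes each session is sampled independently (a simplification).
    """
    return (1 - 1 / sample_rate) ** sessions


# How often would an activity spanning this many sessions leave no record at all?
for sessions in (1, 10, 50, 200):
    print(sessions, "sessions:", round(miss_probability(sessions, 50), 3))
```

Under this model a one-off connection is almost always missed at 1-in-50 sampling, while activity that repeats hundreds of times is very likely to leave at least one record.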
Comparing Zeek with Firewall Logs
Firewall logs share a similar advantage to Netflow in that the majority of an organization's inbound and outbound traffic passes through the firewall, and most firewalls possess some level of logging capability for the sessions they handle. These firewall logs, sometimes referred to as firewall "traffic logs" or "Data Plane logs", offer additional insights compared to Netflow and Zeek, particularly concerning firewall policy zones, firewall rules, and the actions taken by the firewall for each session.
It's essential to recognize a few key points about firewall logs:
Firewall traffic logs should not be confused with firewall alerting logs, such as the Intrusion Prevention System (IPS) alerts that a Next Generation Firewall (NGFW) generates for threat detection.
Unlike Netflow logs, firewall logs exported to an external SIEM or logging system are often not sampled. Consequently, ingesting firewall traffic logs into a SIEM or logging system can become prohibitively expensive and deliver poor value for the cost.
In firewall logging, there is a concept of having separate logs for the creation and closure of a firewall session, even for the same firewall session. Engineers often choose to log only the firewall session closure to simplify log collection. This is because the firewall session closure log contains all the data found in the firewall session creation log, along with additional information about when the session concluded and the amount of data transferred during the session. However, in some firewalls, you won't find records of blocked sessions when you log only the firewall session closure information. This occurs because when a firewall session is blocked, there is no subsequent session to close. Nevertheless, this is typically not a significant concern, as having logs for every action your firewall blocks may not provide substantial value in practice.
Zeek compared to PCAP
To gather PCAPs for session traffic, as with Zeek, it's necessary to replicate the session's traffic flow through a mirrored SPAN port, tap, or traffic broker. While PCAP files contain all the session data, Zeek's logs are structured records, typically JSON-formatted, that capture only key protocol fields.
Interpreting Zeek data
The Zeek sensor generates log records for the relevant protocols within the session, but for analysis purposes, these logs need to be exported to another location.
Zeek logs are typically exported in JSON format to a SIEM or logging system.
For instance, the Zeek open-source documentation (https://docs.zeek.org/en/master/logs/dns.html) shows the dns.log file in JSON format. JSON tools such as jq (https://stedolan.github.io/jq/) can be used to work with this type of data. Many SIEMs, including Splunk and Humio (now CrowdStrike Falcon LogScale), come with built-in parsers for Zeek logs.
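For readers who prefer a scripting language to jq, the following minimal sketch parses JSON-formatted dns.log lines with Python's standard library. The three records are made-up sample data with only a few fields, and parse_json_log is a hypothetical helper; it assumes one JSON object per line.

```python
import json
from collections import Counter

# Made-up dns.log records, one JSON object per line, trimmed to a few fields.
sample_dns_log = """\
{"ts": 1599662092.5, "uid": "C1", "id.orig_h": "192.168.4.76", "query": "example.com", "qtype_name": "A", "rcode_name": "NOERROR"}
{"ts": 1599662093.1, "uid": "C2", "id.orig_h": "192.168.4.76", "query": "example.com", "qtype_name": "AAAA", "rcode_name": "NOERROR"}
{"ts": 1599662095.7, "uid": "C3", "id.orig_h": "192.168.4.80", "query": "no-such-host.invalid", "qtype_name": "A", "rcode_name": "NXDOMAIN"}
"""


def parse_json_log(text):
    """Parse a log with one JSON object per line, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_log(sample_dns_log)

# Count how often each name was queried and list the lookups that failed.
query_counts = Counter(r["query"] for r in records)
failed = [r["query"] for r in records if r["rcode_name"] == "NXDOMAIN"]

print(query_counts.most_common())
print(failed)
```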
For those searching Zeek log files outside of a SIEM, Zeek offers a tool called "zeek-cut" (formerly known as "bro-cut"), which extracts selected columns from Zeek logs in their native tab-separated (TSV) format. Analysts can combine Linux tools like "cat" and "awk" with "zeek-cut" to query and view the data. In the example below, the conn.log file is read with "cat," and the fields "ts," "id.orig_h," "id.resp_h," "id.resp_p," "proto," and "conn_state" are extracted with "zeek-cut"; the "-d" option prints timestamps in human-readable form. The output can be processed further by extracting other fields and piping it through additional tools like "grep," "awk," and "sort."
cat conn.log | zeek-cut -d ts id.orig_h id.resp_h id.resp_p proto conn_state
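For environments where zeek-cut is not installed, the sketch below approximates its column selection in Python. It relies on the "#fields" header line that Zeek writes at the top of its tab-separated logs; the sample log is made up, cut_fields is a hypothetical helper, and the sketch does not convert timestamps the way "-d" does.

```python
def cut_fields(lines, wanted):
    """Yield the wanted columns from Zeek TSV log lines, in the order requested."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # column names follow the #fields tag
            continue
        if not line or line.startswith("#"):
            continue  # skip the remaining header and footer metadata lines
        row = dict(zip(fields, line.split("\t")))
        # "-" stands in for a column that is absent (an assumption of this sketch).
        yield [row.get(name, "-") for name in wanted]


# Made-up conn.log excerpt in Zeek's tab-separated layout.
sample_conn_log = [
    "#separator \\x09",
    "#path\tconn",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tconn_state",
    "1599662092.5\tC1\t192.168.4.76\t36844\t192.168.4.1\t53\tudp\tSF",
    "1599662100.0\tC2\t192.168.4.76\t46378\t31.3.245.133\t80\ttcp\tSF",
    "#close\t2020-09-09-14-40-00",
]

rows = list(cut_fields(sample_conn_log, ["id.orig_h", "id.resp_h", "id.resp_p", "proto", "conn_state"]))
for row in rows:
    print("\t".join(row))
```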
North/South traffic and East/West traffic
When analyzing NSM data within your environment, it's crucial to factor in the distinction between North/South traffic and East/West traffic.
North/South traffic refers to the flow of data to and from the Internet.
East/West traffic encompasses intra-organizational traffic, such as data traveling to an organization's servers or between its different geographical locations.
This differentiation holds significant importance in terms of sensor placement and network traffic analysis. Understanding the locations for network traffic collection serves as a critical element in sensor deployments, ensuring that the NSM system effectively covers the desired network traffic analysis areas.
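One simple way to apply this distinction to conn.log records is to label each connection by whether both endpoints are internal. The sketch below assumes the internal address space is the RFC 1918 private ranges, which will not match every organization; INTERNAL_NETS, is_internal, and traffic_direction are hypothetical names.

```python
import ipaddress

# Assumed internal address space; replace with the organization's real ranges.
INTERNAL_NETS = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(addr: str) -> bool:
    """Return True if the address falls inside one of the internal ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def traffic_direction(orig_h: str, resp_h: str) -> str:
    """Label a connection east/west only when both endpoints are internal.

    Anything with at least one external endpoint is labeled north/south.
    """
    if is_internal(orig_h) and is_internal(resp_h):
        return "east/west"
    return "north/south"


print(traffic_direction("10.1.2.3", "192.168.4.1"))
print(traffic_direction("10.1.2.3", "8.8.8.8"))
```

The arguments correspond to the id.orig_h and id.resp_h fields of a conn.log record.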
Analysis and inferences from Zeek data
Searching Zeek data is a powerful capability, but Zeek's true power lies in Zeek scripting. Zeek provides an adaptable framework for writing scripts that perform sophisticated analysis on the protocol fields collected in the Zeek logs. For instance, engineers at Salesforce developed a method named JA3, which calculates a hash of client-side TLS handshake parameters, such as the offered ciphers, from Zeek data on encrypted sessions. You can find more details about this project here: https://github.com/salesforce/ja3/blob/master/zeek/ja3.zeek. In Zeek, the data used for JA3 analysis is typically stored within the ssl.log file.
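To make the idea of a JA3 hash concrete, here is a minimal Python sketch of the fingerprint construction as the JA3 project describes it: five ClientHello values joined into a comma- and dash-delimited string, then hashed with MD5. This is not the linked Zeek script; the function names are hypothetical, the input values are illustrative, and details such as ignoring GREASE values are left out.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string: fields joined by commas, values within a field by dashes."""
    parts = [str(version)]
    for values in (ciphers, extensions, curves, point_formats):
        parts.append("-".join(str(v) for v in values))
    return ",".join(parts)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Illustrative ClientHello values (decimal codes), not taken from real traffic.
example = (769, [47, 53, 5, 10], [0, 10, 11], [23, 24, 25], [0])
print(ja3_string(*example))
print(ja3_hash(*example))
```

Two clients that offer the same parameters in the same order produce the same hash, which is what makes the value useful for grouping sessions by client software.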
Corelight Zeek sensors
Zeek is open source and therefore free to deploy, as long as you have the system administration chops to do so. Some of the experts in Bro/Zeek started a company called Corelight (https://corelight.com/) that pre-packages Zeek sensors and provides value-added Zeek scripting packages to its customers. One example is the Corelight package for C2 (Command & Control) detection (https://corelight.com/products/collections/command-and-control). This package analyzes the data available in Zeek log files, such as http.log and dns.log, to look for indicators of C2 in your network traffic.
RITA & AC Hunter
Another project for analyzing Zeek data comes from an organization called Active Countermeasures: https://www.activecountermeasures.com/
Active Countermeasures released open-source software called RITA (https://github.com/activecm/rita) and also offers a commercial version called AC Hunter (https://www.activecountermeasures.com/ac-hunter/). This software performs statistical analysis of the Zeek log data to find consistencies in connections. These consistencies can relate to the source and destination of the connection, its timing, its duration, and the amount of data transferred. Connections with consistent patterns may indicate beacons, which could be infected systems in your environment communicating with Command & Control servers.
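The notion of timing consistency can be illustrated with a toy score based on how much the gaps between connections vary. This is not RITA's algorithm, only a simplified stand-in using the coefficient of variation; interval_consistency is a hypothetical name and the timestamps are made up.

```python
import statistics


def interval_consistency(timestamps):
    """Score in [0, 1]; values near 1 mean near-constant gaps between connections."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return 0.0  # too few connections to judge regularity
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0
    variation = statistics.pstdev(gaps) / mean_gap  # coefficient of variation
    return max(0.0, 1.0 - variation)


# Made-up connection start times in seconds for one source/destination pair.
beacon_like = [0, 60, 120, 181, 240, 300]   # roughly one connection per minute
human_like = [0, 5, 200, 210, 900, 2000]    # irregular, bursty activity

print(round(interval_consistency(beacon_like), 3))
print(round(interval_consistency(human_like), 3))
```

A real analysis would group conn.log records by source and destination first and would weigh duration and bytes transferred as well as timing.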