Understanding SIEM Licensing
There are several common SIEM licensing models, including:
Events Per Second (EPS)
Gigabytes per day (GBpd)
Compute Resource Model ('Consumption Based Billing')
Events Per Second (EPS) & Gigabytes per day (GBpd) licensing
The first diagram illustrates the EPS model. Ovals marked with 'E' represent events, and EPS signifies the number of events ingested into the SIEM per second. In the provided example, there are 5 events in a second, resulting in a SIEM EPS of 5. SIEM vendors typically charge a licensing fee based on EPS capacity. For instance, if the SIEM is licensed for 4 EPS, it would surpass its EPS license limit with the illustrated 5 events. Conversely, if it were licensed for 6 EPS, it would remain under its license limit.
The second diagram illustrates the Gigabytes per day model. In this context, each event in the example has a specific size, with the assumption that each event is 100 bytes. Within the same second, the SIEM ingested 500 bytes of data by processing those 5 event logs. However, SIEM vendors do not typically license based on data ingestion per second; instead, they license based on data ingestion over a defined time period, such as a day.
There are 86,400 seconds in a day. So, if those same 5 events were ingested into the SIEM every second throughout the day, the SIEM would process 43,200,000 bytes in a day. To convert total bytes per day to Gigabytes per day, divide by 1024 three times (bytes to kilobytes, kilobytes to megabytes, megabytes to gigabytes). In this example the result is roughly 0.04 Gigabytes per day.
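The arithmetic above can be checked with a short Python sketch. The constant and variable names are illustrative only; the figures (5 events per second, 100 bytes per event) are the example values from the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
BYTES_PER_GB = 1024 ** 3         # same as dividing by 1024 three times

events_per_second = 5            # example value from the text
bytes_per_event = 100            # assumed event size from the text

bytes_per_second = events_per_second * bytes_per_event
bytes_per_day = bytes_per_second * SECONDS_PER_DAY
gb_per_day = bytes_per_day / BYTES_PER_GB

print(f"{bytes_per_day:,} bytes/day = {gb_per_day:.4f} GB/day")
# prints: 43,200,000 bytes/day = 0.0402 GB/day
```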
SIEM Costs unrelated to licensing - Hardware & System Administration Costs (SIEM People Cost)
Before delving into the 'Compute Resource' licensing model, it's crucial to grasp the factors driving this model. The organization needs to purchase the hardware for the SIEM, as well as invest a significant amount of man-hours in system administration. Here are some of the SIEM administration costs:
Initial SIEM setup for servers and applications
Ongoing upgrades and patching for SIEM servers and applications
Programming log sources and log ingestion during initial SIEM setup (Still needed in SaaS based SIEMs)
Configuring rules, dashboards, and reports during initial SIEM setup (Still needed in SaaS based SIEMs)
Continual programming of log sources and log ingestion for the SIEM (Still needed in SaaS based SIEMs)
Continuous development and tuning of rules, dashboards, and reports for the SIEM (Still needed in SaaS based SIEMs)
Given the significant administration costs associated with SIEM installation and setup (including servers and applications), as well as ongoing upgrades and patching, numerous organizations are choosing Software-as-a-Service (SaaS)-based SIEMs in the cloud.
Compute Resources Licensing Model (Consumption-based billing)
Consumption-based billing differs from subscription-based billing, which might involve a fixed daily allowance (measured monthly) of X Gigabytes per day or X EPS. Consumption-based billing offers distinct advantages: you only pay for what you actively use, and expanding doesn't necessitate purchasing new licenses. The resources you use, including RAM, CPU, and disk, are what count toward consumption. The cost of storing log data hinges on the volume stored and the underlying computing resources employed for storage. These resources directly impact query performance for log data.
Consumption comprises the compute resources needed to keep the logs searchable (measured in GB per hour), to support data transfers (GB transferred), and to support log storage (GB per month). Most SaaS SIEM vendors have developed distinct storage tiers, each with cost and performance implications. Example storage tiers include:
Hot (Fastest cloud compute resources)
Warm
Cold
Frozen
In an ideal scenario devoid of financial constraints, all data would be stored in the pricier hot tier, which has the fastest cloud computing resources. However, customers need to conduct a cost-benefit assessment. For instance, they might find that most actively queried data is less than 48 hours old. Occasionally, they might need to search data from a few days ago or several weeks back, but speed isn't crucial for these infrequent queries. In this scenario, data could be retained in the hot tier for 48 hours, then moved to the warm tier for 5 days, followed by migration to the cold tier for 3 weeks, and eventually to the frozen tier. This data management sequence is termed 'Index Lifecycle Management' (ILM).
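As a rough sketch of the retention schedule in that scenario, the Python below maps the age of a piece of log data to a tier. The function and variable names are hypothetical, and real SIEM platforms express this as policy configuration rather than code; this only illustrates the age boundaries (48 hours, then 5 more days, then 3 more weeks).

```python
from datetime import timedelta

# Retention schedule from the example above: (tier, time spent in that tier).
# Anything older than the last boundary falls through to "frozen".
SCHEDULE = [
    ("hot", timedelta(hours=48)),
    ("warm", timedelta(days=5)),
    ("cold", timedelta(weeks=3)),
]

def tier_for_age(age: timedelta) -> str:
    """Return the storage tier that holds data of the given age."""
    boundary = timedelta(0)
    for tier, duration in SCHEDULE:
        boundary += duration          # cumulative age at which data leaves this tier
        if age < boundary:
            return tier
    return "frozen"

for days in (1, 3, 10, 40):
    print(days, "days old ->", tier_for_age(timedelta(days=days)))
```

With this schedule, data leaves the hot tier at 2 days old, the warm tier at 7 days old, and the cold tier at 28 days old.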
Hopping Back to analyzing EPS and Gigabytes per day licensing
Even with the widespread shift to SaaS-based SIEMs employing compute resource licensing models, it remains crucial to comprehend EPS and Gigabytes per day licensing. This knowledge should extend to understanding how each measurement model affects logging costs and how to convert between the two. Deciding on the appropriate SaaS SIEM cloud compute resources is heavily influenced by your logging sources, which will present data in either EPS or Gigabytes over a specific period of time.
Log Size Impact to EPS versus Gigabytes per day licensing
Here's a straightforward rule to remember:
EPS licensing is advantageous when dealing with larger log sizes.
Gigabytes per day licensing is beneficial for managing smaller log sizes.
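To see why, here is a small sketch that holds the event rate constant and varies only the average log size. The 1,000 EPS rate and the three log sizes are hypothetical values chosen for illustration, and the helper name `gb_per_day` is not from any vendor tool.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1024 ** 3

def gb_per_day(eps: float, avg_log_bytes: float) -> float:
    """Daily ingest volume in GB for a constant event rate and average log size."""
    return eps * avg_log_bytes * SECONDS_PER_DAY / BYTES_PER_GB

eps = 1_000  # hypothetical constant event rate
for avg_log_bytes in (200, 800, 2_000):  # hypothetical average log sizes
    volume = gb_per_day(eps, avg_log_bytes)
    print(f"{eps} EPS at {avg_log_bytes} bytes/event = {volume:.1f} GB/day")
```

The EPS figure is identical in all three cases, so an EPS license would not change, while the Gigabytes per day figure grows in direct proportion to log size.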
Log Pattern impact to EPS versus Gigabytes per day licensing
Many organizations experience peak log generation during specific periods of the day. If your organization sees the highest log generation in the core 8 business hours, with a significant drop in log volume during the remaining 16 hours, Gigabytes per day licensing could be beneficial for you.
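A minimal sketch of that pattern follows, assuming an EPS license has to be sized for the peak rate while a Gigabytes per day license reflects the daily total. The profile values (2,000 EPS for 8 hours, 200 EPS for 16 hours) and the 800-byte average log size are hypothetical.

```python
SECONDS_PER_HOUR = 3_600
BYTES_PER_GB = 1024 ** 3

# Hypothetical daily profile: (hours, events per second during those hours).
profile = [(8, 2_000), (16, 200)]
avg_log_bytes = 800  # assumed average log size

peak_eps = max(eps for _, eps in profile)
events_per_day = sum(hours * SECONDS_PER_HOUR * eps for hours, eps in profile)
average_eps = events_per_day / (24 * SECONDS_PER_HOUR)
gb_per_day = events_per_day * avg_log_bytes / BYTES_PER_GB

print(f"Peak EPS:    {peak_eps}")
print(f"Average EPS: {average_eps:.0f}")
print(f"GB per day:  {gb_per_day:.1f}")
```

Here the peak rate is 2,000 EPS but the daily average is only 800 EPS, so the daily volume is well below what a sustained peak rate would produce.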
Log Aggregation impact to EPS versus Gigabytes per day licensing
The term "log aggregation" can encompass various meanings. In this context, I'm using it to describe a feature in some SIEM systems. When certain fields remain the same across high-volume logs over a defined period, the SIEM ingests a single log with those identical fields and adds an extra data field indicating how many events were aggregated into that single event. Since fewer events are ingested, aggregation can reduce both EPS and Gigabytes per day. However, it's important to note that many SIEM vendors that license on EPS base the license on pre-aggregation EPS.
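The idea can be sketched in a few lines of Python. The event fields, the `event_count` field name, and the `aggregate` helper are all hypothetical; actual SIEM products implement aggregation internally and name the count field differently.

```python
from collections import Counter

# Hypothetical firewall events seen within one aggregation window.
events = [
    {"src": "10.0.0.5", "dst": "10.0.1.9", "port": 443, "action": "allow"},
    {"src": "10.0.0.5", "dst": "10.0.1.9", "port": 443, "action": "allow"},
    {"src": "10.0.0.5", "dst": "10.0.1.9", "port": 443, "action": "allow"},
    {"src": "10.0.0.7", "dst": "10.0.1.9", "port": 22, "action": "deny"},
]
KEY_FIELDS = ("src", "dst", "port", "action")

def aggregate(window):
    """Collapse events whose key fields match into one event plus a count field."""
    counts = Counter(tuple(e[f] for f in KEY_FIELDS) for e in window)
    return [dict(zip(KEY_FIELDS, key), event_count=n) for key, n in counts.items()]

aggregated = aggregate(events)
print(len(events), "events in,", len(aggregated), "events stored")
for event in aggregated:
    print(event)
```

Four incoming events become two stored events, and the count field preserves how many originals each one represents.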
Note that while log aggregation may appear similar to log sampling, they are distinct concepts. Log sampling involves setting a predefined sampling interval, and collecting only a subset of logs within that interval (e.g., 1 log generated or sampled for every 1000 events). This practice is particularly popular and necessary for high-volume log sources like netflow. Log sampling is typically configured at the source of log generation (e.g., Firewalls for netflow). It's worth mentioning that many log shippers also support log sampling functionality.
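For contrast with aggregation, a deterministic 1-in-N sampler might look like the sketch below. The function name and the stand-in record stream are hypothetical, and real devices and log shippers may sample randomly rather than by position.

```python
from itertools import islice

def sample_one_in(events, n):
    """Keep the first event out of every n and drop the rest."""
    return islice(events, 0, None, n)

flow_records = range(10_000)  # stand-in for 10,000 netflow records
sampled = list(sample_one_in(flow_records, 1_000))
print(len(sampled), "of", len(flow_records), "records kept")
```

Unlike aggregation, the dropped records leave no count behind; the sampled stream is only a statistical picture of the original traffic.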
Converting between EPS and Gigabytes per day models
Regardless of whether your SIEM is licensed based on EPS or Gigabytes per day, it's crucial to understand how to convert between the two. This knowledge proves valuable when dealing with different SIEM vendors. Understanding the EPS to Gigabytes per day conversion is particularly important in two scenarios:
Your current SIEM vendor licenses based on EPS, and you are evaluating a vendor that licenses on Gigabytes per day.
You are assessing the impact of a new log source being ingested into a Gigabytes per day licensed SIEM (including SaaS-based SIEMs). It is often more straightforward to extract EPS statistics from the new log source under evaluation.
Creating your own spreadsheet for this purpose is advisable. The key inputs in the spreadsheet are the Events Per Second (EPS) field (Field B2) and the Log Size (Bytes) field (Field B3). The Log Size represents the average size of that particular type of log (Note: A single syslog message is typically around 800 bytes in size). The final row, GB Per Day, performs the calculation based on EPS and log size: EPS multiplied by log size, multiplied by 86,400 seconds, then divided by 1024 three times.
For instance, if you adjust the EPS of your syslog source to 40,000 EPS (Field B2) at 800 bytes per log, the Gigabytes per day increases to 2,575, equivalent to roughly 2.5 Terabytes per day. On a lighthearted note, you might joke that you've just contributed to your SIEM sales rep's new boat fund, making them quite happy (LOL).
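The same spreadsheet logic can be sketched in Python, including the reverse conversion from Gigabytes per day back to EPS. The function names are illustrative; the inputs (40,000 EPS, 800 bytes per log) are the values used above.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 1024 ** 3

def eps_to_gb_per_day(eps: float, avg_log_bytes: float) -> float:
    """Daily volume in GB for a sustained event rate and average log size."""
    return eps * avg_log_bytes * SECONDS_PER_DAY / BYTES_PER_GB

def gb_per_day_to_eps(gb_per_day: float, avg_log_bytes: float) -> float:
    """Sustained event rate implied by a daily volume and average log size."""
    return gb_per_day * BYTES_PER_GB / (avg_log_bytes * SECONDS_PER_DAY)

gb = eps_to_gb_per_day(40_000, 800)
print(f"{gb:,.0f} GB/day, about {gb / 1024:.1f} TB/day")
print(f"{gb_per_day_to_eps(gb, 800):,.0f} EPS")
```

Both conversions assume a constant event rate across the day, so for bursty log sources the EPS result is an average, not a peak.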
Managing SIEM Costs - Shift away from the mindset that every log must reside in a single SIEM
SIEM vendors may encourage the single-SIEM mindset, primarily because they are less concerned about controlling costs and more focused on increasing your expenditure and securing your commitment to their specific SIEM. Consequently, many SIEMs end up containing an excess of unnecessary logs and a deficiency of crucial logs. This influx of unnecessary logs can also slow query execution.
Distinguish between operational and security logs to avoid cost and implementation problems. The cybersecurity team should take the lead in making this determination. High-volume logs, such as firewall flow logs, netflow logs, and server operational logs, are often not used primarily for alerting. Handle these logs with care, as they fall into the category of operational logs rather than security logs. If your SIEM is your most expensive logging system, inundating it with operational logs is financially burdensome. Options for managing high-volume logs that must be collected include:
Sending them to a more affordable backend logging system.
Implementing sampling (common with netflow logs).
Aggregating logs when feasible.
Filtering out noise-making logs.
Having multiple log locations is where SOAR (Security Orchestration, Automation, and Response) products can help. While a SIEM might excel at collecting, visualizing, and alerting on certain log types, it doesn't necessarily mean it's the best solution for all log types. SOAR implementations enable Security Operations Center (SOC) alerts to flow seamlessly from different log sources, facilitating pivots between various security tools and log repositories. As a practical example, the yellow log repository could be any of the following:
A log repository bundled with a specific point security product, where it serves as a collection point for that product's alerts and events (think EDR console).
A log repository that costs a fraction of the SIEM, such as a generic syslog server.
A log repository that is simply more efficient at collecting and storing that specific log type.
Managing SIEM Costs - Understanding SIEM is a tool and not a cybersecurity magic box
Firstly, it's crucial to move past the notion that a SIEM is an all-encompassing system capable of detecting and preventing every threat. The Center for Internet Security (CIS) issues a list of controls, of which the first five are deemed essential. It is telling that CIS version 7 placed SIEM-like controls under "6) Maintenance, Monitoring, and Analysis of Audit Logs" (note that it doesn't explicitly say SIEM). In CIS version 8, this was shifted down to "8) Audit log management," placing it 8th out of 18 CIS controls. Consider whether your SIEM is allocated a budget equivalent to its position at #8 in your security stack. Also, evaluate whether your security engineers are effectively identifying and resolving most cybersecurity issues using non-SIEM security products in your security stack.
Managing SIEM Costs - Implement Sigma Rules
Adopt Sigma rules to streamline the development of ongoing SIEM content, resulting in cost savings. Each SIEM comes with its own query language, making it time-consuming to create rules and queries specific to certain event types and data within those events. However, the underlying data sources, such as Windows logs and syslog, remain essentially the same across SIEMs. While the fundamentals of the search are consistent, the syntax varies among SIEM platforms. Developing rules and queries for one SIEM, only to switch platforms and repeat the process, can be a lengthy endeavor.
The engineers behind Sigma (https://github.com/SigmaHQ/sigma) devised a system to abstract rules and queries into a YAML format. This format acts as a key, facilitating the automatic porting of rules and queries across different SIEM platforms. SOC Prime (https://socprime.com/partners/) offers a paid implementation that allows the porting of SIEM rules and queries across various SIEM platforms.