top of page

Threat Hunting - Kusto Query Language (KQL) externaldata Operator

brencronin

Updated: Nov 10, 2024

The externaldata KQL operator enables you to download data from the Internet and temporarily store it as a KQL table for querying. It supports the following storage services:


  • Azure Blob Storage

  • Azure Data Lake Storage Gen2

  • Azure Data Lake Storage Gen1

  • Amazon S3

  • HTTP web services



This post will provide a quick walkthrough on how to query IP address connections from your Microsoft telemetry to identify any that are connecting to IP addresses listed in the spamhaus block list. The spamhaus block list will be accessed as external data via an HTTP web service.


let droplist=externaldata(RawData:string)[h'https://www.spamhaus.org/drop/drop.txt']

| parse-where RawData with IPRange @" ; SBL" *

| distinct IPRange; DeviceNetworkEvents

| evaluate ipv4_lookup(droplist, RemoteIP, IPRange)


Walking though the KQL query below. The 'let' statement is used to set a variable name equal to the expression. in this case the variable name is 'droplist' and is set to be equal to the result of the expression beyond the =.


The externaldata operator requires two parameters: ColumnName and ColumnType. In this example, the ColumnName is RawData, and the ColumnType is a string.


The query then retrieves data from the HTTP web service at spamhaus.org, converting it into a Kusto table with the schema (RawData:string). To obfuscate sensitive data, such as API keys or personally identifiable information (PII), the Spamhaus list is prepended with a lowercase h, creating an obfuscated string literal. You can easily obfuscate a string in KQL by adding h or H before the string you want to protect.

The next part of the query starts with 'parse-where' which evaluates the RawData string expression, and parses its value into one or more calculated columns. The column is named IPRange and the values parses is based on the expression @" ; SBL" * where space ; space SBL###### is extra text on the line that is needed. Adding the @ character before a string literal makes it a verbatim string.

Next the 'distinct' operator is used to produces a table with the distinct combination of the provided columns of the input table of IP addresses from telemetry of DeviceNetworkEvents


Next, the evaluate operator is used to invoke the ipv4_lookup plugin. This plugin checks an IPv4 address against a lookup table, which in this case is the droplist table created by the externaldata query.


The ipv4_lookup plugin requires two additional arguments: SourceIPv4Key and IPv4LookupKey. The SourceIPv4Key is the IPv4 address you want to look up in the LookupTable—specifically, the RemoteIP field from your systems Microsoft network telemetry. The IPv4LookupKey, which is the IPRange column (bad IP blocks from spamhaus), is the part of the lookup table that is searched to find matches against the RemoteIP from your telemetry data.

Some notes on externaldata operator best practices from the book The Definitive Guide to KQL: Using Kusto Query Language for operations, defending, and threat hunting:


  • Ensure that the external storage artifact is accessible, and the connection string is accurate.

  • Validate and sanitize the data retrieved from external sources to avoid security risks and maintain data integrity.

  • Consider performance implications when working with large datasets. The externaldata operator is optimized for small reference tables rather than large data volumes.

  • Familiarize yourself with the available data formats and authentication methods supported by the externaldata operator.


References


Morowczynski, Mark; Trent, Rod; Zorich, Matthew. The Definitive Guide to KQL: Using Kusto Query Language for operations, defending, and threat hunting








82 views0 comments

Recent Posts

See All
Post: Blog2_Post
  • Facebook
  • Twitter
  • LinkedIn

©2021 by croninity. Proudly created with Wix.com

bottom of page