Network Detection and Response (NDR) - HTTP Analysis Part 1

HTTP is one of the most widely recognized protocols, essential for daily internet communication. Its ubiquity ensures it’s readily available on most desktops and servers, and that it is often allowed in and out of networks through firewalls without restriction. However, this widespread use also makes HTTP a prime target for cyber attackers.

Common malicious activities frequently leveraging HTTP include:

Delivering exploits (e.g., phishing links, fake software updates)
Man-in-the-Middle systems that intercept HTTP requests and responses
Remote surreptitious access using web shells
Command-and-control (C&C) communication and data exfiltration

Overview of HTTP

To effectively analyze HTTP, identify its misuse, and uncover detection opportunities, it's essential to first understand the basics of how HTTP functions and the key data points it provides. This foundational knowledge is crucial for spotting potential security threats.

HTTP operates on a stateless client-server model, in which the client (usually a web browser) sends a request for a resource located at a Uniform Resource Locator (URL), and the server responds with the requested data, typically web pages or other resources. The process follows a request-response cycle:

Client Request: The client initiates communication by sending an HTTP request to a server. This request includes a method (like GET, POST, PUT, DELETE) that indicates the desired action.
Server Response: The server processes the request and responds with an HTTP status code (e.g., 200 OK, 404 Not Found) and the requested resource, such as an HTML page or file.

The URL is central to HTTP: it acts as a structured path for locating specific resources on the web. Here is how the following example URL breaks down:

http://site1.test.com:8000/myfolder/index.html?key1=val&key2=val

http: Protocol (could also be HTTPS, FTP, etc.)
site1: Subdomain
test.com: Parent domain
8000: Port number
myfolder: Directory or folder path
index.html: File name
key1=val&key2=val: GET parameters (used to pass data to the server)
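A quick way to check this breakdown is Python's standard urllib.parse module. The sketch below splits the example URL into the same components. The subdomain split is a naive assumption (it treats only the first label of the hostname as the subdomain) and would need a public-suffix list to handle domains such as example.co.uk correctly.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://site1.test.com:8000/myfolder/index.html?key1=val&key2=val"
parts = urlsplit(url)

# Naive split: assumes the subdomain is only the first label of the hostname.
subdomain, _, parent_domain = parts.hostname.partition(".")
# Separate the directory path from the file name at the last "/".
directory, _, filename = parts.path.rpartition("/")
# Decode the query string into a dict of parameter -> list of values.
params = parse_qs(parts.query)

print(parts.scheme)    # http
print(subdomain)       # site1
print(parent_domain)   # test.com
print(parts.port)      # 8000
print(directory)       # /myfolder
print(filename)        # index.html
print(params)          # {'key1': ['val'], 'key2': ['val']}
```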

HTTP Methods and Headers

HTTP methods and headers serve different purposes in the communication process between clients and servers.

HTTP methods (also called verbs) define the specific action that the client wants the server to perform. Each method specifies a particular type of interaction between the client and server. Common HTTP methods include:

GET: Requests data from the server (e.g., a web page or file).
- Sends data in the URL's query string, so the data is visible in the URL and its size is limited by the maximum URL length.
- Example: for http://testsite.com/host.php?q=1.1.1.1, server-side PHP code reads the q= parameter, runs a query with it, and renders the results to the page.
POST: Sends data to the server (e.g., form submission, file upload).
- Commonly used for web forms and logons.
- Sends data in the request body, separate from the URL, so the data is not visible in the URL and is not limited by URL length.
PUT: Updates or creates a resource on the server.
DELETE: Removes a resource from the server.
HEAD: Retrieves headers only (no body) for a resource, useful for checking its existence or metadata.

As their names suggest, GET is most often used to request or retrieve data, and POST is used to send data to the server, especially when submitting forms or performing actions that modify server data.
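To make the difference concrete, the sketch below assembles the raw text of a GET and a POST request for the testsite.com/host.php?q=1.1.1.1 example using only Python's standard library. It is illustrative only: build_request is a hypothetical helper, not part of any library, and a real client would also send headers such as User-Agent and Accept.

```python
from urllib.parse import urlencode

def build_request(method: str, host: str, path: str, params: dict) -> str:
    """Build a minimal raw HTTP/1.1 request showing where GET and POST carry data."""
    data = urlencode(params)
    if method == "GET":
        # GET: parameters travel in the query string, visible in the URL.
        target = f"{path}?{data}" if data else path
        lines = [f"GET {target} HTTP/1.1", f"Host: {host}"]
        body = ""
    elif method == "POST":
        # POST: parameters travel in the body, described by Content-Type/Length.
        body = data
        lines = [
            f"POST {path} HTTP/1.1",
            f"Host: {host}",
            "Content-Type: application/x-www-form-urlencoded",
            f"Content-Length: {len(body.encode())}",
        ]
    else:
        raise ValueError(f"unsupported method: {method}")
    # Each line ends with CRLF; a blank line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

get_req = build_request("GET", "testsite.com", "/host.php", {"q": "1.1.1.1"})
post_req = build_request("POST", "testsite.com", "/host.php", {"q": "1.1.1.1"})
print(get_req)
print(post_req)
```

In the GET output the value appears in the request line itself, while in the POST output the request line is just /host.php and the value sits after the blank line.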

HTTP headers provide additional context and information about the request or response. They are key-value pairs that help control how the request is handled or what type of content is being sent. There are two types:

Request headers: Sent by the client to provide more information about the request. Fields include:

Request line (method, path, and HTTP version). Example: GET / HTTP/1.1
- GET = method
- / = path
- HTTP/1.1 = protocol version
Host: Destination hostname or IP address, plus an optional port number. Example: 127.0.0.1:8080
- 127.0.0.1 = host
- 8080 = port
User-Agent: Identifies the client making the request (e.g., browser type). Example: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) Gecko/20100101 Firefox/131.0
Accept: The media (MIME) types the client will accept in the response from the server. Example: text/html, application/xhtml+xml, application/xml;q=0.9, */*;q=0.8
Accept-Language: The natural languages the client prefers in the response. Example: en-US,en;q=0.5
Accept-Encoding: Content encoding (usually a compression algorithm) that the client can understand. Example: gzip, compress, deflate
Connection: Controls whether the network connection stays open after the current transaction finishes. Examples: close or keep-alive
Authorization: Sends authentication credentials.
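Network sensors and proxies extract these fields from the raw request text. A minimal Python sketch of that parsing, using the example values above, might look like the following. It assumes a well-formed HTTP/1.x request and ignores details a production parser must handle, such as repeated headers, IPv6 Host values, and HTTP/2 framing.

```python
def parse_request_head(raw: bytes):
    """Split a raw HTTP/1.x request into its request line parts and a header dict."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case.
        # Simplification: a repeated header overwrites the earlier value.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers

raw = (
    b"GET / HTTP/1.1\r\n"
    b"Host: 127.0.0.1:8080\r\n"
    b"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:131.0) Gecko/20100101 Firefox/131.0\r\n"
    b"Accept-Language: en-US,en;q=0.5\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
method, path, version, headers = parse_request_head(raw)
# Split host and port; assumes a non-IPv6 Host value.
host, _, port = headers["host"].partition(":")
print(method, path, version)  # GET / HTTP/1.1
print(host, port)             # 127.0.0.1 8080
print(headers["user-agent"])
```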

Response headers: Sent by the server to provide information about the response. Examples include:

Status line (protocol version and status code). Example: HTTP/1.1 200 OK
- HTTP/1.1 = protocol and version
- 200 OK = status code and reason phrase. Some common status codes:
  - 100 – Continue
  - 101 – Switching Protocols
  - 200 – OK
  - 301 – Moved Permanently
  - 302 – Found
  - 401 – Unauthorized
  - 403 – Forbidden
  - 404 – Not Found
  - 407 – Proxy Authentication Required
  - 500 – Internal Server Error
  - 503 – Service Unavailable
Content-Type: Indicates the media type of the response body (e.g., text/html, application/json). Example: text/html; charset=utf-8
Content-Length: Indicates the size of the message body, in bytes, sent to the recipient.
Date: The date and time at which the message was originated. Example: Wed, 21 Oct 2015 07:28:00 GMT
Set-Cookie: Instructs the client user agent to store a cookie, so that the user agent can send it back to the server later. Example: <cookie-name>=<cookie-value>; Domain=<domain-value>; Secure; HttpOnly
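The response side can be parsed the same way. The sketch below uses a made-up sample response. It splits off the status line, collects headers while keeping repeats such as multiple Set-Cookie lines, and checks that Content-Length matches the body actually received, a simple consistency check an analyst might apply to captured traffic. It assumes an uncompressed, non-chunked HTTP/1.x response.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase is optional
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Keep every value: headers such as Set-Cookie may legitimately repeat.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return version, code, reason, headers, body

raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 13\r\n"
    b"Set-Cookie: session=abc123; Secure; HttpOnly\r\n"
    b"Set-Cookie: theme=dark\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
)
version, code, reason, headers, body = parse_response(raw)
status_class = code // 100  # 2 = success, 3 = redirect, 4 = client error, 5 = server error
length_ok = int(headers["content-length"][0]) == len(body)
print(version, code, reason)   # HTTP/1.1 200 OK
print(headers["set-cookie"])   # ['session=abc123; Secure; HttpOnly', 'theme=dark']
print(status_class, length_ok) # 2 True
```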

Two other important HTTP headers are the Referer header (spelled this way in the HTTP specification) and the Host header.

Often, the initial web server receiving a request doesn’t host the requested resource. In such cases, the server redirects the client to another server. When following links and redirects, the browser typically sends a Referer header indicating the URL of the page from which the request originated (browsers may shorten or omit it depending on the referring page's referrer policy). One important aspect to monitor is the presence of referrers that don’t logically align with the site being requested. This could indicate clickbait activity. For example, a link might say "Click here for information about ABC," but instead direct users to a malware site that has no apparent connection to ABC. This kind of behavior is a red flag and should be investigated further.
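A first step toward spotting referrers that don't align with the destination is comparing the host in the Referer header with the Host header. The sketch below is a hypothetical helper, not a complete detection. Cross-site referrals are normal on the web, so a match is best treated as context to combine with other signals (for example, the type of file being downloaded), and a real implementation would compare registered domains using a public-suffix list rather than exact hostnames.

```python
from typing import Optional
from urllib.parse import urlsplit

def cross_site_referral(host_header: str, referer: Optional[str]) -> bool:
    """Return True when the Referer's host differs from the requested Host."""
    if not referer:
        return False  # no Referer sent (or stripped by referrer policy)
    ref_host = (urlsplit(referer).hostname or "").lower()
    # Drop any ":port" suffix; assumes a non-IPv6 Host value.
    req_host = host_header.split(":", 1)[0].lower()
    return ref_host != req_host

# Hypothetical example: an article page linking to a file on an unrelated host.
print(cross_site_referral("downloads.example.net", "http://news.example.com/abc-story"))  # True
print(cross_site_referral("www.example.com:8080", "https://www.example.com/page"))        # False
```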

A single web server can often host multiple websites on the same IP address. The Host header in an HTTP request specifies which website should handle the incoming traffic, ensuring the server routes the request to the correct site.
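Conceptually, the server keeps a table of the site names it serves and looks up the Host value of each request. The Python sketch below models that lookup with a hypothetical VHOSTS table; real servers such as Apache or nginx implement this through their own configuration. From a monitoring perspective, requests whose Host is a bare IP address or a name the server does not serve can be worth a closer look.

```python
from typing import Optional

# Hypothetical virtual-host table: one IP address, several sites.
VHOSTS = {
    "site1.test.com": "/var/www/site1",
    "site2.test.com": "/var/www/site2",
}

def select_site(host_header: str) -> Optional[str]:
    """Pick the document root for a request based on its Host header."""
    name = host_header.split(":", 1)[0].lower()  # ignore any ":port" suffix
    return VHOSTS.get(name)  # None means no matching site is configured

print(select_site("site2.test.com:80"))  # /var/www/site2
print(select_site("10.0.0.5"))           # None
```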

Ways HTTP can be used maliciously

Installing malware or unwanted software

The web can be thought of as a vast file-sharing system, and files hosted on a web server can include malicious code that, if downloaded, could be executed on the user’s machine. Additionally, web pages often include active content intended to enhance the browsing experience, and this content can also be exploited to run malicious scripts. Examples of content that can carry such code include JavaScript, CSS, Flash, images, and documents.

The browser functions as a sandbox, separating code from the operating system, interpreting files, and determining how to handle them. For HTML, images, or Flash, the browser typically displays the content on the page. For documents or executables, it prompts the user to either open the file or perform the default action based on the browser’s configuration. These types of attacks are sometimes referred to as drive-by downloads or watering hole attacks, depending on the context of their implementation.
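Because the browser decides how to handle a file partly from headers the server controls, network defenders often compare the declared Content-Type with the file's actual leading bytes ("magic bytes"). The sketch below is a hypothetical helper illustrating that idea. The signature list and the rule of flagging only text or image types are assumptions for illustration, and with HTTPS the body is only visible where TLS is decrypted, for example at a proxy.

```python
from typing import Optional

# A few well-known file signatures ("magic bytes"); far from exhaustive.
MAGIC = [
    (b"MZ", "Windows executable"),
    (b"\x7fELF", "ELF executable"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"%PDF", "PDF document"),
]

def sniff(body: bytes) -> Optional[str]:
    """Identify a payload from its leading bytes, ignoring what the server claims."""
    for signature, label in MAGIC:
        if body.startswith(signature):
            return label
    return None

def disguised_download(content_type: str, body: bytes) -> Optional[str]:
    """Flag a recognized file type served under a text or image Content-Type."""
    kind = sniff(body)
    declared = content_type.split(";", 1)[0].strip().lower()
    if kind and declared.startswith(("text/", "image/")):
        return f"{kind} served as {declared}"
    return None

print(disguised_download("image/jpeg", b"MZ\x90\x00\x03\x00"))  # Windows executable served as image/jpeg
print(disguised_download("application/pdf", b"%PDF-1.7"))       # None
```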

HTML Smuggling

An increasingly common attack in this category is known as HTML smuggling. This technique leverages HTML5 Blob and File object features to execute client-side attacks on victims' systems. Traditional security tools and browser defenses typically focus on detecting malware as it transfers between the web browser and server. In HTML smuggling, however, the malicious payload is embedded in the web page itself, usually as encoded data inside JavaScript that assembles it into an immutable Blob of binary data in the browser. Instead of being downloaded as a separate file, the malware is built directly on the end user's computer, bypassing many security measures.
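Since the payload is assembled inside the page rather than fetched as a separate file, one network-side approach is to inspect the HTML and JavaScript delivered to the browser for the building blocks of the technique. The Python sketch below checks for a few such building blocks. The indicator names and patterns are assumptions chosen for illustration; each one also appears in benign pages, and attackers routinely obfuscate them, so the output is a triage hint rather than a verdict.

```python
import re

# Hypothetical, untuned indicators of HTML smuggling in a page's source.
INDICATORS = {
    "blob_constructor": re.compile(r"new\s+Blob\s*\("),
    "object_url": re.compile(r"URL\.createObjectURL\s*\("),
    "download_attribute": re.compile(r"\.download\s*=|\bdownload\s*=\s*[\"']"),
    "base64_decode": re.compile(r"\batob\s*\("),
    "long_base64_run": re.compile(r"[A-Za-z0-9+/]{2000,}={0,2}"),
}

def smuggling_indicators(html: str) -> list:
    """Return the names of the indicators found in the HTML/JavaScript text."""
    return [name for name, pattern in INDICATORS.items() if pattern.search(html)]

sample = (
    '<script>var d = atob("' + "QUJD" * 600 + '");'
    "var b = new Blob([d]); var a = document.createElement('a');"
    "a.href = URL.createObjectURL(b); a.download = 'invoice.exe'; a.click();</script>"
)
print(smuggling_indicators(sample))          # all five indicator names
print(smuggling_indicators("<p>hello</p>"))  # []
```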

Users redirected to Man-in-the-Middle sites for session and credential theft

Man-in-the-Middle (MitM) attacks can be used in the context of drive-by downloads or watering hole attacks, where users' browsers are redirected to a malicious MitM site. However, MitM sites are more commonly reached through phishing links. The MitM site relays traffic between the user and the legitimate site, so users who unknowingly interact with it are prompted to enter their credentials on what looks like a genuine login page. Because the MitM site sits in the middle of the exchange, it also captures the session tokens issued after login, allowing attackers to use these stolen tokens and credentials to access the users' accounts.
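Stolen session tokens tend to surface as the same token being presented from a new client. Where HTTP is unencrypted, or TLS is terminated and logged at a proxy, the sketch below groups requests by session cookie and reports tokens used from more than one client IP and User-Agent pair. The (client_ip, user_agent, session_cookie) record format is a hypothetical assumption, and legitimate causes such as mobile users changing networks or shared NAT addresses mean matches need review rather than automatic blocking.

```python
from collections import defaultdict

def reused_sessions(records):
    """Map each session cookie to the distinct clients that presented it,
    keeping only cookies seen from more than one client.

    records: iterable of (client_ip, user_agent, session_cookie) tuples,
    a hypothetical format extracted from HTTP logs.
    """
    clients_by_cookie = defaultdict(set)
    for client_ip, user_agent, cookie in records:
        if cookie:
            clients_by_cookie[cookie].add((client_ip, user_agent))
    return {c: clients for c, clients in clients_by_cookie.items() if len(clients) > 1}

logs = [
    ("10.1.1.5", "Firefox/131.0", "sess=abc"),
    ("10.1.1.5", "Firefox/131.0", "sess=abc"),
    ("203.0.113.9", "python-requests/2.31", "sess=abc"),  # same token, new client
    ("10.1.1.6", "Edge/129.0", "sess=xyz"),
]
print(reused_sessions(logs))
```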

Remote Access through web shells

A web shell is a type of malicious script that allows an attacker to remotely control a web server through a web-based interface. It is typically uploaded to a compromised server by exploiting vulnerabilities in web applications, such as file upload flaws or poorly secured server configurations. In a typical scenario, an organization's internet-facing web server is compromised with a web shell. The attacker communicates with the system by sending HTTP requests containing commands to the web shell, which then executes those commands on the server itself. The following articles cover web shells in more depth:

Acunetix blog series on web shells: https://www.acunetix.com/blog/articles/introduction-web-shells-part-1/
Gigamon blog post on web shells: https://blog.gigamon.com/2022/09/28/investigating-web-shells/
GitHub repository from NSA Cyber on mitigating web shells: https://github.com/nsacyber/Mitigating-Web-Shells
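Because the attacker's commands travel inside ordinary HTTP requests, one simple network-side heuristic is to look for operating system commands in the parameters sent to server-side scripts. The sketch below is a hypothetical example of that idea. The command patterns and script extensions are assumptions, and real web shells frequently send commands in POST bodies, encode or encrypt them, or use innocuous parameter names, so this catches only the least careful cases.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical, non-exhaustive patterns for OS commands passed as parameters.
COMMAND_PATTERN = re.compile(
    r"\b(?:whoami|ipconfig|ifconfig|uname\s+-a|net\s+user)\b"
    r"|cmd(?:\.exe)?\s*/c\b"
    r"|/bin/(?:ba)?sh\b",
    re.IGNORECASE,
)
SCRIPT_EXTENSIONS = (".php", ".asp", ".aspx", ".jsp")

def suspicious_webshell_request(url: str) -> bool:
    """Flag a request to a server-side script whose query parameters look like OS commands."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(SCRIPT_EXTENSIONS):
        return False
    # parse_qs also URL-decodes values, e.g. "%20" and "+" become spaces.
    for values in parse_qs(parts.query).values():
        if any(COMMAND_PATTERN.search(value) for value in values):
            return True
    return False

print(suspicious_webshell_request("http://victim.example/uploads/img.php?c=whoami"))              # True
print(suspicious_webshell_request("http://victim.example/uploads/img.php?c=cmd.exe%20/c%20dir"))  # True
print(suspicious_webshell_request("http://victim.example/search.php?q=shoes"))                    # False
```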

Command & Control Communications

Using HTTP for Command & Control (C2) and data exfiltration requires a compromised system that can reach the internet over HTTP. Attackers use this connection to communicate between their internet-based system and the victim's compromised system. Because these communications blend into the organization's regular HTTP traffic, they are usually allowed through firewalls and are difficult to detect.
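One behavior that can distinguish automated C2 traffic from people browsing is regular "beaconing": an implant checking in with its server at a fixed interval. The sketch below scores the regularity of request timing for a single client and destination pair using the coefficient of variation of the gaps between requests (lower means more regular). Any threshold applied to the score is an assumption to tune locally. Legitimate software such as update checkers also beacons, and many C2 frameworks add random jitter to defeat exactly this measure, so it works best combined with other signals.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence

def beacon_score(timestamps: Sequence[float]) -> Optional[float]:
    """Coefficient of variation of the gaps between requests (lower = more regular).

    timestamps: request times in seconds for one client/destination pair.
    Returns None when there are too few requests to judge.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)

print(beacon_score([0, 60, 120, 180, 240, 300]))  # 0.0 -> perfectly regular check-ins
print(beacon_score([0, 5, 200, 210, 900]))        # about 1.24 -> irregular timing
```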