File Analysis in Incident Response

brencronin

3 days ago · 5 min read

During incident response investigations, a critical task is analyzing files for suspicious behavior. There are multiple levels of analysis, each with its own advantages and limitations.

1. Hash-Based Analysis (Basic File Identification)

The simplest method of file analysis involves generating a cryptographic hash (e.g., MD5, SHA256) and submitting it to a threat intelligence service like VirusTotal. If the hash matches a known malicious file, the service provides insights into its nature.

Limitations:

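A hash lookup by itself does not reveal the file’s contents, but uploading the file itself to a public service could expose sensitive data if it turns out to be a legitimate internal document.
File hashes alone do not analyze the file’s contents or behavior, and changing even a single byte yields a different hash, making this method insufficient for detecting new or modified threats.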
Many analysts use hash lookups as a preliminary check but rely on deeper analysis when necessary.
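
As a minimal sketch of the hashing step, the Python below computes the MD5 and SHA-256 digests of a file using only the standard library. The function name file_hashes and the chunk size are illustrative choices, not part of any particular tool. The resulting digest can then be searched in a threat intelligence service without uploading the file itself.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return the MD5 and SHA-256 hex digests of the file at `path`.

    The file is read in chunks so large samples do not need to fit in memory.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Calling file_hashes("suspicious.pdf") (a hypothetical file name) returns a dictionary with both digests, either of which can be pasted into a hash search.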

2. Dynamic Analysis (Sandboxing)

Dynamic analysis involves running the file in a controlled environment (sandbox) to observe its behavior. Sandbox services detect suspicious activities such as system modifications, network connections, and process executions.

Limitations:

Public sandbox services may expose internal files, though some paid services offer private submissions.
Malware authors often implement sandbox evasion techniques to avoid detection.
Threats that do not spawn new processes can be challenging to identify through dynamic analysis alone.

3. Static Analysis (In-Depth File Inspection)

Static analysis involves examining a file’s contents without executing it. Specialized tools can identify suspicious indicators, such as embedded scripts, obfuscation techniques, and malicious code patterns.

Advantages:

Effective for detecting threats that do not create new processes.
Can uncover embedded malicious code in documents (e.g., PDFs, Office files).
Useful for analyzing executables before execution.

This article will explore key static analysis tools for examining files and identifying potential threats.

Static File Analysis

Malware analysis researcher Lenny Zeltser outlines the following key steps in file-based malware analysis:

Identify Anomalies - Inspect the document for suspicious elements such as risky tags, scripts, or embedded artifacts.
Locate Embedded Code - Search for shellcode, macros, JavaScript, or other potentially malicious objects, including QR codes.
Detect Obfuscation - Determine whether the embedded code or objects are obfuscated to evade detection.
Extract Suspicious Content - Isolate and retrieve any suspicious code or objects from the file.
Deobfuscate & Analyze - If applicable, decode and examine macros, JavaScript, or other embedded scripts.
Emulate & Debug - If necessary, emulate, disassemble, or debug extracted shellcode to understand its functionality.
Trace the Infection Chain - Analyze how the malware interacts with the system and identify subsequent stages of the attack.
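
One common heuristic for the "Detect Obfuscation" step is to measure the Shannon entropy of an extracted blob: compressed, encrypted, or heavily encoded data tends to score close to 8 bits per byte, while plain text scores lower. The sketch below is an illustration of that heuristic only; it is not taken from Zeltser's material, and any threshold you apply to the result is an assumption that needs tuning.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a bytes object in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A high score alone does not prove malice, since legitimate compressed streams and images also score high; treat it as a pointer to content worth extracting and examining.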

Document & PDF Analysis

At a fundamental level, most operating systems include built-in tools for finding readable text (strings) in any file. In Linux, the strings utility extracts printable strings, while in Windows, findstr searches a file for text patterns and can serve a similar purpose for quick checks.

When no advanced analysis tools are available, these utilities can be used to scan a suspicious file for potentially malicious strings. However, since files often contain numerous benign strings, analysts typically filter the output to focus on key indicators, such as suspicious URLs, embedded commands, or encoded payloads.

For example, searching for suspicious URLs within a file can help identify potential threats hidden in scripts or executables.

strings file | grep http
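
When neither utility is at hand, roughly the same check can be sketched in Python. The function below pulls runs of four or more printable ASCII characters out of raw bytes (four is also the default minimum length in GNU strings) and keeps anything that looks like an http or https URL. The name extract_urls and the URL pattern are illustrative assumptions; the pattern is deliberately loose and will miss URLs stored as UTF-16 or split across encodings.

```python
import re

# Runs of 4+ printable ASCII bytes, similar to what `strings` prints by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Loose URL pattern: stops at whitespace, quotes, angle brackets, or parentheses.
URL = re.compile(rb"https?://[^\s\"'<>()]+")


def extract_urls(data):
    """Return the unique http(s) URLs found in printable runs of `data`, in order."""
    urls = []
    for run in PRINTABLE_RUN.findall(data):
        for match in URL.findall(run):
            text = match.decode("ascii")
            if text not in urls:
                urls.append(text)
    return urls
```

To run it against a file, read the file in binary mode and pass the bytes, for example extract_urls(open("file", "rb").read()).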

Several specialized file analysis tools, available for download or pre-installed on the SANS SIFT workstation, can enhance forensic investigations:

pdfinfo – Extracts metadata from PDF documents.
pdfid.py – Detects potentially malicious elements in PDF files.
pdf-parser.py – Parses and extracts embedded components from PDFs for deeper forensic analysis.
olevba – Analyzes VBA macros in Microsoft Office documents to identify malicious scripts.
oledump.py – Extracts and examines embedded OLE objects within Office documents.
exiftool – Extracts metadata from images, documents, and other files.

Security researcher Didier Stevens developed pdfid.py and pdf-parser.py to aid in PDF file analysis. A fundamental approach involves:

Initial Triage with pdfid.py – Quickly scan PDF documents for potentially malicious elements.
In-Depth Analysis with pdf-parser.py – Further investigate suspicious PDFs by extracting and analyzing embedded objects, scripts, and metadata.

pdfid.py

When analyzing PDFs with pdfid.py, the following flags and objects can indicate the presence of scripts, embedded files, or potentially malicious behavior:

/Page – Indicates the number of pages in the PDF (useful for context).
/JavaScript & /JS – Flags for embedded JavaScript, often used in malicious PDFs.
/AcroForm & /XFA – Indicates the presence of interactive forms, which can contain JavaScript.
/Launch & /EmbeddedFiles – Suggests embedded programs or external file execution, a common attack vector.
/AA & /OpenAction – Defines actions triggered upon opening the document.
/URI – Contains external URLs, potentially used for phishing.
/SubmitForm & /GoToR – Can send data to external URLs, which may be used for data exfiltration.
/RichMedia – Indicates embedded Flash content, often exploited in past attacks.
/ObjStm – Hides objects inside an object stream, a technique used for obfuscation.
/XObject – Can embed images, sometimes used in phishing attempts.
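
To make the idea behind this triage concrete, here is a simplified Python sketch that counts how often each of the names listed above appears in the raw bytes of a PDF. It is not pdfid.py and does not reproduce its output; the function name count_pdf_names is hypothetical, and the sketch makes no attempt to handle obfuscated names (for example, names written with hex-escaped characters) or names hidden inside compressed streams.

```python
import re

# The PDF names discussed above, as they appear in the raw file.
KEYWORDS = [
    b"/Page", b"/JS", b"/JavaScript", b"/AA", b"/OpenAction",
    b"/AcroForm", b"/XFA", b"/Launch", b"/EmbeddedFiles", b"/URI",
    b"/SubmitForm", b"/GoToR", b"/RichMedia", b"/ObjStm", b"/XObject",
]


def count_pdf_names(data):
    """Count occurrences of each keyword in the raw PDF bytes."""
    counts = {}
    for name in KEYWORDS:
        # The negative lookahead stops /Page from also matching /Pages.
        pattern = re.compile(re.escape(name) + rb"(?![A-Za-z0-9])")
        counts[name.decode("ascii")] = len(pattern.findall(data))
    return counts
```

A non-zero count for /OpenAction together with /JS or /JavaScript is the kind of combination that would justify moving on to deeper inspection.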

pdf-parser.py

This tool analyzes a PDF document by extracting and identifying its core structural elements without rendering the file. While the parser is designed for efficiency rather than perfection, it effectively provides the necessary insights for investigation.
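
The notion of identifying structural elements without rendering can be illustrated with a short Python sketch that splits a PDF into its indirect objects (the "N G obj ... endobj" blocks) using a regular expression. This is an assumption-laden simplification, not how pdf-parser.py is implemented: the name list_objects is hypothetical, and the sketch ignores cross-reference tables, object streams, and any stream data that happens to contain the word endobj.

```python
import re

# Matches "<id> <generation> obj ... endobj" and captures the body in between.
OBJECT = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def list_objects(data):
    """Return a summary of each indirect object found in the raw PDF bytes."""
    objects = []
    for match in OBJECT.finditer(data):
        body = match.group(3)
        objects.append({
            "id": int(match.group(1)),
            "generation": int(match.group(2)),
            "has_stream": b"stream" in body,  # object carries stream data
            "size": len(body),                # body length in bytes
        })
    return objects
```

Listing objects this way gives a quick map of the file, so that an object referenced by a suspicious name can be located and examined by its number.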

Metadata Extraction

Metadata is information about a document's creation, modification, author, and so on. It can help you understand the document's origin.

exiftool – Extracts metadata from images, documents, and other files.

pdfinfo – Extracts metadata from PDF documents.
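
If neither tool is installed, a rough look at PDF metadata is still possible. The Python sketch below searches the raw bytes for a few standard document information keys (Title, Author, Creator, Producer, CreationDate, ModDate) when their values are stored as simple parenthesized strings. The function name pdf_info_fields is hypothetical, and this is a best-effort assumption rather than a replacement for exiftool or pdfinfo: it will miss values stored as hex strings, in UTF-16, inside compressed object streams, or in XMP metadata.

```python
import re

# Standard document information keys to look for.
FIELDS = (b"Title", b"Author", b"Creator", b"Producer", b"CreationDate", b"ModDate")


def pdf_info_fields(data):
    """Return metadata values stored as literal (...) strings in raw PDF bytes."""
    info = {}
    for field in FIELDS:
        # Capture text between parentheses, allowing backslash-escaped characters
        # but not nested unescaped parentheses.
        pattern = rb"/" + field + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, data)
        if match:
            info[field.decode("ascii")] = match.group(1).decode("latin-1")
    return info
```

Fields such as Producer and CreationDate can hint at the software and time frame used to build a document, which is useful context when comparing samples.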

Example Abuse of Documents

OLE Object Abuse

OLE (Object Linking and Embedding) objects are widely used in business environments to reference master data, such as diagrams, across multiple documents.

Linked Objects – These reference external data from another document, ensuring consistency across multiple files.
Use Case Example – A master drawing linked in a report allows updates in the original file to reflect across all linked documents.
Considerations – Maintaining access to both the source application and the linked document is crucial. Renaming or moving either file may require reestablishing the link.

(Autodesk. About Object Linking and Embedding (OLE))

To show how easy it is to embed a script this way, create a batch file in Windows, for example startfirefox.bat, with the following contents:

start firefox https://www.google.com

Then open a Word document and select Insert > Object.

You will see a list of Object types (e.g. other Office documents, etc).

Select “Create from File” and browse to your batch file.

Now, if someone clicks this object, the script runs and opens the browser to the URL you specified. (Note: my script opens Firefox, but I accidentally named it Chrome.)

Attackers can conceal OLE objects within documents to trick users into clicking on them. Ideally, users should receive a warning when a script is executed. Two network security monitoring techniques for detecting these threats are:

YARA Rules for IPS – Implementing YARA rules, as outlined in Write YARA Rules to Detect Embedded EXE Files in OLE Objects, can help identify suspicious embedded executables.
Sandbox Analysis – Sending documents through a sandbox environment enables dynamic analysis to detect potentially malicious behavior.

However, if your organization frequently exchanges documents containing OLE objects with external partners, these methods may generate a high number of false positives, requiring careful tuning.
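
For a quick static check of a single file, the Python sketch below lists embedded objects inside a modern Office document. It relies on two assumptions worth stating: the file is in the ZIP-based format (such as .docx), where embedded objects are typically stored under an embeddings folder, and an embedded OLE container begins with the compound file signature D0 CF 11 E0 A1 B1 1A E1. The function name list_embedded_objects is hypothetical, and the sketch only flags that objects exist; it does not unpack what is inside them.

```python
import io
import zipfile

# Signature at the start of an OLE compound file.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def list_embedded_objects(document_bytes):
    """Return (entry name, looks_like_ole) for each embedded object in the archive."""
    found = []
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        for name in archive.namelist():
            if "/embeddings/" in name:
                head = archive.read(name)[:8]
                found.append((name, head == OLE_MAGIC))
    return found
```

Any entry this returns is a candidate for extraction and closer inspection with a tool such as oledump.py.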

References

Lenny Zeltser PDF analysis:

https://zeltser.com/tools-for-malicious-pdf-analysis/

Malicious document analysis cheat sheet:

https://zeltser.com/media/docs/analyzing-malicious-document-files.pdf

Didier Stevens blog on pdfid.py and pdf-parser.py:

https://blog.didierstevens.com/programs/pdf-tools/

12 Days of Defense - Day 1: PDF and Office Doc Malware IOC Extraction:

https://www.youtube.com/watch?v=3rh82pG7MME