FLARE IDA Pro Script Series: Automatic Recovery of Constructed Strings
The FireEye Labs Advanced Reverse Engineering (FLARE) Team is
dedicated to sharing knowledge and tools with the community. We
started with the release of the FLARE On Challenge in early July, where thousands
of reverse engineers and security enthusiasts participated. Stay tuned
for a write-up of the challenge solutions in an upcoming blog post.
This post is the start of a series where we look to aid other
malware analysts in the field. Since IDA Pro is the most popular tool
used by malware analysts, we’ll focus on releasing scripts and
plug-ins to help make it an even more effective tool for fighting
evil. In the past, at Mandiant we released scripts on GitHub, and we’ll
continue to do so at the following new location: https://github.com/fireeye/flare-ida.
This is where you will also find the plug-ins we released in the past:
Shellcode Hashes and Struct Typer. We hope you find all these scripts
as useful as we do.
Let’s start with a simple challenge. What two strings are printed
when executing the disassembly shown in Figure 1?
Figure 1: Disassembly challenge
If you answered
“Hello world\n” and
“Hello…”, good job! If you didn’t see them, Figure 2 makes
this more obvious. In it, the bytes that make up the strings have been
converted to characters, and the local variables have been converted to
arrays to show buffer offsets.
Figure 2: Disassembly challenge with markup
Reverse engineers are likely more accustomed to strings that are a
consecutive sequence of human-readable characters in the file, as
shown in Figure 3. IDA generally does a good job of cross-referencing
these strings in code, as can be seen in Figure 4.
Figure 3: A simple string
Figure 4: Using a simple string
Manually constructed strings like those in Figure 1 are often seen in
malware. The bytes that make up the strings are stored within the
actual instructions rather than a traditional consecutive sequence of
bytes. Simple static analysis with tools such as strings cannot detect
these strings. The code in Figure 5, used to create the challenge
disassembly, shows how easy it is for a malware author to use this technique.
Figure 5: Challenge source code
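Looking at the raw bytes shows why a tool like strings misses these strings. The Python sketch below encodes each character as a `mov byte ptr [ebp+disp8], imm8` instruction (bytes C6 45 disp8 imm8) and then runs a strings-style scan over the result. The helper names and the starting offset of -0x10 are hypothetical and chosen for illustration. They are not taken from Figure 1.

```python
import re


def encode_mov_byte(disp8: int, value: int) -> bytes:
    """Encode `mov byte ptr [ebp+disp8], imm8` as C6 45 disp8 imm8."""
    return bytes([0xC6, 0x45, disp8 & 0xFF, value])


def build_stack_string_code(text: str, base_disp: int = -0x10) -> bytes:
    """Emit one byte-sized mov per character, at consecutive stack offsets."""
    return b"".join(
        encode_mov_byte(base_disp + i, ch)
        for i, ch in enumerate(text.encode("ascii"))
    )


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Roughly mimic `strings`: runs of at least min_len printable ASCII bytes."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


code = build_stack_string_code("Hello world\n")
print(printable_runs(code))                  # []  (characters are split up by C6 45 disp8)
print(printable_runs(b"Hello world\n\x00"))  # ['Hello world']  (a consecutive string)
```

Every character byte is surrounded by opcode, ModRM and displacement bytes. As a result, no two printable bytes are ever adjacent in the instruction stream.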
Automating the recovery of these strings during malware analysis is
simple if the compiler follows a basic pattern. A quick examination of
the disassembly in Figure 1 could lead you to write a script that
searches for mov instructions that begin with the bytes
C6 45 and then extracts the stack offset and character
bytes. Modern compilers with optimizations enabled often complicate
matters, as they may:
- Load frequently used characters into registers, which are then used to
copy bytes into the buffer
- Reuse a buffer for multiple strings
- Construct the string out of order
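The naive approach described above fits in a few lines of Python. The sketch below is hypothetical and is not part of flare-ida. It scans raw code bytes for C6 45 disp8 imm8, records each byte at its signed ebp offset, and reads the buffer back in offset order. Because it sorts by offset, out-of-order construction alone does not defeat it. However, a reused buffer collapses to whatever was written last. Any byte stored from a register (for example `mov byte ptr [ebp+disp8], bl`, encoded 88 5D disp8) is invisible to it.

```python
import re

# C6 45 disp8 imm8 encodes `mov byte ptr [ebp+disp8], imm8`.
MOV_BYTE_EBP = re.compile(rb"\xc6\x45(.)(.)", re.DOTALL)


def naive_stack_strings(code: bytes) -> list[str]:
    """Recover strings built only from immediate byte stores to [ebp+disp8]."""
    stack = {}
    for m in MOV_BYTE_EBP.finditer(code):
        disp = int.from_bytes(m.group(1), "little", signed=True)
        stack[disp] = m.group(2)[0]  # a later store to the same offset wins

    strings, current, prev = [], bytearray(), None
    for disp in sorted(stack):
        # A gap in offsets or a NUL byte ends the current string.
        if (prev is not None and disp != prev + 1) or stack[disp] == 0:
            if current:
                strings.append(current.decode("latin-1"))
            current = bytearray()
        if stack[disp] != 0:
            current.append(stack[disp])
        prev = disp
    if current:
        strings.append(current.decode("latin-1"))
    return strings


# Unoptimized pattern: every character is an immediate store.
plain = b"".join(bytes([0xC6, 0x45, (-0x10 + i) & 0xFF, c])
                 for i, c in enumerate(b"Hello world\n\x00"))
print(naive_stack_strings(plain))      # ['Hello world\n']

# Register pattern: 'l' is loaded into bl once (B3 6C) and stored with 88 5D disp8.
optimized = bytes.fromhex("b36c" "c645f048" "c645f165" "885df2" "885df3" "c645f46f" "c645f500")
print(naive_stack_strings(optimized))  # ['He', 'o']
```

Matching raw bytes instead of decoded instructions can also produce spurious hits where C6 45 happens to appear inside another instruction. A script built on IDA's disassembly would avoid that particular problem, but it would still miss the register stores.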
Figure 6 shows the disassembly of the same source code compiled
with optimizations enabled. The optimizations caused the compiler to load
some of the frequently occurring characters into registers to reduce the
size of the resulting code. Extra instructions are required to
load the registers with a value, like the 2-byte mov instruction at
0040115A, but each store from one of these registers then requires only a
4-byte mov instruction, whereas mov instructions that contain
hard-coded byte values are 5 bytes long.
Figure 6: Compiler optimizations
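Some back-of-the-envelope arithmetic shows the trade-off. The sizes quoted above are 2 bytes to load a register, 4 bytes per register store and 5 bytes per immediate store. A character that appears k times therefore costs 5k bytes as immediates versus 2 + 4k bytes through a register, so the register only pays off when k > 2. The sketch below applies this simplified cost model to a string. Real compilers also weigh other factors, such as register pressure, so the model is an illustration and not a description of any particular compiler's heuristic.

```python
from collections import Counter

# Instruction sizes quoted in the text (simplified cost model).
REG_LOAD = 2   # mov reg8, imm8
REG_STORE = 4  # mov byte ptr [...], reg8
IMM_STORE = 5  # mov byte ptr [...], imm8


def register_savings(text: str) -> dict[str, int]:
    """Bytes saved per character by loading it into a register once."""
    savings = {}
    for ch, count in Counter(text).items():
        saved = count * IMM_STORE - (REG_LOAD + count * REG_STORE)
        if saved > 0:
            savings[ch] = saved
    return savings


print(register_savings("Hello world\n"))  # {'l': 1}
```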
The StackStrings IDA Pro Plug-in
To help you defeat malware that contains these manually constructed
strings, we’re releasing an IDA Pro plug-in named StackStrings that is
available at https://github.com/fireeye/flare-ida.
The plug-in relies heavily on analysis by a Python library called Vivisect. Vivisect
is a binary analysis framework frequently used to augment our
analysis. StackStrings uses Vivisect’s analysis and emulation
capabilities to track simple memory usage by the malware. The plug-in
identifies memory writes to consecutive memory addresses of likely
string data, then prints the strings and their locations and creates
comments where each string is constructed. Figure 7 shows the result of
running the plug-in on the program above.
Figure 7: StackStrings plug-in results
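The core idea of aggregating memory writes can be sketched independently of Vivisect. The sketch below assumes an emulator has already produced a list of (address, byte) writes in execution order. It does not show how the plug-in actually collects those writes from Vivisect, and `find_strings` is a hypothetical name. The sketch keeps the last value written to each address, walks the addresses in order, and reports runs of printable bytes at consecutive addresses. Sorting by address is what lets it cope with strings constructed out of order.

```python
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def find_strings(writes, min_len=4):
    """Report (start_address, text) for printable runs at consecutive addresses."""
    memory = {}
    for addr, value in writes:
        memory[addr] = value  # the last write to an address wins

    found = []
    run_start, run, prev = None, bytearray(), None
    for addr in sorted(memory):
        value = memory[addr]
        if run and addr == prev + 1 and value in PRINTABLE:
            run.append(value)  # extend the current run
        else:
            # Gap, non-printable byte, or start of data: close the current run.
            if len(run) >= min_len:
                found.append((run_start, run.decode("ascii")))
            run = bytearray([value]) if value in PRINTABLE else bytearray()
            run_start = addr
        prev = addr
    if len(run) >= min_len:
        found.append((run_start, run.decode("ascii")))
    return found


# "Hello world\n" written last character first, as an optimizing compiler might.
writes = [(0x12FF00 + i, b) for i, b in enumerate(b"Hello world\n\x00")]
for addr, text in find_strings(reversed(writes)):
    print(hex(addr), repr(text))  # 0x12ff00 'Hello world\n'
```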
While the plug-in is called StackStrings, its analysis is not
limited to the stack. It also tracks all memory segments accessed
during Vivisect’s analysis, so manually constructed strings in global
data are identified as well, as shown in Figure 8.
Figure 8: Sample global string
Simple, manually constructed WCHAR strings are also identified by
the plug-in, as shown in Figure 9.
Figure 9: Sample WCHAR data
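Handling simple WCHAR strings mostly comes down to recognizing UTF-16LE text in which every ASCII character is followed by a zero byte. The hypothetical `decode_candidate` helper below tries the one-byte interpretation first and falls back to the two-byte one. It only covers the simple case (ASCII characters stored as 16-bit units) and does not handle arbitrary Unicode.

```python
def _printable(b: int) -> bool:
    return 0x20 <= b < 0x7F or b in (0x09, 0x0A, 0x0D)


def decode_candidate(buf: bytes, min_chars: int = 4):
    """Return ('ascii'|'wchar', text) if buf looks like a simple string, else None."""
    # One byte per character, up to the first NUL.
    ascii_part = buf.split(b"\x00", 1)[0]
    if len(ascii_part) >= min_chars and all(_printable(b) for b in ascii_part):
        return ("ascii", ascii_part.decode("ascii"))

    # Simple WCHAR (UTF-16LE): printable low byte, zero high byte, until a NUL unit.
    chars = []
    for i in range(0, len(buf) - 1, 2):
        lo, hi = buf[i], buf[i + 1]
        if hi != 0 or not _printable(lo):
            break
        chars.append(chr(lo))
    if len(chars) >= min_chars:
        return ("wchar", "".join(chars))
    return None


print(decode_candidate(b"Hello\x00"))                                # ('ascii', 'Hello')
print(decode_candidate("Hello".encode("utf-16-le") + b"\x00\x00"))  # ('wchar', 'Hello')
print(decode_candidate(b"\x01\x02\x03\x04"))                         # None
```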
Download Vivisect from http://visi.kenshoto.com/viki/MainPage
and add the package to your PYTHONPATH environment variable if you
don’t already have it installed.
Clone the git repository at https://github.com/fireeye/flare-ida.
The python\stackstring.py file is the IDAPython script
that contains the plug-in logic. It can either be copied to your
%IDADIR%\python directory or placed in any directory
found in your PYTHONPATH. The
plugins\stackstrings_plugin.py file must be copied to the …
Test the installation by running the following Python commands
within IDA Pro and ensure no error messages are produced:
[Screenshot: Python commands for testing the installation]
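The snippet below is a hedged sketch of that check. It imports each module and reports every failure, rather than stopping at the first one. The module names are assumptions: `vivisect` for the Vivisect package and `stackstring` for python\stackstring.py. Run it from IDA Pro's Python command line, since the plug-in module presumably depends on IDA's own modules. It avoids f-strings so that it runs under both Python 2 and Python 3.

```python
import importlib

# Assumed module names: the Vivisect package and python\stackstring.py from flare-ida.
MODULES = ["vivisect", "stackstring"]


def check_imports(names):
    """Try to import each module; map name -> None on success or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # catch more than ImportError: module code may fail too
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    for name, error in check_imports(MODULES).items():
        print("%s: %s" % (name, "OK" if error is None else error))
```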
To run the plug-in in IDA Pro, go to Edit – Plugins – StackStrings or …
The compiler may aggressively optimize memory and register usage
when constructing strings. The worst-case scenario for recovering
these strings occurs when a memory buffer is reused multiple times
within a function and string construction spans multiple basic
blocks. Figure 10 shows the construction of
“Hello there\n”. The plug-in attempts
to deal with this by asking whether you want to
use the basic-block aggregator or the function aggregator. Often the
basic-block level of memory aggregation is fine, but in this situation
running the plug-in both ways provides additional results.
Figure 10: Two strings, one buffer, multiple basic blocks
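A toy model makes the difference between the two modes concrete. In the sketch below, a function is reduced to a list of basic blocks. Each block holds the (offset, byte) writes it makes into one shared buffer. A string is a run of printable bytes that ends in a written NUL. The basic-block mode starts each block with an empty view of memory. The function mode carries memory across blocks and checks it after each one. The model, the helper names and the sample data are all hypothetical and only approximate the plug-in's behavior.

```python
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def terminated_strings(memory, min_len=4):
    """Printable runs at consecutive offsets that end in a written NUL byte."""
    found, run, prev = [], bytearray(), None
    for off in sorted(memory):
        value = memory[off]
        if prev is not None and off != prev + 1:
            run = bytearray()  # a gap means the open run was never terminated
        if value == 0:
            if len(run) >= min_len:
                found.append(run.decode("ascii"))
            run = bytearray()
        elif value in PRINTABLE:
            run.append(value)
        else:
            run = bytearray()
        prev = off
    return found


def basic_block_aggregate(blocks):
    """Fresh view of memory for every basic block."""
    results = []
    for block in blocks:
        results.extend(terminated_strings(dict(block)))
    return results


def function_aggregate(blocks):
    """One view of memory for the whole function, checked after every block."""
    memory, results = {}, []
    for block in blocks:
        memory.update(block)
        for s in terminated_strings(memory):
            if s not in results:
                results.append(s)
    return results


def writes(offset, text):
    return [(offset + i, b) for i, b in enumerate(text.encode("ascii"))]


# Hypothetical function: "Hello world\n" spans two blocks, then the buffer is reused.
blocks = [writes(0, "Hello"), writes(5, " world\n\x00"), writes(6, "Bye now\x00")]
print(basic_block_aggregate(blocks))  # [' world\n', 'Bye now']
print(function_aggregate(blocks))     # ['Hello world\n', 'Hello Bye now']
```

In this run, each mode reports a string that the other misses. The function mode stitches the stale “Hello ” onto the reused buffer. The basic-block mode only sees the tail of the string that spans two blocks. Taking both result sets together recovers “Hello world\n” and “Bye now”.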
You’ll likely get some false positives due to how Vivisect
initializes some data for its emulation. False positives should be
obvious when reviewing results, as seen in Figure 11.
Figure 11: False positive due to memory initialization
The plug-in aggressively checks for strings during aggregation
steps, so you’ll likely get some false positives if the compiler sets
null bytes in a stack buffer before the complete string is constructed.
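A simplified model shows where these null-byte false positives come from. In the sketch below (hypothetical names), the buffer is either zeroed before construction or filled with a placeholder value that stands in for unknown contents. After every single byte write, the sketch checks whether the buffer now starts with a NUL-terminated printable string. In a pre-zeroed buffer, every partial prefix already looks terminated. The 0xCC filler is an arbitrary stand-in and says nothing about how Vivisect initializes memory.

```python
def terminated_prefix(buf, min_len=4):
    """Printable text before the first NUL, or None if absent or too short."""
    if 0 not in buf:
        return None  # nothing terminates the text yet
    head = bytes(buf[: buf.index(0)])
    if len(head) >= min_len and all(0x20 <= b < 0x7F or b in (9, 10, 13) for b in head):
        return head.decode("ascii")
    return None


def aggressive_scan(text, prezeroed, size=16, min_len=4):
    """Write text one byte at a time, checking for a string after each write."""
    # 0xCC is an arbitrary non-NUL, non-printable stand-in for unknown bytes.
    buf = bytearray(size) if prezeroed else bytearray(b"\xcc" * size)
    reported = []
    for i, value in enumerate(text.encode("ascii")):
        buf[i] = value
        s = terminated_prefix(buf, min_len)
        if s is not None and s not in reported:
            reported.append(s)
    return reported


print(aggressive_scan("Hello world\n\x00", prezeroed=False))  # ['Hello world\n']
hits = aggressive_scan("Hello world\n\x00", prezeroed=True)
print(len(hits), hits[0], repr(hits[-1]))                   # 9 Hell 'Hello world\n'
```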
The plug-in currently loads a separate Vivisect workspace for the
same executable loaded in IDA. If you’ve manually loaded additional
memory segments within your IDB file, Vivisect won’t be aware of that
and won’t process those.
Vivisect’s analysis does not always exactly match that of IDA Pro,
and differences in the way the stack pointer is tracked between the
two programs may affect the reconstruction of stack strings.
If the malware is storing a binary string that is later decoded,
even with a simple XOR mask, this plug-in likely won’t work.
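The reason is that the bytes the malware writes are not the plaintext. In the sketch below, a string masked with a single-byte XOR key never looks printable. A simple printable-bytes heuristic therefore has nothing to find until the decoding loop has run. The key 0x80 is hypothetical. It was picked because it moves every ASCII byte out of the printable range.

```python
def looks_like_string(data: bytes, min_len: int = 4) -> bool:
    """Simple heuristic: long enough and made only of printable ASCII or whitespace."""
    return len(data) >= min_len and all(0x20 <= b < 0x7F or b in (9, 10, 13) for b in data)


def xor_bytes(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR mask (the same call encodes and decodes)."""
    return bytes(b ^ key for b in data)


masked = xor_bytes(b"Hello world\n", 0x80)         # what a stack-string scan would see
print(looks_like_string(masked))                   # False
print(looks_like_string(xor_bytes(masked, 0x80)))  # True, only after decoding
```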
The plug-in was originally written to analyze 32-bit x86 samples. It
has worked on test 64-bit samples, but it hasn’t been extensively
tested for that architecture.
StackStrings is just one of many internally developed tools we use
on the FLARE team to speed up our analysis. We hope it will help speed
up your analysis too. Stay tuned for our next post where we’ll release
another tool to improve your malware analysis workflow.