Security-X

Security-X Forum => News => Thread started by: igor51 on February 09, 2018, 21:00:20

Title: [FireEye] FLARE IDA Pro Script Series: Automatic Recovery of Constructed Strings in Malware
Posted by: igor51 on February 09, 2018, 21:00:20
FLARE IDA Pro Script Series: Automatic Recovery of Constructed Strings
in Malware


The FireEye Labs Advanced Reverse Engineering (FLARE) Team is
  dedicated to sharing knowledge and tools with the community. We
  started with the release of the FLARE On Challenge in early July
  where thousands of reverse engineers and security enthusiasts
  participated. Stay tuned for a write-up of the challenge solutions
  in an upcoming blog post.


 

This post is the start of a series where we look to aid other
  malware analysts in the field. Since IDA Pro is the most popular tool
  used by malware analysts, we’ll focus on releasing scripts and
  plug-ins to help make it an even more effective tool for fighting
  evil. In the past, at Mandiant we released scripts on GitHub and we’ll
  continue to do so at the following new location: https://github.com/fireeye/flare-ida.
  This is where you will also find the plug-ins we released in the past:
  Shellcode Hashes and Struct Typer. We hope you find all these scripts
  as useful as we do.


 

Quick Challenge


 

Let’s start with a simple challenge. What two strings are printed
  when executing the disassembly shown in Figure 1?


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/figure1.png]


 


  Figure 1: Disassembly challenge


 

If you answered “Hello world\n” and “Hello there\n”, good job! If
  you didn’t see it then Figure 2 makes this more obvious. The bytes
  that make up the strings have been converted to characters and the
  local variables are converted to arrays to show buffer offsets.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-2-Disassembly-challenge-with-markup.png]


 


  Figure 2: Disassembly challenge with markup


 

Reverse engineers are likely more accustomed to strings that are a
  consecutive sequence of human-readable characters in the file, as
  shown in Figure 3. IDA generally does a good job of cross-referencing
  these strings in code as can be seen in Figure 4.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-3-A-simple-string.png]


 


  Figure 3: A simple string


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-4-Using-a-simple-string.png]


 


  Figure 4: Using a simple string


 

Manually constructed strings like in Figure 1 are often seen in
  malware. The bytes that make up the strings are stored within the
  actual instructions rather than a traditional consecutive sequence of
  bytes. Simple static analysis with tools such as strings cannot detect
  these strings. The code in Figure 5, used to create the challenge
  disassembly, shows how easy it is for a malware author to use this technique.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Screen-Shot-2014-08-01-at-1.04.59-PM.png]


 


  Figure 5: Challenge source code


 

Automating the recovery of these strings during malware analysis is
  simple if the compiler follows a basic pattern. A quick examination of
  the disassembly in Figure 1 could lead you to write a script that
  searches for mov instructions that begin with the opcode bytes
  C6 45 and then extracts the stack offset and character
  bytes. Modern compilers with optimizations enabled often complicate
  matters, as they may load frequently used characters into registers
  (Figure 6) or reuse one buffer for several strings (Figure 10).


 
 

Figure 6 shows the disassembly of the same source code that was
  compiled with optimizations enabled. This caused the compiler to load
  some of the frequently occurring characters in registers to reduce the
  size of the resulting assembly. Extra instructions are required to
  load the registers with a value, like the 2-byte mov instruction at
  0040115A, but using these registers requires only a
  4-byte mov instruction, like the one at 0040117D. The
  mov instructions that contain hard-coded byte values are
  5 bytes long, such as the one at 0040118F.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-6-Compiler-optimizations.png]


 


  Figure 6: Compiler optimizations
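
To make the simple case concrete, here is a minimal Python sketch of the naive approach mentioned earlier: scanning raw code bytes for C6 45 and pulling out the stack offset and character byte. It is an illustration only and not part of StackStrings. The function names are made up, and the sketch assumes the 4-byte encoding C6 45 <disp8> <imm8>, which is a byte store relative to ebp. Stores that take their value from a register begin with a different opcode byte and are not matched, which is why this approach breaks down on optimized code.

```python
def scan_stack_moves(code):
    """Yield (stack_offset, byte_value) for every C6 45 <disp8> <imm8> match.

    Assumption: the bytes are x86 code that uses ebp-relative byte stores.
    A real script would walk decoded instructions rather than raw bytes,
    since this byte pattern can also occur inside other instructions or data.
    """
    i = 0
    while i + 4 <= len(code):
        if code[i] == 0xC6 and code[i + 1] == 0x45:
            disp = code[i + 2]
            if disp >= 0x80:          # disp8 is a signed byte
                disp -= 0x100
            yield disp, code[i + 3]
            i += 4
        else:
            i += 1


def naive_stack_bytes(code):
    """Return the stored bytes ordered by stack offset (last store wins)."""
    slots = dict(scan_stack_moves(code))
    return bytes(slots[offset] for offset in sorted(slots))


def encode_stores(text, base):
    """Build sample input: one C6 45 store per byte of `text` at ebp+base."""
    out = bytearray()
    for i, value in enumerate(text):
        out += bytes([0xC6, 0x45, (base + i) & 0xFF, value])
    return bytes(out)


if __name__ == "__main__":
    sample = encode_stores(b"Hello world\n\x00", -0x10)
    print(naive_stack_bytes(sample))
```

Because the result is a plain dictionary of offsets, a buffer that is written twice keeps only the last value, so this sketch also cannot recover two strings that share one buffer.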


 

The StackStrings IDA Pro Plug-in


 

To help you defeat malware that contains these manually constructed
  strings, we’re releasing an IDA Pro plug-in named StackStrings that is
  available at https://github.com/fireeye/flare-ida.
  The plug-in relies heavily on analysis by a Python library called
  Vivisect (http://visi.kenshoto.com/viki/MainPage). Vivisect
  is a binary analysis framework frequently used to augment our
  analysis. StackStrings uses Vivisect’s analysis and emulation
  capabilities to track simple memory usage by the malware. The plug-in
  identifies memory writes to consecutive memory addresses of likely
  string data and then prints the strings and locations, and creates
  comments where the string is constructed. Figure 7 shows the result of
  running the plug-in on the above program.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-7-StackStrings-plug-in-results.png]


 


  Figure 7: StackStrings plug-in results
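
The idea of grouping writes to consecutive addresses can be pictured with a short Python sketch. This is not the plug-in's code and it does not call Vivisect. It assumes a hypothetical list of (address, byte) write events, such as an emulator might log, and the names aggregate_writes and min_len are invented for the example.

```python
def aggregate_writes(writes, min_len=4):
    """Group single-byte memory writes into printable ASCII strings.

    `writes` is an iterable of (address, byte_value) pairs in execution
    order. Returns a list of (start_address, text) tuples sorted by address.
    """
    memory = {}
    for address, value in writes:
        memory[address] = value          # last write to an address wins

    strings = []
    start, chars = None, []
    for address in sorted(memory):
        value = memory[address]
        contiguous = start is not None and address == start + len(chars)
        printable = 0x20 <= value < 0x7F or value in (0x09, 0x0A, 0x0D)
        # Close the current run on a gap or a non-printable byte.
        if chars and not (contiguous and printable):
            if len(chars) >= min_len:
                strings.append((start, "".join(chars)))
            start, chars = None, []
        if printable:
            if not chars:
                start = address
            chars.append(chr(value))
    if len(chars) >= min_len:
        strings.append((start, "".join(chars)))
    return strings


if __name__ == "__main__":
    # Writes arrive out of order; the addresses are arbitrary examples.
    events = [(0x12FF00 + i, b) for i, b in enumerate(b"Hello world\n\x00")]
    for start, text in aggregate_writes(reversed(events)):
        print(hex(start), repr(text))
```

Working from addresses rather than instruction patterns means the order of the writes and the instruction forms used to make them do not matter, only the final contents of memory.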


 

While the plug-in is called StackStrings, its analysis is not
  limited to the stack. It also tracks all memory segments accessed
  during Vivisect’s analysis, so manually constructed strings in global
  data are identified as well, as shown in Figure 8.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-8-Sample-global-string.png]


 


  Figure 8: Sample global string


 

Simple, manually constructed WCHAR strings are also identified by
  the plug-in as shown in Figure 9.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-9-Sample-WCHAR-data.png]


 


  Figure 9: Sample WCHAR data
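
A simple WCHAR string built from ASCII-range characters has a recognizable layout in memory: each character byte is followed by a zero byte. The Python sketch below shows one way such a buffer could be recognized. It is an assumption about how a check like this might look, not the plug-in's actual logic, and the name decode_wchar is invented.

```python
def decode_wchar(buf, min_len=4):
    """Return the leading ASCII-range UTF-16LE string in `buf`, or None.

    Each character must be a printable ASCII byte followed by a zero byte.
    Decoding stops at the first 16-bit unit that does not fit that shape,
    which includes the two-byte null terminator.
    """
    chars = []
    for i in range(0, len(buf) - 1, 2):
        low, high = buf[i], buf[i + 1]
        if high != 0 or not (0x20 <= low < 0x7F):
            break
        chars.append(chr(low))
    return "".join(chars) if len(chars) >= min_len else None


if __name__ == "__main__":
    wide = "Hello world".encode("utf-16-le") + b"\x00\x00"
    print(decode_wchar(wide))
    print(decode_wchar(b"Hello world\x00"))
```

The second call returns None because a narrow string has no zero byte after its first character.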


 

Installation


 

Download Vivisect from http://visi.kenshoto.com/viki/MainPage
  and add the package to your PYTHONPATH environment variable if you
  don’t already have it installed.


 

Clone the git repository at https://github.com/fireeye/flare-ida.
  The python\stackstring.py file is the IDA Python script
  that contains the plug-in logic. This can either be copied to your
  %IDADIR%\python directory, or it can be in any directory
  found in your PYTHONPATH. The
  plugins\stackstrings_plugin.py file must be copied to the
  %IDADIR%\plugins directory.


 

Test the installation by running the following Python commands
  within IDA Pro and ensure no error messages are produced:


 

[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Screen-Shot-2014-08-01-at-1.06.24-PM.png]

 

To run the plug-in in IDA Pro, go to Edit – Plugins – StackStrings or
  press Alt+0.


 

Known Limitations


 

The compiler may aggressively optimize memory and register usage
  when constructing strings. The worst-case scenario for recovering
  these strings occurs when a memory buffer is reused multiple times
  within a function and string construction spans multiple basic
  blocks. Figure 10 shows the construction of “Hello world\n” and
  “Hello there\n”. The plug-in attempts to deal with this by asking
  whether you want to use the basic-block aggregator or the function
  aggregator. Often the basic-block level of memory aggregation is
  fine, but in this situation running the plug-in both ways provides
  additional results.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-10-Two-strings-one-buffer-multiple-basic-blocks.png]


 


  Figure 10: Two strings, one buffer, multiple basic blocks
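
The Python sketch below illustrates why the two aggregation levels can give different answers when a buffer is reused. It is a simplified model built on assumptions, not the plug-in's implementation: the write events, block labels, and function names are all hypothetical, and the scenario (one block writes “Hello ”, two later blocks each write a different ending into the same bytes) is invented to resemble the situation described above.

```python
from collections import defaultdict


def strings_in(memory, base, size, min_len=4):
    """Read `size` bytes at `base` from an {address: byte} map and return
    the null-separated runs that are at least `min_len` bytes long.
    Bytes that were never written are treated as zero."""
    buf = bytes(memory.get(base + i, 0) for i in range(size))
    return [part.decode("ascii", "replace")
            for part in buf.split(b"\x00") if len(part) >= min_len]


def aggregate_by_block(events, base, size):
    """Collect writes separately for each basic block."""
    per_block = defaultdict(dict)
    for block, address, value in events:
        per_block[block][address] = value
    return {block: strings_in(mem, base, size)
            for block, mem in per_block.items()}


def aggregate_by_function(events, base, size):
    """Collect all writes in one map; a later write replaces an earlier one."""
    memory = {}
    for _, address, value in events:
        memory[address] = value
    return strings_in(memory, base, size)


def writes(block, base, offset, data):
    """Helper that builds (block, address, value) events for `data`."""
    return [(block, base + offset + i, b) for i, b in enumerate(data)]


if __name__ == "__main__":
    BASE = 0x1000  # arbitrary example address
    events = (writes("A", BASE, 0, b"Hello ")
              + writes("B", BASE, 6, b"world\n\x00")
              + writes("C", BASE, 6, b"there\n\x00"))
    print(aggregate_by_block(events, BASE, 16))
    print(aggregate_by_function(events, BASE, 16))
```

In this model the per-block view sees each fragment, including the ending that is later overwritten, while the function-wide view sees only one complete string. Neither view alone recovers everything, which is consistent with the advice to run the plug-in both ways.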


 

You’ll likely get some false positives due to how Vivisect
  initializes some data for its emulation. False positives should be
  obvious when reviewing results, as seen in Figure 11.


 


[Image: https://www.fireeye.com/content/dam/legacy/blog/2014/08/Figure-11-False-positive-due-to-memory-initialization.png]


 


  Figure 11: False positive due to memory initialization


 

The plug-in aggressively checks for strings during aggregation
  steps, so you’ll likely get some false positives if the compiler sets
  null bytes in a stack buffer before the complete string is constructed.


 

The plug-in currently loads a separate Vivisect workspace for the
  same executable loaded in IDA. If you’ve manually loaded additional
  memory segments within your IDB file, Vivisect won’t be aware of that
  and won’t process those.


 

Vivisect’s analysis does not always exactly match that of IDA Pro,
  and differences in the way the stack pointer is tracked between the
  two programs may affect the reconstruction of stack strings.


 

If the malware is storing a binary string that is later decoded,
  even with a simple XOR mask, this plug-in likely won’t work.
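
A small Python sketch shows why even a trivial mask defeats this kind of detection. It assumes a printable-ASCII heuristic similar in spirit to what a string finder might apply; the function names and the key value 0xA5 are arbitrary choices for the example, not taken from the plug-in.

```python
def looks_like_text(data, min_len=4):
    """Printable-ASCII heuristic of the kind a string finder might apply."""
    return len(data) >= min_len and all(
        0x20 <= b < 0x7F or b in (0x09, 0x0A, 0x0D) for b in data
    )


def xor_mask(data, key):
    """Apply a single-byte XOR mask; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    plain = b"Hello world\n"
    masked = xor_mask(plain, 0xA5)      # 0xA5 is an arbitrary example key
    print(looks_like_text(plain), looks_like_text(masked))
    print(xor_mask(masked, 0xA5) == plain)
```

The bytes written to the buffer are the masked ones, so nothing resembling text exists in memory until the malware's own decoding routine has run.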


 

The plug-in was originally written to analyze 32-bit x86 samples. It
  has worked on test 64-bit samples, but it hasn’t been extensively
  tested for that architecture.


 

Conclusion


 

StackStrings is just one of many internally developed tools we use
  on the FLARE team to speed up our analysis. We hope it will help speed
  up your analysis too. Stay tuned for our next post where we’ll release
  another tool to improve your malware analysis workflow.


Source: FLARE IDA Pro Script Series: Automatic Recovery of Constructed Strings
in Malware (http://)