Security-X

Forum Security-X => News => Discussion démarrée par: igor51 le avril 22, 2019, 20:00:23

Titre: [FireEye]CARBANAK Week Part One: A Rare Occurrence
Posté par: igor51 le avril 22, 2019, 20:00:23
CARBANAK Week Part One: A Rare Occurrence


 


 

It is very unusual for FLARE to analyze a prolifically-used,
  privately-developed backdoor only to later have the source code and
  operator tools fall into our laps. Yet this is the extraordinary
  circumstance that sets the stage for CARBANAK Week, a four-part blog
  series that commences with this post.


 

CARBANAK is one of the most full-featured backdoors around. It was
  used to perpetrate millions of dollars in financial crimes, largely by
  the group we track as FIN7. In
  2017, Tom Bennett and Barry Vengerik published     href="https://www.fireeye.com/blog/threat-research/2017/06/behind-the-carbanak-backdoor.html">Behind
    the CARBANAK Backdoor, which was the product of a deep and broad
  analysis of CARBANAK samples and FIN7 activity across several years.
  On the heels of that publication, our colleague Nick Carr uncovered a
  pair of RAR archives containing CARBANAK source code, builders, and
  other tools (both available in VirusTotal:   href="https://www.virustotal.com/#/file/783b2eefdb90eb78cfda475073422ee86476aca65d67ff2c9cf6a6f9067ba5fa/detection">kb3r1p
  and apwmie).


 

FLARE malware analysis requests are typically limited to a few dozen
  files at most. But the CARBANAK source code was 20MB comprising 755
  files, with 39 binaries and 100,000 lines of code. Our goal was to
  find threat intelligence we missed in our previous analyses. How does
  an analyst respond to a request with such breadth and open-ended
  scope? And what did we find?


 

My friend Tom Bennett and I spoke about this briefly in our 2018
  FireEye Cyber Defense Summit talk,     href="https://www.fireeye.com/content/fireeye-summit/en_US/learn/tracks.html#technical-4">Hello,
  Carbanak! In this blog series, we will expound at length and share
  a written retrospective on the inferences drawn in our previous public
  analysis based on binary code reverse engineering. In this first part,
  I’ll discuss Russian language concerns, translated graphical user
  interfaces of CARBANAK tools, and anti-analysis tactics as seen from a
  source code perspective. We will also explain an interesting twist
  where analyzing the source code surprisingly proved to be just as
  difficult as analyzing the binary, if not more. There’s a lot here;
  buckle up!


 

File Encoding and Language Considerations


 

The objective of this analysis was to discover threat intelligence
  gaps and better protect our customers. To begin, I wanted to assemble
  a cross-reference of source code files and concepts of specific interest.


 

Reading the source code entailed two steps: displaying the files in
  the correct encoding, and learning enough Russian to be dangerous.
  Figure 1 shows CARBANAK source code in a text editor that is unaware
  of the correct encoding.


 


 
 
 Figure 1: File without proper decoding


 

Two good file encoding guesses are UTF-8 and code page 1251
  (Cyrillic). The files were mostly code page 1251 as shown in Figure 2.


 


 
 
 Figure 2: Code Page 1251 (Cyrillic)
    source code


 

Figure 2 is a C++ header file defining error values involved in
  backdoor command execution. Most identifiers were in English, but some
  were not particularly descriptive. Ergo, the second and more difficult
  step was learning some Russian to benefit from the context offered by
  the source code comments.


 

FLARE has fluent Russian speakers, but I took it upon myself to
  minimize my use of other analysts’ time. To this end, I wrote a script
  to tear through files and create a prioritized vocabulary list. The
  script, which is available in the     href="https://github.com/fireeye/vocab_scraper">FireEye
    vocab_scraper GitHub repository, walks source directories
  finding all character sequences outside the printable lower ASCII
  range: decimal values 32 (the space character) through 126 (the tilde
  character “~”) inclusive. The script adds each word to a Python   class="code">defaultdict_ and increments its count. Finally,
  the script orders this dictionary by frequency of occurrence and dumps
  it to a file.


 

The result was a 3,400+ word vocabulary list, partially shown in
  Figure 3.


 


 
 
 Figure 3: Top 19 Cyrillic character
    sequences from the CARBANAK source code


 

I spent several hours on Russian language learning websites to study
  the pronunciation of Cyrillic characters and Russian words. Then, I
  looked up the top 600+ words and created a small dictionary. I added
  Russian language input to an analysis VM and used Microsoft’s
  on-screen keyboard (osk.exe) to navigate the
  Cyrillic keyboard layout and look up definitions.


 

One helpful effect of learning to pronounce Cyrillic characters was
  my newfound recognition of English loan words (words that are borrowed
  from English and transliterated to Cyrillic). My small vocabulary
  allowed me to read many comments without looking anything up. Table 1
  shows a short sampling of some of the English loan words I encountered.


 
   
     
   
              valign="top">

224


   
              valign="top">

145


   
              valign="top">

52


   
              valign="top">

110+


   
              valign="top">

130


   
              valign="top">

116


   
              valign="top">

70


   
              valign="top">

130ish


          Cyrillic


          English Phonetic


          English


          Occurrences


          Rank

Файл

f ah y
        L

file

5

сервер

s e r v e
        r

server

13

адрес

a d r e
        s

address

134

команд

k o m a n
        d

command

27

бота

b o t a


     

bot

32

плагин

p l ah g
          ee n

plugin

39

сервис

s e r v
          ee s

service

46

процесс

p r o ts
          e s s

process

63


     

 


  Table 1: Sampling of English loan words in the
    CARBANAK source code


 

Aside from source code comments, understanding how to read and type
  in Cyrillic came in handy for translating the CARBANAK graphical user
  interfaces I found in the source code dump. Figure 4 shows a Command
  and Control (C2) user interface for CARBANAK that I translated.


 


 
 
 Figure 4: Translated C2 graphical user interface


 

These user interfaces included video management and playback
  applications as shown in Figure 5 and Figure 6 respectively. Tom will
  share some interesting work he did with these in a subsequent part of
  this blog series.


 


 
 
 Figure 5: Translated video management
    application user interface


 


 
 
 Figure 6: Translated video playback
    application user interface


 

Figure 7 shows the backdoor builder that was contained within the
  RAR archive of operator tools.


 


 
 
 Figure 7: Translated backdoor builder
    application user interface


 

The operator RAR archive also contained an operator’s manual
  explaining the semantics of all the backdoor commands. Figure 8 shows
  the first few commands in this manual, both in Russian and English (translated).


 


 
 
 Figure 8: Operator manual (left: original
    Russian; right: translated to English)


 

Down the Rabbit Hole: When Having Source Code Does Not Help


 

In simpler backdoors, a single function evaluates the command ID
  received from the C2 server and dispatches control to the correct
  function to carry out the command. For example, a backdoor might ask
  its C2 server for a command and receive a response bearing the command
  ID 0x67. The dispatch function in the
  backdoor will check the command ID against several different values,
  including 0x67, which as an example might
  call a function to shovel a reverse shell to the C2 server. Figure 9
  shows a control flow graph of such a function as viewed in IDA Pro.
  Each block of code checks against a command ID and either passes
  control to the appropriate command handling code, or moves on to check
  for the next command ID.


 


 
 
 Figure 9: A control flow graph of a
    simple command handling function


 

In this regard, CARBANAK is an entirely different beast. It utilizes
  a Windows mechanism called     href="https://docs.microsoft.com/en-us/windows/desktop/ipc/named-pipes">named
  pipes as a means of communication and coordination across all the
  threads, processes, and plugins under the backdoor’s control. When the
  CARBANAK tasking component receives a command, it forwards the command
  over a named pipe where it travels through several different functions
  that process the message, possibly writing it to one or more
  additional named pipes, until it arrives at its destination where the
  specified command is finally handled. Command handlers may even
  specify their own named pipe to request more data from the C2 server.
  When the C2 server returns the data, CARBANAK writes the result to
  this auxiliary named pipe and a callback function is triggered to
  handle the response data asynchronously. CARBANAK’s named pipe-based
  tasking component is flexible enough to control both inherent command
  handlers and plugins. It also allows for the possibility of a local
  client to dispatch commands to CARBANAK without the use of a network.
  In fact, not only did we write such a client to aid in analysis and
  testing, but such a client, named   class="code">botcmd.exe, was also present in the source dump.


 

Tom’s Perspective


 

Analyzing this command-handling mechanism within CARBANAK from a
  binary perspective was certainly challenging. It required maintaining
  tabs for many different views into the disassembly, and a sort of
  textual map of command ids and named pipe names to describe the
  journey of an inbound command through the various pipes and functions
  before arriving at its destination. Figure 10 shows the control flow
  graphs for seven of the named pipe message handling functions. While
  it was difficult to analyze this from a binary reverse engineering
  perspective, having compiled code combined with the features that a
  good disassembler such as IDA Pro provides made it less harrowing than
  Mike’s experience. The binary perspective saved me from having to
  search across several source files and deal with ambiguous function
  names. The disassembler features allowed me to easily follow
  cross-references for functions and global variables and to open
  multiple, related views into the code.


 


 
 
 Figure 10: Control flow graphs for the
    named pipe message handling functions


 

Mike’s Perspective


 

Having source code sounds like cheat-mode for malware analysis.
  Indeed, source code contains much information that is lost through the
  compilation and linking process. Even so, CARBANAK’s tasking component
  (for handling commands sent by the C2 server) serves as a
  counter-example. Depending on the C2 protocol used and the command
  being processed, control flow may take divergent paths through
  different functions only to converge again later and accomplish the
  same command. Analysis required bouncing around between almost 20
  functions in 5 files, often backtracking to recover information about
  function pointers and parameters that were passed in from as many as
  18 layers back. Analysis also entailed resolving matters of C++ class
  inheritance, scope ambiguity, overloaded functions, and control flow
  termination upon named pipe usage. The overall effect was that this
  was difficult to analyze, even in source code.


 

I only embarked on this top-to-bottom journey once, to search for
  any surprises. The effort gave me an appreciation for the baroque
  machinery the authors constructed either for the sake of obfuscation
  or flexibility. I felt like this was done at least in part to obscure
  relationships and hinder timely analysis.


 

Anti-Analysis Mechanisms in Source Code


 

CARBANAK’s executable code is filled with logic that pushes
  hexadecimal numbers to the same function, followed by an indirect call
  against the returned value. This is easily recognizable as obfuscated
  function import resolution, wherein CARBANAK uses a simple string hash
  known as PJW (named after its author, P.J. Weinberger) to locate
  Windows API functions without disclosing their names. A Python
  implementation of the PJW hash is shown in Figure 11 for reference.


 
   
     


          def pjw_hash(s):
      ctr = 0

                for i in range(len(s)):
          ctr = 0xffffffff
            & ((ctr << 4) + ord(s[i]))
          if ctr
            & 0xf0000000:
              ctr = (((ctr &
            0xf0000000) >> 24) ^ ctr) & 0x0fffffff


       

 


              return ctr


 


  Figure 11: PJW hash


 

This is used several hundred times in CARBANAK samples and impedes
  understanding of the malware’s functionality. Fortunately, reversers
  can use the     href="https://www.fireeye.com/blog/threat-research/2012/11/precalculated-string-hashes-reverse-engineering-shellcode.html">flare-ida
  scripts to annotate the obfuscated imports, as shown in Figure 12.


 


 
 
 Figure 12: Obfuscated import resolution
    annotated with FLARE's shellcode hash search


 

The CARBANAK authors achieved this obfuscated import resolution
  throughout their backdoor with relative ease using C preprocessor
  macros and a pre-compilation source code scanning step to calculate
  function hashes. Figure 13 shows the definition of the relevant   class="code">API macro and associated machinery.


 


 
 
 Figure 13: API macro for import resolution


 

The API macro allows the author to type
    API(SHLWAPI, PathFindFileNameA)(…) and
  have it replaced with GetApiAddrFunc(SHLWAPI,
  hashPathFindFileNameA)(…)
. SHLWAPI is
  a symbolic macro defined to be the constant 3, and   class="code">hashPathFindFileNameA is the string hash value
    0xE3685D1 as observed in the disassembly.
  But how was the hash defined?


 

The CARBANAK source code has a utility (unimaginatively named   class="code">tool) that scans source code for invocations of
  the API macro to build a header file
  defining string hashes for all the Windows API function names
  encountered in the entire codebase. Figure 14 shows the source code
  for this utility along with its output file, api_funcs_hash.h.


 


 
 
 Figure 14: Source code and output from
    string hash utility


 

When I reverse engineer obfuscated malware, I can’t help but try to
  theorize about how authors implement their obfuscations. The CARBANAK
  source code gives another data point into how malware authors wield
  the powerful C preprocessor along with custom code scanning and code
  generation tools to obfuscate without imposing an undue burden on
  developers. This might provide future perspective in terms of what to
  expect from malware authors in the future and may help identify units
  of potential code reuse in future projects as well as rate their
  significance. It would be trivial to apply this to new projects, but
  with the source code being on VirusTotal, this level of code sharing
  may not represent shared authorship. Also, the source code is
  accessibly instructive in why malware would push an integer as well as
  a hash to resolve functions: because the integer is an index into an
  array of module handles that are opened in advance and associated with
  these pre-defined integers.


 

Conclusion


 

The CARBANAK source code is illustrative of how these malware
  authors addressed some of the practical concerns of obfuscation. Both
  the tasking code and the Windows API resolution system represent
  significant investments in throwing malware analysts off the scent of
  this backdoor. Tune into tomorrow’s installment of this series for a
  round-up of antivirus evasions, exploits, secrets, key material,
  authorship artifacts, and network-based indicators.


Source: CARBANAK Week Part One: A Rare Occurrence (http://)