Scrutinizing the Scrutinizer

While conducting an assessment for a client earlier this year, we encountered the Plixer Scrutinizer application in use on the internal network. Having never seen this particular application before, we did a quick search, which provided the following description:

Plixer Scrutinizer is a network monitoring and analysis appliance that collects, interprets, and contextualizes data from every digital exchange and transaction to deliver insightful network intelligence and security reports.

The product documentation also provided deployment guides for multiple virtual machine platforms, including KVM with a link to download an image (https://docs.plixer.com/projects/plixer-scrutinizer-docs/en/latest/deployment_guides/deploy_virtual/virtual_kvm.html).

Extracting the file system from the KVM QCOW2 disk image can be done a few ways. I chose to use the nbd kernel module along with qemu-nbd from qemu-utils; the general process is as follows:

# apt-get install qemu-utils
# modprobe nbd max_part=16
# qemu-nbd -c /dev/nbd0 /path/to/image.qcow2

With the new device set up, the partition table can be dumped to identify the disk layout:

# fdisk -l /dev/nbd0
Disk /dev/nbd0: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000a89ae

Device      Boot   Start       End   Sectors Size Id Type
/dev/nbd0p1 *       2048   2099199   2097152   1G 83 Linux
/dev/nbd0p2      2099200 209715199 207616000  99G 8e Linux LVM

The disk image contains two partitions: the first is for system boot and contains the bootloader, kernel, and initial file system, while the second contains the system's root file system. The second partition's type is Linux LVM, meaning it cannot be mounted directly and requires the LVM utilities to access. The first step is to activate the LVM target using the pvscan command:

# pvscan --cache /dev/nbd0p2
  pvscan[1340564] PV /dev/nbd0p2 online.

With the LVM partition activated, the physical volume details can be displayed using pvdisplay:

# pvdisplay /dev/nbd0p2
  --- Physical volume ---
  PV Name               /dev/nbd0p2
  VG Name               vg_scrut
  PV Size               <99.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25343
  Free PE               0
  Allocated PE          25343
  PV UUID               qgr177-hDNb-efLX-Y8AB-lPuE-jUvU-ejn2t0

The output shows that the Volume Group (VG) is vg_scrut; lvdisplay can then be used to list the logical volumes within the VG:

# lvdisplay /dev/vg_scrut
  --- Logical volume ---
  LV Path                /dev/vg_scrut/lv_swap
  LV Name                lv_swap
  VG Name                vg_scrut
  LV UUID                glfyh1-2iiy-K2Ki-h6ii-exyR-Lqda-0qETJy
  LV Write Access        read/write
  LV Creation host, time localhost, 2022-03-16 17:53:56 +0000
  LV Status              available
  # open                 0
  LV Size                4.00 GiB
  Current LE             1024
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/vg_scrut/lv_root
  LV Name                lv_root
  VG Name                vg_scrut
  LV UUID                uatqDs-i3wS-yHVw-4qe1-hLuD-vfwR-nIBkMe
  LV Write Access        read/write
  LV Creation host, time localhost, 2022-03-16 17:53:56 +0000
  LV Status              available
  # open                 0
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2

  --- Logical volume ---
  LV Path                /dev/vg_scrut/lv_db
  LV Name                lv_db
  VG Name                vg_scrut
  LV UUID                ArDzWb-ncPf-1mgJ-TD1u-2Dg1-NKEh-zI42kS
  LV Write Access        read/write
  LV Creation host, time localhost, 2022-03-16 17:53:57 +0000
  LV Status              available
  # open                 0
  LV Size                <75.00 GiB
  Current LE             19199
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3

In this case we are looking for the root file system, which is contained within lv_root. This volume can be mounted directly using the LV Path value:

# mount  /dev/vg_scrut/lv_root tmp
# ll tmp
total 88
dr-xr-xr-x. 19 root  root   4096 Apr 21  2022 ./
drwxrwxr-x   3 chris chris  4096 Oct 19 18:18 ../
lrwxrwxrwx.  1 root  root      7 Mar 16  2022 bin -> usr/bin/
drwxr-xr-x.  2 root  root   4096 Mar 16  2022 boot/
drwxr-xr-x.  2 root  root   4096 Mar 16  2022 dev/
drwxr-xr-x. 85 root  root   4096 Apr 21  2022 etc/
drwxr-xr-x.  5 root  root   4096 Apr 21  2022 home/
lrwxrwxrwx.  1 root  root      7 Mar 16  2022 lib -> usr/lib/
lrwxrwxrwx.  1 root  root      9 Mar 16  2022 lib64 -> usr/lib64/
drwx------.  2 root  root  16384 Mar 16  2022 lost+found/
drwxr-xr-x.  2 root  root   4096 Apr 11  2018 media/
drwxr-xr-x.  2 root  root   4096 Apr 11  2018 mnt/
drwxr-xr-x.  4 root  root   4096 Apr 21  2022 opt/
drwxr-xr-x.  2 chris chris  4096 Apr 21  2022 plxr_spool/
drwxr-xr-x.  2 root  root   4096 Mar 16  2022 proc/
dr-xr-x---.  4 root  root   4096 Apr 21  2022 root/
drwxr-xr-x.  2 root  root   4096 Mar 16  2022 run/
lrwxrwxrwx.  1 root  root      8 Mar 16  2022 sbin -> usr/sbin/
drwxr-xr-x.  2 root  root   4096 Apr 11  2018 srv/
drwxr-xr-x.  2 root  root   4096 Mar 16  2022 sys/
drwxrwxrwt.  7 root  root   4096 Apr 21  2022 tmp/
drwxr-xr-x. 14 root  root   4096 Apr 21  2022 usr/
drwxr-xr-x. 20 root  root   4096 Apr 21  2022 var/

With the root file system mounted, it is now possible to inspect the application content in hopes of identifying vulnerabilities that can be used against the target within the client environment. Initial inspection of the system showed that the application uses Apache with FastCGI, as seen in the configuration file /home/scrutinizer/files/conf/httpd-plixer.conf:

# This will hold all the configurations for apache that Plixer makes.
# We will no longer be editing the default httpd.conf file.
...
## FASTCGI SETUP ##
ErrorLogFormat "[%t] [%l] %F: %E: %M"
FcgidIOTimeout 600
FcgidBusyTimeout 600
FcgidMaxProcesses 100
FcgidIdleTimeout 1800
FcgidProcessLifeTime 1800
FcgidMaxRequestLen 52428800
FcgidMinProcessesPerClass 5
FcgidMaxProcessesPerClass 100
FcgidInitialEnv PGDATABASE plixer
FcgidInitialEnv PGHOST localhost
FcgidInitialEnv PGUSER plixer
FcgidInitialEnv PGSSLKEY timber_badger:/usr/share/httpd/.postgresql/postgresql.key
AddType application/x-httpd-fcgi .fcgi
...
...
Alias /fcgi "/home/plixer/scrutinizer/html/fcgi"
<Directory "/home/plixer/scrutinizer/html/fcgi">
      RewriteEngine Off
      Options +ExecCGI
      AllowOverride None
      Order allow,deny
      Allow from all
</Directory>

Within the directory specified in the Apache configuration file, a single 12 MB file was found (scrut_fcgi.fcgi). The file contents can be seen in the following excerpt:

#!/opt/perl-5.34.0/bin/perl
#line 2 "/opt/perl/bin/par.pl"
eval 'exec /usr/bin/perl  -S $0 ${1+"$@"}'
    if 0; # not running under some shell

package __par_pl;

# --- This script must not use any modules at compile time ---
# use strict;
...
...
CORE::exit($1) if ($::__ERROR =~/^_TK_EXIT_\((\d+)\)/);
die $::__ERROR if $::__ERROR;

1;

#line 1006

 __END__
PK<BINARY CONTENT>

This application is written in Perl using the Perl Archive Toolkit (PAR) (https://metacpan.org/pod/PAR) as well as the PAR Crypto filter (https://metacpan.org/pod/PAR::Filter::Crypto).

In practice, this file uses Perl to extract the zip archive appended at the bottom of the file, unpacking it to a directory under /tmp/. For instance, the application is extracted to /tmp/par-726f6f74 in the following example:

$ ll /tmp/par-726f6f74/cache-0f9488d5891e440457464a09412b8fd4a393c4a3
total 24
drwxr-xr-x 3 root root 4096 Oct 27 21:03 ./
drwxr-xr-x 3 root root 4096 Oct 27 20:57 ../
-rw-r--r-- 1 root root  178 Oct 26 21:03 _CANARY_.txt
-rw-r--r-- 1 root root 3322 Oct 27 21:03 d4787e12.pl
-rw-r--r-- 1 root root  657 Oct 27 21:03 e52e8794.pl
drwxr-xr-x 4 root root 4096 Oct 27 21:03 inc/
-rw-r--r-- 1 root root    0 Oct 27 21:03 inc.lock

The actual application contents are encrypted using the Filter::Crypto::Decrypt module:

package main;
#line 1 "script/scrut_fcgi.pl"
use Filter::Crypto::Decrypt;
460aecfc30146bb6acd3f326e386638f66ba2f653bc6b.......

The module responsible for decrypting the application ships within the archive and can be found inside the inc directory:

$ ll /tmp/par-726f6f74/cache-0f9488d5891e440457464a09412b8fd4a393c4a3/inc/lib/auto/Filter/Crypto/Decrypt/
total 28
-r-xr-xr-x 1 root root 24728 May  9 18:09 Decrypt.so

While the source of the Perl module for the Crypto filter is available, I decided to take the approach of analyzing the extracted binary statically, as we often encounter instances where we are forced to analyze binary content that applies encryption and/or obfuscation (practice makes progress).

Within the shared object, the function FilterCrypto_FilterDecrypt handles decryption by passing the hardcoded key filter_crypto_pswd, along with a known 'random' salt value, into PKCS5_PBKDF2_HMAC_SHA1 to recreate a known, unique password for each call:

EVP_CIPHER_CTX_init(ctx_1);
    if ( EVP_CipherInit_ex(ctx_1, aes_256_cbc, 0LL, 0LL, 0LL, enc) )
    {
      if ( EVP_CIPHER_CTX_set_key_length(ctx_1, 32LL) )
      {
        if ( PKCS5_PBKDF2_HMAC_SHA1(&filter_crypto_pswd, 32LL, in_pass, in_salt, 2048LL, 32LL) == 1 )
        {
          out_buf = 0LL;
          if ( EVP_CipherInit_ex(ctx_1, 0LL, 0LL, hmac_key, iv, enc) )

The hardcoded key material filter_crypto_pswd is stored within the library at offset 0x3A20:

.rodata:0000000000003A20 filter_crypto_pswd db 4Bh, 44h, 0B4h, 75h, 7Eh, 0EEh, 9, 1Dh, 0E6h, 72h, 0FDh; 0
.rodata:0000000000003A20                                         ; DATA XREF: FilterCrypto_FilterDecrypt+6B2↑o
.rodata:0000000000003A2B                 db 85h, 0EAh, 73h, 0B9h, 19h, 7Fh, 0F9h, 84h, 2Ah, 9Eh; 0Bh
.rodata:0000000000003A35                 db 0B3h, 5Ch, 0BBh, 38h, 80h, 9Eh, 49h, 0E7h, 13h, 0E2h; 15h
.rodata:0000000000003A3F                 db 4Eh                  ; 1Fh
.rodata:0000000000003A40 rng_seed        dq 405FC00000000000h    ; DATA XREF: FilterCrypto_PRNGInit+A0↑r
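
For reference, this derivation can be reproduced outside the module with OpenSSL. The following is a minimal sketch, assuming the 32 password bytes have been copied out of .rodata at 0x3A20 and the salt and iteration count (2048) match the decompiled call above; the salt shown here is a placeholder, not the real value:

#include <stdio.h>
#include <openssl/evp.h>

// gcc derive_key.c -o derive_key -lcrypto
// Placeholder values: fill in the real bytes from .rodata:0x3A20 and the salt
// observed at runtime before using this.
static const unsigned char filter_crypto_pswd[32] = { 0x4B, 0x44, 0xB4, 0x75 /* ... */ };
static const unsigned char in_salt[8] = { 0 };

int main(void) {
    unsigned char key[32];

    // same parameters as the decompiled call: 2048 iterations, 32-byte key
    if (PKCS5_PBKDF2_HMAC_SHA1((const char *)filter_crypto_pswd, sizeof(filter_crypto_pswd),
                               in_salt, sizeof(in_salt), 2048, sizeof(key), key) != 1)
        return 1;

    for (size_t i = 0; i < sizeof(key); i++)
        printf("%02X ", key[i]);
    printf("\n");
    return 0;
}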

There are a few ways to proceed to recover the protected content; the documentation page for the module explicitly calls out its shortcomings (https://metacpan.org/pod/Filter::Crypto#WARNING):

None of the above checks are infallible, however, because unless the source code decryption filter module is statically 
linked against the Perl executable then users can always replace the Perl executable being used to run the script with 
their own version, perhaps hacked in such a way as to work around the above checks, and thus with debugging/deparsing 
capabilities enabled. Such a hacked version of the Perl executable can certainly be produced since Perl is open source 
itself.

Looking at how the library works internally, the easiest solution was to hook the OpenSSL calls it imports using LD_PRELOAD. The LD_PRELOAD environment variable lets users specify additional shared libraries to be loaded before others, allowing function calls in those later-loaded libraries to be overridden with custom implementations provided in the LD_PRELOAD libraries. The following example code implements a simple shared object that will print the key material as it is used, as well as the decrypted Perl code:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <openssl/conf.h>
#include <openssl/evp.h>
#include <openssl/err.h>
#include <string.h>
#include <syslog.h>
#include <stdio.h>

// gcc evphook.c -o evphook.so -fPIC -shared -ldl -lcrypto

int key_len = 0;
void printHexString(const char* str) {
    int i;
    // Iterate over each character in the string
    for (i=0; i<key_len; i++) {
        // Print the hexadecimal representation of the character
        printf("%02X ", (unsigned char)str[i]);
    }
    printf("\n");
}

//function prototype -  int EVP_CipherUpdate(EVP_CIPHER_CTX *ctx, unsigned char *out,int *outl, const unsigned char *in, int inl);
int EVP_CipherUpdate(EVP_CIPHER_CTX *ctx, unsigned char *out,int *outl, const unsigned char *in, int inl) {
        int (*original_target)(EVP_CIPHER_CTX *ctx, unsigned char *out,int *outl, const unsigned char *in, int inl);
        int ret;

        original_target = dlsym(RTLD_NEXT, "EVP_CipherUpdate");
        ret = original_target(ctx,out,outl,in,inl);
        printf("%s",out);
        return ret;
}

//function prototype -  int EVP_CipherInit_ex(EVP_CIPHER_CTX *ctx, const EVP_CIPHER *type,ENGINE *impl, const unsigned char *key, const unsigned char *iv, int enc);
int EVP_CipherInit_ex(EVP_CIPHER_CTX *ctx, const EVP_CIPHER *type,ENGINE *impl, const unsigned char *key, const unsigned char *iv, int enc) {
    int (*original_target)(EVP_CIPHER_CTX *ctx, const EVP_CIPHER *type,ENGINE *impl, const unsigned char *key, const unsigned char *iv, int enc);
    *(void **)(&original_target) = dlsym(RTLD_NEXT, "EVP_CipherInit_ex");  
    if(key != '\x00'){
            printf("### Decrypt Init:\n#### Key: ");
            printHexString(key);
            printf("#### IV: ");
            printHexString(iv);
    }
    return((*original_target)(ctx,type,impl,key,iv,enc));
}

//function prototype -  int EVP_CIPHER_CTX_set_key_length(EVP_CIPHER_CTX *x, int keylen);
int EVP_CIPHER_CTX_set_key_length(EVP_CIPHER_CTX *x, int keylen) {
    int (*original_target)(EVP_CIPHER_CTX *x, int keylen);
        key_len = keylen;
        *(void **)(&original_target) = dlsym(RTLD_NEXT, "EVP_CIPHER_CTX_set_key_length");
        return((*original_target)(x,keylen));
}

//function prototype -  int EVP_CipherFinal_ex(EVP_CIPHER_CTX *ctx, unsigned char *outm, int *outl);
int EVP_CipherFinal_ex(EVP_CIPHER_CTX *ctx, unsigned char *outm, int *outl) {
        int (*original_target)(EVP_CIPHER_CTX *ctx, unsigned char *outm, int *outl);
       int ret;

        *(void **)(&original_target) = dlsym(RTLD_NEXT, "EVP_CipherFinal_ex");
        ret = original_target(ctx,outm,outl);
        printf(" %s\n##### CipherFinal\n",outm);
        return ret;
}

The compiled shared object is loaded using the LD_PRELOAD environment variable to hook the defined calls and output the decrypted application content:

# LD_PRELOAD="/home/plixer/evphook.so" perl /home/plixer/scrutinizer/html/fcgi/scrut_fcgi.fcgi
### Decrypt Init:
#### Key: 5B 1F 31 FC 73 F8 C5 5F E2 52 DA A2 3C 76 EA DC 0E AB 3A A9 9F 73 C1 E3 49 32 73 D5 17 2F D1 FC
#### IV: AC D3 F3 26 E3 86 63 8F 66 BA 2F 65 3B C6 BA 93 00 FB C2 01 00 00 00 00 61 02 00 00 00 00 00 00
#!/usr/bin/perl
#START #UTF-8#
# http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html #UTF-8#
use utf8;                       # so literals and identifiers can be in UTF-8 #UTF-8#
use v5.16;                      # or later to get "unicode_strings" feature #UTF-8#
use strict;                     # quote strings, declare variables #UTF-8#
use warnings;                   # on by default #UTF-8#
use warnings qw(FATAL utf8);    # fatalize encoding glitches #UTF-8#
use open qw(:std :utf8);        # undeclared streams in UTF-8 #UTF-8#

#END #UTF-8#

# sanitize known environment variables.
use Plixer::Util::Taint qw( untaint_environment );

BEGIN {
# Bug 24156 - force LANG=en_US.UTF-8 in Scrutinizer
$ENV{LANG} = 'en_US.UTF-8';
untaint_environment();
}

With access to the decrypted application content, further testing identified multiple vulnerabilities, which allowed unauthenticated users to compromise the application server and pivot further into the environment. The details of the vulnerabilities can be found in our public disclosure repository:

https://github.com/atredispartners/advisories/blob/master/ATREDIS-2023-0001.md

It is worth noting that Plixer made the disclosure process effortless and was communicative throughout; it was refreshing to work with a vendor that accepted our report and prioritized remediation.

A LibAFL Introductory Workshop

Intro

Why LibAFL

Fuzzing is great! Throwing randomized inputs at a target really fast can be unreasonably effective with the right setup. When starting with a new target, a fuzzing harness can iterate along with your reversing/auditing efforts, and you can sleep well at night knowing your cores are taking the night watch. When looking for bugs our time is often limited; any effort spent on tooling needs to be time well spent. LibAFL is a great library that lets us quickly adapt a fuzzer to our specific target. Not every target fits nicely into the "command-line program that parses a file" category, so LibAFL lets us craft fuzzers for our specific situations. This adaptability opens up the power of fuzzing for a wider range of targets.

Why a workshop

The following material comes from an internal workshop used as an introduction to LibAFL. This post is a summary of the workshop and includes a repository of exercises and examples for following along at home. It expects some existing understanding of Rust and fuzzing concepts. (If you need a refresher on Rust, Google's Comprehensive Rust course is great.)

There are already a few good resources for learning about LibAFL.

This workshop seeks to add to the existing corpus of example fuzzers built with LibAFL, with a focus on customizing fuzzers to our targets. You will also find a few starter problems for getting hands-on experience with LibAFL. Throughout the workshop we try to highlight the versatility and power of the library, letting you see where you can fit a fuzzer into your workflow.

Course Teaser

As an aside, if you are interested in this kind of thing (security tooling, bugs, fuzzing), you may be interested in our Symbolic Execution course. We have a virtual session planned for February 2024 with ringzer0. There is more information at the end of this post.

The Fuzzers

The target

Throughout the workshop we beat up on a simple target that runs on Linux. The target itself is not very interesting, but it acts as a good example for our fuzzers. It takes in some text, line by line, and replaces certain identifiers (like {{XXd3sMRBIGGGz5b2}}) with names. To do so, it contains a function with a very large lookup tree, and many of the lookup cases in that function can result in a segmentation fault.

//...
    const char* uid_to_name(const char* uid) {
        /*...*/ // big nested mess of switch statements
                        switch (nbuf[14]) {
                        case 'b':
                            // regular case, no segfault
                            addr = &names[0x4b9];
                            LOG("UID matches known name at %p", addr);
                            return *addr;
                        /*...*/
                        case '7':
                            // a bad case
                            addr = ((const char**)0x68c2);
                            // SEGFAULT here
                            LOG("UID matches known name at %p", addr); 
                            return *addr;
                        /*...*/

This gives us a target that has many diverting code paths, and many reachable "bugs" to find. As we progress we will adapt our fuzzers to this target, showing off some common ways we can mold a fuzzer to a target with LibAFL.

You can find our target here, and the repository includes a couple variations that will be useful for later examples. ./fuzz_target/target.c

Pieces of a Fuzzer

Before we dive into the examples, let's establish a quick understanding of modern fuzzer internals. LibAFL breaks a fuzzer down into pieces that can be swapped out or changed, making great use of Rust's trait system to do this. Below we have a diagram of a very simple fuzzer.

A block diagram of a minimal fuzzer

The script for this fuzzer could be as simple as the following.

while ! [ -f ./core.* ]
do
    head -c 900 /dev/urandom > ./testfile
    cat ./testfile | ./target
done

The simple fuzzer above follows three core steps.

1) Makes a randomized input

2) Runs the target with the new input

3) Keeps the created input if it causes a "win" (in this case, a win is a crash that produces a core file)

If you miss any of the above pieces, you won't have a very good fuzzer. We have all heard the sad tale of researchers who piped random inputs into their target, got an exciting crash, but were never able to reproduce the bug because they didn't save off the test case.

Even with the above pieces, that simple fuzzer will struggle to make any real progress toward finding bugs. It does not even have a notion of what progress means! Below we have a diagram of what a more modern fuzzer might look like.

A block diagram of a fuzzer with feedback

This fuzzer works off a set of existing inputs, which are randomly mutated to create the new test cases. The "mutations" are just a simple set of modifications to the input that can be quickly applied to generate new, exciting inputs. Importantly, this fuzzer also uses observations from the executing target to know if an input was "interesting". Instead of only caring about crashes, a fuzzer with feedback can route mutated test cases back into the set of inputs to be mutated. This allows a fuzzer to progress by iterating on an input, tracking down interesting features in the target.

LibAFL provides tools for each of these "pieces" of a fuzzer.

There are other important traits we will see as well. Be sure to look at the "Implementors" section of the trait documentation to see useful implementations provided by the library.

Exec fuzzer

Which brings us to our first example! Let's walk through a bare-bones fuzzer using LibAFL.

./exec_fuzzer/src/main.rs

The source is well-commented, and you should read through it. Here we just highlight a few key sections of this simple fuzzer.

//...
        let mut executor = CommandExecutor::builder()
            .program("../fuzz_target/target")
            .build(tuple_list!())
            .unwrap();

        let mut state = StdState::new(
            StdRand::with_seed(current_nanos()),
            InMemoryCorpus::<BytesInput>::new(),
            OnDiskCorpus::new(PathBuf::from("./solutions")).unwrap(),
            &mut feedback,
            &mut objective,
        ).unwrap();

Our fuzzer uses a "state" object which tracks the set of input test cases, any solution test cases, and other metadata. Notice we are choosing to keep our inputs in memory, but save out the solution test cases to disk.

We use a CommandExecutor for executing our target program, which will run the target process and pass in the test case.

//...
        let mutator = StdScheduledMutator::with_max_stack_pow(
            havoc_mutations(),
            9,                 // maximum mutation iterations
        );

        let mut stages = tuple_list!(StdMutationalStage::new(mutator));

We build a very simple pipeline for our inputs. This pipeline only has one stage, which will randomly select from a set of mutations for each test case.

//...
        let scheduler = RandScheduler::new();
        let mut fuzzer = StdFuzzer::new(scheduler, feedback, objective);

        // load the initial corpus in our state
        // since we lack feedback in this fuzzer, we have to force this,
        state.load_initial_inputs_forced(&mut fuzzer, &mut executor, &mut mgr, &[PathBuf::from("../fuzz_target/corpus/")]).unwrap();

        // fuzz
        fuzzer.fuzz_loop(&mut stages, &mut executor, &mut state, &mut mgr).expect("Error in fuzz loop");

With a fuzzer built from a scheduler and some feedbacks (here we use a ConstFeedback::False so there is no feedback other than the objective feedback, which is a CrashFeedback), we can load our initial entries and start to fuzz. We use the created stages, the chosen executor, the state, and an event manager to start fuzzing. Our event manager will let us know when we start to get "wins".

[jordan exec_fuzzer]$ ./target/release/exec_fuzzer/

[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 1, objectives: 0, executions: 1, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 2, objectives: 0, executions: 2, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 3, objectives: 0, executions: 3, exec/sec: 0.000
[Objective #0] run time: 0h-0m-1s, clients: 1, corpus: 3, objectives: 1, executions: 3, exec/sec: 2.932
[Stats #0] run time: 0h-0m-15s, clients: 1, corpus: 3, objectives: 1, executions: 38863, exec/sec: 2.590k
[Objective #0] run time: 0h-0m-20s, clients: 1, corpus: 3, objectives: 2, executions: 38863, exec/sec: 1.885k
...

Our fragile target quickly starts giving us crashes, even with no feedback. Working from a small set of useful initial inputs helps our mutations find crashing inputs.

This simple execution fuzzer gives us a good base to work from as we add features to our fuzzer.

Exec fuzzer with custom feedback

We can't effectively iterate on interesting inputs without feedback. Currently our random mutations must generate a crashing case in one go. If we can add feedback to our fuzzer, then we can identify test cases that did something interesting. We will loop those interesting test cases back into our set of cases for further mutation.

There are many different sources we could turn to for this information. For this example, let's use the fuzz_target/target_dbg binary, which is a build of our target with some debug output on stderr. By looking at this debug output we can start to identify interesting cases. If a test case gets us some debug output we haven't seen before, then we can say it is interesting and worth iterating on.

There isn't an existing implementation of this kind of feedback in the LibAFL library, so we will have to make our own! If you want to try this yourself, we have provided a template file in the repository.

./exec_fuzzer_stderr_template/

The LibAFL repo provides a StdErrObserver structure we can use with our CommandExecutor. This observer will allow our custom feedback structure to receive the stderr output from our run. All we need to do is create a structure that implements the is_interesting method of the Feedback trait, and we should be good to go. In that method we are provided with the state, the mutated input, and the observers. We just have to get the debug output from the StdErrObserver and determine if we reached somewhere new.

impl<S> Feedback<S> for NewOutputFeedback
where
    S: UsesInput + HasClientPerfMonitor,
{
    fn is_interesting<EM, OT>(
        &mut self,
        _state: &mut S,
        _manager: &mut EM,
        _input: &S::Input,
        observers: &OT,
        _exit_kind: &ExitKind
    ) -> Result<bool, Error>
       where EM: EventFirer<State = S>,
             OT: ObserversTuple<S>
    {
        // return Ok(false) for uninteresting inputs
        // return Ok(true) for interesting ones
        Ok(false)
    }
}

I encourage you to try implementing this feedback yourself. You may want to find some heuristics to ignore unhelpful debug messages. We want to avoid reporting too many inputs as useful, so we don't overfill our input corpus (the set of inputs we use for generating new test cases). We will waste lots of time if that set contains inputs that are not actually helping us dig toward a win. Ideally we want each of these inputs to be as small and quick to run as possible, while exercising a unique path in our target.

In our solution, we simply keep a set of seen hashes. We report an input to be interesting if we see it caused a unique hash.

./exec_fuzzer_stderr/src/main.rs

//...
        fn is_interesting<EM, OT>(
            &mut self,
            _state: &mut S,
            _manager: &mut EM,
            _input: &S::Input,
            observers: &OT,
            _exit_kind: &ExitKind
        ) -> Result<bool, Error>
           where EM: EventFirer<State = S>,
                 OT: ObserversTuple<S>
        {
            let observer = observers.match_name::<StdErrObserver>(&self.observer_name)
                .expect("A NewOutputFeedback needs a StdErrObserver");

            let mut hasher = DefaultHasher::new();
            hasher.write(&observer.stderr.clone().unwrap());
            let hash = hasher.finish();

            if self.hash_set.contains(&hash) {
                Ok(false)
            } else {
                self.hash_set.insert(hash);
                Ok(true)
            }
        }

This ends up finding "interesting" inputs very quickly, and blowing up our input corpus.

...
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 308, objectives: 0, executions: 4388, exec/sec: 2.520k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 309, objectives: 0, executions: 4423, exec/sec: 2.520k
[Objective #0] run time: 0h-0m-1s, clients: 1, corpus: 309, objectives: 1, executions: 4423, exec/sec: 2.497k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 310, objectives: 1, executions: 4532, exec/sec: 2.520k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 311, objectives: 1, executions: 4629, exec/sec: 2.521k
...

Code Coverage Feedback

Relying on the normal side effects of a program (like debug output, system interactions, etc.) is not a great way to deeply explore a target. There may be many interesting features that we miss using this kind of feedback. The feedback of choice for many modern fuzzers is "code coverage". By observing which blocks of code are being executed, we can gain insight into what inputs are exposing interesting logic.

Being able to collect that information, however, is not always straightforward. If you have access to the source code, you may be able to use a compiler to instrument the code with this information. If not, you may have to find ways to dynamically instrument your target through binary modification, emulation, or other means.
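
As a rough illustration of what compile-time coverage instrumentation does, the classic AFL design inserts a small stub at every basic block that bumps a counter in a shared memory map indexed by the (previous block, current block) pair. A simplified sketch of that idea (not the exact afl-clang-fast codegen) looks like this:

// Simplified sketch of AFL-style edge coverage; the real instrumentation is
// emitted by the compiler pass and the map is shared with the fuzzer over shm.
#define MAP_SIZE (1 << 16)

unsigned char shared_mem[MAP_SIZE];  // coverage map read by the fuzzer
unsigned int  prev_location;         // updated as execution moves between blocks

// conceptually inserted at every basic block; cur_location is a
// compile-time random constant identifying that block
static inline void coverage_stub(unsigned int cur_location) {
    shared_mem[(cur_location ^ prev_location) % MAP_SIZE]++;
    prev_location = cur_location >> 1;
}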

AFL++ provides a version of clang with compiler-level instrumentation that emits code coverage information. LibAFL can observe the information produced by this instrumentation and use it for feedback. We have a build of our target made with afl-clang-fast. With this build (target_instrumented), we can use the LibAFL ForkserverExecutor to communicate with our instrumented target, and the HitcountsMapObserver can receive our coverage information over shared memory each run.

You can see our fuzzer's code here.

./aflcc_fuzzer/src/main.rs

//...
        let mut shmem_provider = UnixShMemProvider::new().unwrap();
        let mut shmem = shmem_provider.new_shmem(MAP_SIZE).unwrap();
        // write the id to the env var for the forkserver
        shmem.write_to_env("__AFL_SHM_ID").unwrap();
        let shmembuf = shmem.as_mut_slice();
        // build an observer based on that buffer shared with the target
        let edges_observer = unsafe {HitcountsMapObserver::new(StdMapObserver::new("shared_mem", shmembuf))};
        // use that observed coverage to feedback based on obtaining maximum coverage
        let mut feedback = MaxMapFeedback::tracking(&edges_observer, true, false);

        // This time we can use a fork server executor, which talks to the fork server compiled into the instrumented target
        // it gets a greater number of execs per sec by not having to init the process for each run
        let mut executor = ForkserverExecutor::builder()
            .program("../fuzz_target/target_instrumented")
            .shmem_provider(&mut shmem_provider)
            .coverage_map_size(MAP_SIZE)
            .build(tuple_list!(edges_observer))
            .unwrap();

The compiled-in fork server should also reduce the time needed to set up each run, by forking off partially initialized processes instead of starting from scratch each time. This should offset some of the cost of our instrumentation.

When executed, our fuzzer quickly finds new paths through the process, building up our corpus of interesting cases and guiding our fuzzer.

[jordan aflcc_fuzzer]$ ./target/release/aflcc_fuzzer 

[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 1, objectives: 0, executions: 1, exec/sec: 0.000
[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 1, objectives: 0, executions: 1, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 2, objectives: 0, executions: 2, exec/sec: 0.000
[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 2, objectives: 0, executions: 2, exec/sec: 0.000
...
[Testcase #0] run time: 0h-0m-10s, clients: 1, corpus: 100, objectives: 0, executions: 19152, exec/sec: 1.823k
[Objective #0] run time: 0h-0m-10s, clients: 1, corpus: 100, objectives: 1, executions: 19152, exec/sec: 1.762k
[Stats #0] run time: 0h-0m-11s, clients: 1, corpus: 100, objectives: 1, executions: 19152, exec/sec: 1.723k
[Testcase #0] run time: 0h-0m-11s, clients: 1, corpus: 101, objectives: 1, executions: 20250, exec/sec: 1.821k
...

Custom Mutation

So far we have been using the havoc_mutations, a set of mutations that is pretty good for lots of targets. You can see the full set here:

https://github.com/AFLplusplus/LibAFL/blob/bd12e060ca263ea650ece0a51a355ac714e7ce75/libafl/src/mutators/scheduled.rs#L296

Many of these mutations are wasteful for our target. In order to reach the vulnerable uid_to_name function, the input must first pass a valid_uid check, which rejects characters outside of the range A-Za-z0-9\-_. Many of the havoc_mutations, such as the BytesRandInsertMutator, will introduce characters that are not in this range, resulting in many wasted test cases.
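
For context, that check is just an allow-list over the input characters. A hedged approximation of it (the real valid_uid in target.c may differ in the details) looks like this:

#include <ctype.h>

// Rough approximation of the allow-list described above: anything outside
// A-Za-z0-9, '-' and '_' causes the input to be rejected before uid_to_name.
static int valid_uid_sketch(const char *uid) {
    for (const char *p = uid; *p; p++) {
        if (!isalnum((unsigned char)*p) && *p != '-' && *p != '_')
            return 0;
    }
    return 1;
}

Any mutation that introduces a byte outside this set means the whole test case dies at the check, never reaching the interesting code.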

With this knowledge about our target, we can use a custom mutator that will insert new bytes only in the desired range. Implementing the Mutator trait is simple: we just have to provide a mutate function.

//...
    impl<I, S> Mutator<I, S> for AlphaByteSwapMutator
    where
        I: HasBytesVec,
        S: HasRand,
    {
        fn mutate(
            &mut self,
            state: &mut S,
            input: &mut I,
            _stage_idx: i32,
        ) -> Result<MutationResult, Error> {

            /*
                return Ok(MutationResult::Mutated) when you mutate the input
                or Ok(MutationResult::Skipped) when you don't
            */

            Ok(MutationResult::Skipped)
        }
    }

If you want to try this for yourself, feel free to use the aflcc_custom_mut_template as a template to get started.

./aflcc_custom_mut_template/

In our solution we use a set of mutators, including our new AlphaByteSwapMutator and a few existing mutators. This set should hopefully result in a greater number of valid test cases that make it to the uid_to_name function.

//...
        // we will specify our custom mutator, as well as two other helpful mutators for growing or shrinking
        let mutator = StdScheduledMutator::with_max_stack_pow(
            tuple_list!(
                AlphaByteSwapMutator::new(),
                BytesDeleteMutator::new(),
                BytesInsertMutator::new(),
            ),
            9,
        );

Then in our mutator we use the state's source of randomness to choose a location, and a new byte from a set of valid characters.

//...
        fn mutate(
            &mut self,
            state: &mut S,
            input: &mut I,
            _stage_idx: i32,
        ) -> Result<MutationResult, Error> {
            // here we apply our random mutation
            // for our target, simply swapping a byte should be effective
            // so long as our new byte is 0-9A-Za-z or '-' or '_'

            // skip empty inputs
            if input.bytes().is_empty() {
                return Ok(MutationResult::Skipped)
            }

            // choose a random byte
            let byte: &mut u8 = state.rand_mut().choose(input.bytes_mut());

            // don't replace tag chars '{{}}'
            if *byte == b'{' || *byte == b'}' {
                return Ok(MutationResult::Skipped)
            }

            // now we can replace that byte with a known good byte
            *byte = *state.rand_mut().choose(&self.good_bytes);

            // technically we should say "skipped" if we replaced a byte with itself, but this is fine for now
            Ok(MutationResult::Mutated)
        }

And that is it! The custom mutator works seamlessly with the rest of the system. Being able to quickly tweak fuzzers like this is great for adapting to your target, and combined with performance measurements it lets us iterate on experiments quickly.

...
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 76, objectives: 1, executions: 2339, exec/sec: 1.895k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 77, objectives: 1, executions: 2386, exec/sec: 1.933k
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 77, objectives: 1, executions: 2386, exec/sec: 1.928k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 78, objectives: 1, executions: 2392, exec/sec: 1.933k
...

Example Problem

At this point, we have a separate target you may want to experiment with! It is a program that contains a small maze, and gives you a chance to create a fuzzer with some custom feedback or mutations to better traverse the maze and discover a crash. Play around with some of the concepts we have introduced here, and see how fast your fuzzer can solve the maze.

./maze_target/

[jordan maze_target]$ ./maze -p

██████████████
█.██......█ ██
█....██ █.☺  █
██████  █ ██ █
██   ██████  █
█  █  █     ██
█ ███   ██████
█  ███ ██   ██
██   ███  █  █
████ ██  ███ █
█    █  ██ █ █
█ ████ ███ █ █
█          █  
████████████

Found:

  ############
  #          #
# # ### #### #
# # ##  #...@#
# ###  ##.####
#  #  ###...##
##   ## ###..#
######...###.#
##.....#..#..#
#..######...##
#.##.#  ######
#....# ##....#
## #......##.#
[Testcase #0] run time: 0h-0m-2s, clients: 1, corpus: 49, objectives: 0, executions: 5745, exec/sec: 2.585k
Found:

  ############
  #          #
# # ### ####@#
# # ##  #....#
# ###  ##.####
#  #  ###...##
##   ## ###..#
######...###.#
##.....#..#..#
#..######...##
#.##.#  ######
#....# ##....#
## #......##.#
[Testcase #0] run time: 0h-0m-3s, clients: 1, corpus: 50, objectives: 0, executions: 8892, exec/sec: 2.587k

Going Faster

Persistent Fuzzer

In previous examples, we made use of the ForkserverExecutor, which works with the fork server that afl-clang-fast inserted into our target. While the fork server does give us a great speed boost by reducing the start-up time for each target process, we still require a new process for each test case. If we can instead run multiple test cases in one process, we can speed up our fuzzing greatly. Running multiple test cases per target process is often called "persistent mode" fuzzing.

As they say in the AFL++ documentation:

Basically, if you do not fuzz a target in persistent mode, then you are just doing it for a hobby and not professionally :-).

Some targets do not play well with persistent mode. Anything that changes lots of global state each run can have trouble, as we want each test case to run in isolation as much as possible. Even for targets well suited to persistent mode, we usually have to create a harness around the target code. This harness is just a bit of code we write to call into the target for fuzzing. The AFL++ documentation on persistent mode with LLVM is a great reference for writing these kinds of harnesses.

When we have created such a harness, the inserted fork server will detect the ability to persist, and can even use shared memory to provide the test cases. LibAFL's ForkserverExecutor can let us make use of these persistent harnesses.

Our fuzzer using a persistent harness is not much changed from our previous fuzzers.

./persistent_fuzzer/src/main.rs

The main change is telling our ForkserverExecutor that it is_persistent(true).

//...
        let mut executor = ForkserverExecutor::builder()
            .program("../fuzz_target/target_persistent")
            .is_persistent(true)
            .shmem_provider(&mut shmem_provider)
            .coverage_map_size(MAP_SIZE)
            .build(tuple_list!(edges_observer))
            .unwrap();

The ForkserverExecutor takes care of the magic to make this all happen. Most of our work goes into actually creating an effective harness! If you want to try and craft your own, we have a bit of a template ready for you to get started.

./fuzz_target/target_persistent_template.c

In our harness we want to be careful to reset state each round, so we stay as true to the original as possible. Any modified global variables, heap allocations, or other side effects from a run that could change the behavior of future runs need to be undone. Failure to clean up the program state can result in false positives or instability. If we want our winning test cases from this fuzzer to also crash the original target, then we need to emulate the original target's behavior as closely as possible.
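
For reference, the general shape of an AFL++ persistent-mode harness uses the documented AFL++ macros. The sketch below is a minimal example where process_input stands in for whatever target code the harness drives (the real target_persistent.c feeds each line to process_line and does more cleanup):

// compile with afl-clang-fast; the __AFL_* macros come from the AFL++ compiler
void process_input(const unsigned char *buf, int len);  // stand-in for the target code

__AFL_FUZZ_INIT();

int main(void) {
#ifdef __AFL_HAVE_MANUAL_CONTROL
    __AFL_INIT();
#endif
    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;  // test case arrives via shared memory

    while (__AFL_LOOP(10000)) {                    // many test cases per process
        int len = __AFL_FUZZ_TESTCASE_LEN;
        process_input(buf, len);
        // undo any global state changes / free allocations here so runs stay isolated
    }
    return 0;
}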

Sometimes it is not worth emulating the original exactly, and we can instead use our harness to target deeper attack surface. For example, in our target we could call the uid_to_name function directly, then convert the solutions into solutions for our original target later. We would also want to call valid_uid in our harness, to ensure we don't report false positives that would never work against our original target.

You can inspect our persistent harness here; we choose to repeatedly call process_line for each line and take care to clean up after ourselves.

./fuzz_target/target_persistent.c

Where previously we saw around 2k executions per second for our fuzzers with code coverage feedback, we are now seeing around 5k or 6k, still with just one client.

[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 171, objectives: 4, executions: 95677, exec/sec: 5.826k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 172, objectives: 4, executions: 96236, exec/sec: 5.860k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 172, objectives: 4, executions: 96236, exec/sec: 5.821k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 173, objectives: 4, executions: 96933, exec/sec: 5.863k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 173, objectives: 4, executions: 96933, exec/sec: 5.798k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 174, objectives: 4, executions: 98077, exec/sec: 5.866k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 174, objectives: 4, executions: 98077, exec/sec: 5.855k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 175, objectives: 4, executions: 98283, exec/sec: 5.867k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 175, objectives: 4, executions: 98283, exec/sec: 5.853k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 176, objectives: 4, executions: 98488, exec/sec: 5.866k

In-Process Fuzzer

Using AFL++'s compiler and fork server is not the only way to run multiple test cases in one process. LibAFL is an extremely flexible library and supports all sorts of scenarios. The InProcessExecutor allows us to run test cases directly in the same process as our fuzzing logic. This means that if we can link with our target somehow, we can fuzz in the same process.

The versatility of LibAFL means we can build our entire fuzzer as a library, which we can link into our target, or even preload into our target dynamically. LibAFL even supports no_std (compilation without a dependency on an OS or standard library), so we can treat our entire fuzzer as a blob to inject into our target's environment. As long as execution reaches our fuzzing code, we can fuzz.

In our example we build our fuzzer and link with our target built as a static library, calling into the C code directly using Rust's FFI.

Building our fuzzer and linking it with our target is done by providing a build.rs file, which cargo uses during compilation.

./inproc_fuzzer/build.rs

//...
    fn main() {
        let target_dir = "../fuzz_target".to_string();
        let target_lib = "target_libfuzzer".to_string();

        // force us to link with the file 'libtarget_libfuzzer.a'
        println!("cargo:rustc-link-search=native={}", &target_dir);
        println!("cargo:rustc-link-lib=static:+whole-archive={}", &target_lib);

        println!("cargo:rerun-if-changed=build.rs");
    }

LibAFL also provides tools to wrap the clang compiler, if you wish to create a compiler that will automatically inject your fuzzer into the target. You can see this demonstrated in the LibAFL examples.

We will want a harness for this target as well, so we can pass our test cases in as a buffer instead of having the target read lines from stdin. We will use the common interface used by libFuzzer, which has us create a function called LLVMFuzzerTestOneInput. LibAFL even has some helpers that will do the FFI calls for us.
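
The libFuzzer-style entry point has a fixed signature; a minimal sketch (again with a hypothetical process_input standing in for the code the real harness calls) looks like this:

#include <stdint.h>
#include <stddef.h>

// Hypothetical stand-in for the target code the harness drives.
void process_input(const uint8_t *data, size_t size);

// Fixed libFuzzer-style entry point; the in-process executor (via the
// libfuzzer_test_one_input wrapper) calls this once per test case.
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    process_input(data, size);
    return 0;  // the return value is reserved; always return 0
}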

Our harness can be very similar to the one we created for persistent mode fuzzing. We also have to watch out for the same kinds of global state or memory leaks that could make our fuzzing unstable. Again, we have a template for you if you want to craft the harness yourself.

./fuzz_target/target_libfuzzer_template.c

With LLVMFuzzerTestOneInput defined in our target, and a static library made, our fuzzer can directly call into the harness for each test case. We define a harness function which our executor will call with the test case data.

//...
        // our executor will be just a wrapper around a harness
        // that calls out to the libfuzzer style harness
        let mut harness = |input: &BytesInput| {
            let target = input.target_bytes();
            let buf = target.as_slice();
            // this is just some niceness to call the libfuzzer C function
            // but we don't need to use a libfuzzer harness to do inproc fuzzing
            // we can call whatever function we want in a harness, as long as it is linked
            libfuzzer_test_one_input(buf);
            return ExitKind::Ok;
        };

        let mut executor = InProcessExecutor::new(
            &mut harness,
            tuple_list!(edges_observer),
            &mut fuzzer,
            &mut state,
            &mut restarting_mgr,
        ).unwrap();

This easy interoperability with libfuzzer harnesses is nice, and again we see a huge speed improvement over our previous fuzzers.

[jordan inproc_fuzzer]$ ./target/release/inproc_fuzzer 

Starting up
[Stats       #1]  (GLOBAL) run time: 0h-0m-16s, clients: 2, corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000
                  (CLIENT) corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000, edges: 0/37494 (0%)
...
[Testcase    #1]  (GLOBAL) run time: 0h-0m-19s, clients: 2, corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.79k
                  (CLIENT) corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.79k, edges: 136/37494 (0%)
[Stats       #1]  (GLOBAL) run time: 0h-0m-19s, clients: 2, corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.75k
                  (CLIENT) corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.75k, edges: 137/37494 (0%)
[Testcase    #1]  (GLOBAL) run time: 0h-0m-19s, clients: 2, corpus: 103, objectives: 5, executions: 106626, exec/sec: 30.88k
                  (CLIENT) corpus: 103, objectives: 5, executions: 106626, exec/sec: 30.88k, edges: 137/37494 (0%)
[Objective   #1]  (GLOBAL) run time: 0h-0m-20s, clients: 2, corpus: 103, objectives: 6, executions: 106626, exec/sec: 28.32k
...

In this fuzzer we are also making use of a very important tool offered by LibAFL: the Low Level Message Passing (LLMP). This provides quick communication between multiple clients and lets us effectively scale our fuzzing to multiple cores or even multiple machines. The setup_restarting_mgr_std helper function creates an event manager that will manage the clients and restart them when they encounter crashes.

//...
        let monitor = MultiMonitor::new(|s| println!("{s}"));

        println!("Starting up");

        // we use a restarting manager which will restart
        // our process each time it crashes
        // this will set up a host manager, and we will have to start the other processes
        let (state, mut restarting_mgr) = setup_restarting_mgr_std(monitor, 1337, EventConfig::from_name("default"))
            .expect("Failed to setup the restarter!");

        // only clients will return from the above call
        println!("We are a client!");

This speed gain is important, and can make the difference between finding the juicy bug or not. Plus, it feels good to use all your cores and heat up your room a bit in the winter.

Emulation

Of course, not all targets are so easy to link with or instrument with a compiler. In those cases, LibAFL provides a number of interesting tools like libafl_frida or libafl_nyx. In this next example we are going to use LibAFL's modified version of QEMU to give us code coverage feedback on a binary with no built-in instrumentation. The modified version of QEMU exposes code coverage information to our fuzzer for feedback.

The setup will be similar to our in-process fuzzer, except now our harness will be in charge of running the emulator at the desired location in the target. By default the emulator state is not reset for you, and you will want to reset any global state changed between runs.

If you want to try it out for yourself, consult the Emulator documentation, and feel free to start with our template.

./qemu_fuzzer_template/

In our solution we first execute some initialization until a breakpoint, then save off the stack and return address. We will have to reset the stack each run, and put a breakpoint on the return address so that we can stop after our call. We also map an area in our target where we can place our input.

//...
        emu.set_breakpoint(mainptr);
        unsafe { emu.run() };

        let pc: GuestReg = emu.read_reg(Regs::Pc).unwrap();
        emu.remove_breakpoint(mainptr);

        // save the ret addr, so we can use it and stop
        let retaddr: GuestAddr = emu.read_return_address().unwrap();
        emu.set_breakpoint(retaddr);

        let savedsp: GuestAddr = emu.read_reg(Regs::Sp).unwrap();

        // now let's map an area in the target we will use for the input.
        let inputaddr = emu.map_private(0, 0x1000, MmapPerms::ReadWrite).unwrap();
        println!("Input page @ {inputaddr:#x}");

Now in the harness itself we will take the input and write it into the target, then start execution at the target function. This time we are executing the uid_to_name function directly, and using a mutator that will not add any invalid characters that valid_uid would have stopped.

//...
        let mut harness = |input: &BytesInput| {
            let target = input.target_bytes();
            let mut buf = target.as_slice();
            let mut len = buf.len();

            // limit our input size
            if len > 1024 {
                buf = &buf[0..1024];
                len = 1024;
            }

            // write our testcase into memory, null terminated
            unsafe {
                emu.write_mem(inputaddr, buf);
                emu.write_mem(inputaddr + (len as u64), b"\0\0\0\0");
            };
            // reset the registers as needed
            emu.write_reg(Regs::Pc, parseptr).unwrap();
            emu.write_reg(Regs::Sp, savedsp).unwrap();
            emu.write_return_address(retaddr).unwrap();
            emu.write_reg(Regs::Rdi, inputaddr).unwrap();

            // run until our breakpoint at the return address
            // or a crash
            unsafe { emu.run() };

            // if we didn't crash, we are okay
            ExitKind::Ok
        };

This emulation can be very fast, especially if we can get away without having to reset a lot of state each run. By targeting a deeper function here, we are likely to reach crashes quickly.

...
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 54, objectives: 0, executions: 33349, exec/sec: 31.56k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 55, objectives: 0, executions: 34717, exec/sec: 32.85k
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 55, objectives: 0, executions: 34717, exec/sec: 31.59k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 56, objectives: 0, executions: 36124, exec/sec: 32.87k
[2023-11-25T20:24:02Z ERROR libafl::executors::inprocess::unix_signal_handler] Crashed with SIGSEGV
[2023-11-25T20:24:02Z ERROR libafl::executors::inprocess::unix_signal_handler] Child crashed! 
[Objective #0] run time: 0h-0m-1s, clients: 1, corpus: 56, objectives: 1, executions: 36124, exec/sec: 28.73k
...

LibAFL also provides some useful helpers such as QemuAsanHelper and QemuSnapshotHelper. There is even support for full system emulation, as opposed to usermode emulation. Being able to use emulators effectively when fuzzing opens up a whole new world of targets.

Generation

Our method of starting with some initial inputs and simply mutating them can be very effective for certain targets, but less so for more complicated input formats. If we start with an input of some JavaScript like:

if (a < b) {
    somefunc(a);
}

Our existing mutations might result in the following:

if\x00 (a << b) {
    somefu(a;;;;
}

This might find some bugs in parsers, but it is unlikely to find deeper bugs in any JavaScript engine. If we want to exercise the engine itself, we will want to mostly produce valid JavaScript. This is a good use case for generation! By defining a grammar of what valid JavaScript looks like, we can generate lots of test cases to throw against the engine.

A block diagram of a basic generative fuzzer

As you can see in the diagram above, with generation alone we are no longer using a mutation+feedback loop. There are lots of successful fuzzers that have gotten wins off generation alone (domato, boofuzz, a bunch of weird MIDI files), but we would like to have some form of feedback and progress in our fuzzing.

In order to make use of feedback in our generation, we can create an intermediate representation (IR) of our generated data. Then we can feed back the interesting cases into our inputs to be further mutated.

So our earlier JavaScript could be expressed as tokens like:

(if
    (cond_lt (var a), (var b)),
    (code_block
        (func_call some_func,
            (arg_list (var a))
        )
    )
)

Our mutations on this tokenized version can do things like replace tokens with other valid tokens or add more nodes to the tree, creating a slightly different input. We can then use these IR inputs and mutations as we did earlier with code coverage feedback.

A block diagram of a generative fuzzer with mutation feedback

Now mutations on the IR could produce something like so:

(if
    (cond_lt (const 0), (var b)),
    (code_block
        (func_call some_func
            (arg_list
                (func_call some_func,
                    (arg_list ((var a), (var a)))
                )
            )
        )
    )
)

This would render to valid JavaScript, and it can be mutated further if it produces interesting feedback.

if (0 < b) {
    somefunc(somefunc(a,a));
}

LibAFL provides some great tools for getting your own generational fuzzer with feedback going. A version of the Nautilus fuzzer is included in LibAFL. To use it with our example, we first define a grammar describing what a valid input to our target looks like.

./aflcc_custom_gen/grammar.json

With LibAFL we can load this grammar into a NautilusContext that we can use for generation. We use an InProcessExecutor, and in our harness we take in a NautilusInput, which we render to bytes and pass to our LLVMFuzzerTestOneInput.

./aflcc_custom_gen/src/main.rs

//...
    // our executor will be just a wrapper around a harness closure
    let mut harness = |input: &NautilusInput| {
        // we need to convert our input from a Nautilus tree
        // into actual bytes
        input.unparse(&genctx, &mut bytes);

        let s = std::str::from_utf8(&bytes).unwrap();
        println!("Trying:\n{:?}", s);

        let buf = bytes.as_mut_slice();

        libfuzzer_test_one_input(&buf);

        return ExitKind::Ok;
    };

We also need to generate a few initial IR inputs and specify what mutations to use.

//...
    if state.must_load_initial_inputs() {
        // instead of loading from an initial corpus, we will generate our initial corpus of 9 NautilusInputs
        let mut generator = NautilusGenerator::new(&genctx);
        state.generate_initial_inputs_forced(&mut fuzzer, &mut executor, &mut generator, &mut restarting_mgr, 9).unwrap();
        println!("Created initial inputs");
    }

    // we can't use normal byte mutations, so we use mutations that work on our generator trees
    let mutator = StdScheduledMutator::with_max_stack_pow(
        tuple_list!(
            NautilusRandomMutator::new(&genctx),
            NautilusRandomMutator::new(&genctx),
            NautilusRandomMutator::new(&genctx),
            NautilusRecursionMutator::new(&genctx),
            NautilusSpliceMutator::new(&genctx),
            NautilusSpliceMutator::new(&genctx),
        ),
        3,
    );

With this all in place, we can run and get the combined benefits of generation, code coverage, and in-process execution. To iterate on this, we can further improve our grammar as we better understand our target.

//...
                  (CLIENT) corpus: 145, objectives: 2, executions: 40968, exec/sec: 1.800k, edges: 167/37494 (0%)
[Testcase    #1]  (GLOBAL) run time: 0h-0m-26s, clients: 2, corpus: 146, objectives: 2, executions: 41229, exec/sec: 1.811k
                  (CLIENT) corpus: 146, objectives: 2, executions: 41229, exec/sec: 1.811k, edges: 167/37494 (0%)
[Objective   #1]  (GLOBAL) run time: 0h-0m-26s, clients: 2, corpus: 146, objectives: 3, executions: 41229, exec/sec: 1.780k
                  (CLIENT) corpus: 146, objectives: 3, executions: 41229, exec/sec: 1.780k, edges: 167/37494 (0%)
[Stats       #1]  (GLOBAL) run time: 0h-0m-27s, clients: 2, corpus: 146, objectives: 3, executions: 41229, exec/sec: 1.755k

Note that our saved solutions are just serialized NautilusInputs and will not work when used against our original target. We have created a separate project that will render these solutions out to bytes with our grammar.

./gen_solution_render/src/main.rs

//...
    let input: NautilusInput = NautilusInput::from_file(path).unwrap();
    let mut b = vec![];

    let tree_depth = 0x45;
    let genctx = NautilusContext::from_file(tree_depth, grammarpath);

    input.unparse(&genctx, &mut b);

    let s = std::str::from_utf8(&b).unwrap();
    println!("{s}");
[jordan gen_solution_render]$ ./target/release/gen_solution_render ../aflcc_custom_gen/solutions/id\:0

bar{{PLvkLizOcGccywcS}}foo

{{EGgkWs-PxeqpwBZK}}foo

bar{{hlNeoKiwMTNfqO_h}}

[jordan gen_solution_render]$ ./target/release/gen_solution_render ../aflcc_custom_gen/solutions/id\:0 | ../fuzz_target/target

Segmentation fault (core dumped)

Example Problem 2

This brings us to our second take home problem! We have a chat client that is vulnerable to a number of issues. Fuzzing this binary could be made easier through good use of generation and/or emulation. As you find some noisy bugs, you may wish to either avoid those paths in your fuzzer or patch the bugs in your target. Bugs can often mask other bugs. You can find the target here.

./chat_target/

As well as one example solution that can fuzz the chat client.

./chat_solution/src/main.rs

-- Ping from    16937944: D�DAAAATt'AAAAPt'%222�%%%%%%9999'pRR9&&&%%%%%2Tt�{�''pRt�'%99999999'pRR9&&&&&&999AATt'%&'pRt�'%TTTTTTTTTTTTTT9999999'a%''AAA��TTt�'% --
-- Error sending message: Bad file descriptor --
[Stats #0] run time: 0h-0m-5s, clients: 1, corpus: 531, objectives: 13, executions: 26752, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-5s, clients: 1, corpus: 532, objectives: 13, executions: 26760, exec/sec: 0.000
-- Ping from    16937944: D�DAAAATT'%'aRt�'%9999'pRR����������T'%'LLLLLLLLLLLa%'nnnnnmnnnT'AA''��'A�'%'p%''A9999'pRR����������'pRR��R�� --
[2023-11-25T21:29:19Z ERROR libafl::executors::inprocess::unix_signal_handler] Crashed with SIGSEGV
[2023-11-25T21:29:19Z ERROR libafl::executors::inprocess::unix_signal_handler] Child crashed!

Conclusion

The goal of this workshop is to show the versatility of LibAFL and encourage its use. Hopefully these examples have sparked some ideas of how you can incorporate custom fuzzers against some of your targets. Let us know if you have any questions or spot any issues with our examples. Alternatively, if you have an interesting target and want us to find bugs in it for you, please contact us.

Course Plug

Thanks again for reading! If you like this kind of stuff, you may be interested in our course "Practical Symbolic Execution for VR and RE" where you will learn to create your own symbolic execution harnesses for: reverse engineering, deobfuscation, vulnerability detection, exploit development, and more. The next public offering is in February 2024 as part of ringzer0's BOOTSTRAP24. We are also available for private offerings on request.

More info here. https://ringzer0.training/trainings/practical-symbolic-execution.html

Symbolic Triage: Making the Best of a Good Situation

Symbolic Execution can get a bad rap. Generic symbex tools have a hard time proving their worth when confronted with a sufficiently complex target. However, I have found symbolic execution can be very helpful in certain targeted situations. One of those situations is when triaging a large number of crashes coming out of a fuzzer, especially when dealing with a complicated or opaque target. This is the "Good Situation" I have found myself in before, where my fuzzer handed me a large load of crashes that resisted normal minimization and de-duplication. By building a small symbolic debugger I managed a much faster turnaround time from fuzz-case to full understanding.

In this post I want to share my process for writing symbolic execution tooling for triaging crashes, and try to highlight tricks I use to make the tooling effective and flexible. The examples here all use the great Triton library for symbolic execution and solving, and the code is hosted at: github.com/atredis-jordan/SymbolicTriagePost

(Oh BTW, we have a course!) Do you reverse engineer and symbolically execute in your workflows, or want to?

Are you using fuzzing today but want to find more opportunities to improve it and find deeper and more interesting bugs?

Can you jam with the console cowboys in cyberspace?

We've developed a 4-day course called "Practical Symbolic Execution for VR and RE" that's tailored towards these exact goals. It’s fun and practical, with lots of demos and labs to practice applying these concepts in creative ways. If that sounds interesting to you, there is more information at the bottom of this post. Hope to see you there!

We will be using a bunch of crashes in Procmon64.exe for our examples. Procmon's parsing of PML (Process Monitor Log) files is pretty easy to knock over, and we can quickly get lots of crashes out of a short fuzzing session. It is a large opaque binary with some non-determinism to the crashes, so useful tooling here will help us speed up our reverse engineering efforts. Note that we weren't exhaustive in trying to find bugs in Procmon; although the bugs we talk about here don't appear super useful to an attacker, I won't be opening any untrusted PML files any time soon.

I gathered a bunch of crashes by making a few very small PML files and throwing Jackalope at the target. After a few hours we had 200-odd crashes to play with. Many of the crashes were unstable, and only reproduced occasionally.

..\Jackalope\Release\fuzzer.exe -iterations_per_round 30 -minimize_samples false -crash_retry 0 -nthreads 32 -in - -resume -out .\out -t 5000 -file_extension PML -instrument_module procmon64.exe -- procmon64.exe /OpenLog @@ /Quiet /Runtime 1 /NoFilter /NoConnect

Fuzzing Procmon's PML parser with Jackalope

A Simple Debugger

With all this hyping up symbolic execution, our first step is to not use Symbolic Execution! Knowing when to turn to symbolic execution and when just to use emulation or a debugger is a good skill to have. In this case, we are going to write a very simple debugger using the Windows debugging API. This debugger can be used to re-run our crashing inputs, find out how stable they are, see if they all happen in the main thread, gather stack traces, etc.

Also, having a programmatic debugger will be very useful when we start symbolically executing. We will talk about that in a second; first, let's get our debugger off the ground.

Quick aside. All my code examples here are in python, because I like being able to pop into IPython in my debuggers. I defined a bunch of ctypes structures in the win_types.py file. I recommend having some programmatic way to generate the types you need. Look into PDBRipper or cvdump as a good place to start.

Okay, so first we want a debugger that can run the process until it crashes and collect the exception information. The basic premise is we start a process as debugged (our connect_debugger function in triage.py), and then wait on it until we get an unhandled exception. Like so:

    handle, main_tid = connect_debugger(cmd)

    log("process", 3, f": -- ")

    event = dbg_wait(handle, None)
    code = event.dwDebugEventCode

    if code == EXIT_PROCESS_DEBUG_EVENT:
        log("crash", 1, f" Closed with no crash")
    elif code == EXCEPTION_DEBUG_EVENT:
        # exception to investigate
        log("crash", 1, f" crashed:")
        er = event.u.Exception.ExceptionRecord
        log("crash", 1, exceptionstr(handle, er, event.dwThreadId))

    else:
        log("process", 1, f" hit unexpected Debug Event ")

    dbg_kill(handle)

A piece of triage.py's handle_case, running a single test case

Running the above code to get the exception information from a crash

Many of the crashes will not happen every time due to some non-determinism. Running through all our test cases multiple times in our debugger, we can build a picture of which crashes are the most stable, if they stay in the main thread, and what kind of exception is happening.

.\crsh\access_violation_0000xxxxxxxxx008_00000xxxxxxxx5AA_1.PML -- 100% (18) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x520 read at 0x5aa
         EXCEPTION_STACK_BUFFER_OVERRUN(0xc0000409) @ 0x83c
.\crsh\access_violation_0000xxxxxxxxx008_00000xxxxxxxx5AA_2.PML -- 100% (34) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x520 read at 0x5aa
.\crsh\access_violation_0000xxxxxxxxx063_00000xxxxxxxx3ED_1.PML -- 100% (34) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x520 read at 0x3ed
...
.\crsh\access_violation_0000xxxxxxxxx3D4_00000xxxxxxxxED1_2.PML -- 52% (23) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed1
.\crsh\access_violation_0000xxxxxxxxx234_00000xxxxxxxxED4_3.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed4
.\crsh\access_violation_0000xxxxxxxxx3CA_00000xxxxxxxxED1_1.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed1
.\crsh\access_violation_0000xxxxxxxxx5EC_00000xxxxxxxx0A2_1.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed4
.\crsh\access_violation_0000xxxxxxxxx5EF_00000xxxxxxxxF27_1.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xecb
.\crsh\access_violation_0000xxxxxxxxxB46_00000xxxxxxxxFF4_1.PML -- 44% (18) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed4
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
.\crsh\access_violation_0000xxxxxxxxx25A_00000xxxxxxxxED4_1.PML -- 38% (21) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xecb
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x19d read at 0x184
...

Gathered information from multiple runs

Another quick aside: Windows exceptions are nice because they can contain extra information. The exception record tells us if an access violation is a read or a write, as well as the pointer that led to the fault. On Linux, it can be hard to get that information programmatically, as a SEGFAULT is just a SEGFAULT. Here we can use our symbolic execution engine to lift only the faulting instruction. The engine will provide us the missing information on what loads or stores happened, letting us differentiate between a boring NULL read and an exciting write past the end of a page.
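
A rough sketch of that trick with Triton: lift only the faulting instruction's bytes, seed the registers from the stopped process, and ask the engine what was loaded or stored. The dbg_get_regs and dbg_read_mem helpers are placeholders for whatever your debugger layer provides.

from triton import ARCH, Instruction, TritonContext

def classify_fault(handle, tid, rip):
    # lift only the faulting instruction, so the engine can tell us what was
    # loaded or stored even when the OS just reports a generic fault
    ctx = TritonContext(ARCH.X86_64)

    # seed the context with the concrete register state at the crash
    for name, val in dbg_get_regs(handle, tid).items():
        reg = getattr(ctx.registers, name, None)
        if reg is not None:
            ctx.setConcreteRegisterValue(reg, val)

    # lift and process just the one instruction at the fault address
    inst = Instruction(rip, dbg_read_mem(handle, rip, 16))
    ctx.processing(inst)

    for access, _ in inst.getLoadAccess():
        print(f"read  {access.getSize()} bytes at {access.getAddress():#x}")
    for access, _ in inst.getStoreAccess():
        print(f"write {access.getSize()} bytes at {access.getAddress():#x}")

A sketch of lifting a single faulting instruction to recover load/store details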

Getting our Symbolic Execution Running, and a Few Tricks

Now that we have a simple debugger working using the Windows debugger API (or ptrace or whatever), we can add our symbolic engine into the mix. The game plan is to use our debugger to run the target until our input is in memory somewhere. Then we will mark the input as symbolic and trace through the rest of the instructions in our symbolic engine.

Marking input as “symbolic” here means we are telling our engine that these values are to be tracked as variables, instead of just numbers. This will let the expressions we see all be in terms of our input variables, like “rax: (add INPUT_12 0x12)” instead of just “rax: 0x53”. A better term would be “concolic” (concrete-symbolic) because we are still using the actual value of these input bytes, just adding the symbolic information on top of them. I just use the term symbolic in this post, though.
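
In Triton terms, marking the input symbolic is just a loop over the mapped input bytes, something like this sketch (the mapping address and length come from wherever the debugger saw the file land; the variable aliases are arbitrary):

from triton import MemoryAccess

def symbolize_input(ctx, input_addr, input_len):
    # mark every input byte as a named variable; Triton keeps using the
    # concrete bytes already in memory, so execution stays concolic
    for i in range(input_len):
        ctx.symbolizeMemory(MemoryAccess(input_addr + i, 1), f"INPUT_{i:x}")

A sketch of symbolizing the mapped input bytes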

Our debugger will tell us when we reach the exception. From there we should be able to inspect the state at the crash in terms of our symbolic input. For an access violation we hope to see that the pointer dereferenced is symbolically "(0xwhatever + INPUT_3c)" or some other symbolic expression, showing us what in our input caused the crash.

This information is useful for root causing the crash (we will see a couple cool tricks for working with this information in the next section). We gather this symbolic info so we can take the constraints that kept us on the crashing path, along with our own constraints, and send those to a solver. Using the solver we can ask "What input would make this pointer be X instead?" This lets us quickly tell a Write-What-Where apart from a Read8-AroundHere, or a Write-That-ThereGiveOrTake100. We can break our symbolic debugger at any point in a trace and use the solver to answer our questions.

Stopping at a breakpoint, and seeing what input would make RBX equal 0x12340000 here
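
The query behind a screenshot like that looks roughly like the following: take the AST of the register in question, conjoin it with the path predicate, and hand the result to the solver. The function below is a sketch, not a copy of the triage.py code.

def solve_register_value(ctx, reg, wanted):
    # ask the solver: which input bytes would make `reg` equal `wanted` at
    # this point in the trace, while still staying on the path we traced?
    astctx = ctx.getAstContext()
    cond = astctx.land([
        ctx.getPathPredicate(),
        astctx.equal(ctx.getRegisterAst(reg), astctx.bv(wanted, reg.getBitSize())),
    ])
    model = ctx.getModel(cond)
    if not model:
        print("unsat: no input reaches that value on this path")
        return
    for _, solution in sorted(model.items()):
        var = solution.getVariable()
        print(f"{var.getAlias() or var.getName()} = {solution.getValue():#x}")

# e.g. at a breakpoint: solve_register_value(ctx, ctx.registers.rbx, 0x12340000)

A sketch of asking the solver for an input that forces a register to a chosen value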

Note: I should point out that it isn't strictly necessary to use a debugger at all. We could just load procmon64.exe and its libraries into our symbolic execution engine, and then emulate the instructions without a debugger's help. If you look at the great examples in the Triton repo, you will notice that none of them step along with a debugger. I like using a symbolic execution engine alongside a debugger for a couple of reasons. I’ll highlight a few of those reasons in the following paragraphs.

The main reason is probably to avoid gaslighting myself. With a debugger or a concrete execution trace I have a ground truth I can follow along with. Without that it is easy to make a mistake when setting up our execution environment and not realize until much later. Things like improperly loading libraries, handling relocations, or setting up the TEB and PEB on Windows. By using a debugger, we can just set up our execution environment by pulling in chunks of memory from the actual process. We can also load the memory on demand, so we can save time on very large processes. In our example we load the memory lazily with Triton's GET/SET_CONCRETE_MEMORY_VALUE callbacks.

def tri_init(handle, onlyonsym=False, memarray=False):
    # do the base initialization of a TritonContext

    ctx = TritonContext(ARCH.X86_64)
    ctx.setMode(MODE.ONLY_ON_SYMBOLIZED, onlyonsym)
    if memarray:
        ctx.setMode(MODE.MEMORY_ARRAY, True)
    else:
        ctx.setMode(MODE.ALIGNED_MEMORY, True)
    ctx.setMode(MODE.AST_OPTIMIZATIONS, True)

    # set lazy memory loading
    def getmemcb(ctx, ma):
        addr = ma.getAddress()
        sz = ma.getSize()
        # will only load pages that have not been previously loaded
        tri_load_dbg_mem(ctx, handle, addr, sz, False)

    def setmemcb(ctx, ma, val):
        addr = ma.getAddress()
        sz = ma.getSize()
        # will only load pages that have not been previously loaded
        tri_load_dbg_mem(ctx, handle, addr, sz, True)

    ctx.addCallback(CALLBACK.GET_CONCRETE_MEMORY_VALUE, getmemcb)
    ctx.addCallback(CALLBACK.SET_CONCRETE_MEMORY_VALUE, setmemcb)

    return ctx

Setting up Triton in triage.py

The debugger also lets us handle instructions that are unknown to our symbolic execution engine. For example, Triton does not have a definition for the 'rdrand' instruction. By single stepping alongside our debugger, we can simply fix up any changed registers when we encounter unknown instructions. This could lead to a loss of symbolic information if the instruction is doing something with our symbolic inputs, but for the most part we can get away with just ignoring these instructions.
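
A sketch of what that fallback can look like in the single-step loop, assuming placeholder debugger helpers dbg_single_step and dbg_get_regs:

def step_unknown(ctx, handle, tid, inst):
    # handle instructions Triton can't lift (rdrand here; add others as you
    # hit them): let the debugger single-step the real process and copy the
    # resulting registers back into the Triton context
    ctx.disassembly(inst)
    if inst.getDisassembly().split()[0] == "rdrand":
        dbg_single_step(handle, tid)
        for name, val in dbg_get_regs(handle, tid).items():
            reg = getattr(ctx.registers, name, None)
            if reg is not None:
                # note: any symbolic info this instruction would have produced is lost
                ctx.setConcreteRegisterValue(reg, val)
        return True   # handled outside of Triton
    ctx.processing(inst)
    return False

A sketch of falling back to the debugger for instructions Triton cannot lift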

Lastly, using our debugger gives us another really nice benefit; we can just skip over whole swaths of irrelevant code! We have to be very careful with what we mark as irrelevant, because getting it wrong can mean we lose a bunch of symbolic information. With procmon, I marked most of the drawing code as irrelevant. When hitting one of these imports from user32 or gdi32, we place a breakpoint and let the debugger step over those calls, then resume single stepping with Triton. This saves a ton of time, as symbolic execution is orders of magnitude slower than actual execution. Any irrelevant code we can step over can make a huge difference.

Without a debugger we can still do this, but it usually involves writing hooks that will handle any important return values or side effects from those calls, instead of just bypassing them with our debugger. Building profiling into our tooling can help us identify those areas of concern, and adjust our tooling to gain back some of that speed.

    # skip drawing code
    if skip_imports:
        impfuncs = dbg_get_imports_from(handle, base, ["user32.dll", "gdi32.dll", "comdlg32.dll", "comctl32.dll"])
        for name in impfuncs:
            addr = impfuncs[name]

            # don't skip a few user32 ones
            skip = True
            for ds in ["PostMessage", "DefWindowProc", "PostQuitMessage", "GetMessagePos", "PeekMessage", "DispatchMessage", "GetMessage", "TranslateMessage", "SendMessage", "CallWindowProc", "CallNextHook"]:
                if ds.lower() in name.lower():
                    skip = False
                    break

            if skip:
                hooks[addr] = (skipfunc_hook, name)

Skipping unneeded imports in triage.py

For our target, skipping imports wasn't enough. We were still spending lots of time in loops inside the procmon binary. A quick look confirmed that these were a statically included memset and memcpy. We can't just skip over memcpy because we will lose the symbolic information being copied. So for these two, we wrote a hook that would handle the operation symbolically in our python, without having to emulate each instruction. We made sure that copied bytes got a copy of the symbolic expression in the source data.

    for i in range(size):
        sa = MemoryAccess(src + i, 1)
        da = MemoryAccess(dst + i, 1)
        cell = ctx.getMemoryAst(sa)
        expr = ctx.newSymbolicExpression(cell, "memcpy byte")
        ctx.assignSymbolicExpressionToMemory(expr, da)

Transferring symbolic information in our memcpy hook

These kinds of hooks not only save us time, but they are a great opportunity to check the symbolic arguments going into the memcpy or memset. Even if the current trace is not going to crash inside of the memcpy, we have the ability to look at those symbolic arguments and ask "Could this memcpy reach unmapped memory?" or "Could the size argument be unreasonably large?". This can help us find other vulnerabilities, or other expressions of the issues we are already tracing. Below is a small check that tries to see if the end of a memcpy's destination could land some large distance away.

        astctx = ctx.getAstContext()
        cond = ctx.getPathPredicate()
        # dst + size
        dstendast = ctx.getRegisterAst(ctx.registers.rcx) + ctx.getRegisterAst(ctx.registers.r8)
        # concrete value of the dst + size
        dstendcon = dst + size
        testpast = 0x414141
        cond = astctx.land([cond, (dstendcon + testpast) <= dstendast])

        log("hook", 5, "Trying to solve for a big memcpy")
        model, status, _ = ctx.getModel(cond, True)
        if status == SOLVER_STATE.SAT:
            # can go that far
            # this may not be the cause of our crash though, so let's just report it, not raise it
            log("crash", 2, "Symbolic memcpy could go really far!")

A simple check in our memcpy hook from triage.py

The tradeoff of these checks is that invoking the solver often can add to our runtime, and you probably don't want them enabled all the time.

At this point we have most of what we need to run through our crashing cases and start de-duplicating and root-causing. However, some of our access violations were still saying that the bad dereference did not depend on our input. This didn't make sense to me, so I suspected we were losing symbolic information along the way somehow. Sometimes this can happen due to concretization of pointers, so turning on Triton's new MEMORY_ARRAY mode can help us recover that information (at the cost of a lot of speed).

In this case, however, I had my tooling print out all the imported functions being called along the trace. I wanted to see if any of the system calls on the path were causing a loss of symbolic information, or if there was a call that re-introduced the input without it being symbolized. I found that there was a second call to MapViewOfFile that was remapping our input file into memory at a different location. With a hook added to symbolize the remapped input, all our crashes were now reporting their symbolic relation to the input correctly!
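
The hook itself boils down to pointing the bytes of the new mapping at the input variables we already created, so expressions keep referring to the same INPUT_n symbols. A sketch of that idea follows; the bookkeeping of which variable belongs to which file offset is up to your tooling.

from triton import MemoryAccess

def remap_hook(ctx, new_base, input_vars):
    # input_vars: the SymbolicVariables created when the input was first
    # symbolized, keyed by file offset (illustrative bookkeeping)
    astctx = ctx.getAstContext()
    for offset, var in input_vars.items():
        # tie each byte of the new mapping to the existing INPUT_n variable,
        # so the remapped copy keeps its relation to the original input
        cell = astctx.variable(var)
        expr = ctx.newSymbolicExpression(cell, "remapped input byte")
        ctx.assignSymbolicExpressionToMemory(expr, MemoryAccess(new_base + offset, 1))

A sketch of the idea behind the remapped-input hook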

Our tooling showing an AST for one of the bad dereferences

Using our Symbolic Debugger

Cool! Now we have symbolic information for our crashes. What do we do with it?

Well first, we can quickly group our crashes by what input they depend on. This is quite helpful; even though some issues can lead to crashes in multiple locations, we can still group them together by what exact input is lacking bounds checks. This can help us understand a bug better, and also see different ways the bug can interact with the system.

Grouping our crashes this way, our roughly 200 crashes boiled down to four distinct groups: three controlled pointers being read from, and one call to __fastfail.
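
The grouping itself can be as simple as collecting which INPUT_xx variables appear in the faulting expression and keying crashes on that set. Roughly, using Triton's AST API (the AST here would come from the bad dereference, as in the back-slicing helper below):

from triton import AST_NODE

def input_key(ctx, deref_ast):
    # collect which INPUT_xx variables the bad dereference depends on;
    # crashes that depend on the same set of input bytes get grouped together
    astctx = ctx.getAstContext()
    names = set()
    for node in astctx.search(astctx.unroll(deref_ast), AST_NODE.VARIABLE):
        var = node.getSymbolicVariable()
        names.add(var.getAlias() or var.getName())
    return tuple(sorted(names))

# e.g. groups.setdefault(input_key(ctx, lea_ast), []).append(crash_path)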

One neat tool Triton gives us is backward slicing! Because Triton can keep a reference to the associated instruction when building its symbolic expressions, we can generate an instruction trace that only contains the instructions relevant to our final expression. I used this to cut out most code along the trace as irrelevant, and walk just the pieces of code between the input and the crash that mattered. Below we gather the relevant instructions that created the bad pointer dereferenced in one of our crashes.

def backslice_expr(ctx, symbexp, print_expr=True):
    # sort by refId to put things temporal
    # to get a symbolic expression from a load access, do something like:
    # symbexp = inst.getLoadAccess()[0][0].getLeaAst().getSymbolicExpression()
    items = sorted(ctx.sliceExpressions(symbexp).items(), key=lambda x: x[0])
    for _, expr in items:
        if print_expr:
            print(expr)
        da = expr.getDisassembly()
        if len(da) > 0:
            print("\t" if print_expr else "", da)

A back-slicing helper in triage.py

Back-slicing with the above code

Being able to drop into the IPython REPL at any point of the trace and see the program state in terms of my input is very helpful during my RE process.

For the call to __fastfail (kinda like an abort for Windows), we don't have a bad dereference to back-slice; instead we have the path constraints our engine gathered. These constraints are grabbed any time the engine sees that we could symbolically go either way at a junction. To stay tied to our concrete path, the engine notes down the condition required for our path. For example: if we take a jne branch after having compared the INPUT_5 byte against 0, the engine will add a path constraint saying "If you want to stay on the path we took, make sure INPUT_5 is not 0", or "(not (= INPUT_5 (_ bv0 8)))" in AST-speak.
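
Triton keeps these constraints on the context, so "look at the most recent one" is just a walk backwards over that list. A sketch using Triton's PathConstraint API:

def last_symbolic_fork(ctx):
    # walk the gathered path constraints from newest to oldest and show where
    # the path last forked on a condition derived from our input
    for pc in reversed(ctx.getPathConstraints()):
        for branch in pc.getBranchConstraints():
            if branch['isTaken']:
                print(f"fork at {branch['srcAddr']:#x} -> {branch['dstAddr']:#x}")
                print(branch['constraint'])
                return pc

A sketch of pulling the most recent path constraint from the context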

These path constraints are super useful. We can use them to generate other inputs that would go down unexplored paths. There are lots of nice symbolic execution tools that use this to help a fuzzer by generating interesting new inputs. (SymCC, KLEE, Driller, to name three)

In our case, we can inspect them to find out why we ended up at the __fastfail. By just looking at the most recent path constraint, we can see where our path forked off most recently due to our input.

The most recent path constraint leading to the __fastfail

The disassembly around the most recent path constraint

The path constraint tells us that the conditional jump at 0x7FF7A3F43517 in the above disassembly is where our path last forked due to one of our input values. When we follow the trace after this fork, we can see that it always leads directly to our fatal condition. To get more information on why the compare before the fork failed, I dropped into an IPython shell at that junction. Our tooling makes it easy to determine the control we have over the pointer in RCX being dereferenced before the branch. That makes this another crash due to a controlled pointer read.

Showing the AST for the dereferenced register before the branch

Where To Go From Here

So from here we have a pretty good understanding of why these issues exist in procmon64.exe. Digging a little deeper into the crashes shows that they are probably not useful for crafting a malicious PML file. If I wanted to keep going down this road, my next steps would include:

  • Generating interesting test cases for our fuzzer based off the known unchecked areas in the input

  • Identifying juicy looking functions in our exploit path. With our tooling we can gather information on what control we have in those functions. With this information we can start to path explore or generate fuzz cases that follow our intuition about what areas look interesting.

  • Patching out the uninteresting crash locations, and letting our fuzzer find better paths without being stopped by the low-hanging fruit.

  • Generating a really small crashing input for fun.

The official policy of Microsoft is "All Sysinternals tools are offered 'as is' with no official Microsoft support." We were unable to find a suitable place to report these issues. If anyone with ties to the Sysinternals Suite wants more information, please contact us.

Hope this Helped! Come Take the Course!

I hope this post helped you see useful ways in which creative symbolic execution tooling could help your workflow! If anyone has questions or wants to talk about it, you can message me @jordan9001 or at jordan.whitehead@atredis.com.

If you got all the way to the end of this post, you would probably like our course!

"Practical Symbolic Execution for VR and RE" is hands-on. I had a lot of fun making it, and we look at a variety of ways to apply these concepts. Students spend time becoming comfortable with a few frameworks, deobfuscating binaries, detecting time-of-check time-of-use bugs, and other interesting stuff.

You can give us your information below, and we will email you when we next offer a public course. (No spam or anything else, I promise.)

If you have a group that would be interested in receiving this kind of training privately, feel free to contact us about that as well! I’d like to see you in a class sometime!

Thanks!

Part 1: Ransomware – To Pay or Not to Pay

The consultants here at Atredis Partners have delivered a lot of Incident Response table-top exercises over the years and personally, I learn something new nearly every time. Sure, the basic premise stays the same, but every client / organization is different, not only because of the idiosyncrasies of their industry verticals and unique business requirements, but also because their employees bring their own personal experiences and perspectives to the table.

Given the prevalence of ransomware attacks over the last few years, with what seems to be no slowing down, many clients are reaching out to us seeking focused ransomware incident response table-top exercises to better understand their ability to detect, respond, and manage a ransomware incident. In this blog, the first of three focused on ransomware, we will address one of the key questions that many organizations are not thinking about, or at least maybe not considering deeply enough. 

While some organizations realize the very real threat that ransomware attacks pose to their operations and are asking good questions to help understand their preparedness, we have found that the questions being asked are usually falling short of the questions that really need to be asked…the challenging questions and maybe the questions people just are not thinking about. The typical questions most organizations want to be addressed are:

  • What are we doing to prevent or limit our exposure to ransomware attacks?

  • Can we detect a ransomware attack?

  • How quickly could we respond to a ransomware attack?

  • How would we mitigate an attack?

  • Can we recover from an attack?

Occasionally, this question comes up:

  • Would we pay an attacker if we determined that we could not contain or recover from an attack that is having a major impact on our business operations?

That last question regarding paying an attacker is one that is not being asked enough because it is a complex question to answer. But even with that, we are only scratching the surface of what businesses should be asking as it relates to a ransomware attack and the potential decision about making a ransom payment to attackers.

A common line of thinking (even recommended by some sources, including the FBI – see link) is that “an organization should never pay the ransom” because paying only elicits more criminal behavior, and is somehow seen as taking the easy way out, making it worse for others down the line.

That sounds good in an academic sense; however, the real world is much different, and who are we to tell the CEO of a company facing a potentially financially devastating outage from an ongoing ransomware attack that the ransom should not be paid? Imagine scenarios where an organization is not able to provide critical services such as emergency patient care because of a ransomware attack. Should that organization be faced with the decision of patient harm or lives lost due to a hard stance on never paying the ransom?  Like much of what we provide guidance on as Risk and Advisory Consultants, making this type of decision is a risk-based one that only an organization’s leadership can make.

Our job is to help make sure it is a well-informed decision made by the right people within the organization based on the business mission and risk tolerance, and not a decision made by public opinion.

Even as some organizations are starting to consider how to answer the tough questions about paying (or not paying) a ransom and what leads to that decision, it still may not be enough to be fully prepared.

Let’s say that you have talked with your leadership about key decision factors and as to whether your organization would pay attackers in the event of a ransomware incident. Your leadership has decided that if certain criteria were met, they would, in fact, make the difficult decision to pay attackers a ransom to restore business operations impacted by the attack. You have documented processes, procedures, and workflow Visios to drive that decision.

This is a great start, but it is only the beginning. Once an organization has made the decision to pay a ransom, there are many other actions that need to be executed after that decision that must be considered well in advance.

The first thing the business needs to consider is – if it is going to pay a ransom, how is the ransom actually paid? You might think this is easy… just pay the attackers some Bitcoin, right? Well, probably yes, but it’s not that simple. Making a Bitcoin payment, or any cryptocurrency payment, is not as easy as buying your favorite things from Amazon. Although many attackers request Bitcoin as a ransom payment, some may ask for other types of cryptocurrencies.

Another key question to consider is: does your organization want to make the payment on its own, or will it want to leverage an outside firm that specializes in this type of service? We typically advise utilizing an experienced outside firm for these reasons:

  • The organization’s cyber liability insurance may require it.

  • Many of the above considerations are managed by the firm’s experts.

  • Additionally, the expertise of an outside firm to handle things like negotiations and data recovery is invaluable.

  • There are plenty of reputable firms that specialize in helping organizations navigate critical ransomware payment activities, and they are generally much better suited to manage these activities on your behalf.

While we recommend leveraging an outside firm, there may be cases when managing the payment in-house is the right option for your organization. If the decision is made to try and make the payment without assistance from outside experts, then there are other questions that need to be considered well ahead of time, so all necessary preparations have been made:

1.     How does an organization obtain cryptocurrency?

a. You will need to establish a crypto wallet through an established service. There are several to choose from and many considerations needed to select the one that will meet your needs.

b. A critical component to keep in mind is that once you establish a crypto wallet, it will take 3-5 days to exchange your traditional currency into cryptocurrency.

c.  Other less desirable options include using cryptocurrency ATMs, but due to certain limitations, this will likely not meet your needs in the scenarios we are evaluating here.

2.     How much cryptocurrency is typically needed? 

a. This will be different based on risk tolerance and will require research to determine the right amount to maintain in a crypto wallet.

b. Remember that any cryptocurrency obtained will be subject to the ebbs and flows of the market, so you are essentially gambling that your funds will remain and hopefully not disappear.

3.     How does an organization manage the wallet/cryptocurrency?

a. Not to be forgotten here is considering who within the organization is going to manage the wallet and cryptocurrency. This could be a significant amount of money.

b. It needs to be managed responsibly, and likely under the control of more than one individual.

4.     Should an organization negotiate with the attackers before making a ransom payment?

a.  This may or may not be feasible, but in either case, negotiation planning and terms should be outlined well in advance.

b. The organization would need to research the legalities and determine how and when to notify the FBI, etc.  

5.     How does an organization execute a ransom payment? 

a. This is not as simple as it seems. There are many things to consider at the tactical level to execute a payment.

i. What accounts or email addresses do we use to make the payment transaction?

ii. Which device do we make the payment transaction from?

b. Should that device be internal or external to our network?

i. Do we need to install and/or use a TOR browser (or similar) for making the payment?

6.     Once an organization makes the payment, how do they ensure that decryption tools are provided in return? 

a. Once payment is made, there is still work to be done to recover.

b. Once the decryption keys/tools are provided, the organization will need to recover systems, and consider all the caveats that go along with recovery.

At a minimum, organizations should at least start asking these questions and thoughtfully making decisions well in advance of an actual ransomware attack.

Anyone who has managed a challenging incident response scenario of any kind knows that the time to make critical decisions such as these is not during a stressful real time incident. In our next blog in this series, we’ll dive into what it means to be “ready” for a Ransomware event… and not just “ready”, but “REALLY ready”.


This blog post was written by Bill Carver with support from Kiston Finney and Taryn Yager, then edited for the web by Lacey Kasten at Atredis Partners.

This post is Part 1 of a series crafted by the Risk Advisory consultants at Atredis Partners. As the other parts are published, we will update this post with relevant links to the other parts of the series.

Researching Crestron WinCE Devices

The setup

In the past, I’ve come across Crestron devices on corporate networks. They are enterprise appliances for handling and automating audio/visual data and peripherals. Think conferencing or display automation. With default or weak credentials, the appliances can be accessed over several ports including FTP, Telnet, SSH, and others.

I acquired several current-gen Crestron 3-series devices through an auction at my local university. They were upgrading their gear and getting rid of what they had that was out of warranty, even if it was still perfectly good.

I soon realized, though, that they are little more than sophisticated input switchers if you don’t also have Crestron Toolbox. This limitation in the enterprise-grade Crestron devices is a common problem. I started searching on how to actually use the devices after getting them home and quickly found Reddit and other posts lamenting the same issue. The devices were expensive manual switchers without Crestron Toolbox.

Crestron Toolbox and its ancillary Simpl framework and IDE enable Crestron administrators to develop applications to automate not only Crestron devices, but audio/visual devices by other manufacturers as well. Crestron Toolbox and Simpl installers are very hard to get a hold of if you are not a Crestron vendor, and they are held closely under NDA.

The Simpl IDE created by Crestron allows an administrator to develop applications visually without code for particular Crestron devices. If you are feeling brave though, you can also begin writing modules of re-usable functions in a language called Simpl+. Simpl+ gets converted into C# and compiled into a sandbox. This sandbox is effectively a set of Crestron-controlled methods exposed to Simpl+ that mediate file system and other access.

Simpl applications are signed using a special Crestron certificate. This prevents an arbitrary application from being loaded onto the devices. The code must have been compiled by the Simpl IDE in order to be run on the devices.

The research

By default, command prompt access to Crestron devices is sandboxed. Even as an administrator, you do not have full file system access. My 3-Series devices were running Windows CE 7.0 with .NET Compact Framework 3.5. With authentication disabled, the default credentials are crestron:<blank>. I’m not much of a hardware person so I made no attempt to open up the devices. I wanted to do purely remote research. I started with nmap.

After logging into Telnet and SSH and looking around the exposed file system (while also navigating Crestron PDFs), I decided to focus on the Simpl applications. Simpl applications have an lpz extension and are just zip files.

Attempting to replace specific executables within the lpz file didn’t work, but I’m not sure this is still a dead end. I attempted this testing before I fully understood the CPU architecture and executable requirements of Windows CE. It may still be possible to make a quick swap and achieve a similar sandbox breakout. In the end, the code you compile from Simpl ends up as a signed .NET DLL in the lpz application, loaded by a signed harness application.

The sandbox is enforced at compile-time, not while running on the device. This may seem counter-intuitive, but it effectively means the functions exposed to Simpl+ at compile-time are controlled by Crestron, not the .NET Framework. Listing directories in Simpl will show you the directories Crestron wants to show you, not what is actually there. In order to sign my own C# code, I started investigating how Simpl went from Simpl+ code -> C# -> compiled DLL -> signed DLL.

During compilation, Crestron Toolbox creates a working directory into which it outputs the new C# before compiling and signing it. I used ProcMon to monitor what processes were spawned during the compilation process and saw that csc.exe is called directly on the C# code in the working directory. I ended up writing a small wrapper for csc.exe that would copy my C# code (altered from a copy of the real generated C#) over the generated C# before passing the args along to the real csc.exe to compile the tampered-with code. It worked, and I had a signed lpz file with my own C# code that would run outside of the intended application sandbox.
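
The wrapper itself is conceptually tiny. The sketch below shows the idea in Python with made-up paths; in practice you would build something equivalent into an executable that sits where the toolchain expects csc.exe, with the real compiler renamed next to it.

import shutil
import subprocess
import sys

# stand-in paths: the renamed original compiler and our replacement C# source
REAL_CSC = r"C:\path\to\csc_real.exe"
PAYLOAD = r"C:\research\payload.cs"

def main():
    # overwrite the generated C# source named in the arguments with our own,
    # then hand the untouched argument list to the real csc.exe
    for arg in sys.argv[1:]:
        if arg.lower().endswith(".cs"):
            shutil.copy(PAYLOAD, arg)
    sys.exit(subprocess.call([REAL_CSC] + sys.argv[1:]))

if __name__ == "__main__":
    main()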

There may be more issues with signing in general, but I stopped looking once I could sign my own code outside of the sandbox.

The results

There is no real usable command shell on these devices outside of the sandbox shell you log in to. I had arbitrary code execution, but writing new code and deploying a new application is more than tedious when you want to change something slightly. Unfortunately, there don’t seem to be any cookie-cutter WinCE Metasploit payloads.

I ended up writing a small connect-back shell with basic functionality. This shell is uploaded as a signed application on GitHub, along with the source code. You can load this application without Crestron Toolbox; you only need remote authenticated access. You can immediately see the difference in the file system exposed to the connect-back shell versus what you see over SSH or Telnet.

The potential

Many Crestron devices of this nature have two interfaces, one for LAN and one for Control. If configured with both a LAN and Control port, they could make excellent pivot points. These are also Windows machines, but aren’t running any host-based IDS or AV. They would be a great location for persistence.