As promised in my somewhat lengthy “Hello World” post, although much later than intended, I finally got down to writing a follow-up post.
I remember one of the very first lectures during my uni course – when assembly
language was being introduced. Nothing too deep, really. However, I recall a statement being made, that probably still lives in the minds of the many that heard it (and perhaps even more that didn’t). Namely – that assembly
code is not portable.
In this post we’re going to take a look at how misleading that statement is and explore writing polyglot assembly code.
On portability
First of all let’s take a look at the words portability and polyglot themselves. According to Wikipedia, portability
is the usability of the same software in different environments
whereas polyglot code
is a computer program written in a valid form of multiple programming languages, which performs the same operations or output independent of the programming language used to compile or interpret it
Knowing that, let us all agree – for the sake of this article – that the following is not a far-fetched statement: Polyglot code is a subset of portable code. While polyglot code might be used for different reasons, polyglot machine code most likely won’t.
So, what’s the fuss?
I bet almost everyone (but the developers maybe) appreciates portable software. No matter the architecture, one can just compile what they need and use it. But what if there’s a better way?
Obviously, I left a (blunt) hint in the previous section! Polyglot code is the better way to portability (at least from the end user’s perspective). Have you ever been under the impression that compiling again is not necessary? Well, I have. What if I could just compile my code once and never have to worry about doing it again?
That’s the whole idea behind polyglot machine code: write once – run everywhere.
But before we continue with writing our little polyglot, let’s examine whether we need it at all.
The Code
For our tests we’re going to use the following program, which just reads machine code from a file given as its argument and executes it.
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#define MAX_SIZE 2048
char buf[MAX_SIZE];
int main(int argc, char** argv) {
char* filename = argv[1];
int f = open(filename, O_RDONLY);
int size = read(f, buf, MAX_SIZE);
size_t pagesize = getpagesize();
size_t start = ((size_t) buf) & -pagesize;
mprotect((void*) start, MAX_SIZE, PROT_WRITE | PROT_EXEC);
size_t bits = sizeof(void*) * CHAR_BIT;
printf("Compiled for x86-%d\n", bits);
printf("Loaded %d bytes of code\n", size);
void (*fun)(void) = (void (*)(void)) buf;
fun();
return 0;
}
Now, let’s write a piece of assembly that will generate our machine code.
One thing to note before that: as I’m writing this post, I’m sitting on a Linux machine, hence the ABI is different from what you could see in the previous post – for now you don’t have to worry about it, I’m going to cover this in the following post.
BITS 32
mov eax, 4
push word 0x0a74
push dword 0x69623233
mov ebx, 1
mov ecx, esp
mov edx, 6
int 0x80
mov eax, 1
mov ebx, 123
int 0x80
Since we only want to translate our assembly code into machine code there are no sections and global symbols, instead we’re going to turn it into raw bytes using
mewa@sea$ nasm poly32.asm -o poly32 && hexdump -C poly32 && echo --- hex end && ndisasm -b32 poly32
00000000 b8 04 00 00 00 66 68 74 0a 68 33 32 62 69 bb 01 |.....fht.h32bi..|
00000010 00 00 00 89 e1 ba 06 00 00 00 cd 80 b8 01 00 00 |................|
00000020 00 bb 7b 00 00 00 cd 80 |..{.....|
00000028
--- hex end
00000000 B804000000 mov eax,0x4
00000005 6668740A push word 0xa74
00000009 6833326269 push dword 0x69623233
0000000E BB01000000 mov ebx,0x1
00000013 89E1 mov ecx,esp
00000015 BA06000000 mov edx,0x6
0000001A CD80 int 0x80
0000001C B801000000 mov eax,0x1
00000021 BB7B000000 mov ebx,0x7b
00000026 CD80 int 0x80
Let’s compile our tester program for 32 bit x86 architecture and check the result
mewa@sea$ gcc -m32 tester.c -o tester
mewa@sea$ ./tester poly32; echo $?
Compiled for x86-32
Loaded 40 bytes of code
32bit
123
As expected, it worked. Now let’s try it again for x86-64
mewa@sea$ gcc -m64 tester.c -o tester
mewa@sea$ ./tester poly32; echo $?
Compiled for x86-64
Loaded 40 bytes of code
123
Ok, so we got the exit code at least. But where is the string that was supposed to be printed? Let’s analyze what is happening.
- we are running a 64 bit program
- we are then injecting 32 bit machine code into it
- the
exit
syscall is working, since we got the123
status code - the
write
syscall is somehow failing
mewa@sea$ gdb ./tester
(gdb) disas main
...
0x0000000000000815 <+171>:lea 0xbd(%rip),%rdi # 0x8d9
0x000000000000081c <+178>:mov $0x0,%eax
0x0000000000000821 <+183>:callq 0x600 <printf@plt>
0x0000000000000826 <+188>:lea 0x200853(%rip),%rax # 0x201080 <buf>
0x000000000000082d <+195>:mov %rax,-0x8(%rbp)
0x0000000000000831 <+199>:mov -0x8(%rbp),%rax
0x0000000000000835 <+203>:callq *%rax
0x0000000000000837 <+205>:mov $0x0,%eax
...
(gdb) break *main+203
(gdb) disp/4i $rip
(gdb) r poly32
1: x/4i $rip
=> 0x555555554835 <main+203>:callq *%rax
0x555555554837 <main+205>:mov $0x0,%eax
0x55555555483c <main+210>:leaveq
0x55555555483d <main+211>:retq
(gdb) si
1: x/4i $rip
=> 0x555555755080 <buf>:mov $0x4,%eax
0x555555755085 <buf+5>:pushw $0xa74
0x555555755089 <buf+9>:pushq $0x69623233
0x55555575508e <buf+14>:mov $0x1,%ebx
(gdb) si 4
=> 0x555555755093 <buf+19>:mov %esp,%ecx
0x555555755095 <buf+21>:mov $0x6,%edx
0x55555575509a <buf+26>:int $0x80 <-- write syscall
0x55555575509c <buf+28>:mov $0x1,%eax
# let's examine the return code
(gdb) disp $eax
2: $eax = 4
(gdb) si 3
1: x/4i $rip
=> 0x55555575509c <buf+28>:mov $0x1,%eax
0x5555557550a1 <buf+33>:mov $0x7b,%ebx
0x5555557550a6 <buf+38>:int $0x80
0x5555557550a8 <buf+40>:add %al,(%rax)
2: $eax = -14 <-- we got an error! errno == 14
Aha! According to the headers it’s EFAULT
, meaning something is wrong with address of the string passed.
#define EFAULT 14 /* Bad address */
(gdb) info reg esp
esp 0xffffdf0e -8434
(gdb) x/s $esp
0xffffffffffffdf0e: <error: Cannot access memory at address 0xffffffffffffdf0e>
Let’s see. What is the stack pointer register pointing to? The stack obviously. It’s an address, which is 64 bits wide, due to our program having been compiled for x86-64. And esp
is a 32 bit register. That’s why the address pointed to by the lower half of the 64 bit rsp
, which holds the stack pointer in 64 bit programs, is not within our process’ address space and hence invalid!
On a side note – this is obviously a very lucky case, since we hit the backwards compatibility mode and our int 0x80
syscalls were actually called, had it not existed it would’ve result in a total disaster (a.k.a a crash).
Anyway, now that we know it could be useful to distinguish between our runtime architecture let’s get to the whole point of this article.
The polyglot
In order to proceed we’ll need 64 bit specific machine code
BITS 64
mov rax, 1
sub rsp, 0x06
mov word [rsp+4], 0x0a74
mov dword [rsp], 0x69623436
mov rdi, 1
mov rsi, rsp
mov rdx, 6
syscall
mov rax, 60
mov rdi, 123
syscall
Let’s see if it works
mewa@sea$ ./tester poly64; echo $?
Compiled for x86-64
Loaded 49 bytes of code
64bit
123
Now we have 2 separate machine codes and we need run them correspondingly.
In order to detect what architecture we are currently running on, we have to extract the difference in information available from running the same piece of machine code based on the architecture. This implies that such piece of code has to be valid for both x86-32 and x86-64 but at the same time give different results. Sounds tricky? It certainly is if you’re not familiar with the opcodes and their extensions.
But first things first – I haven’t mentioned what machine code reallly is. In order to give you a better view let’s look closely what a CPU is and what it is not. First of all, it’s not magic! It’s entirely possible, although not the most convenient way imaginable, to write a program using nothing but raw sequences of bytes. That’s because a CPU is just an electronic circuit that acts upon instructions given at its input. Based on what instruction is given it performs some computations, such as addition, decrementation, etc. While humans usually talk about instructions in general terms such as “addition”, CPUs are given numbers which describe (encode the information) what action to perform, e.g. for a x86-32 machine inc eax
would be translated to byte 0x40
, whereas inc ebx
to 0x41
. When the CPU encounters 0x40
at its input it will perform incrementation of the eax
register. Such numbers are called opcodes and machine code is just a sequence of opcodes.
Luckily enough if you want to look for a specific opcode you don’t have to dig through AMD’s or Intel’s manuals – there already exist dedicated resources that have this information extracted. If you’re curious you can look here for more information on this subject.
To distinguish between the x86-32 and x86-64 architectures I’m going to use the exact 0x40
opcode. On x86-32, as mentioned above, it simply encodes an incrementation instruction. But on x86-64 it describes a REX prefix, so an opcode that is used to change the behaviour of an instruction. REX prefixes were only introduced on the x86-64 architecture and we’re going to exploit that fact.
We’re going to deliberately place a REX prefix that will change the way our program behaves.
BITS 32
xor eax, eax
db 0x40
mov bh, 1
test eax, eax
jz near x64
x86:
nop ; x86-86 relevant code
x64:
nop ; x86-64 relevant code
Let’s use ndisasm
to discover how our code would be interpreted.
mewa@sea:polyglot$ nasm stub.asm -o stub && echo --- 32 bit && ndisasm -b32 stub && echo --- 64 bit && ndisasm -b64 stub
--- 32 bit
00000000 31C0 xor eax,eax
00000002 40 inc eax
00000003 B701 mov bh,0x1
00000005 85C0 test eax,eax
00000007 0F8401000000 jz near 0xe
0000000D 90 nop
0000000E 90 nop
--- 64 bit
00000000 31C0 xor eax,eax
00000002 40B701 mov dil,0x1
00000005 85C0 test eax,eax
00000007 0F8401000000 jz near 0xe
0000000D 90 nop
0000000E 90 nop
As you can see, even though we have the same sequence of bytes, they will act differently. The 32 bit version will increment the eax
register, while the 64 bit version won’t. We’re then testing against eax and jump to 64 bit code if it’s zero.
Let’s create our fully functional polyglot at last.
Here’s the contents of poly32.asm
BITS 32
xor eax, eax
db 0x40
mov bh, 1
test eax, eax
jz x64
mov eax, 4
push word 0x0a74
push dword 0x69623233
mov ebx, 0
mov ecx, esp
mov edx, 6
int 0x80
mov eax, 1
mov ebx, 123
int 0x80
x64:
and poly64.asm
BITS 64
mov rax, 1
sub rsp, 0x06
mov word [rsp+4], 0x0a74
mov dword [rsp], 0x69623436
xor edi, edi
mov rsi, rsp
xor edx, edx
mov rdx, 6
syscall
mov rax, 60
mov rdi, 123
syscall
Let’s combine it together
mewa@sea$ nasm poly32.asm -o poly32 && nasm poly64.asm -o poly64
mewa@sea$ cat poly32 > poly && cat poly64 >> poly
The last thing remaining on our list is to test our little polyglot and see if it works!
mewa@sea$ gcc -m32 tester.c -o tester && ./tester poly; echo $?
Compiled for x86-32
Loaded 103 bytes of code
32bit
123
mewa@sea$ gcc -m64 tester.c -o tester && ./tester poly; echo $?
Compiled for x86-64
Loaded 103 bytes of code
64bit
123
Great success!
We have successfully constructed our first polyglot!
If you want to give it a try yourself all the sources are available on my GitHub.
As promised, we’ll continue to explore the world of machine code polyglots and in the upcoming post we’re going to take a look at how to handle different operating systems and tackle with their ABIs.
I hope you enjoyed this mini article and as always – I encourage you to discuss further and leave your comments below!