ARM exploitation for IoT – Episode 1

Introduction and motivation

A few weeks ago, while attending a conference, I noticed that the ARM exploitation course for IoT offered there had quite a substantial price tag, so I decided to write my own, to let those who cannot spend that much still study the topic. I will present this course in three episodes.

These articles are certainly not comparable to a live course, but I still want to make my own small contribution.

The content will be divided as follows:
– Episode 1: Reversing ARM applications
– Episode 2: ARM shellcoding
– Episode 3: ARM exploitation

Episode 1: Reversing ARM applications

Environment: Raspberry pi 3

I chose a very cheap and easily configurable environment; Android could probably be another good option.

HARDWARE:

This is the exact model I used for tests:

  • Raspberry Pi 3 Model B ARM-Cortex-A53

SOFTWARE:

Here is some information about the software used for the three episodes:


[email protected]:/home/pi# cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 8 (jessie)"
NAME="Raspbian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

[email protected]:/home/pi# cat /etc/rpi-issue
Raspberry Pi reference 2017-03-02
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, f563e32202fad7180c9058dc3ad70bfb7c09f0fb, stage2

For the operating system installation, see the following link:

https://www.raspberrypi.org/documentation/installation/installing-images/linux.md

To configure remote access via SSH, see:

https://www.raspberrypi.org/documentation/remote-access/ssh/

COMPILER

For all the code (C, C++, assembly) we will use the GNU Compiler Collection (GCC), which is included in Raspbian.

The GCC version is:

[email protected]:/home/pi/arm/episode1# gcc --version
gcc (Raspbian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

One important thing to know is that the GNU assembler directives are different from those used by other compilers and assemblers. I suggest taking a look at them, for example here: http://www.ic.unicamp.br/~celio/mc404-2014/docs/gnu-arm-directives.pdf

Source code

All the code used in this episode can be found on my GitHub, in the following repository: https://github.com/invictus1306/ARM-episodes/tree/master/Episode1

Compiler options

Compiler options are important to know and understand. In this section we will look at some of them, with a practical example for each one.

This is the source code we will use for all the compiler option examples (file: compiler_options.c):

#include <stdio.h>
#include <string.h>

static char password[] = "compiler_options";

int main()
{
  char input_pwd[20]={0};

  fgets(input_pwd, sizeof(input_pwd), stdin);

  int size = sizeof(password);

  if(input_pwd[size] != 0)
  {
    printf("The password is not correct! \n");
    return 0;
  }

  int ret = strncmp(password, input_pwd, size-1);

  if (ret==0)
  {
    printf("Good done! \n");
  }
  else
  {
    printf("The password is not correct! \n");
  }

  return 0;
}

Debugging symbols

The -g option makes GCC produce debugging information, which is stored in the executable.
Let's compile our example (compiler_options.c) first without and then with the -g option, and compare the sizes of the two ELF files.


[email protected]:/home/pi/arm/episode1# gcc -o compiler_options compiler_options.c
[email protected]:/home/pi/arm/episode1# ls -l
total 12
-rwxr-xr-x 1 root root 6288 Jun 14 20:21 compiler_options
-rw-r--r-- 1 root root 488 Jun 14 19:41 compiler_options.c

[email protected]:/home/pi/arm/episode1# gcc -o compiler_options compiler_options.c -g
[email protected]:/home/pi/arm/episode1# ls -l
total 16
-rwxr-xr-x 1 root root 8648 Jun 14 20:21 compiler_options
-rw-r--r-- 1 root root 488 Jun 14 19:41 compiler_options.c

In the second case the file is larger (8648 bytes instead of 6288), because the debugging information has been added to the ELF file.

There are different ways to see the debugging information in the executable; this time we use readelf with the -S option (display the section headers):


[email protected]:/home/pi/arm/episode1# readelf -S compiler_options | grep debug
 [27] .debug_aranges PROGBITS 00000000 0007f2 000020 00 0 0 1
 [28] .debug_info PROGBITS 00000000 000812 000318 00 0 0 1
 [29] .debug_abbrev PROGBITS 00000000 000b2a 0000da 00 0 0 1
 [30] .debug_line PROGBITS 00000000 000c04 0000de 00 0 0 1
 [31] .debug_frame PROGBITS 00000000 000ce4 000030 00 0 0 4
 [32] .debug_str PROGBITS 00000000 000d14 000267 01 MS 0 0 1

These are the sections that contain the debugging information, stored in the DWARF format (the default used by GCC).

To see the content of these sections we can use objdump:

[email protected]:/home/pi/arm/episode1# objdump --dwarf=info ./compiler_options
…
Abbrev Number: 14 (DW_TAG_variable)
DW_AT_name : (indirect string, offset: 0x8a): password
DW_AT_decl_file : 1
DW_AT_decl_line : 4
DW_AT_type : <0x2eb>
DW_AT_location : 5 byte block: 3 70 7 2 0 (DW_OP_addr: 20770)
Abbrev Number: 16 (DW_TAG_variable)
DW_AT_name : (indirect string, offset: 0x215): stdin
DW_AT_decl_file : 5
DW_AT_decl_line : 168
DW_AT_type : <0x26b>
DW_AT_external : 1
DW_AT_declaration : 1
Abbrev Number: 0

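The .debug_info section contains information used by the debugger: for example, for the password variable it records the name, the file and line where it is declared (line 4), its type and its address (0x20770).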

Remove all symbol table and relocation information

GCC can also remove all symbol table and relocation information from the executable, with the -s option. First, let's look at the symbols of a binary compiled without it:

[email protected]:/home/pi/arm/episode1# gcc -o compiler_options compiler_options.c
[email protected]:/home/pi/arm/episode1# readelf --sym compiler_options
Symbol table '.dynsym' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
2: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
3: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
4: 00020788 4 OBJECT GLOBAL DEFAULT 24 [email protected]_2.4 (2)
5: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
6: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
7: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
Symbol table '.symtab' contains 115 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00010134 0 SECTION LOCAL DEFAULT 1
2: 00010150 0 SECTION LOCAL DEFAULT 2
...
112: 00000000 0 FUNC GLOBAL DEFAULT UND strncmp@@GLIBC_2.4
113: 00000000 0 FUNC GLOBAL DEFAULT UND abort@@GLIBC_2.4
114: 00010318 0 FUNC GLOBAL DEFAULT 11 _init

As we can see, .symtab contains many symbols (115 entries) that are not necessary to run the program, so this section can be removed:

[email protected]:/home/pi/arm/episode1# gcc -o compiler_options compiler_options.c -s
[email protected]:/home/pi/arm/episode1# readelf --sym compiler_options
Symbol table '.dynsym' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
2: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
3: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
4: 00020788 4 OBJECT GLOBAL DEFAULT 24 [email protected]_2.4 (2)
5: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
6: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)
7: 00000000 0 FUNC GLOBAL DEFAULT UND [email protected]_2.4 (2)

After compiling with -s, the .symtab section is gone: function names and other information are no longer available, and only the .dynsym symbols needed for dynamic linking remain. The life of a reverse engineer becomes a little more complicated.

In episode 3 we will see other compiler options that are important for exploit development.

ARM Hello World

We will begin by writing a simple hello world program, and we will do this in two different ways:

  1. using Raspbian syscall

  2. using libc functions

1- using Raspbian syscall

As a first step, here is a simple hello world program that uses Raspbian system calls directly (file: rasp_syscall.s):

.data
string: .asciz "Hello World!\n"
len = . - string

.text
.global _start

_start:
  mov r0, #1      @ stdout
  ldr r1, =string @ string address
  ldr r2, =len    @ string length
  mov r7, #4      @ write syscall
  swi 0           @ execute syscall

_exit:
  mov r7, #1      @ exit syscall
  swi 0           @ execute syscall
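For comparison, here is a rough C counterpart of rasp_syscall.s. It is only a sketch (the helper name raw_hello is hypothetical) that uses glibc's syscall() wrapper to issue the same write system call: the fd, buffer and length end up in r0, r1 and r2, and the syscall number in r7, which on ARM EABI Linux is 4 for write. Like `len = . - string`, sizeof counts the NUL byte added by .asciz, so 14 bytes are written, not 13.

```c
#define _GNU_SOURCE            /* exposes syscall() in <unistd.h> */
#include <sys/syscall.h>       /* SYS_write */
#include <unistd.h>            /* syscall(), pipe(), read() */

/* Like `len = . - string` in rasp_syscall.s, sizeof counts the NUL
   terminator emitted by .asciz: 14 bytes, not 13. */
static const char hello_msg[] = "Hello World!\n";

/* Hypothetical helper: issue the raw write system call, as the assembly
   does with r0 = fd, r1 = buffer, r2 = length and r7 = syscall number.
   Returns the number of bytes written, or -1 on error. */
long raw_hello(int fd)
{
    return syscall(SYS_write, fd, hello_msg, sizeof(hello_msg));
}
```

Calling raw_hello(1) prints the message on stdout. Note also that the assembly never sets r0 before the exit syscall, so the exit status is whatever write returned (14 on success), which `echo $?` should show after running ./rasp_syscall.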

Assemble and link the program

[email protected]:/home/pi/arm/episode1# as -o rasp_syscall.o rasp_syscall.s
[email protected]:/home/pi/arm/episode1# ld -o rasp_syscall rasp_syscall.o

Note:

If we try to build it with gcc instead:

[email protected]:/home/pi/arm/episode1# gcc -o rasp_syscall rasp_syscall.s
/tmp/ccChPTEP.o: In function `_start':
(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/arm-linux-gnueabihf/4.9/../../../arm-linux-gnueabihf/crt1.o:/build/glibc-g3vikB/glibc-2.19/csu/../ports/sysdeps/arm/start.S:79: first defined here
/usr/lib/gcc/arm-linux-gnueabihf/4.9/../../../arm-linux-gnueabihf/crt1.o: In function `_start':
/build/glibc-g3vikB/glibc-2.19/csu/../ports/sysdeps/arm/start.S:119: undefined reference to `main'
collect2: error: ld returned 1 exit status

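We get two errors: gcc links our object with the C runtime startup file crt1.o, which defines its own _start (clashing with ours, hence the multiple definition) and references main:

undefined reference to `main'

and there is no main function in our source program.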

We will build with gcc in the next version of the hello world program.

Execute the program

[email protected]:/home/pi/arm/episode1# ./rasp_syscall
Hello World!

Getting some information with gdb

[email protected]:/home/pi/arm/episode1# gdb -q ./rasp_syscall
Reading symbols from ./rasp_syscall...(no debugging symbols found)...done.
(gdb) info files
Symbols from "/home/pi/arm/episode1/rasp_syscall".
Local exec file:
`/home/pi/arm/episode1/rasp_syscall', file type elf32-littlearm.
Entry point: 0x10074
0x00010074 - 0x00010094 is .text
0x00020094 - 0x000200a2 is .data
(gdb) b *0x00010074
Breakpoint 1 at 0x10074
(gdb) r
Starting program: /home/pi/arm/episode1/rasp_syscall
Breakpoint 1, 0x00010074 in _start ()
(gdb) x/7i $pc
=> 0x10074 <_start>: mov r0, #1
0x10078 <_start+4>: ldr r1, [pc, #16] ; 0x10090 <_exit+8>
0x1007c <_start+8>: mov r2, #14
0x10080 <_start+12>: mov r7, #4
0x10084 <_start+16>: svc 0x00000000
0x10088 <_exit>: mov r7, #1
0x1008c <_exit+4>: svc 0x00000000

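All the instructions of our hello world program are in the .text section. Note that ldr r2, =len was assembled as mov r2, #14, since len is a constant. The instruction at 0x10078 loads into r1 the word stored at 0x10090 (in ARM state pc reads as the address of the current instruction plus 8, so 0x10078 + 8 + 16 = 0x10090); that word is the address of our string in the .data section: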

(gdb) x/14c *(int*)0x10090
0x20094: 72 'H' 101 'e' 108 'l' 108 'l' 111 'o' 32 ' ' 87 'W' 111 'o'
0x2009c: 114 'r' 108 'l' 100 'd' 33 '!' 10 '\n' 0 '\000'

2- using libc functions

This time we want to use the printf function for the hello world program. We have to make some changes to the previous program: for example, we replace the .global _start definition with .global main, plus a few other things described below (file: libc_functions.s).

.data
  string: .asciz "Hello World!\n"
.text
.global main
.func main
main:
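 stmfd sp!, {lr}    @ save lr (return address into the C runtime)
 ldr r0, =string    @ load the string address into r0 (first argument)
 bl printf          @ call printf
 ldmfd sp!, {pc}    @ pop the saved lr into pc: return from main
_exit:
 mov lr, pc         @ never reached: main has already returned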

The new definitions (.global main, .func main, main:) define and export the main symbol: when we link with gcc, the C runtime startup code (crt1.o) provides _start, which initializes libc and then calls main.

Assemble and link the program

[email protected]:/home/pi/arm/episode1# as -o libc_functions.o libc_functions.s
[email protected]:/home/pi/arm/episode1# ld -o libc_functions libc_functions.o
ld: warning: cannot find entry symbol _start; defaulting to 00010074
libc_functions.o: In function `main':
(.text+0x8): undefined reference to `printf'

The assembler and the linker alone are not enough here: ld does not find the _start entry point (our code only defines main) and does not link against libc, so printf remains undefined. The gcc driver takes care of both, adding the C runtime startup files and libc to the link.

Compilation using GCC

[email protected]:/home/pi/arm/episode1# gcc -o libc_functions libc_functions.s

Getting some information with gdb

[email protected]:/home/pi/arm/episode1# gdb -q ./libc_functions
Reading symbols from ./libc_functions...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x10420
(gdb) r
Starting program: /home/pi/arm/episode1/libc_functions
Breakpoint 1, 0x00010420 in main ()
(gdb) info proc mappings
process 2023
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x10000 0x11000 0x1000 0x0 /home/pi/arm/episode1/libc_functions
0x20000 0x21000 0x1000 0x0 /home/pi/arm/episode1/libc_functions
0x76e79000 0x76fa4000 0x12b000 0x0 /lib/arm-linux-gnueabihf/libc-2.19.so
0x76fa4000 0x76fb4000 0x10000 0x12b000 /lib/arm-linux-gnueabihf/libc-2.19.so
0x76fb4000 0x76fb6000 0x2000 0x12b000 /lib/arm-linux-gnueabihf/libc-2.19.so
0x76fb6000 0x76fb7000 0x1000 0x12d000 /lib/arm-linux-gnueabihf/libc-2.19.so
0x76fb7000 0x76fba000 0x3000 0x0
0x76fba000 0x76fbf000 0x5000 0x0 /usr/lib/arm-linux-gnueabihf/libarmmem.so
0x76fbf000 0x76fce000 0xf000 0x5000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
0x76fce000 0x76fcf000 0x1000 0x4000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
0x76fcf000 0x76fef000 0x20000 0x0 /lib/arm-linux-gnueabihf/ld-2.19.so
0x76ff1000 0x76ff3000 0x2000 0x0
0x76ff9000 0x76ffb000 0x2000 0x0
0x76ffb000 0x76ffc000 0x1000 0x0 [sigpage]
0x76ffc000 0x76ffd000 0x1000 0x0 [vvar]
0x76ffd000 0x76ffe000 0x1000 0x0 [vdso]
0x76ffe000 0x76fff000 0x1000 0x1f000 /lib/arm-linux-gnueabihf/ld-2.19.so
0x76fff000 0x77000000 0x1000 0x20000 /lib/arm-linux-gnueabihf/ld-2.19.so
0x7efdf000 0x7f000000 0x21000 0x0 [stack]
0xffff0000 0xffff1000 0x1000 0x0 [vectors]

You can see the libc shared library (libc-2.19.so) in the address space of the process. Now let’s look at the code of main:

(gdb) x/5i $pc
=> 0x10420 <main>:	stmfd	sp!, {lr} 
   0x10424 <main+4>:	ldr	r0, [pc, #8]	; 0x10434 <_exit+4> 
   0x10428 <main+8>:	bl	0x102c8 
   0x1042c <main+12>:	ldmfd	sp!, {pc} 
   0x10430 <_exit>:	mov	lr, pc

At 0x10428 there is the call to printf. More precisely, the branch target 0x102c8 is an entry of the PLT (procedure linkage table); that entry jumps through a corresponding entry in the GOT (global offset table), which at runtime will hold the address of the real printf function. Let’s see the details.

When we compile the program with GCC, libc is not included in the binary (libc_functions): it is dynamically linked at runtime. We can use ldd to see the shared libraries the binary depends on:

[email protected]:/home/pi/arm/episode1# ldd libc_functions
linux-vdso.so.1 (0x7eeb1000)
/usr/lib/arm-linux-gnueabihf/libarmmem.so (0x76fe6000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76e9f000)
/lib/ld-linux-armhf.so.3 (0x54b6d000)

libc is required by the binary. If you run ldd several times you will notice that the address of libc changes, because ASLR is enabled. Let’s open the binary with IDA.

At 0x10428 there is the call to printf. Following it, we notice that we do not reach libc directly:

we are in the PLT section instead, and at 0x102D0 we can see the jump (LDR PC, […]) to an address that is stored in another location.

We have landed in the GOT section; the address stored here refers to an external symbol.

Time to debug with gdb: let’s set a breakpoint at 0x10428 (where printf is called in main):

Breakpoint 2, 0x00010428 in main ()
(gdb) x/i $pc
=> 0x10428 <main+8>: bl 0x102c8

Then continue with the stepi command.

After a few more instructions we reach _dl_runtime_resolve, which is part of the dynamic linker (ld-2.19.so).

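The dynamic linker/loader (ld.so, here /lib/ld-linux-armhf.so.3) resolves the external reference: it looks up printf in libc, stores its address in the GOT entry and jumps to it, so later calls go straight to libc. Note that ldd, used above, is just a tool that lists the shared libraries a binary depends on.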

For more details see http://eli.thegreenplace.net/2011/11/03/position-independent-code-pic-in-shared-libraries/
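To make the PLT/GOT mechanism more concrete, here is a toy model of lazy binding in C. It is a deliberately simplified sketch with hypothetical names (plt_stub, got_slot, resolve_and_call), not how glibc's ld.so is implemented: a "GOT slot" initially points to a resolver; the first call through the "PLT stub" goes to the resolver, which patches the slot with the real target, and later calls jump there directly.

```c
/* Toy model of lazy binding through a PLT/GOT pair (hypothetical names).
   It mimics the control flow only; it is not how ld.so works internally. */

typedef int (*target_fn)(int);

/* Stands in for the real library function (printf in libc). */
static int real_target(int x) { return x * 2; }

/* Counts how many times the "resolver" has run. */
static int resolver_calls = 0;

static int resolve_and_call(int x);

/* The "GOT slot": before resolution it points to the resolver,
   like an unresolved GOT entry that leads to _dl_runtime_resolve. */
static target_fn got_slot = resolve_and_call;

/* The "resolver": finds the real target, patches the GOT slot,
   then forwards this first call to the real target. */
static int resolve_and_call(int x)
{
    resolver_calls++;
    got_slot = real_target;
    return got_slot(x);
}

/* The "PLT stub": every call jumps through the GOT slot. */
int plt_stub(int x)
{
    return got_slot(x);
}

int resolver_call_count(void)
{
    return resolver_calls;
}
```

Calling plt_stub twice runs the resolver only once: this is why, in gdb, only the first call to printf goes through _dl_runtime_resolve.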

Introduction to reverse engineering

In this section I will not provide the source code of the programs we analyze, except for this first program (algorithm_reversing).

Reversing the algorithm

We begin with a really simple program: it reads a message, processes it with a simple algorithm and prints another message. The goal of this exercise is to understand the algorithm well enough to choose an input whose output is the string “Hello”.

strIN -------[algorithm]-------strOUT
strOUT = Hello

This is the source code of the program to reverse (as I said, I provide the source code only for this first program :))

file: algorithm_reversing.s

.data
.balign 4
  info: .asciz "Please enter your string: "
  format: .asciz "%5s"
.balign 4
  strIN: .skip 5
  strOUT: .skip 5
  val: .byte 0x5
  output: .asciz "your input: %s\n"
.text
.global main
.extern printf
.extern scanf

main:
  push {ip, lr}      @ push return address + dummy register
  ldr r0, =info      @ print the info
  bl printf
  ldr r0, =format
  ldr r1, =strIN
  bl scanf
  @ parsing of the message
  ldr r5, =strOUT
  ldr r1, =strIN
  ldrb r2, [r1]
  ldrb r3, [r1,#1]
  eor r0, r2, r3
  str r0, [r5]
  ldrb r4, [r1,#2]
  eor r0, r4, r3
  str r0, [r5,#1]
  add r2, #0x5
  str r2, [r5,#2]
  ldrb r4, [r1,#3]
  eor r0, r3, r4
  str r0, [r5,#3]
  ldrb r2, [r1,#4]
  eor r0, r2, r4
  str r0, [r5,#4]
  @ print of the final string
  ldr r0, =strOUT    @ print strOUT (used directly as the printf format string)
  bl printf
  pop {ip, pc}       @ pop return address into pc

Compile it

[email protected]:/home/pi/arm/episode1# gcc -o algorithm_reversing algorithm_reversing.s

Debug it in order to understand the algorithm

[email protected]:/home/pi/arm/episode1# gdb -q ./algorithm_reversing
Reading symbols from ./algorithm_reversing...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x10450
(gdb) r
Starting program: /home/pi/arm/episode1/algorithm_reversing
Breakpoint 1, 0x00010450 in main ()
(gdb) x/10i $pc
=> 0x10450 <main>:	push	{r12, lr} 
   0x10454 <main+4>:	ldr	r0, [pc, #92]	; 0x104b8 <main+104> 
   0x10458 <main+8>:	bl	0x102ec 
   0x1045c <main+12>:	ldr	r0, [pc, #88]	; 0x104bc <main+108> 
   0x10460 <main+16>:	ldr	r1, [pc, #88]	; 0x104c0 <main+112> 
   0x10464 <main+20>:	bl	0x10304 
   0x10468 <main+24>:	ldr	r5, [pc, #84]	; 0x104c4 <main+116> 
   0x1046c <main+28>:	ldr	r1, [pc, #76]	; 0x104c0 <main+112> 
   0x10470 <main+32>:	ldrb	r2, [r1] 
   0x10474 <main+36>:	ldrb	r3, [r1, #1] 

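Step (with nexti) to the next instruction, at 0x10454, which means:

r0 = *(pc + 92)

In ARM state pc reads as the address of the current instruction plus 8, so the address is 0x10454 + 8 + 92 = 0x104b8 (gdb shows it in the comment). Let’s look at its content: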

(gdb) x/x 0x104b8
0x104b8 <main+104>:	0x00020668

It is an address within the .data section; let’s look at what it points to:

(gdb) x/s 0x20668
0x20668: "Please enter your string: "

So 0x20668 holds the string passed to the first printf call.
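All the PC-relative loads seen so far follow the same rule, which a tiny helper makes explicit. It is a sketch assuming ARM (A32) state, where pc reads as the address of the current instruction plus 8; in Thumb state pc reads as plus 4 (word-aligned for literal loads), which this helper does not handle. The function name arm_literal_addr is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: address of the literal read by an A32
   "ldr rX, [pc, #imm]" instruction located at insn_addr.
   In ARM (A32) state, pc reads as insn_addr + 8. */
uint32_t arm_literal_addr(uint32_t insn_addr, int32_t imm)
{
    return insn_addr + 8u + (uint32_t)imm;
}
```

For example, arm_literal_addr(0x10454, 92) gives 0x104b8, the address examined above with x/x, and arm_literal_addr(0x10078, 16) gives 0x10090, the literal used by rasp_syscall.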

Go on until we reach 0x10464 (the call to scanf): r0 contains the address of the format string and r1 the address of the input buffer (strIN).

(gdb) i r $r0 $r1
r0 0x20683 132739
r1 0x20688 132744
(gdb) nexti

Now it is time to type the input message. From the source code we saw that

format: .asciz "%5s"
strIN: .skip 5

so the message should be 5 characters long.

Let’s try, for example, the string “ABCDE”:

(gdb) nexti
Please enter your string: ABCDE

With the instructions at 0x10468 and 0x1046c we load r5 with the address of the output string and r1 with the address of the input string; then we go on to the instruction at 0x10470 (the algorithm):

(gdb) x/18i $pc 
=> 0x10470 <main+32>:	ldrb	r2, [r1] 
   0x10474 <main+36>:	ldrb	r3, [r1, #1] 
   0x10478 <main+40>:	eor	r0, r2, r3 
   0x1047c <main+44>:	str	r0, [r5] 
   0x10480 <main+48>:	ldrb	r4, [r1, #2] 
   0x10484 <main+52>:	eor	r0, r4, r3 
   0x10488 <main+56>:	str	r0, [r5, #1] 
   0x1048c <main+60>:	add	r2, r2, #5 
   0x10490 <main+64>:	str	r2, [r5, #2] 
   0x10494 <main+68>:	ldrb	r4, [r1, #3] 
   0x10498 <main+72>:	eor	r0, r3, r4 
   0x1049c <main+76>:	str	r0, [r5, #3] 
   0x104a0 <main+80>:	ldrb	r2, [r1, #4] 
   0x104a4 <main+84>:	eor	r0, r2, r4 
   0x104a8 <main+88>:	str	r0, [r5, #4] 
   0x104ac <main+92>:	ldr	r0, [pc, #16]	; 0x104c4 <main+116> 
   0x104b0 <main+96>:	bl	0x102ec 
   0x104b4 <main+100>:	pop	{r12, pc} 

Let’s take a look at the following instructions (see the inline comments):

   0x10470 <main+32>:	ldrb	r2, [r1]        ; r2 <- *r1
   0x10474 <main+36>:	ldrb	r3, [r1, #1]  ; r3 <-*(r1+1)
   0x10478 <main+40>:	eor	r0, r2, r3     ; r0=r2 xor r3
   0x1047c <main+44>:	str	r0, [r5]        ; r0 -> *r5

Step to 0x10480 (with nexti) and check the content of the r0, r2 and r3 registers:

(gdb) i r $r0 $r2 $r3
r0 0x3 3
r2 0x41 65
r3 0x42 66

This means

*r5 = r2 xor r3

that we can rewrite as:

byte1strOut = byte1strInput xor byte2strInput

and the output string starts to take shape.

In our case (to generate the “Hello” output string) we want r0 = 0x48 (H).

We continue with the analysis from the address 0x10480

(gdb) x/8i $pc 
=> 0x10480 <main+48>:	ldrb	r4, [r1, #2] 
   0x10484 <main+52>:	eor	r0, r4, r3 
   0x10488 <main+56>:	str	r0, [r5, #1] 
   0x1048c <main+60>:	add	r2, r2, #5 
   0x10490 <main+64>:	str	r2, [r5, #2] 
   0x10494 <main+68>:	ldrb	r4, [r1, #3] 
   0x10498 <main+72>:	eor	r0, r3, r4 
   0x1049c <main+76>:	str	r0, [r5, #3] 

Let’s take a look at the following instructions (see the inline comments):

   0x10480 <main+48>:	ldrb	r4, [r1, #2]   ; r4 <- *(r1+2) 
   0x10484 <main+52>:	eor	r0, r4, r3      ; r0=r4 xor r3
   0x10488 <main+56>:	str	r0, [r5, #1]   ; r0 -> *(r5+1)

Let’s go to the 0x1048c instruction and look at the contents of the registers r0, r3 and r4

(gdb) i r $r0 $r3 $r4
r0 0x1 1
r3 0x42 66
r4 0x43 67

This means

*(r5+1) = r4 xor r3

that we can rewrite as:

byte2strOut = byte2strInput xor byte3strInput

Go on and let’s analyze these two instructions

0x1048c <main+60>:	add	r2, r2, #5 
0x10490 <main+64>:	str	r2, [r5, #2] 

This means

*(r5+2) = r2 + 0x5

that we can rewrite as:

byte3strOut = byte1strInput + 0x5

We can now get the fourth output byte:

0x10494 <main+68>:	ldrb	r4, [r1, #3] 
0x10498 <main+72>:	eor	r0, r3, r4 
0x1049c <main+76>:	str	r0, [r5, #3]

This means

*(r5+3) = r3 xor r4

that we can rewrite as:

byte4strOut = byte2strInput xor byte4strInput

Finally there is the fifth byte of the output string

0x104a0 <main+80>:	ldrb	r2, [r1, #4] 
0x104a4 <main+84>:	eor	r0, r2, r4 
0x104a8 <main+88>:	str	r0, [r5, #4]

This means

*(r5+4) = r4 xor r2

that we can rewrite as:

byte5strOut = byte4strInput xor byte5strInput

Perfect, we can put all the pieces together

byte1strOut = byte1strInput xor byte2strInput

byte2strOut = byte2strInput xor byte3strInput

byte3strOut = byte1strInput + 0x5

byte4strOut = byte2strInput xor byte4strInput

byte5strOut = byte4strInput xor byte5strInput

Substituting the wanted output bytes:

‘H’ = 0x48 = byte1strInput xor byte2strInput

‘e’ = 0x65 = byte2strInput xor byte3strInput

‘l’ = 0x6c = byte1strInput + 0x5

‘l’ = 0x6c = byte2strInput xor byte4strInput

‘o’ = 0x6f = byte4strInput xor byte5strInput

Now we can solve it

byte1strInput = 0x6c - 0x5 = 0x67 (g)

byte2strInput = 0x48 xor 0x67 = 0x2f (/)

byte3strInput = 0x2f xor 0x65 = 0x4a (J)

byte4strInput = 0x2f xor 0x6c = 0x43 (C)

byte5strInput = 0x43 xor 0x6f = 0x2c (,)

The algorithm seems to be solved; let’s test it:

[email protected]:/home/pi/arm/episode1# ./algorithm_reversing
Please enter your string: g/JC,
Hello
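The equations above can be double-checked with a small C model of the algorithm and of its inverse. It is a sketch that works byte by byte, following the equations derived with gdb (the real code stores whole words with str, but only the byte written at each position matters for this analysis); transform and solve are hypothetical names.

```c
/* Byte-level model of algorithm_reversing (in = strIN, out = strOUT). */
void transform(const unsigned char in[5], unsigned char out[5])
{
    out[0] = in[0] ^ in[1];
    out[1] = in[1] ^ in[2];
    out[2] = (unsigned char)(in[0] + 0x5);
    out[3] = in[1] ^ in[3];
    out[4] = in[3] ^ in[4];
}

/* Inverse: compute the input that produces the wanted output. */
void solve(const unsigned char out[5], unsigned char in[5])
{
    in[0] = (unsigned char)(out[2] - 0x5);  /* from out[2] = in[0] + 5   */
    in[1] = out[0] ^ in[0];                 /* from out[0] = in[0] ^ in[1] */
    in[2] = out[1] ^ in[1];                 /* from out[1] = in[1] ^ in[2] */
    in[3] = out[3] ^ in[1];                 /* from out[3] = in[1] ^ in[3] */
    in[4] = out[4] ^ in[3];                 /* from out[4] = in[3] ^ in[4] */
}
```

Since scanf's %5s stops at whitespace, a usable solution must not contain spaces or other whitespace characters; g/JC, satisfies that.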

Reversing a simple loader

This new program is a simple loader: it decodes some instructions into memory and then executes them, and the executed code prints a message.

The goal of this exercise is to make the program print the message “WIN”, by changing the value of an XOR key.

The program name is: loader_reversing

[email protected]:/home/pi/arm/episode1# file loader_reversing
loader_reversing: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), statically linked, not stripped
[email protected]:/home/pi/arm/episode1# strings loader_reversing
Andrea Sindoni @invictus1306
aeabi
.symtab
.strtab
.shstrtab
.text
.data
.ARM.attributes
loader_reversing.o
mystr
code
_loop
_exit
_bss_end__
__bss_start__
__bss_end__
_start
__bss_start
__end__
_edata
_end

Open the file with IDA

In the _start routine we can see a system call (at 0x10090) with syscall number 0xc0 (192), which on ARM EABI Linux is mmap2.

Let’s analyze it in detail:

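mov r4, #0xffffffff  @ fd = -1 (no file)
ldr r0, =0x00030000  @ addr: requested address
ldr r1, =0x1000      @ length of the mapping (one page)
mov r2, #7           @ prot: PROT_READ|PROT_WRITE|PROT_EXEC
mov r3, #0x32        @ flags: MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS
mov r5, #0           @ offset (in pages for mmap2)
mov r7, #192         @ syscall number (mmap2)
swi #0 @ mmap2(0x30000, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0)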
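To decode the magic numbers in r2 and r3, here is roughly the same request expressed with libc's mmap(). It is a sketch assuming Linux, where PROT_READ|PROT_WRITE|PROT_EXEC is 7 and MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS is 0x32 (the same values on ARM and x86); the macro and function names are hypothetical.

```c
#define _GNU_SOURCE          /* MAP_ANONYMOUS under strict -std= modes */
#include <sys/mman.h>

/* Values the loader passes in r2 (prot) and r3 (flags). */
#define LOADER_PROT  (PROT_READ | PROT_WRITE | PROT_EXEC)       /* 7    */
#define LOADER_FLAGS (MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS)  /* 0x32 */

/* Hypothetical helper: one anonymous, private, RWX page at a fixed
   address. MAP_FIXED replaces any existing mapping at addr, so only
   call this with an address known to be free. */
void *map_loader_page(void *addr)
{
    return mmap(addr, 0x1000, LOADER_PROT, LOADER_FLAGS, -1, 0);
}
```

MAP_FIXED makes the kernel place the mapping exactly at the requested address, which is why the decoded code always ends up at 0x30000.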

After the mmap2 syscall we can see the newly allocated area (0x30000):

(gdb) info proc mappings
process 2405
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x10000 0x11000 0x1000 0x0 /home/pi/arm/episode1/loader_reversing
0x20000 0x21000 0x1000 0x0 /home/pi/arm/episode1/loader_reversing
0x30000 0x31000 0x1000 0x0
0x76ffd000 0x76ffe000 0x1000 0x0 [sigpage]
0x76ffe000 0x76fff000 0x1000 0x0 [vvar]
0x76fff000 0x77000000 0x1000 0x0 [vdso]
0x7efdf000 0x7f000000 0x21000 0x0 [stack]
0xffff0000 0xffff1000 0x1000 0x0 [vectors]

The instruction at the address 0x10098

.text:00010098 LDR R1, =code

loads into r1 the address of the code variable (an initialized variable). Let’s look at its content:

(gdb) i r $r1
r1 0x200f1 131313
(gdb) x/10x 0x200f1
0x200f1: 0xe93f7c56 0xe25fe45e 0xe1b2745b 0xe3b21468
0x20101: 0xe3b20454 0xe3b264c0 0xe0302453 0xe49f2457
0x20111: 0xe2501448 0xe0302453

These bytes do not look like ARM code, so let’s go on to the instruction at 0x100A4:

.text:000100A4 LDR R2, [R1,R4]

It loads into r2 the word at r1 + r4, where r1 is the address of the code variable and r4 looks like an index (0 the first time). The next instruction is:

.text:000100A8 EOR R2, R2, R6

an XOR operation between r2 and r6: r6 holds 0x123456 (the XOR key), while the first time r2 holds 0xe93f7c56, the first word of the code variable (whose least significant byte is 0x56).

The result is stored into r2, which the next instruction saves into the area allocated by mmap2 at 0x30000 (r0 holds the return value of the syscall, i.e. the address of the mapping):

.text:000100AC STR R2, [R0,R4]

The loop decrypts the whole code variable. We will do the decryption with gdb now (and later also with IDA): set a breakpoint at 0x100BC and look at the address 0x30000.

(gdb) b *0x100bc 
Breakpoint 3 at 0x100bc 
(gdb) c 
Continuing. 
Breakpoint 3, 0x000100bc in _loop () 
(gdb) x/24i 0x30000 
 0x30000: push {r11, lr} 
 0x30004: sub sp, sp, #8 
 0x30008: mov r4, sp 
 0x3000c: mov r2, #62 ; 0x3e 
 0x30010: mov r3, #2 
 0x30014: mov r5, #150 ; 0x96 
 0x30018: eor r1, r2, r5 
 0x3001c: str r1, [sp], #1 
 0x30020: sub r2, r2, #30 
 0x30024: eor r1, r2, r5 
 0x30028: str r1, [sp], #1 
 0x3002c: add r2, r2, #7 
 0x30030: subs r3, r3, #1 
 0x30034: bne 0x30024 
 0x30038: mov r0, #1 
 0x3003c: mov r3, #10 
 0x30040: str r3, [sp], #1 
 0x30044: mov r1, r4 
 0x30048: mov r2, #4 
 0x3004c: mov r7, #4 
 0x30050: svc 0x00000000 
 0x30054: add sp, sp, #4 
 0x30058: pop {r11, pc} 
 0x3005c: andeq r0, r0, r0

As you can see, the area at 0x30000 now contains valid ARM instructions.

We could also decrypt the instructions statically in IDA, with a simple IDC script:

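auto i, t;
auto start = 0x200f1;              // address of the encoded "code" variable
for (i = 0; i <= 0x5C; i = i + 4)  // 24 dwords, the size of the code decoded at 0x30000
{
  t = Dword(start) ^ 0x123456;     // same XOR key used by the loader (r6)
  PatchDword(start, t);            // write the decoded dword back into the database
  start = start + 4;
}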
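The same decoding can also be sketched in C, to double-check the key. This assumes the loader XORs whole 32-bit words with 0x123456, as the LDR/EOR/STR loop and the IDC script suggest; the function and macro names are hypothetical. Applied to the first words dumped at 0x200f1, it yields the encodings of push {r11, lr}, sub sp, sp, #8, mov r4, sp and mov r2, #62, i.e. the first instructions seen at 0x30000.

```c
#include <stddef.h>
#include <stdint.h>

#define LOADER_KEY 0x123456u   /* value found in r6 */

/* Decode n 32-bit words the way the loader's LDR/EOR/STR loop does:
   dst[i] = src[i] ^ key. */
void xor_decode_words(const uint32_t *src, uint32_t *dst, size_t n, uint32_t key)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] ^ key;
}
```

Since XOR is its own inverse, the same function with the same key would also re-encode modified code.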

Now we have to analyze the decrypted code:

=> 0x30004:	sub	sp, sp, #8 
   0x30008:	mov	r4, sp 
   0x3000c:	mov	r2, #62	; 0x3e 
   0x30010:	mov	r3, #2 
   0x30014:	mov	r5, #150	; 0x96 
   0x30018:	eor	r1, r2, r5 
   0x3001c:	str	r1, [sp], #1 
   0x30020:	sub	r2, r2, #30 
   0x30024:	eor	r1, r2, r5 
   0x30028:	str	r1, [sp], #1 
   0x3002c:	add	r2, r2, #7 
   0x30030:	subs	r3, r3, #1 
   0x30034:	bne	0x30024 
   0x30038:	mov	r0, #1 
   0x3003c:	mov	r3, #10 
   0x30040:	str	r3, [sp], #1 
   0x30044:	mov	r1, r4 
   0x30048:	mov	r2, #4 
   0x3004c:	mov	r7, #4 
   0x30050:	svc	0x00000000 
   0x30054:	add	sp, sp, #4 
   0x30058:	pop	{r11, pc} 

After the first five instructions (from 0x30004 to 0x30014), the stack pointer has been decremented by 8 (room for a local buffer), its value has been copied into r4, and r2, r3 and r5 contain 0x3e, 0x2 and 0x96 respectively.

(gdb) i r $r2 $r3 $r4 $r5 $sp 
r2 0x3e 62 
r3 0x2 2 
r4 0x7efff7b0 2130704304 
r5 0x96 150 
sp 0x7efff7b0 0x7efff7b0

In the next two instructions (0x30018 and 0x3001c), r1 = r2 xor r5 = 0x3e xor 0x96 = 0xa8; this value is stored on the stack and sp is incremented by 1 (post-indexed addressing).

After the instruction at 0x3001c (str r1, [sp], #1) we have

(gdb) x/x 0x7efff7b0 
0x7efff7b0: 0x000000a8
(gdb) i r $sp
sp 0x7efff7b1 0x7efff7b1

At 0x30020 r2 is decremented by 0x1e (30); after its execution we have:

(gdb) i r $r2
r2 0x20 32

At 0x30024 there is a simple loop:

=> 0x30024:	eor	r1, r2, r5 
   0x30028:	str	r1, [sp], #1 
   0x3002c:	add	r2, r2, #7 
   0x30030:	subs	r3, r3, #1 
   0x30034:	bne	0x30024 

In every iteration r2 is XORed with r5, the result is stored on the stack and sp is incremented by 1; then r2 is incremented by 7.

The loop counter is r3: it starts at 2 and is decremented by 1 at every iteration (0x30030), so the loop is executed just twice.

When the loop ends we reach 0x30038; let’s look at the content of the local variable at 0x7efff7b0:

(gdb) x/4bx 0x7efff7b0 
0x7efff7b0: 0xa8 0xb6 0xb1 0x00

Two more bytes have been stored on the stack (0x20 xor 0x96 = 0xb6 and 0x27 xor 0x96 = 0xb1), and sp is now:

(gdb) i r $sp 
sp 0x7efff7b3 0x7efff7b3

Moving on to 0x3003c, the next two instructions store one more byte, 0x0a (a newline), on the stack:

0x3003c: mov r3, #10 
0x30040: str r3, [sp], #1

After the instruction at 0x30040, the local variable (0x7efff7b0) contains:

(gdb) x/4bx 0x7efff7b0 
0x7efff7b0: 0xa8 0xb6 0xb1 0x0a

Going on, we find the write syscall:

0x30038: mov r0, #1  @ fd: stdout
...
0x30044: mov r1, r4  @ buf: r4 (the buffer at 0x7efff7b0)
0x30048: mov r2, #4  @ count: len of the buffer
0x3004c: mov r7, #4  @ write syscall number
0x30050: svc 0x00000000

After the write syscall, this is the output:

(gdb) nexti 
���

But we want WIN as the result so, as suggested at the beginning of this section, we have to change the XOR key so that the following bytes are stored on the stack (in the local variable):

0x57 0x49 0x4e

Let’s look at the first XOR instruction, at 0x30018:

0x30018: eor r1, r2, r5

The r2 register changes at every step, while r5 contains the XOR key: we have to change r5 so that

r1 = r2 xor r5 = 0x57

The value of r2 is 0x3e, so the key in r5 must be 0x3e xor 0x57 = 0x69. We set it after r5 has been loaded at 0x30014 (for example with a breakpoint at 0x30018):

(gdb) set $r5=0x69
(gdb) i r $r5
r5 0x69 105

The other two XOR instructions use the same key, and indeed 0x20 xor 0x69 = 0x49 (I) and 0x27 xor 0x69 = 0x4e (N), so the problem is solved:

(gdb) c
Continuing.
WIN
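The behaviour of the decrypted routine, and the fix, can be summarized with a small C model. It is a sketch that mirrors the instructions from 0x3000c to 0x30040 (build_message and key_for_first_char are hypothetical names); key is the value the original code loads into r5.

```c
#include <stddef.h>

/* Hypothetical model of the decrypted routine: it fills a 4-byte local
   buffer that is then passed to write(1, buf, 4).
   key is the value loaded into r5 (0x96 in the original code). */
void build_message(unsigned char key, unsigned char out[4])
{
    unsigned char r2 = 0x3e;               /* mov r2, #62 */
    size_t pos = 0;

    out[pos++] = r2 ^ key;                 /* eor r1, r2, r5 ; str r1, [sp], #1 */
    r2 = (unsigned char)(r2 - 30);         /* sub r2, r2, #30 */
    for (int r3 = 2; r3 != 0; r3--) {      /* mov r3, #2 ... subs r3, r3, #1 ; bne */
        out[pos++] = r2 ^ key;             /* eor r1, r2, r5 ; str r1, [sp], #1 */
        r2 = (unsigned char)(r2 + 7);      /* add r2, r2, #7 */
    }
    out[pos] = 0x0a;                       /* mov r3, #10 ; str r3, [sp], #1 */
}

/* Key that turns the first byte (0x3e xor key) into the wanted character. */
unsigned char key_for_first_char(unsigned char wanted)
{
    return (unsigned char)(0x3e ^ wanted);
}
```

With key 0x96 the model reproduces the bytes 0xa8 0xb6 0xb1 0x0a seen in gdb, and with key_for_first_char('W') = 0x69 it produces "WIN\n".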

Basic anti-debug technique

This is the last program to reverse. The goal is to understand the algorithm and bypass some basic anti-debug techniques, so that the output message is the string “Good”.

The program name is: anti_dbg

[email protected]:/home/pi/arm/episode1# file anti_dbg
anti_dbg: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=7028a279e2161c298caeb4db163a96ee2b2c49f3, not stripped

We can try to run the program with the debugger:

# gdb -q ./anti_dbg
Reading symbols from ./anti_dbg...(no debugging symbols found)...done
(gdb) r
Starting program: /home/pi/arm/episode1/anti_dbg
You want debug me?
[Inferior 1 (process 2497) exited normally]

The same output is printed even if we use the strace/ltrace commands.

We can try to open the program with IDA

Let’s start by analyzing this instruction:

ldr r2, =aAd

This is the aAd variable

We can convert the variable to data (an array) in IDA to better see its values.

The address (0x10988) of this array (of 4 elements) is stored into the local variable var_C. Next comes another local variable, var_10; here we are interested in the value at aAd+4 (ldr r2, =(aAd+4)).

As you can see the local variable var_10 contains the address (0x1098C) of the new array (of 3 elements).

Now we have to analyze (see the in-line comments) the following instructions:

LDRH R1, [R2]                        @ load a halfword (2 bytes) into R1
LDRB R2, [R2,#(unk_109CE - 0x109CC)] @ load the next byte (0x44) into R2
STRH R1, [R3]                        @ store the first two bytes (0x22, 0x41) into *R3
STRB R2, [R3,#2]                     @ store the last byte (0x44) into *(R3+2)
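To see why a halfword load/store followed by a byte load/store copies the three bytes in their original order, here is a small Python model using the struct module. It assumes little-endian byte order (the “LSB” in the file output above) and uses a bytearray as a stand-in for the destination at *R3; the variable names are hypothetical.

```python
import struct

src = bytes([0x22, 0x41, 0x44])   # the 3-byte array read through R2
dst = bytearray(3)                # stand-in for the destination at *R3

r1 = struct.unpack_from("<H", src, 0)[0]  # LDRH R1, [R2]: little-endian halfword
r2 = src[2]                               # LDRB R2, [R2,#2]: the third byte
struct.pack_into("<H", dst, 0, r1)        # STRH R1, [R3]: writes 0x22, 0x41
dst[2] = r2                               # STRB R2, [R3,#2]: writes 0x44

print(hex(r1))      # 0x4122
print(dst.hex())    # 224144
```

The halfword value in R1 reads as 0x4122, but storing it back with the same byte order restores 0x22, 0x41 in memory, so the copy is exact.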

To summarize, we have two arrays: the first one (var_C) contains 4 elements

0x7, 0x2f, 0x2f, 0x24

and the second one (var_10) contains 3 elements

0x22, 0x41, 0x44

There is an interesting variable, flag. Before looking inside this variable, let’s follow the code of the main function.

As you can see from the figure above, if the flag variable is equal to 1, the code after the red line is executed; otherwise (flag!=1) the code after the green line (loc_10858) is executed.

If we follow the case flag=1, we can see that the register r3 is initialized to 0 and then compared with 3.

If we follow the case flag!=1, we can see that the register r3 is initialized to 0 and then compared with 2.

In the case flag=1, we reach loc_107F8. The most interesting instruction is:

ADD R3, R3, #0x40

The content of r3 is

r3 = *(var_C+var_8)

and the values of var_C and var_8 are

var_C = address of the array with 4 elements

var_8 = 0 index (first iteration)

So after the ADD instruction the value of r3 is

r3 = 0x7 + 0x40 = 0x47

We can write a simple IDC script to decode all the elements of the first array (var_C).

The output is the string “Good”.
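The IDC script itself is not reproduced here, but the same decoding is easy to sketch in Python: each element of var_C is increased by 0x40, mirroring the ADD R3, R3, #0x40 in the loop at loc_107F8. The helper name decode is hypothetical.

```python
def decode(values, offset):
    """Add a constant to each byte, like the ADD instruction in the loop."""
    return "".join(chr(v + offset) for v in values)

VAR_C = [0x07, 0x2f, 0x2f, 0x24]   # first array, 4 elements
print(decode(VAR_C, 0x40))          # Good
```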

Now let’s look at the case flag!=1, namely loc_10864. This time the loop covers only three elements (with the index in r3), and the array is var_10. The most interesting instruction is:

ADD R3, R3, #0x20

As before, we can write an IDC script to decode the final string

and the output string, which this time is “Bad”.
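Putting the two branches together, the behaviour of main can be modelled in a few lines. This is a sketch under the reading given above: flag == 1 selects var_C with offset 0x40, and any other value selects var_10 with offset 0x20. The function name main_output is hypothetical.

```python
VAR_C = [0x07, 0x2f, 0x2f, 0x24]   # 4 elements, used when flag == 1
VAR_10 = [0x22, 0x41, 0x44]        # 3 elements, used otherwise

def main_output(flag):
    """Model of main: pick the array and the offset according to flag."""
    if flag == 1:
        values, offset = VAR_C, 0x40    # loc_107F8: ADD R3, R3, #0x40
    else:
        values, offset = VAR_10, 0x20   # loc_10864: ADD R3, R3, #0x20
    return "".join(chr(v + offset) for v in values)

for flag in (1, 2):
    print(flag, main_output(flag))
# 1 Good
# 2 Bad
```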

The goal is to make the program print the string “Good”, so now we need to understand where the flag variable changes its value.

Note also that the main function contains no check for the presence of a debugger, and there is no reference to the string “You want debug me?”.

Let’s start with the xrefs of the flag variable.

From the image above we can see a function called ptrace_capt. This function is called automatically before execution enters main (you can also verify this with gdb by setting a breakpoint on ptrace_capt). To understand why, we can look at the .ctors (or .init_array) section: it holds a list of functions (in our case, created with the constructor attribute) that are executed at program startup, before main; the corresponding .dtors (or .fini_array) section lists the functions executed when the program exits.

Let’s look into the ptrace_capt function.

Here we reach the ptrace check. It is a very simple check, something like:

if(ptrace(PTRACE_TRACEME, 0, 0, 0) < 0)
{
  printf("You want debug me?\n");
  exit(0);
}

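Since a process can have only one tracer, PTRACE_TRACEME fails when the program is already running under gdb, strace or ltrace, which is why all three print the same message. We can easily bypass this check with the debugger, as we will see shortly.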

Let’s go on and analyze the code starting at loc_10690.

We can summarize it as follows:

1- Open the file password.raw for reading

fopen("password.raw", "r")

2- Calculate its size

.text:000106B4 LDR R0, [R11,#var_10] ; load the FILE pointer into r0
.text:000106B8 MOV R1, #0            ; offset
.text:000106BC MOV R2, #2            ; SEEK_END
.text:000106C0 BL fseek              ; seek to end of file
.text:000106C4 LDR R0, [R11,#var_10] ; load the FILE pointer into r0
.text:000106C8 BL ftell              ; current position = file size

3- Check that the file size is not greater than 6 (BLS branches if unsigned lower or same)

.text:000106E4 LDR R3, [R11,#var_14]
.text:000106E8 CMP R3, #6
.text:000106EC BLS loc_106F
.text:000106F0 MOV R0, #0
.text:000106F4 BL exit

If the file size is at most 6 bytes we reach loc_10700; otherwise the program exits.
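Steps 1 to 3 map directly onto Python’s file API. The sketch below is an approximation, not the program’s code: it opens the file in binary mode (equivalent to "r" on Linux), treats the BLS branch as an unsigned “size <= 6” test, and raises FileNotFoundError where the C code would get a NULL FILE pointer (the listing does not show how that case is handled). The function name size_check_passes is hypothetical.

```python
import os

MAX_SIZE = 6  # CMP R3, #6 followed by BLS: continue while size <= 6

def size_check_passes(path="password.raw"):
    """Mirror fopen / fseek(SEEK_END) / ftell and the size comparison."""
    with open(path, "rb") as f:      # fopen("password.raw", "r")
        f.seek(0, os.SEEK_END)       # fseek(fp, 0, SEEK_END)
        size = f.tell()              # ftell(fp): offset of the end = file size
    return size <= MAX_SIZE          # otherwise the program calls exit(0)
```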

If we go on, we can quickly see that this is a loop.

First comes the call to fgetc:

.text:00010700 LDR R0, [R11,#var_10] ; load the FILE pointer into r0
.text:00010704 BL fgetc
.text:00010708 STR R0, [R11,#var_18] ; save the character read (r0) into the local variable var_18
then the call to feof:
.text:0001070C LDR R0, [R11,#var_10] ; load the FILE pointer into r0
.text:00010710 BL feof
.text:00010714 MOV R3, R0            ; move the return value into r3
.text:00010718 CMP R3, #0            ; compare r3 with 0
.text:0001071C BEQ loc_10750         ; end-of-file indicator not set (r3=0): branch to loc_10750

Case r3=0 (we have not reached the end of the file)

This is the disassembly code for the case r3=0

.text:00010750 loc_10750 ; CODE XREF: ptrace_capt+D0#j
.text:00010750 SUB R3, R11, #-var_1C  ; r3 = address of var_1C
.text:00010754 LDR R0, [R11,#var_18]  ; r0 ← *(r11+var_18)
.text:00010758 LDR R1, [R11,#var_8]   ; r1 ← *(r11+var_8)
.text:0001075C MOV R2, R3             ; r2 = r3
.text:00010760 BL sub0

var_18 is the local variable that holds the character read, and var_8 is the index, which is 0 on the first iteration. So we have:

sub0(var_18, var_8, &var_1C);

In the following image we can see the code for the sub0 function

Translated into pseudo C code (inside sub0, var_8 holds the character, var_C the index and var_10 the pointer to var_1C):

if(var_C==0 || var_C==2)
{
  //loc_1060C
  *var_10=var_8|0x55;
}
else
{
  //loc_10620
  *var_10=(var_8^0x69) | (var_8<<3);
}

When sub0 returns, the following code is executed (remember that var_1C now contains the value computed by sub0):

.text:00010764 LDR R3, [R11,#var_1C]
.text:00010768 LDR R2, [R11,#var_C]
.text:0001076C ADD R3, R2, R3
.text:00010770 STR R3, [R11,#var_C]
.text:00010774 LDR R3, [R11,#var_8]
.text:00010778 ADD R3, R3, #1
.text:0001077C STR R3, [R11,#var_8]
.text:00010780 B loc_10700

We can write the corresponding pseudo C code

var_C = var_1C + var_C;
var_8++; //increment the index

Case r3!=0 (we reached the end of the file)

This is the disassembly code for the case r3!=0

.text:00010720 LDR R3, [R11,#var_C]
.text:00010724 LDR R2, =0x997
.text:00010728 CMP R3, R2
.text:0001072C BNE loc_10740
.text:00010730 LDR R3, =flag
.text:00010734 MOV R2, #1
.text:00010738 STR R2, [R3]
.text:0001073C B loc_10784
.text:00010740 loc_10740 ; CODE XREF: ptrace_capt+E0#j
.text:00010740 LDR R3, =flag
.text:00010744 MOV R2, #2
.text:00010748 STR R2, [R3]
.text:0001074C B loc_10784

Also in this case we can write the pseudo C code

if (var_C==0x997)
{
  flag=1;
}
else
{
  //loc_10740
  flag=2;
}

Finally, the code above shows the point where the flag variable is set; to solve the challenge we need flag=1.
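The whole ptrace_capt computation fits in a few lines of Python. This sketch assumes, based on the listings above, that the accumulator var_C starts at 0 (its initialisation is not shown), that every byte before end-of-file is passed to sub0 (feof only becomes true after fgetc has failed), and that the file has already passed the size check. The function names are hypothetical and simply mirror sub0 and the loop at loc_10700.

```python
def sub0(c, index):
    """Pseudo C of sub0: bytes at index 0 and 2 are OR-ed, the others mixed."""
    if index in (0, 2):
        return c | 0x55
    return (c ^ 0x69) | (c << 3)

def checksum(data):
    """Loop at loc_10700: var_C += sub0(char, index) for every byte read."""
    var_c = 0  # assumption: var_C is initialised to 0
    for index, c in enumerate(data):
        var_c += sub0(c, index)
    return var_c

def flag_for(data):
    """flag = 1 only if the sum equals 0x997, otherwise 2."""
    return 1 if checksum(data) == 0x997 else 2

for candidate in (b"bbbbb", b"bbbbJ", b"bbbbJ\n"):
    print(candidate, hex(checksum(candidate)), flag_for(candidate))
# b'bbbbb' 0xa3f 2
# b'bbbbJ' 0x997 1
# b'bbbbJ\n' 0xa0a 2
```

The last case shows why the trailing newline matters: a sixth byte 0x0a still passes the size check but adds 0x73 to the sum.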

First we must create the password.raw file and write 5 characters into it:

# vim password.raw
bbbbb

I use vim with settings that stop it from appending a trailing newline (LF); otherwise the newline would be read as a sixth character and added to the sum:

:set noendofline binary

Run the program

We need to run it under gdb, taking care of the ptrace check:

# gdb ./anti_dbg

Then we can set a breakpoint at 0x10678 and modify the value of r3 to bypass the ptrace check.

Now we can continue the analysis with gdb. My strategy is very simple: keep the content already written in the file (bbbbb), change only the last byte, and check whether flag becomes 1 (var_C=0x997).

To reach var_C=0x997 by changing only the fifth byte, we need to know the value of var_C after iteration 4.

Then we can set a breakpoint at the address 0x10774 (after the instruction var_C = var_1C+var_C)

From the image above, we can see that the index is 3 (iteration 4) and the value of var_C is 0x724. Let’s try to change the fifth byte so that var_C reaches 0x997.
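The value 0x724 can be checked by hand against the sub0 pseudo code, since the first four bytes are all 'b' (0x62): indices 0 and 2 take the OR branch, and indices 1 and 3 take the XOR/shift branch. The same arithmetic gives the amount the fifth byte must contribute.

```latex
\begin{aligned}
\texttt{0x62} \mid \texttt{0x55} &= \texttt{0x77} && \text{(indices 0, 2)}\\
(\texttt{0x62} \oplus \texttt{0x69}) \mid (\texttt{0x62} \ll 3) &= \texttt{0x0b} \mid \texttt{0x310} = \texttt{0x31b} && \text{(indices 1, 3)}\\
\mathit{var\_C} &= 2\cdot\texttt{0x77} + 2\cdot\texttt{0x31b} = \texttt{0xee} + \texttt{0x636} = \texttt{0x724}\\
\texttt{0x997} - \texttt{0x724} &= \texttt{0x273}
\end{aligned}
```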

I wrote a simple Python script (https://github.com/invictus1306/ARM-episodes/blob/master/Episode1/python_Script/antiDbgAlgho.py) to find the fifth byte:

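num = 0x997 - 0x724              # amount the fifth byte must add to var_C
for c in range(0x20, 0x7f):      # printable ASCII characters
  ref = (c ^ 0x69) | (c << 3)    # sub0 result for an index other than 0 or 2
  if ref == num:
    print("The number is " + hex(c))
print("End!")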

Run the python script

# python antiDbgAlgho.py
The number is 0x4a
End!

This gives the correct value for the fifth byte (0x4a, the character “J”), so now we can modify the file password.raw:

# vim password.raw
bbbbJ

Remember the setting that suppresses the trailing newline (LF):

:set noendofline binary

Launch the program

And the “Good” string is printed.

 
