Having fun with signal handlers
Posted on Sat 21 November 2020 in programming
As every C and C++ programmer knows far too well, if you dereference a pointer that points outside of the space mapped in your process's memory, you get a segmentation fault and your program crashes. As far as the language itself is concerned, you don't get a second chance and you cannot know in advance whether that dereferencing operation is going to set off a bomb or not. In technical terms, you are invoking undefined behaviour, and you should never do that: you are responsible for knowing in advance whether your pointers are valid, and if they are not, you get to keep the pieces.
However, it turns out that most real operating systems give you a second chance, although with a lot of fine print attached. So I tried to implement a function that tries to dereference a pointer: if it can, it gives you the value; if it can't, it tells you it couldn't. Again, I stress this should never happen in a real program, except possibly for debugging (or for having fun).
The prototype is
word_t peek(word_t *addr, int *success);
The function is basically equivalent to return *addr, except that if addr is not mapped it doesn't crash, and if success is not NULL it is set to 0 or 1 to indicate that addr was not mapped or mapped, respectively. If addr was not mapped the return value is meaningless.
I won't explain it in detail to leave you some fun. Basically the idea is to install a handler for SIGSEGV: if the address is invalid, the handler is called, which basically fixes everything by advancing the instruction pointer a little, in order to skip the faulting instruction. The dereferencing instruction is written as hardcoded assembly bytes, so that I know exactly how many bytes I need to skip. Of course this is very architecture-dependent: I wrote the i386 and amd64 variants (no x32). And I don't guarantee there are no bugs or subtleties!
Another solution would have been to just parse /proc/self/maps before dereferencing and check whether the pointer is in a mapped area, but it would have suffered from a TOCTTOU problem: another thread might have changed the mappings between the time when /proc/self/maps was parsed and the time when the pointer was dereferenced (also, parsing that file can take a relatively long time). Another less architecture-dependent, but still not pure-C, approach would have been to call setjmp before attempting the dereference and to longjmp back from the signal handler (but again you would need to use different setjmp contexts in different threads to exclude race conditions).
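For reference, here is a minimal sketch of what the /proc/self/maps check could look like. It is not part of my implementation: the function name is_mapped_readable is made up for illustration, it only works on Linux, it checks just the first byte of the word, and it has exactly the TOCTTOU problem described above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if addr falls inside a readable mapping
 * listed in /proc/self/maps, 0 if it does not, -1 if the file cannot be
 * opened. The answer may already be stale when the caller uses it. */
static int is_mapped_readable(const void *addr) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    uintptr_t target = (uintptr_t) addr;
    char line[512];
    int at_line_start = 1;
    int found = 0;

    while (fgets(line, sizeof line, maps)) {
        /* Very long lines arrive in several chunks: parse only the first. */
        int parse = at_line_start;
        at_line_start = strchr(line, '\n') != NULL;
        if (!parse)
            continue;

        /* Each entry starts with "start-end perms", addresses in hex. */
        uintptr_t start, end;
        char perms[5];
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s",
                   &start, &end, perms) != 3)
            continue;

        if (target >= start && target < end) {
            found = (perms[0] == 'r');
            break;
        }
    }
    fclose(maps);
    return found;
}
```

On a typical Linux system I would expect this to report the address of a local variable as readable and something like (void*)0x1234 as not mapped, but every call reads and parses the whole file, which is the slowness mentioned above.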
Have fun! (and again, don't try this in real programs)
EDIT I realized I should specify the language for source code highlighting to work decently. Now it's better!
EDIT 2 I also realized that my version of peek has problems when there are other threads, because signal actions are per-process, not per-thread (as I initially thought). See the comments for a better version (though not perfect).
#define _GNU_SOURCE
#include <stdint.h>
#include <signal.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <ucontext.h>
#ifdef __i386__
typedef uint32_t word_t;
#define IP_REG REG_EIP
#define IP_REG_SKIP 3
#define READ_CODE __asm__ __volatile__(".byte 0x8b, 0x03\n" /* mov (%ebx), %eax */ \
".byte 0x41\n" /* inc %ecx */ \
: "=a"(ret), "=c"(tmp) : "b"(addr), "c"(tmp));
#endif
#ifdef __x86_64__
typedef uint64_t word_t;
#define IP_REG REG_RIP
#define IP_REG_SKIP 6
#define READ_CODE __asm__ __volatile__(".byte 0x48, 0x8b, 0x03\n" /* mov (%rbx), %rax */ \
".byte 0x48, 0xff, 0xc1\n" /* inc %rcx */ \
: "=a"(ret), "=c"(tmp) : "b"(addr), "c"(tmp));
#endif
/* SIGSEGV handler: skip the hardcoded instructions in READ_CODE, so that
 * execution resumes right after them with tmp still set to 0. */
static void segv_action(int sig, siginfo_t *info, void *ucontext) {
    (void) sig;
    (void) info;
    ucontext_t *uctx = (ucontext_t*) ucontext;
    uctx->uc_mcontext.gregs[IP_REG] += IP_REG_SKIP;
}

struct sigaction peek_sigaction = {
    .sa_sigaction = segv_action,
    .sa_flags = SA_SIGINFO,
    .sa_mask = 0,
};
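word_t peek(word_t *addr, int *success) {
    word_t ret;
    int tmp, res;
    struct sigaction prev_act;

    /* Temporarily install our handler (process-wide!) */
    res = sigaction(SIGSEGV, &peek_sigaction, &prev_act);
    assert(res == 0);

    /* tmp becomes 1 only if the load did not fault */
    tmp = 0;
    READ_CODE

    /* Restore the previous SIGSEGV action */
    res = sigaction(SIGSEGV, &prev_act, NULL);
    assert(res == 0);

    if (success) {
        *success = tmp;
    }
    return ret;
}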
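int main(void) {
    int success;
    word_t number = 22;
    word_t value;

    /* word_t is 32 or 64 bits wide depending on the architecture, so cast
     * it to a fixed type that matches the printf format. */
    value = peek(&number, &success);
    printf("%d %llu\n", success, (unsigned long long) value);
    value = peek(NULL, &success);
    printf("%d %llu\n", success, (unsigned long long) value);
    value = peek((word_t*)0x1234, &success);
    printf("%d %llu\n", success, (unsigned long long) value);
    return 0;
}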
Comments
-
MatthiasU said, on 2020-11-22 14:08:22+01:00:
Nice, though I'd rearrange the interface of peek to be bool peek(word_t *addr, word_t *&value). That way you can directly put it in an if() without resorting to the comma operator.
-
Krunch said, on 2020-11-23 07:08:05+01:00:
Consider the following (from memory, untested):
word_t peek(word_t *addr, int *success) {
    word_t ret;
    errno = 0;
    access(addr, F_OK);
    if (success) *success = (errno != EFAULT);
    if (errno != EFAULT) ret = *addr;
    return ret;
}
I first learnt about this trick in http://www.gcu-squad.org/2005/08/non-ce-n-est-pas-sale%21/.
-
Giovanni Mascellani said, on 2020-11-26 08:40:00+01:00:
MatthiasU, this is probably sensible for general usage. In my case, I didn't really want to distinguish the case of reading 0 from failing to read, so I just modified the function to always return 0 on failure and that was ok for me.
-
Giovanni Mascellani said, on 2020-11-26 08:45:00+01:00:
Krunch, that's clever, good idea! It suffers from a TOCTTOU problem, because another thread might change the mapping between access and the dereferencing operation.
However I also realized that my version deals badly with multithreading too, because the signal action is per-process, not per-thread. So if the program makes non-trivial usage of the SIGSEGV action in another thread, things will go bad if you use my version.
So for the moment I don't have a fully satisfying solution to this problem, but it seems that the access-based solution you provided is the best compromise (the exposure window is very small).