What has happened to the segment registers?

16-bit days



There were days when computers had 16-bit registers and 20-bit addressable memory. That is a total of 1MB of memory – some claimed that it ought to be enough for anybody. The memory address space was flat and not protected by anything; this is now known as real mode. How was it possible to address 20-bit memory with only 16-bit registers?

Well, it was necessary to use two registers. One pointed to the high bits of the physical location – a segment; the other was a relative offset. The physical memory location was computed as segment*16 + offset. By the way, it wasn’t possible to access more than 64KB of data without reloading the segment register – that’s the maximum value for the offset.
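To make the arithmetic concrete, here is a small C sketch of the real-mode calculation. The function name real_mode_address is made up for this example, and the 20-bit mask assumes the wrap-around behaviour of the original 8086, which had only 20 address lines.

```c
#include <stdint.h>

/* Real-mode address translation: segment*16 + offset.
 * Assumption: the result wraps at 20 bits, as on the original 8086. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;   /* keep only the 20 address bits */
}
```

For instance, real_mode_address(0x1234, 0x0005) and real_mode_address(0x1000, 0x2345) both give 0x12345: many segment:offset pairs name the same physical byte.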




32-bit days

Years passed and we got 32-bit processors. The 16-bit AX became the 32-bit EAX, and memory isn’t flat any more (for simplicity, let’s skip the 286). Protected mode with paging and proper access restrictions replaced good old flat real mode. Using 32-bit registers you could address up to 4GB of RAM – that sounded like more than enough.


Almost no one cared about segment registers any more. The funny thing is that they are still there, deep in your CPU, but they are forgotten. For years they were used only by the OS kernel, but recently they started to play a completely new role. First let me explain how they behave in protected mode.



As opposed to general purpose registers, segment registers haven’t been extended to 32 bits. They still hold only a 16-bit value. In protected mode, instead of pointing to a segment’s memory location directly, they store a descriptor number (a selector), which points to memory via a Local Descriptor Table (LDT) data structure. From our point of view this data structure has two interesting fields: base_addr and limit. The former stores the address of the beginning of the segment, the latter sets the segment length. On Linux there is a syscall which allows access to this data structure on a per-process basis – modify_ldt (the BSDs and Mac OS X provide a similar call, i386_set_ldt). It doesn’t require any special permissions; every process can freely play with its own LDT descriptors (this is safe due to paging).
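The 16-bit value held in a segment register is usually called a selector, and it packs three fields: a table index, a bit choosing between the global table (GDT) and the per-process LDT, and a privilege level. The helpers below are a minimal sketch of that layout; their names are invented for this example.

```c
#include <stdint.h>

/* Layout of a protected-mode segment selector (16 bits):
 *   bits 15..3  index into the descriptor table
 *   bit  2      table indicator: 0 = GDT, 1 = LDT
 *   bits 1..0   requested privilege level (0 = kernel, 3 = user) */
unsigned selector_index(uint16_t sel)  { return sel >> 3; }
unsigned selector_is_ldt(uint16_t sel) { return (sel >> 2) & 1u; }
unsigned selector_rpl(uint16_t sel)    { return sel & 3u; }

/* Build a user-mode (ring 3) selector for LDT entry `index`. */
uint16_t make_ldt_selector(unsigned index)
{
    return (uint16_t)((index << 3) | (1u << 2) | 3u);
}
```

So a process that has filled in LDT entry 1 would load the value make_ldt_selector(1), that is 0x0F, into a segment register to start using it.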


Even though segment registers changed their meaning, they are still used by the processor: every access to memory is affected by the currently selected descriptor and the LDT data structure it points to.


By default, when a process starts, the segment registers point to descriptors that set up a flat view of the process memory. That’s how Unix processes work and what compilers assume.


But the flat view of memory is just a default; it can be modified on a per-process basis via the segment machinery.







Boxes of sand


The segmentation mechanism was recently rediscovered to do intra-process sandboxing. In other words – to run foreign binary code safely inside a trusted process.


Segmentation is a very nice mechanism to restrict memory access without the burden of the whole operating system machinery. To restrict the memory access of some code fragment to a certain “subspace” it is enough to modify descriptors via the modify_ldt syscall and run the code with the segment registers pointing at them. While running, such a confined code fragment cannot affect any code or data from its neighborhood inside the same OS process.
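The check that the hardware applies on every access can be modelled in a few lines of C. This is only a software sketch of the idea, not code that touches the real LDT: it assumes byte granularity and treats limit as the last valid offset, and the names struct segment and segment_translate are made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a data segment descriptor.
 * Assumption: byte granularity, and `limit` is the last valid offset. */
struct segment {
    uint32_t base_addr;  /* linear address where the segment starts */
    uint32_t limit;      /* highest offset that may be accessed */
};

/* Translate an access of `len` bytes at `offset` inside `seg`.
 * On success stores the linear address and returns true; returns
 * false where real hardware would raise a protection fault. */
bool segment_translate(const struct segment *seg, uint32_t offset,
                       uint32_t len, uint32_t *linear)
{
    if (len == 0 || offset > seg->limit || len - 1 > seg->limit - offset)
        return false;
    *linear = seg->base_addr + offset;
    return true;
}
```

The confined code only ever supplies offsets, so its address 0 is the first byte of its own segment, and no offset it can form reaches memory outside the base_addr..base_addr+limit window.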


Unfortunately a real sandbox implementation is more complicated – apart from setting up the LDT it needs to go through the machine code and check for processor instructions that can escape from the sandbox. For example, it’s necessary to disallow modification of segment registers. This technique leads to stable machinery which can be used to run arbitrary i386 machine code without noticeable performance degradation.
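As a toy illustration of such a check, the function below recognises a handful of i386 opcodes that load a segment register. It is a sketch under strong assumptions, not how Native Client or vx32 actually do it: the caller is assumed to have already located the instruction boundary and stripped any prefixes, the opcode list is deliberately incomplete (far jumps and calls, for example, also reload CS), and the name loads_segment_register is invented here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the instruction starting at `insn` (prefixes already
 * stripped, `len` bytes available) is one of a few i386 instructions
 * that load a segment register. The list is illustrative, not complete. */
bool loads_segment_register(const uint8_t *insn, size_t len)
{
    if (len == 0)
        return false;
    switch (insn[0]) {
    case 0x8E:              /* mov sreg, r/m16 */
    case 0x07:              /* pop es */
    case 0x17:              /* pop ss */
    case 0x1F:              /* pop ds */
    case 0xC4:              /* les */
    case 0xC5:              /* lds */
        return true;
    case 0x0F:              /* two-byte opcodes */
        if (len < 2)
            return false;
        switch (insn[1]) {
        case 0xA1:          /* pop fs */
        case 0xA9:          /* pop gs */
        case 0xB2:          /* lss */
        case 0xB4:          /* lfs */
        case 0xB5:          /* lgs */
            return true;
        }
        return false;
    }
    return false;
}
```

Scanning raw bytes without decoding instruction boundaries would not be enough, because the same byte values can appear as operands; that decoding step is the hard part a real checker has to get right.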



This idea is used by Google Native Client to run foreign binaries natively, despite the fact that they’re downloaded from an untrusted source. Apart from this browser-specific project, there’s also a very nice and lightweight library: vx32.




Perfect library


Vx32 is a simple library that allows untrusted binary code to run natively inside your application. It was created especially to be embedded. The API is designed very well and reading the code is a pleasure. Enough compliments though, let’s present an example skeleton which can run a simple ‘hello world’ untrusted executable:


int run_elf(char *elf_filename) {
    int result = 0;
    struct vxproc *p = vxproc_alloc();
    vxproc_loadelffile(p, elf_filename, ...);

    for (;;) {
        int rc = vxproc_run(p); // run the binary until it traps
        switch(rc) {
        case VXTRAP_SYSCALL:
            // the guest's syscall number arrives in EAX
            switch(p->cpu->reg[EAX]) {
            case VXSYSWRITE:
                ... handle the syscall ...
                break;
            ...
            }
            break;

        default:
            // handle other traps - like segmentation fault
            printf("vxproc_run trap %#x\n", rc);
            result = -1;
            goto out;
        }
    }

out:
    vxproc_free(p);
    return result;
}


Full source code of this example is on my github.


But don’t get it wrong: you can’t directly run just any executable. The main trick used in vx32 is to compile the source of the foreign code against a modified libc which has a special implementation of the syscall function. The normal syscall() implementation won’t work.
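On the host side, the “handle the syscall” step receives raw register values from untrusted code, so every pointer and length has to be checked against the sandbox’s memory before it is used. Here is a minimal sketch of that check, assuming the guest’s memory is visible to the host as one contiguous buffer; guest_write and its parameters are hypothetical names for this example and not part of the vx32 API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Handle a write-like syscall from the guest: copy `len` bytes starting
 * at guest address `addr` to `out`. Returns the number of bytes written,
 * or -1 if the range is not fully inside guest memory. */
long guest_write(const uint8_t *guest_mem, size_t guest_size,
                 uint32_t addr, uint32_t len, FILE *out)
{
    /* Overflow-safe form of: addr + len <= guest_size */
    if (len > guest_size || addr > guest_size - len)
        return -1;
    return (long)fwrite(guest_mem + addr, 1, len, out);
}
```

Writing the comparison as addr > guest_size - len rather than addr + len > guest_size matters: the guest chooses both values, and a sum that wraps around would otherwise slip past the check.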






Summary


Techniques used in vx32 allow an untrusted binary to run inside your trusted process with:
  • negligible performance penalty
  • restricted memory access
  • fault isolation: segmentation fault inside a binary doesn’t affect the main process


Effectively, vx32 sandboxing introduces yet another abstraction layer in computing. It can be described as a ‘process’ inside an operating system process. This is a very powerful tool.




Vx32 could form a basis for future experiments with ‘processes’ of different granularity, which provide the comfort of true isolation without sacrificing efficiency. Think about:

  • application scripting, including Excel-like macros or SQL stored procedures that could be written by users in C
  • implementation of OS primitives in user space – a normal process could become a new kernel
  • providing isolation where it is badly needed, for example when running cryptographic functions




Disclaimer


Although vx32 was created with security in mind, it has never been audited. No one can guarantee that the authors haven’t missed something. It’s even possible that the bugs found in NaCl apply to it.




More about segmentation


There are a lot of very interesting articles on segmentation:




by marek on 31/03/10
  1. Gates never said it: http://www.wired.com/politics/law/news/1997/01/1484

  2. Nice post! The link to “bugs found in NaCL” is broken, though. Here’s a working version: http://code.google.com/p/nativeclient/issues/detail?id=86



  5. How can it be an “untrusted binary” if it’s been compiled with the special libc(3)? Did I miss something? Why not simply provide the special syscall() as part of vx32? Unless the foreign code was statically linked, you should be OK there.

  6. I don’t understand how you can make sure the untrusted code doesn’t just change the segment register back. Sure you can scan through it beforehand looking for instructions that do this, but once it’s running can’t it just synthesize some? Like this:

    int badcode[10];
    badcode[0]=0x1; // constants here would
    badcode[1]=0x2; // be malicious machine code
    ((void (*)())badcode)(); // run it

  7. Doc:

    The code must be statically linked. The loader doesn’t support dynamic linking, and I wouldn’t trust it even if it did. I’m pretty sure that it’s possible to dynamically rewrite normal syscalls to run vx32 syscall machinery. Though, I don’t think that was the goal of this library.

    It would be much more problematic to play nicely with other glibc features – like thread local storage on fs/gs register.
    http://www.nynaeve.net/?p=180 I’m pretty sure that TLS implementation is impossible on VX32.

    The idea of “untrusted binary” is that it can be compiled by third parties. But to make it runnable it has to be compiled against special libc.

  8. ed409:

    There is machinery which checks for bad commands. You can’t play with segment registers, you can’t run dynamic code, and you can’t jump to a dynamic offset.

    That will crash the vx32 process.

  9. I’m looking forward to a system that can run dynamic code. (In principle, IIUC, vx32’s core design won’t need changing to cope.)

 
 

