Playing silly buffers: How bad programming lets viruses in
Published: 15 Jan 2004 17:40 GMT
One of the most prevalent types of attack on networked computers is the buffer overflow. Searching for such vulnerabilities in Microsoft's Knowledge Base returns many hundreds of examples -- it's the mechanism of choice for the discerning malware merchant. New hardware and software techniques are reducing the incidence of this perennial problem, but it's unlikely ever to go away completely.
So, what is a buffer and why does it overflow? In the simplest terms, a buffer is an area of memory used to pass data between different devices or parts of software. Sometimes a buffer can be a physical memory chip -- printers and video cards have dedicated buffers, where the processor sends the data to be output -- but most often it's a temporary area in system memory set aside by one piece of software exchanging data with another. It's this kind of buffer that comes under attack, and there are hundreds if not thousands of them created and destroyed constantly as you use any piece of software.
What looks to you like a single program, such as Word, is in fact made up of many components, known as routines, each a tiny program in its own right doing a single job. In a word processor, for example, there may be one routine that colours a block of text, another that underlines it, and yet another that recognises a block of text as a valid URL. These can be combined for many different uses, like automatically highlighting a URL as you type one into a document.
As you type, whenever a space is entered the previous word is passed to the URL detector. If that finds the word is in fact a valid address, then the word processor will invoke the block colour and block underline code, so the URL appears as such on screen. Of course, the same is-this-a-URL? piece of code can be used by lots of other programs, such as email, database, and even Web browsing software -- and each one that uses it must create and use a buffer in the same way.
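A rough sketch of that hand-off in C may make it more concrete. Everything here is an assumption made for illustration: the names `looks_like_url` and `check_last_word`, the 64-byte buffer size, and the deliberately crude test for what counts as a URL. It is not how any real word processor does the job.

```c
#include <string.h>

#define WORD_MAX 64  /* assumed buffer size, chosen only for this sketch */

/* Hypothetical detector: treats a word as a URL if it starts with a
   familiar prefix. A real detector would be far more thorough. */
int looks_like_url(const char *word)
{
    return strncmp(word, "http://", 7) == 0 ||
           strncmp(word, "www.", 4) == 0;
}

/* Hypothetical editor hook, called when a space is typed. `text` is what
   has been typed so far, without the new space. The last word is copied
   into a temporary buffer and passed to the detector.
   Returns 1 if the word looks like a URL, 0 otherwise. */
int check_last_word(const char *text)
{
    char word[WORD_MAX];                    /* the buffer */
    const char *start = strrchr(text, ' '); /* find the last space, if any */
    size_t len;

    start = start ? start + 1 : text;
    len = strlen(start);

    /* Refuse any word that will not fit, terminator included. */
    if (len >= sizeof word)
        return 0;

    memcpy(word, start, len + 1);
    return looks_like_url(word);
}
```

Calling `check_last_word("see http://example.com")` hands the word `http://example.com` to the detector by way of the buffer. The length check before the copy is the part that matters: it stops an oversized word from ever reaching the buffer.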
All code running on a computer has access to an area of memory called the stack. The processor has various special instructions designed to handle the stack and its contents quickly and efficiently, and it can handle addresses within the stack much faster than those of any old random location. The stack is also automatically carried across when one routine uses -- calls -- another one, so it's where buffers are built and filled.
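In C, a buffer built on the stack is typically nothing more than a local array. The sketch below assumes two hypothetical routines, `label_length` and `fill_label`, and an arbitrary 32-byte buffer. The first routine builds the buffer, and the second fills it.

```c
#include <stdio.h>

/* Hypothetical callee: fills the caller's buffer with a short label.
   snprintf never writes more than `size` bytes, terminator included. */
int fill_label(char *out, size_t size, int n)
{
    return snprintf(out, size, "item-%d", n);
}

/* Hypothetical caller: `label` is a local array, so on typical systems it
   lives on the stack and exists only while this routine is running.
   Returns the label's length, or -1 if it did not fit. */
int label_length(int n)
{
    char label[32];  /* the buffer, built when the routine is entered */
    int written = fill_label(label, sizeof label, n);

    if (written < 0 || (size_t)written >= sizeof label)
        return -1;
    return written;
}
```

Each call to `label_length` creates a fresh buffer and discards it on return. That is the constant creating and destroying of buffers that goes on as software runs.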
The stack also automatically contains a return address: when one routine calls another, the second routine needs to know where it was called from so it can hand control back. It's this use of the stack for both data and address information that makes buffer overflows such a tempting target. Often, the return address is very close to the buffer -- exactly how close depends on many things, but is usually the same for a particular routine.
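That closeness can be pictured with a simplified model. The struct below is not a real stack frame, since actual layouts depend on the processor, the compiler and its settings. It is a hypothetical stand-in, with an assumed 16-byte buffer, that places a buffer directly in front of a return address.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16  /* assumed size, for illustration only */

/* Simplified, hypothetical model of one routine's slice of the stack. */
struct frame_model {
    char  buffer[BUF_SIZE];  /* data area the routine fills */
    void *return_address;    /* where control goes when the routine ends */
};

/* Distance in bytes from the start of the buffer to the return address
   in this model. It is fixed when the code is compiled, which mirrors the
   way the gap is usually the same for a particular routine. */
size_t bytes_to_return_address(void)
{
    return offsetof(struct frame_model, return_address);
}

/* Copy `input` into the model's buffer only if it fits, terminator
   included. Returns 1 on success, 0 if the input was rejected. */
int fill_buffer(struct frame_model *f, const char *input)
{
    size_t len = strlen(input);

    if (len >= sizeof f->buffer)
        return 0;  /* too long: leave the frame untouched */
    memcpy(f->buffer, input, len + 1);
    return 1;
}
```

In this model the return address sits exactly `BUF_SIZE` bytes past the start of the buffer. A copy that skipped the length check in `fill_buffer` would carry on writing into that neighbouring field.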






