Playing silly buffers: How bad programming lets viruses in
Published: 15 Jan 2004 17:40 GMT
If the routine is tricked into writing too much to the buffer, the data it's storing will go off the end of one area and into the next -- potentially into the part where the routine will find its return address. At that point, all bets are off -- instead of returning safely to the code that called it, the routine will pass control to an address that was written in error. A crafty virus writer can force that address to correspond to their own, malicious code -- and they've won.
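To make the mechanism concrete, here is a toy model in Python. Python checks its own array bounds, so this is only a simulation: a bytearray stands in for a stack frame, and the 16-byte buffer, the 4-byte return address and the function names are all assumptions made for illustration rather than details of any real system.

```python
import struct

BUFFER_SIZE = 16        # hypothetical size of the routine's local buffer
RETURN_ADDR_SIZE = 4    # assume a 32-bit return address

def make_frame(return_address):
    """Build a simulated stack frame: the buffer, then the return address."""
    frame = bytearray(BUFFER_SIZE + RETURN_ADDR_SIZE)
    struct.pack_into("<I", frame, BUFFER_SIZE, return_address)
    return frame

def unchecked_copy(frame, data):
    """Copy data into the buffer without checking its length (the bug)."""
    for i, byte in enumerate(data):
        frame[i] = byte  # nothing stops i from running past BUFFER_SIZE

def read_return_address(frame):
    """Fetch the address the routine would return to."""
    return struct.unpack_from("<I", frame, BUFFER_SIZE)[0]

frame = make_frame(0x00401000)

unchecked_copy(frame, b"A" * 16)            # fits exactly
safe_result = read_return_address(frame)    # still 0x00401000

unchecked_copy(frame, b"A" * 16 + b"BBBB")  # four bytes too many
clobbered_result = read_return_address(frame)  # now 0x42424242, i.e. "BBBB"
```

The second copy is only four bytes too long, yet the routine would now "return" to whatever address those four bytes spell out.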
A classic buffer overflow trick is to fool something that's trying to interpret directory and file names. Say there's a rule in an operating system that no directory name can be more than 256 bytes long on a disk. The person who writes the routine may think that this means the buffer for the directory name also only needs to be 256 bytes long -- a reasonable assumption. But elsewhere in the operating system, there's a specification that says you can represent a character by an escape sequence, so ^84 is the same as the letter T -- strings that use that form of nomenclature will be three times as long. If the routine doesn't know to check for that, it can easily end up copying far more than 256 bytes into the buffer even though it's sticking to what the writer thought were the rules. The programmer could have chosen to check for an overflow by counting bytes -- but that would have involved some more programming, slowed the routine down and introduced more chances of error. At least, that would probably be the excuse: assumptions, ignorance and laziness are behind many buffer vulnerabilities.
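The arithmetic is easy to check with a short sketch. The escape format here is an assumption based on the ^84 example: a caret followed by a two-digit decimal character code. The helper names escape and unescape are hypothetical.

```python
MAX_NAME = 256  # the on-disk limit from the example above

def escape(name):
    """Write every character as ^ plus its two-digit decimal code (T -> ^84)."""
    return "".join("^%02d" % ord(ch) for ch in name)

def unescape(text):
    """Turn ^NN sequences back into single characters."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "^":
            out.append(chr(int(text[i + 1:i + 3])))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

legal_name = "T" * MAX_NAME             # a perfectly legal 256-character name
raw_input_form = escape(legal_name)     # the same name, written with escapes

fits_on_disk = len(unescape(raw_input_form)) <= MAX_NAME  # obeys the naming rule
fits_in_buffer = len(raw_input_form) <= MAX_NAME          # the byte count the routine skipped
```

The name is legal by the rule the programmer had in mind, but its escaped form is 768 bytes. Only the last comparison, which counts the bytes actually being copied, would have spotted that it does not fit in a 256-byte buffer.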
A virus writer uses all the above information. They know where on the stack the return address is, they know how big the buffer is and they know how far apart the two are. If they can fool the routine that writes to the buffer to write just that little bit more -- and arrange to have their own address copied at just the right place to overwrite the original return address -- they can take control of the computer. They can put their own malicious code in the buffer itself, and thus install and transfer control to a bad routine just by presenting the right data in the right way. They don't need to get user names, passwords or security privileges -- the operating system will think that the malicious code is being run under whoever's privileges were in use at the time.
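The three facts listed above (where the buffer is, how big it is, and how far it sits from the return address) are all that is needed to lay out the input. The sketch below is a harmless simulation with made-up numbers: the addresses, the 4-byte gap and the placeholder bytes are assumptions for illustration, and nothing is executed.

```python
import struct

# All values below are hypothetical, chosen only for illustration.
BUFFER_ADDRESS = 0x0012FF00   # where the buffer sits in memory
BUFFER_SIZE = 16              # how big the buffer is
GAP = 4                       # bytes between the buffer's end and the return address
ORIGINAL_RETURN = 0x00401000  # where the routine should have returned

def build_input(placeholder_code):
    """Lay out input so its last four bytes land exactly on the return address."""
    distance = BUFFER_SIZE + GAP
    if len(placeholder_code) > distance:
        raise ValueError("code does not fit before the return address")
    padding = b"." * (distance - len(placeholder_code))
    return placeholder_code + padding + struct.pack("<I", BUFFER_ADDRESS)

def simulate_call(data):
    """Copy data into a simulated frame with no length check, then
    report the address the routine would return to."""
    frame = bytearray(BUFFER_SIZE + GAP) + bytearray(struct.pack("<I", ORIGINAL_RETURN))
    for i, byte in enumerate(data):
        frame[i] = byte
    return struct.unpack_from("<I", frame, BUFFER_SIZE + GAP)[0]

honest_return = simulate_call(b"docs")                         # short, ordinary input
hijacked_return = simulate_call(build_input(b"PLACEHOLDER"))   # crafted input
```

With ordinary input the simulated routine returns where it should. With the crafted input it returns to the start of the buffer, which is exactly where the attacker's bytes were just copied.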
Various ways exist to catch this behaviour. One of the most common -- now included by Microsoft in Windows Server 2003 -- is to generate a very hard-to-guess number and put it in a place in memory with no connection to any vulnerable buffers. Whenever a routine is called, a copy of this number -- called a cookie by Microsoft or a canary by everyone else -- is put on the stack just before the return address. The routine that's called does its job as usual, but immediately prior to getting the return address from the stack it checks the canary against the reference copy. If something has overwritten the stack on the way to the return address, the canary will be destroyed and the routine knows not to try to return control but to stop the software with an error.
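The check can be modelled in a few lines. This is a simulation of the idea only, not Microsoft's implementation or any compiler's: the frame layout, the 4-byte canary and the names run_routine and StackSmashed are assumptions made for the sketch.

```python
import secrets

BUFFER_SIZE = 16
CANARY_SIZE = 4

# The hard-to-guess reference copy, kept well away from the buffer.
REFERENCE_CANARY = secrets.token_bytes(CANARY_SIZE)

class StackSmashed(Exception):
    """Raised instead of returning when the canary has been damaged."""

def run_routine(data, return_address=b"\x00\x10\x40\x00"):
    """Simulate one call: buffer, then canary, then return address."""
    frame = (bytearray(BUFFER_SIZE)
             + bytearray(REFERENCE_CANARY)
             + bytearray(return_address))

    # The routine does its job, with the same unchecked copy as before.
    for i, byte in enumerate(data):
        frame[i] = byte

    # Just before returning, compare the stack copy with the reference copy.
    stored = bytes(frame[BUFFER_SIZE:BUFFER_SIZE + CANARY_SIZE])
    if stored != REFERENCE_CANARY:
        raise StackSmashed("canary changed; refusing to return")
    return bytes(frame[BUFFER_SIZE + CANARY_SIZE:])

normal_return = run_routine(b"A" * BUFFER_SIZE)  # canary untouched, returns normally
```

An overflow long enough to reach the return address has to cross the canary first, so a call such as run_routine(b"A" * 24) would stop with StackSmashed unless the overflowing bytes happened to match the random value.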
This works well, for both malicious code and innocently written stack-trashing bugs. Because the canary is effectively random it's not possible for a virus to guess what number it's overwriting, and it doesn't affect the normal running of the code. However, there are still potential vulnerabilities -- if the reference copy of the canary can be changed by an exploit, then the mechanism can be bypassed, and it's also possible for the error reporting mechanism to be attacked.
There are other ways. Both AMD and Intel have said that they are adding hardware support to their processors to stop the exploitation of buffer overflows: in effect, adding the ability to make critical areas of memory incapable of holding code that will execute. The processor can read and write it as usual so a buffer overflow can happen, but if the compromised address tries to transfer control to within the buffer -- where the virus lives -- the processor will refuse and an error will be generated.
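In software terms, the hardware check behaves roughly like the sketch below. It is a loose model, not a description of how AMD or Intel implement it: the memory map, the addresses and the names Region, ExecutionFault and transfer_control are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    start: int
    size: int
    executable: bool  # the extra permission the processor would check

    def contains(self, address):
        return self.start <= address < self.start + self.size

class ExecutionFault(Exception):
    """Stands in for the error the processor would raise."""

# A hypothetical memory map: code may execute, the stack may not.
MEMORY_MAP = [
    Region("program code", 0x00400000, 0x1000, executable=True),
    Region("stack", 0x0012F000, 0x1000, executable=False),
]

def transfer_control(address):
    """Allow a jump only if the target lies in an executable region."""
    for region in MEMORY_MAP:
        if region.contains(address):
            if not region.executable:
                raise ExecutionFault(
                    "refused to execute at %#x in %s" % (address, region.name))
            return region.name
    raise ExecutionFault("address %#x is not mapped" % address)

legitimate_target = transfer_control(0x00400100)  # an ordinary return into code
```

Reading and writing the stack are unaffected in this model, so the overflow itself still happens. Only the final jump into the buffer, for example transfer_control(0x0012FF00), is refused.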
However, there are good reasons why executable code may want to live on the stack, so such a technique will not be universally applicable in the future. Likewise, while the canary technique catches a good many classes of vulnerabilities there are other places where buffers full of data and addresses for executable code live side by side, in existing software as well as in stuff that's yet to be written.
In the end, we can only say that more tools will exist to catch buffer overflow vulnerabilities or stop them from happening. Some will be in the operating system, some will be available for programmers to use if they wish. But good programmers have always been able to write code that is highly resistant to buffer overflows, while bad programmers will always be able to leave room for the unexpected case to cause unwelcome consequences. Poor programming, like poor people, will be with us always: education and higher standards will do as much to keep our buffers safe as anything else.







