Saturday, August 06, 2011

tavis ormandy's sophail presentation

at the black hat security conference this year tavis ormandy presented his research into the way an anti-virus product (namely sophos') operates, in order to shine a light on something that seems like it would be an important subject to consider when judging the efficacy of a product. the paper associated with the presentation can be found here.

tavis was not particularly kind in his evaluation of the code (which he apparently performed by reverse engineering the product). sophos' response is very measured and diplomatic, which is pretty much the perfect way (from a PR perspective) to respond to the kind of criticism being leveled at them. as usual, however, i don't have to be diplomatic.

tavis' paper betrays a conceit that i suspect is more common in those who break things than in those who make things. developers, upon dealing with someone else's code, inevitably learn an important lesson: the code tells you what the software does, but it doesn't tell you why it does it that way. tavis thinks he knows all he needs to know, but he only had the code to go by, so when it comes to why certain things were done the way they were, the only thing he could reasonably do was make educated guesses. in some cases those guesses may well have been quite good, but in others they were not.

i first realized this was going to be the case on the second page of the paper, where he describes how weak the encryption used on the signatures is - often just an XOR with an 8-bit key. if you were to guess, as tavis seems to have, that such encryption is there to protect the signatures from an attacker, you'd be dead wrong. the primary purpose encrypting signatures serves is to prevent other anti-virus products from raising an alarm on the signature database (something that used to happen in the very early days).
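for illustration, here's a minimal sketch of that kind of single-byte XOR obfuscation (the key and signature bytes are made up, and this is obviously not sophos' actual code). applying the same XOR twice restores the original, so it's trivially reversible - which is fine, because stopping attackers was never the point:

    # toy illustration of single-byte XOR obfuscation - not sophos' actual code
    def xor_obfuscate(data: bytes, key: int) -> bytes:
        # XOR every byte with the same 8-bit key; doing it twice round-trips
        return bytes(b ^ key for b in data)

    signature = b"\x55\x8b\xec\x83\xec\x10"    # hypothetical signature bytes
    stored = xor_obfuscate(signature, 0xA5)    # what would sit in the database
    assert xor_obfuscate(stored, 0xA5) == signature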

on page 3 it's mentioned that the heavy use of CRC32 in the signatures makes it easy to maliciously create false alarms, by crafting files that have the same CRC32 values a particular signature is looking for, in the same places. now i ask you, the reader: if someone is maliciously planting files on your network that are designed to raise an alarm, is that alarm really false? it may be a false identification, but there really is something malicious going on that an administrator needs to investigate.
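the reason such forgeries are feasible is that CRC32 isn't a cryptographic hash. it's an affine function over GF(2), so equal-length inputs obey a simple XOR identity, and hitting a specific checksum is a matter of linear algebra rather than brute force. a small self-contained demonstration (the byte strings are made up):

    import zlib

    # for equal-length inputs, crc32(a ^ b ^ c) == crc32(a) ^ crc32(b) ^ crc32(c)
    a = b"clean file bits!"
    b = b"other clean bits"
    c = b"payload goes h3r"
    forged = bytes(x ^ y ^ z for x, y, z in zip(a, b, c))

    assert zlib.crc32(forged) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)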

also on page 3 he criticizes the quality of the signatures by stating they appear to ignore the context of the programs they come from, and that they're for irrelevant, trivial, or even dead code. perhaps tavis expanded on this in his live presentation, but the paper doesn't make clear whether or not he actually looked at the malware samples the signatures were supposed to detect. if he didn't, then the criticism about ignoring context would be particularly ironic. let's assume, then, that he did. how many malware samples did he examine? if only a handful, there's a not insignificant chance he was dealing with bad examples that aren't really representative of the overall signature quality. did he ensure that his samples were actually malware? did he ensure that his samples were being identified by the right signatures? his previous criticism (on the same page!) about false identifications should highlight the fact that he may have been looking at the wrong code when judging the quality of the signatures. but more important than that, there isn't a 1-to-1 relationship between signatures and samples. one signature may be intended to detect many (tens, hundreds, even thousands of) related samples - and however pointless tavis may think those sections of the malware code are, they may represent the optimal commonality between all those samples.
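to illustrate that last point with a toy example: a signature generator hunting for bytes shared by an entire family can easily land on code a human analyst would dismiss as trivial or dead, simply because that's what the variants happen to have in common (the 'samples' below are fabricated):

    def common_slices(samples, width=8):
        # fixed-width byte slices present in every sample (naive approach)
        first = samples[0]
        candidates = {first[i:i + width] for i in range(len(first) - width + 1)}
        return {c for c in candidates if all(c in s for s in samples[1:])}

    family = [b"...junk...DEADCODE...more...",
              b"zzDEADCODEqq",
              b"DEADCODE plus other variant bytes"]
    print(common_slices(family))   # {b'DEADCODE'} - the optimal commonality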

around about page 8 or so, tavis makes a point of highlighting the fact that the emulation the product does, in order to let malware reveal its unencrypted/unpacked form, only goes for about 500 cycles. this highlights a failure to understand one of the core problems in malware detection: the halting problem. for any other criterion the code might look for in deciding it has seen enough, there's no guarantee that criterion will ever be encountered on a particular input program. there has to be a hard-coded cut-off point or the process runs the risk of never stopping - and that would severely impact the usability of the AV software. likewise, if the hard-coded cut-off point isn't reached soon enough, that also impacts the usability of the AV software.
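a toy interpreter makes the trade-off concrete (everything below is illustrative - no real emulator looks like this, and 500 is just the figure from the paper):

    MAX_CYCLES = 500   # the kind of hard-coded budget being criticized

    def emulate(program, budget=MAX_CYCLES):
        # run a toy program; return whether it halted and how many cycles it used
        pc = cycles = 0
        while pc < len(program) and cycles < budget:
            op, arg = program[pc]
            pc = arg if op == "jmp" else pc + 1
            cycles += 1
        return pc >= len(program), cycles

    print(emulate([("nop", 0)] * 10))   # (True, 10) - halts inside the budget
    print(emulate([("jmp", 0)]))        # (False, 500) - would spin forever without a cut-off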

there may yet be other examples of poor guesswork that i didn't see. my own knowledge of why certain design choices might be made is limited, as i've never actually built an anti-virus product. i have considered it, of course, since i am a maker of things. perhaps tavis ormandy would benefit from more experience making things rather than breaking them. perhaps this was an unorthodox expression of the oft-repeated concept that the skills needed to attack are not the same as the skills needed to defend.

6 comments:

Anonymous said...

A minor point regarding the CRC problem - I think that both Tavis and you have missed it. The fact that CRC checksums can easily be forged is indeed a problem (theoretically) in using them for malware detection. The problem, however, is not that an attacker can create a file that has a CRC you already use and plant it on your system, as both Tavis and you surmise. The problem is that the attacker can create a piece of malware that has the same CRC as that of a well-known and widespread legitimate program. Then, when the AV program puts the CRC for the malware in its database, it could suddenly create thousands of false positives all over the world. Normally, such things are found during testing, but SNAFUs do happen from time to time.

The reason why I said that it was a theoretical problem is that during my 23-year career as an anti-virus researcher, I have yet to see a malware author use this attack.

Another minor point, about the cut-off of the emulator. If the Sophos scanner really has a static cut-off of 500 cycles for the emulator, that isn't very smart. The problem is not only that it is too short; the main problem is that it is static. For instance, our emulator uses a default cut-off of 2 million instructions - but the detection language in the database can control the emulator: a detection entry for a malware variant can say "oh, and if you found this sequence of bytes while emulating, keep emulating for X more cycles". In other words, the cut-off is dynamic, not static.
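A rough sketch of what such a dynamic cut-off could look like (the numbers and names below are invented for illustration; this is not F-Prot's actual engine):

    def emulate(step, markers, base_budget=2_000_000, extension=500_000):
        # step(cycle) returns the bytes "written" on that cycle (toy model);
        # spotting a marker sequence mid-emulation extends the budget
        budget = base_budget
        cycle = 0
        while cycle < budget:
            output = step(cycle)
            if any(m in output for m in markers):
                budget = max(budget, cycle + extension)   # "keep going for X more"
            cycle += 1
        return cycle

    # toy run: a marker appears at cycle 100 and buys 5000 more cycles
    ran = emulate(lambda c: b"MZ" if c == 100 else b"",
                  markers=[b"MZ"], base_budget=1_000, extension=5_000)
    print(ran)   # 5100 - well past the original 1000-cycle budget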

kurt wismer said...

the CRC collision with good files seems like a difficult attack to mount, as i imagine it would be hard to predict what part of the file the AV vendor will calculate a CRC of.

good insight about the emulation, though. thanks for that. the dynamic approach you describe does indeed make sense when something fishy is found. i didn't really have any frame of reference for judging what's large enough when nothing fishy is found, though.

Paul Ducklin said...

As Vesselin Bontchev surmises, the Sophos emulator doesn't have a static cutoff of 500 cycles. Like F-Prot's - and, I suspect, most other code emulators in most other decent threat-detection products - our emulator is controllable at runtime, scan-by-scan.

In fact, one of the reasons Ormandy's paper shows only a small number of older executable packers being handled by hard-wired code in our product (e.g. UPX, PECompact) - something he assumes is a reason for criticism - is that we use the emulator to assist with the bulk of our unpacking needs.

That generally takes a _lot_ more than 500 CPU cycles :-)

As Kurt says in the main article, the "halting problem" says we may end up emulating for ever, with no result. So it's no good just leaving an emulator to run "to see what happens".

But, as Vesselin suggests, you can run the emulator in a controlled way, giving it more and more rope only if the results so far seem to justify it.

Very loosely put, the greater your certainty you're on the track of something interesting, the more millions of instructions you give to the emulator.

Bruce Thompson said...

I wouldn't be concerned about collisions with good files, my concern would be creating false negatives. If it's relatively easy to spoof the CRC32, then it's also easy to obfuscate it. A single bit change could result in the signature no longer matching. This, to me, is the inherent weakness of any checksum based detection scheme.

To put it another way, a CRC32 match on a known bad actor is a good indication that you've found that bad actor, or you've found something that for some mysterious reason is masquerading as that bad actor. Finding no matches for anything known tells you precisely nothing about what it is you're looking at.
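A quick, self-contained demonstration of the single-bit point (the 'malware' bytes below are made up):

    import zlib

    sample = b"known bad actor bytes"                   # hypothetical malware region
    patched = bytes([sample[0] ^ 0x01]) + sample[1:]    # flip a single bit

    print(hex(zlib.crc32(sample)))    # the checksum a signature would record
    print(hex(zlib.crc32(patched)))   # different value - the signature no longer matches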

kurt wismer said...

@Bruce Thompson
"If it's relatively easy to spoof the CRC32, then it's also easy to obfuscate it. A single bit change could result in the signature no longer matching. This, to me, is the inherent weakness of any checksum based detection scheme."

it's the weakness of all known malware detection systems regardless of how they're implemented - once you change a known malware it is no longer the same malware and thus no longer qualifies as "known".

however, since known malware detection was only ever appropriate for known malware, it has never been appropriate to rely on it exclusively. it has always been necessary to complement the known malware detection algorithms with methods for detecting unknown malware. there are a variety of reasons why unknown malware detection methods can't be relied on exclusively either.

AL said...

One of the other considerations to bear in mind is, if you like, one of the 'dirty little secrets' of the industry - that of having to deal with ridiculous tests of our products. There exist thousands (perhaps millions) of corrupt/dead/intended/junk files which are only relevant to people with large collections of 'malware' for the purposes of testing.

The way that many companies (by no means all) deal with this is to simply add simple signature-based detections for those samples. CRC is a fast method for a certain proportion of those, and although it's hardly robust, nor is it likely that a CRC taken across some specific parts of those files will be all that problematic.

This comes down ultimately to the problem of verification. Testing of AV is a lost cause if you don't verify the samples. No tester truly verifies all the samples; ergo, all testing is next to useless. Notwithstanding that, people continue to test our products against piles of useless files - therefore, the only solution is, to some extent, to 'cheat' by adding - en masse - detection for such useless crap.

The low-impact way (from a user perspective) is to use 'signatures'. This gives the testers the illusion that they are having some sort of real impact on the confidence that users have in AV products, and gives AV vendors time to get on with the real job of dealing with actual malware (for which task you require emulators and much more modern methods than CRCs across a file). I seriously doubt that Sophos (or any other decent AV company) is using CRCs alone as their frontline protection - if they were, they'd be out of business.
As to the emulation, Vess already covered that perfectly.