You’ve seen it in a dozen movies: a character commits a crime, is ID’ed on security camera footage, then dyes her hair to alter her appearance in hopes of evading capture. The m.o. is the same for polymorphic malware—malicious software that’s constantly evolving or changing in order to evade signature detection or blacklisting solutions. Although it’s not a new addition to the hacker’s arsenal, the use of polymorphic malware has lately become a favorite and highly dangerous tactic of organized cyber crime groups.
The Fingerprint Analogy: Not a Perfect Match
The approach taken by our Entropy Analyzer doesn’t lend itself well to the “fingerprint” analogy, which is a better match for hash approaches. Computer forensics has long relied on hash values, which are produced by taking an arbitrary number of bytes from a file and computing a fixed-size number (the hash value) from them. With an MD5, SHA1 or SHA256 hash of a file, you’re dealing with a long string of numbers and letters that is, for practical purposes, unique to the contents of the file in question. The file name is just metadata for a file, so two files with the same contents and different file names will produce the same hash value.
It’s a routine tactic of hackers, therefore, to take a piece of source code, change one minor thing inside the file and then compile it to create a hash not known to signature databases. Recently a potential customer of Guidance Software in Asia took a program, changed a line of code, compiled it, and dared our sales engineer to spot the malware running on the network. With our Entropy Near-Match Analyzer, the engineer readily found all instances of it, which resulted in a shocked—and pleased—reaction from the potential customer. This was something that competitive products being evaluated in that situation simply could not do.
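The hash behavior described above can be sketched in a few lines of Python. This is only an illustration using the standard `hashlib` module; the file names and the stand-in payload are made up for the example and have nothing to do with any real sample.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hash of a file's contents as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, chosen only for illustration.
    original = Path(tmp) / "invoice.exe"
    renamed = Path(tmp) / "totally_different_name.bin"
    tweaked = Path(tmp) / "tweaked.exe"

    payload = bytes(range(256)) * 64  # stand-in for a compiled program

    original.write_bytes(payload)
    renamed.write_bytes(payload)                 # same contents, new name
    tweaked.write_bytes(payload[:-1] + b"\x00")  # last byte changed

    # The file name is metadata only, so the hashes agree.
    same_hash = file_sha256(original) == file_sha256(renamed)

    # A one-byte change yields a hash no signature database has seen.
    tweaked_differs = file_sha256(original) != file_sha256(tweaked)

print(same_hash, tweaked_differs)
```

The second comparison is the whole problem with exact-match signatures: the tweaked file is nearly identical to the original, yet its hash gives no hint of that.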
What Entropy Does and Why It’s So Fast
When you make a small change in some code and compile it, the orderliness of the file has barely changed, because you’ve altered only a small portion of it. That’s what Entropy measures: the orderliness in a file. It borrows a concept from thermodynamics, where entropy is a measure of the amount of disorder in a closed system.
While entropy in thermodynamics can only result in approximations because all possible states cannot be known, information entropy theory has the advantage of knowing the exact content of each file. This means that the number and probability of each state are known precisely.
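To make the idea concrete, here is a minimal sketch of Shannon entropy computed over the bytes of a file. It is an assumption that byte frequencies are the "states" being measured; the post does not say how the Entropy Analyzer actually defines them, so treat `shannon_entropy` as a textbook illustration rather than the product's method.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # exact count of every byte value present
    # Each byte value occurs with probability n / total, known precisely.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


highly_ordered = b"\x00" * 1024       # a single repeated byte value
evenly_spread = bytes(range(256)) * 4  # every byte value equally likely

print(shannon_entropy(highly_ordered))
print(shannon_entropy(evenly_spread))
```

A file made of one repeated byte scores at the bottom of the scale, and a file in which all 256 byte values are equally common scores at the top. Real executables fall somewhere in between.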
Entropy has all the benefits of a signature-based tool: it produces fairly accurate matches that computers can use as part of automation. It has a significant advantage over those tools, however: it’s less fallible or “brittle” than a signature, because it operates on a confidence level. You can tune it to return matches within a certain tolerance, for example.
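The idea of matching within a tolerance can be sketched as follows. This is a deliberately simplified illustration, not the Entropy Near-Match Analyzer's algorithm: it assumes a single byte-level entropy value per file, and `near_matches`, the `tolerance` parameter, and the sample data are all hypothetical. A single number like this would produce many false positives in practice.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())


def near_matches(sample: bytes, candidates: dict[str, bytes],
                 tolerance: float = 0.01) -> list[str]:
    """Names of candidates whose entropy is within `tolerance` of the sample."""
    target = shannon_entropy(sample)
    return [name for name, data in candidates.items()
            if abs(shannon_entropy(data) - target) <= tolerance]


known_bad = bytes(range(64)) * 256  # stand-in for a captured sample

variant = bytearray(known_bad)
variant[100] ^= 0xFF                # one byte altered, as in a tiny code change
variant = bytes(variant)

unrelated = b"A" * len(known_bad)   # a file with very different structure

candidates = {"variant.bin": variant, "unrelated.bin": unrelated}
hits = near_matches(known_bad, candidates, tolerance=0.01)
print(hits)
```

Widening `tolerance` catches more heavily modified variants at the cost of more unrelated files slipping in; narrowing it does the opposite.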
Indicators of compromise (IOCs) have been widely employed in malware detection because they’re less brittle than signatures. But if you have a sample of the malware in question from packet captures, for example, Entropy doesn’t require the creation of a separate definition. There are no if/then and and/or statements needed to describe the “known bad thing.” There’s no need to create a definition or a description. Just take a measurement—that’s all you have to do with Entropy.
And it’s fast. Entropy doesn’t need a lot of calculation time, because its cost grows in a more logarithmic than linear way as files get larger. With hashing, by contrast, the larger the file, the more time required to calculate the hash. You can learn more about Entropy here.
Comments? How are you finding and dealing with polymorphic malware? We welcome discussion in the section below, whether on this topic or on one you would like to see us write about here in the blog.