Finding Binary Clones with Opstrings & Function Digests: Part I

Reverse engineering is an invaluable engineering tool.


July 01, 2005
URL:http://www.drdobbs.com/finding-binary-clones-with-opstrings-fu/184406152

July, 2005: Finding Binary Clones with Opstrings & Function Digests: Part I

Andrew is a software litigation consultant who works on the technical aspects of cases involving copyright, patents, trade secrets, antitrust, and privacy. He can be contacted at http://www.undoc.com/.


Stop me if you heard this one. It's his first night, and the new guy hears the other inmates shouting out numbers. "1375" shouts one; the others explode with laughter. Another yells "3811," and there are chuckles of recognition. The new guy asks his cellmate what's up, and learns that everyone else has been there so long, they know all the jokes by heart, and rather than repeat them, they just assigned them numbers, and now just tell the number instead of the joke. Next night, trying to fit in, our hero pulls a number off the top of his head, shouts out "2342"—and it falls flat. Nothing, it's dead out there tonight. His cellmate informs him that he told it wrong, or that some people don't know how to tell a joke, or that no one gets it or..., well, you get the idea. Same joke really, regardless of the punchline.

In this joke, a number arbitrarily assigned to a joke is somehow as expressive as the joke itself, like humming along to the hexdump of an MP3 file. The joke also suggests that a joke (or a story or article) is probably not all that unique; it's just a number, whatever. Folklorists have standard indices of motifs and tale types in which, for example, "Cruel Stepmother" is S31, "magic object received from fairy" is D813, and "identification by fitting of shoes" is H36.1. Though not by design, traditional stories are put together out of such motifs, which are, in a sense, low-level design patterns.

And so, to software: It would be useful, when looking at a program, to immediately know which parts of it are truly unique to that program, and which are "boilerplate," that is, stereotyped material that appears, more or less verbatim, time and again. Much software is cloned from other software. One study found even expert programmers producing an average of four code "clones" per hour (http://tlau.org/research/papers/M.Kim-EthnographicStudyofCopyPaste-ISESE.pdf). If each common low-level design pattern had an assigned number, we could answer questions about the code's uniqueness or lack thereof, about how self-similar it is (that is, how much of the code appears more than once in the entire system), or about how one part of it relates to another, or how one version relates to the previous version.

It should be possible to identify software clones and boilerplates, even without the source code. Take a Microsoft Windows XP CD. Microsoft says that XP consists of 45 million lines of code. You probably don't have the source code, but this CD contains about 310 MB of binary code files. (The binary code on the CD relates to the source code, perhaps as the number of a joke relates to the joke itself.)

It sounds like questions about a program should be unanswerable without the source code. Because the source code is boiled away, as it were, in the process of compilation, trying to recover the source code from the binary program would be like trying to reconstruct a cow out of a hamburger. One of the premises of the Open Source movement is, of course, precisely that something is missing ("Where's the beef?") when programs are distributed without their source code.

But given that the processor actually has to run the binary code, this code can't—at least in a practical, good-enough sense—be all that much of a closed book, at least, any more than source code often is (on the false dichotomy between source and object code, see http://www-2.cs.cmu.edu/~dst/DeCSS/object-code.txt). In fact, examining binary object code may sometimes have advantages over examining source code.

The goal of this and subsequent articles is to construct a tool for building a database of function signatures or fingerprints as found in Windows XP and in major Windows applications such as Microsoft Office. A similar database has been proposed for Java code (http://citeseer.ist.psu.edu/baker98deducing.html). "Signature" here refers not in a C++ sense to the specification of a function, but rather to some characterization of the implementation of a function. With such a database, we should be able to:

These articles focus largely on creating a tool that lets you do these things; the actual "doing" appears in a subsequent article. However, a few glimpses of function database applications are in order.

Figure 1 is an excerpt from a function database built from all of the Win32 code files on the XP CD. Using Microsoft's symchk utility, PDB files were downloaded from the Microsoft Symbol Server for only a few of the files; a PDB provides names of internal functions in its corresponding Win32 code file. Here, there's a routine named _SHATransformNS@8 in dlimport.exe, dxmasf.dll, and (toward the bottom of Figure 1) wmvcore2.dll. The second column shows function signatures, and the many identical signatures in that column show that over 30 files in the system contain this same piece of code. For example, the function at address 58F4C6BD in this version of wmvdmod.dll is _SHATransformNS@8. (Many of these files have been identified as comprising Windows Media Player; see http://www.microsoft.com/mscorp/legal/eudecision/faq.asp.) The entire database of over half-a-million signatures was sorted (in reverse numeric order), so that matches like this would jump out from even a glance at the database file. The first column shows the size of the code, not in bytes but in terms of those elements of the code (the choice of which is the main subject of these articles) used to construct the signature; the binary code happens to be 2882 bytes and disassembles to over 1000 lines of assembler code.

Clearly, this code is part of the Secure Hash Algorithm (SHA). (It is probably fairly similar to the code at http://www.cc.utah.edu/~nahaj/c/sha/sha.c.html, except that a brief inspection of the Microsoft code shows that the numbers 0x5a827999, 0x6ed9eba1, and 0x8f1bbcdc appear over and over, a sure sign that the loops have been unrolled, which explains the very large size of this function.) Apart from showing that this code is duplicated 30 times within XP, Figure 1 also illustrates how, with a function database, debug symbols for one file can provide a name for a function in an entirely different file.

As another application, I queried for signatures representing large functions residing in both EXE and DLL files. This reveals functions in DLLs whose general usefulness is indicated by the same code's appearance in EXE files; but the EXE has its own complete copy of the code instead of linking to the DLL. Figure 2 shows one example.

There are routines inside untfs.dll, identical copies of which appear in the three EXEs; this suggests that these routines probably ought to be exported from the DLL, documented, and/or linked to by any application files that need this functionality. (Untfs.dll contains functions related to the NTFS filesystem; the three EXEs are filesystem-conversion utilities intended to be run early in the OS boot sequence, possibly before DLLs can be loaded.)
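As a rough illustration of this kind of query, the following Python sketch takes a handful of rows in the layout used by Figures 1 and 2 and reports the signatures that occur in at least one EXE and at least one DLL. The helper names, the min_size threshold, and the parsing rules are assumptions made for this example; this is not the tool described in these articles.

```python
from collections import defaultdict

# A few rows copied from Figures 1 and 2: "size . signature  file - name".
ROWS = """
577 . B647FA858CFFF1A967CF4E651E36564E   autofmt.exe - fn_0104E67B
577 . B647FA858CFFF1A967CF4E651E36564E   autoconv.exe - fn_01053928
577 . B647FA858CFFF1A967CF4E651E36564E   autochk.exe - fn_010504BC
577 . B647FA858CFFF1A967CF4E651E36564E   untfs.dll - fn_5B02B866
744 . 4050EB94076376FEF53CEFF95777A784    dxmasf.dll - _SHATransformNS@8
744 . 4050EB94076376FEF53CEFF95777A784    wmvcore.dll - fn_4F57FD4D
"""


def parse_row(line):
    """Split one database row into (size, signature, file, function name)."""
    left, name = line.split(" - ", 1)
    size, _dot, signature, filename = left.split()
    return int(size), signature, filename.lower(), name.strip()


def shared_by_exe_and_dll(text, min_size=100):
    """Map each large signature seen in both an EXE and a DLL to its files."""
    files_by_signature = defaultdict(set)
    for line in text.strip().splitlines():
        size, signature, filename, _name = parse_row(line)
        if size >= min_size:
            files_by_signature[signature].add(filename)

    shared = {}
    for signature, files in files_by_signature.items():
        extensions = {f.rsplit(".", 1)[-1] for f in files}
        if {"exe", "dll"} <= extensions:
            shared[signature] = sorted(files)
    return shared


for signature, files in shared_by_exe_and_dll(ROWS).items():
    print(signature, ", ".join(files))
```

With these rows, only the untfs.dll signature qualifies; the _SHATransformNS@8 rows are skipped because every file listed for them here is a DLL.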

In my work, I've used the techniques described here to help uncover possible copyright infringement by first filtering out noninfringing similarities. If you compare every line in every file of even the most unrelated and noninfringing C/C++ source code projects, you will find matching lines such as return 0; or int i;. In Windows source code, you would find identical boilerplate references to wndclass.lpszMenuName = szAppName;. Such boilerplate code must be ignored when comparing source code. There's an analogous situation when comparing binary code: You need to filter out startup code, runtime library (RTL) code such as printf, wizard code, and so on.

Having an extensive function database makes it easy to create this exclusion list. This database can be created in part simply by compiling one's own programs with different compilers and linkers, different optimization switches, and so on, and with debugging symbols enabled, so that the function database contains meaningful names. Figure 3 is an example of some Microsoft C RTL function signatures.

Were functions with these signatures encountered in the course of a binary copyright-infringement examination, you would know to exclude them from consideration. They are boilerplate code.

Signatures like those in Figure 3 can also be used when disassembling or debugging. The _strlen line at the end of that excerpt (set in bold in the original) means, for instance, that any function whose signature (computed as described in these articles) is 8D71CE1651AD88B045B36FB9EBE0109F, is in fact a copy of a particular Microsoft implementation of _strlen. All the functions in Figure 4, otherwise nameless, are now known to be _strlen.
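The lookup itself is little more than a dictionary access. The Python sketch below uses three signatures from Figure 3 as the "known" table and applies them to a few otherwise nameless entries taken from Figures 1 and 4. The names KNOWN, UNNAMED, and label_functions are invented for this illustration.

```python
# Signature -> name, taken from the symbol-bearing hello.exe rows in Figure 3.
KNOWN = {
    "8D71CE1651AD88B045B36FB9EBE0109F": "_strlen",
    "22ADC7AEE6DCB10FDFA978A5C4DD7387": "_strcat",
    "A60BAC1A238AB85A3BE509678459B543": "_strncpy",
}

# (file, placeholder name, signature) for functions without debug symbols.
# The first two rows come from Figure 4, the last one from Figure 1.
UNNAMED = [
    ("aclayers.dll", "fn_715CB360", "8D71CE1651AD88B045B36FB9EBE0109F"),
    ("defrag.exe", "fn_0100D8E0", "8D71CE1651AD88B045B36FB9EBE0109F"),
    ("srrstr.dll", "fn_5C02CE8B", "94E0CD7A8AD1A53BCFE9CB60A0214F0F"),
]


def label_functions(functions, known):
    """Replace a placeholder name whenever the signature is already known."""
    return [
        (filename, known.get(signature, placeholder))
        for filename, placeholder, signature in functions
    ]


for filename, name in label_functions(UNNAMED, KNOWN):
    print(f"{filename:14} {name}")
```

Entries whose signature is not in the table simply keep their placeholder name, so the same pass can be rerun as the table grows.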

This same idea of using function signatures to provide names for disassembly and debugging is used in IDA Pro's "FLIRT" (Fast Library Identification and Recognition Technology; http://www.datarescue.com/idabase/flirt.htm), and has been discussed in the context of code decompilation (http://www.itee.uq.edu.au/%7Ecristina/tr-2-94-qut.ps). When reverse engineering a program, knowing that some piece of code is actually _strlen or _printf will usually mean that the code need not be disassembled. Reverse engineering is at least as much about knowing what to ignore as it is about taking stuff apart.

Assisting with code disassembly is only a minor side benefit of the database discussed here, however. Our real goal is to create high-level models of systems such as Windows—models that describe what parts of the system most closely belong together, for example—based on nothing more than its binaries. A disassembler is run in the course of building the database, but its output is used by another program and then discarded, not viewed by a human.

File Fingerprints

Where will the signatures in a function database come from? Of course, a hash or digest for a file can easily be generated from the string of bytes that make up that file, using an algorithm such as MD5. This digest is typically much smaller than the file itself (hardly surprising in a digestive process), yet generally works as a fingerprint or signature uniquely representing the much larger file. "Generally," because MD5 collisions have been found such that two different pieces of data can produce the same digest (http://www.doxpara.com/md5_someday.pdf).

Any program that generates MD5 hashes will report that C3C3864DA698F0CC1BE56F9695534DD8 is the digest for a 421-KB Windows file, \windows\system32\LegitCheckControl.dll, Version 1.0.0132.4. Sure enough, typing "C3C3864DA698F0CC1BE56F9695534DD8" into Google results in several hits to pages mentioning LegitCheckControl.dll. (In the absence of a central Microsoft server of file signatures, Google acts as a decent one.) The 32-character MD5 string is a boiled-down representative of the 421-KB file.

Canonically, any MD5 program reports that D41D8CD98F00B204E9800998ECF8427E is the digest for any 0-byte file. Typing "D41D8CD98F00B204E9800998ECF8427E" into Google results in thousands of hits to web pages that note this is the digest of the zero-length string. (Though the Solaris Fingerprint Database reports that 10,247 different files match this fingerprint.) Empty files, like happy families, are all alike.
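For readers who want to reproduce such digests, here is a minimal Python sketch using the standard hashlib module. The function names md5_of_bytes and md5_of_file are just labels chosen for this example.

```python
import hashlib


def md5_of_bytes(data):
    """Return the MD5 digest of a byte string as 32 uppercase hex characters."""
    return hashlib.md5(data).hexdigest().upper()


def md5_of_file(path, chunk_size=65536):
    """Digest a file in chunks, so that large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


# The digest of zero bytes of input, as quoted in the text.
print(md5_of_bytes(b""))  # D41D8CD98F00B204E9800998ECF8427E
```

Calling md5_of_file on any empty file gives the same value, since the digest depends only on the bytes and not on the filename.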

If you run a utility such as Microsoft's File Checksum Integrity Verifier (fciv) on the \windows\system32 directory, it spits out an MD5 for every file in the directory. The MD5 appears on the line before the filename, so the resulting output can be sorted by MD5 rather than by filename. Glancing at the sorted output reveals any matching MD5s, which represent duplicate files appearing under different names, as in Figure 5.
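The "glance at the sorted output" step can also be automated. The following Python sketch groups lines in the digest-then-path layout of Figure 5 and keeps only digests that occur more than once. The sample lines are copied from Figure 5; the function name and the assumption that each line is a digest, whitespace, and a path are choices made for this example.

```python
# Lines copied from Figure 5: an MD5 digest followed by a path.
FCIV_OUTPUT = r"""
0ccaf5f4111f7366588d3840b0067dc2 \windows\system32\igfxrjpn.lrc
0ce97f997318122e726fc94e56f3cbf3 \windows\system32\SOFTPUB.DLL
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmp.ocx
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmpcd.dll
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmpcore.dll
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmpui.dll
0cfd77715e899e9fde1db92e64a4a897 \windows\system32\secedit.exe
0d143112394173967a3647096f74e743 \windows\system32\C_037.NLS
"""


def duplicate_files(text):
    """Map each digest that appears more than once to its sorted paths."""
    paths_by_digest = {}
    for line in text.strip().splitlines():
        digest, path = line.split(None, 1)
        paths_by_digest.setdefault(digest.lower(), []).append(path.strip())
    return {
        digest: sorted(paths)
        for digest, paths in paths_by_digest.items()
        if len(paths) > 1
    }


for digest, paths in duplicate_files(FCIV_OUTPUT).items():
    print(digest)
    for path in paths:
        print("   ", path)
```

Because the grouping uses a dictionary, the input does not need to be sorted first.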

This is somewhat useful, but the slightest difference results in a nonmatch. Comparing MD5 file digests is like a straightforward binary comparison with the cmp utility, except that MD5s are more easily transported than the files themselves.

As a workaround for this over-exactness, Microsoft has suggested running its DumpBin utility with command-line options that ignore date/time stamps, and then comparing the resulting text output with a diff utility (http://support.microsoft.com/kb/q164151/). Microsoft also has a BinDiff (one of many programs with this same name) for this same purpose. However, these methods still treat the file as one big clump.

Any binary file can be turned into some sort of a text file (even if only a hexdump), then examined with a text utility. You may therefore want to know why, if we want to compare DLLs or EXEs, we don't use diff or some other text-based tool. One reason is that diff compares entire lines, and (for reasons that will soon become clear) when comparing textual representations (such as disassemblies or hexdumps) of code files, we would only want to compare parts of a line, ignoring the rest of the line or treating it as a wildcard. A selective diff tool is easily built with languages such as Perl or Awk. A tool to slice a text file (extracting, say, only fields 3 and 4 from every line) is also easily built, and would help construct another tool that, indirectly via disassembly listings, allows two DLL or EXE files to be compared. Also, there are already binary diff utilities that will try to find the smallest possible set of edits that would turn one binary file into another (presumably a later version of the first); this idea is used, for example, in PocketSoft's RTPatch (http://www.pocketsoft.com/whitepapers/rtpwhite.pdf).
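To make the "slice, then compare" idea concrete, here is a small Python sketch. It compares two listings line by line on selected whitespace-separated fields only. The sample lines are taken from the disassemblies in Figures 7 and 8; the function names and the 1-based field numbering are assumptions for this example, and a pairwise comparison like this is much cruder than a real diff, which also handles inserted and deleted lines.

```python
# Three corresponding lines from Figures 7 and 8:
# field 1 = address, field 2 = code bytes, field 3 = mnemonic, field 4 = operand.
LISTING_A = [
    "00401004 6A01                   push    1",
    "00401006 6830504000             push    405030h",
    "0040100B 8B4508                 mov     eax,[ebp+8]",
]
LISTING_B = [
    "004010A7 6A01                   push    1",
    "004010A9 6838894000             push    offset 408938h",
    "004010AE 8B4508                 mov     eax,[ebp+8]",
]


def slice_fields(line, fields):
    """Keep only the requested 1-based fields of a whitespace-separated line."""
    parts = line.split()
    return tuple(parts[i - 1] for i in fields if i <= len(parts))


def selective_diff(a_lines, b_lines, fields):
    """Return the indices of line pairs that differ on the selected fields."""
    return [
        index
        for index, (a, b) in enumerate(zip(a_lines, b_lines))
        if slice_fields(a, fields) != slice_fields(b, fields)
    ]


print(selective_diff(LISTING_A, LISTING_B, fields=(1, 2, 3, 4)))  # [0, 1, 2]
print(selective_diff(LISTING_A, LISTING_B, fields=(3, 4)))        # [1]
print(selective_diff(LISTING_A, LISTING_B, fields=(3,)))          # []
```

Comparing whole lines reports every line as different because the addresses differ; restricting the comparison to the mnemonic field reports none.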

However, the goal here is not to find either the differences or the similarities between two files. In essence, the goal is to know, for a given file, which parts of it also appear in any of thousands of other files. This is not a job for diff or cmp.

Function Fingerprints

If digests can be created for entire files, then why not for the parts of a file? The file's digest, then, would represent (though isn't computed from) the set of digests of its component parts.

Nearly everyone with a computer, if only because they have either felt or been compelled to turn over their credit-card number to receive an antivirus update, is familiar with the idea of virus signatures (http://ftp.cerias.purdue.edu/pub/papers/sandeep-kumar/kumar-spaf-scanner.pdf and http://www.cs.wisc.edu/wisa/papers/issta04/issta.pdf). Often these signatures are verbatim transcriptions of a portion of a virus, where that portion (which may be data, such as a malicious text message rather than code), hopefully belongs uniquely to the virus and nothing else. e7iqom5JE4z is part of most signatures for the AnnaKournikova virus, for instance, because that is the name of a function in the virus's VBScript code. Increasingly, virus signatures are less literal, allowing for wildcards in the match.
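The difference between a literal signature and one with wildcards can be sketched in a few lines of Python. The pattern syntax here (hex byte pairs, with ?? standing for any one byte) is an assumption made for this illustration and is not the format of any particular antivirus product. The sample bytes are not from a virus; they are the opening bytes of the two harmless code fragments shown in Figures 7 and 8.

```python
import re


def compile_signature(pattern):
    """Compile a pattern such as '68 ?? ?? 8B' into a bytes regular expression.

    Each token is either two hex digits (a literal byte) or '??' (any byte).
    """
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL lets '.' match every byte value, including 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


# Opening bytes of the functions in Figures 7 and 8.
CODE_FIG7 = bytes.fromhex("558BEC516A0168305040008B450850")
CODE_FIG8 = bytes.fromhex("558BEC516A0168388940008B450850")

literal = compile_signature("68 30 50 40 00")
wildcard = compile_signature("6A 01 68 ?? ?? ?? ?? 8B 45 08")

print(bool(literal.search(CODE_FIG7)), bool(literal.search(CODE_FIG8)))
print(bool(wildcard.search(CODE_FIG7)), bool(wildcard.search(CODE_FIG8)))
```

The literal pattern finds only the first fragment, while the wildcard pattern finds both, because the four bytes it skips are exactly the ones that differ.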

Rather than locate an infection in a file, we will want to break a file down into its component parts and identify each of these parts. Still, there is some similarity between function signatures and virus signatures, particularly those that focus more on a virus's structure than on its literal bytes. It is sad that, of the little empirical study of software that takes place (our field is much more focused on adding to the pile than on understanding what we already have), so much of it is focused on the viral work product of underemployed 17-year-old boys. One of the goals of the function database is to show that antimalware techniques can be extended to examine plain commercial code.

There are numerous ways that we might split a binary file into component parts. As one example, an article by Kris Coppieters describes a BinDiff utility (DDJ, May 1995) that compares two binary files based on a dynamically selected file delimiter. This idea might be adopted so that, in a Windows executable, bytes corresponding to common instructions, such as MOV or PUSH, might make useful delimiters, marking off sections of more-interesting code in the way that '\n' marks off lines in a text file; the byte representing a function return (RET) might also be used as a delimiter.

However, when dealing with Windows software, the structure of the file is already known. Windows binary software resides in Portable Executable (PE) files. The file format is documented (http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx), and is supported by numerous tools, including the one I'll rely on here, Clive Turvey's DumpPE (http://www.tbcnet.com/~clive/dumppe.htm). DumpPE includes a disassembler that locates function starts (though only to a lesser extent, function ends) in a Windows program, and outputs the code for that function.

What we want, it seems, is simple—an MD5 digest for each function in a file. (I ignore the file's data, such as strings, resources, and so on.)

As stated, though, this would be of little use, because the code in a binary program is usually position dependent. For example, a compiler and linker may transform the simple C source code for func in Figure 6 into the Intel x86 code represented in either Figures 7 or 8, depending on the location of func within the entire program; this location depends on factors unrelated to the contents of func.

The second column of these disassembly listings (produced with DumpPE) shows the literal bytes of code that will be executed by the microprocessor. While a brief glance shows that Figures 7 and 8 are, as you would hope, darn similar, the bytes that differ (set in bold in the original: the operand of the second push and the address used by the call) show that the same source code in Figure 6, compiled in the same way with the same settings, results in slightly different bytes of code, depending on the function's location in a file.

Forget that you've seen the source code; assume the binary code is all you have. And try to ignore the fact that it's easy, when you're looking right at them, to see the similarities between Figures 7 and 8. Think, instead, of the code in Figures 7 and 8 appearing as only two out of, say, three-million pieces of code. How then would you match them up?

Were a string to be constructed from the literal bytes in Figure 7 (55,8BEC,51,6A01,6830504000,...), it would differ from one constructed from the literal bytes in Figure 8 (55,8BEC,51,6A01,6838894000,...). MD5 digests generated from these strings would, of course, also differ. Digesting such strings, therefore, would not be helpful in attempting to ID either function against a database of known functions, or to match the two functions with each other.
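This is easy to confirm. The Python sketch below transcribes the second column of Figures 7 and 8, digests the literal bytes of each function, and lists the instruction encodings that differ. The variable and function names are chosen for this example only.

```python
import hashlib

# Code bytes per instruction, transcribed from the second column of Figure 7.
FIG7 = ["55", "8BEC", "51", "6A01", "6830504000", "8B4508", "50", "6A00",
        "FF1594404000", "8945FC", "837DFC01", "75E4", "8BE5", "5D", "C3"]

# The same column from Figure 8.
FIG8 = ["55", "8BEC", "51", "6A01", "6838894000", "8B4508", "50", "6A00",
        "FF1524A24000", "8945FC", "837DFC01", "75E4", "8BE5", "5D", "C3"]


def literal_digest(groups):
    """MD5 over the exact bytes of the function, as uppercase hex."""
    code = bytes.fromhex("".join(groups))
    return hashlib.md5(code).hexdigest().upper()


differing = [(a, b) for a, b in zip(FIG7, FIG8) if a != b]

print(literal_digest(FIG7) == literal_digest(FIG8))  # False
print(differing)
```

Only 2 of the 15 instructions differ, and both are 36 bytes long overall, yet the two digests have nothing visibly in common. A digest of the literal bytes reports "different" without any notion of "nearly the same."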

What is wanted, then, is a way to characterize a function, without using the exact bytes that make up the function. Ideally, you would like the smallest possible representation of the function that uniquely distinguishes it from all other functions, but that at the same time encompasses minor variations of itself.

Calling an MD5 digest a "signature" or "fingerprint" is a misnomer in a way because, while smaller than the original, it is boiled down or digested from the entire original, all of its parts; whereas we normally think of a signature or fingerprint (except perhaps a so-called "DNA fingerprint") not as some sort of homunculus of its owner, but as a small subset that stands in for them.

As one example of a subset that represents the whole, in a copyright-infringement case years ago involving software for Windows 3.x, I used the sequence of Windows API calls made by a function. While Windows API usage is heavily stereotyped with a small subset of the entire API set employed over and over in expected ways, in this case, one program used a sequence of obscure and/or undocumented Windows API calls, and it could be seen from the import table of the other file, even without disassembling, that it too used this same highly unusual sequence of API calls (http://www.sonic.net/~undoc/apple_ms.txt). In other words, the sequence of API calls issued by a function worked as a signature or fingerprint for the function itself. It would be too limiting to restrict ourselves to Windows API calls as the basis for function signatures, but this shows that some subset of a function can be extracted to characterize the function itself.

Besides sequences of unusual API calls, a function might be characterized using sequences of unusual assembly language instructions, or typical instructions used in unusual ways, or simply by ignoring the handful of instructions such as MOV that make up the bulk of code. Other techniques focus on the branch targets in a function, or the branches and calls that it makes. The general point is to move as far away from the literal binary code as possible, to avoid false negatives (missing cases where the overlying source code is the same or a trivial cut-and-paste modification), but not so far that there are unacceptably many false positives (ostensible matches found even when the overlying source code does not match).
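As one crude illustration of moving away from the literal bytes, the Python sketch below keeps only the instruction mnemonics from a DumpPE-style listing, drops MOV, and digests what is left. This is emphatically not the method developed in these articles; it is an assumption-laden toy (the function names, the choice to ignore only MOV, and the comma-joined string are all invented here) meant to show the trade-off between false negatives and false positives.

```python
import hashlib

FIG7 = """
00401000                    fn_00401000:       ; Xref 00401052 00401060
00401000 55                     push    ebp
00401001 8BEC                   mov     ebp,esp
00401003 51                     push    ecx
00401004                    loc_00401004:      ; Xref 0040101E
00401004 6A01                   push    1
00401006 6830504000             push    405030h
0040100B 8B4508                 mov     eax,[ebp+8]
0040100E 50                     push    eax
0040100F 6A00                   push    0
00401011 FF1594404000           call    dword ptr [MessageBoxA]
00401017 8945FC                 mov     [ebp-4],eax
0040101A 837DFC01               cmp     dword ptr [ebp-4],1
0040101E 75E4                   jnz     loc_00401004
00401020 8BE5                   mov     esp,ebp
00401022 5D                     pop     ebp
00401023 C3                     ret
"""

# Figure 8 differs only in addresses and in two operands.
FIG8 = (FIG7.replace("6830504000", "6838894000")
            .replace("405030h", "offset 408938h")
            .replace("FF1594404000", "FF1524A24000"))


def mnemonics(listing, ignore=frozenset({"mov"})):
    """Return the mnemonics of a listing in order, skipping ignored ones."""
    result = []
    for line in listing.strip().splitlines():
        fields = line.split()
        # Label lines look like "00401004 loc_00401004: ; Xref ..."; skip them.
        if len(fields) < 3 or fields[1].endswith(":"):
            continue
        if fields[2] not in ignore:
            result.append(fields[2])
    return result


def shape_digest(listing):
    """MD5 of the comma-joined mnemonic sequence, as uppercase hex."""
    text = ",".join(mnemonics(listing))
    return hashlib.md5(text.encode("ascii")).hexdigest().upper()


print(",".join(mnemonics(FIG7)))
print(shape_digest(FIG7) == shape_digest(FIG8))
```

For brevity, FIG8 is derived from FIG7 by substituting the operand bytes that differ in Figure 8; the addresses are left alone because the sketch never reads them. The two functions now digest identically, but so would any other function that happens to push six values, call something, compare, branch, and return, which is the false-positive risk described above.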

We want the function's shape, its edges, not the function itself. As a hint of how I do this in the next installment of this article, look back at Figures 7 and 8, but try looking at them from a different perspective. Think about slicing a section through the code.

DDJ

744 . ECD500E1A7A5B8EF213539EB00EA7DB7    cmpbk32.dll - jmp_GetOSVersion
744 . D2ED9FAAF554B24E15CA26F0AF33B6F4    riched20.dll - off_74E50E08
744 . C56930F2796C78605C6399317B2E83AC    dmintf.dll - off_6CA231EA
744 . 63215108C3F4923AD61AFE8766A3C265    opengl32.dll - off_5ED6ECBA
744 . 1048805DE9ABC21DC0D3FB4733C00C01    user32.dll - off_77D7E38A
744 . 4050EB94076376FEF53CEFF95777A784    cewmdm.dll - fn_6F64F43D
744 . 4050EB94076376FEF53CEFF95777A784    dlimport.exe - _SHATransformNS@8
744 . 4050EB94076376FEF53CEFF95777A784    drmclien.dll - fn_41B1F4CD
744 . 4050EB94076376FEF53CEFF95777A784    drmv2clt.dll - fn_5186C71D
744 . 4050EB94076376FEF53CEFF95777A784    dxmasf.dll - _SHATransformNS@8
744 . 4050EB94076376FEF53CEFF95777A784    encdec.dll - fn_55DF218D
744 . 4050EB94076376FEF53CEFF95777A784    licdll.dll - fn_55AA6A36
744 . 4050EB94076376FEF53CEFF95777A784    moviemk.exe - fn_6008A35D
744 . 4050EB94076376FEF53CEFF95777A784    mspmsp.dll - fn_6008087D
 ... [more like this, including mssap.dll, msscp.dll, mstlsapi.dll,
          msvidctl.dll, mswmdm.dll, qasf.dll, rdpwd.sys, shmedia.dll...] 
744 . 4050EB94076376FEF53CEFF95777A784    wmv8dmod.dll - fn_58FB23FD
744 . 4050EB94076376FEF53CEFF95777A784    wmvcore.dll - fn_4F57FD4D
744 . 4050EB94076376FEF53CEFF95777A784    wmvcore2.dll - _SHATransformNS@8
744 . 4050EB94076376FEF53CEFF95777A784    wmvdmod.dll - fn_58F4C6BD
744 . 4050EB94076376FEF53CEFF95777A784    wmvdmoe.dll - fn_58E8813D
743 . E84B4BF7E900767D5E2E33DFDF1B7C5B    d3dim700.dll - fn_7396CE20
743 . 90786D0DD5B4C01419F0D8938D1CF3D1    wmvcore2.dll - fn_56038355
743 . 94E0CD7A8AD1A53BCFE9CB60A0214F0F    srrstr.dll - fn_5C02CE8B

Figure 1: Excerpt from function database: _SHATransformNS@8 and duplicates throughout XP.

577 . B647FA858CFFF1A967CF4E651E36564E   autofmt.exe - fn_0104E67B
577 . B647FA858CFFF1A967CF4E651E36564E   autoconv.exe - fn_01053928
577 . B647FA858CFFF1A967CF4E651E36564E   autochk.exe - fn_010504BC
577 . B647FA858CFFF1A967CF4E651E36564E   untfs.dll - fn_5B02B866
 ...
143 . 8860A6DD56DF723412AA7ABFBE77BF4B   autofmt.exe - fn_0102C4A0
143 . 8860A6DD56DF723412AA7ABFBE77BF4B   autoconv.exe - fn_01032DDE
143 . 8860A6DD56DF723412AA7ABFBE77BF4B   autochk.exe - fn_0102E7E4
143 . 8860A6DD56DF723412AA7ABFBE77BF4B   untfs.dll - 
        ?QueryLcnFromVcn@NTFS_EXTENT_LIST@@QBEEVBIG_INT@@PAV2@1@Z
 ...

Figure 2: untfs.dll contains NTFS-related code that is duplicated in auto*.exe.

199 . FDC0933BEF8FEF46BBC801CEF9044EC7  hello.exe - ___sbh_heap_check
164 . FDE6365C1BD96441A9D11FEF5DE3C179  hello.exe - ___sbh_resize_block
140 . A60BAC1A238AB85A3BE509678459B543  hello.exe - _strncpy
129 . B0C0A7FB8E2DF2F7395C12B104823590  hello.exe - __read
113 . 5AC42E0A19DEA1A23BCA486032420B87  hello.exe - __ioinit
111 . 45287CFEDD4D9AAE9C8DE16AE174CF22  hello.exe - __setmbcp
98 . 156CEDE43BBF5A9FB890C2FD86906AEC  hello.exe - __write
89 . 31C50F435876A5BF29E58B7E05EE530A  hello.exe - ___crtGetEnvironmentStringsA
72 . 22ADC7AEE6DCB10FDFA978A5C4DD7387  hello.exe - _strcat
66 . 53AB78B2C866220949F86393731E0048  hello.exe - __XcptFilter
45 . DA586DD1E4B4F00BAAEC323B76E38299  hello.exe - __close
44 . FFE7B3C3E94F188C3BD3CB0785C81740  hello.exe - __setenvp
42 . 826B40AD485B9C7EB8961AC07A0CCD68  hello.exe - ___crtGetStringTypeA
41 . 8D71CE1651AD88B045B36FB9EBE0109F  hello.exe - _strlen
 ...

Figure 3: Some function signatures for Microsoft C RTL functions.

41 . 8D71CE1651AD88B045B36FB9EBE0109F    aclayers.dll - fn_715CB360
41 . 8D71CE1651AD88B045B36FB9EBE0109F    acxtrnal.dll - fn_71492E40
41 . 8D71CE1651AD88B045B36FB9EBE0109F    cewmdm.dll - fn_6F653AF0
41 . 8D71CE1651AD88B045B36FB9EBE0109F    cliconfg.exe - fn_00402A60
41 . 8D71CE1651AD88B045B36FB9EBE0109F    compatui.dll - fn_6E6E79C0
41 . 8D71CE1651AD88B045B36FB9EBE0109F    csseqchk.dll - fn_6DBB9A80
41 . 8D71CE1651AD88B045B36FB9EBE0109F    defrag.exe - fn_0100D8E0
41 . 8D71CE1651AD88B045B36FB9EBE0109F    dgrpsetu.dll - fn_6D2126A0
41 . 8D71CE1651AD88B045B36FB9EBE0109F    diactfrm.dll - fn_6CFC1810
41 . 8D71CE1651AD88B045B36FB9EBE0109F    dxdiag.exe - fn_01037850
 ...

Figure 4: Some functions that the function database can ID as _strlen.

C:\TEST>fciv \windows\system32 | sort
 ...
0ccaf5f4111f7366588d3840b0067dc2 \windows\system32\igfxrjpn.lrc
0ce97f997318122e726fc94e56f3cbf3 \windows\system32\SOFTPUB.DLL
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmp.ocx
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmpcd.dll
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmpcore.dll
0cf4c7f3341d73d4044053b203ac04e5 \windows\system32\wmpui.dll
0cfd77715e899e9fde1db92e64a4a897 \windows\system32\secedit.exe
0d143112394173967a3647096f74e743 \windows\system32\C_037.NLS
 ...

Figure 5: Some duplicated files in an updated version of Windows XP.

#include <windows.h>

/* Keep showing the message box until the user clicks OK. */
void func(char *buf)
{
    int retval;
    do {
        retval = MessageBox(0, buf, "MSGBOX", MB_OKCANCEL);
    } while (retval != IDOK);
}

Figure 6: A piece of Windows application code.

00401000                    fn_00401000:       ; Xref 00401052 00401060
00401000 55                     push    ebp
00401001 8BEC                   mov     ebp,esp
00401003 51                     push    ecx
00401004                    loc_00401004:      ; Xref 0040101E
00401004 6A01                   push    1
00401006 6830504000             push    405030h
0040100B 8B4508                 mov     eax,[ebp+8]
0040100E 50                     push    eax
0040100F 6A00                   push    0
00401011 FF1594404000           call    dword ptr [MessageBoxA]
00401017 8945FC                 mov     [ebp-4],eax
0040101A 837DFC01               cmp     dword ptr [ebp-4],1
0040101E 75E4                   jnz     loc_00401004
00401020 8BE5                   mov     esp,ebp
00401022 5D                     pop     ebp
00401023 C3                     ret

Figure 7: Disassembly of code in Figure 6, after compilation (without optimization).

004010A3                    fn_004010A3:                ; Xref 00401019
004010A3 55                     push    ebp
004010A4 8BEC                   mov     ebp,esp
004010A6 51                     push    ecx
004010A7                    loc_004010A7:               ; Xref 004010C1
004010A7 6A01                   push    1
004010A9 6838894000             push    offset 408938h
004010AE 8B4508                 mov     eax,[ebp+8]
004010B1 50                     push    eax
004010B2 6A00                   push    0
004010B4 FF1524A24000           call    dword ptr [MessageBoxA]
004010BA 8945FC                 mov     [ebp-4],eax
004010BD 837DFC01               cmp     dword ptr [ebp-4],1
004010C1 75E4                   jnz     loc_004010A7
004010C3 8BE5                   mov     esp,ebp
004010C5 5D                     pop     ebp
004010C6 C3                     ret

Figure 8: Same code as in Figure 7, but at a different location in a different file.
