Investigating Software and Source-Code Theft

In order to protect intellectual property, you must first be able to define what the property is.


July 19, 2005
URL:http://www.drdobbs.com/investigating-software-and-source-code-t/184406134

(Part 1 of 2) Software is a slippery term with many potential definitions. The definition that we choose to accept, and that we allow attorneys and business managers to agree to in contracts and license agreements, often affects the ability of information security professionals to prove that an intruder stole the organization's intellectual property rather than simply reverse engineered it.

Our own understanding of our software shapes, and often limits, our rights to protect and defend it from misuse or even outright theft. To explain to law enforcement why we believe a particular intruder was responsible for a particular intrusion, we often point at source code or what appears to be our misappropriated digital property in the possession of that third party. Unlike ownership of tangible property, ownership of a sequence of bits is not easy to prove or to explain to law enforcement.

If we have ever distributed the digital property to others, or if anyone alleges that we have, then the multiple-sources doctrine comes into play and possession of stolen bits in and of itself may still result in nothing more than a good cause of action for civil litigation. To deal decisively with digital theft, we must have a perfectly clear understanding of what it is that we own, why we own it, and the difficulty of keeping exclusive control of what we think of as our property.

From the earliest days of thought concerning software, it was fully recognized that information, both in terms of operation codes and in terms of variable and constant values, was inseparable from and critical to the operation of a computing machine (see sidebar, “The Definition and Origin of Software”). There has always been a distinction between data (values) and operation codes, yet a computer program is incomplete without both sources of input.

Software is thus both data and code. If we allow ourselves to treat our data as something other than the software that it actually is, then we may find it difficult or impossible to request law enforcement help in response to an intrusion where a theft has occurred. Intruders are sophisticated enough to know how to argue technical points of machine code versus source code and authorized access versus theft. We must be prepared with historical explanations of the nature of our digital property rights that even a layperson can easily understand. This is particularly important in cases where the source code or software theft results in a derivative work that incorporates elements of value from the stolen property but conceals its true origin.

Asserting, and Proving, Theft of Bits

Modern microprocessors operate in much the same way as Babbage's Analytical Engine. As in Babbage's design, which used punched “operation cards” and punched “variable” cards, a microprocessor receives both code and data through a single mechanism of information storage and transmission. Only the sequence in which the information is strung together and presented to the microprocessor distinguishes operation codes from variable or constant data values. Sequence, or order, of information is then the core property of any software program, and the defining characteristic that sets it apart from nonsoftware. Or, qualitatively speaking, order and content together impart to software all of its value. This gives rise to two forensic questions: Where does software come from, and where does it end?
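The point that only sequence distinguishes operation codes from data can be made concrete with a toy example. The sketch below is purely illustrative: the three-instruction machine and its opcode values (PUSH, ADD, HALT) are invented for this example and do not correspond to any real processor.

```python
# Toy illustration only: a hypothetical stack machine, not a real CPU.
PUSH, ADD, HALT = 0x01, 0x02, 0xFF  # made-up operation codes


def run(stream: bytes) -> list[int]:
    """Interpret a byte stream; position alone decides opcode versus data."""
    stack: list[int] = []
    pc = 0
    while pc < len(stream):
        op = stream[pc]
        if op == PUSH:
            stack.append(stream[pc + 1])  # the next byte is read as a data value
            pc += 2
        elif op == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op:#x} at offset {pc}")
    return stack


# The byte 0x02 appears twice: at offset 1 it is data, at offset 4 it is ADD.
program = bytes([PUSH, 2, PUSH, 1, ADD, HALT])
result = run(program)
print(result)
```

The same byte value is treated as a constant in one position and as an instruction in another, which is why a forensic comparison has to consider the whole ordered stream rather than "code" and "data" separately.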

Knowing that software is both data and operation code that are input to a microprocessor in a particular order does not help us determine either where software came from or where it ends. It also does not help us determine whether portions of software were misappropriated by a particular suspect. To help answer the latter question, a computer forensic analyst could compare sequences of operation codes and data in search of similarities that should not be present between software owned by the plaintiff and the bits found in the possession of a suspect. For such analysis to have any chance of revealing evidence of infringing material, the forensic analyst must be reasonably certain that the entirety of the information, in its true and correct sequence, has been provided for analysis by the plaintiff, and that every residual data store under the control of the suspect has been exhaustively searched for traces of stolen digital property. This can mean searching everything from network data storage services to cell phones, and of course every hard drive that can be found. All this searching and seizing is certain to reveal secrets that the suspect would rather keep hidden, but unfortunately for everyone, there are no real privacy rights with respect to data storage when accusations of wrongdoing are made.
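As a rough illustration of what comparing sequences might look like in practice, the following sketch uses Python's standard difflib module to find the longest run of bytes shared by two blobs. It is a minimal sketch, not a forensic tool: the function name longest_shared_run and the sample bytes are assumptions made up for this example.

```python
from difflib import SequenceMatcher


def longest_shared_run(a: bytes, b: bytes) -> tuple[int, int, int]:
    """Return (offset in a, offset in b, length) of the longest common run."""
    # autojunk=False keeps difflib from discarding frequently repeated bytes.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.a, match.b, match.size


# Made-up sample data: a 13-byte sequence embedded in two different blobs.
shared = bytes.fromhex("5589e583ec10c745fc2a000000")
plaintiff = shared + bytes.fromhex("c9c3")
suspect = bytes.fromhex("9090") + shared + bytes.fromhex("90")

offset_a, offset_b, size = longest_shared_run(plaintiff, suspect)
print(f"{size} shared bytes at plaintiff+{offset_a}, suspect+{offset_b}")
```

A long shared run is only a lead for further investigation. Common libraries, compiler-generated boilerplate, or a legitimately distributed copy can all produce matches, so any similarity still has to be explained in light of the multiple-sources problem described earlier.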

As the burden of proof of theft or misuse falls squarely on the plaintiff, the computer forensic analyst who is presented with software by the plaintiff may presume that the plaintiff has produced the software in its entirety, as withholding software would be to the detriment of the plaintiff's case. However, the analyst must not presume that the plaintiff has provided its software in a true and correct sequence, else any party could make a claim of wrongdoing against any other party and produce as evidence a fraudulent software program that itself was derived through infringement of the suspect's own digital property. The burden of proof in such an abuse of process would then be on the defendant, who would have to show that the similarities detected between the two software programs were present because of bad acts and bad faith on the part of the plaintiff. To assist the court in preventing such abuse of process, it is necessary for the computer forensic analyst to independently ascertain that the software provided for analysis by the plaintiff is in fact the software that is purported to belong to that party, and that a copy has been produced, in connection with the legal proceeding, in its true and correct original sequence, for instance, as delivered in the past to customers or end users.
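One way an analyst might check that the copy produced by the plaintiff matches what was actually delivered in the past is to compare cryptographic digests against a copy obtained independently, for example from a customer's archived installation media. The sketch below assumes such a reference copy exists; the function names and sample bytes are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def matches_reference(produced: bytes, reference: bytes) -> bool:
    """True only if the two copies are bit-for-bit identical."""
    return sha256_hex(produced) == sha256_hex(reference)


# Made-up stand-ins for real files read from disk.
customer_copy = b"release 1.0 as shipped to customers"
produced_copy = b"release 1.0 as shipped to customers"
altered_copy = b"release 1.0 as shipped to customers, later edited"

print(matches_reference(produced_copy, customer_copy))
print(matches_reference(altered_copy, customer_copy))
```

A matching digest shows only that the two copies are identical. It says nothing about who wrote the software, so it supports the "true and correct sequence" requirement without settling the question of authorship.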

If software today were as simple as it was when Babbage contemplated his Analytical Engine, then a forensic analyst and the court would be faced with insurmountable difficulties. Any party who received a copy of computer software could easily concoct “proof” that they are the true author of that software, there being no difference between the software as it was written by the true author and a copy of the same. Fortunately, the process of creating software is now complicated enough that forensic proof of origin can exist. The best place to find it is in the work product produced by individual programmers; after all, we are the source of all copyright ownership and the providers of all creative value.

The next article will delve into these issues further. I think these two articles together will provide a valuable guide to those who are struggling to contend with a security intrusion that resulted in the theft of digital property or trade secrets. Law enforcement will expect help from the victim in order to understand the nature of the offense, and we must be prepared to teach them why we believe we were harmed in order to receive the investigative and law enforcement assistance we seek.


Jason Coombs works as a freelance computer forensic analyst and security incident response investigator. He also serves as a technical expert witness in civil and criminal court cases. Jason thinks he knows a thing or two about information security and forensics, but he may be mistaken; he may in fact be your typical corporate programmer geek with a slightly unusual résumé, which is mostly the result of a refusal to work in a cubicle and a desire to earn far more than he is probably worth.


The Definition and Origin of Software

Some experts argue that software is a collection of machine-code instructions that are executable by a microprocessor. To determine whether this definition is the most reasonable one for our software, it is necessary to understand the origin of software that is composed solely of machine-code instructions.

The first mechanical computing machine, the Pascaline, was built by Blaise Pascal [1] in 1645 using gears and mechanical means to perform simple calculations. The Pascaline, like Leonardo da Vinci's earlier calculating machine design, did not rely on programming instructions but rather was a specialized tool for adding, subtracting, multiplying, and dividing in response to the purposeful rotation of gears by the machine's operator.

The first proposed computing machine to incorporate the notion of repeatable program instructions (termed “operation cards” by its inventor) was the Analytical Engine [2], designed by Charles Babbage during the 1830s. The Analytical Engine would be fed punched cards containing operation instructions, along with information (“variables”) also punched onto cards. A system “attendant” would preset the constant values required by a formula. Babbage described the process in his own writings:

“When any formula is required to be computed, a set of operation cards must be strung together, which contain the series of operations in the order in which they occur. Another set of cards must then be strung together, to call in the variables into the mill, the order in which they are required to be acted upon.

The Analytical Engine is therefore a machine of the most general nature. Whatever formula it is required to develop, the law of its development must be communicated to it by two sets of cards. When these have been placed, the engine is special for that particular formula. The numerical value of its constants must then be put on the columns of wheels below them, and on setting the Engine in motion it will calculate and print the numerical results of that formula.

Every set of cards made for any formula will at any future time recalculate that formula with whatever constants may be required.”[3]

[1] “La machine d'arithmétique: Blaise Pascal's Calculating Machine (1645).” Last accessed on September 25, 2004: http://www.fourmilab.ch/babbage/pascal.html

[2] “The Analytical Engine: The First Computer.” Last accessed on September 25, 2004: http://www.fourmilab.ch/babbage/

[3] “OF THE ANALYTICAL ENGINE” Chapter VIII of Charles Babbage's 1864 autobiography, “Passages from the Life of a Philosopher.” Last accessed on September 25, 2004: http://www.fourmilab.ch/babbage/lpae.html
