
Investigating Software and Source-Code Theft

(Part 1 of 2) Software is a slippery term with many potential definitions. The definition that we choose to accept, and that we allow attorneys and business managers to agree to in contracts and license agreements, often affects the ability of information security professionals to prove that an intruder stole the organization's intellectual property rather than simply reverse engineered it.

Our own understanding of our software shapes, and often limits, our rights to protect and defend it from misuse or even outright theft. To explain to law enforcement why we believe a particular intruder was responsible for a particular intrusion, we often point at source code, or at what appears to be our misappropriated digital property, in the possession of that third party. Unlike tangible property, ownership of a sequence of bits is not easy to prove or to explain to law enforcement.

If we have ever distributed the digital property to others, or if anyone alleges that we have, then the multiple-sources doctrine comes into play and possession of stolen bits in and of itself may still result in nothing more than a good cause of action for civil litigation. To deal decisively with digital theft, we must have a perfectly clear understanding of what it is that we own, why we own it, and the difficulty of keeping exclusive control of what we think of as our property.

It is clear that, from the earliest days of thought concerning software, there was full recognition that information, both in terms of operation codes and in terms of variable and constant values, was inseparable from and critical to the operation of a computing machine (see sidebar, “The Definition and Origin of Software”). There has always been a distinction between data (values) and operation codes, yet a computer program is incomplete without both sources of input.

Software is thus both data and code. If we allow ourselves to treat our data as something other than the software that it actually is, then we may find it difficult or impossible to request law enforcement help in response to an intrusion where a theft has occurred. Intruders are sophisticated enough to know how to argue technical points of machine code versus source code and authorized access versus theft. We must be prepared with historical explanations of the nature of our digital property rights that even a layperson can easily understand. This is particularly important in cases where the source code or software theft results in a derivative work that incorporates elements of value from the stolen property but conceals its true origin.

Asserting, and Proving, Theft of Bits

Modern microprocessors operate in much the same way as Babbage's Analytical Engine. As in Babbage's design, which used punched “operation cards” and punched-card “variables,” a microprocessor receives both code and data through a single mechanism of information storage and transmission. Only the sequence in which the information is strung together and presented to the microprocessor distinguishes operation codes from variable or constant data values. Sequence, or order, of information is then the core property of any software program, and the defining character that sets it apart from nonsoftware. Put qualitatively, order and content together impart to software all of its value. This gives rise to two forensic questions: Where does software come from, and where does it end?

Knowing that software is both data and operation code that are input to a microprocessor in a particular order does not help us determine either where software came from or where it ends. It also does not help us determine whether portions of software were misappropriated by a particular suspect. To help answer the latter question, a computer forensic analyst could compare sequences of operation codes and data in search of similarities that should not be present between software owned by the plaintiff and the bits found in the possession of a suspect. For such analysis to have any chance of revealing evidence of infringing material, the forensic analyst must be reasonably certain that the entirety of the information, in its true and correct sequence, has been provided for analysis by the plaintiff, and that every residual data storage under the control of the suspect has been exhaustively searched for traces of stolen digital property. This can mean searching everything from network data storage services to cell phones, and of course every hard drive that can be found. All this searching and seizing is certain to reveal secrets that the suspect would rather keep hidden, but unfortunately for everyone, there are no real privacy rights with respect to data storage when accusations of wrongdoing are made.
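The sequence comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular forensic tool: the function names `shingles` and `containment`, the 8-byte window size, and the sample byte strings are all hypothetical choices made for the example. The idea is to slide a fixed-size window over the plaintiff's bits, do the same over the bits recovered from the suspect, and report what fraction of the plaintiff's windows reappear.

```python
def shingles(data: bytes, n: int = 8) -> set[bytes]:
    """Return the set of every n-byte window that occurs in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def containment(owned: bytes, found: bytes, n: int = 8) -> float:
    """Fraction of the owner's n-byte windows that also occur in the found bits."""
    owned_windows = shingles(owned, n)
    if not owned_windows:
        # Input shorter than one window: nothing to compare.
        return 0.0
    shared = owned_windows & shingles(found, n)
    return len(shared) / len(owned_windows)


if __name__ == "__main__":
    # Hypothetical stand-ins for the plaintiff's software and a recovered disk image.
    plaintiff_bits = bytes(range(32))
    suspect_bits = b"\xff" * 10 + plaintiff_bits + b"\xee" * 10
    print(containment(plaintiff_bits, suspect_bits))
```

A high score from a comparison like this is a lead for the analyst, not proof of theft. Compilers, runtime libraries, and common file formats all emit identical byte sequences in unrelated programs, so overlapping regions still have to be examined by hand.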

As the burden of proof of theft or misuse falls squarely on the plaintiff, the computer forensic analyst who is presented with software by the plaintiff may presume that the plaintiff has produced the software in its entirety, as withholding software would be to the detriment of the plaintiff's case. However, the analyst must not presume that the plaintiff has provided its software in a true and correct sequence. Otherwise, any party could make a claim of wrongdoing against any other party and produce as evidence a fraudulent software program that itself was derived through infringement of the suspect's own digital property. The burden of proof in such an abuse of process would then be on the defendant, who would have to show that the similarities detected between the two software programs were present because of bad acts and bad faith on the part of the plaintiff. To assist the court in preventing such abuse of process, the computer forensic analyst must independently ascertain that the software provided for analysis by the plaintiff is in fact the software that is purported to belong to that party, and that a copy has been produced, in connection with the legal proceeding, in its true and correct original sequence, for instance, as delivered in the past to customers or end users.
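One way an analyst might begin that independent check is to compare a cryptographic digest of the copy produced for the proceeding against a digest of a copy obtained from an independent source, such as media previously shipped to a customer. The sketch below assumes Python's standard `hashlib` module and SHA-256; the helper names `digest_stream` and `same_sequence`, and the in-memory sample data, are hypothetical and chosen only for illustration.

```python
import hashlib
import io


def digest_stream(stream, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.hexdigest()


def same_sequence(produced, distributed) -> bool:
    """True when both binary streams contain exactly the same bytes in the same order."""
    return digest_stream(produced) == digest_stream(distributed)


if __name__ == "__main__":
    # Hypothetical stand-ins; real inputs would be files opened in binary mode.
    produced_copy = io.BytesIO(b"release 1.0 binary image")
    customer_copy = io.BytesIO(b"release 1.0 binary image")
    print(same_sequence(produced_copy, customer_copy))
```

A matching digest shows only that the two copies are bit-for-bit identical. It says nothing about who wrote the software, and a mismatch may simply mean that the independent copy is a different release.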

If software today were as simple as it was when Babbage contemplated his Analytical Engine, then a forensic analyst and the court would be faced with insurmountable difficulties. Any party who received a copy of computer software could easily concoct “proof” that they are the true author of that software, there being no difference between the software as it was written by the true author and a copy of the same. Fortunately, the process of creating software is now sufficiently complicated that forensic proof of origin can exist. The best place to find it is in the work product produced by individual programmers; after all, we are the source of all copyright ownership and the providers of all creative value.

The next article will delve into these issues further. I think these two articles together will provide a valuable guide to those who are struggling to contend with a security intrusion that resulted in the theft of digital property or trade secrets. Law enforcement will expect help from the victim in order to understand the nature of the offense, and we must be prepared to teach them why we believe we were harmed in order to receive the investigative and law enforcement assistance we seek.

Jason Coombs works as a freelance computer forensic analyst and security incident response investigator. He also serves as a technical expert witness in civil and criminal court cases. Jason thinks he knows a thing or two about information security and forensics, but he may be mistaken; he may in fact be your typical corporate programmer geek with a slightly unusual résumé, which is mostly the result of a refusal to work in a cubicle and a desire to earn far more than he is probably worth.
