Forensic Data Validation and Integrity Logging

Integrity is an important feature of nearly every computer system. Threats that impact integrity are as important to defend against as are threats that permit unauthorized access to, or control over, these same systems. It does little good to achieve perfect security controls for a box that is so unstable and unpredictable that it is too risky to use for real work.

“Integrity” can refer to many things. There's your typical runtime integrity, where you'd like to know that software and hardware can be trusted to behave as you expect when they are used. Then there's the disaster-recovery sort of integrity, where you need your hardware and software to cooperate with you when it comes time to rebuild an important resource. Some operating systems are designed from the ground up to be recoverable in an emergency or disaster-recovery situation, but Windows has always thrown unexpected obstacles at us that make recovery more difficult than it needs to be. The only way to be reasonably sure that you'll be able to recover a broken or destroyed Windows box is to build a replica and intentionally destroy or disable it in disaster-recovery tests. And then there's the all-important data integrity.

Consider the way that a corporate executive certifies the financial reports of a publicly owned, stock-market-listed company. She doesn't do the accounting herself; she receives prepared financial statements from the company's accountants and relies on them for the accuracy of the data. The executive then signs her endorsement of the company's financial results so they can be reported to regulators and to the investing public. It's simply not practical for her to presume that any given number might have been compromised by an attacker. Once the data is in, the results are determined, and only a costly and time-consuming audit will uncover mistakes, fraud, or malicious tampering by an attacker.

The more complex a system becomes, the more we have to go back to the source of its inputs in order to certify the integrity of its outputs. Guaranteeing the integrity of the inputs to our computer systems must be a fully automated process.

To automate something and retain its original integrity protections, we must also build forensic data validation and logging features into the automated system. You can't have just a little automation; automating without validation means giving up system integrity protections. Each input to an automated system must have a compensating self-validation procedure through which the source of the input can confirm, with the help of automation, that the input affected the system's output the way the source expected. Real-world examples of this information-security dynamic are all around us. For example, the ease with which we can now compare our purchases with near-real-time lists of charges on our bank and credit accounts enables us to spot financial crimes within minutes. This is a necessary safeguard against the new threats created by the increased automation of financial transaction processing. Without it, few people would trust modern-day banking.
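The credit-card example is, at bottom, a reconciliation: the source of the inputs compares its own records with what the automated system reports back. A minimal Python sketch of that comparison, assuming a hypothetical `Transaction` record that holds only a merchant name and an amount in cents (real statements would also need tolerance for posting dates and pending charges):

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Transaction:
    # Hypothetical record shape: who charged us, and how much (in cents).
    merchant: str
    amount_cents: int


def reconcile(my_records, posted_charges):
    """Compare what the source put in with what the system reports back.

    Returns (missing, unexpected): purchases that never showed up on the
    statement, and charges that have no matching purchase. Counter
    arithmetic keeps duplicates, so a charge posted twice is flagged.
    """
    mine = Counter(my_records)
    posted = Counter(posted_charges)
    missing = sorted((mine - posted).elements())
    unexpected = sorted((posted - mine).elements())
    return missing, unexpected


my_purchases = [Transaction("grocer", 4250), Transaction("cafe", 375)]
statement = [
    Transaction("grocer", 4250),
    Transaction("cafe", 375),
    Transaction("cafe", 375),            # posted twice
    Transaction("unknown-shop", 19999),  # never made by the cardholder
]
missing, unexpected = reconcile(my_purchases, statement)
# missing is empty; unexpected holds the duplicate cafe charge and the unknown-shop charge
```

Counting occurrences rather than comparing sets matters here: a duplicated charge is exactly the kind of discrepancy a plain set comparison would hide.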

Many computer systems implement subjective definitions of integrity and accuracy. Data is often deemed accurate based solely on its presence in a system that is presumed to have no flaws or vulnerabilities. In other words, computers are presumed to be impervious to attack, and this presumption too often forms the basis of integrity validation, even when practical alternatives exist. Oddly, we tend to presume that so long as we don't put garbage into a computer, we won't get garbage out. More importantly, we presume that if we've seen a computer behave with trustworthy, forensic-quality integrity in the past, it will likely keep that quality of integrity in the future. We have not yet applied forensic integrity verification procedures to two pillars of civilized society: democratic elections and the Domain Name System. Doing so would be relatively simple.

Data integrity in an electronic voting system means two things. First, what goes in is what comes out, without loss or tampering. Second, the voters' selections are tabulated with close to 100 percent accuracy. Defining "accuracy" in terms of what each voter intended to communicate requires some type of exception-handling mechanism built into the process, yet for privacy and practicality a human election official can't hunt down individual voters to clarify a vote after the polls have closed. So in modern-day elections we define "accuracy" in terms of whatever methods of subjectively interpreting ballots are favored locally, and we accept that election results will be affected by software and hardware bugs, system failures, and the varying technical methods employed by vendors of electronic voting equipment.

We could publish on the Web a list of every vote cast, as it was counted. A unique ballot number printed on each voter's receipt would let any voter confirm how their vote was ultimately recorded and what impact it had. Provided that we publish the votes accurately and defend the web site itself from attack and compromise, the database that reflects the actual polling results could be examined by every voter who contributed data to it. Any discrepancy could be corrected through an exception-handling procedure in which the voter contacts election officials to dispute the way their vote was counted. In a valid election there should be a statistically insignificant number of mistakes, but every mistake that is detected must be corrected and the election results adjusted. The cause of each mistake could be investigated to find and solve technical problems, and if there are too many mistakes, the election could be invalidated and a new election scheduled once the bugs are worked out.
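The voter-side check reduces to a lookup and a comparison, plus a tally of disputes that officials could hold against a tolerance. A minimal Python sketch, assuming a hypothetical published mapping from ballot number to recorded selections and a hypothetical invalidation threshold; it says nothing about how the published list itself would be authenticated.

```python
def check_receipt(published, ballot_number, my_selections):
    """Compare one voter's receipt against the published list of counted votes.

    published:     {ballot number: {contest: selection}} as posted on the Web
    my_selections: {contest: selection} as the voter remembers casting them
    """
    recorded = published.get(ballot_number)
    if recorded is None:
        return "missing"   # vote excluded: grounds for a dispute
    if recorded != my_selections:
        return "mismatch"  # vote miscounted: grounds for a dispute
    return "ok"


def dispute_rate(statuses):
    """Fraction of checked receipts that led to a dispute."""
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if s != "ok") / len(statuses)


def should_invalidate(statuses, threshold=0.01):
    """Hypothetical rule: invalidate if the dispute fraction exceeds threshold."""
    return dispute_rate(statuses) > threshold
```

Each "missing" or "mismatch" result would feed the exception-handling procedure; the aggregate rate is what would decide whether the election as a whole is still valid.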

Elections should never miscount an individual voter's vote if it can be prevented. Though some votes must at times be discarded because of technical bugs, in principle everyone's vote has the same chance of being discarded, so the present system seems fair and reasonable. Without voter involvement in validating the election results, however, we presently have no way to find out whether our vote was discarded or miscounted. Automation makes it possible for us to discover that we were excluded or miscounted and to see that we are recounted. Once a data validation system is in place to make this possible, we must ensure that the miscount-correction process can't itself be attacked in order to influence the outcome of the election. For this, the proper use of cryptography can help.
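The article does not prescribe a mechanism for protecting the correction process itself. One common technique that fits its integrity-logging theme is a hash chain, in which each log entry's hash covers the previous entry's hash, so editing or deleting an earlier correction breaks every hash after it. A minimal Python sketch, with hypothetical record fields:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, record):
    # Canonical JSON so the same record always hashes to the same value.
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CorrectionLog:
    """Append-only log of vote corrections; tampering breaks the chain."""

    def __init__(self):
        self.entries = []  # list of (record, hash) tuples

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else GENESIS
        h = entry_hash(prev, record)
        self.entries.append((record, h))
        return h  # latest hash, suitable for publishing

    def verify(self):
        prev = GENESIS
        for record, h in self.entries:
            if entry_hash(prev, record) != h:
                return False
            prev = h
        return True
```

A hash chain only makes tampering evident if the latest hash is also published somewhere the log's operator cannot quietly rewrite; otherwise an insider could simply recompute the whole chain.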

With proper use of cryptography, such as generating a secret key that is saved along with each vote and encrypting the vote combined with random numbers, or salt (which guards against discovery of the encryption key or plaintext through cryptanalysis), the ciphertext printed on each voter's receipt can be considered definitive proof of the original vote cast by that voter. Cryptography would thus make a workable dispute-resolution process possible, but only if each voter had reason to believe that the selections they made at the polling place were communicated clearly and accurately to the device they used to cast their vote. Fraud would remain possible where election officials want it to be, because they could simply refuse to correct a disputed vote, asserting that the ciphertext, when decrypted, shows the original count to be accurate. But honest election officials would at least have the forensic data validation procedures they need to detect attempts by outsiders to manipulate election results by surreptitiously compromising the electronic voting equipment used, and trusted, by voters.
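Python's standard library has no block cipher, so this sketch substitutes a keyed-hash commitment (HMAC-SHA256) for the encryption the paragraph describes; it illustrates the receipt-and-dispute idea rather than the article's exact scheme. A fresh key and salt are generated per vote and kept by election officials, and only the resulting value is printed on the receipt. In a dispute, officials reveal the key and salt for that ballot, and anyone can recompute the value to see which vote it binds to.

```python
import hashlib
import hmac
import secrets

KEY_BYTES = 32
SALT_BYTES = 16


def _receipt_value(vote, key, salt):
    return hmac.new(key, salt + vote.encode("utf-8"), hashlib.sha256).hexdigest()


def commit_vote(vote):
    """Return (receipt, key, salt) for one vote.

    The receipt value is printed for the voter; key and salt are stored with
    the vote by election officials and revealed only if the vote is disputed.
    """
    key = secrets.token_bytes(KEY_BYTES)    # per-vote secret key
    salt = secrets.token_bytes(SALT_BYTES)  # identical votes still get different receipts
    return _receipt_value(vote, key, salt), key, salt


def verify_vote(receipt, vote, key, salt):
    """After key and salt are revealed, check whether the receipt binds to this vote."""
    if len(key) != KEY_BYTES or len(salt) != SALT_BYTES:
        return False  # reject attempts to shift bytes between salt and vote
    return hmac.compare_digest(_receipt_value(vote, key, salt), receipt)
```

The fixed key and salt lengths are enforced on verification so that a dishonest party cannot move bytes from the vote into the salt to make a receipt appear to match a different selection.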

The Domain Name System is another resource that needs better integrity. Compensating for flaws in DNS security is also possible through forensic analysis, performed by the source of the input, of the data the system outputs. You know what IP addresses and other information you put into your DNS records, but do you know that this is the information coming out as DNS lookups occur in the real world? If a lookup ever returns anything other than the values you entered, and the difference can't be explained by an old value still cached from before a recent change, then you have strong evidence of a DNS spoofing, hijacking, or poisoning attack in progress. Why don't we already have a system in place, operated by the same people who operate the root nameservers and made available to everyone free of charge, that queries DNS servers around the world to verify that the data coming out matches the data that went in? Once such a ping of integrity is deployed for the entire DNS, many defenses, some of them automated, become possible. Instead of working to build such defensive data integrity validation services for DNS, the IETF is pursuing a new standard known as DNSSEC, which many people have concluded is so complex that it is impossible to deploy and manage safely. To learn more about DNSSEC, see
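The "ping of integrity" described above reduces to comparing the records you published with the answers resolvers elsewhere actually return. A minimal Python sketch, assuming the answers have already been collected from several vantage points (collecting them from specific remote resolvers would need a DNS client library; only a local-resolver helper is shown). A mismatch is flagged rather than declared an attack, since a recently changed record may still be cached with its old value until its TTL expires.

```python
import socket


def local_lookup(hostname):
    """IPv4 addresses the local system resolver returns (one vantage point only)."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def find_discrepancies(published, observations):
    """Compare the A records you published with what each vantage point saw.

    published:    {hostname: set of IPs you entered}
    observations: {vantage point: {hostname: set of IPs returned}}
    Returns a list of (vantage, hostname, unexpected IPs, missing IPs).
    """
    problems = []
    for vantage, answers in sorted(observations.items()):
        for host, expected in sorted(published.items()):
            if host not in answers:
                continue  # this vantage point did not query this host
            seen = answers[host]
            unexpected = sorted(seen - expected)
            missing = sorted(expected - seen)
            if unexpected or missing:
                problems.append((vantage, host, unexpected, missing))
    return problems
```

Given published `{"www.example.com": {"192.0.2.10"}}` and a vantage point that saw `{"203.0.113.66"}` for that name, the function reports the foreign address as unexpected and the published one as missing for that vantage point only.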

In the next newsletter I will present C# source code for an automated DNS attack detection technique that I call "DNS Pooling," which serves as a web-based man-in-the-middle (MITM) countermeasure. The technique can be used as shown with Internet Information Services or adapted to work with any other web server on any OS platform. The idea is simple but effective: by dynamically generating the filenames of certain images embedded in a web page, we can make the URL path of each image encode the IP address of the client that requested the page. The technique uses two or more DNS domains with different authoritative nameservers, so that any MITM attack against both domains will, for a period of time, cause the server to receive requests that betray the MITM's IP address, which will differ from the true end user's IP address.
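The C# code the article promises is not reproduced here. As a rough illustration of the idea only, the Python sketch below encodes the page requester's IP address into image filenames served from two pooled domains. The domain names, the `/img/` path layout, and the truncated HMAC tag (added so an intermediary cannot rewrite the encoded address to match its own) are all assumptions, not part of the described technique.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical server-side secret; in practice load a random value from config.
SERVER_SECRET = b"replace-with-a-long-random-secret"

# Hypothetical image hosts, each delegated to a different authoritative nameserver.
POOLED_DOMAINS = ("img-a.example.com", "img-b.example.net")


def _tag(ip_text):
    # Truncated HMAC so a MITM cannot forge a path that encodes a different IP.
    return hmac.new(SERVER_SECRET, ip_text.encode("ascii"), hashlib.sha256).hexdigest()[:16]


def image_path(client_ip, name="pixel"):
    """Build an image path that encodes the IP of the client requesting the page."""
    addr = ipaddress.ip_address(client_ip)
    return f"/img/{name}-{addr.packed.hex()}-{_tag(str(addr))}.gif"


def page_image_urls(client_ip):
    """One image URL per pooled domain, all encoding the same client IP."""
    path = image_path(client_ip)
    return [f"http://{domain}{path}" for domain in POOLED_DOMAINS]


def decode_image_path(path):
    """Return the IP encoded in an image path, or None if malformed or forged."""
    filename = path.rsplit("/", 1)[-1]
    if not filename.endswith(".gif"):
        return None
    parts = filename[: -len(".gif")].rsplit("-", 2)
    if len(parts) != 3:
        return None
    _, packed_hex, tag = parts
    if not tag.isascii():
        return None
    try:
        addr = ipaddress.ip_address(bytes.fromhex(packed_hex))
    except ValueError:
        return None
    if not hmac.compare_digest(tag, _tag(str(addr))):
        return None
    return str(addr)
```

When the page handler sees a request from 198.51.100.7, every generated image URL carries that address in packed hex (`c6336407`), so an image request for the same path that arrives from any other address stands out.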

MITM attacks that hijack, spoof, or poison just one of the DNS domains will always result in the MITM betraying its presence, because requests will reach the web server from a different IP address than the one the end user used for the other DNS domain. Because DNS updates for two or more domains served by different authoritative nameservers don't propagate in lockstep, we can detect the simplest types of DNS attack. This would force attackers instead to compromise network routes, intercept and alter TCP/IP traffic to and from a victim node, or compromise the recursive, nonauthoritative nameserver that the victim node trusts for all of its DNS lookups. These attacks are more difficult for many attackers to launch because they target specific network nodes and often require physical access to protected facilities. DNSSEC is supposed to cure all of our DNS security problems all the way out to the endpoints, but I personally still want backup countermeasures, because DNSSEC is designed to fail over to classic DNS when cryptography services are unavailable or unexpected errors occur.
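On the server side, detection comes down to comparing the address encoded in each image path with the address the image request actually came from. A minimal Python sketch, assuming the encoded IP has already been extracted from the path and logged alongside the request's source IP and the pooled domain it arrived on (the `ImageRequest` record is hypothetical):

```python
import ipaddress
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRequest:
    domain: str      # which pooled domain the image request arrived on
    encoded_ip: str  # IP decoded from the image path (who fetched the page)
    source_ip: str   # IP the image request actually came from


def _normalize(ip_text):
    # Treat IPv4-mapped IPv6 addresses (e.g. ::ffff:198.51.100.7), which
    # dual-stack servers often log, as the plain IPv4 address.
    addr = ipaddress.ip_address(ip_text)
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return addr


def mismatch_report(requests):
    """Group requests whose source IP differs from the IP encoded in the path.

    Keys are (encoded IP, source IP) pairs that should have been identical;
    values are the sorted domains those mismatches arrived on.
    """
    report = defaultdict(set)
    for r in requests:
        encoded, source = _normalize(r.encoded_ip), _normalize(r.source_ip)
        if encoded != source:
            report[(str(encoded), str(source))].add(r.domain)
    return {pair: sorted(domains) for pair, domains in report.items()}
```

A mismatch is a lead rather than proof: dual-stack clients that load the page over IPv6 and an image over IPv4, or users behind proxy farms with several egress addresses, can produce the same pattern. Which of the two addresses belongs to the MITM depends on which domain was attacked, so the report keeps both.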

Jason Coombs works as a forensic analyst and expert witness in court cases involving digital evidence. Information security and network programming are his areas of special expertise. He can be reached at [email protected].

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.
