Secure Internet Voting with Perl
TPJ Issue #20
As I write this, canvassing boards in Palm Beach and Broward Counties, Florida, are desperately trying to get manual recounts done in time for a court-imposed deadline. Countless lawsuits and countersuits launched by the Republican and Democratic adversaries are in various stages of adjudication, citizens are up in arms because they feel they have been disenfranchised by poorly-designed "butterfly ballots" and other election day mistakes, the Florida legislature has threatened to take the election into its own hands, and the U.S. Supreme Court has just agreed to add its considerable weight to the fracas.
All this because many of Florida's counties ballot using the Hollerith punchcard, an antiquated balloting technology whose main virtue is its low price tag. On the nightly news, election officials discourse gravely on "chads," those little bits of paper that some voters have trouble dislodging. We hear of pregnant chads, hanging chads, and dimpled chads. One election board volunteer is even accused of having eaten the chads that dropped out of the ballots he was handling.
Is this any way to run an election? I don't think so. Fortunately, Perl can help rebuild democracy, and in this article I'll show a simple Perl-based framework for a secure Internet-based balloting system.
About Secure Elections
Much of the inspiration for this article comes from Bruce Schneier's magnum opus Applied Cryptography and specifically from section 6.1, "Secure Elections." As explained there, the guiding principles of a secure election are to maintain privacy and prevent cheating. Schneier lays out six minimal requirements for a good election protocol:
- Only authorized voters can vote.
- No one can vote more than once.
- No one can determine for whom anyone else voted.
- No one can duplicate anyone else's vote.
- No one can change anyone else's vote without being discovered.
- Every voter can verify that his vote has been taken into account in the final tabulation.
Conventional paper ballots satisfy requirement 1 by voter registration, a process that ensures that only American citizens of a certain age can vote. Requirements 2 and 4 are satisfied by crossing the registered voter's name off a list when the voter enters the polling location, and 3 is satisfied by using an anonymous ballot. The last two requirements, however, are not completely satisfied by paper ballots, and are the source of much of the uncertainty and accusations in the current election fiasco.
Schneier's book describes several cryptographic protocols which meet these six requirements for secure elections. Some of them are quite elaborate and require new software at the voter's side of the connection. In this article, we will use one of the simpler ones that happens to be well suited for web-based voting.
This protocol requires two independent central facilities to work, called the CEA and the CLA. The CEA is the Central Enumeration Agency (although the Bush camp might call it "Chad Eaters Anonymous"). It is responsible for collecting and tallying ballots, and publishing the results on election night. The CLA is the Central Legitimization Agency (or "Controlling Legal Authority" in Gore-speak). It is responsible for registering and credentialing voters.
Here's how it works:
- Before the election, the CLA supervises voter registration. Each registered
voter is issued a Voter Registration Number (VRN), which is simply a large
random number. VRNs are issued electronically, for example, by email or floppy
(see Figure 1).
- The CLA maintains a list of all VRNs, and a list of who VRNs were issued
to, in order to prevent someone from registering twice. There is no record
of who a particular VRN was issued to.
- Prior to election day, the CLA sends the CEA the list of VRNs.
- On election day, the voter sends in an electronic ballot that contains
his choices for elected office. The ballot contains his VRN from step 1.
- The CEA checks that the VRN is valid, and crosses it off the list in order
to prevent someone from voting twice.
- The CEA generates a large random confirmation number (CN) for the voter, and uses it to enter the voter's choices into the vote tally.
- The CEA returns the CN to the voter.
- After all votes have been received, the CEA publishes the outcome, along with the lists of CNs and for whom their owners voted.
Figure 1. The voter registration number.
Dear Registered Voter, Attached is your voter registration number. PLEASE KEEP A COPY OF THIS E-MAIL; YOU WILL NEED IT IN ORDER TO VOTE. DO NOT BEND, FOLD OR SPINDLE. --REGISTRATION-START-- 8334290304185989711211356 0784480602526249967749323 7190985476311629502773466 9927297350189474428770582 --REGISTRATION-END--The privacy of the ballot is ensured by the separation of the CLA and the CEA. One facility knows the identity of the voters, but not who they voted for. The other has access to their vote, but not their identity. This protocol discourages election fraud in a number of ways. If a registered voter tries to vote twice, he will be caught in step 5. We can discourage non-registered voters from trying to guess valid VRNs by using large random numbers (in this example, we use 100-digit numbers). If the CEA itself tries to cheat by stuffing the ballot box or "losing" ballots, this can be detected in step 8. By publishing a list of CNs and their votes, the protocol allows voters to check the list to make sure that their votes were tallied correctly. There are still ways to defraud this protocol. For example, the CEA and CLA can collude to figure out the identity of a voter; if voting is being done over the web, the CEA can use the voter's IP address to figure out who he is. The first of these problems can be addressed by election auditors and statutory law. The second can be addressed using proxy servers or by having the balloting take place in central polling places equipped with ATM-like web browsers. It is also important to note that the protocol described here is a slight departure from the one described by Schneier. The original protocol is built along an email model, in which the voter himself generates the CN, rather than letting the CEA do it for him. The Database Schema Tallying votes is a task for a database, and this application uses MySQL to do the heavy lifting. Listing 1 shows the schema used by the CEA's database. In addition to managing the vote tally itself, the information in the database is used to generate the ballot on the fly. This avoids having to design a new web page for each ballot, and discourages election officials from coming up with butterfly ballots and other innovative designs. The schema has six tables:
- party. This table lists political parties. Each party has a unique ID, numbered from 1, and a short description, such as "Republican Party".
- office. This table lists the offices that are up for grabs. Each office has a unique ID and a short description like "Town Dogcatcher".
- registration. This table lists valid VRNs. Each VRN has a small integer associated with it that indicates whether the VRN has been used in the current balloting.
- candidate. This table lists infor-mation about the candidates in
the current election. Each candidate has a unique ID as well as fields that
describe who he is and what he's running for. The
party_id
andoffice_id
fields describe the candidate's party affiliation and the office he is a candidate for. Thefirst_name
andlast_name
fields are self-evident. AUNIQUE
constraint ensures that there cannot be two candidates from the same party running for the same office. The same candidate can, however, run for two different offices, which should make Senator and/or Vice President Lieberman happy. - tally. This table keeps track of the vote tally. The
voter_id
field corresponds to the voter's CN ballot confirmation number (not the VRN). Thecandidate_id
contains the voter's choice for candidate, and office_id contains the office the voter wants to place him in. AUNIQUE
constraint ensures that a voter can only vote once for a given office. - writein. This table keeps tabs on write-in candidates.
generate_vrns.pl
script generates the number of VRNs requested
on the command line. The part that does all the work is the subroutine random_digits()
between lines 12 and 19. It is critical to use a good random number generator
to generate VRNs; otherwise, valid VRNs would be too easy to guess. The Math::TrulyRandom
module (available on CPAN) promises to do this, but it hangs on my Linux system.
Instead, I use the /dev/urandom device, which uses a Linux kernel driver that
generates random data from non-deterministic system information, such as interrupts.
We read the requisite number of random bytes from the device, and then transform
them into a set of base 10 digits.
The enter_vrn.pl
script shown in Listing
4 takes a list of VRNs generated by the previous script, and enters them
into the CEA database using the DBI module and its DBD::mysql driver.
Line 5 connects to the CEA database or dies with an error message. Line 6
prepares a SQL INSERT
statement that will add VRNs to the registration
table. The code between lines 8-14 loops through the list of VRNs one paragraph
at a time, extracts the VRN information, and calls the SQL statement's
execute()
method to perform the insertion. After the last VRN is
processed, we call finish()
to close the SQL statement, and disconnect()
to disconnect from the database.
The E-Ballot
The fun part is the electronic ballot generated by the CEA. Figure
2 shows how it looks to the voter. There are three steps to voting. In the
first step, the voter makes his choices by selecting radio buttons in the ballot.
Each candidate is sorted by his party and the office he is running for. There
is also a text field that allows for write-ins.
In step 2, the voter enters his VRN. He can do so by cutting and pasting the
VRN into a large text field, or by uploading a file provided by the CLA that
contains the VRN.
When satisifed, the voter presses the VOTE button. The CEA checks that the
ballot is filled out, that the voter hasn't voted twice for candidates
for the same office, and that the VRN is valid and has only been used once.
If these checks are satisified, the CEA enters the voter's choices into
the database, generates a CN, and displays the confirmation to the user (Figure
3).
Later, when the votes have been tallied and published, the user can go to
the published results and make sure that his vote was correctly counted.
Of course, for the e-ballot to be at all secure, all communication between
the voter and the CEA's web site must use SSL, and the voter should be
instructed to check the web site's SSL certificate to make sure that it
is valid.
The vote
Program
The e-ballot is implemented by a conventional CGI script shown in Listing
5. The listing is a bit long, but most of it is fancy formatting in the
ballot section. We'll step through it a line at a time.
Lines 0-4: Load modules. We turn on taint checking and Perl warnings.
Taint checking ensures that we will be prevented from doing anything stupid
with user-supplied input, such as passing it to shells, and warnings alert us
of uninitialized variables and the like. We load the CGI and DBI modules. One
trick to notice is that among the symbols loaded from CGI are *table
,
*Tr
, and *dl
. The asterisk means to automagically
generate functions to start and end the corresponding HTML tags, such as start_table()
to generate a <TABLE>
tag and end_table() to generate a </TABLE>
tag.
Lines 5-7: Define constants and globals. We define a constant for the
length of a valid VRN, and declare globals that will hold various information
about the ballot. @CANDIDATES
is a two-dimensional list of candidates,
in which the first dimension is the candidate's party affiliate and the
second dimension is the office the candidate is running for. The values of this
array are candidate ID's. Each of @PARTIES
, @OFFICES
,
and @CANDIDATE_NAME
map from database IDs into human-readable labels.
To adjust for the fact that 1 is the lowest ID used in the CEA schema, we adjust
each of the indexes by 1. For example, the party_id
for the "Republican
Party" is 1, and so it can be found in @PARTIES
at $PARTIES[0]
.
We set the PATH environment variable to a safe known value in order to satisfy
Perl's taint-check requirements.
Lines 8-9: Connect to the database. We call the DBI->connect()
method to connect to the CEA database running on the local machine. We don't
use any password authorization here, but in a real application we would want
to.
Lines 10-20: Start the page. We emit the standard HTTP header by calling
the CGI module's header()
function, and start the top of the
HTML page by calling start_html()
and h1()
to generate
HTML boilerplate and a level-one header.
Lines 21-22: Initialize globals. We call get_globals()
to initialize the four global variables that describe the current ballot. get_globals()
will make the appropriate database calls to do this.
Lines 23-26: Handle a submitted ballot. We call the CGI module's
param()
method to look for a CGI field named VOTE. If such a field
exists, then it indicates that the user has submitted his ballot by pressing
the VOTE button. We immediately call a subroutine named validate()
,
which checks that the ballot is filled out correctly. If the ballot checks out,
it returns the user's VRN, and we pass the VRN to a subroutine named enter_ballot()
that adds the information to the growing tally.
Lines 27-30: Generate a new ballot. Otherwise, we call generate_ballot()
to create the ballot that the user sees on the page. This subroutine will be
called the first time the user loads the page, as well as when the validate()
subroutine detects an error in a previously-submitted ballot.
Lines 31-34: Finish up. We call end_html()
to generate
the boilerplate at the bottom of the HTML page, and disconnect from the database.
We then exit.
Lines 40-60: get_globals()
subroutine. This subroutine
is responsible for loading the global variables with information about the current
election. For efficiency's sake, it fetches all the information it needs
in a single large SQL statement that retrieves each of the candidates, their
party and office IDs, and the human-readable labels for candidates, offices,
and parties.
We do this by passing the appropriate SQL statement to the database's
prepare()
method, and then executing the resulting statement handle.
We then loop over each row of the returned table, populating the @CANDIDATES
,
@CANDIDATE_NAME
, @PARTIES
, and @OFFICES
arrays as we go. Notice how we offset each ID by one in order to use it as an
array index.
Lines 61-71: The generate_ballot()
subroutine. This subroutine
is responsible for generating the HTML for the ballot. It calls the CGI module's
start_multipart_form()
function in order to start a fill-out form.
We use this function rather than the more common start_form()
because
we will be accepting a file containing the voter's VRN for upload, and
only the multipart-style form can accept file uploads.
We then call three functions to generate the top, middle, and bottom of the
form, and call end_form()
to generate the form's closing tag.
Lines 72-104: The voting_matrix(
) subroutine. This large
subroutine generates the table that shows the ballot information. Don't
be intimidated. The part of the subroutine that does all the work is just two
nested loops. The outer one loops through parties, which become the rows of
the ballot, and the inner one loops through offices, which become the columns.
Within the inner loop, we check whether @CANDIDATES
contains a
candidate for the current party affiliation and office. If there is, we generate
a radio button whose name is the index into @OFFICES
and whose
value is the candidate ID. For the label we use the human-readable version of
the candidate's name, derived from @CANDIDATE_NAME
.
After creating the radio buttons for standard candidates, we create a series
of write-ins, one for each office. These are text fields with the name "writein
X", where X is the ID of the corresponding office.
Lines 105-116: The registration_number()
subroutine. This
subroutine generates the section of the ballot that prompts the user for his
VRN. There are two fields that can be used. One is a large text area named registration_id
,
where the user can cut and paste his VRN. The other is a file field named registration_file
,
which the user can use to upload a text file containing the VRN.
Lines 117-123: The vote()
subroutine. This subroutine
generates a single HTML submission button labeled "VOTE". This concludes
the portion of the script that generates the e-ballot.
Lines 124-165: The validate()
subroutine. This is responsible
for validating the voter's submission. There are several checks on the
integrity of the ballot. First we check for the easy things: whether the VRN
has been filled in, and whether any of the radio buttons in the ballot have
been selected (we only require a minimum of one office to be selected; it's
perfectly valid for the user to vote for some offices and enter no selection
for others).
We now check for more subtle problems. Lines 134-138 verify that each office
has exactly 0 or 1 votes. Although the fill-out form only allows a single candidate
from each office to be selected, a malicious voter could roll his own fill-out
form and try to vote for multiple candidates from the same office.
Having passed these checks, we recover the VRN. If the registration_file
field is present, then the user has chosen to upload a file. We call the CGI
module's param()
function to recover a filehandle for the
uploaded file, and parse out its contents. Otherwise, if the registration_id
field is present, we use its contents to recover a cut-and-pasted VRN.
Having recovered the VRN, we ensure that it is valid. First, we check that
the VRN is the correct length. If so, we consult the database to see whether
the VRN is in the registration table, and whether it is still unused. If both
these tests pass, then we declare that the submitted ballot is valid and return
the VRN to the caller.
When any of the tests fails, it calls a utility subroutine named error()
.
The error()
function displays a bold red error message on the top
of the page, and prompts the user to make corrections and try again.
Lines 166-205: The enter_ballot()
subroutine. The enter_ballot()
subroutine is where the information from the ballot is collected and entered
into the database, registering the voter's intent and keeping the sacred
flame of Democracy alight.
The first thing that we do is to update the database in order to mark the
VRN as used. This prevents the VRN from being used again. We do the update in
a way that will cause it to fail unless the VRN is currently marked as unused,
and avoids an attack based on race conditions while updating the database.
We now generate a confirmation number for the ballot by caling random_digits()
.
We use the newly-generated ID to generate two SQL insert statements, one for
regular candidates, and the other for write-ins. Each statement uses "?"
as placeholders for the office and candidate fields.
We now enter the voter's choices into the database, simultaneously generating
a confirmation page as we do so. We loop over the @OFFICES
array,
looking for CGI parameters corresponding either to a regular candidate for the
office or to a write-in. If we find a write-in, we recover it and insert it
into the writein table using the appropriate insert statement. Otherwise we
insert the voter's choice into the tally table. Notice how we add 1 to
the office index in order to convert it back into the 1-based ID used in the
MySQL tables.
Each time through the loop, we print out the office and the selected candidate,
using a definition list (<DL>) style HTML list.
At the end of the subroutine, we finish()
both SQL statements,
and then print out a nicely-formatted version of the voter's ballot confirmation
number.
Lines 209-223: Utility subroutines. We've already seen the random_digits()
subroutine. The error()
subroutine takes its arguments and incorporates
them into an HTML paragraph, using a red font and a large font size. The subroutine
explicitly returns undef, which allows this type of idiom in the caller:
return error('Please stop munching chads and start punching ballots') unless $is_valid;Tallying the Votes
On election night, tallying the vote is simply a matter of issuing a SQL statement to add up each candidate's counts and grouping the results by office. Here's one that will do the trick:
By this count Ogden Nash deserves to be the next President, Timothy O'Leary next State Senator, and Morticia Addams the next Dogcatcher. A definite improvement over this year's choices!
_ _END_ _
Lincoln D. Stein wrote CGI.pm.