Implementing Audio CAPTCHA

David uses sound to make CAPTCHA an equal opportunity security device.


December 10, 2007
URL:http://www.drdobbs.com/security/implementing-audio-captcha/204800632

David works as a musician, writer, and software engineer (not necessarily in that order). He can be contacted at www.summersong.net.


"Knock, knock, knock."

"Who is it?"

"It's me, Dave. Open up..."

When Cheech and Chong performed this routine in the 1970s, it became as popular as Abbott and Costello's "Who's on First?" was with a previous generation. While "Dave" was negotiating the possibility of opening the door, they both knew they were talking to human beings. But when considering whether you will "open the door" to your blog comments or other feedback, you aren't as lucky. In fact, more often than not, you may be dealing with a machine asking you to open up.

You've probably seen the disaster areas left in the wake of a robot attack on a blog or other web feedback portal, when no filtering mechanism was used in the comments section. It's not a pretty site. Excessive blog spam can drive away readers as well as bring undesirable Google associations with your name.

In an even more important area, protecting personal information on banking, employee databases, and other sites, it's imperative for webmasters to determine if they are dealing with human beings. The usual method for accomplishing this is some form of CAPTCHA, short for "Completely Automated Public Turing Test to Tell Computers and Humans Apart" (www.captcha.net), which requires you to read some distorted text like that in Figure 1, then correctly enter the text sequence into a dialog box. Reading the text is (theoretically) easy for you, but difficult for a computer.

Figure 1: A typical CAPTCHA.

I think of a CAPTCHA as a lock that should be easy for humans to open, but hard for machines to decipher. Certainly, locking your car or house is not a foolproof way of stopping bad guys from doing their worst. The idea of a lock or alarm is really to make it harder for someone to mess with your stuff—so hard that they will move along. This is really the practical goal for a CAPTCHA as well.

Do You See What I See?

Watching Esther Williams tread water in those old 1940s movies is a cozy way to pass a rainy afternoon. But trying to decipher a computer screen with similarly swirling syllables is nobody's idea of a swank soiree. An audio CAPTCHA, on the other hand, is an attractive alternative to its visual counterpart.

Some people may be sufficiently turned off at the thought of navigating yet another visual CAPTCHA that they turn away from your site. Providing an alternative that is easier for some people to navigate could be an incentive for more of them to unlock the virtual door to your site feedback.

Another reason for considering alternatives to visual CAPTCHA mechanisms involves a U.S. federal accessibility law called "Section 508," a section of the Rehabilitation Act whose 1998 amendment grew out of a bill titled "The Federal Electronic and Information Technology Accessibility and Compliance Act."

Section 508 (www.section508.gov) says that when Federal agencies develop or use electronic information, they are required to make that information available to everyone equally. In the case of CAPTCHA, this means that people who are unable to read a CAPTCHA should have access to an alternate entry mechanism. At least, that's my translation of the regulation's government-ese.

To date, Section 508 requirements apply only to sites with financial ties to the federal government. However, it makes good sense, from a business as well as a human standpoint, to provide accessibility alternatives for people with different physical challenges. Think about implementing these alternatives on your personal website as well as the business sites you develop.

Sound Advice

People who are sight challenged would naturally be an audience for an audio CAPTCHA. With this in mind, any audio CAPTCHA implementation should be constructed so that it is easy to use with a screen reader.

You can test how well your audio CAPTCHA works for someone who is using a screen reader by using the "Microsoft Narrator" that comes with Windows. You turn this feature on in Windows XP by selecting Start|All Programs|Accessories|Accessibility|Narrator.

Any instructions that go with your CAPTCHA should be concise. Someone using a screen reader isn't going to want to have it reread some overly verbose instructions.

The implementation I present here can be used without employing the mouse at all. This feature is especially handy for someone using a screen reader. I ask users to press a particular key to start the audio. Once the audio has started, I move the focus to the proper text box, and the form is submitted for validation when the user presses Enter.

When implementing an audio CAPTCHA, it is still necessary to disguise the challenge in some way from robots. However, there are ways to deceive these machines that are much less intrusive and easier for a human to deal with than using pictures of distorted letters.

Some audio CAPTCHAs currently in use attempt to foil audio-deciphering robots by obscuring the audio. To me, this defeats some of the purpose. As with the aforementioned swirling syllables, when you obscure audio, you make it harder for humans to understand, not just for their mechanical facsimiles.

Instead of adding background noises or any similar muddiness to the sound, try adding some simple aural logic that a machine would find difficult to parse. In my implementation, I ask for four numbers. The challenge starts with a simple instruction "Please enter these four numbers...," then speaks the four random numbers (Figure 2). Available at www.ddj.com/code is a series of MP3 audio files for the numbers 0-9.

Figure 2: The Simple UI.

A nice addition that should help to obscure the challenge from robots would be to randomly include a phrase between two of the numbers, such as "not 7, but instead." So, for example, instead of hearing, "Please enter these four numbers: 4, 3, 6, 2," the user hears "Please enter these four numbers: 4, 3, 6, not 7 but instead 2."

Naturally, in this example you would have to add logic to ensure that the fourth number asked for was not a 7. Other methods might include simply adding phrases like "and a," "then press Enter," and so on. The idea here is to add just enough audio to fool the robots without confusing or frustrating users.
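The decoy idea can be sketched as follows. This is a minimal illustration rather than part of the article's listings: it is written in JavaScript for brevity (the same logic would have to live in the server-side script that assembles the audio), and the function name buildClipSequence and the clip names "intro", "not", and "but-instead" are assumptions made up for the example. The decoy is produced by adding an offset of 1 to 9 modulo 10, so it can never equal the digit the user is supposed to type.

```javascript
// Hypothetical helper: returns the ordered list of clip names for one challenge.
// The clip names "intro", "not", and "but-instead" are assumed for illustration.
function buildClipSequence(digits, random = Math.random) {
  // Choose which digit gets the decoy phrase spoken in front of it.
  const target = Math.floor(random() * digits.length);

  // An offset of 1..9, wrapped modulo 10, always differs from the real digit.
  const offset = 1 + Math.floor(random() * 9);
  const decoy = (digits[target] + offset) % 10;

  const clips = ['intro'];
  digits.forEach((digit, i) => {
    if (i === target) {
      // "... not <decoy> but instead <digit>"
      clips.push('not', String(decoy), 'but-instead');
    }
    clips.push(String(digit));
  });
  return clips;
}

// Example: the stored answer stays "4362"; only the spoken sequence changes.
console.log(buildClipSequence([4, 3, 6, 2]).join(' '));
```

Because the answer kept in the session is still just the four real digits, the validation step would not need to change.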

Roll Tape

When I created my audio CAPTCHA, my goals were to keep things simple by using only a small amount of code with no outside dependencies. I also wanted to keep the instructions and user interaction as simple as possible. A solution using PHP seemed best for this, with some JavaScript to handle browser independence and keyboard input.

The main file, index.htm (Listing One), starts a PHP session with a call to session_start(). (Because the file contains PHP, the web server must be configured to pass .htm files to the PHP interpreter.) Sessions are used in PHP as a persistence mechanism, letting data be passed from one page to another. (For more information on PHP sessions, see us2.php.net/session).

After setting the session, the PHP script loads an array with four random numbers, from 0 to 9. These numbers serve as the basis for the randomly generated audio and are used to verify the subsequent user entry. The four digits are then concatenated together and stored in the $_SESSION global array variable. Here I make use of PHP's associative array feature, naming this array index "captchaAnswer".

<?php
session_start();

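// Get a random digit from 0-9.
// Note: rand() is not cryptographically secure; on PHP 7 or later,
// random_int(0, 9) is a better choice.
function GetRand()
{
    return rand(0, 9);
}
// the 4 random numbers
$fileName = array();
for($i = 0; $i < 4; $i++)
    $fileName[] = GetRand();
// store the answer as a 4-character string, such as "0362"
$_SESSION['captchaAnswer'] = $fileName[0] . $fileName[1] .
    $fileName[2] . $fileName[3];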
?>
<HTML>

<HEAD>
<title>Audio CAPTCHA Test</title>
</HEAD>
<BODY bgcolor="lightblue" onload="Init()">

<SCRIPT language="JavaScript">

// Init the window, by clearing any input from a former try
function Init()
{
    document.getElementById('userAnswer').value = "";
}
var onWindows = true;
var onExplorer = true;
if(navigator.platform.indexOf("Win") == -1)
{
    onWindows = false;
}
if(navigator.appName != "Microsoft Internet Explorer")
{
    onExplorer = false;
}
// grab user key press
document.onkeyup = KeyCheck;       
function KeyCheck(e)
{
    /* for browser independence, IE uses window.event, 
       Firefox passes event */
    var keyPressed = (window.event) ? event.keyCode : e.keyCode;
    var hotKey = 80; // p key
    if(keyPressed == hotKey)
    {
        document.getElementById('userAnswer').value = "";
        /* embed for IE, Media player won't show
        iframe for FireFox, embed doesn't play dynamic file */
        if(onWindows && onExplorer)
        {
            document.getElementById('mediaPlayer').controls.play();
        }
        else
        {
            document.getElementById('writeHere').innerHTML = 
     '<iframe src="./PlaySound.php" width="0" height="0"></iframe>';
        }
        document.getElementById('userAnswer').focus();
    }   
 }
/* if using IE on Windows, use an embedded Windows Media Player */
if(onWindows && onExplorer)
{
    document.write('<OBJECT id="mediaPlayer" \
    CLASSID="clsid:6BF52A52-394A-11D3-B153-00C04F79FAA6" \
    TYPE="application/x-oleobject" width="0" height="0"> \
    <param name="url" value="./PlaySound.php" /> \
    <param name="autoStart" value="false" /> \
    </OBJECT>');
}
</SCRIPT>
<br>
<!-- the form for instructions and user entry -->
<form method="POST" action="CaptchaSubmit.php">
Press the P key to play the sound file. 
<br>Then type the 4 numbers you hear and press Enter.
<br><input type="text" id="userAnswer" 
    name="userAnswer" maxlength="4" size="4"/>
</form>

<!-- a place to write the frame information when not on IE/Windows-->
<div id="writeHere" style="visibility:hidden"></div>
</body>
</html>
Listing One

When the page is loaded, a JavaScript Init() function is called. This simply makes sure that the text box is free of any previous entry. The next two checks determine the user's browser and OS. If the user is running Internet Explorer under Windows, I want to use an embedded Windows Media Player to play the sound files.

The JavaScript function KeyCheck() is called whenever the browser fires a keyup event. I use key up rather than key down because key down can fire more than once if the user holds a key down for a certain length of time.

KeyCheck() gets the key code, which identifies the key that was pressed and released by the user. It gets this code via either the window.event that is built into IE or the passed-in event argument that will be present in Mozilla-based browsers.

I chose to use the P key to start the audio. There is no need to check separately for a lowercase p: on a keyup event, keyCode identifies the key rather than the character typed, so it reports 80 for P whether or not Shift or Caps Lock is in effect. I could have used almost any key here. I wanted to use the Spacebar, because that is what's used in many audio software packages for starting or stopping playback. But the Spacebar is already a browser hot key for page down, so I settled for P as in "Play."
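The hot-key test can also be isolated into a small function, which makes it easier to reason about. The sketch below is a variation on KeyCheck(), not the code from Listing One: the name isPlayHotKey is hypothetical, and ignoring Ctrl, Alt, and Meta combinations (so that shortcuts such as Ctrl+P still reach the browser) is an added assumption. It reads the standard key property where the browser provides one and falls back to keyCode otherwise.

```javascript
// Hypothetical helper: decides whether a keyboard event is the "play" hot key.
function isPlayHotKey(evt) {
  const P_KEY_CODE = 80; // same value Listing One uses for the P key

  // Leave browser shortcuts such as Ctrl+P (print) alone.
  if (evt.ctrlKey || evt.altKey || evt.metaKey) {
    return false;
  }
  // `key` holds the character, so fold case explicitly.
  if (typeof evt.key === 'string') {
    return evt.key.toLowerCase() === 'p';
  }
  // Legacy path: keyCode is 80 for P regardless of Shift or Caps Lock.
  return evt.keyCode === P_KEY_CODE;
}

// In a page, this might be wired up along these lines:
//   document.onkeyup = function (e) {
//     if (isPlayHotKey(e || window.event)) { /* start playback */ }
//   };
```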

When the P key up is detected, KeyCheck() clears any data that may be in the text box. Then it plays the audio file in one of two ways, depending on the browser and the OS. Under IE/Windows, the embedded Windows Media Player is used. The Media Player has an object model that allows script control. Calling the Player's controls.play() starts playback.

If the user is not on IE/Windows, the script uses whatever the user has set as the default media player. When I tested this, I found that when I used the HTML embed tag, the dynamically created sound file did not play. Using the iframe tag instead, with a 0 size frame, plays the file.

When the audio has started, I shift the focus to the input text box. This sequence (activating the audio on the P key hit, shifting the focus to the input text box, and accepting the Enter key as a signal that the entry is complete) allows for a mouse-free user experience.

Below KeyCheck() is code that embeds the Windows Media Player if the user is on IE/Windows. The url parameter is given the value of PlaySound.php. This is the file that generates the audio. Also, autoStart is false so the audio won't just start when the page is first loaded. You may want to change this value to true depending on how you are presenting the CAPTCHA.

After the JavaScript section comes the form data. The form takes the user input for the CAPTCHA and calls the CaptchaSubmit.php file to verify the input data. Lastly, I include a div section to contain the HTML frame that may be generated in KeyCheck().

They're Playing Our Song

The PlaySound.php script (Listing Two) uses PHP to dynamically form an MP3 stream by concatenating shorter MP3 files. The PHP header() function writes out the content-type as MPEG audio. The stream then starts with Intro.mp3, a short introduction that simply says, "Please enter these four numbers."

The script then retrieves the four numbers from the $_SESSION array. The numbers are stored in the array as a single string, so the script pulls the string apart, storing the numbers in a filename array. Then, for each number, the corresponding MP3 file is added to the stream. This is the section where you may want to include some additional words.

Finally, the stream is printed to standard out and, thanks to the content-type header, is interpreted by the browser as an MP3 file. Naturally, there are many different ways to play audio on the Web. I chose MP3 because separate files are easy to concatenate: unlike a .wav file, an MP3 has no file-level header that would have to be rewritten for the combined length. You may want to experiment with different file types for your own CAPTCHA, perhaps Flash for example.

<?php 
session_start();
header('Content-type: audio/x-mpeg');
/* put an intro in first */
$thisFile = "Intro.mp3";
$fh = fopen($thisFile, 'rb');   /* 'rb' keeps the read binary-safe */
$allAudio = fread($fh, filesize($thisFile));
fclose($fh);

/* add the numbers to the file */
$str = isset($_SESSION['captchaAnswer']) ? $_SESSION['captchaAnswer'] : '';
/* this works under PHP4,
can use str_split() here instead in PHP5 */
$fileName = array();
for ($i = 0; $i < strlen($str); $i++) 
    $fileName[] = substr($str,$i,1);
foreach ($fileName as $fileNumber)
{
    /* each digit maps to its own clip, such as "4.mp3" */
    $thisFile = $fileNumber . ".mp3";
    $fh = fopen($thisFile, 'rb');
    $allAudio .= fread($fh, filesize($thisFile));
    fclose($fh);
}
/* output the resulting MP3 */
print $allAudio;
?>
Listing Two

The last file, CaptchaSubmit.php (Listing Three), checks the user input and outputs a simple message to indicate success or failure. CheckAnswer() first checks whether anything was posted at all and whether the PHP session holds an answer. Then it checks the user's posted answer against the challenge initialized in the index.htm file, returning True or False (1 or 0) as appropriate.

<?php

session_start();

header('Content-type: text/html');

// true if correct answer
function CheckAnswer()
{
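  if (!isset($_POST['userAnswer']) || !isset($_SESSION['captchaAnswer']))
    return 0;
  // strict comparison: with ==, PHP compares numeric strings as numbers,
  // so "362" would match a stored answer of "0362"
  else if ($_POST['userAnswer'] === $_SESSION['captchaAnswer']) 
    return 1;
  return 0;  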
}
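print "<html><head></head><body>";
// escape the echoed input so user-supplied markup is not rendered
if (isset($_POST['userAnswer']))
    print htmlspecialchars($_POST['userAnswer']);
if(CheckAnswer())
    print " is correct.";
else
    print " is not correct, please go back and try again.";
print "</body></html>";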
?>
Listing Three

You will want to replace the lower half of this script, which outputs success or failure, with your own handling, probably a redirect back to your blog or other media page.

Alternate Audio

The solution I've outlined is simple and sonically kind of bare bones. If you have a website that people come to for entertainment, why not try to make any necessary CAPTCHA somewhat entertaining, or at least topical, to your website?

For example, if you have a site that is frequented mostly by musicians, try using musical tones instead of numbers. A musician, with a trained ear, can tell what the intervals are between two or more notes. Substitute the four random numbers with four random notes, within an octave, and ask for the intervals between them. Or play major, minor, and diminished chords asking the musician to name the chord types.
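As a rough sketch of the interval idea, the code below picks random notes within one octave and derives the expected answers. It is only an illustration under stated assumptions: the function name makeIntervalChallenge is hypothetical, notes are represented as semitone offsets from C, and intervals are counted as the absolute number of semitones between consecutive notes. Producing the actual tones is left out.

```javascript
// Note names for one octave, indexed by semitone offset from C.
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

// Hypothetical helper: picks `count` random notes and the intervals between them.
function makeIntervalChallenge(count = 4, random = Math.random) {
  const notes = [];
  for (let i = 0; i < count; i++) {
    notes.push(Math.floor(random() * NOTE_NAMES.length));
  }
  // Interval k is the semitone distance between note k and note k + 1.
  const intervals = notes.slice(1).map((note, k) => Math.abs(note - notes[k]));
  return {
    notes: notes.map((n) => NOTE_NAMES[n]), // which tones to play
    intervals,                              // what the user should answer
  };
}

const challenge = makeIntervalChallenge();
console.log(challenge.notes.join(' '), '->', challenge.intervals.join(' '));
```

Four notes yield three intervals, so the expected answer is one item shorter than the sequence that is played.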

On the other hand, if your site is attracting Stephen Hawking devotees, try asking a math problem, like the cube root of 9. For an astronomy site, ask about some astronomical distances, such as roughly how many miles there are from the Earth to the moon.

Another alternative that would fit particularly well with the audio method I've presented would be to ask for certain letters of the alphabet. For example, you could say, "Enter the first and fifth letters of the alphabet," substituting random positions for the words "first" and "fifth."
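A letter challenge of this kind might be generated along the following lines. This is a sketch under assumptions, not code from the article: the names makeLetterChallenge and checkLetters are hypothetical, the positions are limited to the first ten letters to keep the list of ordinal clips short, and the prompt is built as text where a real version would concatenate the matching audio clips.

```javascript
// Ordinal words for the first ten alphabet positions (an arbitrary limit for the sketch).
const ORDINALS = ['first', 'second', 'third', 'fourth', 'fifth',
                  'sixth', 'seventh', 'eighth', 'ninth', 'tenth'];

// Hypothetical helper: picks `count` random positions and the letters they refer to.
function makeLetterChallenge(count = 2, random = Math.random) {
  const positions = [];
  for (let i = 0; i < count; i++) {
    positions.push(Math.floor(random() * ORDINALS.length));
  }
  const words = positions.map((p) => ORDINALS[p]);
  // Position 0 is "a", position 1 is "b", and so on.
  const answer = positions.map((p) => String.fromCharCode(97 + p)).join('');
  return {
    prompt: 'Enter the ' + words.join(' and ') + ' letters of the alphabet.',
    answer,
  };
}

// Compare case-insensitively and ignore stray whitespace around the entry.
function checkLetters(input, answer) {
  return input.trim().toLowerCase() === answer;
}

const letterChallenge = makeLetterChallenge();
console.log(letterChallenge.prompt);
```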

For one more idea, try using some sound files with rhyming words and include one out of the group that doesn't rhyme with the others. Ask the user to enter the word that doesn't rhyme. You can probably think of other variations on a "name the thing that doesn't belong" scenario. Making your CAPTCHA topical may remove some of the onerousness of the CAPTCHA process.

That's a Wrap

Some way of divining human users from mechanical interlopers is a necessary evil for many web applications. You should seriously consider using a combination of methods so that the feedback components of your site are accessible to all people surfing the Internet.

With some imagination, you may also be able to make the challenge blend into your site. In any case, it should be simple and straightforward. After all, the CAPTCHA is only the door, not what's on the other side.
