People who are blind or have low vision are a natural audience for an audio CAPTCHA. With this in mind, any audio CAPTCHA implementation should be built so that it is easy to use with a screen reader.
You can test how well your audio CAPTCHA works with a screen reader by using Microsoft Narrator, which comes with Windows. In Windows XP, you turn this feature on by selecting Start|All Programs|Accessories|Accessibility|Narrator.
Any instructions that go with your CAPTCHA should be concise. Someone using a screen reader isn't going to want to have it reread some overly verbose instructions.
The implementation I present here can be used without employing the mouse at all. This feature is especially handy for someone using a screen reader. I ask users to press a particular key to start the audio. Then, once the audio has started, I move the focus to the proper text box and submit the answer for validation when the user presses Enter.
When implementing an audio CAPTCHA, it is still necessary to disguise the challenge in some way from robots. However, there are ways to deceive these machines that are much less intrusive and easier for a human to deal with than using pictures of distorted letters.
Some audio CAPTCHAs currently in use attempt to foil audio-deciphering robots by obscuring the audio. To me, this defeats some of the purpose. As with the aforementioned swirling syllables, when you obscure audio, you make it harder for humans to understand, not just for their mechanical facsimiles.
Instead of adding background noises or any similar muddiness to the sound, try adding some simple aural logic that a machine would find difficult to parse. In my implementation, I ask for four numbers. The challenge starts with a simple instruction "Please enter these four numbers...," then speaks the four random numbers (Figure 2). Available at www.ddj.com/code is a series of MP3 audio files for the numbers 0-9.
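The basic challenge described above can be sketched in a few lines of Python. This is only a minimal sketch, not the implementation that accompanies this article: the clip names (`intro.mp3` and `0.mp3` through `9.mp3`) and the function `build_challenge` are assumptions made for illustration.

```python
import secrets

# Hypothetical file name for the spoken instruction
# "Please enter these four numbers..."
INTRO_CLIP = "intro.mp3"


def build_challenge(count=4):
    """Return (answer, playlist) for a spoken-digit challenge."""
    # secrets draws from the operating system's random source,
    # so the digits are not predictable from earlier challenges.
    digits = [secrets.randbelow(10) for _ in range(count)]
    # The answer the server expects the user to type, e.g. "4362".
    answer = "".join(str(d) for d in digits)
    # Assumed naming scheme: one clip per digit, "0.mp3" through "9.mp3".
    playlist = [INTRO_CLIP] + [f"{d}.mp3" for d in digits]
    return answer, playlist


answer, playlist = build_challenge()
```

In a sketch like this, the server would keep `answer` and send only audio to the browser. If the clips were served under names like these, a robot could read the answer from the file names, so one option is to join the clips into a single stream before sending it.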
A nice addition that should help to obscure the challenge from robots would be to randomly include a phrase between two of the numbers, such as "not 7, but instead a." So, for example, instead of hearing "Please enter these four numbers: 4, 3, 6, 2," the user hears "Please enter these four numbers: 4, 3, 6, not 7 but instead 2."
Naturally, in this example you would have to add logic to ensure that the fourth number asked for was not a 7. Other methods might include simply adding phrases like "and a," "then press Enter," and so on. The idea here is to add just enough audio to fool the robots without confusing or frustrating users.
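The decoy phrase and the check it requires might look something like the following Python sketch. It is a variation on the idea rather than the article's code: instead of fixing the decoy at 7 and constraining the real number, it picks the decoy at random from the digits that differ from the real one. The function name `spoken_sequence` and the word list it returns are hypothetical.

```python
import secrets


def spoken_sequence(digits):
    """Return (spoken, decoy, pos): the items to speak, with one
    "not X, but instead" phrase placed between two of the numbers."""
    # Choose a position from 1 to len(digits) - 1 so the phrase
    # always falls between two numbers, never before the first.
    pos = 1 + secrets.randbelow(len(digits) - 1)
    real = digits[pos]
    # The decoy must differ from the number it precedes; otherwise
    # the user would hear something like "not 7, but instead 7".
    decoy = secrets.choice([d for d in range(10) if d != real])
    spoken = []
    for i, d in enumerate(digits):
        if i == pos:
            spoken += ["not", str(decoy), "but instead"]
        spoken.append(str(d))
    return spoken, decoy, pos


# One possible result for [4, 3, 6, 2]:
# ["4", "3", "6", "not", "7", "but instead", "2"]
spoken, decoy, pos = spoken_sequence([4, 3, 6, 2])
```

The expected answer is still the original four digits. The decoy is spoken but never typed.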