Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



A Tidal Wave of Spam


I opened my email inbox this morning and found 120 messages waiting. These included:

  • Twenty-seven messages from several mailing lists to which I subscribe
  • Twenty-seven messages sent by the Klez virus, including two masquerading under subject lines that purported to tell me how to remove the Klez virus
  • Twelve messages in various non-English languages, most of which I can't read because I don't have the proper character set
  • Twelve messages from merchants, including three trying to sell me prescription and non-prescription pharmaceuticals, two trying to sell me computer hardware, and two advertising Web and email hosting services
  • Six messages trying to entice me to pornography sites or interest me in sex services, including an advertisement for penis enlargement techniques
  • Five messages offering make-money-quick schemes, including two variants on the Nigerian scam
  • Four bounced emails, mostly from mailing lists
  • Three political messages
  • Two offers of software or services for issuing mass mailings

I also received 22 personal messages, about 18 percent of the total email in my inbox. If we pool mailing lists and personal mail, my inbox is still dominated by a majority (51 percent) of unsolicited spam.

Junk email used to make me furious. My first attempt to fight back was to junk the junk mail using the filtering software that was built into my mail reader. Each time I received a new piece of spam, I'd enter the sender's name and subject line into the filter so that I'd never receive that piece of spam again. This system never worked well because spammers vary their headers to avoid this type of filtering. For each piece of junk email the filter found, five slipped past.
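To see why exact matching on sender and subject fares so poorly, here is a minimal sketch of that kind of filter. It is not the code of any particular mail reader; the `BlockList` class and the example.com addresses are hypothetical, chosen only for illustration.

```python
class BlockList:
    """Hypothetical exact-match filter keyed on (sender, subject)."""

    def __init__(self):
        self._blocked = set()

    def block(self, sender, subject):
        # Remember this exact sender/subject pair (case-insensitive).
        self._blocked.add((sender.lower(), subject.lower()))

    def is_blocked(self, sender, subject):
        # A message is caught only if both fields match a stored pair.
        return (sender.lower(), subject.lower()) in self._blocked


if __name__ == "__main__":
    spam_filter = BlockList()
    spam_filter.block("deals@example.com", "Make money fast")

    # The identical message is caught...
    print(spam_filter.is_blocked("deals@example.com", "Make money fast"))
    # ...but a one-character change to either header slips past.
    print(spam_filter.is_blocked("deals2@example.com", "Make money fast"))
    print(spam_filter.is_blocked("deals@example.com", "Make m0ney fast"))
```

Because every stored entry matches only one exact pair, a spammer who changes a single character in either header starts with a clean slate.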

I turned to more sophisticated filtering using third-party software running under the Unix procmail facility. The filter that I was originally most enthusiastic about used a form of fuzzy logic to count the occurrences of a long list of spam-related phrases, assigning each piece of incoming mail a spam likelihood index. The index could then be used to sort mail into various folders. For example, if a piece of incoming mail contained a high frequency of the phrases "money," "make money," and "sure fire," it would be classified as a potential make-money-quick email and shunted to the junk mail pile.
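The general idea of a phrase-based likelihood index can be sketched in a few lines. This is a plain weighted phrase count, not the fuzzy-logic scoring of the filter described above. The weights, the per-100-words normalization, and the threshold are all assumptions made for illustration; only the three phrases come from the example.

```python
import re

# Hypothetical weights; only the phrases themselves come from the text.
SPAM_PHRASES = {
    "make money": 3.0,
    "money": 1.0,
    "sure fire": 2.0,
}


def spam_score(text, phrases=SPAM_PHRASES):
    """Return the weighted phrase count per 100 words of text."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return 0.0
    total = 0.0
    for phrase, weight in phrases.items():
        # Match whole phrases only, so "moneyed" does not count as "money".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += weight * len(re.findall(pattern, lowered))
    return 100.0 * total / len(words)


def classify(text, threshold=5.0):
    """Sort a message into a folder; the threshold is an assumed value."""
    return "junk" if spam_score(text) >= threshold else "inbox"


if __name__ == "__main__":
    print(classify("Make money fast with this sure fire plan to make money"))
    print(classify("Meeting moved to 3pm tomorrow"))
```

Lowering the threshold catches more spam but junks more legitimate mail, and raising it does the reverse. That trade-off is the one that makes such a filter hard to tune.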

I was pretty happy with this software until it misclassified and junked a legitimate email that was sent to notify me that I had been awarded a large grant for my research. I guess a letter that says "you have won a grant" sounds too much like one that says "you may already have won the sweepstakes."

I tinkered with the settings for a while, but could never achieve a satisfactory balance. If I set the software conservatively enough that it would never misclassify a legitimate email, it let so much spam through that it wasn't worth the effort. Other filters that I experimented with had similar problems.

There are probably better filters out there, but filtering is a losing battle. I receive about 60 spam messages a day, so a filter would have to detect spam about 99 percent of the time to reduce the number of unsolicited messages to one a day. I get an equal number of legitimate messages daily, and to avoid missing more than one legitimate message per week, I need a filter that misclassifies less than 0.2 percent of mail. Maybe I can find a filter that has these characteristics, but consider what happens when the amount of junk mail I receive increases fivefold, which will likely happen sometime in 2004. To handle this tidal wave of spam, I'll need a filter that's more than 99.7 percent sensitive, but doesn't sacrifice specificity. These will be hard criteria to meet.
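The arithmetic behind these figures is easy to reproduce. The sketch below assumes the simplest possible model: a fixed daily volume of mail, and a filter whose error rates are the same for every message. The function names are hypothetical.

```python
def required_sensitivity(spam_per_day, allowed_spam_per_day=1.0):
    """Fraction of spam the filter must catch to let through only the allowed amount."""
    return 1.0 - allowed_spam_per_day / spam_per_day


def max_false_positive_rate(legit_per_day, allowed_lost_per_week=1.0):
    """Largest fraction of legitimate mail that may be junked per message."""
    return allowed_lost_per_week / (legit_per_day * 7)


if __name__ == "__main__":
    # 60 spam messages a day, one allowed through.
    print(f"{required_sensitivity(60):.1%}")      # 98.3%
    # 60 legitimate messages a day, one lost per week.
    print(f"{max_false_positive_rate(60):.2%}")   # 0.24%
    # Spam volume grows fivefold to 300 a day.
    print(f"{required_sensitivity(300):.1%}")     # 99.7%
```

Rounded, these are the figures quoted above: roughly 99 percent sensitivity today, a misclassification rate near 0.2 percent, and about 99.7 percent sensitivity once the volume grows fivefold.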

I could rail against spam, call on legislatures to criminalize it, encourage ISPs to block it, or propose radical strategies like imposing a charge for each Internet email sent. But I won't. Each of these proposals creates new problems, and many are worse than the one we're trying to solve. Instead, I've learned to stop worrying and to love the spam. A month ago I tossed out my mail filters. I like to think of my morning email sessions as "spam surfing." What new exciting opportunities are complete strangers offering me? What all-natural and completely safe herbal remedies will regrow my hair, boost my sexual stamina, and help me sleep better at night? What interesting attachments does Klez have for me today?

I'm happy and relaxed, and the only small cloud on the horizon is the thought of what I might be missing out on in all of those foreign-language messages. Maybe I should learn Korean.


Lincoln is an M.D. and Ph.D. who designs information systems for the human genome project at Cold Spring Harbor Laboratory in Cold Spring Harbor, NY. You can contact him at [email protected].

