Naive Bayesian Text Classification


John is chief scientist at Electric Cloud, which focuses on reducing software build times. He is also the creator of POPFile. John can be contacted at jgc@electric-cloud.com.


Paul Graham popularized the term "Bayesian Classification" (or more accurately "Naïve Bayesian Classification") after his "A Plan for Spam" article was published (http://www.paulgraham.com/spam.html). In fact, text classifiers based on naïve Bayesian and other techniques have been around for many years. Companies such as Autonomy and Interwoven incorporate machine-learning techniques to automatically classify documents of all kinds; one such machine-learning technique is naïve Bayesian text classification.

Naïve Bayesian text classifiers are fast, accurate, simple, and easy to implement. In this article, I present a complete naïve Bayesian text classifier written in 100 lines of commented, nonobfuscated Perl.

A text classifier is an automated means of determining some metadata about a document. Text classifiers are used for such diverse needs as spam filtering, suggesting categories for indexing a document created in a content management system, or automatically sorting help desk requests.

The classifier I present here determines which of a set of possible categories a document is most likely to fall into and can be used in any of the ways mentioned with appropriate training. Feed it samples of spam and nonspam e-mail and it learns the difference; feed it documents on various medical fields and it distinguishes an article on, say, "heart disease" from one on "influenza." Show it samples of different types of help desk requests and it should be able to sort them so that when 50 e-mails come in informing you that the laser printer is down, you'll quickly know that they are all the same.

The Math

You don't need to know any of the underlying mathematics to use the sample classifier presented here, but it helps.

The underlying theorem for naïve Bayesian text classification is the Bayes Rule:

P(A|B) = ( P(B|A) * P(A) ) / P(B)

The probability of A happening given B is determined from the probability of B given A, the probability of A occurring and the probability of B. The Bayes Rule enables the calculation of the likelihood of event A given that B has happened. This is used in text classification to determine the probability that a document B is of type A just by looking at the frequencies of words in the document. You can think of the Bayes Rule as showing how to update the probability of event A happening given that you've observed B.
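As a small worked example of the rule, the sketch below plugs numbers into the formula. All of the figures (the share of spam and how often the word "offer" appears) are invented for illustration, and the helper name bayes is hypothetical rather than part of the classifier presented later.

```perl
use strict;
use warnings;

# Bayes Rule: P(A|B) = P(B|A) * P(A) / P(B)
sub bayes {
    my ( $p_b_given_a, $p_a, $p_b ) = @_;
    return $p_b_given_a * $p_a / $p_b;
}

# Hypothetical figures, invented for illustration only
my $p_spam             = 0.40;    # P(A): share of all mail that is spam
my $p_offer_given_spam = 0.25;    # P(B|A): share of spam containing "offer"
my $p_offer_given_ham  = 0.05;    # share of nonspam containing "offer"

# P(B): chance that any message, spam or not, contains "offer"
my $p_offer = $p_offer_given_spam * $p_spam
            + $p_offer_given_ham * ( 1 - $p_spam );

printf "P(spam|offer) = %.3f\n",
    bayes( $p_offer_given_spam, $p_spam, $p_offer );
```

With these made-up numbers, seeing "offer" raises the probability of spam from the prior of 0.40 to roughly 0.77, which is the "update" described above.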

A far more extensive discussion of the Bayes Rule and its general implications can be found in the Wikipedia (http://en.wikipedia.org/wiki/Bayes%27_Theorem). For the purposes of text classification, the Bayes Rule is used to determine the category a document falls into by determining the most probable category. That is, given this document with these words in it, which category does it fall into?

A category is represented by a collection of words and their frequencies; the frequency is the number of times that each word has been seen in the documents used to train the classifier.

Suppose there are n categories C0 to Cn-1. Determining which category a document D is most associated with means calculating the probability that document D is in category Ci, written P(Ci|D), for each category Ci.

Using the Bayes Rule, you can calculate P(Ci|D) by computing:

P(Ci|D) = ( P(D|Ci) * P(Ci) ) / P(D)

P(Ci|D) is the probability that document D is in category Ci; that is, the probability that given the set of words in D, they appear in category Ci. P(D|Ci) is the probability that for a given category Ci, the words in D appear in that category.

P(Ci) is the probability of a given category; that is, the probability of a document being in category Ci without considering its contents. P(D) is the probability of that specific document occurring.

To calculate which category D should go in, you need to calculate P(Ci|D) for each of the categories and find the largest probability. Because each of those calculations involves the unknown but fixed value P(D), you just ignore it and calculate:

P(Ci|D) = P(D|Ci) * P(Ci)

P(D) can also be safely ignored because you are interested in the relative—not absolute—values of P(Ci|D), and P(D) simply acts as a scaling factor on P(Ci|D).

D is split into the set of words in the document, called W0 through Wm-1. To calculate P(D|Ci), calculate the product of the probabilities for each word; that is, the likelihood that each word appears in Ci. Here's the "naïve" step: Assume that words appear independently from other words (which is clearly not true for most languages) and P(D|Ci) is the simple product of the probabilities for each word:

P(D|Ci) = P(W0|Ci) * P(W1|Ci) * ... * P(Wm-1|Ci)

For any category, P(Wj|Ci) is calculated as the number of times Wj appears in Ci divided by the total number of words in Ci. P(Ci) is calculated as the total number of words in Ci divided by the total number of words in all the categories put together. Hence, P(Ci|D) is:

P(W0|Ci) * P(W1|Ci) * ... * P(Wm-1|Ci) * P(Ci)

for each category, and picking the largest determines the category for document D.
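The calculation can be tried by hand on a toy example. The following is a minimal sketch, not the classifier from Listing One: the training counts are invented, the name raw_scores is hypothetical, and it multiplies the probabilities directly, so a word never seen in a category simply drives that category's value to zero.

```perl
use strict;
use warnings;

# Toy training counts (hypothetical): category => { word => count }
my %counts = (
    fruits  => { apple  => 3, sweet => 2, potato => 1 },    # 6 words
    veggies => { potato => 3, sweet => 1 },                 # 4 words
);

# Returns category => P(W0|Ci) * ... * P(Wm-1|Ci) * P(Ci)
sub raw_scores {
    my ( $counts, @doc ) = @_;

    # Words per category and words overall
    my %size;
    my $total = 0;
    for my $c ( keys %$counts ) {
        $size{$c} += $_ for values %{ $counts->{$c} };
        $total += $size{$c};
    }

    my %score;
    for my $c ( keys %$counts ) {
        my $p = $size{$c} / $total;                           # P(Ci)
        $p *= ( $counts->{$c}{$_} // 0 ) / $size{$c} for @doc; # P(Wj|Ci)
        $score{$c} = $p;
    }
    return %score;
}

my %score = raw_scores( \%counts, qw(sweet potato) );
printf "%-8s %.4f\n", $_, $score{$_}
    for sort { $score{$b} <=> $score{$a} } keys %score;
```

For the two-word document "sweet potato", veggies scores 1/4 * 3/4 * 4/10 = 0.075 and fruits scores 2/6 * 1/6 * 6/10, about 0.033, so veggies is picked.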

A common criticism of naïve Bayesian text classifiers is that they make the naïve assumption that words are independent of each other and are, therefore, less accurate than a more complex model. There are many more complex text classification techniques, such as Support Vector Machines, k-nearest neighbor, and so on. In practice, naïve Bayesian classifiers often perform well, and the current state of spam filtering indicates that they work very well for e-mail classification.

A useful toolkit that implements different algorithms is the freely available Bow toolkit from CMU (http://www-2.cs.cmu.edu/~mccallum/bow/). It makes a useful testbed for comparing the accuracy of different techniques. A good starting point for reading more about naïve Bayesian text classification is the Wikipedia article on the subject (http://en.wikipedia.org/wiki/Naïve_Bayesian_classification).

Implementation

The Perl implementation (Listing One) uses the hash (associative array) %words to store the word counts for each word and for each category. The hash is stored to disk using a Perl construct called a "tie" that, when used with the DB_File module, results in the hash being stored automatically in a file called "words.db" so that its contents persist between invocations.

use DB_File;
my %words;
tie %words, 'DB_File', 'words.db';

The hash keys are strings of the form category-word: For example, if the word "potato" appears in the category "veggies" with a count of three, there will be a hash entry with key "veggies-potato" and value "3." This data structure contains enough information to compute the probabilities needed for a naïve Bayesian classification.
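To make the key layout concrete, here is a short sketch that builds and takes apart such keys. It uses a plain in-memory hash in place of the tied one, and the helper names key_for and split_key are hypothetical; the split uses the same greedy pattern that classify applies to each key.

```perl
use strict;
use warnings;

my %words;    # in-memory stand-in for the tied %words hash

# Build a "category-word" key
sub key_for {
    my ( $category, $word ) = @_;
    return "$category-$word";
}

# Split a key at its LAST hyphen; returns (category, word)
sub split_key {
    my ($key) = @_;
    return $key =~ /^(.+)-(.+)$/ ? ( $1, $2 ) : ();
}

$words{ key_for( 'veggies', 'potato' ) } += 3;

my ( $category, $word ) = split_key('veggies-potato');
print "$category / $word / $words{'veggies-potato'}\n";
```

Because the extracted words contain only letters, the last hyphen in a key always separates the category from the word, so a category name that itself contains a hyphen (say, root-veggies) still splits correctly.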

The subroutine parse_file reads the document to be classified or trained on and returns a hash (called %word_counts inside the subroutine and %words_in_file by its callers) that maps each word to the number of times it appeared in the document. It uses a simple regular expression to extract every 3- to 44-letter word that is followed by whitespace; in a real classifier, this word splitting could be made more complex by accounting for punctuation, digits, and hyphenated words.

sub parse_file
{
    my ( $file ) = @_;
    my %word_counts;

    open FILE, "<$file";
    while ( my $line = <FILE> ) {
        while ( $line =~ s/([[:alpha:]]{3,44})[ \t\n\r]// ) {
            $word_counts{lc($1)}++;
        }
    }
    close FILE;
    return %word_counts;
}
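To see what this word splitting keeps and drops, the sketch below applies the same substitution loop to a string rather than a file. The name count_words and the sample sentence are made up for the illustration.

```perl
use strict;
use warnings;

# Same extraction loop as parse_file, applied to a string
sub count_words {
    my ($text) = @_;
    my %word_counts;
    while ( $text =~ s/([[:alpha:]]{3,44})[ \t\n\r]// ) {
        $word_counts{ lc $1 }++;
    }
    return %word_counts;
}

my %c = count_words("The laser printer is down. The Printer needs toner\n");
print join( ", ", map { "$_=$c{$_}" } sort keys %c ), "\n";
```

This prints laser=1, needs=1, printer=2, the=2, toner=1. The word "is" is too short to count, and "down" is skipped because it is followed by a period rather than whitespace, which is the kind of punctuation handling a fuller tokenizer would add.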

The output of parse_file can be used in two ways: It can be used to train the classifier by learning the word counts for a particular category and updating the %words hash, or it can be used to determine the classification of a particular document.

To train the classifier, call the add_words subroutine with the output of parse_file and a category. In the Perl code, a category is any string and the classifier is trained by passing sample documents into parse_file and then into add_words: add_words( <category>, parse_file( <sample document>));

sub add_words
{
    my ( $category, %words_in_file ) = @_;

    foreach my $word (keys %words_in_file) {
        $words{"$category-$word"} += $words_in_file{$word};
    }
}

Once document training has been done, the classify subroutine can be called with the output of parse_file on a document. classify will print out the possible categories for the document in order of most likely to least likely:

classify ( parse_file( <document to classify> ) );

sub classify
{
    my ( %words_in_file ) = @_;

    my %count;
    my $total = 0;
    foreach my $entry (keys %words) {
        $entry =~ /^(.+)-(.+)$/;
        $count{$1} += $words{$entry};
        $total += $words{$entry};
    }

    my %score;
    foreach my $word (keys %words_in_file) {
        foreach my $category (keys %count) {
            if ( defined( $words{"$category-$word"} ) ) {
                $score{$category} += log( $words{"$category-$word"} /
                                          $count{$category} );
            } else {
                $score{$category} += log( 0.01 /
                                          $count{$category} );
            }
        }
    }

    foreach my $category (keys %count) {
        $score{$category} += log( $count{$category} / $total );
    }
    foreach my $category (sort { $score{$b} <=> $score{$a} } keys %count) {
        print "$category $score{$category}\n";
    }
}

classify first calculates the total word count ($total) for all categories (which it needs to calculate P(Ci)) and the word count for each category (%count indexed by category name, which it needs to calculate P(Wj|Ci)). Then classify calculates the score for each category. The score stands in for P(Ci|D), and it's preferable to call it a score for two reasons: Ignoring P(D) means that, strictly speaking, the value is not the true probability, and classify uses logs to avoid floating-point underflow when many small probabilities are multiplied together, replacing multiplication by addition. The score is in fact log P(Ci|D), which is:

log P(W0|Ci) + log P(W1|Ci) + ... + log P(Wm-1|Ci) + log P(Ci)

(Recall the equality log(A*B) = log A + log B.) Because the logarithm increases as its argument increases, the log form is still suitable for comparison. After the per-word terms have been summed, classify adds log P(Ci) for each category and then sorts the scores in descending order to output the classifier's opinion of the document. classify estimates the probability of a word that doesn't appear in a particular category by assigning it a very small, nonzero probability based on the word count for the category:

$score{$category} += log( 0.01 / $count{$category} );
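The reason for working with logs can be demonstrated directly. In this sketch, the 400 word probabilities of 0.001 each are invented values and the helper names are hypothetical; it compares multiplying the probabilities with summing their logs.

```perl
use strict;
use warnings;

# Multiply probabilities directly
sub product_of {
    my $product = 1;
    $product *= $_ for @_;
    return $product;
}

# Sum the logs of the same probabilities
sub log_sum_of {
    my $sum = 0;
    $sum += log($_) for @_;
    return $sum;
}

# A 400-word document where every word has (hypothetical) probability 0.001
my @probs = (0.001) x 400;

print  "product = ", product_of(@probs), "\n";
printf "log sum = %.1f\n", log_sum_of(@probs);
```

The true product, 1e-1200, is far smaller than a double-precision number can represent, so the direct product collapses to 0 and every category would tie. The log sum is about -2763.1 and remains usable for comparing categories.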

A small amount of Perl code wraps these three subroutines into a usable classifier that accepts commands to add a document to the word list for a category (and hence, train the classifier), and to classify a document.

if ( ( $ARGV[0] eq 'add' ) && ( $#ARGV == 2 ) ) {
    add_words( $ARGV[1], parse_file( $ARGV[2] ) );
} elsif ( ( $ARGV[0] eq 'classify' ) && ( $#ARGV == 1 ) ) {
    classify( parse_file( $ARGV[1] ) );
} else {
    print <<EOUSAGE;
Usage: add <category> <file> - Adds words from <file> to category <category>
       classify <file>       - Outputs classification of <file>
EOUSAGE
}

untie %words;

If the Perl code is stored in file bayes.pl, then the classifier is trained like this:

perl bayes.pl add veggies article-about-vegetables
perl bayes.pl add fruits article-about-fruits
perl bayes.pl add nuts article-about-nuts

to create three categories (veggies, fruits, and nuts). Asking bayes.pl to classify a document outputs a score for each category, with the most likely category first:

% perl bayes.pl classify article-I-just-wrote
fruits -4.11700258611469
nuts -6.60190923590268
veggies -11.9002266024507

Here, bayes.pl shows that the new article is most likely about fruits.
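The scores are logs and only their relative sizes matter, but they can be turned into relative probabilities by exponentiating and rescaling so that they sum to 1. The sketch below is one way to do that and is not part of bayes.pl; the name normalize_scores is hypothetical, and the input values are the three scores shown above.

```perl
use strict;
use warnings;

# Turn log scores into relative probabilities that sum to 1
sub normalize_scores {
    my (%score) = @_;

    # Subtract the largest score first so exp() cannot underflow for the winner
    my ($max) = sort { $b <=> $a } values %score;
    my %p = map { $_ => exp( $score{$_} - $max ) } keys %score;

    my $sum = 0;
    $sum += $_ for values %p;
    $p{$_} /= $sum for keys %p;
    return %p;
}

my %p = normalize_scores(
    fruits  => -4.11700258611469,
    nuts    => -6.60190923590268,
    veggies => -11.9002266024507,
);
printf "%-8s %.4f\n", $_, $p{$_} for sort { $p{$b} <=> $p{$a} } keys %p;
```

For these scores, fruits comes out at about 92 percent, nuts at about 8 percent, and veggies at well under 1 percent. Because of the independence assumption, such figures are best read as a ranking aid rather than as calibrated probabilities.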

E-Mail Classification

If you are interested in classifying e-mail, there are a couple of tweaks that improve accuracy in practice: Don't fold case on values from headers, and count words separately depending on whether they appear in the subject or the body.

In the aforementioned Perl implementation, there is no difference between the words From, FROM, and fRoM: They are all considered to be instances of from. The parse_file subroutine lowercases the word before counting it. In practical e-mail classifiers, the names of e-mail headers turn out to be a better indicator of the type of an e-mail if case is preserved. For example, the header MIME-Version was written MiME-Version by one piece of common spamming software.

Distinguishing words found in the subject versus the body also increases the accuracy of a naïve Bayesian text classifier on e-mail. The simplest way to do this is to store a word like forward as subject:forward when it comes from the subject line, and simply forward when it is seen in the body.
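A minimal sketch of the subject-prefix idea follows. It assumes the whole message is held in a string with headers and body separated by the first blank line; the name parse_message is hypothetical, and the sketch ignores the other headers, folded header lines, and MIME encoding.

```perl
use strict;
use warnings;

# Count words in a raw message; subject words get a "subject:" prefix
sub parse_message {
    my ($message) = @_;

    # Headers end at the first blank line
    my ( $headers, $body ) = split /\r?\n\r?\n/, $message, 2;
    $headers = '' unless defined $headers;
    $body    = '' unless defined $body;
    my %word_counts;

    if ( $headers =~ /^Subject:[ \t]*(.*)$/mi ) {
        my $subject = "$1 ";    # trailing space so the last word is matched
        while ( $subject =~ s/([[:alpha:]]{3,44})[ \t\n\r]// ) {
            $word_counts{ 'subject:' . lc $1 }++;
        }
    }

    $body .= "\n";
    while ( $body =~ s/([[:alpha:]]{3,44})[ \t\n\r]// ) {
        $word_counts{ lc $1 }++;
    }
    return %word_counts;
}

my $message = join "\n",
    'From: someone@example.com',
    'Subject: Forward this offer',
    '',
    'Please forward the offer today',
    '';

my %c = parse_message($message);
print "$_ $c{$_}\n" for sort keys %c;
```

Here "forward" from the subject line and "forward" from the body end up as two separate entries, subject:forward and forward, so the classifier can weigh them independently. The colon does not interfere with the category-word key format, which splits on a hyphen.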

Performance

The Perl code presented here isn't optimized at all. Each time classify is called, it has to recalculate the total word count for each category by scanning every entry in %words; it would be easy to cache these totals and the log values between invocations. The use of a Perl hash will not scale well in terms of memory usage.
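One way to avoid the rescan is to maintain the per-category totals while training. The following sketch shows the idea with plain in-memory hashes; the %totals hash and the name add_words_cached are hypothetical additions, and a persistent version would need to store %totals on disk alongside the word counts.

```perl
use strict;
use warnings;

my %words;     # "category-word" => count (stand-in for the tied hash)
my %totals;    # hypothetical cache: category => total word count

# Like add_words, but keeps the per-category total up to date as it goes
sub add_words_cached {
    my ( $category, %words_in_file ) = @_;
    foreach my $word ( keys %words_in_file ) {
        $words{"$category-$word"} += $words_in_file{$word};
        $totals{$category}        += $words_in_file{$word};
    }
}

add_words_cached( 'veggies', potato => 3, carrot => 1 );
add_words_cached( 'fruits',  apple  => 2 );

# classify could now read these totals instead of looping over %words
my $total = 0;
$total += $_ for values %totals;
print "$_ $totals{$_}\n" for sort keys %totals;
print "total $total\n";
```

With the totals kept this way, the cost of the first loop in classify drops from the number of distinct category-word pairs to the number of categories.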

However, the algorithm is simple and can be implemented in any language. A highly optimized version of this code is used in the POPFile e-mail classifier to do automatic classification. It uses a combination of Perl and SQL queries. The Bow toolkit from CMU has a fast C implementation of naïve Bayesian classification.

Uses of Text Classification

Although spam filtering is the best-known use of naïve Bayesian text classification, there are a number of other interesting uses on the horizon. IBM researcher Martin Overton has published a paper concerning the use of naïve Bayesian e-mail classification to detect e-mail-borne malware (http://arachnid.homeip.net/papers/VB2004-Canning-more-than-SPAM-1.02.pdf). In Overton's paper, presented at the Virus Bulletin 2004 conference, he demonstrated that a text classifier could accurately identify worms and viruses, such as W32.Bagle, and that it was able to spot even mutated versions of the worms. All this was done without giving the classifier any special knowledge of viruses.

The POPFile Project is a general e-mail classifier that can classify incoming e-mail into any number of categories. Users of POPFile have reported using its naïve Bayesian engine to classify mail into up to 50 different categories with good accuracy, and one journalist uses it to sort "interesting" from "uninteresting" press releases.

At LISA 2004, four Norwegian researchers presented a paper concerning a system called DIGIMIMIR, which was capable of automatically classifying requests coming into a typical IT help desk and in some cases responding automatically (http://www.digimimir.org/). They use a document clustering approach that, while not naïve Bayesian, is similar in implementation complexity and allows "similar" e-mails to be clustered together without knowing the initial set of possible topics.

DDJ



Listing One

use strict;
use DB_File;

# Hash with keys of the form "category-word": $words{"category-word"}
# gives count of 'word' in 'category'.  Tied to a DB_File to keep it persistent.

my %words;
tie %words, 'DB_File', 'words.db';

# Read a file and return a hash of the word counts in that file

sub parse_file
{
    my ( $file ) = @_;
    my %word_counts;

    # Grab all the words with between 3 and 44 letters

    open FILE, "<$file";
    while ( my $line = <FILE> ) {
        while ( $line =~ s/([[:alpha:]]{3,44})[ \t\n\r]// ) {
            $word_counts{lc($1)}++;
        }
    }
    close FILE;
    return %word_counts;
}

# Add words from a hash to the word counts for a category
sub add_words
{
    my ( $category, %words_in_file ) = @_;

    foreach my $word (keys %words_in_file) {
        $words{"$category-$word"} += $words_in_file{$word};
    }
}

# Get the classification of a file from word counts
sub classify
{
    my ( %words_in_file ) = @_;

    # Calculate the total number of words in each category and
    # the total number of words overall

    my %count;
    my $total = 0;
    foreach my $entry (keys %words) {
        $entry =~ /^(.+)-(.+)$/;
        $count{$1} += $words{$entry};
        $total += $words{$entry};
    }

    # Run through words and calculate the probability for each category

    my %score;
    foreach my $word (keys %words_in_file) {
        foreach my $category (keys %count) {
            if ( defined( $words{"$category-$word"} ) ) {
                $score{$category} += log( $words{"$category-$word"} /
                                          $count{$category} );
            } else {
                $score{$category} += log( 0.01 /
                                          $count{$category} );
            }
        }
    }
    # Add in the probability that the text is of a specific category

    foreach my $category (keys %count) {
        $score{$category} += log( $count{$category} / $total );
    }
    foreach my $category (sort { $score{$b} <=> $score{$a} } keys %count) {
        print "$category $score{$category}\n";
    }
}

# Supported commands are 'add' to add words to a category and
# 'classify' to get the classification of a file

if ( ( $ARGV[0] eq 'add' ) && ( $#ARGV == 2 ) ) {
    add_words( $ARGV[1], parse_file( $ARGV[2] ) );
} elsif ( ( $ARGV[0] eq 'classify' ) && ( $#ARGV == 1 ) ) {
    classify( parse_file( $ARGV[1] ) );
} else {
    print <<EOUSAGE;
Usage: add <category> <file> - Adds words from <file> to category <category>
       classify <file>       - Outputs classification of <file>
EOUSAGE
}

untie %words;