## Algorithms for studying everything from movie stars to electrical grids

*
Mary Lynn is a mathematician, programmer, and writer who lives in Tampa, Fl. She can be reached at [email protected] yahoo.com.*

Remember the last time you met someone and suddenly realized there was something major you had in common—a city you'd both lived in or a mutual friend or acquaintance you shared?

"Small world," you said to your new acquaintance as you both pondered the odds. It turns out that the odds are pretty good, and these "small worlds" are everywhere. Studying this small-world phenomenon started out as a psychology experiment in the 1960s. With today's data and computer power, it's now a serious science.

Physicists, biologists, mathematicians, and computer scientists have joined the sociologists and psychologists in studying and modeling small-world networks. While each of these specialists has their own scientific reason to be interested in small worlds, they all have the desire to simulate small-world networks. In this article, I examine a few of the most popular algorithms for simulating small-world networks, and show how you can grow your own networks.

### What Is a Small-World Network?

One popular example of a small-world network today is the movie actor's network. In all likelihood, you've played the Six Degrees of Kevin Bacon game. Pick a celebrity—any celebrity, even a dead one. The game is to connect your celebrity to Kevin Bacon in the fewest steps possible by naming other celebrities and the movies they've starred in together. What makes the game interesting is that it can be challenging to find the path between Kevin Bacon and your celebrity, but it is there! Here's an example. Doris Day. A tough one, right?

It turns out Doris Day is only two degrees of separation from Kevin Bacon. She was in *It Happened to Jane* in 1959 with Jack Lemmon, who was in *JFK* in 1991 with Kevin Bacon. (The University of Virginia has an entertaining on-line version of this game at http://www.oracleofbacon .org/. For kicks, you can code one up yourself with data from the Internet Movie Database, http://www.imdb.com/, and a shortest path algorithm.)

The movie actor's network is one example of a small world. It can be represented mathematically as a graph (a collection of nodes and edges). In this case, nodes are movie actors and an edge connects two actors when they've appeared in a movie together. This graph has the property that the length of the shortest path between any two nodes is pretty small (usually less than 3 or 4 and almost always less than 6 or 7). Figure 1 illustrates a small piece of the movie actor's network. (Can you figure out what movies all of the edges represent?)

### Why Do Scientists Care About This?

Serious scientists care about small-world networks because they appear to be everywhere—and scientists love unifying theories. The Web, for example, is a small-world network (nodes are web pages and edges are the links between them). Of course, navigating the Web is important for lots of folks—and it turns out that the best search engines today are exploiting the linked nature of the Web. There are also examples of small-world networks in the growing field of computational biology. Protein networks are the most studied (nodes are proteins and edges are bindings between them). Other well-studied small-world networks include scientific coauthorship graphs (the only thing scientists love more than unifying theories are their own papers). Nodes in a coauthorship graph are authors of papers and an edge between two nodes represents a coauthored paper.

Many small-world networks affect our lives in major ways. The power grid that supplies electricity is one example. In this case, the nodes are generators, transformers, and substations, and the edges are the high-voltage transmission lines between them. The airport network is another interesting example. The nodes are the airports and the edges are the nonstop flights between them. Even proteins have small-world networks: Figure 2 illustrates a small part of the yeast protein network.

### Small Worlds and Power-Laws

If you think about all of these examples of small-world networks, one thing they all have in common is the concept of a "hub." Some nodes are more popular than others and have many more edges passing through them. This is a key element of small-world networks and it leads to other interesting mathematical properties that many of these networks have in common. The most important of those properties is something called a "power-law degree distribution." The "degree" of a node is defined to be the number of edges connected to it. In Figure 1, for example, Clint Eastwood has degree three. Every network has a "degree distribution," which is simply a function taking a degree to the number of nodes in the network that have that degree. Table 1 illustrates the degree distribution of the tiny network in Figure 1. Also, the nodes in Figure 1 are color coded by their degree. (For example, Doris Day and Kevin Bacon both have degree two and their nodes are both shaded yellow.)

A power-law degree distribution is a degree distribution that follows a power-law. In algebraic terms, if *X** _{k}* denotes the number of nodes with degree

*k*, then

*X*

_{k}*follows a power-law if*

*X*

*is proportional to*

_{k}*k*

*. In graphical terms, if you plot*

^{-a}*k*versus

*X*

*on log-log paper, then the distribution follows a power-law if the log-log plot is roughly linear. Figure 3 illustrates the log-log plot of the degree distribution of the yeast protein network in Figure 2.*

_{k}

### First Simulation

What does all this have to do with programming? Well, it is not just about efficient algorithms for finding the shortest path to Kevin Bacon in the movie actor network. Scientists want to simulate their small worlds to better understand them. Consequently, they've developed models and algorithms for doing just that. To simulate the networks, the scientific theory must be turned into computer code. That's where the fun begins. The algorithms presented in this article are simple and easy to implement in any language. I chose to demonstrate them here in Perl, mainly for the built-in hash data structure.

Start by simulating something simple—a completely random network. Granted, random networks aren't good models for small-world networks, but this is just a starting point. In the 1950s, mathematicians Erdös and Rényi introduced the notion of a random graph. A random graph has two parameters—*N*, the number of nodes that will be in the graph, and *p** _{,}* the probability that any pair of nodes will be connected. To simulate a random graph, you generate a random number

*r*for every pair of nodes. If

*r<p*, then you place an edge between that pair of nodes; otherwise, no edge is placed between the pair of nodes. Listing One implements this algorithm.

Simulating a random graph is so straightforward—you don't even have to hold the graph in a data structure. You can just print out the edges as they are simulated. In Listing One, there are a few *print* statements that probably seem unnecessary. Of course, there is a reason for them. The output from Listing One is a graph in "dot" format, a special format for graphs that is used by the Graphviz open-source graph drawing and visualization package developed by AT&T Lab researchers (http://www.research.att.com/sw/tools/graphviz/).

Once you have Graphviz installed, you can run Listing One and pipe the output into a file called (for instance) ER.dot. Then use the Neato graph visualization program (that comes with Graphviz) with ER.dot as your input. To produce Figure 4, I used the command:

> neato -Tps -Gcenter -Gsize="6,6" ER.dot -o ER.ps

### A First Small-World Model

One of the first mathematical models built specifically for small-world networks was the Watts-Strogatz model. It is a brilliantly simple model that provides the ability to calibrate between a ring lattice (a completely structured regular graph) and a random graph. The motivation for this approach was to model a few properties that Watts and Strogatz discovered in real social networks. The first property was the "small-world property"—the existence of short paths between most nodes in the network. The other property was something they called "clustering." In a network with a high clustering coefficient if two nodes are connected to a common node, then the two original nodes are likely to be connected to each other. This is a common phenomenon in social networks. It shouldn't be a surprise that in many real situations, cliques form. Friends come in groups larger than two. That is the central notion behind clustering.

In this initial Watts-Strogatz small-world model, the idea of a power-law degree distribution was not considered. As a result, the graphs simulated with the Watts-Strogatz algorithm do not possess this property. But it is a nifty model that has generated an enormous amount of enthusiasm in the scientific community.

There are three parameters to the Watts-Strogatz model. *N* is the number of nodes in the graph you want to simulate. *K* is the degree of each node at the initial step of the algorithm. And *p* is the probability of randomly rewiring each edge in the second step of the algorithm.

1. The Initial Step in the Watts-Strogatz model is to start with *N* nodes. Place them in a "ring." Connect every node to its *K* neighbors (*K/*2 on each side). This is called a "ring lattice" on *N* nodes with degree *K*.

2. Next is the Randomization Step, where you randomly rewire each edge with probability *p* such that self-connections and duplicate edges are excluded.

That's it. Simple, but powerful. Listing Two is my Perl implementation of this model. Notice that this algorithm does require storing the graph in a data structure. Since I'm using Perl, I have access to a double hash structure that is perfect for this job. This method may not prove scalable enough to simulate massive graphs, but will easily handle moderately large graphs. And the state-of-the art graph visualization algorithms will bog down long before Perl's ability to store a graph during simulation.

Figure 5 is a graph simulated by the Watts-Strogatz model with *N=*80, *K=*2, and *p=*0.3.

### The Rich Get Richer

Another group of scientists took a different approach to modeling small-world networks. Barabási and Albert developed a model they called the "Scale-Free Model" that uses the age-old notion of "the rich get richer, while the poor stay poor" to explain the power-law degree distribution present in many real networks. Barabási and Albert didn't want a model that started with *N* nodes, like the Watts-Strogatz algorithm. Instead, they wanted to grow a network over a series of time steps. Remarkably, graphs simulated with their basic "growth by preferential attachment" algorithm all end up with power-law degree distributions.

There are really only two parameters to the Barabási/Albert algorithm, *m** _{0 }*and

*m.m*

*is the number of nodes present at time*

_{0}*0*. (There are no edges present at time

*0*.)

*m*is the number of edges added at each time

*t*. I add a third parameter,

*N*, which will be the final number of nodes in the graph. By specifying

*N*at the start, this determines how many time steps to take in the simulation algorithm. Here are the details of the Barabási and Albert (Scale-Free) Model:

1. Growth. Start with a small number (*m** _{0}*) of nodes. At every time step, add a new node with

*m(m*

_{0}*)*edges that link the new node to

*m*nodes in the network.

2. Preferential Attachment. At each time step, choose the *m* nodes to connect to the new node by giving preference to the nodes with larger degrees. Specifically, the probability that a new node will connect to a node with degree *k* will be *k* divided by the sum of the degrees of all the nodes in the graph.

At time *t*, there will be *N = t + m** _{0 }*nodes in the graph and

*mt*edges.

Listing Three is my Perl implementation of the Barabási/Albert simulation algorithm. To keep the code simple, I don't test whether the *m* nodes that are connecting to the new node are all different. Simple variations to a model's implementation like this can have a significant impact on the graphs you can simulate.

Figure 6 is a graph simulated from this algorithm with *N=*500, *m** _{0}*=10, and

*m*=1. Figure 7 is the log-log plot of the degree distribution of this simulated graph, which demonstrates that this algorithm does produce graphs that have a power-law degree distribution.

### New Models, New Science

The models I've described in this article are just the tip of the iceberg in simulating small-world networks. Interesting new properties are being discovered in real-world networks all the time. As new properties are discovered, new models are needed that can simulate graphs with those properties. It is an exciting new science that is relying on clever algorithms and computer simulations for experimentation and discovery. You can start by searching the science literature for "complex networks" or "small worlds." Develop new models, tweak old ones, code them up. Who knows. You could discover the next big thing.

### Bibliography

Barabási, A.L. and R. Albert, "Emergence of Scaling in Random Networks," *Science*, 286, 509-512 (1999).

Milgram, S. "The Small World Problem," *Psychology Today* 2, 60-67 (1967).

Watts, D.J. and S.H. Strogatz, "Collective Dynamics of 'Small-World' Networks," *Nature*, 393, 440-442 (1998).

Xenarios I, Rice D.W., Salwinski L., Baron M.K., Marcotte E.M., Eisenberg D. (2000) "DIP: The Database of Interacting Proteins." *Nucleic Acids Research* 28:289-91. The DIP database is available online at http://dip.doe-mbi.ucla.edu/dip/Main.cgi.

**DDJ**

#### Listing One

#!/usr/bin/perl # Generate an Erdos-Renyi Random Graph with $N nodes # and edge probability $p using random seed $seed $N = 50; $p = 0.2; $seed = 11967; print "graph ER { \n"; print "node [shape=point,color=blue,width=.1,height=.1];\n"; srand($seed); foreach $i (1..$N){ foreach $j ($i+1..$N){ $r = rand(); if($r < $p){ print "$i -- $j;\n"; } } } print "}\n";

#### Listing Two

#!/usr/bin/perl # Generate a Watts-Strogatz Small World Network # with $N nodes, starting degree $K, and probability of rewiring $p $N = 80; $K = 4; $p = .3; $seed = 189123; print "graph WS { \n"; print "node [shape=point,color=blue,width=.1,height=.1];\n"; srand($seed); # initial step -- set up ring lattice with $N nodes, each of degree $K foreach $i (0..$N-1){ $left = int($K/2); # num nodes to connect to left $right = $K - $left; # num nodes to connect to right foreach $j (1..$left){ $ln = ($i-$j) % $N; $graph{$i}{$ln} = 1; $graph{$ln}{$i} = 1; } foreach $j (1..$right){ $rn = ($i+$j) % $N; $graph{$i}{$rn} = 1; $graph{$rn}{$i} = 1; } } # Rewire each edge with probability $p foreach $i (keys %graph){ foreach $j (keys %{$graph{$i}}){ $r = rand(); if($r < $p){ # randomly select a new node $jnew to connect to $i $done = 0; while(!$done){ $jnew = int($N*rand()); if( ($jnew != $i) && ($jnew != $j) ){ $done = 1; } } # remove edge $i <-> $j undef $graph{$i}{$j}; undef $graph{$j}{$i}; # add edge $i <-> $jnew $graph{$i}{$jnew}++; $graph{$jnew}{$i}++; } } } # print graph foreach $i (keys %graph){ foreach $j (keys %{$graph{$i}}){ print "$i -- $j\n"; } } print "}\n";

#### Listing Three

#!/usr/bin/perl # Generate a Barabasi-Albert Scale-Free Network with $N nodes, # starting with $m_0 nodes and adding $m edges at each timestep $N = 500; $m_0 = 10; $m = 1; $seed = 5255221; print "graph BA { \n"; print "node [shape=point,color=blue,width=.1,height=.1];\n"; srand($seed); # start with $m_0 nodes and no edges. First step--add 1 node and $m new edges # no preferential attachment here since no edges exist yet foreach $i (1..$m){ # select one of first $m_0 nodes (labeled 0..$m_0-1) to attach to node $m_0 $j = int($m_0*rand()); $graph{$m_0}{$j} = 1; $graph{$j}{$m_0} = 1; } # Preferential Attachment Growth $num_steps = $N - $m_0 - 1; foreach $t (1..$num_steps){ $new_node = $m_0 + $t; #calculate degree seq and sum of degrees $sumdeg = 0; if(exists $graph{$i}){ $degree{$i} = keys %{$graph{$i}}; } else { $degree{$i} = 0; } $sumdeg += $degree{$i}; } foreach $j (1..$m){ # preferentially select node $R = int($sumdeg*rand()); $cS = 0; $i = 0; while($cS < $R){ $cS += $degree{$i}; $i++; } $sel_node = $i ? $i-1 : 0; # add edge $new_node <-> $sel_node $graph{$new_node}{$sel_node} = 1; $graph{$sel_node}{$new_node} = 1; } } # print graph foreach $i (keys %graph){ foreach $j (keys %{$graph{$i}}){ print "$i -- $j\n"; } } print "}\n";