
Jolt Awards: The Best Books


Jolt Productivity Award: Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites, by Matthew A. Russell

There is a new gold rush happening today. The gold is not a mineral; rather, it is data that we can mine and turn into valuable information. Instead of digging into the earth or panning rivers, we need to mine the Web, specifically the Social Web. If you are planning to find fame and fortune from this gold rush, make sure to pack Matthew Russell's Mining the Social Web in your toolkit.

The Social Web has a wealth of data waiting to be discovered, analyzed, and turned into valuable information. Huge companies, such as Google and Facebook, depend upon this information in order to remain profitable. But you don't have to be a big company in order to mine it. Russell gives you everything you need to dig in and get started.

Mining the Social Web serves up 10 bite-sized chapters that will take you from tenderfoot to knowledgeable social Web hacker. Spend an hour working through the first chapter and you'll be hooked. Russell takes you through the steps necessary to analyze the latest trends on Twitter, see who's tweeting about the trends, organize the data, and visualize it with tools such as Graphviz.
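To give a flavor of that first-chapter workflow, here is a minimal sketch in Python. It is not taken from the book: the sample tweets, the `retweet_edges` and `to_dot` helpers, and the "RT @user" matching rule are all assumptions made for illustration, and a real run would pull tweets from the Twitter API instead of a hard-coded list.

```python
import re
from collections import Counter

# Hypothetical sample data (not from the book); a real run would fetch
# tweets about a trending topic from the Twitter API.
tweets = [
    {"user": "alice", "text": "RT @bob: Mining data is fun"},
    {"user": "carol", "text": "RT @bob: Mining data is fun"},
    {"user": "dave", "text": "RT @alice: Graphviz draws graphs"},
    {"user": "erin", "text": "Just a plain tweet"},
]

# Assumed convention: a retweet is marked by "RT @username" in the text.
RT_PATTERN = re.compile(r"\bRT @(\w+)")


def retweet_edges(tweets):
    """Return (original_author, retweeter) pairs found in the tweet texts."""
    edges = []
    for tweet in tweets:
        match = RT_PATTERN.search(tweet["text"])
        if match:
            edges.append((match.group(1), tweet["user"]))
    return edges


def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph."""
    lines = ["digraph retweets {"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


edges = retweet_edges(tweets)

# Organize the data: count how often each author was retweeted.
most_retweeted = Counter(source for source, _ in edges).most_common(1)

# Visualize: emit DOT text that Graphviz can render.
dot_text = to_dot(edges)
print(dot_text)
```

Saving the printed output to a file such as retweets.dot and running `dot -Tpng retweets.dot -o retweets.png` should produce a picture of who retweeted whom, assuming Graphviz is installed.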

While Russell shows you exactly how to perform each step, and provides plenty of ideas for you to try, he also encourages you to explore on your own. His style is that of a great teacher: he poses a problem, shows you how to solve it, and then expands on it and challenges you to reinforce your learning by going further on your own.

To take full advantage of this book, you must understand Python. All of the examples are written in Python and many external modules are used. Russell makes it as easy as possible for you to understand what is happening, even if you aren't Python-fluent, but this only goes so far. If you really want to get the most out of this book, you should have a working knowledge of the Python language.

Don't read this book when you have no Internet connection. It reads better in electronic form than on the printed page, and makes liberal use of hyperlinks to reference information. In the preface, Russell says that he does this so that readers see reliable, current information rather than out-of-date material. Without the links, that material would need to be included directly in the text, making the book larger and less direct.

Once Russell sets the hook with Twitter hacking in the first chapter, he reels you in with a series of fascinating chapters, beginning with capturing information using microformats (such as geo, XFN, and others). He shows you how to build graphs that express relationships between pages that include microformat notation. While the book is not a tutorial on analytics, it contains plenty of examples of data analysis techniques, with references for further reading. It also shows you how to use many other tools for massaging data and extracting informational nuggets.

A significant portion of the book is devoted to different sources of data ready to be mined. These include mail messages, Twitter, LinkedIn, blogs, Google Buzz, and Facebook. The Google Buzz chapter indicates how quickly things change (today, that chapter would focus on Google+).

The final short chapter discusses the Semantic Web. There isn't much code, simply because the Semantic Web vision has yet to materialize. What is clear, however, is that Mining the Social Web can position you to take advantage of the Semantic Web when it arrives.

— Gary Pollice

