
GData: Accessing Google-Application Data


This month, I'm starting a series of articles addressing Google's GData APIs, the system that your web application (service) uses to access a customer's Google Web application data (for example, their Calendar events, Documents, etc.). The APIs are pretty consistent across the various Google apps, so I'll focus on Calendar as a representative example. Everywhere you read "Calendar" in this article, you can substitute the GData API of your choice. I'm going to start with a general discussion of the APIs and how they work. Next month, we'll get into actual code.

Google's work on its APIs is uneven. I'm a huge fan of the Google Web Toolkit (GWT), for example. It has made my life vastly easier and my code more robust (and testable) than it would be were I working directly in raw JavaScript or one of the JavaScript libraries. And it lets me use a consistent object model and development environment across the entire application. I might quibble with a design decision here and there (I wish there were more interfaces in places), but I understand why the Google engineers made the decisions they did, even when I disagree with them.

At the other extreme, we have the GData APIs, which you use to access Google services like Calendar and Documents. These APIs are universally miserable. In fact, the Java wrappers that you use to access the underlying HTTP-level protocols are some of the worst work I've seen in many years of programming. The Java layer reads like amateurish, low-quality student work: it violates many of the principles of good architecture, and its programmers seemed unaware of OO design concepts as basic as data abstraction. It is buggy, overcomplicated, badly documented, and essentially unusable. I'll explain these comments in a moment.

The quality issues notwithstanding, I do have to interface with GData to get work done, so I've written my own Java abstraction layer to make that possible. I don't abandon existing libraries lightly, even when they're suboptimal, because you generally save time by using them. After wasting too many days getting nowhere, though, I'd had enough. Given how easy it was to write replacements for Google's code, I probably waited way too long before bailing.

I plan to open-source my GData wrappers once I get them into good-enough shape, and will publish some of them in Dr. Dobb's over the next couple of months.

The Anatomy of a GData Call: REST APIs

Google's GData APIs are a two-level system. The top level is made up of wrappers, written in high-level languages (Python, Java), around lower-level text-over-HTTP remote procedure calls. The low level follows the "REST" philosophy: To make a call, you issue an HTTP POST or GET request that says what you want to do (and supplies any arguments required to make things specific), either encoded in the URL or specified in the body of the POST. Google returns the result in XML or JSON. REST is effectively a very simple RPC mechanism that leverages the HTTP server for data transport.
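To make the lower level concrete, here is a minimal sketch in Java of a REST-style read: it is nothing more than an HTTP GET on a feed URL. It assumes Java 11 or later (for java.net.http.HttpClient); the feed URL is the one from the example request later in this article, and the class and method names are hypothetical, mine rather than Google's. A private calendar would also need the OAuth parameters described below, so treat this as an illustration of the transport, not as a working client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical sketch of the low-level (REST) half of a GData call. Requires Java 11+. */
final class GDataRestSketch {

    // Feed URL from the article's example request: the user's own calendars.
    static final String OWN_CALENDARS_FEED =
            "https://www.google.com/calendar/feeds/default/owncalendars/full";

    /** Builds, but does not send, a plain GET for the given feed URL. */
    static HttpRequest buildGet(String feedUrl) {
        return HttpRequest.newBuilder(URI.create(feedUrl)).GET().build();
    }

    /** Sends the request and returns the raw body (Atom XML or JSON) as a string. */
    static String send(HttpRequest request) throws IOException, InterruptedException {
        // A real service would create one HttpClient and reuse it across calls.
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("GData request failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling send(buildGet(OWN_CALENDARS_FEED)) would hand back the raw response text; turning that text into something usable is a separate problem.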

Before leaping into the mechanics of the call, some documentation might be helpful. The main page for the Calendar API documentation is here. The various REST calls and return values for the Calendar APIs are described adequately in two documents: the Data API Developer's Guide: The Protocol and the Data API Atom Reference. The former contains copious examples of the GET/POST formatting, as well as both the Atom and JSON return values. You can find links to other web-service APIs in the Site Directory.

If a Calendar is public, you can just make an HTTP request and get a result back. If the Calendar is private, you'll need to get permission to access the data via the OAuth process, which I discussed in an earlier series of articles here and here. (Google has provisions for authentication mechanisms other than OAuth, but they're discouraging their use, so I won't cover them.) The OAuth "dance" is the process your app's users go through to grant you permission to access their data. The original OAuth articles talk about how to create a public-key/private-key pair and register it with Google. The dance uses those keys to digitally sign requests, and the dance terminates when Google gives you an "authorized request token," which you store in your database and pass back to Google with every GData request for a given customer. The authorized request token doesn't expire, though your customer (who granted access to you in the first place) can revoke it. You'll also have to digitally sign every GData request using the same private key that you used in the original OAuth dance.

The examples I'll give you next month assume that you've gotten an OAuth authorized request token from Google, either by working with the code from the earlier articles or by cutting and pasting from Google's OAuth Playground. If you use the Playground, choose RSA-SHA1 as the "Signature Method," since that's what my code uses.

A basic request that uses OAuth for authentication takes the following form:

GET https://www.google.com/calendar/feeds/default/owncalendars/full
    &oauth_consumer_key=example.com
    &oauth_nonce=38863f48...28dd9fd2c
    &oauth_signature_method=RSA-SHA1
    &oauth_timestamp=1249972977
    &oauth_token=1%2Fz1...LMzNBrKhElA
    &oauth_version=1.0

I've broken the string up onto multiple lines so that you can see the arguments, but normally everything is concatenated into a single line and URL-encoded, yielding something like this:

GET&https%3A%2F%2Fwww.google.com%2Fcalendar%2Ffeeds%2Fdefault... etc.

Google calls this URL-encoded request a base string.
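The OAuth 1.0 specification (RFC 5849, section 3.4.1) is a bit more precise than "concatenate and URL-encode": the uppercase HTTP method, the percent-encoded base URL (without any query string), and the percent-encoded, sorted parameter list are joined with "&". The sketch below builds the base string that way. It is a hypothetical helper, not part of Google's library; it assumes Java 10 or later (for URLEncoder.encode(String, Charset)), and because it takes a Map it doesn't handle repeated parameter names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical helper that builds an OAuth 1.0 signature base string (RFC 5849, sec. 3.4.1). */
final class OAuthBaseString {

    /** RFC 3986 percent-encoding: like URLEncoder, but %20 for space, %2A for '*', and '~' left alone. */
    static String percentEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20")   // URLEncoder writes spaces as '+'; a literal '+' is already %2B
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    /** METHOD & encode(baseUrl) & encode(sorted, '&'-joined "name=value" pairs). */
    static String build(String method, String baseUrl, Map<String, String> params) {
        // Encode each name and value first, then sort by encoded name (TreeMap keeps keys ordered).
        TreeMap<String, String> sorted = new TreeMap<>();
        params.forEach((name, value) -> sorted.put(percentEncode(name), percentEncode(value)));

        StringJoiner normalized = new StringJoiner("&");
        sorted.forEach((name, value) -> normalized.add(name + "=" + value));

        return method.toUpperCase(Locale.ROOT)
                + "&" + percentEncode(baseUrl)
                + "&" + percentEncode(normalized.toString());
    }
}
```

Because the parameter list is encoded as a whole, its own "=" and "&" characters come out as %3D and %26, so the only bare ampersands left in the result are the two separators.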

The query-string parameters are:

  • oauth_consumer_key: This parameter is typically the name of your web site. It's registered with Google when you register your public-key certificate (see my previous OAuth article), and is passed to Google as part of the initial OAuth registration process.

  • oauth_nonce: This parameter is a random value that Google uses to make sure that a hacker isn't capturing a request and then replaying it at a later date. For a given timestamp (the oauth_timestamp parameter), your service can use a particular nonce only once. That is, each timestamp-nonce pair must be unique.
  • oauth_timestamp: The current time, as an old-style UNIX date: seconds since 00:00:00 on Jan 1, 1970, GMT. In Java, the Date class's getTime() method (like System.currentTimeMillis()) returns milliseconds, so divide its result by 1,000 to get this number.
  • oauth_token: This parameter is the "authorized request token" returned to you from Google when your user granted permission for your service to access their data via OAuth. Google maps the token to a particular user, so you don't need the Google user name or equivalent.
  • oauth_version: 1.0 (optional).
  • oauth_signature_method: The signature algorithm you're using for the digital signing (RSA-SHA1 in this example).
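Two of those parameters, the nonce and the timestamp, have to be generated fresh for every request. Here is a minimal sketch; the class name and the 128-bit hex nonce format are my own assumptions (the requirement is only that the timestamp-nonce pair be unique). Note the division by 1,000: Java clocks count milliseconds, but oauth_timestamp is in seconds.

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that fills in the per-request OAuth parameters. */
final class OAuthParams {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** oauth_timestamp: seconds (not milliseconds) since 00:00:00 GMT, Jan 1, 1970. */
    static String timestamp() {
        return Long.toString(System.currentTimeMillis() / 1000L);
    }

    /** oauth_nonce: 16 random bytes as 32 hex digits, so a timestamp-nonce pair is very unlikely to repeat. */
    static String nonce() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        StringBuilder hex = new StringBuilder(32);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));  // Formatter prints a negative byte as unsigned here
        }
        return hex.toString();
    }

    /** Every parameter in the list above except oauth_signature, which is computed last. */
    static Map<String, String> baseParams(String consumerKey, String authorizedToken) {
        Map<String, String> params = new TreeMap<>();
        params.put("oauth_consumer_key", consumerKey);
        params.put("oauth_nonce", nonce());
        params.put("oauth_signature_method", "RSA-SHA1");
        params.put("oauth_timestamp", timestamp());
        params.put("oauth_token", authorizedToken);
        params.put("oauth_version", "1.0");
        return params;
    }
}
```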

Once you've assembled and URL-encoded the base string, you digitally sign it using the private key I mentioned earlier. (I'll show you the code that does that next month.) Then, you append one more argument to the end of the query:

    &oauth_signature=kH%2BjQd%2Ba8...odMeUnsU%2FxANOw%3D

where the argument is the URL-encoded digital-signature value.
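The signing step itself is short in Java. This sketch assumes that "RSA-SHA1" corresponds to the JDK's standard SHA1withRSA Signature algorithm and that the raw signature is Base64-encoded before being URL-encoded, as the OAuth 1.0 specification prescribes; that is why the sample value above contains %2B, %2F, and %3D, the encoded forms of "+", "/", and "=". Loading your registered private key from storage is left out, and generateTestKeyPair() is a hypothetical stand-in for tests only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Hypothetical RSA-SHA1 signer for an OAuth 1.0 base string. */
final class RsaSha1Signer {

    /** Signs the base string with SHA1withRSA and returns the Base64-encoded signature. */
    static String sign(String baseString, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA1withRSA");
            signer.initSign(privateKey);
            signer.update(baseString.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(signer.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA-SHA1 signing failed", e);
        }
    }

    /** Checks a signature with the matching public key, as the server can with your registered certificate. */
    static boolean verify(String baseString, String base64Signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance("SHA1withRSA");
            verifier.initVerify(publicKey);
            verifier.update(baseString.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(Base64.getDecoder().decode(base64Signature));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA-SHA1 verification failed", e);
        }
    }

    /** The value to append after "&oauth_signature=": Base64's '+', '/', '=' become %2B, %2F, %3D. */
    static String signatureParam(String base64Signature) {
        return URLEncoder.encode(base64Signature, StandardCharsets.UTF_8);
    }

    /** Test-only stand-in for the private key you registered with Google. */
    static KeyPair generateTestKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```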

Then, you send the request off to Google as a standard HTTP GET or POST. (I'll show you how to do that next month, too.)

Getting the Result

Up to this point, everything is pretty straightforward and reasonable. Now things get ugly.

The results of your request (a list of Calendar entries, for example) come back either as an XML Atom feed or in JSON format. (You specify which one in the request.)

Using Atom is the first odd, and I'd argue incorrect, design decision that Google made. The whole point of XML is to tag data semantically. For example, proper XML for a Calendar might look something like this:

<calendarList>
    <calendar>
        <event startTime="xxx" endTime="xxx" description="...">
            <repeat> ... </repeat>
        </event>
        <!-- ... -->
    </calendar>
    <!-- ... -->
</calendarList>

Sub-elements should be used when there's some sort of natural containment relationship and there is an unknown (or recursive) number of sub-elements. (There is an unknown number of Calendars in a Calendar list.) Use element attributes (x=y pairs) when there's a one-to-one mapping with the element. An event, for example, has only one start date, so the start date should be an attribute, not a sub-element. The <repeat> sub-element, above, is necessary because a given event could repeat an unknown number of times, with the repetition specified in different ways for different occurrences.
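To illustrate the payoff of semantic tagging, here is a sketch that pulls every event's start time out of a calendarList document like the one above with a single path expression. It uses XPath, which ships with the JDK, rather than XQuery (XPath is the path language XQuery builds on); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical query against the semantically tagged calendarList format sketched above. */
final class SemanticCalendarQuery {

    /** Returns every event's startTime attribute, in document order. */
    static List<String> startTimes(String calendarListXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(calendarListXml)));
            // One path expression: the element and attribute names say what the data is.
            NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/calendarList/calendar/event/@startTime", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                result.add(attrs.item(i).getNodeValue());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not query calendar XML", e);
        }
    }
}
```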

Google doesn't use semantically reasonable XML, though. In fact, they've perverted the nature of XML by pressing an entirely unrelated (at least to calendaring) XML scheme — Atom — into service to represent not only Calendars, but all other GData objects (like address-book entries and documents).

Atom, however, is an XML scheme that was built to represent blog entries. Its elements and attributes are designed solely for that purpose. A Calendar is, fundamentally, not a blog (an Atom "feed" of entries); it's a calendar. Using Atom to represent Calendars is akin to using your XML scheme for customer data or inventory control to represent calendars. Yeah, an inventory item might have date and description fields somewhere inside it, but that doesn't make an inventory item a reasonable container for an "appointment." The notion that Atom is a "general purpose" XML scheme that you can use to represent anything is nonsensical; it indicates a fundamental misunderstanding of what XML is.

There are two significant consequences to Google's choice of Atom:

  1. There's a lot of unnecessary complexity. You have to deal with the semantics of Atom, even when those semantics have no application to things like calendars. That unnecessary complexity translates to unnecessary work in writing the code, unnecessary work at run time, and an unnecessary waste of bandwidth when transmitting the data. Here, for instance, is a (simplified) version of the XML that Google returns to represent a single calendar setting.
        <feed xmlns="http://www.w3.org/2005/Atom"
            xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/"
            xmlns:gCal="http://schemas.google.com/gCal/2005"
            xmlns:gd="http://schemas.google.com/g/2005"
            gd:etag="W/&quot;Ck8FQ3Y4cCp7I2A9WxVVEkU.&quot;">
          <id>http://www.google.com/calendar/feeds/default/settings</id>
          <updated>2009-03-05T10:46:25.244Z</updated>
          <title type="text">Coach's's personal settings</title>
          <link rel="alternate" type="text/html" href="https://www.google.com/calendar/feeds/render"/>
          <link rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml" href="https://www.google.com/calendar/feeds/default/settings"/>
          <link rel="self" type="application/atom+xml" href="https://www.google.com/calendar/feeds/default/settings"/>
          <author>
            <name>Coach</name>
            <email>user@gmail.com</email>
          </author>
          <generator version="1.0" uri="http://www.google.com/calendar">Google Calendar</generator>
          <openSearch:startIndex>1</openSearch:startIndex>
    
          <!-- ... -->
    
          <entry>
            <id>http://www.google.com/calendar/feeds/default/settings/displayAllTimezones</id>
            <updated>2009-03-05T10:46:25.245Z</updated>
            <link rel="self" type="application/atom+xml" href="https://www.google.com/calendar/feeds/default/settings/displayAllTimezones"/>
            <gCal:settingsProperty name="displayAllTimezones" value="false"/>
          </entry>
    
          <!-- ... -->
        </feed>
    

    Most of this junk is required by Atom, and is noise in the context of Calendar options. If we fix the XML to represent only that information that's relevant, the foregoing XML could be replaced by the following XML:
        <globalCalendarSettings displayAllTimezones="false" />
    

    Multiply the actual XML by 100 and you begin to understand the scope of the problem.
  2. Because you lose the semantic tagging, the real data is lost in a cloud of unnecessary markup and is difficult both to parse and to use. Without those semantics, you can't use XQuery to extract information from a long GData return value, for example.
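For contrast, here is roughly what it takes to get settings out of Google's Atom feed: a namespace-aware parser, the gCal namespace URI, and a walk over settingsProperty elements buried inside Atom entry elements. The namespace URI and the element and attribute names come from the sample feed above; the class itself is a hypothetical sketch, not part of Google's Java library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical reader that digs calendar settings out of a GData Atom settings feed. */
final class AtomSettingsReader {

    // Namespace URI declared as xmlns:gCal in the sample feed.
    static final String GCAL_NS = "http://schemas.google.com/gCal/2005";

    /** Maps each gCal:settingsProperty name to its value, skipping all the Atom bookkeeping. */
    static Map<String, String> settings(String atomFeedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // off by default, and the settings live in the gCal namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(atomFeedXml)));
            NodeList props = doc.getElementsByTagNameNS(GCAL_NS, "settingsProperty");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                result.put(prop.getAttribute("name"), prop.getAttribute("value"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse settings feed", e);
        }
    }
}
```

Everything else in the feed (id, updated, link, author, generator, openSearch) has to be parsed and then thrown away to reach those name/value pairs.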
