
More Information-Rich Programming with F#


Consuming an Internet Scale Data Source with a Custom Type Provider

Thus far, I've provided examples of the type providers included in F# 3.0. You can also develop your own custom type provider. A few months ago, the Visual Studio F# team added the source code of the Freebase type provider sample to the F# 3.0 Sample Pack. The Freebase type provider is an excellent example of how a custom type provider makes it easy to consume Internet-scale data sources. In fact, the Visual Studio F# team has used the Freebase type provider sample in most of its presentations on the benefits of type providers.

If you visit www.freebase.com, you will find a simple definition of Freebase: an entity graph of people, places, and things built by a community that loves open data. Freebase provides a huge and constantly growing information space with structured data on over 23 million topics. You can query Freebase's data using the Metaweb Query Language (MQL), which sends JSON objects as queries through standard HTTP requests and responses. For example, if you enter the following URL in a browser:

https://www.googleapis.com/freebase/v1/mqlread?query={"type":"/music/artist","name":"Queen","album":[]}

the results will include all the music albums for Queen (see Figure 9). You could parse the results with a JSON parser, but the Freebase type provider can do that for you.

Figure 9: Partial results of a query to Freebase displayed in the Web browser.
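Because MQL queries are ordinary HTTP requests, it helps to see what the type provider saves you from by doing one step by hand. The following F# sketch builds the mqlread URL from the example above, percent-encoding the JSON query with Uri.EscapeDataString, and then extracts the album names from a response. It does not send a request: the response text is a hand-written sample, and its "result" wrapper is an assumption about the response shape, not a documented contract. The parsing part uses System.Text.Json, so it assumes a .NET Core 3.0 or later runtime.

```fsharp
open System
open System.Text.Json

// Endpoint taken from the example URL in the article.
let mqlReadEndpoint = "https://www.googleapis.com/freebase/v1/mqlread"

// Builds an mqlread request URL, percent-encoding the JSON query text.
let buildMqlUrl (jsonQuery: string) =
    mqlReadEndpoint + "?query=" + Uri.EscapeDataString(jsonQuery)

let queenQuery = """{"type":"/music/artist","name":"Queen","album":[]}"""
let url = buildMqlUrl queenQuery

// Hand-written sample response; the "result" wrapper is an assumed shape.
let sampleResponse =
    """{"result":{"type":"/music/artist","name":"Queen","album":["A Night at the Opera","Jazz"]}}"""

// Reads the "album" array out of the assumed response shape.
let parseAlbums (json: string) =
    use doc = JsonDocument.Parse(json)
    doc.RootElement.GetProperty("result").GetProperty("album").EnumerateArray()
    |> Seq.map (fun (e: JsonElement) -> e.GetString())
    |> Seq.toList

let albums = parseAlbums sampleResponse
printfn "%s" url
albums |> List.iter (printfn "%s")
```

Every field you want from a response like this has to be named by hand as a string, which is exactly the bookkeeping the type provider takes over.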

The previously shown query is easier to read with indentation:

https://www.googleapis.com/freebase/v1/mqlread?query=
{
    "type":"/music/artist",
    "name":"Queen",
    "album":[]
}

Because album is given as an empty array ([]) instead of a value, MQL treats it as the value to fill in, so the query retrieves Queen's albums from Freebase. Note that it would be very difficult to generate and maintain code for the entire Freebase schema in order to support IntelliSense and make it easier for developers to consume the data. Just imagine how long it would take a code-generation tool to process a structure that huge. However, the sample Freebase type provider lets you start reading the database with just a few lines of code, as I did with the ODataService type provider. The type provider injects the necessary types on demand as you navigate through the Freebase data structure, which makes it an excellent example of how to ease navigation through huge Internet data sources. The type provider has some limitations and many bugs, but it is stable enough to let you consume Freebase without major issues.
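The "on demand" idea can be illustrated without any type provider machinery. The following plain F# sketch is only an analogy: a real type provider generates types for the compiler and the IDE, whereas here a hypothetical Node type uses Lazy<T> so that each level of a large hierarchy is loaded only when code first navigates into it.

```fsharp
// Hypothetical node in a large hierarchy; its children are loaded on first access.
type Node(name: string, loadChildren: unit -> Node list) =
    let children = lazy (loadChildren ())
    member this.Name = name
    member this.IsLoaded = children.IsValueCreated
    member this.Children = children.Force()

// Counts how many times any node actually loaded its children.
let loadCount = ref 0

// Describes a potentially huge tree (three children per node, down to depth);
// nothing below the root is built until someone navigates into it.
let rec makeNode (name: string) (depth: int) : Node =
    let loadChildren () =
        loadCount.Value <- loadCount.Value + 1
        if depth = 0 then []
        else [ for i in 1 .. 3 -> makeNode (sprintf "%s/%d" name i) (depth - 1) ]
    Node(name, loadChildren)

let root = makeNode "root" 20
let visited = root.Children.[0].Children.[1]
printfn "Visited %s after %d loads" visited.Name loadCount.Value
```

Even though the full tree would contain billions of nodes, the navigation above triggers only two loads; the rest of the hierarchy is never materialized.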

First, you need to download and build the source code for the Freebase type provider. You can download the latest version from http://fsharp3sample.codeplex.com/SourceControl/list/changesets. Click Download (not Downloads; see Figure 10). At the time I was writing this article, the file name for the latest change-set was fsharp3sample-18045.zip. You can also get the Freebase type provider from http://fsharp.github.com/fsharpx/.

Figure 10: The Download link that allows you to download the latest change-set. (Don't click on the Downloads button because you won't find any files there.)

Next, you need to unzip the downloaded file and build the Samples.DataStore.Freebase.sln solution found within the SampleProviders folder. If you want to use the built Freebase type provider in an F# project, you must add the following references:

  • FSharp.Data.TypeProviders in Assemblies | Extensions.
  • Samples.DataStore.Freebase.dll. You have to select Browse and then click on the Browse… button in order to select the DLL in the Reference Manager.

If you prefer to work with an F# script in the FSI window, the following lines have the same effect. (Note that you will need to execute them before testing any code related to the Freebase type provider, either in the FSI window or in an F# script. Obviously, you will need to replace C:\Gaston\fsharp3sample-18045… with your path to Samples.DataStore.Freebase.dll.)

#r "FSharp.Data.TypeProviders"
#r @"C:\Gaston\fsharp3sample-18045\SampleProviders\Samples.DataStore.Freebase\obj\Debug\net45\Samples.DataStore.Freebase.dll"

If you create an F# script and add the following two lines after the lines that add the necessary references, you will retrieve all the cheese names from Freebase (see Figure 11). You will notice that dsafdafda is definitely not a valid cheese name, so be careful: not all the retrieved data is valid.

  let data = Samples.DataStore.Freebase.FreebaseData.GetDataContext()
  let cheeseNames = data.Commons.``Food & Drink``.Cheeses |> Seq.toList
Figure 11: Some of the results of the cheese names retrieved with just a few lines by running a script in the FSI window with the Freebase type provider.

In the code, the GetDataContext static method gets a simplified data context for the Freebase type provider. The type provider generates types and retrieves data on demand. Thus, when you type data followed by a dot, IntelliSense displays the available domain categories, such as Commons. If you enter the following:

  let cheeseNames = data.Commons.F

IntelliSense displays all the domain categories for the Freebase database that contain 'F':

  • American football
  • Fashion, Clothing and Textiles
  • Film
  • Food & Drink

When you select a domain, IntelliSense retrieves its description. For example, when you select Food & Drink, you will see a detailed description that starts with the following sentence (see Figure 12): "The food domain is a collection of information about all kinds of food and drink."

Figure 12: IntelliSense displaying the domain categories that contain 'F' and the description for the selected domain category.

This way, the Freebase type provider enables you to access the different domains without having to review any schema documentation. You learn the schema as you navigate through the different domain collections, thanks to on-demand IntelliSense.

The following lines show another example of basic usage of the Freebase type provider: retrieving all the musical albums for Queen programmatically.

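module FreebaseExample = 
    open System
    open Samples.DataStore.Freebase

    // Simplified data context; types and data are provided on demand
    let data = Samples.DataStore.Freebase.FreebaseData.GetDataContext()

    // The Musical Artists collection is the data source for the query
    let musicalArtists = data.``Arts and Entertainment``.Music.``Musical Artists``

    // First artist whose name approximately matches "Queen"
    // (headOrDefault returns null when there is no match)
    let queen = 
        query { for artist in musicalArtists do
                where (artist.Name.ApproximatelyMatches "Queen")
                select artist
                headOrDefault }

    // Names of that artist's albums, sorted by name
    let queenAlbums = 
        query { for album in queen.Albums do
                sortBy album.Name
                select album.Name }
        |> Seq.toList

    // Writes one album name to the console
    let printAlbum (albumName: string) = Console.WriteLine(albumName)

    List.iter printAlbum queenAlbums

    // Keeps the console window open until a key is pressed
    let ReadAnyKey() = 
        Console.WriteLine("Press any key to continue")
        Console.ReadKey() |> ignore

    ReadAnyKey()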

In this example, the code uses musicalArtists (data.``Arts and Entertainment``.Music.``Musical Artists``) as the data source for a query expression. Once you start writing the lines that define the query expression, type inference knows that artist is of type FreebaseData.ServiceTypes.Music.Music.ArtistData. Thus, when you type artist followed by a dot, IntelliSense displays the fields available for the entity.

The code retrieves a musical artist that approximately matches "Queen" (queen) and then retrieves all the FreebaseData.ServiceTypes.Music.Music.AlbumData entities in the Albums property for queen, and sorts them by name. If you execute the code, you will see the title for all the musical albums related to Queen sorted by name on the console (Figure 13).

Figure 13: The first results of the Queen musical albums sorted by name retrieved from the Freebase database.
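If you want to experiment with the shape of this query without the Freebase provider, the following sketch runs the same query expression over a small hand-made in-memory list. Artist, Album, and approximatelyMatches are hypothetical stand-ins: the real provided types are FreebaseData.ServiceTypes.Music.Music.ArtistData and AlbumData, and the provider's ApproximatelyMatches has its own matching semantics rather than the case-insensitive substring test used here. The sketch also checks for the null that headOrDefault returns when nothing matches, which would otherwise make queen.Albums throw.

```fsharp
open System

// Hypothetical stand-ins for the provided AlbumData and ArtistData types.
type Album = { Title: string }
type Artist = { Name: string; Albums: Album list }

// Hypothetical substitute for ApproximatelyMatches: case-insensitive substring test.
let approximatelyMatches (pattern: string) (value: string) =
    value.IndexOf(pattern, StringComparison.OrdinalIgnoreCase) >= 0

// Small hand-made sample data set.
let musicalArtists =
    [ { Name = "Queen"
        Albums = [ { Title = "News of the World" }; { Title = "A Night at the Opera" }; { Title = "Jazz" } ] }
      { Name = "Pink Floyd"; Albums = [ { Title = "Animals" } ] } ]

// headOrDefault yields null (Unchecked.defaultof) when no artist matches.
let queen =
    query { for artist in musicalArtists do
            where (approximatelyMatches "Queen" artist.Name)
            select artist
            headOrDefault }

// Album titles sorted by name, or an empty list when there was no match.
let queenAlbums =
    if isNull (box queen) then []
    else
        query { for album in queen.Albums do
                sortBy album.Title
                select album.Title }
        |> Seq.toList

queenAlbums |> List.iter (printfn "%s")
```

The query syntax is identical for both sources; with the type provider, the data source and its element types simply come from Freebase instead of a local list.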

The Freebase type provider lets you reach each individual value for an entity. For example, take a look at the following lines, which retrieve the artist that approximately matches Queen:

let queen = 
    query { for artist in musicalArtists do
            where (artist.Name.ApproximatelyMatches "Queen")
            select artist
            headOrDefault }

Because the Freebase type provider lets you navigate through each data collection, you can access sample data sets of named individuals for many collections. For example, you can access a sample data set of named individuals of type Musical Artists in the Web data store by entering data.``Arts and Entertainment``.Music.``Musical Artists``.Individuals. Each time you select an artist, the Freebase type provider loads a long description (see Figure 14). This way, you can replace the previously shown query with the following line:

  let queen = data.``Arts and Entertainment``.Music.``Musical Artists``.Individuals.Queen

Figure 14: Exploring an individual musical artist with a tooltip that displays the long description.

Other Custom Type Providers

You can find many other useful type provider samples, with their complete source code, at http://fsharp3sample.codeplex.com/SourceControl/list/changesets (see the SampleProviders folder). You can also install them via http://fsharp.github.com/fsharpx/ or with NuGet by executing the following commands:

Install-Package FSharpx.Core
Install-Package FSharpx.Http
Install-Package FSharpx.Observable
Install-Package FSharpx.Compatibility.OCaml
Install-Package FSharpx.TypeProviders
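Install-Package FSharpx.TypeProviders.Documents
Install-Package FSharpx.TypeProviders.Freebase
Install-Package FSharpx.TypeProviders.Graph
Install-Package FSharpx.TypeProviders.AppSettings
Install-Package FSharpx.TypeProviders.Excel
Install-Package FSharpx.TypeProviders.Math
Install-Package FSharpx.TypeProviders.Regex
Install-Package FSharpx.TypeProviders.Machine
Install-Package FSharpx.TypeProviders.Xaml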

The FSharpx project usually adds recent type provider samples, so it is extremely useful if you think you might need to build a custom type provider or if you're interested in diving deeper into this feature. The project currently includes the following type providers within FSharpx.TypeProviders:

  • AppSettings: Injects setters and getters that work with application settings files.
  • Documents: Very interesting type provider that allows strongly typed access to JSON, XML, and CSV files.
  • Excel: Provides strongly typed access to Excel worksheets.
  • Freebase: The one I've used in my previous examples to access the Freebase Web database.
  • Graph: Offers type providers for graphs and state machines.
  • Machine: Provides strongly typed access to the file system and the Registry.
  • Math: Very interesting type provider for vector data structures.
  • Regex: Provides strongly typed access to regular expressions.
  • Xaml: Provides strongly typed access to XAML files.
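To see what a provider such as Regex takes away, consider the untyped alternative with System.Text.RegularExpressions. In the following sketch, the phone-number pattern and the tryParsePhone helper are hypothetical. Named groups are looked up by string, so a misspelled group name still compiles and silently yields an empty, unsuccessful group at run time. A regex type provider aims to expose such group names as members the compiler can check; the FSharpx provider's own API isn't shown here.

```fsharp
open System.Text.RegularExpressions

// Hypothetical pattern with two named groups.
let phonePattern = Regex(@"^(?<AreaCode>\d{3})-(?<Number>\d{3}-\d{4})$")

// Group names are plain strings: a typo such as "AeraCode" still compiles,
// and the lookup returns an unsuccessful group whose Value is "".
let tryParsePhone (input: string) =
    let m = phonePattern.Match(input)
    if m.Success then Some(m.Groups.["AreaCode"].Value, m.Groups.["Number"].Value)
    else None

printfn "%A" (tryParsePhone "425-555-0100")
printfn "%A" (tryParsePhone "not a phone number")
```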

In this article, I've demonstrated the different options for consuming WSDL, OData, and an Internet-scale data source (Freebase) with a custom type provider. If you already have some experience working with the data sources used in the examples, you will understand the productivity boost that type providers offer. Once you start taking advantage of F# 3.0 type providers, it is really difficult to go back to code-generation tools. If you're making the move to F# to work with different data sources using a functional approach, I encourage you to take a look at the additional type providers I've mentioned. In some cases, it is extremely useful to customize the source code of an existing type provider to meet your own specific requirements. With the ease of access that type providers deliver, you can focus on your algorithms instead.


Gaston Hillar specializes in using Microsoft developer technologies. He writes frequently for Dr. Dobb's.

Related Article

Information-Rich Programming With F# 3.0

