Google's Summer of Code: Part I

By the DDJ staff, December 01, 2005

Google's Summer of Code resulted in thousands of lines of code. Here are some of the students who participated.


Apache Axis2 JMX Front
CL-GODB: A Common Lisp GO Database Manipulation Library
Wide Character Support in NetBSD Curses Library
gjournal: FreeBSD GEOM Journaling Layer

Google's Summer of Code was a unique and exciting program in which student programmers were provided stipends for creating new open-source projects or helping established ones. Over the summer of 2005, Google funded more than 400 projects to the tune of $5000 each, with $4500 going to the student and $500 to a mentoring organization. DDJ will be profiling some of the student participants over the coming months. Google's open-source programs manager Chris DiBona and engineering manager Greg Stein led the Summer of Code project. DDJ recently talked to DiBona about the program.

DDJ: What was the original goal of the Summer of Code?

CD: The original impetus behind the program was to ensure that budding computer scientists wouldn't let their programming skills diminish over the summer while working in a noncomputer-related job. We thought that if we could make it possible for these students to work with the open-source community then they would be exposed to a whole new, very real, class of problem. This would then lead to more open-source software developers, programs, and better developers overall.

DDJ: Did the final results meet your expectations?

CD: From the very beginning the students far exceeded Google's and my personal expectations. The quality of the applications alone caused us to double the number of accepted students from 200 to 419, and I think that easily a thousand of them proposed acceptable applications. Now that the program is over, the early results are pretty terrific, with around 80 percent of the students having completed their projects to their mentors' satisfaction.

DDJ: What was the biggest surprise coming out of SoC?

CD: Just how advanced some of the projects ended up being. I remember thinking when I saw some of the projects that there was no way someone new to a project could pull them off. One, a CIL back end for GCC, which allows for the creation of CLR code from any GCC front-end language, should be preposterously difficult to do, but the student not only completed it, but did it in a way that amazed his mentor, Miguel de Icaza.

DDJ: What was the geographic distribution of participants?

CD: We had 419 students taking part in the program from 49 countries. In the U.S. alone, we had students from 38 states.

DDJ: Are they representative of open source as they are today, or are they signs of things to come?

CD: I think that they are a little bit of the present and a big part of the future. Open source can be a little intimidating for the newcomer, and I think the Summer of Code helped to mix things up a bit and keep things fresh. Happily, a good number of the students have indicated that they intend to continue working on their open-source projects.

DDJ: Where does SoC go from here?

CD: Into the Fall, of course! We're going to examine the feedback and make sure that the program was successful; if so, we may do another one next year.

DDJ

Apache Axis2 JMX Front

Name: Chathura C. Ekanayake
Contact: [email protected]
School: University of Moratuwa, Sri Lanka
Major: Computer Science and Engineering
Project: Axis2 JMX Front
Project Page: http://wiki.apache.org/ws/SummerOfCode/2005/JMXFront/
Mentors: Deepal Jayasinghe and Srinath Perera
Mentoring Organization: Apache Software Foundation (http://www.apache.org/)

Apache Axis2 is a highly extensible Java-based web-service engine. Its extensibility comes mainly from its handler chain-based architecture. Axis2 allows these handlers and other features to be configured mainly through XML files. Until now, there was no proper way to change these settings while Axis2 was running in a server. The goal of Axis2 JMX Front is to provide a JMX management interface for monitoring and configuring Axis2 at runtime.

Axis2 JMX Front consists of a management class (MBean) named Axis2Manager, which provides access to all configurable modules. It handles everything regarding configuring various modules and provides a simple interface. This MBean has the functionality to configure settings of handlers, transport protocol handlers, and deployed services. For example, administrators can use this interface to turn off selected operations from web services, after they are deployed. This MBean is registered in an MBeanServer and published in a JMXConnectorServer. Remote management applications (JConsole, JManage, and so on) can access this MBean using RMI and call any function it provides. Therefore, administrators of Axis2 can log on to this interface to monitor and configure the system while it is running in servers. They can also manage different Axis2 engines running in different servers as a collection (cluster) using this interface.
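The register-then-manage flow described above can be sketched with nothing but the standard javax.management API. This is a minimal illustration, not Axis2 JMX Front code: the OperationSwitch class, its Enabled attribute, and the "Sketch:type=OperationSwitch" object name are all hypothetical, and the sketch uses the JVM's platform MBeanServer rather than publishing through a JMXConnectorServer.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSketch {
    // Standard MBean convention: the management interface is named <Class>MBean.
    public interface OperationSwitchMBean {
        boolean isEnabled();
        void setEnabled(boolean enabled);
    }

    // Hypothetical managed resource standing in for one deployed service operation.
    public static class OperationSwitch implements OperationSwitchMBean {
        private volatile boolean enabled = true;
        @Override public boolean isEnabled() { return enabled; }
        @Override public void setEnabled(boolean enabled) { this.enabled = enabled; }
    }

    // Registers the MBean, turns the operation off through the MBeanServer,
    // and returns the attribute value read back through the same server.
    static boolean registerAndDisable() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("Sketch:type=OperationSwitch");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // keep the sketch re-runnable
            }
            server.registerMBean(new OperationSwitch(), name);
            // A management client would issue this call remotely; here it is local.
            server.setAttribute(name, new Attribute("Enabled", false));
            return (Boolean) server.getAttribute(name, "Enabled");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("enabled after management call: " + registerAndDisable());
    }
}
```

A remote tool such as JConsole would see the same Enabled attribute once the MBeanServer is exposed through a connector.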

Axis2 JMX Front can be extended seamlessly with additional management functionality. Developers can add functions to the existing MBean or create separate MBeans without altering the rest of the code. After implementing a class with the required management functionality, they can call the methods of the JMXManager class to register and publish objects of those classes as MBeans. Example 1 illustrates the use of JMXManager for registering a normal Java object named myObject as an MBean.

Axis2 JMX Front uses the Apache commons.modeler package for registering MBeans. Therefore, MBean developers are not required to provide a separate interface for their management objects. JMX Front loads all the JMX-specific classes at runtime to make the Axis2 build independent of JMX libraries. It also provides a separate class named JMXAdmin to handle all JMX-related features. The Axis2 engine can load this class at runtime to JMX-enable the system. This allows Axis2 JMX Front to be deployed as an optional package, which can be integrated into Axis2 at deploy time.

// Create Object
String myObjectName = "Axis2:type=management.MyObject";
MyObject myObject = new MyObject();

// Register myObject using JMXManager
JMXManager jmxManager = JMXManager.getJMXManager();
jmxManager.registerMBean(myObject, myObjectName);

Example 1: Using JMXManager.

CL-GODB: A Common Lisp GO Database Manipulation Library

Name: Samantha Kleinberg
Contact: [email protected]
School: New York University
Major: Physics and Computer Science
Project: CL-GODB
Project Page: http://common-lisp.net/project/cl-godb/
Mentor: Marco Antoniotti
Mentoring Organization: LispNYC (http://www.lispnyc.org/)

CL-GODB is a new interface to the GO Database written in Common Lisp. The Gene Ontology (GO) is a collection of terms organized in a taxonomy representing a controlled vocabulary used to describe genes, gene products, their functions, and the processes they are involved in for a variety of organisms. The GO Database (GODB) represents the ontological information and gene product annotations in a convenient relational database format (the GO database uses MySQL).

Until now, there have been no interfaces to the database that use Common Lisp. This is inconvenient as there are Bioinformatics and Systems Biology tools that employ the language (BioLingua, GOALIE, and the BioCYC suite, for instance).

GOALIE, developed by Marco Antoniotti and Bud Mishra in NYU's Bioinformatics Group, analyzes time-course data from microarray clustering experiments. The CL-GODB library will be integrated into GOALIE, improving the tool's functionality and efficiency.

The library works by building an incremental, as-needed, internal image of the GO database contents in core. This improves the speed of queries and facilitates the construction of more complex predicates that may be needed in an application such as GOALIE.

Users start by creating a handle that identifies their session and is linked to several hash indexes used in the in-core caching. Once they have connected to their copy of the GO database, they have access to a variety of built-in SQL queries, which take advantage of the indexing and add to the stored data. The queries range from getting basic information about a term, to finding a term's lineage using a choice of hierarchies.
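The as-needed caching idea can be illustrated in a few lines. The sketch below is written in Java only to match the article's other listing; CL-GODB itself is Common Lisp, and none of these names come from it. The Handle class, the single-parent Term record, the TOY: accessions, and the fetchFromDb function (which stands in for an SQL query) are all assumptions made for illustration. Real GO terms can have several parents, so a real lineage query is more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class GoCacheSketch {
    // Toy stand-in for a term row: accession, name, and one parent (null at the root).
    static final class Term {
        final String acc;
        final String name;
        final String parentAcc;
        Term(String acc, String name, String parentAcc) {
            this.acc = acc;
            this.name = name;
            this.parentAcc = parentAcc;
        }
    }

    // Session handle: owns the hash index and counts simulated database round trips.
    static final class Handle {
        private final Map<String, Term> termsByAcc = new HashMap<>();
        private final Function<String, Term> fetchFromDb;
        int dbQueries = 0;

        Handle(Function<String, Term> fetchFromDb) {
            this.fetchFromDb = fetchFromDb;
        }

        // Returns the cached term, or fetches it once and adds it to the in-core image.
        Term term(String acc) {
            Term cached = termsByAcc.get(acc);
            if (cached != null) {
                return cached;
            }
            dbQueries++;
            Term fetched = fetchFromDb.apply(acc);
            if (fetched != null) {
                termsByAcc.put(acc, fetched);
            }
            return fetched;
        }

        // Walks parent links from a term up to the root, collecting term names.
        List<String> lineage(String acc) {
            List<String> names = new ArrayList<>();
            Term t = term(acc);
            while (t != null) {
                names.add(t.name);
                t = (t.parentAcc == null) ? null : term(t.parentAcc);
            }
            return names;
        }
    }

    // Builds a handle over a three-term toy table that plays the role of the database.
    static Handle demoHandle() {
        Map<String, Term> fakeDb = new HashMap<>();
        fakeDb.put("TOY:0001", new Term("TOY:0001", "root term", null));
        fakeDb.put("TOY:0002", new Term("TOY:0002", "middle term", "TOY:0001"));
        fakeDb.put("TOY:0003", new Term("TOY:0003", "leaf term", "TOY:0002"));
        return new Handle(fakeDb::get);
    }

    public static void main(String[] args) {
        Handle h = demoHandle();
        System.out.println(h.lineage("TOY:0003") + " after " + h.dbQueries + " queries");
        System.out.println(h.lineage("TOY:0003") + " after " + h.dbQueries + " queries");
    }
}
```

The second lineage call is answered entirely from the cache, which is the effect the library relies on for fast repeated queries.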

As a testbed for the CL-GODB library, we built a GUI application that is available as a standalone executable. The CL-GODB Viewer lets users browse the hierarchy with a graphical tree view and provides information about each term and its associated genes, in a manner similar to that of several other GO viewer applications available online.

Creating the CL-GODB was challenging at times, as it was my first project in Common Lisp. The biggest hurdle was making sure that case-sensitivity vagaries were taken care of, as Common Lisp and MySQL behave differently under Windows and UNIX. In the end, it did work and I learned more about the intricacies of SQL syntax than I ever wanted to know.

Figure 1: The CL-GODB user interface.

Wide Character Support in NetBSD Curses Library

Name: Ruibiao Qiu
Contact: [email protected]
School: Washington University
Major: Doctoral Candidate, Computer Science and Engineering
Project: Wide Character Support in Curses
Project Page: http://netbsd-soc.sourceforge.net/projects/wcurses/
Mentors: Julian Coleman and Brett Lymn
Mentoring Organization: The NetBSD Project (http://www.netbsd.org/)

The current NetBSD curses library doesn't support wide characters, which limits the use of NetBSD in countries with wide-character locales. The "Wide Character Support in curses" project adds wide-character support to the NetBSD curses library, complying with the X/Open Curses Reference to provide internationalization and localization.

The difficulty of adding wide-character support to NetBSD curses lies in its internal character storage data structure and related functions, which assume an 8-bit character in each display cell. Adding wide-character support means adding a new character storage data structure to hold wide-character information. This structure holds not only the character but also the attributes, including any nonspacing characters associated with the display cell.

The internal character storage data structure adds two linked lists for foreground/background nonspacing characters and uses spare bits in the attribute field for the character width, which is required for multicolumn characters. There is one storage cell per column, but the width fields are set differently for a multicolumn character. For an m-column-wide character, the first cell holds the width of the character, and the other m-1 cells hold position information, stored as an offset, in their width fields. This offset is negative, making it easy to detect a cell belonging to a multicolumn character.
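The width-field convention is easier to see in code. The following Java sketch is a model only; the NetBSD library is written in C, and the Cell class, the put and startColumn helpers, and the exact encoding of the offset as -i for the i-th continuation cell are assumptions made here for illustration.

```java
class WideCellSketch {
    // One storage cell per screen column.
    static final class Cell {
        int ch = ' ';   // code point displayed by the character this cell belongs to
        int width = 1;  // > 0: first cell, number of columns occupied
                        // < 0: continuation cell, offset back to the first cell
    }

    static Cell[] blankLine(int columns) {
        Cell[] line = new Cell[columns];
        for (int i = 0; i < columns; i++) {
            line[i] = new Cell();
        }
        return line;
    }

    // Stores a character occupying `columns` cells starting at column x.
    // The sketch does not repair wide characters it partially overwrites.
    static void put(Cell[] line, int x, int codePoint, int columns) {
        if (columns < 1 || x < 0 || x + columns > line.length) {
            throw new IllegalArgumentException("character does not fit on the line");
        }
        line[x].ch = codePoint;
        line[x].width = columns;
        for (int i = 1; i < columns; i++) {
            line[x + i].ch = codePoint;
            line[x + i].width = -i; // negative offset marks a continuation cell
        }
    }

    // Finds the first column of whatever character covers column x.
    static int startColumn(Cell[] line, int x) {
        int w = line[x].width;
        return (w < 0) ? x + w : x;
    }

    public static void main(String[] args) {
        Cell[] line = blankLine(6);
        put(line, 2, 0x65E5, 2); // a two-column CJK character at columns 2 and 3
        System.out.println("column 3 belongs to the character starting at column "
                + startColumn(line, 3));
    }
}
```

Because a continuation cell is recognizable from the sign of its width field alone, code that edits one column can find the rest of the character without scanning the line.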

To read a wide character from a keyboard, a distinction must be made between a function key sequence and a wide-character sequence. The keymap routines for narrow character input are used to detect function keys, and the stateful wide-character conversion routine mbrtowc() is used to assemble input bytes into a valid wide character.
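For readers unfamiliar with stateful conversion, here is a rough Java analogue of the byte-at-a-time assembly that mbrtowc() performs. It is not the curses code: it assumes UTF-8 input (mbrtowc() follows the current locale), leaves out function-key detection entirely, and the WideCharAssembler class and its feed method are names invented for this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class WideCharAssembler {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final ByteBuffer pending = ByteBuffer.allocate(8); // bytes of an unfinished sequence
    private final CharBuffer out = CharBuffer.allocate(2);     // room for a surrogate pair

    // Feeds one input byte. Returns the completed code point,
    // or -1 if more bytes are needed to finish the character.
    int feed(byte b) {
        pending.put(b);
        pending.flip();
        out.clear();
        CoderResult result = decoder.decode(pending, out, false);
        pending.compact(); // keep any bytes the decoder has not consumed yet
        if (result.isError()) {
            pending.clear();
            decoder.reset();
            throw new IllegalArgumentException("invalid byte sequence");
        }
        out.flip();
        if (!out.hasRemaining()) {
            return -1; // incomplete multibyte sequence so far
        }
        return Character.codePointAt(out, 0);
    }

    public static void main(String[] args) {
        WideCharAssembler assembler = new WideCharAssembler();
        byte[] input = {(byte) 0xE6, (byte) 0x97, (byte) 0xA5}; // UTF-8 for U+65E5
        for (byte b : input) {
            System.out.println(Integer.toHexString(assembler.feed(b)));
        }
    }
}
```

The point of the pattern is that the caller can hand over input one byte at a time, as it arrives from the keyboard, and the converter remembers the partial sequence between calls.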

Some existing narrow character routines have been modified to work with wide characters. The new storage data structure makes screen-refreshing code more complicated because the NetBSD curses library uses a hash function to determine if a screen needs to be refreshed. For wide-character support, the hash function must include the nonspacing characters as well to capture the changes in rendition. Another issue is that when a character is added or deleted, a check must be made to detect whether that character was part of a multicolumn character. If it was, all parts of the multicolumn character are removed.
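To see why the hash has to cover nonspacing characters, consider this small Java model. It uses a generic polynomial hash, not the function NetBSD curses actually uses, and the Cell layout and the hash method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LineHashSketch {
    static final class Cell {
        int ch;    // base (spacing) character
        int attr;  // rendition attributes
        final List<Integer> nonspacing = new ArrayList<>(); // combining marks on this cell
        Cell(int ch) {
            this.ch = ch;
        }
    }

    // Folds the base character, the attributes, and every nonspacing character
    // into the line hash, so a change to any of them changes the result.
    static int hash(Cell[] line) {
        int h = 17;
        for (Cell c : line) {
            h = 31 * h + c.ch;
            h = 31 * h + c.attr;
            for (int mark : c.nonspacing) {
                h = 31 * h + mark;
            }
            h = 31 * h + c.nonspacing.size();
        }
        return h;
    }

    public static void main(String[] args) {
        Cell[] line = { new Cell('e') };
        int before = hash(line);
        line[0].nonspacing.add(0x0301); // COMBINING ACUTE ACCENT turns "e" into "é"
        int after = hash(line);
        System.out.println("line needs refresh: " + (before != after));
    }
}
```

A hash computed over the base characters alone would miss this change, and the accented character would never be redrawn.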

The modified curses library was tested with three wide-character locales—Simplified Chinese, Traditional Chinese, and Japanese. Test results show that twice the memory is generally required to support wide characters.

gjournal: FreeBSD GEOM Journaling Layer

Name: Ivan Voras
Contact: [email protected]
School: University of Zagreb
Major: Electrical Engineering and Computing
Project: gjournal
Project Page: http://wikitest.freebsd.org/moin.cgi/gjournal/
Mentors: Pawel Jakub Dawidek and Poul-Henning Kamp
Mentoring Organization: The FreeBSD Project (http://www.freebsd.org/)

The aim of the gjournal project is to create a data journaling layer for FreeBSD's GEOM storage device layer. The idea of gjournal was born from the observation that FreeBSD doesn't currently have a journaling filesystem, but in an early phase the specification was extended to include copy-on-write (COW) functionality.

The GEOM subsystem is a modern kernel-based framework that manages pretty much all aspects of usage and control of storage devices. It's based on the concept of classes. A GEOM class can be a source of data or it can implement data transformations in a completely transparent way. All classes can be arbitrarily combined in a hierarchy in the form of a directed acyclic graph. Examples of existing GEOM classes are gmirror, which consumes two or more underlying class instances (called "geoms") and provides one that duplicates and distributes I/O requests to them (a RAID 1 layer); and geom_dev, which consumes all disk device geoms and creates entries in the /dev filesystem hierarchy for them.

gjournal is implemented as a GEOM class that consumes two geoms and produces one. The first of the two consumed geoms is designated as a "data device" and the second as a "journal device." The basic idea is to transform write requests to the produced geom into sequential writes to the journal device. The class implements two kernel threads: a main worker thread to which I/O requests are delegated, and a helper thread used to asynchronously commit data from the journal to the data device.

In regular mode, the journal device is divided into two areas, one of which is used to record data until it's filled—at which point, it's scheduled for asynchronous commit. A timed callout is scheduled that periodically triggers the swap/commit process. Two journal formats are implemented—one optimized for speed that emphasizes sequentiality of writes to the journal device, and another that conserves space by keeping metadata for the journal in one place.
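The two-area scheme can be modeled in user space. The Java sketch below is a simplified illustration, not gjournal code (which is a C kernel module): the JournalSketch class, the one-byte blocks, and the way reads consult the active area are assumptions, and the commit runs synchronously here where the real class hands it to a helper thread.

```java
import java.util.ArrayList;
import java.util.List;

class JournalSketch {
    // One journaled write: target block and its new content (one byte per block here).
    static final class Record {
        final int block;
        final byte value;
        Record(int block, byte value) {
            this.block = block;
            this.value = value;
        }
    }

    private final byte[] dataDevice;                      // stand-in for the data device
    private List<Record> active = new ArrayList<>();      // area currently recording writes
    private List<Record> committing = new ArrayList<>();  // area being replayed
    private final int areaCapacity;
    int swaps = 0;

    JournalSketch(int blocks, int areaCapacity) {
        this.dataDevice = new byte[blocks];
        this.areaCapacity = areaCapacity;
    }

    // A write is only appended to the active journal area, in arrival order.
    void write(int block, byte value) {
        if (active.size() == areaCapacity) {
            swapAndCommit(); // area is full: switch areas before recording more
        }
        active.add(new Record(block, value));
    }

    // Swaps the two areas, then replays the filled one to the data device in order.
    void swapAndCommit() {
        List<Record> filled = active;
        active = committing;
        committing = filled;
        swaps++;
        for (Record r : committing) {
            dataDevice[r.block] = r.value;
        }
        committing.clear();
    }

    // Reads must see journaled data that is not committed yet; the newest record wins.
    byte read(int block) {
        for (int i = active.size() - 1; i >= 0; i--) {
            if (active.get(i).block == block) {
                return active.get(i).value;
            }
        }
        return dataDevice[block];
    }

    // Exposes the data device contents, bypassing the journal, for inspection.
    byte rawDataDevice(int block) {
        return dataDevice[block];
    }

    public static void main(String[] args) {
        JournalSketch j = new JournalSketch(4, 2);
        j.write(0, (byte) 1);
        j.write(1, (byte) 2);
        j.write(0, (byte) 3); // fills over: first area is committed, this write is journaled
        System.out.println("read=" + j.read(0) + " onDisk=" + j.rawDataDevice(0));
    }
}
```

Random writes from the upper layer become appends to the active area, and the data device is only touched in batches when an area is committed.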

Unfortunately, the most used FreeBSD filesystem, UFS, cannot be used with gjournal because this layer doesn't distinguish metadata (for example, information about deleted but still referenced files) and requires an fsck run to correct references. The COW facility is functional and can be used for experimentation with filesystems.
