Open Source

Loadable Modules & the Linux 2.6 Kernel

By Daniele Paolo Scarpazza, June 01, 2005

The Linux Kernel 2.6 introduces significant changes with respect to 2.4.

Daniele is a Ph.D. student at Politecnico di Milano (Italy), where he currently works on source-level software energy estimation. He can be contacted at scarpaz@scarpaz.com.

The heart of Linux is the kernel, which is in charge of scheduling tasks, managing memory, performing device I/O operations, servicing system calls, and carrying out other critical tasks. Unlike other UNIX-like operating systems that implement a pure monolithic architecture, Linux lets you dynamically load/unload portions of the kernel (modules). This lets you provide support for new devices and add system features without recompiling or rebooting the kernel, then unload them when they are not needed anymore.

The possibility of loading/unloading modules is a key feature for driver programmers because it lets you test drivers during development without rebooting the kernel at every change, thus dramatically speeding up the test-and-debug process.

Kernel 2.6 introduces significant changes with respect to kernel 2.4: New features were added, existing ones were removed, and some were marked as deprecated (still usable, but with severe limitations). Consequently, modules written for kernel 2.4 either don't work anymore or work with grave restrictions. In this article, I examine these changes.

A Minimal Module

Listing One is the shortest possible implementation of a module. Adhering to this template lets you write code that can operate equally as a module or statically linked into the kernel, without modifications or #ifdefs.

The initialization and cleanup functions can have arbitrary names, and must be registered via the module_init() and module_exit() macros. The module_init(f) macro declares that function f must be called at module insertion time if the file is compiled as a module, or otherwise at boot time. Similarly, macro module_exit(f) indicates that f must be called at module removal time (or never, if built-in). The specifier __init is effective only when the file is compiled in the kernel, and indicates that the initialization function can be freed after boot. On the other hand, __exit marks functions that are useful only for module unloading and, therefore, can be completely ignored if the file is not compiled as a module.

Compiling Modules

The 2.6 kernel's build mechanism ("kbuild") has been deeply reengineered, affecting how external kernel modules are compiled. In 2.4, module developers manually called GCC, including command-line preprocessor symbol definitions (such as MODULE or __KERNEL__), specifying include directories and optimization options. This approach is no longer recommended because external modules should be built as if they were part of the official kernel. Consequently, kbuild automatically defines preprocessor symbols, optimization options, and include directories. All you need to do is create a one-line makefile:

obj-m := your_module.o

where your_module is the name of your module, whose source is in the file your_module.c. You then type a command line such as:

make -C /usr/src/linux-2.6.7 SUBDIRS=`pwd` modules

The output provided by the build process is:

make -C /usr/src/linux-2.6.7 SUBDIRS=/root/your_dir modules
make[1]: Entering directory '/usr/src/linux-2.6.7'
CC [M] /root/your_dir/your_module.o
Building modules, stage 2.
MODPOST
CC /root/your_dir/your_module.mod.o
LD [M] /root/your_dir/your_module.ko
make[1]: Leaving directory '/usr/src/linux-2.6.7'

In the end, a new kernel module is available in your build directory under the name of your_module.ko (the .ko extension distinguishes "kernel objects" from conventional objects). With a more elaborate Makefile (such as Listing Two), you can avoid typing this command line.

Module Versioning

The 2.6 module loader implements strict version checking, relying on "version magic" strings ("vermagics"), which are included both in the kernel and in each module at build time. A vermagic, which could look like "2.6.5-1.358 686 REGPARM 4KSTACKS gcc-3.3", contains critical information (for example, an extended kernel version identifier, the target architecture, compilation options, and compiler version) and guarantees compatibility between the kernel and a module. The module loader compares the module's and kernel's vermagics character-for-character, and refuses to load the module if differences are detected. The strictness of this check complicates things, but was advocated after compatibility problems arose when loading modules compiled with a GCC version different from the one used to build the kernel.
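The strictness of the comparison can be illustrated with a small userspace sketch. This is not the kernel's loader code: the function name vermagic_matches() is hypothetical, and the sketch only assumes that the check amounts to an exact string comparison, as described above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not a kernel function): accept the module only if
 * its vermagic is byte-for-byte identical to the kernel's vermagic. */
bool vermagic_matches(const char *kernel_magic, const char *module_magic)
{
    return strcmp(kernel_magic, module_magic) == 0;
}
```

Under this model, a module built with gcc-3.4 for a kernel built with gcc-3.3 is rejected even when every other field of the string agrees.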

When compiling modules for a running kernel that you may not want to recompile, when cross compiling for a deployment box that you do not want to reboot, or when preparing a module binary for a kernel provided with a given Linux distribution, your module's vermagic must exactly match your target kernel's vermagic. To do this, you must reproduce, during module compilation, the exact build environment that was present at kernel compilation time. This is done by:

Using the same configuration file as the kernel (since the configuration file used to compile the kernel is usually available under /boot, a cp /boot/config-`uname -r` /usr/src/linux-`uname -r`/.config command is enough in most cases).
Using the same kernel top-level Makefile (again, it should be available under /lib/modules/2.6.x/build; therefore, the command cp /lib/modules/`uname -r`/build/Makefile /usr/src/linux-`uname -r` should do).

Module Licensing

The Linux kernel is released under the GNU General Public License (GPL), whose purpose is to grant users rights to copy, modify, and redistribute programs, and to ensure that those rights are preserved in derivative works:

6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.

The practical case of a kernel module depending on a second module (a common case in Linux) is not explicitly mentioned in the GPL, yet some interpretations of its underlying philosophy postulate that a proprietary module should not depend on a GPL-licensed one, because the former would restrict the rights granted to the user by the latter. Module writers advocating this interpretation can now enforce this policy by using the EXPORT_SYMBOL_GPL() macro in place of EXPORT_SYMBOL(), thus exporting symbols that can be linked only by modules specifying a GPL-compatible license.

With this in mind, all module writers are asked to declare the license under which their module is released, via the macro MODULE_LICENSE(). Table 1 lists the licenses and respective ident strings currently supported by the kernel (all ident strings indicate free software except for the last one). Additionally, declaring the license lets users verify that their system is free, lets the free development community ignore bug reports that involve proprietary modules, and lets vendors do likewise based on their own policies.

When no license is specified, a proprietary license is assumed. Modules with a proprietary license cause the following warning when loading:

your_module: module license 'Proprietary' taints kernel.

and force flags must be specified to have the module properly loaded.

The macro EXPORT_NO_SYMBOLS is deprecated and not needed anymore because a module exporting no symbols is the norm.

Parameter Passing

The old parameter passing mechanism, based on the MODULE_PARM() macro, is obsolete. Modules should define their parameters via a call to the macro module_param(), whose arguments are:

The name of the parameter (and associated variable).
Its type (chosen among byte, short, ushort, int, uint, long, ulong, charp, bool, and invbool, or a custom type name, say xxx, for which helper functions param_get_xxx() and param_set_xxx() must be provided).
The permissions for the associated sysfs entry—0 indicates that the attribute is not to be exposed via sysfs.

Example 1 presents two example declarations.

Use Count

Module use counts protect against the removal of a module that is still in use. Modules designed for previous kernels called MOD_INC_USE_COUNT() and MOD_DEC_USE_COUNT() to manipulate their use count. Since these macros could lead to unsafe conditions, they are now deprecated. They should now be avoided, for example, by setting the owner field of the file_operations structure, or replaced with try_module_get()/module_put() calls. Alternatively, you can provide your own locking mechanism in a custom function, and set the module's can_unload pointer to it. The function should return 0 for "yes," and -EBUSY or a similar error number for "no."

If used, the deprecated MOD_INC_USE_COUNT macro marks the current module as unsafe, thus making it impossible to unload (unless you enable the forced-unload kernel option and use rmmod --force).
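The discipline behind try_module_get()/module_put() can be sketched in userspace as follows. This is a simplified model, not the kernel's implementation: struct toy_module and the toy_module_*() functions are hypothetical names, and the real kernel code additionally has to deal with concurrency, which the sketch ignores.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in for a module's bookkeeping (not struct module). */
struct toy_module {
    int  use_count;   /* number of users currently holding a reference */
    bool unloading;   /* set once removal has been accepted */
};

/* Models try_module_get(): take a reference unless removal has begun. */
bool toy_module_try_get(struct toy_module *m)
{
    if (m->unloading)
        return false;
    m->use_count++;
    return true;
}

/* Models module_put(): drop a reference taken earlier. */
void toy_module_put(struct toy_module *m)
{
    if (m->use_count > 0)
        m->use_count--;
}

/* Models a removal request: 0 means "yes", -EBUSY means "still in use". */
int toy_module_unload(struct toy_module *m)
{
    if (m->use_count > 0)
        return -EBUSY;
    m->unloading = true;
    return 0;
}
```

The point of the "try" form is that taking a reference can fail: once removal has been accepted, new users are turned away instead of racing with the unload.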

The 2.6 Device Model and /sys Filesystem

Kernel 2.6 introduces an "integrated device model", a hierarchical representation of the system structure, originally intended to simplify power-management tasks. This model is exposed to user space through sysfs, a virtual filesystem (like /proc), usually mounted at /sys. By navigating sysfs, you can determine which devices make up the system, which power state they're in, what bus they're attached to, which driver they're associated with, and so on. sysfs is now the preferred and standardized way to expose kernel-space attributes; module writers should therefore avoid the soon-to-be obsolete procfs.

Figure 1 (available electronically; see "Resource Center," page 3) is a typical sysfs tree. The tree is conceptually similar to the view provided by the Windows "hardware manager." The first-level entries in /sys are:

block, which enumerates all the block devices, independently from the bus to which they are connected.
bus, which describes the structure of the system in terms of buses and connections.
class, which provides device localization based on device class (the mouse, for example) apart from its physical bus connection or device numbering.
devices, which enumerates all the devices composing the system.
firmware, which provides a facility for the dynamic management of firmware.
power, which provides the ability to control the system-wide power state.

Given the first-level classification, the same device can appear multiple times in the tree. Symbolic links are widely used to connect identical or related entities; for example, the block device hda is represented by a directory entry /sys/block/hda, which contains a link named "device" pointing to /sys/devices/pci0000:00/0000:00:07.1/ide0/0.0. The same block device also happens to be the first device connected to the IDE bus; thus, entry /sys/bus/ide/devices/0.0 points to the same location. Conversely, a link is provided pointing to the block device associated to a given device; for example, in /sys/devices/pci0000:00/0000:00:07.1/ide0/0.0, a link named "block" points to /sys/block/hda.

Exposing module attributes via sysfs requires a minimal understanding of the device model and of its underlying kobjects, ktypes, and ksets concepts. Understanding those concepts is easier in an object-oriented perspective because all are C-language structs that implement (with debated success) a rudimentary object-oriented framework. Table 2 is a mapping between OO and kobject concepts.

Each directory in sysfs corresponds to a kobject, and the attributes of a kobject appear in it as files. Reading and writing attributes corresponds to invoking a show or a store method on a kobject, with the attribute as an argument. A kobject is a variable of type struct kobject. It has a name, reference count, pointer to its parent, and ktype. C++ programmers know that methods are not defined on an object basis; instead, all the objects of a given class share the same methods. The same happens here, the idea of class being represented by a ktype. Each kobject is of exactly one ktype, and methods are defined for ktypes (usually functions to show and store attributes, plus a function to dispose of the kobject when its reference count reaches zero: a destructor, in OO terms). A kset corresponds to a generic linked list, such as a Standard C++ Library generic container. It contains kobjects and can be treated as a kobject itself. Additionally, handlers can be associated to events of kobjects entering or leaving a set, thus providing a clean way to implement hot-plug operations. The cleanness of the design of such a framework is still debated.
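The object-oriented reading of kobjects and ktypes can be made concrete with a userspace model. The sketch below does not use the kernel's struct kobject or its API; all toy_* and counter_* names are hypothetical, and the sketch only assumes the relationships described above: methods live in the type, each object points to exactly one type, and the destructor runs when the reference count reaches zero.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_kobject;

/* An attribute is what sysfs would present as a file in the directory. */
struct toy_attribute {
    const char *name;
};

/* The "class": methods shared by every object of this type. */
struct toy_ktype {
    int  (*show)(struct toy_kobject *k, const struct toy_attribute *a,
                 char *buf, size_t len);
    int  (*store)(struct toy_kobject *k, const struct toy_attribute *a,
                  const char *buf);
    void (*release)(struct toy_kobject *k);   /* destructor, in OO terms */
};

/* The "object": name, reference count, parent, and type. */
struct toy_kobject {
    const char             *name;
    int                     refcount;
    struct toy_kobject     *parent;
    const struct toy_ktype *ktype;
    int                     value;      /* state behind the example attribute */
    int                     released;   /* set by the destructor */
};

void toy_kobject_get(struct toy_kobject *k) { k->refcount++; }

/* Drop a reference; call the type's destructor when none remain. */
void toy_kobject_put(struct toy_kobject *k)
{
    if (--k->refcount == 0 && k->ktype->release)
        k->ktype->release(k);
}

/* show: format the integer as decimal text, as reading the file would. */
int counter_show(struct toy_kobject *k, const struct toy_attribute *a,
                 char *buf, size_t len)
{
    (void)a;
    return snprintf(buf, len, "%d\n", k->value);
}

/* store: parse decimal text, as writing to the file would. */
int counter_store(struct toy_kobject *k, const struct toy_attribute *a,
                  const char *buf)
{
    (void)a;
    k->value = (int)strtol(buf, NULL, 10);
    return (int)strlen(buf);
}

void counter_release(struct toy_kobject *k) { k->released = 1; }

const struct toy_ktype counter_ktype = {
    .show = counter_show, .store = counter_store, .release = counter_release,
};
```

Reading or writing an attribute then amounts to calling k->ktype->show() or k->ktype->store() with the object and the attribute as arguments, which is the dispatch that sysfs performs on behalf of user space.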

sysfs_example.c (available electronically; see "Resource Center," page 3), a complete example of a kernel module, shows how to create and register kobject variables to expose three attributes to user space—a string and two integers, the first read and written as a decimal number and the second as a hexadecimal one. Example 2 is an example of interaction with that module.

Removed Features

Some features of 2.4 have been removed. For instance, the system call table is no longer exported. The system call table (declared as int sys_call_table[];) is a vector containing, for each system call, a pointer to the routine to be invoked to carry out that call. In 2.4 kernels, this table was visible to (and, more important, writable by) any module. Any module could easily replace the implementation of any system call with a custom version, within a matter of three lines of code. Apart from possible race-condition issues (on SMP systems, a system call could be replaced while in use by an application on another processor), this implied putting an incredible amount of power (the ultimate heart of the OS) in the hands of any external module. If you're not convinced about the relevance of this danger, look into how easy it was to write and inject malicious code in the form of modules that replace sys_call_table entries. Implementing rootkits is possible in no more than 30 lines of code (see http://www.insecure.org/).

The concern relates not only to malicious modules, but also to proprietary modules provided in binary form only, for which it is hard to tell exactly what they may do. The issue was eradicated in kernel 2.6: The system call table can only be modified by code built into the kernel, whose source is therefore available.

DDJ

Listing One

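#include <linux/module.h>
#include <linux/config.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");

/* Called at module insertion time, or at boot time if built in. */
static int __init minimal_init(void)
{
  return 0;  /* 0 signals successful initialization */
}

/* Called at module removal time; never called if built in. */
static void __exit minimal_cleanup(void)
{
}

module_init(minimal_init);
module_exit(minimal_cleanup);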


Listing Two

obj-m   := your_module.o
KDIR    := /usr/src/linux-$(shell uname -r)
PWD     := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
install: default
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules_install
clean:
	rm -rf *.o *.ko .*.cmd *.mod.c .tmp_versions *~ core

