
LDAP Search Filters

By Marcelo A.F. Calbucci, May 01, 2000

The Lightweight Directory Access Protocol is a transport mechanism for Directory Service transactions. Marcelo focuses on the search filter that's part of LDAP search functionality.


A C++ class makes it easy

Marcelo is a developer on the Microsoft Exchange Server. He can be contacted at [email protected].

Many protocols run over the basic frame of the Internet, including HTTP/HTML, FTP, and SMTP. In this article, I'll focus on yet another protocol -- the Lightweight Directory Access Protocol (LDAP). LDAP is a transport mechanism for Directory Service transactions. Consequently, LDAP has a set of API functions to handle such operations as search, add, modify, delete, and the like. These APIs are available from various vendors.

LDAP has been examined in several DDJ articles, including "Examining Microsoft's LDAP API," by Sven B. Schreiber (December, 1998); "Understanding LDAP," by Basit Hussain (March, 1999); and "Examining PerLDAP," by Troy Neeriemer (April 1999). Here, I will focus on the search filter that is part of LDAP search functionality. This is the most relevant parameter when performing a directory search -- and searches are the most important operation you're going to do in a directory. In the process, I'll present CLdapFilter, a C++ class to handle LDAP search filters (available electronically; see "Resource Center," page 5).

The Search Filter

An LDAP search filter is a string containing one or more primitive conditions to retrieve objects from a directory. (For our purposes here, a "primitive condition" is one that tests a single type with a value.) Its syntax (see Table 1) defines a set of primitive conditions using and/or/not operations stored in a prefix notation. In other words, the LDAP search filter is related to directories in the same way that, say, SQL is to databases.

A simple query for all objects in a directory might be ldap_search_s(ld, "", LDAP_SCOPE_SUBTREE, "(objectClass=*)", NULL, 0, &result);. The filter here is "(objectClass=*)" -- which can be translated to "I want all entries where the objectClass attribute is present." In this case, it means all objects, because any object present in the directory has to have the objectClass attribute set.

Programming Issues

LDAP search functions use a simple string to specify the filter being used. If you want a filter that gets all entries that have givenName equal to "Homer" or "Marge," an age of 32 or greater, and an e-mail address, your search filter might be "(&(|(givenName=Homer)(givenName=Marge))(age>=32)(mail=*))". Because programmers are used to infix notation, switching to this prefix notation often leads to erroneous filters.

Another issue that arises involves brackets. A search filter that contains several subfilters and logical operations makes you spend time figuring out if the brackets are right. Can you find the error in the filter "(&(|(sn=Simpson)(age>=32))(!(|(givenName=Marge)(givenName=Lisa)))"? Give up? A close bracket is missing in the last position. And if you give this string as a parameter to ldap_search, the only thing you'll get as a reply is the error message LDAP_FILTER_ERROR. I've made this mistake when creating complex filters. In the middle of a large piece of code, you can waste time discovering that the problem is in the ldap_search call, and even more time finding out what is wrong with the filter. Of course, your compiler is not going to parse the string and tell you at compile time that you've got an error. You have to wait until run time to find out -- and you have to hope you find out before your customer does.

A Solution

I initially set out to build a class that would let me construct filters in infix notation, using C++ operator overloading. For the only member variable, I naively thought a wstring (the STL basic_string for Unicode) would work. When I overloaded operator&&, operator||, and operator!, my new class would put the brackets and operators in place. Therefore, instead of the previous filter, I could write:

CLdapFilter Filter;

Filter = (F("sn=Simpson") || F("age>=32")) && !(F("givenName=Marge") || F("givenName=Lisa"));

The F is just a #define of CLdapFilter, so you can write shorter lines. If nothing else, the compiler will complain if you make a bracket error in an expression like this. That is better than a link-time or run-time error.
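The string-only first design can be sketched in a few lines. This is an illustration, not the article's CLdapFilter: the names NaiveFilter and Item are hypothetical, and the sketch does no parsing or validation at all.

```cpp
#include <string>

// Hypothetical stand-in for the string-only design: the single member
// always holds a complete, bracketed filter in prefix notation.
struct NaiveFilter {
    std::wstring text;
};

// Wrap one primitive condition, e.g. L"sn=Simpson" -> L"(sn=Simpson)".
inline NaiveFilter Item(const std::wstring& condition) {
    return NaiveFilter{L"(" + condition + L")"};
}

// Infix "and": emits the prefix form (&<lhs><rhs>).
inline NaiveFilter operator&&(const NaiveFilter& lhs, const NaiveFilter& rhs) {
    return NaiveFilter{L"(&" + lhs.text + rhs.text + L")"};
}

// Infix "or": emits the prefix form (|<lhs><rhs>).
inline NaiveFilter operator||(const NaiveFilter& lhs, const NaiveFilter& rhs) {
    return NaiveFilter{L"(|" + lhs.text + rhs.text + L")"};
}

// "Not": emits the prefix form (!<operand>).
inline NaiveFilter operator!(const NaiveFilter& operand) {
    return NaiveFilter{L"(!" + operand.text + L")"};
}
```

Because the brackets are generated by the operators, an unbalanced expression is caught by the compiler instead of by the directory server. Note that overloaded && and || evaluate both operands, which is what you want when building a filter.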

This approach is slower than using a simple string. But filter construction isn't a performance concern, at least when it comes to writing directory client programs. The time you spend building filters will be less than the time the directory server takes to reply. In fact, you can probably construct hundreds of filters in the time it takes your server to reply to a single search with a single entry as a result. Better yet, client programs can construct one filter and use it several times.

The class for search filters helps when writing filters because it improves the readability of the code and prevents mismatched brackets. But you still have a problem, particularly if users pass a more complex filter when creating a CLdapFilter object, as in:

CLdapFilter Filter("(&(sn=Simpson)(givenName=Homer))");

There are two ways you can avoid making mistakes when building this CLdapFilter. The first is to prohibit users from creating filters that have one or more brackets. This implies you should only use primitive comparisons -- something that's not a good solution, since you can't reuse filters from strings, only from primitive comparisons or from another CLdapFilter. The second solution -- and the approach I take -- is to build a parser that analyzes the filter being provided as a string.

Final Version

With this in mind, my final filter design consists of a class that is not only a string, but an entire structure that holds a complete filter. In addition, for each element it inserts in this structure, the code parses the string to be sure it's a syntax-compliant filter.

As an aside, I used Unicode as my base because I think most actual implementations of the LDAP APIs accept Unicode strings as parameters. Also, the LDAP protocol itself does not carry wide-character Unicode on the wire, so your API is responsible for converting from Unicode to UTF-8.

The class that you talk to is CLdapFilter, which provides several useful interfaces to create, modify, and read your filter. Internally, however, I use a helper class called CFilterTree, which contains a set of subfilters and the operation you are performing on those subfilters. I also had to add another type of operation that is not an and/or/not; I call this an "item." The operation item means you are not handling subfilters, but a simple primitive comparison (as in "sn=Simpson"). To store this item, I added a wstring that is mutually exclusive with the subfilters: If the operation type is item, the information is stored in the string; if the operation type is not item, the information is stored in the subfilters. The subfilter list is an STL vector of CFilterTree. You can see the member variables in Listing One.

The first thing you should have noticed is that I'm storing an entire CFilterTree in my vector container, not a pointer to it. There are several reasons for this. First, to make the code more readable and easy to maintain, I like to avoid pointers. Second, when I profiled both versions (pointer versus class), storing whole objects carried no performance penalty. This is primarily because filters are usually small, don't have hundreds of subfilters, and you perform more read operations than writes. Pointers, on the other hand, add the cost of dereferencing them. Figure 1 illustrates CFilterTree.
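To make the tree layout concrete, here is a minimal sketch of a by-value filter tree and a function that turns it back into the prefix string. The names FilterNode, MakeItem, MakeOp, and Render are hypothetical; only the three data members mirror Listing One, and the article's own string generation is not shown, so this is one plausible way to do it.

```cpp
#include <string>
#include <utility>
#include <vector>

enum class OpType { Null, Item, And, Or, Not };

// Hypothetical node modeled on Listing One: subfilters are stored by value.
struct FilterNode {
    OpType op = OpType::Null;
    std::vector<FilterNode> subfilters;  // used when op is And, Or, or Not
    std::wstring item;                   // used when op is Item
};

// Build a leaf holding one primitive comparison such as L"sn=Simpson".
FilterNode MakeItem(const std::wstring& condition) {
    FilterNode n;
    n.op = OpType::Item;
    n.item = condition;
    return n;
}

// Build an inner node from an operation and its subfilters.
FilterNode MakeOp(OpType op, std::vector<FilterNode> subs) {
    FilterNode n;
    n.op = op;
    n.subfilters = std::move(subs);
    return n;
}

// Walk the tree recursively and emit bracketed prefix notation.
std::wstring Render(const FilterNode& n) {
    if (n.op == OpType::Null) return L"";                  // empty filter
    if (n.op == OpType::Item) return L"(" + n.item + L")";
    std::wstring out = L"(";
    out += (n.op == OpType::And) ? L'&' : (n.op == OpType::Or) ? L'|' : L'!';
    for (const FilterNode& sub : n.subfilters) out += Render(sub);
    out += L")";
    return out;
}
```

Storing nodes by value means copying a filter copies the whole tree, which is acceptable for the small filters described above.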

My CLdapFilter class is made of one CFilterTree, which can be made up of several CFilterTree and/or items (primitive comparisons). Listing Two is a snapshot of CLdapFilter. This way of storing information is slower than storing the entire filter in a simple string, but is more flexible. The parser makes this class even slower, but has the benefit of never having an invalid filter.

There are two approaches to error notification when constructing an object of a class in C++. The first is to set an internal member variable (or some static one) with some error code, and pray that the programmer using the class will check this variable before using the object. The second is to throw an exception in the constructor.

I used both in my implementation, depending on whether you compile with LDAPFILTER_SUPPORT_EXCEPTION defined. This is done in filter.h (available electronically). If you compile without this #define, you will have two more member variables that store the error code. You can then use GetLastError() or GetLastErrorString() to return text with the error message. I also have a variable that stores the position in the string at which the parser failed; it is retrieved using GetLastErrorPos(). You also have an additional function called ClearLastError().

If you compile with that #define, you won't have these member functions or variables. Instead, the class throws an exception of type CFilterException if the parser fails. The exception class has two main member functions: GetError(), which returns a string with the error message; and GetPos(), which returns the position in the string at which the error occurred.

If I need to set an error in the parser code, I call SetLastError(), which is responsible for setting the error variables in the class; otherwise, I throw an exception.
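The compile-time switch between the two reporting styles might look like the following sketch. Only the macro name LDAPFILTER_SUPPORT_EXCEPTION comes from the article; SketchFilter, SketchFilterException, Fail(), and the trivial "must contain '='" check are hypothetical stand-ins, and the real signatures in filter.h may differ.

```cpp
#include <cstddef>
#include <string>

// Uncomment to switch the sketch to exception-based reporting.
// #define LDAPFILTER_SUPPORT_EXCEPTION

#ifdef LDAPFILTER_SUPPORT_EXCEPTION
// Hypothetical exception type carrying a message and a position.
class SketchFilterException {
public:
    SketchFilterException(const std::string& msg, std::size_t pos)
        : msg_(msg), pos_(pos) {}
    const std::string& GetError() const { return msg_; }
    std::size_t GetPos() const { return pos_; }
private:
    std::string msg_;
    std::size_t pos_;
};
#endif

class SketchFilter {
public:
    // Placeholder validation: a primitive condition must contain '='.
    explicit SketchFilter(const std::wstring& condition) {
        if (condition.find(L'=') == std::wstring::npos)
            Fail("missing '=' in condition", condition.size());
        else
            text_ = L"(" + condition + L")";
    }
    const std::wstring& Text() const { return text_; }

#ifndef LDAPFILTER_SUPPORT_EXCEPTION
    // Error-variable style: the caller has to remember to check these.
    bool HasError() const { return !error_.empty(); }
    const std::string& LastErrorString() const { return error_; }
    std::size_t LastErrorPos() const { return errorPos_; }
    void ClearLastError() { error_.clear(); errorPos_ = 0; }
#endif

private:
    // Single reporting point: throws or records, depending on the macro.
    void Fail(const std::string& msg, std::size_t pos) {
#ifdef LDAPFILTER_SUPPORT_EXCEPTION
        throw SketchFilterException(msg, pos);
#else
        error_ = msg;
        errorPos_ = pos;
#endif
    }

    std::wstring text_;
#ifndef LDAPFILTER_SUPPORT_EXCEPTION
    std::string error_;
    std::size_t errorPos_ = 0;
#endif
};
```

Routing every failure through one private function keeps the parser code identical in both builds; only the body of that function changes.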

Operators

operator&& is not a member of the class, but a global function. This is necessary because the operator must not change its lhs or rhs operands; instead, it returns a new CLdapFilter.

If you want to construct a CLdapFilter with the string:

"(&(objectClass=person)(sn=Simpson))"

you execute:

CLdapFilter Result = F("objectClass=person") && F("sn=Simpson");

First, the F("objectClass=person") constructs a temporary instance of CLdapFilter. The code parses the string; if it is okay, it sets the operation type of this class to OpTypeItem and sets the m_wsItem to the string passed to the constructor. The same thing for F("sn=Simpson").

After constructing both temporary CLdapFilter instances, you call the operator&& with these two objects. The first thing that operator&& does is check that the rhs and lhs values are valid. If the construction of one of these failed, you have a member variable saying what was the last error. In this case, you propagate the error to the result and finish.

I decided that if one of the values passed to the function is empty, the result is going to be the other one, or empty if both were empty. This sounds odd, but what do "X &&" or "&& Y" mean? I decided they mean "X" and "Y," respectively. So, if you are writing code and users decide to enter wsFilter1 but leave wsFilter2 empty, you still have a good filter in Result.

If none of the values passed to operator&& were empty, you have two real subfilters to be transformed into a single and operation. But, these subfilters could also be and operations. For example, the filter:

"(&(&(a=b)(c=d))(&(e=f)(g=h)))"

has the same semantics as:

"(&(a=b)(c=d)(e=f)(g=h))"

If the operation in one of the parameters is also an and, you do not add this parameter to the subfilter list. Instead, you add the subfilters of the parameter to the subfilter list of the result.
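The empty-operand rule and the flattening rule can be sketched together. FilterNode, MakeItem, AppendOperand, and Combine are hypothetical names, and the error propagation described earlier is left out to keep the example short.

```cpp
#include <string>
#include <vector>

enum class OpType { Null, Item, And, Or, Not };

// Hypothetical by-value tree node; Null marks an empty filter.
struct FilterNode {
    OpType op = OpType::Null;
    std::vector<FilterNode> subfilters;
    std::wstring item;
};

FilterNode MakeItem(const std::wstring& condition) {
    FilterNode n;
    n.op = OpType::Item;
    n.item = condition;
    return n;
}

// Add one operand to result. If the operand performs the same operation
// as result, splice in its subfilters instead of nesting it.
static void AppendOperand(FilterNode& result, const FilterNode& operand) {
    if (operand.op == result.op) {
        result.subfilters.insert(result.subfilters.end(),
                                 operand.subfilters.begin(),
                                 operand.subfilters.end());
    } else {
        result.subfilters.push_back(operand);
    }
}

// Shared logic for "and" and "or".
FilterNode Combine(OpType op, const FilterNode& lhs, const FilterNode& rhs) {
    if (lhs.op == OpType::Null) return rhs;  // "&& Y" means Y
    if (rhs.op == OpType::Null) return lhs;  // "X &&" means X
    FilterNode result;
    result.op = op;
    AppendOperand(result, lhs);
    AppendOperand(result, rhs);
    return result;
}

// Global operators: neither operand is modified, a new node is returned.
FilterNode operator&&(const FilterNode& lhs, const FilterNode& rhs) {
    return Combine(OpType::And, lhs, rhs);
}
FilterNode operator||(const FilterNode& lhs, const FilterNode& rhs) {
    return Combine(OpType::Or, lhs, rhs);
}
```

With these definitions, (a && c) && (e && g) yields a single and node with four subfilters, while (a || c) && e keeps the or node nested because the operations differ.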

The mechanism is identical for operator||. But operator! has a peculiarity. If you negate something that was already negated, you are reaffirming it. That is:

"(!(!(a=b)))" == "(a=b)"

So you put this logic into the operator!.
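A sketch of the double-negation rule follows, using the same hypothetical FilterNode layout as a stand-in for CFilterTree. Returning an empty filter unchanged when it is negated is an assumption of this sketch; the article does not say how operator! treats an empty operand.

```cpp
#include <string>
#include <vector>

enum class OpType { Null, Item, And, Or, Not };

// Hypothetical by-value tree node; a Not node holds exactly one subfilter.
struct FilterNode {
    OpType op = OpType::Null;
    std::vector<FilterNode> subfilters;
    std::wstring item;
};

FilterNode MakeItem(const std::wstring& condition) {
    FilterNode n;
    n.op = OpType::Item;
    n.item = condition;
    return n;
}

FilterNode operator!(const FilterNode& operand) {
    // Assumption: negating an empty filter leaves it empty.
    if (operand.op == OpType::Null) return operand;

    // Negating a negation: unwrap and return the inner filter.
    if (operand.op == OpType::Not && operand.subfilters.size() == 1)
        return operand.subfilters.front();

    // Otherwise wrap the operand in a new Not node.
    FilterNode result;
    result.op = OpType::Not;
    result.subfilters.push_back(operand);
    return result;
}
```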

To improve readability and maintainability, I implemented all the other operators and member functions (those that take a wstring or wchar_t * as a parameter) simply by constructing a CLdapFilter from the argument and calling the operator that takes a CLdapFilter.

Because LDAP filters don't support empty filters, it should be illegal to pass empty strings to the constructor or operator=. But I decided not to do this, because you can improve the programmability when you don't have to check every parameter that you pass to CLdapFilter class.

The Parser

After you call a constructor using wchar_t * or wstring (or use operator=), you call ParseString(). The entire parser subsystem is written in six functions; see Table 2.

First of all, LdapEscapable() is problematic. Depending on the implementation of the LDAP server you are running, other characters may also need to be escaped. So if you need to change something in CLdapFilter, this is the first function you should examine.
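The article does not show what LdapEscapable() tests, so the following is only a sketch of the baseline case. It assumes the character set named in the standard string representation of LDAP search filters (RFC 2254): *, (, ), \, and NUL, each written as a backslash followed by two hexadecimal digits. The function names NeedsEscape and EscapeValue are hypothetical.

```cpp
#include <string>

// Characters that must be escaped inside an attribute value according to
// RFC 2254. A particular server may require more, as noted above.
bool NeedsEscape(wchar_t c) {
    return c == L'*' || c == L'(' || c == L')' || c == L'\\' || c == L'\0';
}

// Replace each special character with a backslash and its two-digit
// hexadecimal code, e.g. '(' becomes "\28".
std::wstring EscapeValue(const std::wstring& value) {
    static const wchar_t kHex[] = L"0123456789abcdef";
    std::wstring out;
    for (wchar_t c : value) {
        if (NeedsEscape(c)) {
            out += L'\\';
            out += kHex[(c >> 4) & 0xF];
            out += kHex[c & 0xF];
        } else {
            out += c;
        }
    }
    return out;
}
```

Escaping the value before it is placed in a primitive condition keeps user-supplied brackets from being mistaken for filter structure.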

The ValidateAttribute*() functions are simple and not fully compliant with the LDAP standard. For example, I don't check whether the type of the attribute uses only valid characters. I only check that it is not empty and does not contain spaces. The attribute value can be anything except empty.

ParseCondition() parses a simple primitive condition. It can be written without the brackets (as in "sn=Simpson") or with any number of brackets ("(((sn=Simpson)))"). The first form is not standard LDAP syntax, but it is useful. This function also extracts the type and value of the attribute and calls the validate functions.
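A minimal sketch of that behavior is shown below: strip any enclosing brackets, split the condition at its comparison operator, and apply the two lenient validation rules. The names Condition, ValidType, ValidValue, and ParsePrimitive are hypothetical, and the sketch ignores escaped characters inside the value.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result of parsing one primitive condition.
struct Condition {
    std::wstring type;   // attribute type, e.g. L"sn"
    std::wstring op;     // L"=", L">=", L"<=", or L"~="
    std::wstring value;  // attribute value, e.g. L"Simpson"
};

// Lenient checks as described: the type is non-empty with no spaces,
// and the value is anything except empty.
bool ValidType(const std::wstring& type) {
    return !type.empty() && type.find(L' ') == std::wstring::npos;
}
bool ValidValue(const std::wstring& value) { return !value.empty(); }

// Accepts "sn=Simpson" as well as "(((sn=Simpson)))".
bool ParsePrimitive(std::wstring text, Condition& out) {
    // Peel off any number of enclosing bracket pairs.
    while (text.size() >= 2 && text.front() == L'(' && text.back() == L')')
        text = text.substr(1, text.size() - 2);

    const std::size_t eq = text.find(L'=');
    if (eq == std::wstring::npos) return false;

    // A '>', '<', or '~' directly before '=' belongs to the operator.
    std::size_t typeEnd = eq;
    if (eq > 0 && (text[eq - 1] == L'>' || text[eq - 1] == L'<' ||
                   text[eq - 1] == L'~'))
        typeEnd = eq - 1;

    out.type = text.substr(0, typeEnd);
    out.op = text.substr(typeEnd, eq + 1 - typeEnd);
    out.value = text.substr(eq + 1);
    return ValidType(out.type) && ValidValue(out.value);
}
```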

Finally, you spend most of your time in FullParser(). This function uses a simple mechanism to break full conditions into small ones, and calls itself recursively until it finds out that the condition is a primitive condition. It then calls the ParseCondition().

The algorithm that parses an entire string is straightforward. You read each character in the string in turn, ignoring spaces. If the character is an open bracket, you increment the bracket counter; if it is a close bracket, you decrement it. If the counter returns to zero after the first open bracket, that character should be the last one in the filter; anything following it is a syntax error. If the counter goes below zero, you have a bracket mismatch, as in (sn=3)). At the end of parsing, if the bracket counter is not zero, you have more open than close brackets.
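The counting scheme alone can be sketched as a small function. It covers only the bracket bookkeeping, not the rest of FullParser(); the names BracketCheck and CheckBrackets are hypothetical.

```cpp
#include <cstddef>
#include <string>

enum class BracketCheck { Ok, TrailingText, TooManyClose, TooManyOpen };

// Scan the filter once. On failure, pos is left at the offending character
// (or at the end of the string when open brackets were never closed).
BracketCheck CheckBrackets(const std::wstring& s, std::size_t& pos) {
    int depth = 0;        // open brackets not yet closed
    bool closed = false;  // true once the outermost bracket has closed
    for (pos = 0; pos < s.size(); ++pos) {
        const wchar_t c = s[pos];
        if (c == L' ') continue;  // spaces are ignored
        if (c == L')') {
            if (--depth < 0) return BracketCheck::TooManyClose;
            if (depth == 0) closed = true;
            continue;
        }
        // Anything after the outermost close bracket is a syntax error.
        if (closed) return BracketCheck::TrailingText;
        if (c == L'(') ++depth;
    }
    return depth == 0 ? BracketCheck::Ok : BracketCheck::TooManyOpen;
}
```

Run against the faulty filter from the "Programming Issues" section, this reports TooManyOpen, which is the missing close bracket in the last position.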

If the character being parsed is a &, |, or ! (and is in between two open brackets), you enter operation mode. This means that you have a complex filter, and should start parsing each following pair of open/close brackets as a subfilter. At this time, you call FullParser() recursively.

I do a lot of other checks in FullParser() to see whether the character sequence is valid. For example, after a close bracket, if there is any character other than a bracket, then you have a syntax error.
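For comparison, here is a conventional recursive-descent validator for the same prefix syntax. It is not the article's FullParser(), which counts brackets and splits the string into subfilters; it is a swapped-in technique that reaches the same yes/no answer on simple input. The names ParseFilter and IsValidFilter are hypothetical, and the sketch assumes no spaces between brackets and no escaped characters in values.

```cpp
#include <cstddef>
#include <string>

// Returns true if s, starting at pos, holds one well-formed bracketed
// filter, and advances pos just past its close bracket.
bool ParseFilter(const std::wstring& s, std::size_t& pos) {
    if (pos >= s.size() || s[pos] != L'(') return false;
    ++pos;

    if (pos < s.size() && (s[pos] == L'&' || s[pos] == L'|' || s[pos] == L'!')) {
        // Operation mode: every following bracket pair is a subfilter.
        const wchar_t op = s[pos++];
        std::size_t count = 0;
        while (pos < s.size() && s[pos] == L'(') {
            if (!ParseFilter(s, pos)) return false;  // recurse
            ++count;
        }
        // Need at least one subfilter; "not" takes exactly one.
        if (count == 0 || (op == L'!' && count != 1)) return false;
    } else {
        // Primitive condition: read up to the next bracket.
        const std::size_t start = pos;
        while (pos < s.size() && s[pos] != L'(' && s[pos] != L')') ++pos;
        const std::wstring cond = s.substr(start, pos - start);
        const std::size_t eq = cond.find(L'=');
        // Require a non-empty type before '=' and a non-empty value after.
        if (eq == std::wstring::npos || eq == 0 || eq + 1 == cond.size())
            return false;
    }

    if (pos >= s.size() || s[pos] != L')') return false;
    ++pos;
    return true;
}

// A filter is valid when one complete filter spans the whole string.
bool IsValidFilter(const std::wstring& s) {
    std::size_t pos = 0;
    return ParseFilter(s, pos) && pos == s.size();
}
```

Unlike ParseCondition(), this sketch rejects an unbracketed condition such as "sn=Simpson"; accepting it would take one extra branch in IsValidFilter().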

The last function of the parser mechanism of CLdapFilter is ParseString(). Its only purpose is to call FullParser(). This function could be dropped and you could call FullParser() directly, but I first check whether the string is empty, so that FullParser() never receives an empty string. Example 2 is a call stack of a ParseString() function.

I could have written all the parser functions as static members of CLdapFilter, but that creates a problem with the error variables, which static functions cannot reach. A solution is to make the error variables static as well, as some APIs do.

Conclusion

Is CLdapFilter thread safe? It depends. The real question is: Is the STL implementation you are using thread safe? If the answer is yes, then CLdapFilter is thread safe. Here, I use the classes with the Microsoft STL implementation that came with Visual C++ 6.0, which is not thread safe. Therefore, I cannot share CLdapFilter instances between threads.

One optimization that can be done in CLdapFilter is reference counting. Every time you use the copy constructor or the assignment operator, you copy all member variables. For a medium-size filter string, this equates to something between 150 and 300 bytes. If you copy this filter several times, a performance penalty can result. Reference counting is platform dependent, because you have to implement a mechanism that makes it thread safe. In Windows, you can use the SDK API functions InterlockedIncrement() and InterlockedDecrement(). But remember, Microsoft's STL is not thread safe, so you still have some issues to resolve here.
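As a rough illustration of the idea, the sketch below shares one filter string between copies and detaches on write. It substitutes std::shared_ptr from later C++ standards for a hand-written counter built on InterlockedIncrement() and InterlockedDecrement(); the class name SharedFilterText and its members are hypothetical, and the use_count() test is only reliable when a single thread modifies a given object.

```cpp
#include <memory>
#include <string>

// Copies share one string; the shared_ptr keeps the reference count.
class SharedFilterText {
public:
    explicit SharedFilterText(const std::wstring& text)
        : data_(std::make_shared<std::wstring>(text)) {}

    // The compiler-generated copy constructor and assignment operator
    // copy only the pointer and bump the count, not the string.

    const std::wstring& Get() const { return *data_; }

    // Copy-on-write: detach from other owners before changing the text.
    void Set(const std::wstring& text) {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::wstring>(text);
        else
            *data_ = text;
    }

    // True when both objects still point at the same string.
    bool SharesWith(const SharedFilterText& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::wstring> data_;
};
```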

Even though this class was originally written in C++, it is simple to port it to any other language that supports OOP and operator overloading. In a language without operator overloading, such as Java, the operators would have to become ordinary methods. This class is also platform independent, so it should work on most operating systems and processors.

CLdapFilter is a nice class if you are working on an LDAP client program and need to construct several filters. However, there is little or no gain in using this class if you have just a few static filters in the code.

DDJ

Listing One

struct CFilterTree {
   enum eOpType {OpTypeNull, OpTypeItem, OpTypeAnd, OpTypeOr, OpTypeNot};
   // . . . member functions
   eOpType m_eOpType;
   vector<CFilterTree> m_SubFilters;
   wstring m_wsItem;
};


Listing Two

class CLdapFilter {
public:
   // ...
protected:
   // ...
private:
  CFilterTree m_FilterTree;
  wstring m_wsGeneratedString;
  bool m_fModified;
  // ... other members
};

