C++ Locales


Dr. Dobb's Journal August 1998: C++ Locales

Nathan developed the facilities found in chapter 22 of the Draft C++ Standard mainly so he would be able to write portable C++ programs without bothering about locales. He can be contacted at http://www.cantrip.org/.


To be usable, your program must use the right alphabet, present menus and dates in the right language, and sort lists in the correct order. When user preferences match your own, this is easy. But software travels, and your software will have to adapt.

Where your program is running and how that relates to user needs is referred to as "locale." Keeping the locale separate from the program code is called "internationalization" -- something the Standard C++ library supports well.

Challenges

In supporting internationalization, the C++ Standard Library confronts many challenges. Your server programs may have many clients around the world, so the standard library must support more than one locale per program. You use (or will soon use) multithreading, so it must be reentrant. The ways your users' cultures and preferences differ are unlimited, so it must be extensible. Because you may not (yet) care about locales, it must be ignorable. Finally, because it is standard, it must be efficient, easy, and safe to use.

The Standard C++ library uses the full power of the language to fulfill these requirements, including new standard features not yet implemented in all compilers. (This article suggests how the C++ locale library may be used. It also shows how it might be implemented, so you can use the same techniques in your own programs.)

The locale Object

The Standard C (not C++) library's locale describes character encoding and the formats for a few value types: numbers, dates, money. These represent the edge of a vast continent of cultural and personal preferences. There are also time zones (including rules for summertime adjustments), measurements (feet, meters, cubits), paper sizes, window colors and fonts, citizenship, employer, e-mail addresses, sex, and shoe size. Of course, the C++ Standard cannot represent everything. Instead, it only implements (as examples) those categories found in the traditional C locale libraries.

The key to understanding the C++ locale is the facet -- a class interface for a service that may be obtained from a locale object. For example, the Standard C++ library facet num_put<> formats numeric values, while the collate<> facet provides ordering for string values. A facet is also an object contained in a locale. Each locale object contains a set of facets to provide these services.
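As a small illustration of a facet at work, the sketch below sorts a list of names in the order a locale's collate<> facet prescribes. It relies on the fact that a std::locale object can itself be used as a comparison function for strings (its operator() consults collate<>). The helper name sort_names is an assumption made for this example, not something from the library.

```cpp
#include <algorithm>
#include <cassert>   // only needed if you add quick checks of your own
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: return the names sorted in the order the given
// locale prescribes. std::locale::operator() compares two strings with
// the locale's collate<char> facet, so the locale can be handed to
// std::sort directly as the comparator.
std::vector<std::string> sort_names(std::vector<std::string> names,
                                    std::locale const& loc = std::locale())
{
    std::sort(names.begin(), names.end(), loc);
    return names;
}
```

In the classic "C" locale the comparison is by character code, so "Zebra" sorts before "apple"; a locale for a particular language would typically order them differently.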

The C++ locale is a simple object that can be passed around, copied, and assigned efficiently, just like any built-in value. If you don't care about the details of how to use it, you can pass it on to someone else who does. For example, functions that take a locale argument can declare a default argument value, locale(), which is a copy of the current global locale. Each iostream keeps a locale object on hand for use by the operators >> and <<. These measures give the locale facilities a low profile, so they won't intrude where their more-powerful features are not needed.
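The following sketch shows both ideas from the paragraph above: a function that takes a locale with a default argument of locale(), and a stream that keeps its own copy of whatever locale it is given. The function names format_number and stream_keeps_locale are made up for the example.

```cpp
#include <cassert>   // only needed if you add quick checks of your own
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format a number under a given locale. Callers who
// do not care about locales omit the second argument and get a copy of
// the current global locale.
std::string format_number(double value, std::locale const& loc = std::locale())
{
    std::ostringstream out;
    out.imbue(loc);   // the stream stores its own copy of the locale
    out << value;     // operator<< consults that copy
    return out.str();
}

// Hypothetical helper: report whether a stream hands back the same
// locale it was imbued with.
bool stream_keeps_locale(std::locale const& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    return out.getloc() == loc;
}
```

Because each stream holds its own locale, one program can write to several streams in several locales at once without touching any global state.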

A Date Class Example

Imagine you have a simple Date class, as in Listing One. (Don't let the std:: namespace notation throw you; all the library components of Standard C++ are in the namespace "std." The standard headers iosfwd and ctime declare the standard names used in the example, and the "::" notation gives access to those names.)

This Date class provides month names, a constructor, stream operators, and a member asCLibTime that converts to a Standard C library struct tm (look up strftime() in your C manual) to help communicate with other libraries.

The formats users expect to see for dates vary. If you coded a format right into operators >> and <<, many users would be dissatisfied. Instead, you can delegate to the locale object kept by the stream, as shown in Listing Three (which will be explained shortly).

An Example Program

How can users control the format of dates produced by an operator<< that uses locales? Listing Four may be the simplest example possible.

The constructor call locale("") constructs a locale object representing the user's preferred formats. On many systems, the empty string tells the library to substitute whatever is in an environment variable (often LANG or LC_ALL). The common name for the American locale, for example, is "en_US." Thus, you (as a user) can choose the output format by setting LC_ALL before running the program (on POSIX systems locale -a lists the names of supported locales).

The call to cout.imbue() installs the newly constructed locale in cout for use by the various << operators. The next line uses the definition from Listing Three, which in turn uses the time_put<> facet of the newly imbued locale.
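One practical wrinkle: constructing a locale from a name throws std::runtime_error when the system cannot supply that locale, and that can happen with locale("") if the environment names something that is not installed. A cautious program might fall back to the classic "C" locale, roughly as sketched here. The helper name locale_or_classic is an assumption for this example.

```cpp
#include <cassert>   // only needed if you add quick checks of your own
#include <locale>
#include <stdexcept>

// Hypothetical helper: construct the named locale, or fall back to the
// classic "C" locale if the system does not recognize the name.
// Pass "" to ask for the user's preferred locale.
std::locale locale_or_classic(char const* name)
{
    try {
        return std::locale(name);
    } catch (std::runtime_error const&) {
        return std::locale::classic();
    }
}
```

With this helper, the line in main() could read cout.imbue(locale_or_classic("")); and the program would still run on a machine with a misconfigured environment.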

Using Facets

To use a facet of a locale, you call the Standard C++ library global function template use_facet<>(). Listing Two shows the declaration of use_facet<>(), which is found in the standard header locale.

For a facet class Stats with an int member shoesize(), for example, and a locale object named loc, a call would be int ss = use_facet<Stats>(loc).shoesize();. This syntax for calling a function template, supplying the template parameter explicitly, is not yet implemented everywhere. It's called "explicit template function qualification," and it resembles the syntax for new cast expressions, such as dynamic_cast<>. In effect, use_facet<>() is like a safe cast. In the aforementioned example, the resulting reference is used immediately to call the member function Stats::shoesize().
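The same call syntax works for the standard facets. As a sketch, the two helpers below (their names are invented for this example) ask a locale for its numpunct<char> and ctype<char> facets and immediately call a member on the returned reference.

```cpp
#include <cassert>   // only needed if you add quick checks of your own
#include <locale>

// Hypothetical helper: which character does this locale use as the
// decimal point?
char decimal_point_of(std::locale const& loc)
{
    return std::use_facet<std::numpunct<char> >(loc).decimal_point();
}

// Hypothetical helper: is c an uppercase letter according to this
// locale's ctype<char> facet?
bool is_upper_in(char c, std::locale const& loc)
{
    return std::use_facet<std::ctype<char> >(loc).is(std::ctype_base::upper, c);
}
```

The reference returned by use_facet<>() remains valid at least as long as some copy of the locale exists, so using it immediately, as here, is always safe.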

An Example operator<<

Listing Three shows what the implementation of operator<< for Date might look like, using the real Standard C++ library facet time_put<>.

A lot is going on here. First, the header files ctime, ostream, and locale are the new standard headers, and date.h has the declarations from Listing One. The line that begins with using lets you leave off the std:: in front of standard names later in the function. The constructor for the local variable cerberus prepares the ostream for output; in a multithreaded environment, it might lock the stream. The local variable tmbuf gets filled in with the components of the date argument.

The interesting part is in the next two lines: os.getloc() obtains the locale object kept by the ostream argument os. The call to use_facet<>() gets a reference to the facet time_put<char> of that locale. The line put(...) calls the facet member time_put<char>::put, which actually writes the characters out to the stream os.

Finally, the local variable cerberus is destroyed (perhaps unlocking the stream) right before the stream os is returned. The header "date.h" didn't mention locales, but because of this code hidden in operator<<, a couple of lines in main() let you format dates appropriately for users anywhere in the world. (Without those lines in main(), you get the default "C" locale behavior.)
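To try the technique without the rest of ChronLib, here is a self-contained sketch. SimpleDate is a hypothetical stand-in for the Date class that stores the calendar fields directly, and format_date is a made-up convenience wrapper. The sketch also checks the iterator that put() returns and sets badbit if the write failed, a detail Listing Three omits.

```cpp
#include <cassert>   // only needed if you add quick checks of your own
#include <ctime>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for ChronLib::Date: it stores year, month, and
// day directly, so no day-count arithmetic is needed.
struct SimpleDate {
    int year, month, day;               // month is 1..12
    void asCLibTime(std::tm* out) const {
        *out = std::tm();               // zero every field first
        out->tm_year = year - 1900;     // struct tm counts years from 1900
        out->tm_mon  = month - 1;       // and months from 0
        out->tm_mday = day;
    }
};

std::ostream& operator<<(std::ostream& os, SimpleDate const& date)
{
    std::ostream::sentry guard(os);
    if (!guard) return os;
    std::tm tmbuf;
    date.asCLibTime(&tmbuf);
    std::time_put<char> const& tp =
        std::use_facet<std::time_put<char> >(os.getloc());
    // 'x' asks for the locale's preferred date representation.
    if (tp.put(os, os, os.fill(), &tmbuf, 'x').failed())
        os.setstate(std::ios_base::badbit);
    return os;
}

// Hypothetical wrapper: format a date under a given locale.
std::string format_date(SimpleDate const& d, std::locale const& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << d;
    return out.str();
}
```

In the classic "C" locale, the 'x' format is equivalent to the strftime format "%m/%d/%y", so the date from Listing Four would appear as 12/07/42; other locales may reorder the fields or use different separators.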

Your Own Facet

The standard facets are designed so you can derive from them to get finer control of locale behavior. However, this derivation is not the only way you can extend a locale. You can make your own facet, and construct a locale to hold it.
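As a sketch of the first approach, deriving from a standard facet, the class below overrides two virtual members of numpunct<char> so that integers are grouped in threes with commas. The names CommaGrouping and with_commas are invented for this example.

```cpp
#include <cassert>   // only needed if you add quick checks of your own
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: group digits in threes, separated by commas.
// Only the protected virtual do_ members are overridden; everything
// else is inherited from numpunct<char>.
class CommaGrouping : public std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: format a value using the facet above.
std::string with_commas(long value)
{
    std::ostringstream out;
    // CommaGrouping inherits numpunct<char>::id, so in the new locale it
    // takes the place of the classic numpunct<char> facet.
    out.imbue(std::locale(std::locale::classic(), new CommaGrouping));
    out << value;
    return out.str();
}
```

Because the derived class shares its base's id, the stream's ordinary operator<< picks it up with no other changes to the program.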

What makes the class Stats in Listing Five a facet? It's derived from locale::facet, it has a public static member named id of type locale::id, and its member functions are const. That's all. It does not need a default constructor, copy constructor, or assignment operator, though it must be destroyable.

A facet class instance is only useful as part of a locale. Listing Six shows one way to make a facet instance part of a locale. The first line constructs a locale object here as a copy of the current global locale, with the addition of the newly created Stats facet. (In a real program, you would probably read the argument from a file.) This uses a template constructor that deduces the facet type from the pointer argument. (Support for template constructors, as for other member templates, is a recent addition to the language and is not yet implemented in all compilers.) The second line demonstrates its use, as in the earlier example. The locale library takes ownership of the facet object, so you never need to delete it, and it can't leak.
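Here is a compact, self-contained version of the same idea that you can compile and experiment with. It adds one thing the listings do not show: the standard function has_facet<>(), which lets you test for a facet instead of catching the exception from use_facet<>(). The helper shoesize_in and its fallback parameter are assumptions made for this sketch.

```cpp
#include <cassert>   // only needed if you add quick checks of your own
#include <locale>

// A user-defined facet: derived from locale::facet, with a public
// static member named id of type locale::id.
class Stats : public std::locale::facet {
public:
    static std::locale::id id;
    explicit Stats(int ss) : shoesize_(ss) {}
    int shoesize() const { return shoesize_; }
private:
    int shoesize_;
};
std::locale::id Stats::id;

// Hypothetical helper: read the shoe size from a locale, or return
// fallback if the locale carries no Stats facet.
int shoesize_in(std::locale const& loc, int fallback)
{
    if (!std::has_facet<Stats>(loc)) return fallback;
    return std::use_facet<Stats>(loc).shoesize();
}
```

A locale built as locale(locale(), new Stats(48)) reports 48 through shoesize_in(), as does every copy of it, while the classic locale yields the fallback.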

Under the Hood

How does this work? This can all be implemented in ordinary C++, and you can do the same when you need a container indexed by type. First, the locale object itself is efficient to copy and assign because it really contains only a pointer to a vector of facets, as in Listing Seven. (Only the members used in examples here are listed.) All copies share the same vector. When a locale is copied, the reference count gets incremented.

The facet base class, locale::facet (Listing Eight), is also reference counted. It has a virtual destructor so that when the count goes to zero, the locale can destroy derived class instances safely. The only tricky bit (besides getting the reference-counting code correct) is in the class locale::id (Listing Nine).

Recall that each facet type contains a static member of type locale::id. Thus, there is one static instance per facet type. The member index_ is set to zero by the loader, and remains zero until it is set to something else, regardless of when static constructors are executed.

When does the index_ member get set? Listing Ten shows the definition of the locale template constructor used in Listing Six. The constructor begins by copying the implementation vector from other and fixing up all the reference counts. Then it sets Facet::id.index_ to assign the facet an identity if it has none yet, and (if necessary) grows the new vector to fit. Finally, it installs the new facet, still being careful to keep the reference counts right. Thus, the id::index_ member is zero until it is actually used.

This template constructor can be instantiated only if the Facet parameter really qualifies as a facet in every way; otherwise, users get a compile or link error. This code, as in the other constructors shown, is not threadsafe; a threadsafe implementation would be messier, though it would do the same things.

The Function Template use_facet<>()

The template use_facet<>(), declared in Listing Two and called in several examples, is defined in Listing Eleven. If the facet has not yet been assigned an identity, or if no instance of it (or anything derived from it) is found in the argument locale, use_facet<>() throws an exception. (The test, here, is tricky: If index is at or past the end of the vector, or if the resulting pointer is zero, then the facet is not present; the pointer at offset zero is always zero.)
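If you want to borrow the technique for your own container indexed by type, the pieces fit together roughly as in the sketch below. This is not the library's implementation: every name in it (Bag, Slot, SlotId, ShoeSize) is hypothetical, std::shared_ptr stands in for the hand-written reference counts, it throws std::out_of_range where the library throws bad_cast, and, like the listings, it is not threadsafe.

```cpp
#include <cassert>   // only needed if you add quick checks of your own
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

class Bag;

// Plays the role of locale::id: one static instance per slot type.
class SlotId {
    friend class Bag;
    std::size_t index_ = 0;   // 0 means "no identity assigned yet"
};

// Plays the role of locale::facet.
struct Slot {
    virtual ~Slot() {}
};

// Plays the role of locale: a cheap-to-copy handle on a shared table.
class Bag {
public:
    Bag() : slots_(std::make_shared<Table>()) {}

    // Copy other's table and add (or replace) one slot.
    template <class S>
    Bag(Bag const& other, S* s)
        : slots_(std::make_shared<Table>(*other.slots_))
    {
        std::shared_ptr<Slot> owned(s);          // the Bag now owns s
        std::size_t& index = S::id.index_;
        if (!index) index = ++next_index();      // assign identity on first use
        if (index >= slots_->size()) slots_->resize(index + 1);
        (*slots_)[index] = owned;
    }

    // Plays the role of use_facet<>().
    template <class S>
    S const& get() const
    {
        std::size_t index = S::id.index_;
        // Entry 0 is never filled, so an unassigned id always fails here.
        if (index >= slots_->size() || !(*slots_)[index])
            throw std::out_of_range("slot not present");
        return static_cast<S const&>(*(*slots_)[index]);
    }

private:
    typedef std::vector<std::shared_ptr<Slot> > Table;
    static std::size_t& next_index() { static std::size_t n = 0; return n; }
    std::shared_ptr<Table> slots_;
};

// A sample slot type, analogous to the Stats facet.
struct ShoeSize : Slot {
    static SlotId id;
    int value;
    explicit ShoeSize(int v) : value(v) {}
};
SlotId ShoeSize::id;
```

Given an empty Bag named empty, the declaration Bag here(empty, new ShoeSize(48)); yields a bag for which here.get<ShoeSize>().value is 48, while empty.get<ShoeSize>() throws. Copies of here share one table, just as copies of a locale share one vector of facets.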

The definition of the locale constructor from a character pointer involves reading locale description files and constructing many different facets, and is beyond the scope of this article.

Conclusion

The standard facets only scratch the surface of what programs need to know about users. The C++ Standard committee is closing up shop; it is time for people like you, contributing to POSIX working groups and ad hoc Internet interest groups, to standardize bindings for what now clogs the preferences menu of every interactive application. Perhaps the most pressing need is for a standard time-zone facet that can check the current version of the TZ database on the Internet (ftp://elsie.nci.nih.gov/pub/).

Acknowledgments

Thanks to Chris Lopez and John Gilson for reviewing this article.

DDJ

Listing One

#include <iosfwd> // istream, ostream
#include <ctime>  // struct tm
namespace ChronLib {
class Date {
  long day;  // days since 1752-09-14
public:
  enum Month { jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec };
  Date(int year, Month month, int day);
  void asCLibTime(struct tm*) const;  
};
std::ostream&
operator<< (std::ostream& os, Date const& date);
std::istream&
operator>> (std::istream& is, Date& date);
}


Listing Two

namespace std {
  template <class Facet>
    Facet const& 
    use_facet(locale const& loc);
}


Listing Three

// date_insert.C
#include <ctime>   // struct tm
#include <ostream> // ostream
#include <locale>  // use_facet
#include "date.h" 
std::ostream&
ChronLib::operator<<(std::ostream& os, Date const& date)
{
  using namespace std;
  ostream::sentry cerberus(os);
  if (!cerberus) return os;
  struct tm tmbuf;
  date.asCLibTime(&tmbuf);
  use_facet< time_put<char> >(os.getloc())
    .put(os, os, os.fill(), &tmbuf, 'x');
  return os;
}


Listing Four

#include <iostream> // cout
#include <locale>   // locale
#include "date.h" 

int main()  {
  using namespace std;
  using ChronLib::Date;
  cout.imbue(locale(""));
  cout << Date(1942, Date::dec, 7) << endl;
  return 0;
}


Listing Five

//stats.h
#include <locale>
class Stats : public std::locale::facet {
 public:
  static std::locale::id id;
  Stats (int ss)       : shoesize_(ss) {}
  int shoesize() const { return shoesize_; }
 private:
  Stats (Stats&);           // not defined:
  void operator=(Stats&);   // not defined:
  int shoesize_;
};
//stats.C
#include "stats.h"
std::locale::id Stats::id;


Listing Six

locale here(locale(), new Stats(48));
int ss = use_facet<Stats>(here).shoesize();


Listing Seven

class locale {
 public:
  class facet;
  class id;
 ~locale() { if (imp_->refs_-- == 0) delete imp_; }
  locale() : imp_(__global_imp) { ++imp_->refs_; }
  explicit locale(char const* name);
  locale(locale const& other)
    : imp_(other.imp_) { ++imp_->refs_; }
  locale& operator=(locale const& l);
  template <class Facet>
    locale(locale const& other, Facet* f);
  // other constructors
  template <class Facet>
    friend Facet const& use_facet(locale const&);
private:
  struct imp {
    size_t refs_; // ref-counter
    vector<facet*> facets_;
    ~imp();
    imp(imp const&);
  };
  imp* imp_;
};


Listing Eight

class locale::facet {
  friend class locale;
  friend class locale::imp;
  size_t refs_;    //initially 0 = One reference
 protected:
  explicit facet(int refs = 0);
  virtual ~facet();
};


Listing Nine

class locale::id {
  friend class locale;
  size_t index_;
  static size_t mark_;
};


Listing Ten

template <class Facet>
locale::locale(locale const& other, Facet* f) {
  imp_ = new imp(*other.imp_);
  imp_->refs_ = 0;  // one reference
  size_t& index = Facet::id.index_;
  if (!index)
     index = ++Facet::id.mark_;
  if (index >= imp_->facets_.size())
    imp_->facets_.resize(index+1);
  ++f->facet::refs_;
  facet*& fpr = imp_->facets_[index];
  if (fpr) --fpr->refs_;
  fpr = f;
}


Listing Eleven

template <class Facet>
  inline Facet const& use_facet(locale const& loc)
{
  size_t index = Facet::id.index_;
  locale::facet* fp;
  if (index >= loc.imp_->facets_.size() ||
      (fp = loc.imp_->facets_[index]) == 0)
    throw bad_cast();
  return static_cast<Facet const&>(*fp);
}



Copyright © 1998, Dr. Dobb's Journal
