Andrew Koenig

Dr. Dobb's Bloggers

Social Processes and Heartbleed, Part 1

April 16, 2014

Well, it finally happened: We have seen a severe, widely publicized buffer-overrun bug, with wide-reaching effects, that will be difficult, slow, and expensive to fix. I think it is safe to say that all over the world, software managers are saying to developers: "Why didn't you warn me?" and many of those developers are answering: "We did; why didn't you listen?"

There is a constant tension in software development between doing it right and doing it quickly, and this tension is particularly visible in the area of security. So for the next few weeks, I am going to explore this tension. Two aspects of this tension are particularly important:

  • It doesn't matter how good a solution is to a problem if no one uses it.
  • It doesn't matter if someone fails to use a good solution for a good reason or for a bad one.

Let's begin with a seemingly trivial example: the gets function in the C standard library. This function takes a pointer as its argument; it reads characters from the standard input stream into consecutive memory addresses starting where the pointer points. It stops reading when it encounters end of file or a newline character, whichever comes first. If it encounters a newline character, it does not place that character in memory. When it is done, it appends a null character. As a result, what the programmer sees is whatever came from the standard input as a null-terminated C string, minus the newline character, if any, that ends the input line.

This function is both limited and convenient. C programmers often use it to read from the standard input in situations such as:

 
     char input[100];
     printf("Yes or no?\n");
     gets(input);
     /* and so on… */

For at least 30 years, many members of the C programming community have known that gets is unsafe and cannot be made safe. The reason, of course, is that its (only) parameter is a pointer to memory in which it is to place its result, and there is no way for gets to find out how much memory is available for its use. As a result, if there are enough characters in the standard input before the next newline, gets is guaranteed to write past the end of the memory that it was given; no action on the programmer's part can prevent this overrun.
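
To make the missing bound concrete, here is a rough model of the loop that gets has to run. It is a sketch, not any library's actual source: the name model_gets and the stream parameter are assumptions, added so that the loop can be exercised on something other than the standard input.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of gets; deliberately unsafe, for illustration only.
   Unlike the real gets, it takes the stream to read as a parameter. */
char *model_gets(char *dst, FILE *in)
{
    assert(dst != NULL && in != NULL);

    char *p = dst;
    int c;

    /* Copy until a newline or end of file. Nothing here can stop the
       copy early, because the size of dst was never passed in. */
    while ((c = getc(in)) != EOF && c != '\n')
        *p++ = (char)c;

    if (c == EOF && p == dst)
        return NULL;    /* nothing was read: leave dst untouched */

    *p = '\0';          /* the newline itself is not stored */
    return dst;
}
```

Every store through p is unchecked. Adding a check would require a size argument, and the signature of gets has no room for one.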

Partly because of gets' lack of safety, it has a companion named fgets. This function takes three arguments: a pointer to memory into which to store data, an integer that gives the size of that memory, and a stream from which to read. One might think, therefore, that the previous code fragment could be made safe by rewriting it this way:

 
     char input[100];
     printf("Yes or no?\n");
     fgets(input, 100, stdin);
     /* and so on… */
 

Unfortunately, there is one more difference between gets and fgets: If fgets stops by reaching a newline character, it includes that newline character as part of the input, whereas gets excludes the newline. Therefore, this rewrite doesn't work: In order to achieve the same result, it is necessary to delete the newline if it is there. One might imagine doing so this way:

 
      /* This code doesn't work! */
     char input[100];
     printf("Yes or no?\n");
     fgets(input, 100, stdin);
     char *last = input + strlen(input) - 1;
     if (*last == '\n')
           *last = '\0';
     /* and so on… */
 

This code fails in an obscure edge case: If it is executed when the standard input stream has consumed all available characters but has not yet reached end of file, then fgets will effectively return a null string by making input[0] a null character. If that happens, strlen(input) is zero, so last will point to the character immediately before input. The result of this code fragment would therefore be undefined; fixing the problem is left as an exercise for the reader.
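
One possible answer to that exercise, offered as a sketch rather than the definitive fix: test the value that fgets returns before touching the buffer, and locate the newline with strcspn instead of computing a pointer from strlen. The C standard specifies that fgets returns a null pointer when end of file arrives before any character has been read, and strcspn returns an index that is never negative, so the store cannot land before the start of the array. The helper name read_answer and its stream parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes) and drop the
   trailing newline, if there is one. Returns buf, or NULL if fgets
   reported end of file or an error. */
char *read_answer(char *buf, int size, FILE *in)
{
    assert(buf != NULL && in != NULL);

    if (size <= 0 || fgets(buf, size, in) == NULL)
        return NULL;    /* do not inspect buf in this case */

    /* strcspn gives the length of the leading run that contains no
       '\n'. For an empty string that length is 0, so the store below
       still lands inside buf. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A call such as read_answer(input, (int)sizeof input, stdin) would stand in for the fgets call and the three lines that follow it. A line longer than the buffer is still split across successive calls; this sketch does not try to address that.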

Once upon a time, I worked in an organization in which one of the managers was security-conscious enough to demand that gets be removed from the local C library. The result of doing so was constantly having to rewrite code that we got from elsewhere. It is not hard to imagine the resulting conversations:

--You know that code you sent me? We had to rewrite part of it so that it didn't use gets.
--What do you have against gets?
--<long explanation>
--Oh, that's interesting.
--We'll be happy to send you the revised code if you like.
--Sure, go ahead — but I can tell you right now that we're not going to be able to do anything about it; we can change code like that only if a customer complains.

Despite its known insecurity, gets was part of the C89 and C99 standards. It was finally removed from the C2011 standard; but when I checked my local implementation, it was still there. Even more interesting to me is that to my knowledge, there is still no function in the C library that is a safe, convenient alternative to gets.

I'd like to invite the C developers reading this to start a discussion: Did you know that gets was unsafe before you read about it here? Does your shop have a policy on the use of gets? Have you ever rewritten code to avoid using it? Anything else you want to tell us? I’ll continue the discussion next week.


Comments:

ubm_techweb_disqus_sso_-bc51dc074fc355126d80914dd4f6d27d
2014-07-01T09:28:53

Thanks for sorting that, Andrew. I appreciate going through the archive; the posts seem to stand the test of time.


AndrewBinstock
2014-04-23T00:16:48

OK, you should now be able to step through all of Andrew's blog posts.


ubm_techweb_disqus_sso_-aeaacce9e439a71714d3aafaea2973da
2014-04-22T19:37:07

Hi Andrew, thanks very much for your response. I am very interested in his 160 articles. So if I can get that it would be great.

Thanks.
Have a great day.


AndrewBinstock
2014-04-22T19:32:45

You can find links to his most recent posts here: http://www.drdobbs.com/author/... It used to contain a link to older material. Let me get that fixed so that you can access those earlier, equally excellent, posts.


ubm_techweb_disqus_sso_-aeaacce9e439a71714d3aafaea2973da
2014-04-22T19:26:46

Guys does anyone know how I can get/buy all the articles and blog posts from Andrew ??


ubm_techweb_disqus_sso_-072e69455219b9957fa9d76f41b378a0
2014-04-17T02:24:45

It still amazes me that buffer overflows have been a well-known security vulnerability for decades now, but still show up in production code. Even in an embedded environment where I'm not too worried about malicious code, I always guard against buffer overflow - it matters not to me whether received data is invalid by malice or an external bug. As for gets() itself, Microsoft's Visual Studio has had a safer version of gets() for years: gets_s, which I believe is now in C11 - and even if you are not using VS or C11, the implementation of such a function for internal use is trivial. I believe the VS compiler even complains about use of functions like gets(). However, even substituting gets_s() for gets() isn't fail-safe: it requires the user to ensure that the provided length parameter actually corresponds with the provided buffer pointer. The fact that C doesn't have a built-in string type should be enough by itself to drive developers to switch to C++, if only as a safer C. With C++, it's fairly simple to provide templatized wrappers for functions like this that overload on provided array size. Just switching to a C++ compiler doesn't prevent unsafe use of native pointers and arrays of course - I rely on regular peer reviews and static analysis tools for that. Oh, and from my understanding of the heartbleed bug (http://nakedsecurity.sophos.co...), it wasn't gets(), but memcpy() - although one can easily create a memcpy_s() in similar form to gets_s().


ubm_techweb_disqus_sso_-b51f7a869a1175412e56da18be4810cb
2014-04-16T18:18:17

I have forbidden gets on my teams for years. It is easy enough to use fgets and yes, I've rewritten code that uses gets before.

There are lots of standards like MISRA and I would imagine they all forbid it as well although I haven't looked to see.


franklinchen
2014-04-16T12:59:58

I was a C developer in my first job as a software developer. In my first days at work in 1993, I was walked through an overview of our internal C coding style and processes. Yes, we were not allowed to use gets. We had a "safe" wrapper library that also included replacements for other problematic standard library functions such as strcpy.

