Secure Coding in C and C++: C-Style Strings

NEWS AT SEI

Author

Robert C. Seacord

This article was originally published in News at SEI on: April 1, 2005

C++ creator Bjarne Stroustrup has commented, "C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off." So are programmers still embracing the C and C++ programming languages? The answer is a resounding "yes"—estimates are that C and C++ continue to have more than 3 million users. But using C and C++ creates challenges for programmers.

"The C programming language is a flexible, portable, high-level language that has been used extensively for more than 30 years but is the bane of the security community," says Robert C. Seacord, senior vulnerability analyst at the SEI's CERT Coordination Center (CERT/CC).

Seacord's latest work, Secure Coding in C and C++ (Addison-Wesley, 2005), is becoming well known in software development circles. Software Developer Magazine editor Rick Wayne recently wrote that "...Seacord dissects the way worms, viruses and other despicable fauna wreak their havoc on unwary programs, and shows how to stop problems at their source (which is the source, so to speak). String buffers are, sadly, still a rich vein for attackers to mine, but Seacord doesn't stop with the obvious, covering pointers, memory management and even unlikely-seeming avenues of attack like the humble integer. In each case, he covers 'mitigation strategies' in depth, examining coding practices as well as tools, with plenty of source-code examples."

LinuxWorld Magazine also ran an interview with Seacord as one of its cover features in the November 2005 issue. In the interview, Seacord detailed some best practices for secure coding in C and C++ and addressed some Linux-specific security issues.

Commonly exploited software vulnerabilities are usually caused by avoidable software defects. Seacord's book systematically identifies the program errors most likely to lead to security breaches, shows how they can be exploited, reviews the potential consequences, and presents secure alternatives. Secure Coding in C and C++ presents hundreds of examples of secure code, insecure code, and exploits, implemented for Windows and Linux.

So what are the characteristics of C that make it prone to security flaws and what should developers be aware of? Seacord begins to address these questions with specific code examples in the following material that is adapted from his book. More information about Secure Coding in C and C++ is available on the SEI website.

—Richard Lynch, news@sei editor

The C language was created in the early 1970s as a system implementation language for the UNIX operating system. C was derived from the typeless language B [Johnson 73], which in turn was derived from BCPL [Richards 79]. BCPL was designed by Martin Richards in the 1960s and used during the early 1970s on several projects. B can be thought of as C without types or, more accurately, BCPL refined and compressed into 8K bytes of memory.

One goal of a high-level programming language is to provide portability. Portability was not a major goal at the inception of the C programming language but gradually became important as the language was ported to different platforms and eventually became standardized. Portability requires that logic be encoded at a level of abstraction independent of the underlying machine architecture and transformed or compiled into the underlying representation. Problems arise from an imprecise understanding of the semantics of these logical abstractions and how they translate into machine-level instructions. This lack of understanding leads to mismatched assumptions, security flaws, and vulnerabilities.

The C programming language is intended to be a lightweight language with a small footprint. This characteristic of C leads to vulnerabilities when programmers fail to implement required logic because they assume it is handled by C (but it is not). This problem is magnified when programmers are already familiar with superficially similar languages such as Java, Pascal, or Ada, leading them to believe that C protects the programmer better than it actually does. These false assumptions have led programmers to write beyond the boundaries of arrays, to overlook integer overflows and truncations, and to call functions with the wrong number of arguments.

Another characteristic of C worth mentioning is the lack of type safety. In general, type safety implies that any operation on a particular type results in another value of that type. C was derived from two typeless languages and still shows many characteristics of a typeless or weakly typed language. For example, it is possible to use an explicit cast in C to convert a pointer to one type into a pointer to a different type. If the resulting pointer is dereferenced, the behavior can be undefined. Operations can also legally mix signed and unsigned integers of differing widths; the implicit conversions involved can silently produce results that differ from the values the programmer intended. This lack of type safety leads to a wide range of security flaws and vulnerabilities.
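
As a small illustration of these implicit conversions, consider comparing a signed and an unsigned value. The sketch below is not from the book, and the function name is hypothetical. Because int and unsigned int have the same rank, the usual arithmetic conversions convert the int operand to unsigned int, so -1 compares as UINT_MAX.

```c
/* Hypothetical helper illustrating the usual arithmetic conversions.
   In i < u, i is converted to unsigned int before the comparison, so a
   negative i becomes a very large value (-1 becomes UINT_MAX). The code
   is legal C; compilers such as GCC can warn about it (-Wsign-compare). */
int signed_less_than_unsigned(int i, unsigned int u) {
    return i < u;
}
```

With this definition, signed_less_than_unsigned(-1, 1u) yields 0 even though -1 is mathematically less than 1.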

An example of an unsafe type in C and C++ programming is C-style strings. Strings—such as command-line arguments, environment variables, and console input—are of special concern in secure programming because they comprise most of the data exchanged between an end user and a software system. Graphic and Web-based applications make extensive use of text input fields and, because of standards like XML, data exchanged between programs is increasingly in string form as well. As a result, weaknesses in string representation, string management, and string manipulation have led to a broad range of software vulnerabilities and exploits.

Strings are a fundamental concept in software engineering, but they are not a built-in type in C or C++. C-style strings consist of a contiguous sequence of characters terminated by and including the first null character. A pointer to a string points to its initial character. The length of a string is the number of bytes preceding the null character, and the value of a string is the sequence of the values of the contained characters, in order.
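
The difference between a string's length and the size of the array holding it can be seen in a short sketch (the array names are hypothetical). An array initialized from a string literal includes room for the terminating null character, while strlen() counts only the bytes before the first null character:

```c
#include <string.h>

/* Hypothetical example arrays. An array initialized from a string
   literal includes the terminating null character in its size. */
const char greeting[] = "hello";   /* 5 characters + '\0': sizeof is 6 */

/* strlen() stops at the first null character, so an embedded null
   shortens the string even though the array is larger. */
const char embedded[] = "ab\0cd";  /* sizeof is 6, strlen() is 2 */
```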

Common String Manipulation Errors

Programming with C-style strings in C or C++ is error prone. The most common errors leading to software vulnerabilities are unbounded string copies, null-termination errors, and string truncation.

Unbounded String Copies
Unbounded string copies occur when data is copied from an unbounded source to a fixed-length character array (for example, when reading from standard input into a fixed-length buffer). In the following C program listing, the program reads characters from standard input using the gets() function (on line 4) into a fixed-length character array until a newline character is read or an end-of-file (EOF) condition is encountered.

1. int main(void) {
2. char Password[80];
3. puts("Enter 8 character password:");
4. gets(Password);
...
5. }

Reading data from unbounded sources creates an interesting problem for a programmer. Because it is not possible to know beforehand how many characters a user will supply, it is not possible to pre-allocate an array of sufficient length. A common solution is to declare a fixed-length array that is much larger than needed. In this example, the programmer is only expecting the user to enter 8 characters, so it is reasonable to assume that the 80-character length will not be exceeded. With friendly users, this approach works well. But with malicious users, a fixed-length character array can be easily exceeded.
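
Because gets() has no way to learn the size of its destination, it cannot be used safely (it was later removed from the language in C11). A minimal sketch of a bounded alternative, assuming the standard fgets() function and a hypothetical helper named read_line(), reads at most size - 1 characters, strips the newline, and reports when part of an over-long line had to be discarded:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from in into buf, which holds
   size bytes. fgets() stores at most size - 1 characters plus a
   terminating null character, so buf cannot overflow.
   Returns 0 if a complete line was read (newline removed),
   1 if the line was too long and its remainder was discarded,
   and -1 on end-of-file or a read error. */
int read_line(FILE *in, char *buf, int size) {
    if (fgets(buf, size, in) == NULL) {
        return -1;
    }
    char *newline = strchr(buf, '\n');
    if (newline != NULL) {
        *newline = '\0';            /* complete line: strip the newline */
        return 0;
    }
    /* No newline was stored: either the line did not fit or the input
       ended without one. Consume the rest of the line so it is not
       mistaken for the next input. */
    int c;
    int discarded = 0;
    while ((c = fgetc(in)) != '\n' && c != EOF) {
        discarded = 1;
    }
    return discarded;
}
```

In the password example, a call such as read_line(stdin, Password, sizeof Password) would replace gets(Password); a nonzero return value signals input that should be rejected rather than silently accepted.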

It is also easy to make errors when copying and concatenating strings because the standard strcpy() and strcat() functions perform unbounded copy operations. In the following code listing, the command-line argument in argv[1] is copied into the fixed-length array name (line 3).

1. int main(int argc, char *argv[]) {
2. char name[2048];
3. strcpy(name, argv[1]);
4. strcat(name, " = ");
5. strcat(name, argv[2]);
...
6. }

The string literal " = " is concatenated after argv[1] in name (line 4). A second command-line argument (argv[2]) is concatenated after the literal text (line 5). Can you tell which of these string copy and concatenation operations may write outside the bounds of the fixed-length character array? The answer, of course, is all of them.
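
One way to correct the listing without changing its fixed-length array is to compute the length of the complete result before copying anything. In the sketch below, build_assignment() is a hypothetical helper, not a library function; it adds up both arguments, the separator, and the terminating null byte, and calls strcpy() and strcat() only when the total fits:

```c
#include <string.h>

/* Hypothetical helper: build "<lhs> = <rhs>" in dst, which holds size
   bytes, but only after checking that the whole result, including the
   terminating null byte, fits.
   Returns 0 on success and -1 if the result would not fit. */
int build_assignment(char *dst, size_t size, const char *lhs, const char *rhs) {
    size_t needed = strlen(lhs) + strlen(" = ") + strlen(rhs) + 1;
    if (needed > size) {
        return -1;                 /* refuse instead of overflowing dst */
    }
    strcpy(dst, lhs);              /* safe: total length checked above */
    strcat(dst, " = ");
    strcat(dst, rhs);
    return 0;
}
```

In the listing, it could be called as build_assignment(name, sizeof name, argv[1], argv[2]), after first checking that argc is at least 3 so that both arguments exist.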

A simple solution is to test the length of the input using strlen() and dynamically allocate the memory, as shown in the following code listing:

1. int main(int argc, char *argv[]) {
2. char *buff = (char *)malloc(strlen(argv[1])+1);
3. if (buff != NULL) {
4. strcpy(buff, argv[1]);
5. printf("argv[1] = %s.\n", buff);
6. free(buff);
7. }
8. else {
/* Couldn't get the memory - recover */
9. }
10. return 0;
11. }

The call to malloc() on line 2 ensures that sufficient space is allocated to hold the command-line argument argv[1] and a trailing null byte. (The listing assumes that at least one argument was supplied; a production version would check argc first.) On systems that comply with the Single UNIX Specification, Version 2, the strdup() function can also be used. The strdup() function accepts a pointer to a string, allocates memory for a duplicate, and returns a pointer to the duplicate string. This memory can be reclaimed by passing the returned pointer to free().
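
strdup() is defined by POSIX and the Single UNIX Specification rather than by C99 (ISO C adopted it only in C23), so it may be missing on some platforms. A minimal portable sketch of the same behavior, using a hypothetical name dup_string(), follows the malloc() approach from the listing above:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable equivalent of strdup(): returns a newly
   allocated copy of s, or NULL if memory could not be obtained.
   The caller must release the copy with free(). */
char *dup_string(const char *s) {
    size_t len = strlen(s) + 1;    /* include the terminating null byte */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);      /* copies the null byte as well */
    }
    return copy;
}
```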

Unbounded string copies are not limited to the C programming language. For example, if a user inputs more than 11 characters into the following C++ program, it will result in an out-of-bounds write.

1. #include <iostream>
2. using namespace std;
3. int main() {
4. char buf[12];
5. cin >> buf;
6. cout << "echo: " << buf << endl;
7. }
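
In C++, the usual remedy is to read into a std::string, or to limit the extraction with std::setw (for example, cin >> setw(12) >> buf stores at most 11 characters plus the null character). For comparison with the C examples in this article, the following sketch bounds the equivalent C read with a field width; read_word() is a hypothetical name, and the width 11 is chosen to leave room for the null character that fscanf() appends:

```c
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited word into a
   12-byte buffer. The field width 11 limits fscanf() to 11 characters,
   leaving room for the terminating null character it adds.
   Returns 0 on success and -1 on end-of-file or a matching failure. */
int read_word(FILE *in, char buf[12]) {
    return fscanf(in, "%11s", buf) == 1 ? 0 : -1;
}
```

Input longer than 11 characters is not lost; the remainder stays in the stream and is returned by the next read, which is itself a form of truncation the caller must handle.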

Null-Termination Errors

Another common problem with C-style strings is a failure to properly null-terminate them. In the following code listing, the character arrays a[], b[], and c[] are declared as fixed-length character arrays.

1. int main(int argc, char* argv[]) {
2. char a[16];
3. char b[16];
4. char c[32];
5. strncpy(a, "0123456789abcdef", sizeof(a));
6. strncpy(b, "0123456789abcdef", sizeof(b));
7. strncpy(c, a, sizeof(c));
8. }

According to the C99 C language standard (ISO/IEC 9899:1999), the strncpy() function copies no more than n characters from a source array to a destination array. If there is no null character in the first n characters of the source array, the result is not null-terminated. As a result, neither a[] nor b[] is properly terminated in the above example, because each 16-character source string exactly fills its 16-byte destination, leaving no room for the null character. The call on line 7 then treats a[] as a string, so it may read past the end of a[], which is undefined behavior. Null-termination errors are difficult to detect and can lie dormant in deployed code until a particular set of inputs causes a failure.
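
A common mitigation is to copy at most size - 1 characters and then store the null character explicitly. The following sketch wraps that idiom in a hypothetical helper, copy_string(), which also reports whether the source was truncated; it is one possible approach, not a standard function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which holds size bytes,
   always null-terminating dst when size > 0.
   Returns 0 if all of src fit, 1 if it was truncated,
   and -1 if size is 0 (nothing can be stored). */
int copy_string(char *dst, size_t size, const char *src) {
    if (size == 0) {
        return -1;
    }
    strncpy(dst, src, size - 1);   /* copies at most size - 1 characters */
    dst[size - 1] = '\0';          /* strncpy() may not terminate; force it */
    return strlen(src) >= size ? 1 : 0;
}
```

Applied to the listing, copy_string(a, sizeof(a), "0123456789abcdef") leaves a[] holding a properly terminated 15-character string and returns 1 to signal the truncation.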

String Truncation

String truncation occurs when a destination character array is not large enough to hold the contents of a string. It can occur when reading user input or copying a string, or as a deliberate result of limiting the number of characters copied to prevent a buffer overflow. While generally preferable to a buffer overflow, string truncation results in a loss of data and in some cases can lead to software vulnerabilities.
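
As a hypothetical illustration of how truncation can become a vulnerability, suppose a program copies a user name into a 6-byte field and then compares it with a privileged name. Silent truncation turns "adminXYZ" into "admin". Because snprintf() returns the length the complete output would have had, checking that value lets the program reject the input instead. All names in the sketch are invented for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: a 6-byte field holds at most 5 characters. */
#define FIELD_SIZE 6

/* Vulnerable pattern: snprintf() prevents an overflow but truncates
   silently, so "adminXYZ" is stored as "admin". */
int is_admin_truncating(const char *user) {
    char field[FIELD_SIZE];
    snprintf(field, sizeof field, "%s", user);
    return strcmp(field, "admin") == 0;
}

/* Safer pattern: a return value >= sizeof field signals truncation,
   so the input is rejected instead of being altered. */
int is_admin_checked(const char *user) {
    char field[FIELD_SIZE];
    int n = snprintf(field, sizeof field, "%s", user);
    if (n < 0 || (size_t)n >= sizeof field) {
        return 0;
    }
    return strcmp(field, "admin") == 0;
}
```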

Mitigation Strategies

Unbounded string copies, null-termination errors, and string-truncation errors have led to numerous vulnerabilities in C and C++ programs. There are, however, several mitigation strategies that can be employed to produce more secure code.

C++ programmers, for example, have the option of using the standard std::string class. The std::string class is the char instantiation of the std::basic_string class template, and it manages memory dynamically: storage is allocated as required, so size() <= capacity() always holds. The std::string class is convenient because it is part of the C++ Standard Library, and many existing libraries already use it, which simplifies integration. Problems can still arise when converting from basic_string to C-style strings and when using the subscript operator [], which does not perform bounds checking (the at() member function does). However, basic_string is generally less prone to errors that result in security vulnerabilities.

For C users, the solution is not as obvious. Conventional solutions such as the use of strncpy() and strncat() are still prone to buffer overflows and truncation errors. As a result, Microsoft has deprecated the use of these functions in Visual Studio 2005. Microsoft developed a set of string functions that can be effective in the remediation of legacy code and submitted them to the ISO/IEC SC22 WG14 J11 International Standardization Working Group for the Programming Language C. The CERT/CC is also working on a managed string library for C that provides capabilities similar to those of the basic_string class in C++ [Long 05, Seacord 05]. An alpha version of this library will be available for testing from www.cert.org in the first quarter of 2006.
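
To illustrate why strncat() remains error prone, the sketch below shows the bound a caller must compute. append_string() is a hypothetical wrapper; it is neither the CERT managed string library nor one of Microsoft's functions:

```c
#include <string.h>

/* strncat()'s third argument limits the number of characters appended;
   it is not the size of dst, and strncat() always adds a null character
   after the appended text. Passing sizeof dst, a common mistake, can
   therefore write past the end of dst. This hypothetical wrapper
   computes the correct limit from the space remaining.
   dst must already hold a null-terminated string.
   Returns 0 if src was appended completely, 1 if it was truncated,
   and -1 if strlen(dst) is not less than size. */
int append_string(char *dst, size_t size, const char *src) {
    size_t used = strlen(dst);
    if (used >= size) {
        return -1;
    }
    size_t room = size - used - 1;   /* leave space for the null character */
    strncat(dst, src, room);
    return strlen(src) > room ? 1 : 0;
}
```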

Summary

C is a popular and viable language although it has characteristics that make it prone to security flaws. Some of these problems may be addressed as the language standard, compilers, and tools evolve. In the short term, the best hope for improvement is to educate developers in how to program securely by recognizing common security flaws and applying appropriate mitigations. In the long term, improvements must be made in the C language standard and implemented in compliant compilers for C to remain a viable language for developing secure systems.


References

[Johnson 73]
Johnson, S. C. and Kernighan, B. W. The Programming Language B (Computing Science Technical Report No. 8). Murray Hill, NJ: Bell Labs, 1973.

[Long 05]
Long, Fred and Seacord, Robert C. Specification for Managed Strings.

[Richards 79]
Richards, Martin and Whitby-Strevens, Colin. BCPL: The Language and Its Compiler. New York, NY: Cambridge University Press, 1979 (ISBN 0-521-21965-5).

[Seacord 05]
Seacord, Robert C. "Managed String Library for C." C/C++ Users Journal 23, 10 (October 2005): 30-34.

About the Author

Robert Seacord began programming professionally for IBM in 1982 and has been programming in C since 1985 and in C++ since 1992. He is currently a senior vulnerability analyst with the CERT Coordination Center (CERT/CC) at the Software Engineering Institute (SEI). As a member of the Vulnerability Analysis Team, Seacord works with other CERT team members to analyze software vulnerability reports and assess the risk to the Internet and other critical infrastructures, identify underlying causes of vulnerabilities, and develop coding practices to improve the security of software systems. He is coauthor of two other books: Building Systems from Commercial Components (Addison-Wesley, 2002) and Modernizing Legacy Systems (Addison-Wesley, 2003).
