Coding Guidelines

Introduction

These are a set of basic rules that I developed at a previous company to minimise incompatibilities between the interfaces of different people's code by adopting a standard set of coding rules and guidelines. They are intended to be a minimum set, since I don't see the point in dictating silly things like code layout.

1. Use the STL

Basic data structures should be implemented using the STL template classes wherever they are appropriate. Most of these are container classes implemented using templates but there are other less-used headers too. You should not write your own list structures, sets etc.

String processing should use the STL string type in preference to char*. Indeed, char* should be considered obsolete.

The STL provides:

string      dynamic string type
vector      dynamic array container
list        basic list container
deque       double-ended queue container
pair        a container for holding a collection of two related objects
map         associative storage container
multimap    as map container but with multiple entries per key
set         associative storage container
multiset    as set container but with multiple entries per key
algorithm   sorts, searches etc. acting on containers
functional  predicates etc. acting on containers
iostream    I/O system - but see notes below on I/O
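As a minimal sketch of what this looks like in practice, the following uses vector, map, string and the sort algorithm instead of hand-written structures. The function names word_counts and sorted_words are made up for this illustration.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Count how many times each word occurs, using std::map as the lookup table.
std::map<std::string, int> word_counts(const std::vector<std::string>& words)
{
  std::map<std::string, int> counts;
  for (std::vector<std::string>::const_iterator i = words.begin(); i != words.end(); ++i)
    ++counts[*i];   // operator[] creates a zero entry on first use
  return counts;
}

// Return a sorted copy of the words using the standard sort algorithm.
std::vector<std::string> sorted_words(std::vector<std::string> words)
{
  std::sort(words.begin(), words.end());
  return words;
}
```

No memory management or list-linking code is needed; the containers handle all of that.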

There are a number of books around on using the STL. However, I can't recommend one because I haven't found a good one yet. A reasonably-useful online guide is to be found at CPlusPlus.com.

2. Use STLplus

The STLplus library has three objectives: it extends the STL by providing extra template classes; it deals with portability issues as discussed in the section "Make your code portable"; and it provides a lot of utilities which you will find useful.

The STLplus provides the following container classes:

smart_ptr   a memory managing container for holding a single object
digraph     a directed graph container
hash        a chained hash table container with similar interface to map
matrix      a 2-dimensional matrix container
ntree       a rooted tree container
triple      a container for holding a collection of three related objects
foursome    a container for holding a collection of four related objects

3. Use std::string

If you look at old C programs that do string handling using char* (or char[]) buffers, you will find a mess - every time. There will be constant juggling of buffers, reallocation to avoid overflow, etc. All of this simply shows that the C-style char* is a bad choice of data type for string handling.

All this hassle - and a lot of potential bugs - can be avoided by simply not using char* at all. All string handling should be done using std::string throughout.

If you do need to call a C system function that has char* interfaces, then it is a good policy to put a C++ wrapper around it that does all the conversions to/from std::string. The wrapper function should have only std::string in the interface.
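A minimal sketch of such a wrapper, using the standard C function getenv as the example. The wrapper name get_environment and its bool-plus-reference interface are assumptions made for this illustration, not part of any library.

```cpp
#include <cstdlib>
#include <string>

// Look up an environment variable through a std::string-only interface.
// Returns true and sets 'value' if the variable exists; otherwise returns
// false and leaves 'value' untouched.
bool get_environment(const std::string& name, std::string& value)
{
  const char* raw = std::getenv(name.c_str());
  if (!raw) return false;
  value = raw;   // copy into the std::string; the caller never sees the char*
  return true;
}
```

All the char* handling, including the null-pointer check, is confined to the wrapper.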

A corollary of this rule is that all C runtime functions using char* interfaces should be considered obsolete and either not used - if there is a C++ equivalent - or wrapped in such a C++ layer before use.

4. Use IOStream

In C++ you have a choice between three I/O systems (unistd from C, stdio from C and iostream from C++). This can cause incompatible interfaces. Therefore it is good practice to standardise on one.

This choice is simple since it is best to consider the C libraries as obsolete in general, although they are useful in some special circumstances. Therefore, all I/O interfaces should use IOStream since this is the only C++ I/O system.

There is also support in the STLplus library for a binary dump format. This has the advantage that any data structure that is dumped can later be restored - it is a two-way mechanism, whereas text I/O is typically human-readable but one-way. See the persistence functions for details.

5. Modularise

The recommendation is to have one subsystem declared per header. A subsystem may be a class, with possibly sub-classes declared in the same header. Or it could be a collection of closely-related functions. For that matter it could be one function. The header file should have the same name as the subsystem (no naff abbreviations, you're not limited to 8 letter filenames anymore and haven't been for decades) with the extension .hpp for C++ headers and .h for C-only headers.

Source code should be contained in a file with the same name as the header but with a .cpp extension for C++ and a .c extension for C-only.

I recommend not putting template implementations in headers since headers also need to be human-readable. It is amazing how many headers are not human-readable and I consider this incompetence. However, there is a requirement that template implementations are visible to the compiler in the same way as headers, so should not be put in .cpp files either. My solution to this is to have a third file type with a .tpp extension which contains template implementations. This I #include at the end of the .hpp file that declares the templates so that any code calling a template will have access to both declaration and implementation.
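The layout described above might be sketched as follows. The class simple_stack and the file names are hypothetical, and the contents of the .tpp file are shown inline (where the #include would be) so that the sketch compiles as a single unit.

```cpp
// ---- simple_stack.hpp: declarations only, kept human-readable ----
#ifndef SIMPLE_STACK_HPP
#define SIMPLE_STACK_HPP

#include <vector>

template<typename T>
class simple_stack
{
public:
  void push(const T& value);
  T pop(void);              // precondition: the stack is not empty
  bool empty(void) const;
private:
  std::vector<T> m_items;
};

// In the real layout the next line would be:  #include "simple_stack.tpp"

// ---- simple_stack.tpp: template implementations ----
template<typename T>
void simple_stack<T>::push(const T& value)
{
  m_items.push_back(value);
}

template<typename T>
T simple_stack<T>::pop(void)
{
  T result = m_items.back();
  m_items.pop_back();
  return result;
}

template<typename T>
bool simple_stack<T>::empty(void) const
{
  return m_items.empty();
}

#endif
```

Someone reading the header sees only the class declaration, while the compiler still sees the implementations wherever the header is included.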

6. #Include Rules

Include only the minimum set of headers in a header file needed to make all the types used in the header available. Any additional headers needed in the C++ body should be included in the body file. This minimises the number of includes that someone including your header will inherit from you and is considered friendly.

Use a sentinel within each header so that the includes in a file become order independent. A sentinel puts a pre-processor conditional around the whole header file which means that, no matter how many times it is included, the contents will only be included once. At the very start of the file (I mean lines 1 and 2), for a header called my_stuff.h, the sentinel would look like this:

#ifndef MY_STUFF_H
#define MY_STUFF_H

and at the very end of the file (and I mean the very last line):

#endif

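The name of the sentinel here is created by uppercasing the filename and changing the dot to an underscore. The aim is to ensure that all sentinel names are different. Some people add a double leading underscore to the name. Be aware, though, that the C++ standard reserves identifiers containing a double underscore for the compiler and its libraries, so the first style is the safer choice. For reference, the second style is: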

#ifndef __MY_STUFF_H
#define __MY_STUFF_H

Finally, never include the "using namespace std" clause in a header. All STL classes referred to in the header should have the std:: namespace prefix added - for example, string should be referred to as std::string within headers. The reason for this rule is that it is considered unfriendly to people who may wish to include your header in their code to dump all of the std namespace into their code against their wishes, which is what the using... clause does.
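A minimal sketch of a header that follows these rules, together with its body file. The file name my_stuff.hpp matches the sentinel example above in spirit, and the function join is made up for the illustration; both files are shown in one block so the sketch compiles as a single unit.

```cpp
// ---- my_stuff.hpp ----
#ifndef MY_STUFF_HPP
#define MY_STUFF_HPP

// Only the headers needed for the types used in the declarations.
#include <string>
#include <vector>

// Every standard type carries the std:: prefix; there is no using clause here.
std::string join(const std::vector<std::string>& parts, const std::string& separator);

#endif

// ---- my_stuff.cpp ----
// In a body file a using clause affects nobody else.
using namespace std;

string join(const vector<string>& parts, const string& separator)
{
  string result;
  for (vector<string>::size_type i = 0; i < parts.size(); ++i)
  {
    if (i != 0) result += separator;
    result += parts[i];
  }
  return result;
}
```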

In body files, you are free to do what you like with namespaces, since no-one includes body files. By the way, the preferred way of including a C++ system header is:

#include <string>

Note the lack of an extension.

Also note that for C system headers, there are two forms. The normal form still works:

#include <stdlib.h>

This is just as in C and makes the stdlib functions and types available. However, you can drop the ".h" and add a "c" prefix and it puts the header into the std:: namespace:

#include <cstdlib>

If you now add the "using namespace std" then you're back to where you started, but you could alternatively refer to the contents of stdlib with the std:: prefix.
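For example, a small sketch that calls the C function strtol through the std:: prefix. The wrapper name to_long is hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Convert decimal text to a long, calling the C function via the std:: namespace.
long to_long(const std::string& text)
{
  return std::strtol(text.c_str(), 0, 10);
}
```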

7. Exceptions

Exceptions should only be used for error conditions. They should not be part of the normal execution path of a program. It may seem that an exception can be used to return a value of a different type from the declared return type of a function, but this is extremely bad practice since it obfuscates your code. It also has performance implications, because compiler writers are under absolutely no obligation whatsoever to make exception handling fast. Indeed, there is an unwritten rule that code optimisation should focus on speeding up the normal operation of a program, not the erroneous operation so the implementation of exception handling is usually designed to minimise impact on normal operation.

Mere user errors or input errors should be indicated by returning an appropriate value from a function, setting a flag, dropping out of a loop or other 'normal' C++ operations. Only program failures should be handled by exceptions.
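The distinction might be sketched like this. Both functions are hypothetical examples: the first treats bad user input as a normal outcome reported through the return value, the second treats a bad index as a bug in the calling code and throws.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// User input error: reported by returning false, not by an exception.
// 'result' is only written when the text is a number from 0 to 100.
bool parse_percentage(const std::string& text, int& result)
{
  std::istringstream input(text);
  int value = 0;
  if (!(input >> value)) return false;
  if (value < 0 || value > 100) return false;
  result = value;
  return true;
}

// Program failure: an out-of-range index means the caller has a bug, so throw.
int element_at(const std::vector<int>& values, std::size_t index)
{
  if (index >= values.size())
    throw std::logic_error("element_at: index out of range");
  return values[index];
}
```

A caller of parse_percentage would typically just test the result and ask the user again, with no try/catch involved.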

8. New/Delete versus Malloc/Free

You should always delete an object created using new and always free an object created using malloc. This is because the C++ memory manager is not guaranteed to be compatible with the C memory manager, even though it usually is. Note there is a difference between "guaranteed" and "usually". Just because "it works" with your compiler does not make it correct. It will probably not work with another compiler or a later edition of your current one.

Furthermore, realloc should only be used on memory allocated with malloc, never memory created with new.

You need to keep open the possibility of adding either a cached or debugging version of the memory manager. For example, a cached memory manager could speed up new and delete but in a way which could make them incompatible with malloc and free.

The easiest way of ensuring this rule is to only use new and delete and consider malloc, realloc and free to be obsolete, which of course they are, along with most of the C runtime.
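A small sketch of the rule, with made-up names: an object created with new is released with delete, and a buffer that needs to grow uses std::vector rather than malloc and realloc.

```cpp
#include <cstddef>
#include <vector>

struct sample { int id; };   // hypothetical example type

// new is always paired with delete, never with free.
int create_and_destroy(int id)
{
  sample* s = new sample;
  s->id = id;
  int result = s->id;
  delete s;
  return result;
}

// A growing buffer: std::vector reallocates internally, so realloc is not needed.
std::vector<int> first_squares(std::size_t count)
{
  std::vector<int> result;
  for (std::size_t i = 0; i < count; ++i)
    result.push_back(static_cast<int>(i * i));
  return result;
}
```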

9. No Static Objects

You may wish to repackage some or all of your code as a shared library (DLL in Windows-speak) so all code should really be written with this possibility in mind. There can be problems with globals (specifically class globals which need to be constructed) in shared libraries and these problems vary between operating systems and compilers.

My preference is to try to avoid the problem by avoiding statics altogether. This is easy when you start from scratch, but with legacy code it is not always as simple as it sounds.

Fortunately, basic types such as bool, int and all pointers are not affected by this problem. Thus, if you really must have a global class object, make it a global pointer to a class object and dynamically allocate the object on first use:

static my_stuff* stuff = 0;

bool do_something_now(...)
{
  if (!stuff) stuff = new my_stuff(...);
  ...
}

10. Make your code portable

I believe that everyone is responsible for writing portable code at all times. It is not an SEP (someone else's problem). You do not know what will happen to your code in the future - notice for example how Gnu/Linux is faring now against Windows. Ten years ago Unix-type OSs were seen as scientific-interest only - now they are mainstream. Do you want to be adaptable in the future? Then make your code portable!

There are three issues relating to portability:

1) Portability between compilers

On Windows you might use Visual C++ or you might use Gnu gcc. On Gnu/Linux you'll be using Gnu's gcc. On Macintosh you will possibly be using gcc again, but there are other choices. Therefore, if there is any possibility of your code needing to be portable, all code must compile with both compilers. In practice this is pretty easy since there are only slight differences between them.

2) Portability between Run-time Libraries

You should only use standard library functions - ANSI C run-time library and the standard C++ run-time library. You should not use any non-standard system calls. Nor should you use any extensions to the libraries, such as extra classes that a compiler vendor may have added to the STL. Nor should you use non-standard 'features' of the standard library functions.

3) Portability between Operating Systems

Rule (2) goes a long way to meeting this rule, but there are some things that you have to do which are different between Windows and Unix. The three specific areas that could affect your development are in file-system handling, internet access and in subprocess handling. These are solved by using the STLplus library interfaces for the File System, TCP Sockets and Subprocesses respectively. These implement both a Windows and a Unix version of these subsystems accessed through a platform-independent interface.

If you need to add other functionality that is platform-specific, then you should think about providing a Unix and a Windows implementation. You should encapsulate (that means hide) it behind a common platform-independent interface in the same way as the above STLplus subsystems. There should therefore be no "#ifdef WIN32" or other platform-specific compiler switches anywhere else in your application code.
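A minimal sketch of this kind of encapsulation. The functions path_separator and join_path are hypothetical and are not the STLplus File System interface; the choice of the _WIN32 and WIN32 macros is also an assumption about the compilers in use. Both files are shown in one block so the sketch compiles as a single unit.

```cpp
// ---- path_separator.hpp: platform-independent interface ----
#include <string>

char path_separator(void);
std::string join_path(const std::string& directory, const std::string& file);

// ---- path_separator.cpp: the only place that knows about the platform ----
#if defined(_WIN32) || defined(WIN32)
char path_separator(void) { return '\\'; }
#else
char path_separator(void) { return '/'; }
#endif

// Built on the interface above, so it contains no compiler switches itself.
std::string join_path(const std::string& directory, const std::string& file)
{
  return directory + path_separator() + file;
}
```

Application code calls join_path and never needs to test for the platform.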

11. Avoid the C Runtime Library

The truth is that the C runtime library is obsolete. Yes, it is. Practically all of the functionality of the C runtime is provided in a better, more effective and more robust form in the C++ runtime library. For example, the I/O routines of stdio are superseded and vastly improved by iostream.

Furthermore, there are some functions in the C runtime that are positively dangerous and should never be used. They should never have been written. Their use in a program is positive proof of an incompetent programmer. An example is the monster called sprintf. Let me explain why this should never, ever, ever ever be used. Ever.

First look at the interface:

int sprintf(char *, const char *, ...);

The first argument is a char* buffer to print into. The function prints text into this buffer according to the format string which is the second argument and the argument-vector parameters represented by the ellipsis (...). What's missing is a parameter that tells sprintf how long the buffer is - so it doesn't know the buffer size and assumes that it is infinite. So, the function has no way of knowing if an overflow happens and cannot prevent it. If the buffer is not long enough, then the function quite happily runs off the end and corrupts other data structures in memory. This kind of memory bug is extremely difficult to diagnose and fix. A common bodge (yes, it is a bodge, not a solution) is to simply make the buffer very large. However, that just pushes the problem further away; it doesn't fix it. Consider the case where one of the values printed through the format string is a command-line argument. You as a programmer have no control over the length of this argument. Therefore you have no way of deciding how big the buffer should be. Any "guess" at the size is a bodge.

This horror of a function is commonly exploited by virus writers who send very long requests to web servers so that sprintf overflows its buffer and overwrites program code, replacing it with virus code. If sprintf did not exist in this form, we'd probably have fewer viruses.

IOStream provides functions for formatting text in a string (namely, a string-stream) that has no potential overflow problems. Therefore there is no justifiable use for sprintf.
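A minimal sketch of the string-stream alternative. The function describe_argument is a made-up example; the point is that the text grows to whatever length the argument requires, so there is no buffer size to guess.

```cpp
#include <sstream>
#include <string>

// Format a message about an argument of any length.
// std::ostringstream allocates as much memory as the text needs.
std::string describe_argument(int index, const std::string& argument)
{
  std::ostringstream out;
  out << "argument " << index << " is \"" << argument << "\"";
  return out.str();
}
```

The equivalent sprintf call would need a buffer sized for the longest possible argument, which the programmer cannot know in advance.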

In any case, using char* for string handling is obsolete because you have to write buckets of code to constantly check for possible buffer overflows. You should be using std::string which dynamically allocates more memory as required, so you can get on with writing the real code.

Note also that rule 8 explained why malloc/free/realloc are obsolete and potentially dangerous.