These are a set of basic rules that I developed at a previous company to minimise incompatibilities in the interfaces between different people's code by adopting a standard set of coding rules and guidelines. They are intended to be a minimum set, since I don't see the point in dictating silly things like code layout.
Basic data structures should be implemented using the STL template classes wherever they are appropriate. Most of these are container classes implemented using templates but there are other less-used headers too. You should not write your own list structures, sets etc.
String processing should use the STL string type in preference to char*. Indeed, char* should be considered obsolete.
The STL provides:
string | dynamic string type |
vector | dynamic array container |
list | basic list container |
deque | double-ended queue container |
pair | A container for holding a collection of two related objects |
map | associative storage container |
multimap | as map container but with multiple entries per key |
set | associative storage container |
multiset | as set container but with multiple entries per key |
algorithm | sorts, searches etc. acting on containers |
functional | predicates etc. acting on containers |
iostream | I/O system - but see notes below on I/O |
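As a minimal sketch of what relying on these classes looks like, the following counts word frequencies using only std::string, std::vector and std::map. The function names split_words and count_words are hypothetical, chosen for this illustration.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a line of text into whitespace-separated words.
std::vector<std::string> split_words(const std::string& text)
{
  std::vector<std::string> words;
  std::istringstream input(text);
  std::string word;
  while (input >> word)
    words.push_back(word);
  return words;
}

// Hypothetical helper: count how many times each word occurs.
// std::map keeps the keys sorted and creates a zero entry on first access.
std::map<std::string, int> count_words(const std::vector<std::string>& words)
{
  std::map<std::string, int> counts;
  for (std::vector<std::string>::const_iterator i = words.begin(); i != words.end(); ++i)
    ++counts[*i];
  return counts;
}
```

No list, tree or buffer management code had to be written; the containers handle all of it.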
There are a number of books around on using the STL. However, I can't recommend one because I haven't found a good one yet. A reasonably-useful online guide is to be found at CPlusPlus.com.
The STLplus library has three objectives: it extends the STL by providing extra template classes; it deals with portability issues as discussed in the section "Make your code portable"; and it provides a lot of utilities which you will find useful.
The STLplus provides the following container classes:
smart_ptr | A memory managing container for holding a single object |
digraph | A directed graph container |
hash | A chained hash table container with similar interface to map |
matrix | A 2-dimensional matrix container |
ntree | A rooted tree container |
triple | A container for holding a collection of three related objects |
foursome | A container for holding a collection of four related objects |
If you look at old C programs that do string handling using char* (or char[]) buffers, you will find a mess - every time. There will be constant juggling of buffers, reallocation to avoid overflow, etc. All of this simply shows that the C-style char* is a bad choice of data type for string handling.
All this hassle - and a lot of potential bugs - can be avoided by simply not using char* at all. All string handling should be done using std::string throughout.
If you do need to call a C system function that has char* interfaces, then it is a good policy to put a C++ wrapper around it that does all the conversions to/from std::string. The wrapper function should have only std::string in the interface.
A corollary of this rule is that all C runtime functions using char* interfaces should be considered obsolete and either not used - if there is a C++ equivalent - or wrapped in such a C++ layer before use.
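A wrapper of this kind might look like the sketch below, which hides the char* interface of the C function getenv behind std::string. The name get_environment and its bool-plus-output-parameter signature are assumptions made for this example, not part of any library.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical wrapper around the C function getenv.
// Returns true and fills in 'value' if the variable is set;
// returns false and leaves 'value' untouched if it is not.
bool get_environment(const std::string& name, std::string& value)
{
  const char* raw = std::getenv(name.c_str());
  if (!raw) return false;  // getenv returns a null pointer for unset variables
  value = raw;             // copy into a std::string straight away
  return true;
}
```

Callers never see a char*, and the null-pointer case is handled in exactly one place.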
In C++ you have a choice between three I/O systems (unistd from POSIX, stdio from C and iostream from C++). Mixing them leads to incompatible interfaces, so it is good practice to standardise on one.
This choice is simple since it is best to consider the C libraries as obsolete in general, although they are useful in some special circumstances. Therefore, all I/O interfaces should use IOStream since this is the only C++ I/O system.
There is also support in the STLplus library for a binary dump format. This has the advantage that any data structure that is dumped can later be restored - it is a two-way mechanism, whereas text I/O is typically human-readable but one-way. See the persistence functions for details.
The recommendation is to have one subsystem declared per header. A subsystem may be a class, with possibly sub-classes declared in the same header. Or it could be a collection of closely-related functions. For that matter it could be one function. The header file should have the same name as the subsystem (no naff abbreviations, you're not limited to 8 letter filenames anymore and haven't been for decades) with the extension .hpp for C++ headers and .h for C-only headers.
Source code should be contained in a file with the same name as the header but with a .cpp extension for C++ and a .c extension for C-only.
I recommend not putting template implementations in headers since headers also need to be human-readable. It is amazing how many headers are not human-readable and I consider this incompetence. However, there is a requirement that template implementations are visible to the compiler in the same way as headers, so should not be put in .cpp files either. My solution to this is to have a third file type with a .tpp extension which contains template implementations. This I #include at the end of the .hpp file that declares the templates so that any code calling a template will have access to both declaration and implementation.
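The layout can be sketched as follows. The file names my_utils.hpp and my_utils.tpp and the function largest are hypothetical, and the two files are shown in a single block so that the example compiles on its own; in the real layout the marked #include line replaces the inlined implementation.

```cpp
#ifndef MY_UTILS_HPP
#define MY_UTILS_HPP
// ---- my_utils.hpp (hypothetical): declarations only, kept human-readable ----

// Returns the larger of two values, using operator< to compare them.
template<typename T>
T largest(const T& left, const T& right);

// In the real layout the header ends with this line:
//   #include "my_utils.tpp"
// The contents of that file are inlined below for this sketch.

// ---- my_utils.tpp (hypothetical): template implementations ----
template<typename T>
T largest(const T& left, const T& right)
{
  return (left < right) ? right : left;
}

#endif
```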
Include only the minimum set of headers in a header file needed to make all the types used in the header available. Any additional headers needed in the C++ body should be included in the body file. This minimises the number of includes that someone including your header will inherit from you and is considered friendly.
Use a sentinel within each header so that the includes in a file become order independent. A sentinel puts a pre-processor conditional around the whole header file which means that, no matter how many times it is included, the contents will only be included once. At the very start of the file (I mean lines 1 and 2), for a header called my_stuff.h, the sentinel would look like this:
#ifndef MY_STUFF_H
#define MY_STUFF_H
and at the very end of the file (and I mean the very last line):
#endif
The name of the sentinel here is created by uppercasing the filename and changing the dot to an underscore. Some people add a double leading underscore to the name. This is common, but be aware that the C++ standard reserves names containing a double underscore for the implementation, so the first style is the safer one. The aim is to ensure that all sentinel names are different. The second style is:
#ifndef __MY_STUFF_H
#define __MY_STUFF_H
Finally, never include the "using namespace std" clause in a header. All STL classes referred to in the header should have the std:: namespace prefix added - for example, string should be referred to as std::string within headers. The reason for this rule is that it is considered unfriendly to people who may wish to include your header in their code to dump all of the std namespace into their code against their wishes, which is what the using... clause does.
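Putting the header rules together, a header might look like the sketch below. The file name my_stuff.hpp and the class my_stuff are hypothetical; the points to note are the sentinel on the first two and the last line, the single include, and the std:: prefix.

```cpp
#ifndef MY_STUFF_HPP
#define MY_STUFF_HPP
// my_stuff.hpp (hypothetical header)

#include <string>  // the only header needed to declare the types used below

class my_stuff
{
public:
  explicit my_stuff(const std::string& name) : m_name(name) {}

  // std::string is spelled out in full: no "using namespace std" in a header
  const std::string& name() const { return m_name; }

private:
  std::string m_name;
};

#endif
```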
In body files, you are free to do what you like with namespaces, since no-one includes body files. By the way, the preferred way of including a C++ system header is:
#include <string>
Note the lack of an extension.
Also note that for C system headers, there are two forms. The normal form still works:
#include <stdlib.h>
This is just as in C and makes the stdlib functions and types available. However, you can drop the ".h" and add a "c" prefix, and the contents of the header are then declared in the std:: namespace:
#include <cstdlib>
If you now add the "using namespace std" then you're back to where you started, but you could alternatively refer to the contents of stdlib with the std:: prefix.
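For example, a body file that includes <cstdlib> can call strtol through the std:: prefix. The wrapper name to_long is hypothetical and this is only a sketch; it does no error checking.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: convert decimal text to a long using the C function
// strtol, referred to through the std:: namespace.
long to_long(const std::string& text)
{
  return std::strtol(text.c_str(), 0, 10);
}
```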
Exceptions should only be used for error conditions. They should not be part of the normal execution path of a program. It may seem that an exception can be used to return a value of a different type from the declared return type of a function, but this is extremely bad practice since it obfuscates your code. It also has performance implications, because compiler writers are under absolutely no obligation whatsoever to make exception handling fast. Indeed, there is an unwritten rule that code optimisation should focus on speeding up the normal operation of a program, not the erroneous operation, so the implementation of exception handling is usually designed to minimise its impact on normal operation.
Mere user errors or input errors should be indicated by returning an appropriate value from a function, setting a flag, dropping out of a loop or other 'normal' C++ operations. Only program failures should be handled by exceptions.
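The distinction can be sketched as follows. Both functions are hypothetical: parse_integer treats bad input as a normal outcome and reports it through its return value, while element_at treats an out-of-range index as a program failure and throws.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// User input error: reported by returning false, no exception involved.
// 'result' is only modified when the whole of 'text' is a valid integer.
bool parse_integer(const std::string& text, int& result)
{
  std::istringstream input(text);
  int value = 0;
  if (!(input >> value)) return false;  // not a number at all
  char extra;
  if (input >> extra) return false;     // trailing junk after the number
  result = value;
  return true;
}

// Program failure: a bad index means the calling code is wrong,
// so this is reported by throwing an exception.
int element_at(const std::vector<int>& values, std::size_t index)
{
  if (index >= values.size())
    throw std::out_of_range("element_at: index out of range");
  return values[index];
}
```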
You should always delete an object created using new and always free an object created using malloc. This is because the C++ memory manager is not guaranteed to be compatible with the C memory manager, even though it usually is. Note there is a difference between "guaranteed" and "usually". Just because "it works" with your compiler does not make it correct. It will probably not work with another compiler or a later edition of your current one.
Furthermore, realloc should only be used on memory allocated with malloc, never memory created with new.
You need to keep open the possibility of adding either a cached or debugging version of the memory manager. For example, a cached memory manager could speed up new and delete but in a way which could make them incompatible with malloc and free.
The easiest way of ensuring this rule is to only use new and delete and consider malloc, realloc and free to be obsolete, which of course they are, along with most of the C runtime.
You may wish to repackage some or all of your code as a shared library (DLL in Windows-speak) so all code should really be written with this possibility in mind. There can be problems with globals (specifically class globals which need to be constructed) in shared libraries and these problems vary between operating systems and compilers.
My preference is to try to avoid the problem by avoiding statics altogether. This is easy when you start from scratch, but with legacy code it is not always as simple as it sounds.
Fortunately, basic types such as bool, int and all pointers are not affected by this problem. Thus, if you really must have a global class object, make it a global pointer to a class object and dynamically allocate the object on first use:
static my_stuff* stuff = 0;

bool do_something_now(...)
{
  if (!stuff) stuff = new my_stuff(...);
  ...
}
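A complete version of this pattern might look like the sketch below. The names message_log, log_message and logged_message_count are hypothetical. Note that the object is never deleted and that the first-use check is not thread-safe; both are simplifications accepted for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Only a pointer is static, so nothing needs constructing at load time.
static std::vector<std::string>* message_log = 0;

// Allocate the log on first use, then append to it.
bool log_message(const std::string& message)
{
  if (!message_log) message_log = new std::vector<std::string>;
  message_log->push_back(message);
  return true;
}

// Report how many messages have been logged; zero if the log was never created.
std::size_t logged_message_count()
{
  return message_log ? message_log->size() : 0;
}
```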
I believe that everyone is responsible for writing portable code at all times. It is not an SEP (someone else's problem). You do not know what will happen to your code in the future - notice for example how Gnu/Linux is faring now against Windows. Ten years ago Unix-type OSs were seen as scientific-interest only - now they are mainstream. Do you want to be adaptable in the future? Then make your code portable!
There are three issues relating to portability:
(1) On Windows you might use Visual C++ or you might use Gnu gcc. On Gnu/Linux you'll be using Gnu's gcc. On Macintosh you will possibly be using gcc again, but there are other choices. Therefore, if there is any possibility of your code needing to be portable, all code must compile with both compilers. In practice this is pretty easy since there are only slight differences between them.
(2) You should only use standard library functions - ANSI C run-time library and the standard C++ run-time library. You should not use any non-standard system calls. Nor should you use any extensions to the libraries, such as extra classes that a compiler vendor may have added to the STL. Nor should you use non-standard 'features' of the standard library functions.
(3) Rule (2) goes a long way to meeting this rule, but there are some things that you have to do which are different between Windows and Unix. The three specific areas that could affect your development are in file-system handling, internet access and in subprocess handling. These are solved by using the STLplus library interfaces for the File System, TCP Sockets and Subprocesses respectively. These implement both a Windows and a Unix version of these subsystems accessed through a platform-independent interface.
If you need to add other functionality that is platform-specific, then you should think about providing a Unix and a Windows implementation. You should encapsulate (that means hide) it behind a common platform-independent interface in the same way as the above STLplus subsystems. There should therefore be no "#ifdef WIN32" or other platform-specific compiler switches anywhere else in your application code.
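As a purely illustrative sketch of such encapsulation, the functions below confine the platform switch to a single place. The names path_separator and join_path are hypothetical, and the example uses the compiler-predefined macro _WIN32. For real file-system work the STLplus file-system interface already covers this ground.

```cpp
#include <string>

// Platform-independent interface: this is all that application code sees.
char path_separator();
std::string join_path(const std::string& directory, const std::string& file);

// Implementation: the only place where a platform switch appears.
char path_separator()
{
#ifdef _WIN32
  return '\\';
#else
  return '/';
#endif
}

// Join a directory and a file name, adding a separator only if needed.
std::string join_path(const std::string& directory, const std::string& file)
{
  if (directory.empty()) return file;
  if (directory[directory.size() - 1] == path_separator()) return directory + file;
  return directory + path_separator() + file;
}
```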
The truth is that the C runtime library is obsolete. Yes, it is. Practically all of the functionality of the C runtime is provided in a better, more effective and more robust form in the C++ runtime library. For example, the I/O routines of stdio are superseded and vastly improved by iostream.
Furthermore, there are some functions in the C runtime that are positively dangerous and should never be used. They should never have been written. Their use in a program is positive proof of an incompetent programmer. An example is the monster called sprintf. Let me explain why this should never, ever, ever, ever be used. Ever.
First look at the interface:
int sprintf(char *, const char *, ...);
The first argument is a char* buffer to print into. The function prints text into this buffer according to the format string, which is the second argument, and the variable arguments represented by the ellipsis (...). What's missing is a parameter that tells sprintf how long the buffer is - so it doesn't know the buffer size and assumes that it is infinite. So, the function has no way of knowing if an overflow happens and cannot prevent it. If the buffer is not long enough, then the function quite happily runs off the end and corrupts other data structures in memory. This kind of memory bug is extremely difficult to diagnose and fix. A common bodge (yes, it is a bodge, not a solution) is to simply make the buffer very large. However, that just pushes the problem further away; it doesn't fix it. Consider the case where one of the values printed through the format string is a command-line argument. You as a programmer have no control over the length of this argument. Therefore you have no way of deciding how big the buffer should be. Any "guess" at the size is a bodge.
This horror of a function is commonly exploited by virus writers who send very long requests to web servers so that sprintf overflows its buffer and overwrites program code, replacing it with virus code. If sprintf did not exist in this form, we'd probably have fewer viruses.
IOStream provides functions for formatting text in a string (namely, a string-stream) that has no potential overflow problems. Therefore there is no justifiable use for sprintf.
In any case, using char* for string handling is obsolete because you have to write buckets of code to constantly check for possible buffer overflows. You should be using std::string which dynamically allocates more memory as required, so you can get on with writing the real code.
Note also that rule 7 explained why malloc/free/realloc are obsolete and potentially dangerous.