When I was learning C, every time I ran into the preprocessor I told myself that, powerful as it might be, I was just a novice and only needed the basics: #define, #if, #ifndef, #pragma once, #endif and #include. But recently, while reading the book Data Structures and Algorithm Analysis in C, I came across the following lines. They bugged me enough that I could not help learning more about the C preprocessor.
// Following is a quote of code on page 197 of the book
// Data Structures and Algorithm Analysis in C [1]
// This line bothers me
# define Insert( X, H ) ( H = Insert1( (X), H) )
/* DeleteMin macro is left as an exercise */
PriorityQueue Insert1( ElementType X, PriorityQueue H );
// Following is the definition on page 199
PriorityQueue
Insert1( ElementType X, PriorityQueue H )
{
    PriorityQueue SingleNode;
    SingleNode = malloc( sizeof( struct TreeNode ) );
    if( SingleNode == NULL )
        FatalError( "Out of space!!!" );
    else
    {
        SingleNode->Element = X; SingleNode->Npl = 0;
        SingleNode->Left = SingleNode->Right = NULL;
        H = Merge( SingleNode, H );
    }
    return H;
}
The Basics
Every C programmer has his or her own tutorial book; mine is The Art and Science of C [5]. I referred to it first and found the following.
The #define specification
All Starts with 3.14159265358979323846…
#define PI 3.14159265
A Tricky One
#define MaxStringSize 100
// first version, 2 * BufferSize = 202:
// 2 * ( 100 + 1 )
#define BufferSize (MaxStringSize + 1)
// second version, 2 * BufferSize = 201
// 2 * 100 + 1
#define BufferSize MaxStringSize + 1
As you can see, #define only performs plain textual replacement here, so the parentheses in the first version matter.
The #include specification
2 Formats
// looks for files in a special area reserved for system files
#include <filename>
// looks for files in a part of the file system under the control
// of the user. If the file is not found, it works the same as the
// previous angle bracket version.
#include "filename"
Absolute Pathname
You can also #include a file by its absolute pathname [4], but the form of the pathname varies among operating systems. In a UNIX-based system:
/home/fred/C/my_proj/declaration2.h
would be
\users\fred\C\my_proj\declaration2.h
on Windows.
When an absolute pathname is used, the normal directory search is skipped.
Additional Features not covered
Some of the advanced features are not covered in reference [5], but they will be introduced in detail in later parts.
Pseudo-functions (Macros)
These are also called macros, and they are used heavily in the ANSI library. They are the main topic of the remaining parts of this article.
Conditional Compilation
The following is the case I use most. It works just like commenting a block out, but is much easier.
#ifndef _BLOCK_TEST_H
#define _BLOCK_TEST_H
#define BLOCK_DEBUG 1
#if BLOCK_DEBUG
// code blocks
#endif
#endif
Conditional compilation is also useful for writing more advanced programs that can be moved more easily from one computer system to another.
Formal Debut
In this part, I will introduce the trickiest and most amazing part of the C preprocessor – the Macros.
Macros Like Functions
Intro
As you have seen in the first code block in this article:
# define Insert( X, H ) ( H = Insert1( (X), H) )
Macros can be formally defined as:
#define name(comma-separated-parameter-list) stuff
where the opening parenthesis of the parameter list must be adjacent to the name; otherwise, the parameter list will be interpreted as part of stuff.
To illustrate it with a simpler example, the following macro intends to calculate the square of a number:
#define SQUARE(x) x * x
A macro is in effect a textual manipulation at preprocessing time: each use of the macro is replaced by its body. So when SQUARE(5) is invoked, it is replaced by 5 * 5, which evaluates to 25.
But what happens when you invoke SQUARE( 2 + 3 )? As stated before, the preprocessor only does textual replacement, so the result is 2 + 3 * 2 + 3, which is 11, not the intended 25.
To resolve this issue, as primary school has taught us, we need to use extra parentheses!
Parentheses Help A Lot
Now, if we write the macros as:
#define SQUARE(x) ( (x) * (x) )
When calling SQUARE( 2 + 3 ), the textual replacement generates (2 + 3) * (2 + 3) for us, works as expected!
However, there is one more issue, which you might have spotted: (2 + 3) is evaluated twice. That is no big deal here, but it can be a serious problem in other cases.
Macros are not functions
Have a look at the following chunk of code:
int Square(int x) {
    return x * x;
}
i = 4;
Square(i++);
With i = 4 as the initial value, Square(i++) and SQUARE(i++) behave quite differently. Let’s see what happens.
For Square(i++):
// equivalent to
Square(4); // result: 16
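For SQUARE(i++):
// expands to
(i++) * (i++);
// i is modified twice with no sequence point in between, which is
// undefined behavior in C; a compiler may well compute
4 * 5; // result: 20, but 16 or anything else is equally allowed
There is actually no way to fix this kind of problem inside the macro itself; the only thing to do is to keep side effects out of the arguments. Here is another macro with the same weakness: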
#define EVENPARITY( ch ) \
( ( count_one_bits( ch ) & 1 ) ? \
( ch ) | PARITYBIT : ( ch ) )
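Parity checks like this one are widely used in communications and disk storage.
When you call:
ch = EVENPARITY( getchar() );
getchar() is called twice, because ch appears twice on every path through the macro body. Reading the character into a variable first avoids the problem: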
char c = getchar();
ch = EVENPARITY(c);
The only way to avoid this kind of problem is never to pass expressions with side effects, such as i++ or getchar(), to a macro. And the best ways to tell a macro from a function are:
- Follow a good naming convention: macro names all in uppercase
- Refer to the source code written by others.
Macros Should not be our Burden
When the argument passed to SQUARE is a complicated expression, it is evaluated twice, which is usually not what we expect. We should keep the arguments simple, since better performance is the very reason for using a macro instead of a function.
More about Conditional Compilation
#if, #elif, #else, #endif
#if constant-expression
statements
#elif constant-expression
other statements
#else
other statements
#endif
Note: constant-expression is evaluated by the preprocessor, so it cannot depend on runtime variables. An identifier that is not a defined macro is treated as 0.
Equivalent Usages
#if defined(symbol)
#ifdef symbol
#if !defined(symbol)
#ifndef symbol
Nested Directives
Just like the if..else.. statement in C, the directives can also be nested. The following pattern can be applied to software that runs on multiple platforms.
#if defined( OS_UNIX )
    #ifdef OPTION1
        unix_version_of_option1();
    #endif
    #ifdef OPTION2
        unix_version_of_option2();
    #endif
#elif defined( OS_MSDOS )
    #ifdef OPTION2
        msdos_version_of_option2();
    #endif
#endif
Command Line Definitions
This is only supported by some compilers, but it is still really useful. I will not explain much here.
-Dname
-Dname=stuff
// in the source code
int array[ARRAY_SIZE];
// when compiling on the command line
cc -DARRAY_SIZE=100 prog.c
#line
#line number "string"
This directive modifies the value of the __LINE__ symbol (which will be mentioned later) and, when the string is given, the value of __FILE__ as well. It is mostly used by tools that translate code in other languages into C code.
#error
#error test of error message
# empty directive
A line containing only # is ignored by the compiler, but it can be used to separate directives from the surrounding code.
#pragma
#pragma is compiler-specific. For example, in Visual Studio, placing
#pragma once
at the top of a header is equivalent to wrapping the header in
#ifndef _HEADER_H
#define _HEADER_H 1
// 1 above is arbitrary, the purpose is to have this
// _HEADER_H symbol defined in the context
#endif
When an option such as once is not recognized by a compiler, the #pragma is ignored.
Other Funs
Customized C
This is the example "C Is Not Algol" from reference [7].
#define STRING char *
#define IF if(
#define THEN ){
#define ELSE } else {
#define FI ;}
#define WHILE while (
#define DO ){
#define OD ;}
#define INT int
#define BEGIN {
#define END }
// these #defines enable the following code
INT compare(s1, s2)
STRING s1;
STRING s2;
BEGIN
WHILE *s1++ == *s2
DO IF *s2++ == 0
THEN return(0);
FI
OD
return (*--s1 - *s2);
END
The example here is only for fun; doing this is not recommended, since it makes teamwork on a project really hard. The following is a similar usage that makes more sense, but you should still be aware that aspects of the other language cannot be mimicked exactly.
#define repeat do
#define until( x ) while( ! (x) )
// now you can write
repeat {
statements
} until ( i >= 10 );
// which is equivalent to
do {
statements
} while( ! (i >= 10 ) );
# to String Literals
#define PRINT(FORMAT,VALUE) \
printf( "The value of " #VALUE \
" is " FORMAT "\n", VALUE )
PRINT( "%d", x + 3);
// output (when x is 22):
The value of x + 3 is 25
## Concatenates Tokens
# define ADD_TO_SUM( sum_number, value ) \
sum ## sum_number += value
ADD_TO_SUM( 5, 25 );
// above is equivalent to
sum5 += 25;
How amazing! This is a feature I thought was only available in languages with meta-programming support, like Ruby.
Predefined Symbols
There are several predefined symbols, such as __FILE__, __LINE__, __DATE__, __TIME__ and __STDC__, which describe the compilation environment.
- __FILE__: Name of the source file being compiled
- __LINE__: Line number of the current line in the file
- __DATE__: Date that the file was compiled
- __TIME__: Time that the file was compiled
- __STDC__: 1 if the compiler conforms to ANSI C, else undefined.
Following is an example:
#define DEBUG_PRINT printf( "File %s line %d:" \
" x=%d, y=%d, z=%d", \
__FILE__, __LINE__, \
x, y, z )
// instantiation of the macro
x *= 2;
y += x;
z = x * y;
DEBUG_PRINT;
The macro is split over multiple lines by ending each line with a backslash, and the adjacent string literals are concatenated into one string.
Outro
Macros are not statements
Have you ever wondered how some of the built-in facilities of C are implemented? You bet! Macros are used! [2]
Take the assert function as an example
assert(x > y)
It should terminate program execution with an appropriate error message when x > y evaluates to be 0.
We might implement it as follows:
#define assert(e) if (!(e)) assert_error(__FILE__, __LINE__)
But what happens when we use it like this?
if (x > 0 && y > 0)
    assert (x > y);
else
    assert (y > x);
It will expand into something like:
if (x > 0 && y > 0)
    if (!(x > y))
        assert_error("foo.c", 37);
    else
        if (!(y > x))
            assert_error("foo.c", 39);
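The else now pairs with the nearest if, which is the one generated by the macro, not the one we wrote. The solution is to make the macro an expression instead of a statement: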
#define assert(e) \
((void)((e)||assert_error(__FILE__, __LINE__)))
This relies on the short-circuit evaluation of the || operator: when e is true, the right operand is never evaluated; otherwise assert_error is called. For this to compile, assert_error has to return a value, for example an int.
Macros are not Type Definitions
You may want to use #define to make type aliases, but this is actually a bad decision. Consider the following example:
#define FOOTYPE struct foo*
FOOTYPE a;
FOOTYPE b, c;
The first declaration works fine, but the second one becomes:
struct foo* b, c;
which is not our intention: b is a pointer to struct foo, but c is a struct foo itself. To avoid this situation, we should use
typedef struct foo *FOOTYPE;
instead of the faulty one.
References
- Mark Allen Weiss, Data Structures and Algorithm Analysis in C, 2nd Edition, 1997
- Andrew Koenig, C Traps and Pitfalls, 1989
- Dave Thomas with Chad Fowler and Andy Hunt, Programming Ruby 1.9 & 2.0 – The Pragmatic Programmer’s Guide, 2013
- Kenneth Reek, Pointers on C, 1997
- Eric S. Roberts, The Art and Science of C – A Library-Based Introduction to Computer Science, 1995
- Self-Referential Macros: https://gcc.gnu.org/onlinedocs/cpp/Self-Referential-Macros.html
- Peter van der Linden, Expert C Programming: Deep C Secrets, 1994