C is a programming language, designed by Dennis Ritchie during the early 1970s, for writing the UNIX operating system. It remains the most widely-used language for writing operating systems and system software. It is also frequently used for writing applications, although its popularity in this area has been eroded by newer programming languages, such as C++ and Java.
C is a high level language, meaning that the source code of a program can be written without detailed knowledge of the computer's CPU type. Before the program can be used, the source code must be translated into the required machine language by a compiler. In contrast, programs written in an assembly language can only be run on one type of CPU.
The main features of C are:
Some of these features are not guaranteed by the various C standards. For example, the C preprocessor is generally provided as a separate program, allowing files to be preprocessed without being fully compiled, but this is not required. Similarly, ++ is not required to be an O(1) operation. Nevertheless, most implementations of C provide these features, and the C programming community generally expects them to be present.
Although C is a high level language, it shares some similarities with assembly language, and is significantly lower-level than most other programming languages. Most prominently, it is up to the programmer to manage the contents of computer memory: by default, C provides no facilities for array bounds checking or automatic garbage collection. Manual memory management provides the programmer with greater leeway in tuning the performance of a program, which is particularly important for programs such as device drivers. However, it also makes it easy to write bugs stemming from erroneous memory operations, such as buffer overflows. Tools have been created to help programmers avoid these errors, including libraries for performing array bounds checking and garbage collection, and the lint source code checker.
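As a minimal sketch of what manual memory management involves, consider the following. The helper names make_buffer and checked_store are hypothetical and chosen only for illustration. The point is that any bounds check has to be written by the programmer, because the language inserts none.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n ints, all set to zero.
   Returns NULL if the allocation fails. */
int *make_buffer(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Hypothetical helper: store value at buf[index] only if index is in range.
   Returns 1 on success, 0 if the write would fall outside the buffer. */
int checked_store(int *buf, size_t n, size_t index, int value)
{
    if (buf == NULL || index >= n)
        return 0;
    buf[index] = value;
    return 1;
}
```

The caller remains responsible for passing the buffer to free once it is no longer needed; nothing reclaims it automatically.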
Some of the perceived shortcomings of C have been addressed by newer programming languages derived from C. The Cyclone programming language has features to guard against erroneous memory operations. C++ and Objective C provide constructs designed to aid object-oriented programming. Java and C# add object-oriented programming constructs as well as a higher level of abstraction, such as automatic memory management.
The initial development of C occurred between 1969 and 1973 (according to Ritchie, the most creative period was during 1972). It was called "C" because many features derived from an earlier language named B, in commemoration of its parent, BCPL. BCPL was in turn descended from an earlier Algol-derived language, CPL.
By 1973, the C language had become powerful enough that most of the kernel of the Unix operating system was reimplemented in C. This was the first time that the kernel of an operating system had been implemented in a high level language. In 1978, Ritchie and Brian Kernighan published The C Programming Language (a.k.a. "the white book", or K&R). For many years, this book served as the specification of the language; even today, it enjoys great popularity as a manual and learning tutorial.
C became immensely popular outside Bell Labs during the 1980s, and was for a time the dominant language in systems and microcomputer applications programming. It is still the most commonly-used language in systems programming, and is one of the most frequently used programming languages in computer science education.
During the 1980s, Bjarne Stroustrup and others at Bell Labs worked to add object-oriented programming language constructs to C. The language they produced with Cfront was called C++ (thus avoiding the issue of whether the successor to "B" and "C" should be "D" or "P"). C++ is now the language most commonly used for commercial applications on the Microsoft Windows operating system, though C remains more popular in the Unix world.
C evolved continuously from its beginnings in Bell Labs. In 1978, the first edition of Kernighan and Ritchie's The C Programming Language was published. It introduced the following features to the existing versions of C:
long int data type
unsigned int data type
The =+ operator was changed to +=, and so forth (=+ was confusing the C compiler's lexical analyzer).
For several years, the first edition of The C Programming Language was widely used as a de facto specification of the language. The version of C described in this book is commonly referred to as "K&R C." (The second edition covers the ANSI C standard, described below.)
K&R C is often considered the most basic part of the language that is necessary for a C compiler to support. Since not all of the currently used compilers have been updated to fully support ANSI C, and reasonably well-written K&R C code is also legal ANSI C, K&R C is considered the lowest common denominator that programmers should stick to when maximum portability is desired. For example, the bootstrapping version of the GCC compiler, xgcc, is written in K&R C. This is because many of the platforms supported by GCC did not have an ANSI C compiler when GCC was written, just one supporting K&R C.
However, ANSI C is now supported by almost all the widely used compilers. Most of the C code being written nowadays uses language features that go beyond the original K&R specification.
In 1989, C was first officially standardized by ANSI in ANSI X3.159-1989 "Programming Language C". One of the aims of the ANSI C standard process was to produce a superset of K&R C. However, the standards committees also included several new features, more than is normal in programming language standardization.
Some of the new features had been "unofficially" added to the language after the publication of K&R, but before the beginning of the ANSI C process. These included:
void functions
functions returning struct or union types
void * data type
const qualifier to make an object read-only
struct field names in a separate name space for each struct type
assignment for struct data types
stdio library and some other standard library functions became available with most implementations (these already existed in at least one implementation at the time of K&R, but were not really standard, and thus not documented in the book)
stddef.h header file and several other standard header files
Several features were added during the ANSI C standardization process itself, most notably function prototypes (borrowed from C++). The ANSI C standard also established a standard set of library functions.
The ANSI C standard, with a few minor modifications, was adopted as ISO standard number ISO 9899. The first ISO edition of this document was published in 1990 (ISO 9899:1990.)
After the ANSI standardization process, the C language specification remained relatively static for some time, whereas C++ continued to evolve. (In fact, Normative Amendment 1 created a new version of the C language in 1995, but this version is rarely acknowledged.) However, the standard underwent revision in the late 1990s, leading to ISO 9899:1999, which was published in 1999. This standard is commonly referred to as "C99". It was adopted as an ANSI standard in March 2000.
The new features added in C99 include:
long long int (to reduce the pain of the 32-bit to 64-bit transition looming for much old code with the predicted obsolescence of the x86 architecture), an explicit boolean datatype, and a type representing complex numbers
snprintf
stdint.h
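The following sketch shows several of these C99 additions working together. The function names format_big and low_word are hypothetical, and the code assumes a compiler with C99 support.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes value into buf as decimal text.
   snprintf never writes more than size bytes; its return value is the
   length the full text would have had, which reveals truncation. */
bool format_big(char *buf, size_t size, long long value)
{
    int needed = snprintf(buf, size, "%lld", value);
    return needed >= 0 && (size_t)needed < size;
}

/* Hypothetical helper: the low 32 bits of a long long,
   using the fixed-width type uint32_t from <stdint.h>. */
uint32_t low_word(long long value)
{
    return (uint32_t)(value & 0xFFFFFFFFLL);
}
```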
Interest in supporting the new C99 features is mixed. Whilst GCC and several commercial compilers support most of the new features of C99, the compilers made by Microsoft and Borland do not, and these two companies do not seem to be interested in adding such support.
"Hello, World!" in C
The following simple application prints out "Hello, World!" to the standard output file (which is usually the screen, but might be a file or some other hardware device). It appeared for the first time in K&R.
    #include <stdio.h>

    int main(void)
    {
        printf("Hello, World!\n");
        return 0;
    }
A C program consists of functions and variables. C functions are like the subroutines and functions of Fortran or the procedures and functions of Pascal. The function main is special in that the program begins executing at the beginning of main. This means that every C program must have a main function.
The main function will usually call other functions to help perform its job, such as printf in the above example. The programmer may write some of these functions and others may be called from libraries. In the above example return 0 gives the return value for the main function. This indicates a successful execution of the program to a calling shell program.
A C function consists of a return type, a name, a list of parameters (or void in parentheses if there are none) and a function body. The syntax of the function body is equivalent to that of a compound statement.
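The parts of a function definition can be seen in a short sketch. The names max_of and answer are invented for illustration only.

```c
/* Return type int, name max_of, two int parameters, then the body. */
int max_of(int a, int b)
{
    return (a > b) ? a : b;
}

/* A function with no parameters is written with void in the parentheses. */
int answer(void)
{
    return 42;
}
```

A call such as max_of(2, 5) evaluates to the value given in the function's return statement.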
Compound statements in C have the form
{ <optional-declaration-list> <optional-statement-list> }
and are used as the body of a function, or wherever several statements are needed in a place where a single statement is expected.
A statement of the form
<optional-expression> ;
is an expression statement. If the expression is missing, the statement is called a null statement.
C has three types of selection statements: two kinds of if and the switch statement.
The two kinds of if statement are
if (<expression>) <statement>
and
if (<expression>) <statement> else <statement>
In the if statement, if the expression in parentheses is nonzero or true, control passes to the statement following the if. If the else clause is present, control will pass to the statement following the else clause if the expression in parentheses is zero or false.
The two are disambiguated by matching an else to the nearest preceding unmatched if at the same nesting level.
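The matching rule for else can be illustrated with a small sketch. The function names are hypothetical. Some compilers warn about the first form precisely because it is easy to misread.

```c
/* Without braces, the else pairs with the nearest unmatched if,
   which here is if (b), not if (a). */
int without_braces(int a, int b)
{
    int result = 0;
    if (a)
        if (b)
            result = 1;
        else
            result = 2;
    return result;
}

/* Braces force the else to pair with if (a) instead. */
int with_braces(int a, int b)
{
    int result = 0;
    if (a) {
        if (b)
            result = 1;
    } else {
        result = 2;
    }
    return result;
}
```

With a nonzero a and a zero b, the first function returns 2 while the second returns 0, since in the second the else branch belongs to the outer test.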
The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:). No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch; control passes to the default label if none of the case labels are equal to the expression in the parentheses following switch. Switches may be nested; a case or default label is associated with the smallest switch that contains it. Switch statements can "fall through": when one case section has completed its execution, statements continue to be executed downward until a break statement is encountered. This can be useful in certain circumstances, but some newer programming languages forbid case sections from falling through. In the example below, if <label2> is reached, the statements <statements 2> are executed and nothing more inside the braces. However, if <label1> is reached, both <statements 1> and <statements 2> are executed, since there is no break to separate the two case sections.
    switch (<expression>) {
    case <label1> :
        <statements 1>
    case <label2> :
        <statements 2>
        break;
    default :
        <statements>
    }
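A concrete version of the fall-through pattern, offered as a sketch with an invented function name and invented values, might look like this:

```c
/* Hypothetical example: case 1 has no break, so it runs its own
   statement and then continues into the statements of case 2. */
int classify(int n)
{
    int score = 0;
    switch (n) {
    case 1:
        score += 10;   /* falls through */
    case 2:
        score += 1;
        break;
    default:
        score = -1;
    }
    return score;
}
```

Calling classify(1) executes both additions and yields 11, whereas classify(2) yields 1 and any other argument yields -1.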
C has three forms of iteration statement:
do <statement> while (<expression>);
while (<expression>) <statement>
for (<expression> ; <expression> ; <expression>) <statement>
In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero or true. With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration.
If all three expressions are present in a for, the statement
    for (e1; e2; e3) s;
is equivalent to
    e1; while (e2) { s; e3; }
except that a continue inside s still causes e3 to be evaluated before the next test.
Any of the three expressions in the for loop may be omitted. A missing second expression makes the while test nonzero, creating an infinite loop.
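The correspondence between for and while can be checked with a small sketch. The function names are illustrative only; both compute the sum of the integers from 1 to n.

```c
/* Written with for: initialisation, test and step share one line. */
int sum_for(int n)
{
    int i, total = 0;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same loop in the form  e1; while (e2) { s; e3; }  */
int sum_while(int n)
{
    int i, total = 0;
    i = 1;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}
```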
Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.
The goto statement looks like this:
    goto <identifier>;
The identifier must be a label located in the current function. Control transfers to the labeled statement.
A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the smallest enclosing such statement. That is, within each of the statements
    while (expression) { /* ... */ cont: ; }
    do { /* ... */ cont: ; } while (expression);
    for (optional-expr; optexp2; optexp3) { /* ... */ cont: ; }
a continue not contained within a nested iteration statement is the same as goto cont.
The break statement is used to get out of a for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.
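The difference between continue and break can be seen side by side in the following sketch, which uses a hypothetical function name:

```c
/* Sums the positive values in a[0..n-1], stopping at the first zero. */
int sum_positive_until_zero(const int *a, int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++) {
        if (a[i] == 0)
            break;       /* leave the loop entirely */
        if (a[i] < 0)
            continue;    /* skip to i++ and the next test */
        total += a[i];
    }
    return total;
}
```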
A function returns to its caller by the return statement. When return is followed by an expression, the value is returned to the caller of the function. Flowing off the end of the function is equivalent to a return with no expression. In either case, the returned value is undefined.
The operators of C are listed below in order of precedence, from highest to lowest:
    () [] -> . ++ --                      postfix operators
    ++ -- * & ~ ! + - sizeof (cast)       unary operators
    * / %                                 multiplicative operators
    + -                                   additive operators
    << >>                                 shift operators
    < <= > >=                             relational operators
    == !=                                 equality operators
    &                                     bitwise and
    ^                                     bitwise exclusive or
    |                                     bitwise inclusive or
    &&                                    logical and
    ||                                    logical or
    ?:                                    conditional operator
    = += -= *= /= %= <<= >>= &= |= ^=     assignment operators
    ,                                     comma operator
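Two consequences of operator precedence are sketched below with invented function names. The first shows a multiplicative operator binding more tightly than an additive one, and the second shows a postfix operator binding more tightly than a unary one.

```c
/* Parsed as 2 + (3 * 4), not (2 + 3) * 4. */
int mul_before_add(void)
{
    return 2 + 3 * 4;
}

/* *p++ is parsed as *(p++): it reads the element p points to,
   and p is advanced to the next element afterwards. */
int deref_then_advance(void)
{
    int data[2] = {10, 20};
    int *p = data;
    int first = *p++;          /* first is data[0]; p now points at data[1] */
    return first * 100 + *p;   /* combines both elements into one number */
}
```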
Note: The following are typical ranges and lengths for these data types. It is possible that a compiler may use values that vary from the ones below without violating the ANSI Standard. Consult a C reference for more information.
The values in the <limits.h> and <float.h> headers determine the ranges of the fundamental data types. The ranges of the float, double, and long double types are typically those mentioned in the IEEE 754 Standard.
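name | length (bytes) | range |
char | 1 | -128..127 or 0..255 |
unsigned char | 1 | 0..255 |
signed char | 1 | -128..127 |
int | 2 or 4 | -32768..32767 or -2147483648..2147483647 |
short int | 2 | -32768..32767 |
long int | 4 | -2147483648..2147483647 |
float | 4 | about -3.4E+38..3.4E+38 (IEEE 754 single precision) |
double | 8 | about -1.8E+308..1.8E+308 (IEEE 754 double precision) |
long double | 8 | about -1.8E+308..1.8E+308 (same as double at this length) |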
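Because the actual values vary between implementations, a program can query them rather than assume them. The following sketch (print_ranges is a hypothetical name) prints a few of the limits that the <limits.h> and <float.h> headers define for the compiler in use.

```c
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* Prints the ranges that this particular implementation provides. */
void print_ranges(void)
{
    printf("char:       %d..%d\n", CHAR_MIN, CHAR_MAX);
    printf("short int:  %d..%d\n", SHRT_MIN, SHRT_MAX);
    printf("int:        %d..%d (%u bytes)\n",
           INT_MIN, INT_MAX, (unsigned)sizeof(int));
    printf("long int:   %ld..%ld\n", LONG_MIN, LONG_MAX);
    printf("float max:  %e\n", FLT_MAX);
    printf("double max: %e\n", DBL_MAX);
}
```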
If a declaration is suffixed by a number in square brackets ([]), the declaration is said to be an array declaration. Strings are just character arrays. They are terminated by an ASCII NUL character.
Examples:
    int myvector[100];
    char mystring[80];
    float mymatrix[3][2] = {2.0, 10.0, 20.0, 123.0, 1.0, 1.0};
    char lexicon[10000][300]; /* 10000 entries with max 300 chars each. */
    int a[3][4];
The last example above creates an array of arrays, but can be thought of as a multidimensional array for most purposes. The 12 int values created could be accessed as follows:
a[0][0] | a[0][1] | a[0][2] | a[0][3] |
a[1][0] | a[1][1] | a[1][2] | a[1][3] |
a[2][0] | a[2][1] | a[2][2] | a[2][3] |
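A typical way to visit every element of such an array is a pair of nested loops, as in this sketch (the function name sum_grid and the stored values are illustrative only):

```c
/* Fills a 3-by-4 array with the values 0..11 and returns their sum. */
int sum_grid(void)
{
    int a[3][4];
    int row, col, total = 0;

    for (row = 0; row < 3; row++)
        for (col = 0; col < 4; col++)
            a[row][col] = row * 4 + col;   /* rows are stored one after another */

    for (row = 0; row < 3; row++)
        for (col = 0; col < 4; col++)
            total += a[row][col];

    return total;
}
```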
If a variable has an asterisk (*) in its declaration it is said to be a pointer.
Examples:
    int *pi; /* pointer to int */
    int *api[3]; /* array of 3 pointers to int */
    char **argv; /* pointer to pointer to char */
The value at the address stored in a pointer variable can then be accessed in the program with an asterisk. For example, given the first example declaration above, *pi is an int. This is called "dereferencing" a pointer.
Another operator, the & (ampersand), called the address-of operator, returns the address of a variable, array, or function. Thus, given the following
    int i, *pi; /* int and pointer to int */
    pi = &i;
i and *pi could be used interchangeably (at least until pi is set to something else).
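The following sketch shows the address-of and dereferencing operators in use. The function names pointer_demo and swap_ints are hypothetical.

```c
/* Writing through a pointer changes the variable it points to. */
int pointer_demo(void)
{
    int i = 5;
    int *pi = &i;   /* pi holds the address of i */
    *pi = 7;        /* same effect as i = 7 */
    return i;
}

/* Pointers let a function modify its caller's variables. */
void swap_ints(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}
```

A caller would write swap_ints(&a, &b) to exchange the values of two int variables a and b.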
char
.
The most important string functions are:
strcat(dest, source) - appends the string source to the end of string dest
strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it or a null pointer if c is not found
strcmp(a, b) - compares strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
strcpy(dest, source) - copies the string source to the string dest
strlen(st) - returns the length of string st
strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of string dest; characters after the null terminator are not copied
strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
strncpy(dest, source, n) - copies a maximum of n characters from the string source to the string dest
strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it or a null pointer if c is not found
The less important string functions are:
strcoll(s1, s2) - compares two strings according to a locale-specific collating sequence
strcspn(s1, s2) - returns the index of the first character in s1 that matches any character in s2
strerror(err) - returns a string with an error message corresponding to the code in err
strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2 or a null pointer if not found
strspn(s1, s2) - returns the index of the first character in s1 that matches no character in s2
strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st or a null pointer if no such substring exists
strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
strxfrm(s1, s2, n) - transforms s2 into s1 using locale-specific rules
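As a sketch of how several of these functions combine, the hypothetical helper join_names below builds a single string from two parts. Because strcpy and strcat do not check the size of the destination, the helper measures the inputs with strlen first.

```c
#include <string.h>

/* Hypothetical helper: writes "<first> <last>" into out, which has
   room for size characters including the terminating NUL.
   Returns 1 on success, 0 if the result would not fit. */
int join_names(char *out, size_t size, const char *first, const char *last)
{
    if (strlen(first) + 1 + strlen(last) + 1 > size)
        return 0;
    strcpy(out, first);   /* out now holds first */
    strcat(out, " ");     /* append the separator */
    strcat(out, last);    /* append last */
    return 1;
}
```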
<stdio.h>
header.
stdin - standard input
stdout - standard output
stderr - standard error
The following example demonstrates how a filter program is typically structured:
    #include <stdio.h>

    int main(void)
    {
        int c;
        int anErrorOccurs = 0; /* placeholder: set by the processing code on failure */

        while ((c = getchar()) != EOF) {
            /* do various things to the characters */
            if (anErrorOccurs) {
                fputs("an error occurred\n", stderr);
                break;
            }
            /* ... */
            putchar(c);
            /* ... */
        }
        return 0;
    }
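Command-line arguments are passed to the main function through two parameters: the number of arguments in argc and the individual arguments as character arrays in the pointer array argv.
So the command
    myFilt p1 p2 p3
results in something like
    argc = 4
    argv[0] -> "myFilt"
    argv[1] -> "p1"
    argv[2] -> "p2"
    argv[3] -> "p3"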
(Note: there is no guarantee that the individual strings are contiguous.)
The individual values of the parameters may be accessed with argv[1], argv[2] and argv[3].
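Walking through the argument array might be sketched as follows. The helper name describe_args is hypothetical; a program's main function would call it as describe_args(argc, argv).

```c
#include <stdio.h>

/* Hypothetical helper: prints every entry of argv and returns the
   number of arguments that follow the program name in argv[0]. */
int describe_args(int argc, char *argv[])
{
    int i;
    for (i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return argc > 0 ? argc - 1 : 0;
}
```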
The C Library
Many features of the C language are provided by the standard C library. A "hosted" implementation provides all of the C library. (Most implementations are hosted, but some, not intended to be used with an operating system, aren't.) Access to library features is achieved by including standard headers via the #include preprocessing directive.
There were fifteen headers in C89:
<assert.h>
<ctype.h>
<errno.h>
<float.h>
<limits.h>
<locale.h>
<math.h>
<setjmp.h>
<signal.h>
<stdarg.h>
<stddef.h>
<stdio.h>
<stdlib.h>
<string.h>
<time.h>
The 1995 amendment (C95) added three more:
<iso646.h>
<wchar.h>
<wctype.h>
C99 added six more:
<complex.h>
<fenv.h>
<inttypes.h>
<stdbool.h>
<stdint.h>
<tgmath.h>
The Development of the C Language http://cm.bell-labs.com/cm/cs/who/dmr/chist
The C Programming Language, by Brian Kernighan and Dennis Ritchie. Also known as K&R. This is good for beginners.
C: A Reference Manual, by Samuel P. Harbison and Guy L. Steele. This book is excellent as a definitive reference manual, and for those working on C compiler and processors. The book contains a BNF grammar for C.
This article (or an earlier version of it) contains material from FOLDOC, used with permission.