Programming language

A programming language is a set of syntactic and semantic rules used to define computer programs.

It is a standardized communication technique for expressing instructions to a computer. A language enables a programmer to precisely specify what kinds of data a computer will act upon, and precisely what actions to take under various circumstances.

The terms computer language and programming language are interchangeable.

Programs written in programming languages are intended to be executed on computers, but the languages may also be used to specify algorithms or data structures for human readers, which is why language designers try to make code easy to read.

Languages usually enable programmers to express their intent for a computation more easily than machine code does. Understanding programming languages is crucial for those engaged in computer science because today, all types of computation are done with computer languages.

During the last few decades, a large number of computer languages have been introduced, have replaced each other, and have been modified or combined. Although there have been several attempts to make a universal computer language that serves all purposes, all of them have failed. A significant range of computer languages is needed for several reasons: the purposes of programming vary from commercial software development to hobby use; the gap in skill between novices and experts is huge, and some languages are too difficult for beginners to come to grips with; computer programmers have different preferences; and, finally, the acceptable runtime cost may be very different for programs running on a microcontroller and programs running on a supercomputer.

There are many special purpose languages, for use in special situations: PHP is a language used for outputting web pages; Perl is suitable for text manipulation; the C language has been widely used for development of operating systems and compilers (so-called system programming).

Programming languages make computer programs less dependent on particular machines or environments. This is because programming languages are converted into specific machine code for a particular machine rather than being executed directly by the machine. One ambitious goal of FORTRAN, one of the first programming languages, was this machine-independence.

Most languages can be either compiled or interpreted, but most are better suited for one than the other. In some programming systems, programs are compiled in multiple stages, into a variety of intermediate representations; typically, later stages of compilation are closer to machine code than earlier stages. One common variant of this implementation strategy, first used by BCPL in the late 1960s, was to compile programs to an intermediate representation called "O-code" for a virtual machine, which was then compiled for the actual machine. This successful strategy was later used by Pascal with P-code and Smalltalk with byte code, although in many cases the intermediate code was interpreted rather than being compiled.
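
The two-stage strategy can be sketched in Python as follows. This is only an illustration: the instruction names (PUSH, ADD, SUB, MUL) and both function names are invented for this example and are not O-code, P-code, or Smalltalk byte code. The first function compiles an arithmetic expression into intermediate code for a hypothetical stack machine; the second interprets that code.

```python
import ast


def compile_to_stack_code(expression):
    """Translate an arithmetic expression into stack-machine instructions."""
    opcodes = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [("PUSH", node.value)]
        if isinstance(node, ast.BinOp) and type(node.op) in opcodes:
            # Post-order traversal: both operands first, then the operator.
            return emit(node.left) + emit(node.right) + [(opcodes[type(node.op)], None)]
        raise ValueError("unsupported expression")

    return emit(ast.parse(expression, mode="eval").body)


def run_stack_code(code):
    """Interpret the intermediate code on a simple virtual stack machine."""
    stack = []
    for opcode, argument in code:
        if opcode == "PUSH":
            stack.append(argument)
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "SUB":
            stack.append(left - right)
        elif opcode == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return stack.pop()


intermediate = compile_to_stack_code("2 + 3 * 4")
result = run_stack_code(intermediate)
```

Because the intermediate code does not mention any real processor, the same list of instructions could be run by a different virtual machine implementation, or translated further into machine code, without changing the first stage.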

If the translation mechanism used is one that translates the program text as a whole and then runs the internal format, this mechanism is spoken of as compilation. The compiler is therefore a program which takes the human-readable program text (called source code) as data input and supplies object code as output. The resulting object code may be machine code which will be executed directly by the computer's CPU, or it may be code matching the specification of a virtual machine.
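
As a small illustration of translating the program text as a whole before running it, Python's built-in compile function turns source text into a code object, which in the CPython implementation holds byte code for its virtual machine. The source strings and variable names below are made up for this sketch.

```python
source = "total = sum(range(1, 5))\n"

# Translate the whole program text first; nothing is executed at this point.
code_object = compile(source, "<example>", "exec")

# Run the internal format in a fresh namespace.
namespace = {}
exec(code_object, namespace)
total = namespace["total"]

# A syntax error anywhere in the text is reported before any statement runs.
try:
    compile("x = 1\ny = = 2\n", "<broken>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
```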

If the program code is translated at runtime, with each translated step being executed immediately, the translation mechanism is spoken of as an interpreter. Interpreted programs usually run more slowly than compiled programs, but have more flexibility because they are able to interact with the execution environment. See interpreted language for detail.
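
A minimal sketch of this step-by-step behaviour, assuming a made-up three-command language (set, add, print) that exists only for this example: each line is translated and executed immediately, so a faulty line is discovered only when execution reaches it, after the earlier lines have already taken effect.

```python
def interpret(program_text, output):
    """Translate and execute one line at a time, appending printed values to output."""
    variables = {}
    for line_number, line in enumerate(program_text.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # skip blank lines
        command, arguments = words[0], words[1:]
        if command == "set" and len(arguments) == 2:
            variables[arguments[0]] = int(arguments[1])
        elif command == "add" and len(arguments) == 2:
            variables[arguments[0]] += int(arguments[1])
        elif command == "print" and len(arguments) == 1:
            output.append(variables[arguments[0]])
        else:
            # The bad line is found only when execution arrives at it.
            raise ValueError(f"line {line_number}: cannot interpret {line!r}")


printed = []
interpret("set x 1\nadd x 2\nprint x", printed)

partial = []
try:
    interpret("set y 5\nprint y\nmultiply y 2\nprint y", partial)
    message = ""
except ValueError as error:
    message = str(error)
```

In the second run the value 5 has already been printed when the unknown command on line 3 stops the program.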

Table of contents

1 Features of a Programming Language

1.1 Data and Data Structures
1.2 Instruction and Control Flow
1.3 Reference Mechanisms and Re-use
1.4 Design Philosophies

2 History of programming languages

3 Programming Paradigms

4 Languages

Features of a Programming Language

Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.

These specifications usually include:

Data and Data Structures
Instruction and Control Flow
Reference Mechanisms and Re-use
Design Philosophy

Most languages that are widely used, or have been used for a considerable period of time, have standardization bodies that meet regularly to create and publish formal definitions of the language, and discuss extending or supplementing the already extant definitions.

Data and Data Structures

Internally, all data in a modern digital computer are stored simply as on-off (binary) states. The data typically represent information in the real world such as names, bank accounts and measurements and so the low-level binary data are organised by programming languages into these high-level concepts.

The particular system by which data are organized in a program is the type system of the programming language; the design and study of type systems is known as type theory. Languages can be classified as statically typed (e.g. C++ or Java) or dynamically typed (e.g. Lisp, JavaScript, Tcl or Prolog). Statically typed languages can be further subdivided into languages with manifest types, where each variable and function declaration has its type explicitly declared, and type-inferred languages (e.g. ML).

With statically-typed languages, there usually are pre-defined types for individual pieces of data (such as numbers within a certain range, strings of letters, etc.), and programmatically named values (variables) can have only one fixed type, and allow only certain operations: numbers cannot change into names and vice versa. Dynamically-typed languages treat all data locations interchangeably, so inappropriate operations (like adding names, or sorting numbers alphabetically) will not cause errors until run-time. Type-inferred languages superficially treat all data as not having a type, but actually do sophisticated analysis of the way the program uses the data to determine which elementary operations are performed on the data, and therefore deduce what type the variables have at compile-time. Type-inferred languages can be more flexible to use, while creating more efficient programs; however, this capability is difficult to include in a programming language implementation, so it is relatively rare.
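
The run-time behaviour of a dynamically typed language can be illustrated with Python, which is dynamically typed. The function name add_bonus is invented for this sketch. Nothing rejects the inappropriate call in advance; the error appears only when the addition is actually attempted.

```python
def add_bonus(amount):
    # No type is declared for `amount`; any value may be passed in.
    return amount + 10


numeric_result = add_bonus(5)

try:
    add_bonus("Alice")  # tries to add a number to a name
    failed_at_run_time = False
except TypeError:
    # The mistake surfaces only when this call is executed.
    failed_at_run_time = True
```

In a statically typed language with a declared numeric parameter, the second call would normally be rejected before the program runs.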

It is possible to perform type inference on programs written in a dynamically-typed language, but it is legal to write programs in these languages that make type inference infeasible.

Sometimes statically-typed languages are called "type-safe" or "strongly typed", and dynamically-typed languages are called "untyped" or "weakly typed"; confusingly, these same terms are also used to refer to the distinction between languages like Eiffel, Oberon, Lisp, Scheme, or OCaml, in which it is impossible to use a value as a value of another type and possibly corrupt data from an unrelated part of the program or cause the program to crash, and languages like FORTH, C, assembly language, C++, and most implementations of Pascal, in which it is possible to do this.

Sometimes type-inferred and dynamically-typed languages are called "latently typed."

Most languages also provide ways to assemble complex data structures from built-in types (using arrays, lists, stacks, files) and to associate names with these new combined types. Object-oriented languages allow the programmer to define new data types (whose instances are called "objects") together with the functions that operate upon them (called "methods"), by assembling complex structures along with behaviors specific to those newly defined data structures.
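
As a sketch of such a user-defined type, the following Python class combines a string, a number and a list into one named type and attaches methods to it. The class BankAccount and its fields are hypothetical, chosen only to echo the bank-account example mentioned earlier.

```python
class BankAccount:
    """A new data type assembled from built-in types, with its own methods."""

    def __init__(self, owner, balance=0):
        self.owner = owner      # a string
        self.balance = balance  # a number
        self.history = []       # a list of (kind, amount) pairs

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        self.history.append(("deposit", amount))

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self.history.append(("withdraw", amount))


account = BankAccount("Ada", 100)
account.deposit(50)
account.withdraw(30)
```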

Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well. Functional languages often allow variables to name run-time computed values directly instead of naming memory locations where values may be stored. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values. Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.
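
The last case, procedures as first-class values that can be created at run-time, can be sketched in Python. The names make_scaler, double and triple are invented for this example.

```python
def make_scaler(factor):
    """Create and return a new procedure at run-time."""
    def scale(value):
        return value * factor
    return scale


double = make_scaler(2)  # a procedure returned by an expression
triple = make_scaler(3)

scaled = list(map(double, [1, 2, 3]))  # a procedure passed as a parameter
renamed = triple                       # a procedure bound to a new name
```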

Instruction and Control Flow

Once data has been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure. Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping these elementary statements exist. This allows one to write programs that are able to cover a variety of input, instead of being limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a language are those used to control processing (branches, definitions by cases, loops, backtracking, functional composition).
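
A few of these control constructs (a definition by cases, a loop, and functional composition) might look as follows in Python. All function names are made up for the sketch.

```python
def classify(number):
    # Definition by cases using branches.
    if number < 0:
        return "negative"
    elif number == 0:
        return "zero"
    else:
        return "positive"


def count_by_class(values):
    # A loop lets the same statements handle input of any length.
    counts = {}
    for value in values:
        label = classify(value)
        counts[label] = counts.get(label, 0) + 1
    return counts


def compose(outer, inner):
    # Functional composition: apply `inner`, then `outer`.
    return lambda value: outer(inner(value))


counts = count_by_class([-2, 0, 3, 7])
classify_magnitude = compose(classify, abs)
```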

Reference Mechanisms and Re-use

The core of the idea of reference is that there must be a method of indirectly designating storage space. The most common method is through named variables. Depending on the language, further indirection may include references that are pointers to other storage space stored in such variables or groups of variables. Similar to this method of naming storage is the method of naming groups of instructions. Most programming languages use macro calls, procedure calls or function calls as the statements that use these names. Using symbolic names in this way allows a program to achieve significant flexibility, as well as a high measure of reusability. Indirect references to available programs or predefined data divisions allow many application-oriented languages to integrate typical operations as if the programming language included them as higher level instructions.
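
Both kinds of naming can be illustrated briefly in Python, where a variable is a name that refers to a stored object. The variable and function names here are hypothetical.

```python
readings = [20, 21, 19]

alias = readings           # a second name for the same storage
alias.append(22)           # the change is visible through both names

snapshot = list(readings)  # a copy occupies separate storage
snapshot.append(99)        # does not affect `readings`


def average(values):
    # A named group of instructions that can be reused through calls.
    return sum(values) / len(values)


mean = average(readings)
```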

Design Philosophies

For the above-mentioned purposes, each language has been developed using a special design or philosophy. Some aspect or another is particularly stressed by the way the language uses data structures, or by which its special notation encourages certain ways of solving problems or expressing their structure.

Since programming languages are artificial languages, they require a high degree of discipline to accurately specify which operations are desired. Programming languages are not error tolerant; however, the burden of recognising and using the special vocabulary is reduced by help messages generated by the programming language implementation. A few languages offer a high degree of freedom by allowing self-modification, in which a program rewrites parts of itself to handle new cases. Typically, only machine language and members of the Lisp family (Common Lisp, Scheme) provide this capability. Some languages, such as MUMPS and Perl, allow modification of data structures that contain program fragments, and provide methods to transfer program control to those data structures. Languages that support dynamic linking and loading, such as C, C++, and the Java programming language, can emulate self-modification by either embedding a small compiler or calling a full compiler and linking in the resulting object code. Interpreting code by recompiling it in real time is called dynamic recompilation; emulators and other virtual machines exploit this technique for greater performance.
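
The idea of keeping program fragments in a data structure and transferring control to them can be sketched in Python using the built-in exec function. This is an illustration only and does not show how Lisp, MUMPS or Perl implement it; the dictionary handlers and the function run are invented for the example.

```python
# Program fragments stored as ordinary data (source text in a dictionary).
handlers = {"square": "result = value * value"}


def run(name, value):
    """Transfer control to the fragment stored under `name`."""
    scope = {"value": value}
    exec(handlers[name], scope)
    return scope["result"]


before = run("square", 4)

# While running, the program extends itself to handle a new case.
handlers["cube"] = "result = value ** 3"
after = run("cube", 3)
```

Executing text assembled at run-time is flexible but easy to misuse, so fragments from untrusted sources should not be run this way.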

There are a variety of ways to classify programming languages. The distinctions are not clear-cut; implementations of a particular language standard may fall into more than one class. For example, a language may have both compiled and interpreted implementations.

History of programming languages

In the 1940s, when the first computers were created, programmers had to operate the machines by hand. At that time, computers were extremely expensive, and only Konrad Zuse imagined the use of a programming language (Plankalkül) like those of today for solving problems.

Several decades later, as the cost of computers has dropped significantly and the complexity of computer programs has increased dramatically, it has become apparent that development time is more valuable than computer time.

Newer integrated, visual development environments have brought clear progress. They have reduced the expenditure of time, money, and nerves. Regions of the screen that control the program can often be arranged interactively. Code fragments can be invoked just by clicking on a control. The work is also eased by prefabricated components and software libraries with re-usable code.

Recent languages are emphasising new features, like mix-ins, delegation, and aspects.

Object-oriented methodology can substantially reduce the complexity of programs.

Programming languages are important tools for helping software engineers write better programs faster.

Programming Paradigms

A programming paradigm is a general style or approach to writing computer programs or, more generally, software and software systems. A programming paradigm is often closely connected to a certain school of software architecture or software engineering.

A programming paradigm is often associated with a certain family of programming languages.

Languages

The following languages are major languages used by several hundred thousand to several million programmers worldwide:

COBOL
C
C++
Java
FORTRAN
Visual Basic
Delphi
Perl
PostScript, a stack-based language similar to Forth
Python
Ruby
Scheme - a variant of Lisp
Smalltalk