Database

A database is a collection of information organized for flexible searching and use. There is a wide array of databases, ranging from simple tabular collections to much more complex models such as the relational model.

The types of database are distinguished by many characteristics. One commonly used characteristic is the programming model associated with the database. Several models have been in wide use for some time.

Table of contents

1 Database models

2 Implementations and indexing

3 Mapping objects into databases

4 Applications of databases

5 Transactions and concurrency

Database models

The flat (or table) model is basically a two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password might be used as a part of a system security database. Each row would have the specific password associated with a specific user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers.
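
As a rough illustration of the flat model, the following Python sketch stores the name-and-password table described above as a list of rows and checks that every column holds values of its declared type. The column names `created` and `login_count`, the sample data, and both helper functions are assumptions made for the example, not part of any particular system.

```python
from datetime import date

# Illustrative schema: column name -> type every value in that column must have.
COLUMNS = {"name": str, "password": str, "created": date, "login_count": int}

# Each row describes one user; all values in a column share the column's type.
rows = [
    {"name": "alice", "password": "s3cret", "created": date(2001, 5, 1), "login_count": 12},
    {"name": "bob", "password": "hunter2", "created": date(2002, 1, 15), "login_count": 3},
]

def check_types(table, columns):
    """Return True if every row has exactly the declared columns and types."""
    return all(
        row.keys() == columns.keys()
        and all(isinstance(row[col], typ) for col, typ in columns.items())
        for row in table
    )

def password_for(table, name):
    """Scan the rows for a user and return the associated password, or None."""
    for row in table:
        if row["name"] == name:
            return row["password"]
    return None
```

Here `password_for(rows, "bob")` walks the table row by row, which is all a flat table without an index can do.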

The network model expands on the flat model by allowing multiple tables. A table column type can be defined as a reference to one or more entries of a different table. Thus, the tables are related by references, which can be viewed as a network structure. A particular subset of the network model, the hierarchical model, limits the relationships to a tree structure, instead of the more general directed graph structure implied by the full network model.
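
The difference between the network and hierarchical models can be sketched in Python with two small tables, one of which holds references into the other. The table contents and the helper names `is_hierarchical` and `members_of` are hypothetical; in this two-level sketch, a tree structure simply means that each child record references at most one parent.

```python
# Two tables; the "dept" column of employees holds references (keys)
# to entries of the departments table.
departments = {1: {"name": "Research"}, 2: {"name": "Sales"}}
employees = {
    10: {"name": "alice", "dept": [1]},
    11: {"name": "bob", "dept": [1, 2]},  # references more than one entry
}

def is_hierarchical(child_table, ref_column):
    """In this two-level layout, a tree allows at most one parent per child."""
    return all(len(rec[ref_column]) <= 1 for rec in child_table.values())

def members_of(dept_id):
    """Follow the references backwards: who points at this department?"""
    return sorted(rec["name"] for rec in employees.values() if dept_id in rec["dept"])
```

Because `bob` references two departments, the data forms a general graph rather than a tree, so `is_hierarchical(employees, "dept")` is false.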

Relational databases also initially appear very similar to a collection of flat database tables. The tables are called "relations", by analogy with the concept of a mathematical relation, and their rows are called tuples. The power of the relational model emerges when lookups on more than one table are combined: relational databases "join" tables, composing simple queries into complex queries that resemble those possible with network databases. The theory behind the relational model was developed by Edgar F. "Ted" Codd.
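
A join can be illustrated with SQL, used here through Python's standard `sqlite3` module as a convenient stand-in for a relational system. The `users` and `logins` tables and their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, name TEXT);
    CREATE TABLE logins (user_id INTEGER, day TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO logins VALUES (1, '2002-03-01'), (1, '2002-03-02'), (2, '2002-03-02');
""")

# Neither table stores a pointer to the other; the join matches rows
# purely on equal user_id values.
joined = conn.execute("""
    SELECT users.name, logins.day
    FROM users JOIN logins ON users.user_id = logins.user_id
    ORDER BY users.name, logins.day
""").fetchall()
```

The result pairs each user name with each of that user's login days, although the connection between the two tables was never declared in advance.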

Unlike in network databases, the relationships between relations are not explicitly encoded in the definition of the relation variables. Instead, the presence of attributes on the same domain potentially implies a relationship between the tables. Each tuple in a relation represents a proposition: an instance of the relation's predicate that is asserted to be true. Operations are performed on relations using relational calculus or relational algebra. As a result, relational databases can be flexibly reorganized and reused in ways their original designers did not foresee. Because of this flexibility, many business and personal databases are based on loose derivations of the relational model.

Implementations and indexing

All of these kinds of database can take advantage of indexing to increase their speed. An index is a sorted list of the contents of some particular table column, with pointers to the row associated with each value. An index allows the set of table rows matching some criterion to be located quickly. Common indexing techniques include B-trees, hashes, and linked lists. However, only relational DBMSs can take advantage of the inherent independence of the physical implementation from the logical model to create and adapt access plans that perform reasonably under different kinds of loads, against varying databases, and for different kinds of applications.
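
The idea of an index as a sorted list with pointers to rows can be sketched in Python as follows. This is a minimal illustration that uses a sorted list with binary search in place of a B-tree or hash; the sample rows and the function names `build_index` and `lookup` are assumptions made for the example.

```python
import bisect

rows = [
    {"name": "carol", "age": 41},
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
    {"name": "alice", "age": 52},
]

def build_index(table, column):
    """Sorted list of (value, row_position) pairs for one column."""
    return sorted((row[column], pos) for pos, row in enumerate(table))

def lookup(index, table, value):
    """Binary-search the index, then follow the stored positions to the rows."""
    matches = []
    # (value,) sorts before every (value, position) pair with the same value.
    i = bisect.bisect_left(index, (value,))
    while i < len(index) and index[i][0] == value:
        matches.append(table[index[i][1]])
        i += 1
    return matches

name_index = build_index(rows, "name")
```

A call such as `lookup(name_index, rows, "alice")` finds both matching rows without scanning the whole table, at the cost of keeping the index up to date whenever rows change.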

Mapping objects into databases

In recent years, the object-oriented paradigm has been applied to databases as well, creating a new programming model known as object databases. Objects are sometimes mistakenly identified with table rows, whereas they are really the equivalent of values or variables, depending on the definition in use. Classes correspond more clearly to domains, but most DBMS vendors still get this wrong. There is no consistent, precise, agreed-upon object-oriented database model. Polymorphism further complicates object databases, because clients in general have access only to an abstract pointer to the object being accessed; in other words, unlike in the well-defined relational model, in the object model the application program does not know the concrete type of the objects it is accessing. As in SQL, an object database can include stored behaviors, which can be triggered when operations are performed on an object. This requires maintaining two sets of code: the application code and the database code. Another approach is merely to store object fields in the database, which requires instantiating the actual objects within the memory space of the application program before the methods (behaviors) of an object can be accessed. This also raises consistency issues when multiple clients, written in different languages, access objects in the database.

Object and SQL databases have converged over time, and object systems today are often implemented atop a SQL database foundation. Such systems are called object-relational databases, a misnomer since SQL is not truly relational. One school of thought holds that there is a fundamental impedance mismatch between the object and SQL data models, so that so-called object-relational databases require so-called object-relational mapping software to map between the SQL model (row-oriented, ad hoc querying) and the object-oriented model (polymorphic types, navigational traversal of objects). This mapping always introduces inefficiency, but may be the only choice for the many systems that mandate a SQL database back-end (for legacy reasons). Writing "object-relational" mapping software is a non-trivial task because of the polymorphic nature of containers in the object paradigm; it is not as simple as copying all the fields of a record into an object. Because of polymorphism, traversing a link (abstract pointer) to an object, or issuing a query on an abstract type, returns an object whose concrete type must be known to the database back-end, so that it can construct the appropriate concrete type with all of its concrete fields. The application program, by contrast, knows only the abstract type: it has access only to the fields and methods defined in the abstract parent class, but is guaranteed polymorphic behavior (late binding) on the concrete object accessed through the abstract pointer. There are several design patterns and research papers on writing object-relational mapping software; additionally, there are many free-software and commercial solutions, with varying degrees of flexibility.
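
The role of the concrete type in such a mapping can be sketched in Python with the standard `sqlite3` module. The sketch assumes one possible mapping approach among several: a single table for the whole class hierarchy, with a `kind` column recording the concrete type of each row. The `Shape` classes, the `shapes` table, and the `load_shape` function are hypothetical names chosen for the example.

```python
import math
import sqlite3

class Shape:
    """Abstract type; the only type the application code refers to."""
    def area(self):
        raise NotImplementedError

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2

class Rect(Shape):
    def __init__(self, width, height):
        self.width, self.height = width, height
    def area(self):
        return self.width * self.height

# Assumed mapping: one table for the hierarchy, with a "kind" column
# that records which concrete class each row belongs to.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shapes (id INTEGER PRIMARY KEY, kind TEXT,"
    " radius REAL, width REAL, height REAL)"
)
conn.execute("INSERT INTO shapes VALUES (1, 'circle', 2.0, NULL, NULL)")
conn.execute("INSERT INTO shapes VALUES (2, 'rect', NULL, 3.0, 4.0)")

def load_shape(shape_id):
    """Return a Shape; the mapper, not the caller, picks the concrete class."""
    kind, radius, width, height = conn.execute(
        "SELECT kind, radius, width, height FROM shapes WHERE id = ?", (shape_id,)
    ).fetchone()
    if kind == "circle":
        return Circle(radius)
    if kind == "rect":
        return Rect(width, height)
    raise ValueError(f"unknown kind: {kind}")
```

A caller writes only `load_shape(2).area()` and relies on late binding; the mapping layer has to consult the stored `kind` before it can decide which fields to copy and which class to construct.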

Another school of thought maintains that the issue is not a true mismatch and that it may be resolved by a series of 'inversions' and 'transformations' of the two models.

In a relational DBMS, a user could define his own domains, which could map nicely to classes. Not only would most, if not all, of the impedance mismatch problem go away, but much application logic and many integrity constraints could also be implemented in the database, greatly simplifying both coding and operation.

Applications of databases

Databases are used in many applications, spanning virtually the entire range of computer software. Databases are the preferred method of storage for large multiuser applications, where coordination between many users is needed. Even individual users find them convenient, though, and many electronic mail programs and personal organizers are based on standard database technology.

Transactions and concurrency

In addition to their data model, most practical databases attempt to enforce a database transaction model that has desirable data integrity properties. The requirements most commonly used are the so-called ACID rules:

Atomicity - either all operations of a transaction are completed or none are (UNDO)
Consistency - all transactions must leave the database in a consistent state
Isolation - transactions cannot interfere with each other
Durability - successful transactions must persist through crashes (REDO)
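
Atomicity and consistency can be illustrated with a small Python sketch using the standard `sqlite3` module. The `accounts` table, the non-negative balance rule, and the `transfer` and `balances` functions are assumptions made for the example; the sketch credits one account before debiting the other so that a failed debit must undo the earlier credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(src, dst, amount):
    """Move funds as one transaction; True if committed, False if undone."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

def balances():
    """Current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

A transfer larger than the source balance violates the constraint on the second statement, and the rollback leaves both balances exactly as they were before the transaction began.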

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
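
One common way to test whether a schedule is serializable is to check conflict serializability, which is an assumption here since the text does not say which criterion a DBMS applies. The following Python sketch builds a precedence graph from a schedule of read and write operations and reports whether the graph is free of cycles; the schedule format and function names are invented for the example.

```python
def precedence_edges(schedule):
    """schedule: list of (transaction, action, item), action 'r' or 'w'.

    Returns edges (Ti, Tj) meaning an operation of Ti conflicts with a
    later operation of Tj on the same item (at least one is a write).
    """
    edges = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (a1, a2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """True if the precedence graph of the schedule has no cycle."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove transactions with no incoming edge from the
    # remaining ones; if none can be removed, a cycle exists.
    while nodes:
        free = {
            n for n in nodes
            if not any(dst == n and src in nodes for src, dst in edges)
        }
        if not free:
            return False
        nodes -= free
    return True

serial_like = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
interleaved = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
```

In `interleaved`, T1 reads `x` before T2 writes it and T2 writes `x` before T1 writes it, so the graph contains a cycle and the schedule is rejected, whereas `serial_like` is accepted.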