(image from DOE Human Genome project (http://www.ornl.gov/hgmis))
Deoxyribonucleic acid (DNA) is the primary chemical component of chromosomes and the material of which genes are made. It is sometimes called the "molecule of heredity," because parents transmit copied portions of their own DNA to offspring during reproduction and because in doing so they propagate their traits.
In fact, the units of DNA that reside in the nucleus of eukaryotic cells, and DNA pieces as people typically think of them, are not single molecules. Rather, they are pairs of molecules, which entwine like vines to form a "double helix" (top half of the illustration at the right).
Each vine-like molecule, or strand of DNA, is a chemically linked chain of nucleotides, each of which consists of a deoxyribose sugar, a phosphate, and one of four varieties of "aromatic" bases. Because DNA strands are composed of these nucleotide subunits, they are polymers.
The diversity of the bases means that four distinct kinds of nucleotide exist, which are commonly referred to by the identity of their base. These are adenine (A), thymine (T), cytosine (C), and guanine (G).
In a DNA double helix, two polynucleotide strands come together through complementary pairing of the bases, which occurs by hydrogen bonding. Each base forms hydrogen bonds readily to only one other—A to T and C to G—so that the identity of the base on one strand dictates what base must face it on the opposing strand. Thus the entire nucleotide sequence of each strand is complementary to that of the other, and when separated, each may act as a template with which to replicate the other from free nucleotides (middle and lower half of the illustration at the right).
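The pairing rule can be written as a simple lookup. The following is a minimal Python sketch, not a standard library routine; the names PAIRING and facing_bases are chosen here for illustration only, and strand direction is ignored.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def facing_bases(strand: str) -> str:
    """Return the base that must face each position of `strand`.

    The result is listed position by position, in the same left-to-right
    order as the input; strand direction is ignored in this sketch.
    """
    return "".join(PAIRING[base] for base in strand.upper())

template = "ATGCCGTA"
partner = facing_bases(template)
print(template)  # ATGCCGTA
print(partner)   # TACGGCAT
```

Applying the function to its own output returns the original sequence, which is the sense in which each strand can serve as a template for the other.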
Because pairing causes the nucleotide bases to face the helical axis, the sugar and phosphate groups of the nucleotides run along the outside, and the two chains they form are sometimes called the "backbones" of the helix. In fact, it is chemical bonds between the phosphates and the sugars that link one nucleotide to the next in the DNA strand.
Because hydrogen bonds are weak compared to covalent chemical bonds, the strands of the double helix can be easily separated by enzymes or even, as in PCR, by gentle heating. On the other hand, gentle heating works only on pieces of DNA that are less than about 10,000 base pairs (10 kilobase pairs, or 10 kbp) long. The intertwining of the DNA strands makes long segments difficult to separate. Enzymes known as helicases unwind the strands to facilitate the advance of sequence-reading enzymes such as DNA polymerase. The unwinding builds up torsional strain, and relieving it requires that the phosphate backbone of one of the strands be chemically cleaved so that it can swivel around the other; this cleavage is carried out by separate enzymes rather than by the helicases themselves.
When the ends of a piece of double-helical DNA are joined so that it forms a circle, as in plasmid DNA, the strands are topologically knotted. This means they cannot be separated by gentle heating or by any process that does not involve breaking a strand. The task of unknotting topologically linked strands of DNA falls to enzymes known as topoisomerases. Some of these enzymes unknot circular DNA by cleaving two strands so that another double-stranded segment can pass through. Unknotting is required for the replication of circular DNA as well as for various types of recombination in linear DNA.
The DNA helix can assume one of three slightly different geometries, of which the "B" form described by James Watson and Francis Crick is believed to predominate in cells. It is 2 nanometers wide and extends 3.4 nanometers per 10 bp of sequence. This is also the approximate length of sequence in which the helix makes one complete turn about its axis (a parameter that depends on stacking interactions between the bases).
The narrow breadth of the double helix makes it impossible to detect by conventional electron microscopy, except by heavy staining. At the same time, the DNA found in many cells can be macroscopic in length—approximately 5 centimeters long for strands in a human chromosome. Consequently, cells must compact or "package" DNA to carry it within them. This is one of the functions of the chromosomes, which contain spool-like proteins known as histones, around which DNA winds.
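The quoted dimensions can be checked with a short calculation. This Python sketch uses the B-form figure of 3.4 nanometers per 10 bp; the chromosome size of 150 million bp is a round number assumed here for illustration, not a value taken from the text.

```python
RISE_NM_PER_BP = 3.4 / 10  # B-form figure quoted in the text: 3.4 nm per 10 bp

def contour_length_cm(base_pairs: int) -> float:
    """Length of a fully extended B-form helix, in centimeters."""
    length_nm = base_pairs * RISE_NM_PER_BP
    return length_nm * 1e-7  # 1 nm = 1e-7 cm

# Assumption: 150 million bp is a round, illustrative chromosome size.
print(f"{contour_length_cm(150_000_000):.1f} cm")  # 5.1 cm
```

A molecule only 2 nanometers wide but several centimeters long is what makes packaging necessary.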
The B form of the DNA helix twists 360° per 10.6 bp in the absence of strain. But many molecular biological processes can induce strain. A DNA segment with excess or insufficient helical twisting is referred to, respectively, as positively or negatively "supercoiled".
The two other known double-helical forms of DNA, called A and Z, differ modestly in their geometry and dimensions. The A form appears likely to occur only in dehydrated samples of DNA, such as those used in crystallography experiments, and possibly in hybrid pairings of DNA and RNA strands. Segments of DNA that cells have methylated for regulatory purposes may adopt the Z geometry, in which the strands turn about the helical axis like a mirror image of the B form.
Within a gene, the identity of the nucleotides and the exact sequence in which they appear along a DNA strand decide the amino acid sequence of a protein. Thus, gene sequences are "translated" into (or "encode") amino acid sequences of proteins. The rules cells use for translation are described by the genetic code.
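As an illustration of how a sequence is read three nucleotides at a time, here is a minimal Python sketch. It includes only ten of the 64 codons of the standard genetic code, written as DNA sense-strand triplets; the function name translate and the example sequence are made up for this illustration.

```python
# A small excerpt of the standard genetic code (DNA sense-strand codons).
# Only 10 of the 64 codons are listed; the rest are omitted for brevity.
CODON_TABLE = {
    "ATG": "Met", "TGG": "Trp",
    "TTT": "Phe", "TTC": "Phe",
    "AAA": "Lys", "AAG": "Lys",
    "GGC": "Gly",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}

def translate(sense_dna: str) -> list[str]:
    """Read codons left to right in steps of three until a stop codon or the end."""
    protein = []
    for i in range(0, len(sense_dna) - 2, 3):
        residue = CODON_TABLE[sense_dna[i:i + 3]]  # KeyError if the codon is not in the excerpt
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

# Hypothetical example sequence, not a real gene.
print(translate("ATGTTTAAGGGCTGGTAA"))  # ['Met', 'Phe', 'Lys', 'Gly', 'Trp']
```

Several codons map to the same amino acid (TTT and TTC both give Phe here), so the mapping from nucleotide sequence to protein sequence is many-to-one.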
In many species of organisms, only a small fraction of the total sequence of the genome appears to encode protein. The function of the rest is a matter of speculation. It is known that certain nucleotide sequences specify affinity for DNA-binding proteins, which play a wide variety of vital roles, such as the control of replication and transcription. These sequences are frequently called regulatory sequences, and researchers assume that so far they have identified only a tiny fraction of the total that exist. "Junk DNA" represents sequences that do not yet appear to contain genes or to have a function.
Sequence also determines a DNA segment's susceptibility to cleavage by restriction enzymes, the quintessential tools of genetic engineering. The positions of cleavage sites throughout an individual's genome determine one kind of "DNA fingerprint" for that individual.
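The dependence of cleavage on sequence can be sketched in Python. The example uses the restriction enzyme EcoRI, which recognizes GAATTC and cuts after the first G; the two sample sequences and the function names are invented for illustration, and only one strand of a linear molecule is considered.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts after the first G: G^AATTC

def cut_positions(seq: str, site: str = ECORI_SITE,
                  offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Return the index of every cut in `seq` (position after which the strand is cleaved)."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions

def fragment_lengths(seq: str) -> list[int]:
    """Lengths of the pieces left after cutting a linear sequence at every site."""
    cuts = [0] + cut_positions(seq) + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:])]

# Hypothetical sequences that differ at a single base.
sample_a = "TTGAATTCAAACCCGAATTCGG"
sample_b = "TTGAATTCAAACCCGACTTCGG"  # the change destroys the second site

print(fragment_lengths(sample_a))  # [3, 12, 7]
print(fragment_lengths(sample_b))  # [3, 19]
```

A single-base difference changes the set of fragment lengths, which is why patterns of restriction fragments can distinguish one individual's DNA from another's.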
The asymmetric shape and linkage of nucleotides give DNA strands an orientation or directionality. Because of this discernible directionality, close inspection of a double helix reveals that, while the nucleotides of one strand are "ascending," those of the other are "descending." This arrangement of DNA strands is sometimes described as "antiparallel."
For reasons of chemical nomenclature, people who work with DNA refer to the asymmetric termini of each strand as the 5' and 3' ends (pronounced "five prime" and "three prime"). By convention, DNA workers write and read nucleotide sequences in the "five-prime-to-three-prime" direction, which is also the direction in which polymerases assemble new strands.
As a result of their antiparallel arrangement, even if the sequences on opposing strands of DNA were identical rather than complementary (as they always are), cells could properly translate a gene into a protein from only one of the two strands, because they can read the sequence of the other strand only in the reverse of the proper order. Of course, the sequences of paired strands do not merely run in opposite directions; their bases at every position are also the complements of one another. A translated or translatable sequence is called a "sense" sequence, and its complement is the "antisense" sequence. Somewhat confusingly, it follows that the antisense strand is the template for transcription. The resulting transcript is an RNA replica of the sense strand and is itself a sense sequence.
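The relationship between sense strand, antisense strand, and transcript can be traced in a short Python sketch. All sequences are written 5' to 3'; the function names and the six-base example are invented for illustration, and the substitution of U for T reflects the fact that RNA carries uracil where DNA carries thymine.

```python
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand_5to3: str) -> str:
    """Return the opposing strand, also written 5' to 3'."""
    return "".join(PAIRING[base] for base in reversed(strand_5to3))

def transcribe(antisense_5to3: str) -> str:
    """RNA built by pairing against the antisense template, written 5' to 3'.

    RNA uses U (uracil) in place of T (thymine).
    """
    return reverse_complement(antisense_5to3).replace("T", "U")

sense = "ATGGCC"                       # hypothetical sense sequence
antisense = reverse_complement(sense)  # the strand that pairs with it
rna = transcribe(antisense)

print(antisense)  # GGCCAT
print(rna)        # AUGGCC
```

The transcript matches the sense strand letter for letter apart from U replacing T, even though it was copied from the antisense strand.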
The fact that the 3' end of one DNA strand flanks the 5' end of the other makes the arrangement a "crab canon".
Working in the 19th century, biochemists initially isolated DNA and RNA together from cell nuclei. They were relatively quick to appreciate the polymeric nature of their "nucleic acid" isolates, but realized only later that nucleotides were of two types—one containing ribose and the other deoxyribose. It was this subsequent discovery that led to the identification and naming of DNA as a substance distinct from RNA. Not until 1943 did Oswald Theodore Avery provide the first compelling evidence that DNA could carry genetic information.
How this could be true was unimaginable at the time. Because chemical dissection of DNA samples always yielded the same four nucleotides, the chemical composition of DNA appeared simple, perhaps even uniform. Organisms, on the other hand, are fantastically complex individually and widely diverse collectively. Geneticists did not speak of genes as conveyors of "information" in such words, but if they had, they would not have hesitated to quantify the amount of information that genes need to convey as vast. The idea that information might reside in a chemical in the same way that it exists in text—as a finite alphabet of letters arranged in a sequence of unlimited length—had not yet been conceived. It would emerge upon the discovery of DNA's structure, but few researchers imagined that DNA's structure had much to say about genetics.
In the 1950s, only a few groups made it their goal to determine the structure of DNA. These included an American group led by Linus Pauling, and two in England. At Cambridge University, Crick and Watson were building physical models using metal rods and balls, in which they incorporated the known chemical structures of the nucleotides, as well as the known position of the linkages joining one nucleotide to the next along the polymer. At King's College, London, Maurice Wilkins and Rosalind Franklin were examining X-ray diffraction patterns of DNA fibers.
A key inspiration in the work of all of these teams was the discovery in 1948 by Pauling that many proteins included helical shapes (see alpha helix). Pauling had deduced this structure from X-ray patterns. Even in the initial crude diffraction data from DNA, it was evident that the structure involved helices. But this insight was only a beginning. There remained the questions of how many strands came together, whether this number was the same for every helix, whether the bases pointed toward the helical axis or away, and ultimately what were the explicit angles and coordinates of all the bonds and atoms. Such questions motivated the modeling efforts of Watson and Crick.
In their modeling, Watson and Crick restricted themselves to what they saw as chemically and biologically reasonable. Still, the breadth of possibilities was very wide. A breakthrough occurred in 1952, when Erwin Chargaff visited Cambridge and inspired Crick with a description of experiments Chargaff had published in 1947. Chargaff had observed that the proportions of the four nucleotides vary between one DNA sample and the next, but that for particular pairs of nucleotides—adenine and thymine, guanine and cytosine—the two nucleotides are always present in equal proportions.
Watson and Crick had begun to contemplate double helical arrangements, and they saw that by reversing the directionality of one strand with respect to the other, they could provide an explanation for Chargaff's puzzling finding. This explanation was the complementary pairing of the bases, which also had the effect of ensuring that the distance between the phosphate chains did not vary along a sequence. Watson and Crick were able to discern that this distance was constant and to measure its exact value of 2 nanometers from an X-ray pattern obtained by Franklin. The same pattern also gave them the 3.4 nanometer-per-10 bp "pitch" of the helix. The pair quickly converged upon a model, which they announced before Franklin herself published any of her work.
The great assistance Watson and Crick derived from Franklin's data has become a subject of controversy, and it has angered people who believe Franklin has not received the credit due to her. The most controversial aspect is that Franklin's critical X-ray pattern was shown to Watson and Crick without Franklin's knowledge or permission. Wilkins showed it to them at his lab while Franklin was away.
Watson and Crick's model attracted great interest immediately upon its presentation. Arriving at their conclusion on February 21, 1953, Watson and Crick made their first announcement on February 28. Their paper 'A Structure for Deoxyribose Nucleic Acid' (http://www.nature.com/genomics/human/watson-crick/) was published on April 25. In an influential presentation in 1957, Crick laid out the "Central Dogma", which foretold the relationship between DNA, RNA, and proteins, and articulated the "sequence hypothesis." A critical confirmation of the replication mechanism that was implied by the double-helical structure followed in 1958 in the form of the Meselson-Stahl experiment. Work by Crick and coworkers deciphered the genetic code not long afterward. These findings represent the birth of molecular biology.