In higher eukaryotes, DNA regions encoding proteins (that is, genes) lie amidst this expanse of apparently nonfunctional DNA
In molecular terms, a gene commonly is defined as the entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (polypeptide or RNA)
According to this definition, a gene includes more than the nucleotides encoding the amino acid sequence of a protein, referred to as the coding region
Although most genes are transcribed into mRNAs, which encode proteins, some DNA sequences clearly are transcribed into RNAs that do not encode proteins (e.g., tRNAs and rRNAs)
Within a bacterial polycistronic mRNA, a ribosome-binding site is located near the start site for each of the protein-coding regions, or cistrons, in the mRNA
Translation initiation can begin at any of these multiple internal sites, producing multiple proteins
Most genes in multicellular animals and plants contain introns, which are removed during RNA processing
In many cases, the introns in a gene are considerably longer than the exons
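As a rough illustration of intron removal, the sketch below joins the exons of a toy primary transcript. The sequences, the coordinates, and the function name `splice` are hypothetical, and splice-site recognition is not modeled; the example only shows that the intron can be longer than the exons it separates.

```python
def splice(pre_mrna, exons):
    """Join exon segments of a primary transcript, discarding introns.

    exons: list of (start, end) pairs, 0-based and end-exclusive,
    listed in 5'->3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript (hypothetical): exons in upper case, intron in lower case.
exon1, intron, exon2 = "AUGGCC", "gu" + "a" * 18 + "ag", "GAUUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mrna = splice(pre_mrna, exons)
intron_len = len(pre_mrna) - len(mrna)  # bases removed during processing
```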
The cluster of genes that form a bacterial operon comprises a single transcription unit, which is transcribed from a particular promoter into a single primary transcript
The primary transcript produced from a simple transcription unit is processed to yield a single type of mRNA, encoding a single protein
Mutations in exons, introns, and transcription-control regions all may influence the expression of the protein encoded by a simple transcription unit
Comparisons of the total chromosomal DNA per cell in various species first suggested that much of the DNA in certain organisms does not encode RNA or have any apparent regulatory or structural function
The unicellular protozoal species Amoeba dubia has 200 times more DNA per cell than humans
Many plant species also have considerably more DNA per cell than humans have
Detailed sequencing and identification of exons in chromosomal DNA have provided direct evidence that the genomes of higher eukaryotes contain large amounts of noncoding DNA.
In multicellular organisms, roughly 25–50 percent of the protein-coding genes are represented only once in the haploid genome and thus are termed solitary genes
A well-studied example of a solitary protein-coding gene is the chicken lysozyme gene
The 15-kb DNA sequence encoding chicken lysozyme constitutes a simple transcription unit containing four exons and three introns
Duplicated genes constitute the second group of protein-coding genes
These are genes with close but nonidentical sequences that generally are located within 5–50 kb of one another
In vertebrate genomes, duplicated genes probably constitute half the protein-coding DNA sequences
The genes encoding the β-like globins are a good example of a gene family
The different β-globin genes probably arose by duplication of an ancestral gene, most likely as the result of an “unequal crossover” during meiotic recombination in a developing germ cell (egg or sperm)
Two regions in the human β-like globin gene cluster contain nonfunctional sequences, called pseudogenes, similar to those of the functional β-like globin genes
Several different gene families encode the various proteins that make up the cytoskeleton
These proteins are present in varying amounts in almost all cells
In vertebrates, the major cytoskeletal proteins are the actins, tubulins, and intermediate filament proteins like the keratins
In vertebrates and invertebrates, the genes encoding rRNAs and some other noncoding RNAs, such as some of the snRNAs involved in RNA splicing, occur as tandemly repeated arrays
These are distinguished from the duplicated genes of gene families in that the multiple tandemly repeated genes encode identical or nearly identical proteins or functional RNAs
The tandemly repeated rRNA, tRNA, and histone genes are needed to meet the great cellular demand for their transcripts
All eukaryotes, including yeasts, contain 100 or more copies of the genes encoding 5S rRNA and the large and small subunit rRNAs
The importance of repeated rRNA genes is illustrated by Drosophila mutants called bobbed (because they have stubby wings), which lack a full complement of the tandemly repeated pre-rRNA genes
Besides duplicated protein-coding genes and tandemly repeated genes, eukaryotic cells contain multiple copies of other DNA sequences in the genome, generally referred to as repetitious DNA
Of the two main types of repetitious DNA, the less prevalent is simple-sequence DNA, which constitutes about 3 percent of the human genome and is composed of perfect or nearly perfect repeats of relatively short sequences
Simple-sequence DNA is commonly called satellite DNA because in early studies of DNAs from higher organisms using equilibrium buoyant-density ultracentrifugation some simple-sequence DNAs banded at a different position from the bulk of cellular DNA
Simple-sequence DNA located at centromeres may assist in attaching chromosomes to spindle microtubules during mitosis
Within a species, the nucleotide sequences of the repeat units composing simple-sequence DNA tandem arrays are highly conserved among individuals
The second type of repetitious DNA in eukaryotic genomes, termed interspersed repeats (also known as moderately repeated DNA, or intermediate-repeat DNA), is composed of a very large number of copies of relatively few sequence families
Because moderately repeated DNA sequences have the unique ability to “move” in the genome, they are called mobile DNA elements (or transposable elements)
Although mobile DNA elements, ranging from hundreds to a few thousand base pairs in length, originally were discovered in eukaryotes, they also are found in prokaryotes
The process by which these sequences are copied and inserted into a new site in the genome is called transposition
Barbara McClintock discovered the first mobile elements while doing classical genetic experiments in maize (corn) during the 1940s
She characterized genetic entities that could move into and back out of genes, changing the phenotype of corn kernels
Her theories were very controversial until similar mobile elements were discovered in bacteria, where they were characterized as specific DNA sequences, and the molecular basis of their transposition was deciphered
Most mobile elements in bacteria transpose directly as DNA.
Most mobile elements in eukaryotes are retrotransposons, but eukaryotic DNA transposons also occur
The original mobile elements discovered by Barbara McClintock are DNA transposons
Bacterial Insertion Sequences: The first molecular understanding of mobile elements came from the study of certain E. coli mutations caused by the spontaneous insertion of a DNA sequence, ≈1–2 kb long, into the middle of a gene
Eukaryotic DNA Transposon: McClintock’s original discovery of mobile elements came from the observation of certain spontaneous mutations in maize that affect the production of any of the several enzymes required to make anthocyanin, a purple pigment in maize kernels
Mutant kernels are white, and wild-type kernels are purple
One class of these mutations is revertible at high frequency, whereas the second class of mutations does not revert unless they occur in the presence of the first class of mutations
DNA transposition by the cut-and-paste mechanism can result in an increase in the copy number of a transposon when it occurs during S phase, the period of the cell cycle when DNA synthesis occurs
This happens when the donor DNA is from one of the two daughter DNA molecules in a region of a chromosome that has replicated and the target DNA is in the region that has not yet been replicated
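The copy-number increase described above can be traced with a small bookkeeping sketch. The site names "A" and "B", the sister labels, and the function name are hypothetical; the model tracks only whether a transposon occupies a site and ignores everything else about replication.

```python
def transpose_during_s_phase():
    """Track transposon copies when cut-and-paste occurs mid-replication."""
    # Site "A" lies in a region that has already replicated, so each sister
    # DNA molecule carries its own copy of the transposon there.
    sisters = {"sister1": {"A": True}, "sister2": {"A": True}}
    # Site "B" lies in a region not yet replicated; it is still a single
    # duplex and starts out empty (True means a transposon is present).
    unreplicated_b = False

    # Cut-and-paste: excise from site A on one sister, insert at site B.
    sisters["sister1"]["A"] = False
    unreplicated_b = True

    # The replication fork then passes B, so both sisters inherit its state.
    for sites in sisters.values():
        sites["B"] = unreplicated_b

    # Count transposon copies on each finished daughter molecule.
    return {name: sum(sites.values()) for name, sites in sisters.items()}


copies = transpose_during_s_phase()
```

In this toy model each daughter molecule would have carried one copy without transposition; after it, one daughter still carries one copy and the other carries two.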
The genomes of all eukaryotes studied, from yeast to humans, contain retrotransposons, mobile DNA elements that transpose through an RNA intermediate utilizing a reverse transcriptase
These mobile elements are divided into two major categories, those containing and those lacking long terminal repeats (LTRs)
A key step in the retroviral life cycle is the formation of retroviral genomic RNA from integrated retroviral DNA
This process serves as a model for the generation of the RNA intermediate during the transposition of LTR retrotransposons
The resulting retroviral RNA genome, which lacks a complete LTR, is packaged into a virion that buds from the host cell.
After a retrovirus infects a cell, reverse transcription of its RNA genome by the retrovirus-encoded reverse transcriptase yields a double-stranded DNA containing complete LTRs
Integrase, another enzyme encoded by retroviruses that is closely related to the transposase of some DNA transposons, uses a similar mechanism to insert the double-stranded retroviral DNA into the host cell genome
The most abundant mobile elements in mammals are retrotransposons that lack LTRs, sometimes called non-viral retrotransposons
These moderately repeated DNA sequences form two classes in mammalian genomes: long interspersed elements (LINEs) and short interspersed elements (SINEs)
LINEs: Human DNA contains three major families of LINE sequences that are similar in their mechanism of transposition, but differ in their sequences: L1, L2, and L3
Only members of the L1 family transpose in the contemporary human genome
LINE sequences are present at ≈900,000 sites in the human genome, accounting for a staggering 21 percent of total human DNA
Since LINEs do not contain LTRs, their mechanism of transposition through an RNA intermediate differs from that of LTR retrotransposons
ORF1 and ORF2 proteins are translated from a LINE RNA
In vitro studies indicate that transcription by RNA polymerase II is directed by promoter sequences at the left end of integrated LINE DNA
LINE RNA is polyadenylated by the same post-transcriptional mechanism that polyadenylates other mRNAs
SINEs: The second most abundant class of mobile elements in the human genome, SINEs constitute ≈13 percent of total human DNA
Varying in length from about 100 to 400 base pairs, these retrotransposons do not encode protein, but most contain a 3’ A/T-rich sequence similar to that in LINEs
SINEs are transcribed by RNA polymerase III, the same nuclear RNA polymerase that transcribes genes encoding tRNAs
SINEs occur at about 1.6 million sites in the human genome
Of these, ≈1.1 million are Alu elements, so named because most of them contain a single recognition site for the restriction enzyme AluI
Alu elements exhibit considerable sequence homology with and may have evolved from 7SL RNA, a component of the signal-recognition particle
Similar to other mobile elements, most SINEs have accumulated mutations from the time of their insertion in the germ line of an ancient ancestor of modern humans
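The SINE figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes a haploid human genome of about 3.2 × 10^9 bp, a value that is not stated in these notes; the function name is also hypothetical. Under that assumption, the implied mean SINE length should fall inside the quoted 100 to 400 bp range.

```python
GENOME_BP = 3.2e9  # ASSUMPTION: approximate haploid human genome size


def mean_element_length(fraction_of_genome, n_sites, genome_bp=GENOME_BP):
    """Average length per insertion site implied by genome share and count."""
    return fraction_of_genome * genome_bp / n_sites


# Figures taken from the notes: SINEs are ~13 percent of the genome at
# ~1.6 million sites, of which ~1.1 million are Alu elements.
sine_mean_bp = mean_element_length(0.13, 1.6e6)
alu_share = 1.1e6 / 1.6e6  # fraction of SINE sites that are Alu elements
```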
Although mobile DNA elements appear to have no direct function other than to maintain their own existence, their presence probably had a profound impact on the evolution of modern-day organisms
About half the spontaneous mutations in Drosophila result from the insertion of a mobile DNA element into or near a transcription unit
In lineages leading to higher eukaryotes, homologous recombination between mobile DNA elements dispersed throughout ancestral genomes may have generated gene duplications and other DNA rearrangements during evolution
Evidence suggests that during the evolution of higher eukaryotes, recombination between interspersed repeats in introns of two separate genes also occurred, generating new genes made from novel combinations of preexisting exons
Transcription of many genes is controlled through the combined effects of several enhancer elements
During interphase, when cells are not dividing, the genetic material exists as a nucleoprotein complex called chromatin, which is dispersed through much of the nucleus
Eukaryotic Nuclear DNA Associates with Histone Proteins to Form Chromatin
When the DNA from eukaryotic nuclei is isolated in isotonic buffers (i.e., buffers with the same salt concentration found in cells, ≈0.15 M KCl), it is associated with an equal mass of protein as chromatin
The most abundant proteins associated with eukaryotic DNA are histones, a family of small, basic proteins present in all eukaryotic nuclei
The five major types of histone proteins—termed H1, H2A, H2B, H3, and H4—are rich in positively charged basic amino acids, which interact with the negatively charged phosphate groups in DNA
The amino acid sequences of four histones (H2A, H2B, H3, and H4) are remarkably similar among distantly related species
The amino acid sequence of H1 varies more from organism to organism than do the sequences of the other major histones
In certain tissues, H1 is replaced by special histones
When chromatin is extracted from nuclei and examined in the electron microscope, its appearance depends on the salt concentration to which it is exposed
At a low salt concentration in the absence of divalent cations such as Mg2+, isolated chromatin resembles “beads on a string”
Structure of Nucleosomes: The DNA component of nucleosomes is much less susceptible to nuclease digestion than is the linker DNA between them
If nuclease treatment is carefully controlled, all the linker DNA can be digested, releasing individual nucleosomes with their DNA component
Nucleosomes from all eukaryotes contain 147 base pairs of DNA wrapped slightly less than two turns around the protein core
In cells, newly replicated DNA is assembled into nucleosomes shortly after the replication fork passes
But when isolated histones are added to DNA in vitro at physiological salt concentration, nucleosomes do not spontaneously form
Structure of Condensed Chromatin: When extracted from cells in isotonic buffers, most chromatin appears as fibers ≈30 nm in diameter
In these condensed fibers, nucleosomes are thought to be packed into an irregular spiral or solenoid arrangement, with approximately six nucleosomes per turn
The chromatin in chromosomal regions that are not being transcribed exists predominantly in the condensed 30-nm fiber form
Each of the histone proteins making up the nucleosome core contains a flexible amino terminus of 11–37 residues extending from the fixed structure of the nucleosome; these termini are called histone tails
Each H2A also contains a flexible C-terminal tail
The histone tails are required for chromatin to condense from the beads-on-a-string conformation into the 30-nm fiber
The histone tails can also bind to other proteins associated with chromatin that influence chromatin structure and processes such as transcription and DNA replication
The interaction of histone tails with these proteins can be regulated by a variety of covalent modifications of histone tail amino acid side chains
Multiple types of covalent modifications of histone tails can influence chromatin structure by altering histone-DNA interactions and interactions between nucleosomes and by controlling interactions with additional proteins that participate in the regulation of transcription
Genetic studies in yeast indicate that specific histone acetylases are required for the full activation of transcription of a number of genes
Although histones are the predominant proteins in chromosomes, nonhistone proteins are also involved in organizing chromosome structure
Electron micrographs of histone-depleted metaphase chromosomes from HeLa cells reveal long loops of DNA anchored to a chromosome scaffold composed of nonhistone proteins
Generally, scaffold-associated regions (SARs), the DNA sequences that attach to the scaffold, are found between transcription units
Genes are located primarily within chromatin loops, which are attached at their bases to a chromosome scaffold
Experiments with transgenic mice indicate that in some cases SARs are required for transcription of neighboring genes
Individual interphase chromosomes, which are less condensed than metaphase chromosomes, cannot be resolved by standard microscopy or electron microscopy
The total mass of the histones associated with DNA in chromatin is about equal to that of the DNA
Interphase chromatin and metaphase chromosomes also contain small amounts of a complex set of other proteins
A few other nonhistone DNA-binding proteins are present in much larger amounts than the transcription or replication factors
Some of these exhibit high mobility during electrophoretic separation and thus have been designated HMG (high-mobility group) proteins
In lower eukaryotes, the sizes of the largest DNA molecules that can be extracted indicate that each chromosome contains a single DNA molecule
The eukaryotic chromosome is a linear structure composed of an immensely long, single DNA molecule that is wound around histone octamers about every 200 bp, forming strings of closely packed nucleosomes
Nucleosomes fold to form a 30-nm chromatin fiber, which is attached to a flexible protein scaffold at intervals of millions of base pairs, resulting in long loops of chromatin extending from the scaffold
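The packing figures above lend themselves to a quick estimate. The sketch below uses the 147 bp core and the roughly 200 bp spacing quoted in these notes; the 1,000,000 bp loop length and the function name are hypothetical values chosen only to make the arithmetic concrete.

```python
CORE_BP = 147    # DNA wrapped around each histone octamer (from the notes)
REPEAT_BP = 200  # approximate octamer spacing along the DNA (from the notes)


def nucleosome_estimate(dna_bp):
    """Return (nucleosome count, linker bp per repeat, wrapped fraction)."""
    count = dna_bp // REPEAT_BP
    linker_bp = REPEAT_BP - CORE_BP
    wrapped_fraction = CORE_BP / REPEAT_BP
    return count, linker_bp, wrapped_fraction


# Hypothetical chromatin loop of one million base pairs.
count, linker_bp, wrapped_fraction = nucleosome_estimate(1_000_000)
```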
Early microscopic observations on the number and size of chromosomes and their staining patterns led to the discovery of many important general characteristics of chromosome structure
In nondividing cells, individual chromosomes are not visible, even with the aid of histologic stains for DNA (e.g., Feulgen or Giemsa stains) or electron microscopy
During mitosis and meiosis, however, the chromosomes condense and become visible in the light microscope
Almost all cytogenetic work (i.e., studies of chromosome morphology) has been done with condensed metaphase chromosomes obtained from dividing cells—either somatic cells in mitosis or dividing gametes during meiosis
Certain dyes selectively stain some regions of metaphase chromosomes more intensely than other regions, producing characteristic banding patterns that are specific for individual chromosomes
G bands are produced when metaphase chromosomes are subjected briefly to mild heat or proteolysis and then stained with Giemsa reagent, a permanent DNA dye
A recently developed method for visualizing each of the human chromosomes in distinct, bright colors, called chromosome painting, greatly simplifies differentiating chromosomes of similar size and shape
The larval salivary glands of Drosophila species and other dipteran insects contain enlarged interphase chromosomes that are visible in the light microscope
When fixed and stained, these polytene chromosomes are characterized by a large number of reproducible, well-demarcated bands that have been assigned standardized numbers
The highly reproducible banding pattern seen in Drosophila salivary gland chromosomes provides an extremely powerful method for locating specific DNA sequences along the lengths of the chromosomes in this species
A generalized amplification of DNA gives rise to the polytene chromosomes found in the salivary glands of Drosophila
This process, termed polytenization, occurs when the DNA repeatedly replicates, but the daughter chromosomes do not separate
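Because the daughter copies stay together, each round of replication doubles the number of parallel DNA molecules. A minimal sketch of that doubling follows; the number of rounds shown is an arbitrary illustration rather than a figure from these notes, and the function name is hypothetical.

```python
def polytene_strands(rounds):
    """Parallel DNA copies after `rounds` of replication without separation."""
    return 2 ** rounds


# Copy number after 0 through 10 rounds (the range is illustrative only).
strand_counts = [polytene_strands(n) for n in range(11)]
```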
As cells exit from mitosis and the condensed chromosomes uncoil, certain sections of the chromosomes remain dark-staining
The dark-staining areas, termed heterochromatin, are regions of condensed chromatin
The light-staining, less condensed portions of chromatin are called euchromatin
In mammalian cells, heterochromatin appears as darkly staining regions of the nucleus, often associated with the nuclear envelope
Pulse labeling with 3H-uridine and autoradiography have shown that most transcription occurs in regions of euchromatin and the nucleolus
Although chromosomes differ in length and number between species, cytogenetic studies have shown that they all behave similarly at the time of cell division
Replication of DNA begins from sites that are scattered throughout eukaryotic chromosomes
The yeast genome contains many ≈100-bp sequences, called autonomously replicating sequences (ARSs), that act as replication origins
Even though circular ARS-containing plasmids can replicate in yeast cells, only about 5–20 percent of progeny cells contain the plasmid because mitotic segregation of the plasmids is faulty
If circular plasmids containing an ARS and CEN sequence are cut once with a restriction enzyme, the resulting linear plasmids do not produce LEU+ colonies unless they contain special telomeric (TEL) sequences ligated to their ends
Once the yeast centromere regions that confer mitotic segregation were cloned, their sequences could be determined and compared, revealing three regions (I, II, and III) conserved between them
In the fission yeast S. pombe, centromeres are ≈40 kb in length and are composed of repeated copies of sequences similar to those in S. cerevisiae centromeres
Multiple copies of proteins homologous to those that interact with the S. cerevisiae centromeres bind to these complex S. pombe centromeres and in turn bind the much longer S. pombe chromosomes to several microtubules of the mitotic spindle apparatus
In higher eukaryotes, a complex protein structure called the kinetochore assembles at centromeres and associates with multiple mitotic spindle fibers during mitosis
Sequencing of telomeres from a dozen or so organisms, including humans, has shown that most are repetitive oligomers with a high G content in the strand with its 3’ end at the end of the chromosome
The telomere repeat sequence in humans and other vertebrates is TTAGGG
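To make the repeat structure concrete, the sketch below counts complete tandem copies of a repeat unit at the 3' end of a sequence. The toy sequence and the function name are hypothetical; only the TTAGGG unit comes from the notes.

```python
import re


def count_terminal_repeats(seq, unit="TTAGGG"):
    """Count complete tandem copies of `unit` ending exactly at the 3' end."""
    match = re.search(f"(?:{re.escape(unit)})+$", seq.upper())
    return len(match.group(0)) // len(unit) if match else 0


# Hypothetical chromosome end: arbitrary sequence followed by five repeats.
toy_end = "ACGTACGT" + "TTAGGG" * 5
n_repeats = count_terminal_repeats(toy_end)
g_fraction = "TTAGGG".count("G") / len("TTAGGG")  # G content of one unit
```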
The need for a specialized region at the ends of eukaryotic chromosomes is apparent when we consider that all known DNA polymerases elongate DNA chains at the 3’ end, and all require an RNA or DNA primer
The problem of telomere shortening is solved by an enzyme that adds telomeric sequences to the ends of each chromosome
The enzyme is a protein and RNA complex called telomere terminal transferase, or telomerase
While telomerase prevents telomere shortening in most eukaryotes, some organisms use alternative strategies
The research on circular and linear plasmids in yeast identified all the basic components of a yeast artificial chromosome (YAC)
To construct YACs, TEL sequences from yeast cells or from the protozoan Tetrahymena are combined with yeast CEN and ARS sequences
To these are added DNA with selectable yeast genes and enough DNA from any source to make a total of more than 50 kb
Although the vast majority of DNA in most eukaryotes is found in the nucleus, some DNA is present within the mitochondria of animals, plants, and fungi and within the chloroplasts of plants
These organelles are the main cellular sites for ATP formation, during oxidative phosphorylation in mitochondria and photosynthesis in chloroplasts
Individual mitochondria are large enough to be seen under the light microscope, and even the mitochondrial DNA (mtDNA) can be detected by fluorescence microscopy
The mtDNA is located in the interior of the mitochondrion, the region known as the matrix
Since the dyes used to visualize nuclear and mitochondrial DNA do not affect cell growth or division, replication of mtDNA and division of the mitochondrial network can be followed in living cells using time-lapse microscopy
Such studies show that in most organisms mtDNA replicates throughout interphase
Studies of mutants in yeasts and other single-celled organisms first indicated that mitochondria exhibit cytoplasmic inheritance and thus must contain their own genetic system
In the mating by fusion of haploid yeast cells, both parents contribute equally to the cytoplasm of the resulting diploid; thus inheritance of mitochondria is biparental
The entire mitochondrial genome from a number of different organisms has now been cloned and sequenced, and mtDNAs from all these sources have been found to encode rRNAs, tRNAs, and essential mitochondrial proteins
Most proteins localized in mitochondria, such as the mitochondrial RNA and DNA polymerases, are synthesized on cytosolic ribosomes and are imported into the organelle
Surprisingly, the size of the mtDNA, the number and nature of the proteins it encodes, and even the mitochondrial genetic code itself vary greatly between different organisms
Human mtDNA, a circular molecule that has been completely sequenced, is among the smallest known mtDNAs, containing 16,569 base pairs
The mtDNA from most multicellular animals (metazoans) is about the same size as human mtDNA and encodes similar gene products
Differences in the size and coding capacity of mtDNA from various organisms most likely reflect the movement of DNA between mitochondria and the nucleus during evolution
Direct evidence for this movement comes from the observation that several proteins encoded by mtDNA in some species are encoded by nuclear DNA in others
All RNA transcripts of mtDNA and their translation products remain in the mitochondrion, and all mtDNA-encoded proteins are synthesized on mitochondrial ribosomes
Mitochondria encode the rRNAs that form mitochondrial ribosomes, although all but one or two of the ribosomal proteins (depending on the species) are imported from the cytosol
In most eukaryotes, all the tRNAs used for protein synthesis in mitochondria are encoded by mtDNAs
The genetic code used in animal and fungal mitochondria is different from the standard code used in all prokaryotic and eukaryotic nuclear genes; remarkably, the code even differs in mitochondria from different species
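One way to see the effect of a variant code is to translate the same sequence with two tables. The sketch below compares the standard code with the vertebrate mitochondrial code, chosen here as one example of a mitochondrial variant; these notes do not specify which variant to use. Sequences are written with DNA letters, and the toy reading frame and function name are hypothetical.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # TTT, TTC, ... GGG

# Amino acids in the codon order above; "*" marks a stop codon.
STANDARD = dict(zip(
    CODONS,
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
# Vertebrate mitochondrial code: TGA=Trp, ATA=Met, AGA and AGG=stop.
VERT_MITO = dict(zip(
    CODONS,
    "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"))


def translate(dna, table):
    """Translate codon by codon from position 0, stopping at the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


differing = sorted(c for c in CODONS if STANDARD[c] != VERT_MITO[c])
toy_orf = "ATGATATGAGCCAGATTT"  # hypothetical reading frame
nuclear_reading = translate(toy_orf, STANDARD)
mito_reading = translate(toy_orf, VERT_MITO)
```

Read with the standard table, the toy frame stops at TGA after two residues; read with the vertebrate mitochondrial table, it continues until AGA.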
The severity of disease caused by a mutation in mtDNA depends on the nature of the mutation and on the proportion of mutant and wild-type mtDNAs present in a particular cell type
Generally, when mutations in mtDNA are found, cells contain mixtures of wild-type and mutant mtDNAs—a condition known as heteroplasmy
Each time a mammalian somatic or germ-line cell divides, the mutant and wild-type mtDNAs will segregate randomly into the daughter cells, as occurs in yeast cells
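Random segregation can be explored with a small simulation. The sketch below follows one cell lineage under strong simplifying assumptions that are not from these notes: a fixed mtDNA copy number per cell (100 here, a hypothetical value), exact doubling before each division, a random half passed to the followed daughter, and no selection. The function names are hypothetical.

```python
import random


def divide(n_mutant, n_total, rng):
    """Double the mtDNA pool, then pass a random half to one daughter."""
    pool = [True] * (2 * n_mutant) + [False] * (2 * (n_total - n_mutant))
    daughter = rng.sample(pool, n_total)  # sampling without replacement
    return sum(daughter)  # number of mutant mtDNAs in the daughter


def follow_lineage(n_mutant, n_total, divisions, seed=0):
    """Mutant mtDNA count in one lineage after each successive division."""
    rng = random.Random(seed)
    history = [n_mutant]
    for _ in range(divisions):
        n_mutant = divide(n_mutant, n_total, rng)
        history.append(n_mutant)
    return history


# Start from a heteroplasmic cell in which half the mtDNAs are mutant.
history = follow_lineage(n_mutant=50, n_total=100, divisions=20)
```

In this model the mutant proportion drifts from division to division, and a lineage that reaches all mutant or all wild-type mtDNA stays there, since no new mutations are introduced.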
The structure of chloroplasts is similar in many respects to that of mitochondria
Like mitochondria, chloroplasts contain multiple copies of the organellar DNA and ribosomes, which synthesize some chloroplast-encoded proteins using the “standard” genetic code
Other chloroplast proteins are fabricated on cytosolic ribosomes and are incorporated into the organelle after translation
Although the overall organization of chloroplast DNAs from different species is similar, some differences in gene composition occur