Chapter 11 - Transcriptional Control of Gene Expression
Synthesis of mRNA requires that an RNA polymerase initiate transcription, polymerize ribonucleoside triphosphates complementary to the DNA template strand, and then terminate transcription
In bacteria, gene control serves mainly to allow a single cell to adjust to changes in its environment so that its growth and division can be optimized
In multicellular organisms, environmental changes also induce changes in gene expression
In most cases, once a cell has taken a developmental step, that step is not reversed, so these developmental decisions are fundamentally different from the reversible activation and repression of bacterial genes in response to environmental conditions
Direct measurements of the transcription rates of multiple genes in different cell types have shown that regulation of transcription initiation is the most widespread form of gene control in eukaryotes, as it is in bacteria
The nascent-chain analysis is a common method for determining the relative rates of transcription of different genes in cultured cells
The total radioactive label incorporated into RNA is a measure of the overall transcription rate
The fraction of the total labeled RNA produced by transcription of a particular gene (that is, its relative transcription rate) is determined by hybridizing the labeled RNA to the cloned DNA of that gene attached to a membrane
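The arithmetic behind the relative transcription rate can be sketched in a few lines of Python. This is only an illustration: the function name and the example counts are hypothetical, and the sketch assumes that both measurements are expressed in the same radioactivity units.

```python
def relative_transcription_rate(gene_counts: float, total_counts: float) -> float:
    """Return the fraction of total labeled RNA that hybridized to one gene's cloned DNA.

    gene_counts  -- label recovered on the membrane carrying the cloned gene (hypothetical input)
    total_counts -- total label incorporated into RNA (the overall transcription rate)
    """
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    if not 0 <= gene_counts <= total_counts:
        raise ValueError("gene_counts must lie between 0 and total_counts")
    return gene_counts / total_counts


# Illustrative numbers only, not measured data.
rate = relative_transcription_rate(gene_counts=250.0, total_counts=1_000_000.0)
print(f"relative transcription rate: {rate:.6f}")
```

Comparing this fraction for the same gene across cell types is what allows the relative rates to be ranked, since the denominator normalizes for differences in overall labeling.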
In eukaryotes, as in bacteria, a DNA sequence that specifies where RNA polymerase binds and initiates transcription of a gene is called a promoter
Transcription from a particular promoter is controlled by DNA-binding proteins, termed transcription factors, that are equivalent to bacterial repressors and activators
By constructing and analyzing a 5′-deletion series upstream of the TTR gene, researchers identified two control elements that stimulate reporter-gene expression in hepatocytes, but not in other cell types
One region mapped between ≈2.01 and 1.85 kb upstream of the TTR gene start site; the other mapped between ≈200 base pairs upstream and the start site
The nuclei of all eukaryotic cells examined so far (e.g., vertebrate, Drosophila, yeast, and plant cells) contain three different RNA polymerases, designated I, II, and III
These enzymes are eluted at different salt concentrations during ion-exchange chromatography and also differ in their sensitivity to α-amanitin, a poisonous cyclic octapeptide produced by some mushrooms
Each eukaryotic RNA polymerase catalyzes the transcription of genes encoding different classes of RNA. RNA polymerase I, located in the nucleolus, transcribes genes encoding precursor rRNA (pre-rRNA), which is processed into 28S, 5.8S, and 18S rRNAs
RNA polymerase III transcribes genes encoding tRNAs, 5S rRNA, and an array of small, stable RNAs, including one involved in RNA splicing (U6) and the RNA component of the signal-recognition particle (SRP) involved in directing nascent proteins to the endoplasmic reticulum
The two large subunits (RPB1 and RPB2) of all three eukaryotic RNA polymerases are related to each other and are similar to the E. coli β′ and β subunits, respectively
It is likely that all the subunits are necessary for eukaryotic RNA polymerases to function normally
The carboxyl end of the largest subunit of RNA polymerase II (RPB1) contains a stretch of seven amino acids that is nearly precisely repeated multiple times
Neither RNA polymerase I nor III contains these repeating units
The region made up of this heptapeptide repeat, which has a consensus sequence of Tyr-Ser-Pro-Thr-Ser-Pro-Ser, is known as the carboxyl-terminal domain (CTD)
In vitro experiments with model promoters first showed that RNA polymerase II molecules that initiate transcription have an unphosphorylated CTD
Once the polymerase initiates transcription and begins to move away from the promoter, many of the serine and some tyrosine residues in the CTD are phosphorylated
Several experimental approaches have been used to identify DNA sequences at which RNA polymerase II initiates transcription
Approximate mapping of the transcription start site is possible by exposing cultured cells or isolated nuclei to 32P-labeled ribonucleotides for very brief times
The precise base pair where RNA polymerase II initiates transcription in the adenovirus late transcription unit was determined by analyzing the RNAs synthesized during in vitro transcription of adenovirus DNA restriction fragments that extended somewhat upstream and downstream of the approximate initiation region determined by nascent-transcript analysis
Similar in vitro transcription assays with other cloned eukaryotic genes have produced similar results
In each case, the start site was found to be equivalent to the capped 5′ sequence of the corresponding mRNA
Expression of eukaryotic protein-coding genes is regulated by multiple protein-binding DNA sequences, generically referred to as transcription-control regions
The first genes to be sequenced and studied in in vitro transcription systems were viral genes and cellular protein-coding genes that are very actively transcribed either at particular times of the cell cycle or in specific differentiated cell types
In all these rapidly transcribed genes, a conserved sequence called the TATA box was found ≈25–35 base pairs upstream of the start site
Instead of a TATA box, some eukaryotic genes contain an alternative promoter element called an initiator
Most naturally occurring initiator elements have a cytosine (C) at the -1 position and an adenine (A) residue at the transcription start site (+1)
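The positional rule for initiator elements stated above (C at −1, A at +1) can be checked with a short Python sketch. The function name, the 0-based indexing convention, and the example sequence are assumptions made for illustration. The sketch tests only these two positions, not the full initiator consensus.

```python
def has_initiator_core(sequence: str, start_index: int) -> bool:
    """Return True if the base at -1 is C and the base at +1 (the start site) is A.

    start_index is the 0-based index of the +1 base in `sequence`.
    By convention there is no position 0, so -1 is the base immediately before +1.
    """
    if start_index < 1 or start_index >= len(sequence):
        return False  # no -1 base available, or start site outside the sequence
    seq = sequence.upper()
    return seq[start_index - 1] == "C" and seq[start_index] == "A"


# Hypothetical sequence used only to exercise the check.
example = "GGTCATTCT"
print(has_initiator_core(example, 4))  # C at index 3, A at index 4
print(has_initiator_core(example, 5))  # A at index 4, T at index 5
```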
Recombinant DNA techniques have been used to systematically mutate the nucleotide sequences upstream of the start sites of various eukaryotic genes in order to identify transcription-control regions
By now, hundreds of eukaryotic genes have been analyzed, and scores of transcription-control regions have been identified
One approach frequently taken to determine the upstream border of a transcription-control region for a mammalian gene involves constructing a set of 5′ deletions
Once the 5′ border of a transcription-control region is determined, analysis of linker scanning mutations can pinpoint the sequences with regulatory functions that lie between the border and the transcription start site
Changes in spacing between the promoter and promoter-proximal control elements of 20 nucleotides or fewer had little effect
Insertions of 30 to 50 base pairs between a promoter-proximal element and the TATA box were equivalent to deleting the element
Similar analyses of other eukaryotic promoters have also indicated that considerable flexibility in the spacing between promoter-proximal elements is generally tolerated, but separations of several tens of base pairs may decrease transcription
Transcription from many eukaryotic promoters can be stimulated by control elements located thousands of base pairs away from the start site
Such long-distance transcription-control elements, referred to as enhancers, are common in eukaryotic genomes but fairly rare in bacterial genomes
Soon after the discovery of the SV40 enhancer, enhancers were identified in other viral genomes and in eukaryotic cellular DNA
Some of these control elements are located 50 or more kilobases from the promoter they control
Initially, enhancers and promoter-proximal elements were thought to be distinct types of transcription-control elements
As more enhancers and promoter-proximal elements were analyzed, the distinctions between them became less clear
The S. cerevisiae genome contains regulatory elements called upstream activating sequences (UASs), which function similarly to enhancers and promoter-proximal elements in higher eukaryotes
The various transcription-control elements found in eukaryotic DNA are binding sites for regulatory proteins
In yeast, Drosophila, and other genetically tractable eukaryotes, numerous genes encoding transcriptional activators and repressors have been identified by classical genetic analyses
Two common techniques for detecting such cognate proteins are DNase I footprinting and the electrophoretic mobility shift assay
DNase I footprinting takes advantage of the fact that when a protein is bound to a region of DNA, it protects that DNA sequence from digestion by nucleases
Footprinting also identifies the specific DNA sequence to which the transcription factor binds
The electrophoretic mobility shift assay (EMSA), also called the gel-shift or band-shift assay, is more useful than the footprinting assay for quantitative analysis of DNA-binding proteins
Generally, the electrophoretic mobility of a DNA fragment is reduced when it is complexed to protein, causing a shift in the location of the fragment band
In the biochemical isolation of a transcription factor, an extract of cell nuclei commonly is subjected sequentially to several types of column chromatography
Once a transcription factor is isolated and purified, its partial amino acid sequence can be determined and used to clone the gene or cDNA encoding it
Studies with a yeast transcription activator called GAL4 provided early insight into the domain structure of transcription factors
The gene encoding the GAL4 protein, which promotes the expression of enzymes needed to metabolize galactose, was identified by complementation analysis of gal4 mutants
A remarkable set of experiments with gal4 deletion mutants demonstrated that the GAL4 transcription factor is composed of separable functional domains: an N-terminal DNA-binding domain, which binds to specific DNA sequences, and a C-terminal activation domain, which interacts with other proteins to stimulate transcription from a nearby promoter
The presence of flexible domains connecting the DNA-binding domains to activation domains may explain why alterations in the spacing between control elements are so well-tolerated in eukaryotic control regions
Eukaryotic transcription is regulated by repressors as well as activators
A type of unregulated, abnormally high expression is called constitutive expression and results from the inactivation of a repressor that normally inhibits the transcription of these genes
Repressor-binding sites in DNA have been identified by systematic linker scanning mutation
In this type of analysis, mutation of an activator-binding site leads to decreased expression of the linked reporter gene, whereas mutation of a repressor-binding site leads to increased expression of a reporter gene
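The interpretation rule for linker scanning mutants can be written as a small decision function. This is a minimal sketch: the function name, the 20% tolerance band, and the expression values are assumptions chosen for illustration and do not come from any particular experiment.

```python
def classify_mutated_site(wild_type_expr: float, mutant_expr: float,
                          tolerance: float = 0.2) -> str:
    """Infer the type of control element disrupted by a linker scanning mutation.

    Reporter expression that drops relative to wild type suggests the mutation hit an
    activator-binding site. Expression that rises suggests a repressor-binding site.
    `tolerance` (an assumed threshold) is the fractional change treated as noise.
    """
    if wild_type_expr <= 0:
        raise ValueError("wild_type_expr must be positive")
    ratio = mutant_expr / wild_type_expr
    if ratio < 1 - tolerance:
        return "activator-binding site"
    if ratio > 1 + tolerance:
        return "repressor-binding site"
    return "no regulatory site detected"


# Hypothetical reporter readings in arbitrary units.
for mutant in (30.0, 250.0, 105.0):
    print(mutant, "->", classify_mutated_site(wild_type_expr=100.0, mutant_expr=mutant))
```

In practice the threshold would depend on the reproducibility of the reporter assay, so the fixed value here should be read as a placeholder.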
Eukaryotic transcription repressors are the functional converse of activators
They can inhibit transcription from a gene they do not normally regulate when their cognate binding sites are placed within a few hundred base pairs of the gene’s start site
The DNA-binding domains of eukaryotic activators and repressors contain a variety of structural motifs that bind specific DNA sequences
The ability of DNA-binding proteins to bind to specific DNA sequences commonly results from noncovalent interactions between atoms in an α helix in the DNA-binding domain and atoms on the edges of the bases within a major groove in the DNA
A structural element, which is present in many bacterial repressors, is called a helix-turn-helix motif
There are several common classes of DNA-binding proteins whose three-dimensional structures have been determined
In all these examples and many other transcription factors, at least one α helix is inserted into a major groove of DNA
Homeodomain Proteins: Many eukaryotic transcription factors that function during development contain a conserved 60-residue DNA-binding motif that is similar to the helix-turn-helix motif of bacterial repressors
Zinc-Finger Proteins: A number of different eukaryotic proteins have regions that fold around a central Zn²⁺ ion, producing a compact domain from a relatively short length of the polypeptide chain
The C2H2 zinc finger is the most common DNA-binding motif encoded in the human genome and the genomes of most other multicellular animals
It is also common in multicellular plants but is not the dominant type of DNA-binding domain in plants as it is in animals
The second type of zinc-finger structure, designated the C4 zinc finger (because it has four conserved cysteines in contact with the Zn²⁺), is found in ≈50 human transcription factors
A characteristic feature of C4 zinc fingers is the presence of two groups of four critical cysteines, one toward each end of the 55- or 56-residue domain
Leucine-Zipper Proteins: Another structural motif present in the DNA-binding domains of a large class of transcription factors contains the hydrophobic amino acid leucine at every seventh position in the sequence
These proteins bind to DNA as dimers, and mutagenesis of the leucines showed that they were required for dimerization
GCN4 forms dimers via hydrophobic interactions between the C-terminal regions of the α helices, forming a coiled-coil structure
This structure is common in proteins containing amphipathic α helices in which hydrophobic amino acid residues are regularly spaced alternately three or four positions apart in the sequence, forming a stripe down one side of the α helix
The first leucine-zipper transcription factors to be analyzed contained leucine residues at every seventh position in the dimerization region
Additional DNA-binding proteins containing other hydrophobic amino acids in these positions subsequently were identified
Basic Helix-Loop-Helix (bHLH) Proteins: The DNA-binding domain of another class of dimeric transcription factors contains a structural motif very similar to the basic-zipper motif except that a non-helical loop of the polypeptide chain separates two α-helical regions in each monomer
Two types of DNA-binding proteins discussed in the previous section—basic-zipper proteins and bHLH proteins—often exist in alternative heterodimeric combinations of monomers
In some heterodimeric transcription factors, each monomer has a different DNA-binding specificity
The resulting combinatorial possibilities increase the number of potential DNA sequences that a family of transcription factors can bind
Three different factor monomers theoretically could combine to form six homo- and heterodimeric factors
Four different factor monomers could form a total of 10 dimeric factors; five monomers, 15 dimeric factors; and so forth
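The counts of possible dimers follow from choosing unordered pairs of monomers with repetition allowed, which gives n(n + 1)/2 combinations for n monomers. The Python sketch below enumerates the pairs and checks them against that formula. The monomer labels A to E and the function names are placeholders, and the sketch assumes that every pair of monomers is able to dimerize.

```python
from itertools import combinations_with_replacement


def dimer_combinations(monomers):
    """List every unordered pair of monomers, including each monomer paired with itself."""
    return list(combinations_with_replacement(monomers, 2))


def dimer_count(n: int) -> int:
    """Closed-form count of homo- plus heterodimers for n monomers: n(n + 1)/2."""
    return n * (n + 1) // 2


# Placeholder monomer names, not real transcription factors.
for names in (["A", "B", "C"], ["A", "B", "C", "D"], ["A", "B", "C", "D", "E"]):
    pairs = dimer_combinations(names)
    assert len(pairs) == dimer_count(len(names))
    print(len(names), "monomers ->", len(pairs), "dimeric factors")
```

For three monomers the enumeration yields three homodimers (AA, BB, CC) and three heterodimers (AB, AC, BC), which is the total of six given in the text.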
Similar combinatorial transcriptional regulation is achieved through the interaction of structurally unrelated transcription factors
Neither NFAT nor AP1 binds to its site in the IL-2 control region in the absence of the other
The affinities of the factors for these particular DNA sequences are too low for the individual factors to form a stable complex with DNA
But, when both NFAT and AP1 are present, protein-protein interactions between them stabilize the DNA ternary complex composed of NFAT, AP1, and DNA
Cooperative binding by NFAT and AP1 occurs only when their weak binding sites are located at a precise distance, quite close to each other in DNA
Recent studies have shown that the requirements for cooperative binding are not so stringent in the case of some other transcription factors and control regions
Experiments with fusion proteins composed of the GAL4 DNA-binding domain and random segments of E. coli proteins demonstrated that a diverse group of amino acid sequences can function as activation domains: ≈1% of all E. coli sequences, even though they evolved to perform other functions
Biophysical studies indicate that acidic activation domains have an unstructured, random-coil conformation
These domains stimulate transcription when they are bound to a protein co-activator
The interaction with a co-activator causes the activation domain to assume a more structured α-helical conformation in the activation domain–co-activator complex
Some activation domains are larger and more highly structured than acidic activation domains
As noted previously, enhancers generally range in length from about 50 to 200 base pairs and include binding sites for several transcription factors
The multiple transcription factors that bind to a single enhancer are thought to interact
The term enhanceosome has been coined to describe such large nucleoprotein complexes that assemble from transcription factors as they bind cooperatively to their multiple binding sites in an enhancer
HMGI binds to the minor groove of DNA regardless of the sequence and, as a result, bends the DNA molecule sharply
In vitro transcription by purified RNA polymerase II requires the addition of several initiation factors that are separated from the polymerase during purification
These initiation factors, which position polymerase molecules at transcription start sites and help to melt the DNA strands so that the template strand can enter the active site of the enzyme, are called general transcription factors
The general transcription factors that assist Pol II in the initiation of transcription from most TATA-box promoters in vitro have been isolated and characterized
Detailed biochemical studies revealed how the Pol II preinitiation complex, comprising a Pol II molecule and general transcription factors bound to a promoter region of DNA, is assembled
Once TBP has bound to the TATA box, TFIIB can bind
TFIIB is a monomeric protein, slightly smaller than TBP
The C-terminal domain of TFIIB makes contact with both TBP and DNA on either side of the TATA-box, while its N-terminal domain extends toward the transcription start site
The helicase activity of one of the TFIIH subunits uses energy from ATP hydrolysis to unwind the DNA duplex at the start site, allowing Pol II to form an open complex in which the DNA duplex surrounding the start site is melted and the template strand is bound at the polymerase active site
Although the general transcription factors discussed above allow Pol II to initiate transcription in vitro, another general transcription factor, TFIIA, is required for initiation by Pol II in vivo
Purified TFIIA forms a complex with TBP and TATA-box DNA. X-ray crystallography of this complex shows that TFIIA interacts with the side of TBP that is upstream from the direction of transcription
The TAF subunits of TFIID appear to play a role in initiating transcription from promoters that lack a TATA box
For many years it has been clear that inactive genes in eukaryotic cells are often associated with heterochromatin, regions of chromatin that are more highly condensed and stain more darkly with DNA dyes than euchromatin, where most transcribed genes are located
Regions of chromosomes near the centromeres and telomeres and additional specific regions that vary in different cell types are organized into heterochromatin
The promoters and UASs controlling transcription of the a and α genes lie near the center of the DNA sequence that is transferred and are identical whether the sequences are at the MAT locus or at one of the silent loci
Consequently, the function of the transcription factors that interact with these sequences is somehow blocked at HML and HMR
Researchers found that GATC sequences within the MAT locus and most other regions of the genome in these cells were methylated, but not those within the HML and HMR loci
These results indicate that the DNA of the silent loci is inaccessible to the E. coli methylase and presumably to proteins in general, including transcription factors and RNA polymerase
Genetic studies led to the identification of several proteins, RAP1 and three SIR proteins, that are required for repression of the silent mating-type loci and the telomeres in yeast
RAP1 was found to bind within the DNA silencer sequences associated with HML and HMR and to a sequence that is repeated multiple times at each yeast chromosome telomere
The importance of histone deacetylation in chromatin-mediated gene repression has been further supported by studies of eukaryotic repressors that regulate genes at internal chromosomal positions
These proteins are now known to act in part by causing deacetylation of histone tails in nucleosomes that bind to the TATA box and promoter-proximal region of the genes they repress
The SIN3-RPD3 complex functions as a co-repressor
Co-repressor complexes containing histone deacetylases also have been found associated with many repressors from mammalian cells
Some of these complexes contain the mammalian homolog of SIN3 (mSin3), which interacts with the repressor protein
Other histone deacetylase complexes identified in mammalian cells appear to contain additional or different repressor-binding proteins
The discovery of mSin3-containing histone deacetylase complexes provides an explanation for earlier observations
In vertebrates, transcriptionally inactive DNA regions often contain the modified cytidine residue 5-methylcytidine (mC) followed immediately by a G, whereas transcriptionally active DNA regions lack mC residues
Genetic and biochemical studies in yeast led to the discovery of a large multiprotein complex containing the protein GCN5, which has histone acetylase activity
Maximal transcription activation by GCN4 depends on these histone acetylase complexes, which thus function as co-activators
A similar activation mechanism operates in higher eukaryotes
One domain of CBP binds the phosphorylated acidic activation domain in the CREB transcription factor
Other domains of CBP interact with different activation domains in other transcription factors
Yet another domain of CBP has histone acetylase activity, and another CBP domain associates with a multiprotein histone acetylase complex that is homologous to the yeast GCN5-containing complex
Histone tails in chromatin can undergo reversible phosphorylation of serine and threonine residues, reversible monoubiquitination of a lysine residue in the H2A C-terminal tail, and irreversible methylation of lysine residues
Synthesis of mRNA requires that an RNA polymerase initiate transcription, polymerize ribonucleoside triphosphates complementary to the DNA coding strand, and then terminate transcription
In bacteria, gene control serves mainly to allow a single cell to adjust to changes in its environment so that its growth and division can be optimized
In multicellular organisms, environmental changes also induce changes in gene expression
In most cases, once a developmental step has been taken by a cell, it is not reversed
So these decisions are fundamentally different from the reversible activation and repression of bacterial genes in response to environmental conditions
Direct measurements of the transcription rates of multiple genes in different cell types have shown that regulation of transcription initiation is the most widespread form of gene control in eukaryotes, as it is in bacteria
The nascent-chain analysis is a common method for determining the relative rates of transcription of different genes in cultured cells
The total radioactive label incorporated into RNA is a measure of the overall transcription rate
The fraction of the total labeled RNA produced by transcription of a particular gene—that is
Its relative transcription rate—is determined by hybridizing the labeled RNA to the cloned DNA of that gene attached to a membrane
In eukaryotes, as in bacteria, a DNA sequence that specifies where RNA polymerase binds and initiates transcription of a gene is called a promoter
Transcription from a particular promoter is controlled by DNA-binding proteins, termed transcription factors, that are equivalent to bacterial repressors and activators
By constructing and analyzing a 5-deletion series upstream of the TTR gene, researchers identified two control elements that stimulate reporter-gene expression in hepatocytes, but not in other cell types
One region mapped between ≈2.01 and 1.85 kb upstream of the TTR gene start site; the other mapped between ≈200 base pairs upstream and the start site
The nuclei of all eukaryotic cells examined so far (e.g., vertebrate, Drosophila, yeast, and plant cells) contain three different RNA Polymerases, designated I, II, and III
These enzymes are eluted at different salt concentrations during ion-exchange chromatography and also differ in their sensitivity to -amanitin, a poisonous cyclic octapeptide produced by some mushrooms
Each eukaryotic RNA polymerase catalyzes the transcription of genes encoding different classes of RNA. RNA polymerase I, located in the nucleolus, transcribes genes encoding precursor rRNA (pre-rRNA), which is processed into 28S, 5.8S, and 18S rRNAs
RNA polymerase III transcribes genes encoding tRNAs, 5S rRNA, and an array of small, stable RNAs
Including one involved in RNA splicing (U6) and the RNA component of the signal-recognition particle (SRP) involved in directing nascent proteins to the endoplasmic reticulum
The two large subunits (RPB1 and RPB2) of all three eukaryotic RNA polymerases are related to each other and are similar to the E. coli and subunits
Likely that all the subunits are necessary for eukaryotic RNA polymerases to function normally
The carboxyl end of the largest subunit of RNA polymerase II (RPB1) contains a stretch of seven amino acids that is nearly precisely repeated multiple times
Neither RNA polymerase I nor III contains these repeating units
This heptapeptide repeat, with a consensus sequence of Tyr-Ser-Pro-Thr-Ser-Pro-Ser, is known as the carboxyl-terminal domain (CTD)
In vitro experiments with model promoters first showed that RNA polymerase II molecules that initiate transcription have an unphosphorylated CTD
Once the polymerase initiates transcription and begins to move away from the promoter, many of the serine and some tyrosine residues in the CTD are phosphorylated
Several experimental approaches have been used to identify DNA sequences at which RNA polymerase II initiates transcription
Approximate mapping of the transcription start site is possible by exposing cultured cells or isolated nuclei to 32P-labeled ribonucleotides for very brief times
The precise base pair where RNA polymerase II initiates transcription in the adenovirus late transcription unit was determined by analyzing the RNAs synthesized
During in vitro transcription of adenovirus DNA restriction fragments that extended somewhat upstream and downstream of the approximate initiation region determined by nascent-transcript analysis
Similar in vitro transcription assays with other cloned eukaryotic genes have produced similar results
In each case, the start site was found to be equivalent to the capped 5’ sequence of the corresponding mRNA
Expression of eukaryotic protein-coding genes is regulated by multiple protein-binding DNA sequences, generically referred to as transcription-control regions
The first genes to be sequenced and studied in in vitro transcription systems were viral genes and cellular protein-coding genes that are very actively transcribed either at particular times of the cell cycle or in specific differentiated cell types
In all these rapidly transcribed genes, a conserved sequence called the TATA box was found ≈25–35 base pairs upstream of the start site
Instead of a TATA box, some eukaryotic genes contain an alternative promoter element called an initiator
Most naturally occurring initiator elements have a cytosine (C) at the -1 position and an adenine (A) residue at the transcription start site (+1)
Recombinant DNA techniques have been used to systematically mutate the nucleotide sequences upstream of the start sites of various eukaryotic genes in order to identify transcription-control regions
By now, hundreds of eukaryotic genes have been analyzed, and scores of transcription-control regions have been identified
One approach frequently taken to determine the upstream border of a transcription-control region for a mammalian gene involves constructing a set of 5 deletions
Once the 5 borders of a transcription-control region is determined, analysis of linker scanning mutations can pinpoint the sequences with regulatory functions that lie between the border and the transcription start site
Changes in spacing between the promoter and promoter-proximal control elements of 20 nucleotides or fewer had little effect
Insertions of 30 to 50 base pairs between a promoter-proximal element and the TATA box was equivalent to deleting the element
Similar analyses of other eukaryotic promoters have also indicated that considerable flexibility in the spacing between promoter-proximal elements is generally tolerated
But separations of several tens of base pairs may decrease transcription
Transcription from many eukaryotic promoters can be stimulated by control elements located thousands of base pairs away from the start site
Such long-distance transcription-control elements, referred to as enhancers, are common in eukaryotic genomes but fairly rare in bacterial genomes
Soon after the discovery of the SV40 enhancer, enhancers were identified in other viral genomes and in eukaryotic cellular DNA
Some of these control elements are located 50 or more kilobases from the promoter they control
Initially, enhancers and promoter-proximal elements were thought to be distinct types of transcription-control elements
As more enhancers and promoter-proximal elements were analyzed, the distinctions between them became less clear
The S. cerevisiae genome contains regulatory elements called upstream activating sequences (UASs)
Which function similarly to enhancers and promoter-proximal elements in higher eukaryotes
The various transcription-control elements found in eukaryotic DNA are binding sites for regulatory proteins
In yeast, Drosophila, and other genetically tractable eukaryotes, numerous genes encoding transcriptional activators and repressors have been identified by classical genetic analyses
Two common techniques for detecting such cognate proteins are DNase I footprinting and the electrophoretic mobility shift assay
DNase I footprinting takes advantage of the fact that when a protein is bound to a region of DNA, it protects that DNA sequence from digestion by nucleases
Footprinting also identifies the specific DNA sequence to which the transcription factor binds
The electrophoretic mobility shift assay (EMSA), also called the gel-shift or band-shift assay, is more useful than the footprinting assay for quantitative analysis of DNA-binding proteins
Generally, the electrophoretic mobility of a DNA fragment is reduced when it is complexed to protein, causing a shift in the location of the fragment band
In the biochemical isolation of a transcription factor, an extract of cell nuclei commonly is subjected sequentially to several types of column chromatography
Once a transcription factor is isolated and purified, its partial amino acid sequence can be determined and used to clone the gene or cDNA encoding it
Studies with a yeast transcription activator called GAL4 provided early insight into the domain structure of transcription factors
The gene encoding the GAL4 protein, which promotes the expression of enzymes needed to metabolize galactose, was identified by complementation analysis of gal4 mutants
A remarkable set of experiments with gal4 deletion mutants demonstrated that the GAL4 transcription factor is composed of separable functional domains: an N-terminal DNA-binding domain
Which binds to specific DNA sequences, and a C-terminal activation domain, which interacts with other proteins to stimulate transcription from a nearby promoter
The presence of flexible domains connecting the DNA-binding domains to activation domains may explain why alterations in the spacing between control elements are so well-tolerated in eukaryotic control regions
Eukaryotic transcription is regulated by repressors as well as activators
A type of unregulated, abnormally high expression is called constitutive expression and results from the inactivation of a repressor that normally inhibits the transcription of these genes
Repressor-binding sites in DNA have been identified by systematic linker scanning mutation
In this type of analysis, mutation of an activator-binding site leads to decreased expression of the linked reporter gene
Whereas mutation of a repressor-binding site leads to increased expression of a reporter gene
Eukaryotic transcription repressors are the functional converse of activators
They can inhibit transcription from a gene they do not normally regulate when their cognate binding sites are placed within a few hundred base pairs of the gene’s start site
The DNA-binding domains of eukaryotic activators and repressors contain a variety of structural motifs that bind specific DNA sequences
The ability of DNA-binding proteins to bind to specific DNA sequences commonly results from noncovalent interactions between atoms in an α helix in the DNA-binding domain and atoms on the edges of the bases within a major groove in the DNA
A structural element, which is present in many bacterial repressors, is called a helix-turn-helix motif
There are several common classes of DNA-binding proteins whose three-dimensional structures have been determined
In all these examples and many other transcription factors, at least one α helix is inserted into a major groove of DNA
Homeodomain Proteins: Many eukaryotic transcription factors that function during development contain a conserved 60-residue DNA-binding motif that is similar to the helix-turn-helix motif of bacterial repressors
Zinc-Finger Proteins: A number of different eukaryotic proteins have regions that fold around a central Zn2+ ion, producing a compact domain from a relatively short length of the polypeptide chain
The C2H2 zinc finger is the most common DNA-binding motif encoded in the human genome and the genomes of most other multicellular animals
It is also common in multicellular plants but is not the dominant type of DNA-binding domain in plants as it is in animals
The second type of zinc-finger structure, designated the C4 zinc finger (because it has four conserved cysteines in contact with the Zn2+), is found in ≈50 human transcription factors
A characteristic feature of C4 zinc fingers is the presence of two groups of four critical cysteines, one toward each end of the 55- or 56-residue domain
Leucine-Zipper Proteins: Another structural motif present in the DNA-binding domains of a large class of transcription factors contains the hydrophobic amino acid leucine at every seventh position in the sequence
These proteins bind to DNA as dimers, and mutagenesis of the leucines showed that they were required for dimerization
GCN4 forms dimers via hydrophobic interactions between the C-terminal regions of the α helices, forming a coiled-coil structure
This structure is common in proteins containing amphipathic α helices in which hydrophobic amino acid residues are regularly spaced alternately three or four positions apart in the sequence, forming a stripe down one side of the α helix
The first leucine-zipper transcription factors to be analyzed contained leucine residues at every seventh position in the dimerization region
Additional DNA-binding proteins containing other hydrophobic amino acids in these positions subsequently were identified
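The "every seventh position" rule lends itself to a simple sequence scan. The sketch below is a toy illustration rather than a real motif predictor: the function name, the requirement of four consecutive repeats, and the set of substitute hydrophobic residues are all assumptions.

```python
def heptad_starts(sequence: str, repeats: int = 4,
                  allowed: frozenset = frozenset("L")) -> list:
    """Return start indices where an allowed residue recurs every 7th position.

    A start index i is reported when positions i, i+7, i+14, ... (for
    `repeats` positions in total) all hold a residue from `allowed`.
    """
    seq = sequence.upper()
    span = 7 * (repeats - 1)  # distance from first to last checked position
    return [i for i in range(len(seq) - span)
            if all(seq[i + 7 * k] in allowed for k in range(repeats))]


# Synthetic peptides (not real proteins): leucine every seventh residue,
# and a variant with valine in the second repeat
strict = "LAAAAAA" * 4
variant = "LAAAAAA" + "VAAAAAA" + "LAAAAAA" + "LAAAAAA"

leucine_only = heptad_starts(variant)                       # valine breaks the run
relaxed = heptad_starts(variant, allowed=frozenset("LIVM"))  # assumed substitutes
```

Widening `allowed` mirrors the later finding that other hydrophobic residues can occupy these positions; which residues to accept is a modeling choice here.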
Basic Helix-Loop-Helix (bHLH) Proteins: The DNA-binding domain of another class of dimeric transcription factors contains a structural motif very similar to the basic-zipper motif except that a non-helical loop of the polypeptide chain separates two α-helical regions in each monomer
Two types of DNA-binding proteins discussed in the previous section—basic-zipper proteins and bHLH proteins—often exist in alternative heterodimeric combinations of monomers
In some heterodimeric transcription factors, each monomer has a different DNA-binding specificity
The resulting combinatorial possibilities increase the number of potential DNA sequences that a family of transcription factors can bind
Three different factor monomers theoretically could combine to form six homo- and heterodimeric factors
Four different factor monomers could form a total of 10 dimeric factors; five monomers, 15 dimeric factors; and so forth
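The number of distinct dimers that n monomers can form is n(n+1)/2: n homodimers plus n(n-1)/2 heterodimers, which gives 6, 10, and 15 for three, four, and five monomers. The sketch below enumerates the pairs directly; the monomer labels are placeholders, and it assumes every pairing is physically possible, which need not hold for real factor families.

```python
from itertools import combinations_with_replacement


def dimer_combinations(monomers):
    """List every unordered pair of monomers, including homodimers."""
    return list(combinations_with_replacement(monomers, 2))


def dimer_count(n: int) -> int:
    """Closed-form count: n homodimers plus n*(n-1)/2 heterodimers."""
    return n * (n + 1) // 2


# Placeholder monomer labels A, B, C
three = dimer_combinations(["A", "B", "C"])
counts = {n: dimer_count(n) for n in (3, 4, 5)}
```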
Similar combinatorial transcriptional regulation is achieved through the interaction of structurally unrelated transcription factors
Neither NFAT nor AP1 binds to its site in the IL-2 control region in the absence of the other
The affinities of the factors for these particular DNA sequences are too low for the individual factors to form a stable complex with DNA
But, when both NFAT and AP1 are present, protein-protein interactions between them stabilize the DNA ternary complex composed of NFAT, AP1, and DNA
Cooperative binding by NFAT and AP1 occurs only when their weak binding sites are located at a precise distance, quite close to each other in DNA
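A simple equilibrium model can illustrate why two weak sites become well occupied only when the bound proteins also contact each other. This is a generic two-site statistical-weight sketch, not a model given in the text: the scaled concentrations and the cooperativity factor `omega` are invented for illustration.

```python
def ternary_occupancy(a: float, b: float, omega: float = 1.0) -> float:
    """Probability that both sites are occupied at equilibrium.

    a, b  : concentration of each factor divided by its dissociation
            constant for its own site (values below 1 mean weak binding)
    omega : cooperativity factor; 1 means no protein-protein interaction,
            values above 1 mean the two bound proteins stabilize each other

    Statistical weights: empty = 1, first only = a, second only = b,
    both bound = omega * a * b.
    """
    both = omega * a * b
    return both / (1.0 + a + b + both)


# Hypothetical weak sites: each factor alone sits far below saturation
independent = ternary_occupancy(0.1, 0.1, omega=1.0)
cooperative = ternary_occupancy(0.1, 0.1, omega=1000.0)
```

With these invented numbers the doubly bound state goes from rare to dominant once the interaction term is included, which is the qualitative behavior described for NFAT and AP1.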
Recent studies have shown that the requirements for cooperative binding are not so stringent in the case of some other transcription factors and control regions
Experiments with fusion proteins composed of the GAL4 DNA-binding domain and random segments of E. coli proteins demonstrated that a diverse group of amino acid sequences can function as activation domains (1% of all E. coli sequences), even though these sequences evolved to perform other functions
Biophysical studies indicate that acidic activation domains have an unstructured, random-coil conformation
These domains stimulate transcription when they are bound to a protein co-activator
The interaction with a co-activator causes the activation domain to assume a more structured α-helical conformation in the activation domain–co-activator complex
Some activation domains are larger and more highly structured than acidic activation domains
As noted previously, enhancers generally range in length from about 50 to 200 base pairs and include binding sites for several transcription factors
The multiple transcription factors that bind to a single enhancer are thought to interact
The term enhanceosome has been coined to describe such large nucleoprotein complexes that assemble from transcription factors as they bind cooperatively to their multiple binding sites in an enhancer
HMGI binds to the minor groove of DNA regardless of the sequence and, as a result, bends the DNA molecule sharply
In vitro transcription by purified RNA polymerase II requires the addition of several initiation factors that are separated from the polymerase during purification
These initiation factors, which position polymerase molecules at transcription start sites and help to melt the DNA strands so that the template strand can enter the active site of the enzyme, are called general transcription factors
The general transcription factors that assist Pol II in the initiation of transcription from most TATA-box promoters in vitro have been isolated and characterized
Detailed biochemical studies revealed how the Pol II preinitiation complex, comprising a Pol II molecule and general transcription factors bound to a promoter region of DNA, is assembled
Once TBP has bound to the TATA box, TFIIB can bind
TFIIB is a monomeric protein, slightly smaller than TBP
The C-terminal domain of TFIIB makes contact with both TBP and DNA on either side of the TATA-box, while its N-terminal domain extends toward the transcription start site
The helicase activity of one of the TFIIH subunits uses energy from ATP hydrolysis to unwind the DNA duplex at the start site, allowing Pol II to form an open complex in which the DNA duplex surrounding the start site is melted and the template strand is bound at the polymerase active site
Although the general transcription factors discussed above allow Pol II to initiate transcription in vitro, another general transcription factor, TFIIA, is required for initiation by Pol II in vivo
Purified TFIIA forms a complex with TBP and TATA-box DNA. X-ray crystallography of this complex shows that TFIIA interacts with the side of TBP that is upstream from the direction of transcription
The TAF subunits of TFIID appear to play a role in initiating transcription from promoters that lack a TATA box
For many years it has been clear that inactive genes in eukaryotic cells are often associated with heterochromatin, regions of chromatin that are more highly condensed and stain more darkly with DNA dyes than euchromatin, where most transcribed genes are located
Regions of chromosomes near the centromeres and telomeres and additional specific regions that vary in different cell types are organized into heterochromatin
The promoters and UASs controlling transcription of the a and α genes lie near the center of the DNA sequence that is transferred and are identical whether the sequences are at the MAT locus or at one of the silent loci
Consequently, the function of the transcription factors that interact with these sequences is somehow blocked at HML and HMR
In yeast cells expressing an E. coli DNA methylase, researchers found that GATC sequences within the MAT locus and most other regions of the genome were methylated, but not those within the HML and HMR loci
These results indicate that the DNA of the silent loci is inaccessible to the E. coli methylase and presumably to proteins in general, including transcription factors and RNA polymerase
Genetic studies led to the identification of several proteins, RAP1 and three SIR proteins, that are required for repression of the silent mating-type loci and the telomeres in yeast
RAP1 was found to bind within the DNA silencer sequences associated with HML and HMR and to a sequence that is repeated multiple times at each yeast chromosome telomere
The importance of histone deacetylation in chromatin-mediated gene repression has been further supported by studies of eukaryotic repressors that regulate genes at internal chromosomal positions
These proteins are now known to act in part by causing deacetylation of histone tails in nucleosomes that bind to the TATA box and promoter-proximal region of the genes they repress
The SIN3-RPD3 complex functions as a co-repressor
Co-repressor complexes containing histone deacetylases also have been found associated with many repressors from mammalian cells
Some of these complexes contain the mammalian homolog of SIN3 (mSin3), which interacts with the repressor protein
Other histone deacetylase complexes identified in mammalian cells appear to contain additional or different repressor-binding proteins
The discovery of mSin3-containing histone deacetylase complexes provides an explanation for earlier observations
In vertebrates, transcriptionally inactive DNA regions often contain the modified cytidine residue 5-methylcytidine (mC) followed immediately by a G, whereas transcriptionally active DNA regions lack mC residues
Genetic and biochemical studies in yeast led to the discovery of a large multiprotein complex containing the protein GCN5, which has histone acetylase activity
Maximal transcription activation by GCN4 depends on these histone acetylase complexes, which thus function as co-activators
A similar activation mechanism operates in higher eukaryotes
One domain of CBP binds the phosphorylated acidic activation domain in the CREB transcription factor
Other domains of CBP interact with different activation domains in other transcription factors
Yet another domain of CBP has histone acetylase activity, and another CBP domain associates with a multiprotein histone acetylase complex that is homologous to the yeast GCN5-containing complex
Histone tails in chromatin can undergo reversible phosphorylation of serine and threonine residues, reversible monoubiquitination of a lysine residue in the H2A C-terminal tail, and irreversible methylation of lysine residues