POST-TRANSCRIPTIONAL EVENTS

So far we have looked at chromatin structure and transcription. In this lecture I want to turn to what happens to RNA molecules once they have been produced - how they are modified and processed, concentrating on messenger RNA.

Strictly speaking, the RNA molecules produced by RNA polymerase II are not messenger RNAs until they have been processed, modified and exported from the nucleus. What pol II produces is a primary transcript, or precursor mRNA (pre-mRNA).

One of the first modifications to pol II primary transcripts is that a structure called a cap is added to the 5' end. This occurs while the pre-mRNA is still being synthesised. The cap is a methylated guanosine joined to the transcript through a triphosphate bridge.

A newly initiated RNA molecule has a triphosphate group at the 5' end - this is because synthesis goes in a 5' to 3' direction. Each new nucleotide is added to the 3' OH of the growing chain, so the 5' triphosphate of the first nucleotide is never used to form a bond and remains intact.

The capping enzyme adds a guanosine nucleotide from GTP, generating a triphosphate link - two phosphates are derived from the RNA and one from the GTP. Note that the linkage is 5' to 5' rather than the normal 5' to 3'. Next a methyl group is added by a methyltransferase to nitrogen number 7 of the guanine base. The first nucleotide in the mRNA proper also has a methyl group added, but this time to the 2' hydroxyl group of the ribose sugar. One function of this cap structure seems to be to protect the mRNA from degradation. It also seems to be necessary for binding of the mRNA to the ribosome during translation initiation, since translation of uncapped mRNAs is very inefficient.

The next modification of the pre-mRNA occurs at the opposite end of the RNA molecule - at the 3' end.

Transcription in prokaryotes stops at a specific DNA sequence at the end of the gene called a termination site. In eukaryotes, however, transcription stops more or less at random after the end of the gene. You would think that this would lead to mRNAs of different lengths, but this is not the case because the RNA is cut, or trimmed, at a specific site at the 3' end by a clipping enzyme. The signal in the RNA molecule which tells the enzyme where to make the cut is AAUAAA, which lies a short distance (typically some 10-30 bases) upstream of where the cut is made.

After this clipping event, adenylate (AMP) residues are added to the new 3' end of the pre-mRNA by an enzyme called poly(A) polymerase. This generates a poly(A) tail at the 3' end of the pre-mRNA - usually around 150-200 A residues are added.
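The clipping and tailing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the use of the first AAUAAA in the transcript, the fixed cut 15 bases downstream of the signal and the tail of exactly 200 A residues are all hypothetical choices, not measured values.

```python
POLY_A_SIGNAL = "AAUAAA"


def cleave_and_polyadenylate(pre_mrna, cut_offset=15, tail_length=200):
    """Trim a transcript downstream of AAUAAA and add a poly(A) tail.

    cut_offset and tail_length are illustrative assumptions, and using the
    first AAUAAA found is a simplification.
    """
    signal_start = pre_mrna.find(POLY_A_SIGNAL)
    if signal_start == -1:
        raise ValueError("no AAUAAA signal found in transcript")
    cut_site = signal_start + len(POLY_A_SIGNAL) + cut_offset
    if cut_site > len(pre_mrna):
        raise ValueError("transcript ends before the cleavage site")
    # Everything downstream of the cut site is discarded.
    return pre_mrna[:cut_site] + "A" * tail_length


# Two transcripts of the same gene that stopped at different places.
body = "GGCUAGC" + POLY_A_SIGNAL + "C" * 15
short_transcript = body + "GUGUGU"
long_transcript = body + "GUGUGUCCAGGAUUC"

same = (cleave_and_polyadenylate(short_transcript)
        == cleave_and_polyadenylate(long_transcript))
print(same)
```

Although the two transcripts end at different positions, both are cut at the same site, so they give the same product.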

Poly(A) tails are present at the 3' end of nearly all mRNAs; the best-known exceptions are the mRNAs for histones. The function of the poly(A) tail is not fully understood, and it is unclear why histone mRNAs lack one when nearly all other mRNAs have one. The poly(A) tail may have something to do with protecting the 3' end of the mRNA from degradation, just as the 5' cap protects the 5' end from degradation.

We now have a precursor mRNA which has a 5' cap at one end and a poly(A) tail at the other end but, in most cases, this is still not the finished product.

Before the advent of cloning techniques which allow us to look at the structure of individual genes, it was known that the average size of RNA in the nucleus is much larger than that of mRNA found in the cytoplasm. Since this large RNA varies considerably in size it was called heterogeneous nuclear RNA or hnRNA. Since hnRNA molecules have poly(A) tails, it was assumed that hnRNA molecules are precursors of mRNA and that large stretches of RNA would be cleaved off at the 5' end to generate the mature mRNA.

When cloning techniques came along and the structure of individual genes and mRNAs was examined it became clear that hnRNAs are indeed precursors of mRNA and are therefore the same as the precursor mRNAs which we've been discussing. However, the extra RNA which is removed is not at the 5' end but in the middle of the hnRNA molecules.

It turns out that there are regions within the hnRNA which are not found in the mRNA. There may be just one of these regions or there may be many. The regions of the pre-mRNA which end up in the mRNA are called exons, and those which are removed are called introns. The introns are removed by a process called splicing to generate the mature transcript or mRNA. Note that introns split up the actual coding sequence of the mRNA and, if they were not removed, the mRNA could not be translated to give a complete protein.

Introns may be small (a few tens of base pairs) or large (thousands of base pairs). Lower eukaryotes such as yeast have only a few genes interrupted by introns; such genes usually contain just one intron, and it is usually short. The higher up the evolutionary tree you go, the more genes contain introns and the more introns you find in each gene. These introns also tend to be longer. The largest number of introns so far discovered in a single gene is more than 70, in the gene which, when mutated, causes Duchenne muscular dystrophy.

Since introns interrupt the protein coding sequence, it is of vital importance that they are removed accurately. Leaving one or two intron bases behind, or accidentally deleting one or two bases from an exon, would cause a frameshift, and a defective protein would be produced.
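To see why a single stray base matters, here is a small Python sketch which translates a correctly spliced mRNA and one in which a single intron base has been left behind. The two exon sequences are made up for illustration, and the translate function is a simplification which assumes the message starts at the first base; the codon table itself is the standard genetic code.

```python
# Standard genetic code, built from the usual U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna):
    """Translate from the first base, three bases at a time, up to a stop codon."""
    protein = []
    for position in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[position:position + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical exons, chosen only to make the effect easy to see.
exon_1 = "AUGGCUGAA"
exon_2 = "UUUAAAUGGUAA"

correctly_spliced = exon_1 + exon_2
one_base_left_behind = exon_1 + "G" + exon_2  # one intron base not removed

print(translate(correctly_spliced))     # MAEFKW
print(translate(one_base_left_behind))  # MAEV
```

In this made-up example the extra base shifts the reading frame of everything downstream: the fourth amino acid changes and a stop codon then appears early, so the protein is both altered and truncated.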

To ensure that introns are removed accurately from the pre-mRNA, there are signals at the intron/exon boundaries that mark the exact nucleotides where an exon ends and an intron begins.

When the nucleotide sequences of a number of introns from different species were compared, it was found that introns always begin with GU and end with AG. When more sequences were compared it was found that some bases occur more frequently than others at the exon/intron boundaries, so that a consensus sequence could be determined. The consensus shows the bases which occur most often at these positions - it is only the GU at the beginning of the intron and the AG at the end which are always found. In the consensus, YN denotes a stretch of pyrimidines - that is, a run of cytosines and uracils. We can show that these consensus sequences are important by changing the bases found at various positions and then looking to see if the intron is removed correctly. Sure enough, if these bases are changed, intron removal either doesn't occur or doesn't occur properly.

The conclusion is that something specifically recognises these consensus sequences, cuts the RNA at exactly the right place - the intron/exon boundaries - and then joins, or splices, the two exons together.
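The logic of the mutation experiment can be mimicked in Python. This is a deliberately crude sketch: the function name, the made-up sequences and the rule that a mutated boundary leaves the transcript completely unspliced are all assumptions for illustration. In a real cell the intron boundaries are not handed over as coordinates, and a damaged site may instead lead to incorrect splicing.

```python
def remove_intron(pre_mrna, intron_start, intron_end):
    """Join the flanking exons only if the intron starts with GU and ends with AG.

    intron_start and intron_end are 0-based positions (end not included) and
    are assumed to be known in advance.
    """
    intron = pre_mrna[intron_start:intron_end]
    if not (intron.startswith("GU") and intron.endswith("AG")):
        # Boundary signal not recognised: leave the transcript unspliced.
        return pre_mrna
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]


# Hypothetical sequences for illustration only.
exon_1 = "AUGGCUGAA"
intron = "GUAAGUCCUUACUAACUUUUCCUUUCAG"
exon_2 = "UUUAAAUGGUAA"

normal = exon_1 + intron + exon_2
mutant = exon_1 + "GA" + intron[2:] + exon_2  # GU changed to GA

start = len(exon_1)
end = start + len(intron)

print(remove_intron(normal, start, end) == exon_1 + exon_2)
print(remove_intron(mutant, start, end) == mutant)
```

Both comparisons come out true: the normal intron is removed cleanly, while the mutant transcript is returned untouched.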

So what is the "something" that does this? In fact it is not just one something but a small number of somethings. You might expect these to be RNA binding proteins and enzymes. They are, in fact, particles of RNA and protein called small nuclear ribonucleoprotein particles (snRNPs or "snurps"). Each snurp contains a small RNA molecule called a small nuclear RNA or snRNA. There is a different one of these for each snurp. In addition to the snRNA there are about 6-8 proteins which make up each snurp. The snurps get their name from the snRNA which they contain. There are 5 involved in splicing called U1, U2, U4, U5 and U6. These snurps come together on the pre-mRNA to form a complex structure called a spliceosome which actually carries out the splicing reactions.

From what I said earlier, you might expect splicing to occur by simultaneous cuts at the intron/exon junctions followed by joining of the exons. In fact it is more complex than this and occurs in a two-step process.

In step 1, cleavage at the 5' end of the intron occurs and the free 5' end of the intron becomes attached to the 2' OH group of an adenosine residue close to the 3' end of the intron - this creates a branched structure sometimes called a lariat. The first exon remains tightly bound to the spliceosome so that it is not lost. In step 2, cleavage at the 3' end of the intron occurs, the two exons are joined together and the intron is released as a lariat structure.

The function of introns remains a mystery. Why should eukaryotes have them and prokaryotes not? Did they appear late during evolution, or early, with prokaryotes subsequently losing them because of selective pressure to keep their genome size small? Evidence for the latter theory came from the discovery of introns in a group of bacteria called the archaebacteria - these live in bizarre habitats and may be similar to the original prokaryotes. Further evidence for the "introns early" hypothesis is that some genes have the same intron/exon structure in organisms which are only very distantly related - in other words whose ancestors diverged a very long time ago. For instance the gene coding for triose phosphate isomerase has three introns which are in the same place in chickens and maize - so these introns must have been in place before plants and animals diverged more than a billion years ago.

One possible function of introns is that they allow whole chunks of genes to be exchanged thus speeding up evolution. So if each exon coded for a particular functional region, or domain, of a protein, these could be swapped around to create proteins with novel functions. Circumstantial evidence for this is that exons often, but not always, correspond to protein domains and that some genes have exons which appear to have been put together from a number of different genes. For example, the gene which codes for the low density lipoprotein (LDL) receptor has eighteen exons, eight of which are related to exons found in the gene for epidermal growth factor and five are related to a protein of the immune system called complement factor C9. This apparent swapping of exons between genes is called exon shuffling.

To get back to the theme of these lectures - gene expression. Can splicing be used to regulate gene expression?

Conceivably if a protein is required in a particular cell type this could be achieved by splicing the pre-mRNA for that protein only in that cell type since only the spliced RNA can produce a functional protein. Clearly this is a wasteful method of gene regulation since the cell has gone to the trouble of making the precursor RNA but in a few cases this cell-specific splicing has been shown to occur.

A more subtle form of gene regulation can occur if a gene has more than one intron. Some of these genes produce mRNAs which have been alternatively spliced. The splicing mechanism can skip an exon, generating two related but different mRNAs and therefore two different proteins - both from the same gene.
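Exon skipping is easy to picture as a small Python sketch. The three exon sequences and the function name are hypothetical; the skipped exon is deliberately a multiple of three bases long so that leaving it out does not disturb the reading frame of the exon after it.

```python
def splice_isoform(exons, skipped=()):
    """Join exons in order, leaving out any whose index is in `skipped`."""
    return "".join(
        exon for index, exon in enumerate(exons) if index not in skipped
    )


# Hypothetical exons of one gene, in 5' to 3' order.
exons = ["AUGGCU", "GAAUUU", "AAAUGGUAA"]

full_length = splice_isoform(exons)               # exons 1 + 2 + 3
exon_2_skipped = splice_isoform(exons, skipped={1})  # exons 1 + 3 only

print(full_length)
print(exon_2_skipped)
```

The same pre-mRNA therefore gives two mRNAs which share their first and last exons but differ in the middle.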

Once a mature, spliced mRNA has been produced and exported to the cytoplasm, the next step at which regulation could occur is the stability of the mRNA. The amount of protein synthesised depends on the amount of its mRNA in the cytoplasm, and this depends on how quickly the mRNA is made and how quickly it is broken down.

If an mRNA is broken down quickly, little will accumulate in the cytoplasm and therefore only a small amount of protein will be made. On the other hand, if an mRNA is broken down slowly, more will accumulate in the cytoplasm and more protein will be made. So, if the stability of a particular mRNA is altered, the amount of protein synthesised will also be altered.
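This balance between synthesis and breakdown can be put into numbers with a very simple model. The sketch below assumes, purely for illustration, that the mRNA is made at a constant rate and broken down by first-order decay, so that the steady-state level is the synthesis rate divided by the decay constant (ln 2 divided by the half-life). The model, the function name and the numbers are assumptions, not measurements.

```python
import math


def steady_state_level(synthesis_rate, half_life):
    """Steady-state mRNA level assuming constant synthesis and first-order decay.

    synthesis_rate: molecules made per unit time (assumed constant)
    half_life: time for half of the mRNA to be broken down (same time unit)
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    decay_constant = math.log(2) / half_life
    return synthesis_rate / decay_constant


# Same rate of transcription, different stability (illustrative numbers).
unstable = steady_state_level(synthesis_rate=10.0, half_life=1.0)
stable = steady_state_level(synthesis_rate=10.0, half_life=20.0)

print(stable / unstable)
```

Under these assumptions the steady-state level is directly proportional to the half-life, so a 20-fold more stable mRNA accumulates to about 20 times the level with no change in transcription.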

This has been shown to occur for the mRNA coding for the milk protein casein. Casein synthesis increases following treatment of breast tissue with the hormone prolactin. Only a small fraction of this increase is due to an increase in transcription of the gene. Most of the increase (which is about 20-fold) is due to a selective stabilisation of the casein mRNA. It's not clear whether this is due to any changes in the 5' cap or poly(A) tail.

The final level of gene regulation I'm going to consider is the translational level. Obviously the amount of protein produced from an mRNA can be regulated by regulating how efficiently the mRNA is translated.

The most striking example of translational control occurs in the eggs of a number of animals such as sea urchins and amphibians. During oogenesis (egg formation) the messenger RNAs for a number of proteins which are required for early embryonic development are accumulated in the oocyte. These mRNAs are called maternal mRNAs and are not translated until after fertilisation.

So, in the oocyte and unfertilised egg, translation of these mRNAs is blocked or masked, probably by proteins which bind to the mRNA forming messenger ribonucleoprotein particles (mRNP particles). After fertilisation this block is removed, allowing translation to take place.

The reason for having this maternal mRNA is that eggs of these species divide rapidly after fertilisation and the zygote couldn't possibly produce enough mRNA to keep up with this rapid rate of cell division.