The textbook view of alternative splicing

As most of you know, I'm interested in the problem of alternative splicing. I believe that the number of splice variants that have been detected is perfectly consistent with the known rate of splicing errors and that there's no significant evidence to support the claim that alternative splicing leading to the production of biologically relevant protein variants is widespread. In fact, there's plenty of evidence for the opposite view; namely, splicing errors (lack of conservation, low abundance, improbable protein predictions, inability to detect the predicted proteins).

My preferred explanation is definitely the minority view. What puzzles me is not the fact that the majority is wrong but the fact that they completely ignore any other explanation of the data and consider the case for abundant alternative splicing to be settled.

This bizarre situation is reflected in the textbooks, which means we are training an entire generation of undergraduates to ignore critical thinking and simply adopt some strange ideas that have permeated the scientific literature. I recently purchased a textbook that illustrates my point. It's Genomes 4 (2017), the fourth edition of a book by T.A. Brown that was first published in 1999. The relevant passage is in chapter 7, in a section titled "Gene numbers can be misleading."

The section begins by saying that scientists expected humans to have a lot of genes because humans are "the most sophisticated species on the planet." It points out that while we have more genes than yeast, fruit flies, and chickens, we have fewer genes than most plants and only about the same number as the nematode Caenorhabditis elegans. The author then offers an explanation:
These gene numbers lead us into an important aspect of genome biology. Before the human genome was sequenced it was anticipated that there would be 80,000-100,000 protein-coding genes, this number remaining in vogue up to a few months before the draft sequence was completed in 2000. This early estimate was high because it was based on the supposition that, in most cases, a single gene specifies a single mRNA and single protein. According to this model, the number of genes in the human genome should be similar to the number of proteins in human cells, leading to the estimates of 80,000-100,000. The discovery that the actual number of protein-coding genes is much lower indicates that it is possible for an individual gene to specify more than one protein. This is the case for many of the discontinuous genes in the human genome. When introns were first discovered, it was thought that a discontinuous gene would have just one splicing pathway, in which all the exons are joined together to give a single mRNA. We now know that many discontinuous genes have alternative splicing pathways, which means that their pre-mRNAs can be processed in a variety of ways, to give a series of mRNAs made up of different combinations of exons. Each of these genes can therefore direct synthesis of related but different proteins. ... Alternative splicing is relatively common in vertebrates, with 75% of all human protein-coding genes, representing 95% of those with two or more introns, undergoing alternative splicing, giving rise to an average of four different spliced mRNAs per gene. Alternative splicing also occurs in lower eukaryotes, but it is less prevalent. In C. elegans for example, only about 25% of the protein coding genes have alternative splicing pathways, with an average of 2.2 variants per gene.

Because of alternative splicing, the question "How many genes are there?" has no real biological significance, as the number of genes does not indicate the number of proteins that can be synthesized and hence is not a measure of the coding capacity of a genome. A better measure of the biological complexity of an organism is provided by categorizing the genes, including the splice variants according to function.
This is the consensus view of textbook authors, science journalists, and probably most of the researchers in the field of gene expression, although some of the details may differ. In this case, for example, the author claims that the high gene count was based on thinking that humans have 80,000-100,000 different proteins. That's not a rationale for the false expectation of gene number that I've seen before.[1] I don't know where it comes from, but I do know that current estimates of proteome complexity do not support such a claim. I direct your attention to a recent paper by Tress et al. (2017), who looked at the mass spec data and concluded:
Alternative splicing is well documented at the transcript level, and microarray and RNA-seq experiments routinely detect evidence for many thousands of splice variants. However, large-scale proteomics experiments identify few alternative isoforms. The gap between the numbers of alternative variants detected in large-scale transcriptomics experiments and proteomics analyses is real and is difficult to explain away as a purely technical phenomenon. While alternative splicing clearly does contribute to the cellular proteome, the proteomics evidence indicates that it is not as widespread a phenomenon as suggested by transcript data. In particular, the popular view that alternative splicing can somehow compensate for the perceived lack of complexity in the human proteome is manifestly wrong.
This is not something that was only discovered recently, so the textbook author (T.A. Brown) cannot be excused on the grounds that the data came out after his book was in press.

Putting that aside, let's think about what we might be teaching undergraduates. The underlying assumption is that primitive eukaryotes produced only one or two different protein isoforms per gene but over time there was selection for more complexity by evolving alternative splicing to create more isoforms, culminating in humans where an average of four different protein isoforms are made per gene. Think about what that means by considering typical housekeeping genes like those involved in glycolysis, the citric acid cycle, amino acid biosynthesis and a host of other metabolic pathways. Think about the genes for all the subunits of RNA polymerase, the subunits of the electron transport complexes in mitochondria, and the 80 or so proteins of the ribosome. The genes for all these proteins have multiple splice variants in the databases, so it follows that there must have been selection for each of them to produce, on average, three protein variants in addition to the standard conserved protein seen in bacteria. It's very difficult to imagine why humans would need three additional versions of glyceraldehyde 3-phosphate dehydrogenase or citrate synthase where each new variant is missing a block of internal amino acid residues or has an extra stretch of amino acids inserted into the middle of the structure. It's even more difficult to imagine why such altered versions of the subunits of complex structures could provide a selective advantage. Nevertheless, this is exactly what we are teaching these days in our undergraduate courses.

I don't get it.

Let's assume that alternative splicing is restricted to just a small number of genes as I claim. How would scientists respond to that if they still think there's a problem reconciling the number of genes with the idea that humans are "the most sophisticated species on the planet"? Would it create such a serious problem for them that they are reluctant to consider the possibility? Is this an idea that undergraduates can't handle so we have to protect them from even thinking about it?


1. Here's the explanation given in the original publication of the draft human genome sequence by Lander et al. (2001, p. 898).
Previous estimates of human gene number. Although direct enumeration of human genes is only now becoming possible with the advent of the draft genome sequence, there have been many attempts in the past quarter of a century to estimate the number of genes indirectly. Early estimates based on reassociation kinetics estimated that the mRNA complexity of typical vertebrate tissues to be 10,000-20,000, and were extrapolated to suggest around 40,000 for the entire genome. In the mid-1980s, Gilbert suggested that there might be about 100,000 genes, based on the approximate ratio of the size of a typical gene (~3 × 10⁴ bp) to the size of the genome (3 × 10⁹ bp). Although this was intended only as a back-of-the-envelope estimate, the pleasing roundness of the figure seems to have led to it being widely quoted and adopted in many textbooks (W. Gilbert, personal communication).
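
The back-of-the-envelope figure in the quoted passage is just a ratio, and it may help to see it worked out. The sketch below divides the approximate genome size (3 × 10^9 bp) by the approximate size of a typical gene (3 × 10^4 bp), both taken from the quote. The function and variable names are hypothetical, chosen only for illustration, and the calculation treats the whole genome as if it were packed end to end with typical-sized genes, which is what the ratio implicitly assumes.

```python
# Back-of-the-envelope gene-count estimate, as described in the quoted passage.
# Names here are illustrative; only the two sizes come from the quote.
GENOME_SIZE_BP = 3e9        # approximate size of the human genome (bp)
TYPICAL_GENE_SIZE_BP = 3e4  # approximate size of a typical gene (bp)


def estimate_gene_count(genome_size_bp: float, gene_size_bp: float) -> int:
    """Divide genome size by gene size, as if genes tiled the whole genome."""
    return round(genome_size_bp / gene_size_bp)


estimated_genes = estimate_gene_count(GENOME_SIZE_BP, TYPICAL_GENE_SIZE_BP)
print(f"{estimated_genes:,}")  # prints 100,000
```

Nothing about proteins or splicing enters this calculation, which is why it differs from the rationale given in the textbook.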

Lander, E., Linton, L., Birren, B., Nusbaum, C., Zody, M., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J., Miranda, C., Morris, W., Naylor, J., Raymond, C., Rosetti, M., Santos, R., Sheridan, A., Sougnez, C. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409:860-921. [doi: 10.1038/35057062]

Tress, M.L., Abascal, F., and Valencia, A. (2017) Alternative splicing may not be the key to proteome complexity. Trends in Biochemical Sciences, 42:98-110. [doi: 10.1016/j.tibs.2016.08.008]



This post first appeared on Sandwalk.
