June 24th 2017

Mammalian genomes are very large, and it looks like 90% of each one is junk DNA. These genomes are pervasively transcribed, meaning that almost 90% of the bases are complementary to a transcript produced at some time during development. I think most of those transcripts are due to inappropriate transcription initiation; they are mistakes in transcription. The genome is littered with transcription factor binding sites but only a small percentage are directly involved in regulating gene expression. The rest are due to spurious binding, a well-known property of DNA binding proteins. These conclusions are based, I believe, on a proper understanding of evolution and basic biochemistry.

If you add up all the known genes, they cover about 30% of the genome sequence. Most of this (>90%) is intron sequence and introns are mostly junk. The standard mammalian gene is transcribed to produce a precursor RNA that is subsequently processed by splicing out introns to produce a mature RNA. If it's a messenger RNA (mRNA) then it will be translated to produce a protein (technically, a polypeptide). So far, the vast majority of protein-coding genes produce a single protein but there are some classic cases of alternative splicing where a given gene produces several different protein isoforms, each of which has a specific function.
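The two percentages above can be combined into a rough estimate of how much of the genome is exon sequence. The following is a minimal sketch, assuming the figures in the paragraph are taken at face value (30% of the genome in genes, and 90% used as the lower bound for ">90%" intron sequence); the function and variable names are illustrative only.

```python
# Fractions taken from the paragraph above; both are rough estimates.
GENE_FRACTION = 0.30          # share of the genome covered by known genes
INTRON_SHARE_OF_GENES = 0.90  # lower bound, since the text says ">90%"


def genome_fractions(gene_fraction, intron_share):
    """Split the gene-covered part of the genome into intron and exon fractions."""
    intron_fraction = gene_fraction * intron_share
    exon_fraction = gene_fraction - intron_fraction
    return intron_fraction, exon_fraction


introns, exons = genome_fractions(GENE_FRACTION, INTRON_SHARE_OF_GENES)
print(f"introns: at least {introns:.0%} of the genome")
print(f"exons:   at most {exons:.0%} of the genome")
```

Under these assumptions, introns account for at least 27% of the genome and exons for no more than about 3%.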

Over the years it has been possible to detect RNA variants that don't correspond to the standard mRNA. Most genes have a large number of such variants. They are deposited in various databases such as the ECgene database (ECgene: an alternative splicing database). Here's an example of splice variants of the human triose phosphate isomerase gene (TPI1).¹

Most of these splice variants would produce various isoforms of triose phosphate isomerase if the RNA variant were translated.

There are two explanations of the data:

Massive alternative splicing: The processing variants represent true biologically functional molecules and the predicted protein products all have a biological function that's unknown at the present time. This gives rise to the meta-claim that almost all human genes are subject to alternative splicing and almost all human genes produce 5-10 different functional protein isoforms. This suggests that there's exquisite regulation of alternative splicing due to splicing factors.

Splicing errors: Most of the variants are due to splicing errors (mistakes). This view is consistent with the fact that the variants are present at very low concentrations and that the spliceosome reaction is known to be far less accurate than DNA replication or transcription.

Most of the labs that work on these splice variants are proponents of massive alternative splicing. They operate on the assumption that the phenomenon is biologically real and that there are tens of thousands of undiscovered protein isoforms inside most cells.

Most biochemists and molecular biologists accept this explanation. That's why statements like the one below go unchallenged on the Nature science education website (Pray, 2008).

Alternative splicing was the first phenomenon scientists discovered that made them realize that genomic complexity cannot be judged by the number of protein-coding genes. During alternative splicing, which occurs after transcription and before translation, introns are removed and exons are spliced together to make an mRNA molecule. However, the exons are not necessarily all spliced back together in the same way. Thus, a single gene, or transcription unit, can code for multiple proteins or other gene products, depending on how the exons are spliced back together. In fact, scientists have estimated that there may be as many as 500,000 or more different human proteins, all coded by a mere 20,000 protein-coding genes.
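A quick back-of-the-envelope check shows what the quoted numbers imply per gene. This is only a sketch, assuming the quoted figures are taken at face value and that every protein is attributed to one of the 20,000 protein-coding genes; the names below are illustrative.

```python
# Figures from the quoted passage and from the "massive alternative splicing" claim.
PROTEINS_CLAIMED = 500_000        # "as many as 500,000 or more"
PROTEIN_CODING_GENES = 20_000
ISOFORMS_PER_GENE_RANGE = (5, 10)  # the 5-10 isoforms per gene mentioned earlier


def mean_proteins_per_gene(proteins, genes):
    """Average number of distinct proteins each gene would have to encode."""
    return proteins / genes


implied = mean_proteins_per_gene(PROTEINS_CLAIMED, PROTEIN_CODING_GENES)
low, high = (n * PROTEIN_CODING_GENES for n in ISOFORMS_PER_GENE_RANGE)

print(f"quoted estimate implies {implied:.0f} proteins per gene on average")
print(f"5-10 isoforms per gene would give {low:,} to {high:,} proteins")
```

On these numbers the quoted estimate requires an average of 25 proteins per gene, which is more than the 5-10 isoforms per gene claimed under massive alternative splicing (100,000 to 200,000 proteins in total).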

The minority view, which I support, is that almost all of those splice variants are due to processing errors and the average protein-coding gene produces a single functional polypeptide.

1. Triose phosphate isomerase is a well-studied enzyme of the gluconeogenesis and glycolysis pathways. It is present in all species.

This post first appeared on Sandwalk.
