
How many protein-coding genes in the human genome? (2)

It's difficult to know how many protein-coding genes there are in the human genome because there are several different ways of counting, and the counts depend on what criteria are used to identify a gene. Last year I commented on a review by Abascal et al. (2018) that concluded there were somewhere between 19,000 and 20,000 protein-coding genes. Those authors discussed the problems with annotation and pointed out that the major databases don't agree on the number of genes [How many protein-coding genes in the human genome?].

Abascal et al. also said that, before publication of the human genome, most researchers were expecting about 25,000–40,000 genes, so the actual number of protein-coding genes is pretty close to those estimates. (Keep in mind that there are several thousand noncoding genes.) This helps to debunk the standard myth that scientists were expecting 100,000 or more genes [False History and the Number of Genes 2010] [Facts and Myths Concerning the Historical Estimates of the Number of Genes in the Human Genome].

Now there's a new review that continues this discussion (Hatje et al., 2019). One of the best things in this latest review is a new figure showing a much better history of gene-number estimates. Readers might recall that back in 2010, Pertea and Salzberg published some false information on this subject [see False History and the Number of Genes 2010]. A modified version of their figure (right) was published just last year in Nature (Willyard, 2018).

Here's the new figure from Hatje et al. (below). It's much better, but it still gives too much credence to high estimates of gene number that were not supported by reliable data or logic (e.g., US Human Genome Project (1990), CpG Islands, EST data, and GeneSweep). However, the new figure is a far more accurate history than the one published by Pertea and Salzberg. Don't you agree?


This really should put an end to the ridiculous myth that experts were "shocked" and "surprised" at the low number of genes in the human genome back in 2001. Let's hope that we never have to hear that canard again, especially in the scientific literature.

There's another figure in the Hatje et al. paper that nicely illustrates the differences between various ways of calculating the number of protein-coding genes. The estimates by Gencode, Ensembl, and RefSeq have drifted downward so that they now cluster around 20,000 genes. Unfortunately, these three databases do not agree on which genes make up that core set; only about 19,000 genes are common to all three. Those estimates are based mostly on computer models of potential genes, with help from human annotators.


CCDS, neXtProt, and PeptideAtlas are databases that require independent evidence that a potential gene is functional. Usually this means identifying the protein product [see How many proteins in the human proteome?, How many different proteins are made in a typical human cell?, How many proteins do humans make?]. These values have been increasing over the years, so we can be confident that there are at least 19,000 protein-coding genes but probably not more than 20,000.


Thanks to Martin Kollmar for alerting me to this paper from his lab.

Hatje, K., Mühlhausen, S., Simm, D., and Kollmar, M. (2019) The Protein-Coding Human Genome: Annotating High-Hanging Fruits. BioEssays 1900066. [doi: 10.1002/bies.201900066]

Pertea, M., and Salzberg, S. (2010) Between a chicken and a grape: estimating the number of human genes. Genome Biology, 11:206. [doi:10.1186/gb-2010-11-5-206]


This post first appeared on Sandwalk.
