We can’t fault C. elegans on grounds of utility, but it is clearly a much less complex organism than our good selves. Why are we so much more sophisticated? Given the importance of proteins in cellular function, the original assumption was that complex organisms like mammals have more protein-coding genes than simple creatures like C. elegans. This was a perfectly reasonable hypothesis but it has fallen foul of a phenomenon described by Thomas Henry Huxley. He was Darwin’s great champion in the 19th century and it was Huxley who first described ‘the slaying of a beautiful hypothesis by an ugly fact’.

As DNA sequencing technologies improved in cost and efficiency, numerous labs throughout the world sequenced the genomes of a number of different organisms. They were able to use various software tools to identify the likely protein-coding genes in these different genomes. What they found was really surprising. There were far fewer protein-coding genes than expected. Before the human genome was decoded, scientists had predicted there would be over 100,000 such genes. We now know the real number is between 20,000 and 25,000 genes[128]. Even more oddly, C. elegans contains about 20,200 genes[129], not so very different a number from us.

Not only do we and C. elegans have about the same number of genes, these genes tend to code for pretty much the same proteins. By this we mean that if we analyse the sequence of a gene in human cells, we can find a gene of broadly similar sequence in the nematode worm. So the phenotypic differences between worms and humans aren’t caused by Homo sapiens having more, different or ‘better’ genes.

Admittedly, more complicated organisms tend to splice their genes in more ways than simpler creatures. Using our CARDIGAN example from Chapter 3 as an analogy once again, C. elegans might only be able to make the proteins DIG and DAN whereas mammals would be able to make those two proteins and also CARD, RIGA, CAIN and CARDIGAN.
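The CARDIGAN analogy can be made concrete with a small sketch. It assumes, purely for illustration, that each letter stands for a segment of the gene and that splicing keeps some segments, in their original order, and discards the rest. The function name `is_spliced_from` and the two word lists are hypothetical; they simply encode the examples given above.

```python
def is_spliced_from(word: str, transcript: str) -> bool:
    """Return True if `word` can be made by keeping some letters of
    `transcript` in their original order and skipping the others."""
    letters = iter(transcript)
    # `ch in letters` consumes the iterator up to the first match,
    # so each letter must be found after the previous one.
    return all(ch in letters for ch in word)


TRANSCRIPT = "CARDIGAN"

# Illustrative repertoires taken from the analogy in the text.
WORM_PROTEINS = ["DIG", "DAN"]
MAMMAL_PROTEINS = WORM_PROTEINS + ["CARD", "RIGA", "CAIN", "CARDIGAN"]

for protein in MAMMAL_PROTEINS:
    print(protein, is_spliced_from(protein, TRANSCRIPT))
```

Every word in both lists passes the check, whereas a word such as GRAND fails, because its letters do not appear in that order in CARDIGAN. The same starting sequence yields two products for the worm and six for the mammal, which is the point of the analogy.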

This certainly would allow humans to generate a much greater repertoire of proteins than the 1 mm worm, but it introduces a new problem. How do more complicated organisms regulate their more complicated splicing patterns? This regulation could in theory be controlled solely by proteins, but this in turn has difficulties. The more proteins a cell needs to regulate in a complicated network, the more proteins it needs to do the regulation. Mathematical models have shown that this rapidly leads to a situation where the number of proteins that we need begins to outstrip the number of proteins that we actually possess, which is clearly a non-starter.

Do we have an alternative? We do, and it’s indicated in Figure 10.1.

Figure 10.1 This graph demonstrates that the complexity of living organisms scales much better with the percentage of the genome that doesn’t code for protein (black columns) than it does with the number of base-pairs coding for protein in a genome (white columns). The data are adapted from Mattick, J. (2007), J. Exp. Biol. 210: 1526–1547.

At one extreme we have the bacteria. Bacteria have very small, highly compacted genomes. Their protein-coding genes cover about 4,000,000 base-pairs, which is about 90 per cent of their genome. Bacteria are very simple organisms and fairly rigid in the way they control their gene expression. But things change as we move further up the evolutionary tree.

The protein-coding genes of C. elegans cover about 24,000,000 base-pairs, but that only accounts for about 25 per cent of their genome. The remaining 75 per cent doesn’t code for protein. By the time we reach humans, the protein-coding regions cover about 32,000,000 base-pairs, but this only represents about 2 per cent of the total genome. There are various ways that we can calculate the protein-coding regions, but they make relatively little difference to the astonishing bottom line. Over 98 per cent of the human genome doesn’t code for protein. All but 2 per cent of our genome is ‘junk’.

In other words, the numbers of genes, or the sizes of these genes, don’t scale with complexity. The only feature of a genome that really seems to get bigger as organisms get more complicated is the section that doesn’t code for protein.
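The comparison can be checked with a little arithmetic. The sketch below uses only the rounded figures quoted above for bacteria, C. elegans and humans; the dictionary `GENOMES` and the helper `non_coding_per_coding` are names introduced here for illustration, and the results are only as precise as that rounding.

```python
# Rounded figures quoted in the text:
# (protein-coding base-pairs, percentage of the genome that codes for protein)
GENOMES = {
    "bacteria": (4_000_000, 90),
    "C. elegans": (24_000_000, 25),
    "human": (32_000_000, 2),
}


def non_coding_per_coding(coding_percent: float) -> float:
    """Base-pairs of non-coding DNA for every base-pair of coding DNA."""
    return (100 - coding_percent) / coding_percent


for name, (coding_bp, coding_percent) in GENOMES.items():
    ratio = non_coding_per_coding(coding_percent)
    print(f"{name:<11} coding bp: {coding_bp:>11,}   "
          f"non-coding per coding bp: {ratio:.2f}")
```

On these figures, the amount of coding sequence rises only modestly from worm to human (24 million to 32 million base-pairs), while the non-coding DNA per coding base-pair climbs from roughly 0.1 in bacteria to 3 in the worm and 49 in humans.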

The tyranny of language
