In his novel 1984, George Orwell described how devastating it would be if we were to shrink our vocabulary. We need appropriate words for complex thoughts, and Orwell reasoned that it would be impossible to have complex thoughts without those words. It would be, for example, very difficult for us to talk about totalitarianism if the word didn’t exist in our vocabulary. But what happens when we get rid of some synonyms in the genetic code? That’s what Fredens and his team wanted to find out. They described their findings in a recent paper, and here I want to go over that paper.
Since we normally don’t think of synonyms when we think of biology, let me explain what I mean. If the purpose of life is to produce food, then we can think of our DNA as an encyclopedic cookbook that we could use to make a particular dish. Like the book, our DNA contains all the information a cell needs to make a protein or RNA (I will explain what RNA is later on). Unlike the book, however, our genetic code uses only four letters: Adenine, Thymine, Guanine, and Cytosine, or A, T, G, and C for short. The specific arrangements of those bases, or letters, give us the words and sentences of the genetic code. For example, a small part of the genetic code could be ATGCATTAGA. Just to illustrate how complex the genetic code can be: with four possible letters at each of 10 positions, there are 4^10, or just over one million, possible 10-letter sequences!
Let’s suppose that a cell needs to make a protein from that code. For that to happen, the specific DNA code needs to be copied or ‘transcribed’ into an RNA. Just as we write down the recipe for a dish because we don’t want to carry around a big cooking book, that particular sequence for a protein needs to be written in RNA.
RNA stands for Ribonucleic Acid, whereas DNA stands for Deoxyribonucleic Acid. From their names, we can deduce that DNA and RNA are similar, except that the sugar in DNA is missing one oxygen atom (the "deoxy" part). But there is another crucial difference: the base/letter T is absent in RNA. RNA molecules use Uracil instead of Thymine, so whenever there is a T in the DNA code, it gets written as U in the RNA. For the code above, ATGCATTAGA, the resulting RNA would be AUGCAUUAGA. To make the necessary protein, the cell uses this rewritten code in the RNA.
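For readers who like to see this as code, here is a minimal Python sketch of the letter swap described above. It follows the simplified picture in this post, where the RNA is just the DNA code with every T rewritten as U; the function name `transcribe` is my own and not from the paper.

```python
def transcribe(dna: str) -> str:
    """Rewrite a DNA code as RNA by replacing every T with U.

    This mirrors the simplified description in the text; it does not
    model strands, enzymes, or any other detail of real transcription.
    """
    return dna.upper().replace("T", "U")


rna = transcribe("ATGCATTAGA")
print(rna)  # AUGCAUUAGA
```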
Every three letters, or triplet, in that RNA specifies one amino acid, and amino acids are the building blocks of a protein. If we write the RNA in triplets, we get AUG CAU UAG A. Since these triplet codes are more or less universal, there is a chart that anyone can use to determine the protein sequence (the chart is shown in the picture below).
To read the chart, we start from the left, then go to the top, and finally go to the right to find the first, second, and third letters of the triplet. So when we look up AUG we get methionine, CAU gives us histidine, but UAG gives us 'Stop'. That means protein production halts there: it’s a signal for the protein-making machinery to stop adding amino acids. In this case, like adding meat chunks on a kabob skewer, the cellular machinery would add methionine and then histidine, and then stop. Actual proteins are much longer (they can have hundreds of amino acids), but we are going to stick to these ten letters for simplicity.
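The chart lookup can also be sketched in a few lines of Python. This is only an illustration: the dictionary below holds just the three standard-table entries this example needs (AUG is methionine, CAU is histidine, UAG is a stop signal), not all 64, and the names `CODON_TABLE` and `translate` are my own.

```python
# Only the three standard codon-table entries needed for this example.
# A full table would have 64 entries.
CODON_TABLE = {"AUG": "Met", "CAU": "His", "UAG": "Stop"}


def translate(rna: str) -> list[str]:
    """Read the RNA three letters at a time until a stop codon appears."""
    amino_acids = []
    # Step through complete triplets only; leftover letters are ignored.
    for i in range(0, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "Stop":
            break
        amino_acids.append(residue)
    return amino_acids


print(translate("AUGCAUUAGA"))  # ['Met', 'His']
```

The final A is left over because it does not complete a triplet, and it comes after the stop signal anyway.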
But what does all this have to do with synonyms? Well, if we look at the table, we can see that several triplets, referred to as codons, can specify the same amino acid. A little math shows why. If we have four letters and we take three letters to make a word, we get 4 × 4 × 4 = 64 different combinations. Since there are 20 universal amino acids, we need only 20 of those combinations, plus one for the stop signal. However, cells use all 64 combinations, and this gives rise to redundancy, or synonyms.
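The counting argument is easy to check by listing every possible triplet. The short Python sketch below does that with the standard library; the variable names are mine and purely illustrative.

```python
from itertools import product

BASES = "UCAG"  # the four RNA letters

# Every ordered choice of three letters, with repetition allowed.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))  # 64

# 20 amino acids plus one stop signal is the minimum number of words needed.
needed = 20 + 1
print(len(codons) - needed)  # 43 triplets left over to act as synonyms
```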
Six different combinations, for example, give us the amino acid serine, and three different combinations give us the stop signal. Another way of saying this is that serine has six synonyms, whereas stop has three. In their paper, the researchers wanted to find out what happens when we take out two synonyms for serine and one synonym for stop. They wanted to see if the re-coded organism would be viable.
Even though parts of a genetic code had been altered like that before, no one had altered a whole genome, the entire encyclopedia. The organism the scientists chose for this project was Escherichia coli, a type of bacterium found in our guts, among other places. Its entire genome is about four mega basepairs, that is, four million basepairs. To put that in context, the code we worked on had only 10 basepairs because it had only 10 letters. In other words, the code the scientists worked on was 400,000 times longer. Here, I will briefly go over the way they re-coded the entire organism.
First, the researchers put the sequence in their computers and then looked for all the serine and stop codons within all the protein-coding sequences (doing this by hand would be exhausting!). Whenever they found one of the two serine codons they wanted to retire, they systematically replaced it with a predetermined synonym from the remaining four. This reduced serine's synonyms from six to four. They did the same for the stop codons, but in this case, they reduced the three stop codons to two. They thus altered the code across the entire genome. Once done, they divided the entire DNA into eight big chunks and then divided those chunks further into smaller DNA segments. This way, the researchers generated the blueprint they needed for their project.
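To make the find-and-replace step concrete, here is a minimal Python sketch. Please read it as an illustration under assumptions, not as the team's actual pipeline: the specific swaps in `RECODE` (TCG to AGC, TCA to AGT, TAG to TAA) are not stated in this post and are assumed here, the example gene is made up, and the function name `recode` is mine. The one point it does show faithfully is that the replacement has to happen codon by codon, in frame, so that the protein stays the same.

```python
# Assumed replacement scheme, for illustration only (not stated in this post):
# each retired codon is swapped for a synonym with the same meaning.
RECODE = {
    "TCG": "AGC",  # serine -> serine
    "TCA": "AGT",  # serine -> serine
    "TAG": "TAA",  # stop   -> stop
}


def recode(cds: str) -> str:
    """Replace retired codons in a protein-coding sequence, reading in frame."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(RECODE.get(codon, codon) for codon in codons)


# Hypothetical mini-gene: Met-Ser-His-Ser-Stop
example = "ATGTCGCATTCATAG"
print(recode(example))  # ATGAGCCATAGTTAA
```

A plain text search-and-replace would not be safe here, because the same three letters can also appear across the boundary between two neighboring codons, where they mean something else.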
Based on that blueprint, they synthesized those smaller DNA fragments and joined them together to make the eight DNA chunks. Finally, chunk by chunk, they replaced the original DNA in the bacteria with the synthesized DNA constructs. Once they replaced the last natural chunk with the synthesized DNA, they had viable bacteria that used only 61 of the 64 codons. Not surprisingly, they named this synthesized bacterium syn61.
There are a few reasons why the researchers chose this sequential approach instead of just making the whole genome. One, it would have been very costly, if not technically impossible, to synthesize the entire genome in one go. Two, it would also have been difficult to introduce such a large piece of re-coded DNA into the bacteria without running into problems. With this modular approach, if something went wrong after replacing a particular DNA chunk, they could fix the problem without having to synthesize the entire DNA again. For example, by the time the researchers were on the eighth chunk, they had a viable organism with seven-eighths of its DNA replaced by synthetic DNA. If the eighth replacement caused any problem, they could redesign that piece only and try again.
Unlike the nightmares in 1984, the consequences of reducing the genetic code were not all that dramatic. Syn61 cells took about 1.6 times longer to divide than their parent strain (under standard growing conditions), and the cells seemed to be slightly longer. While I want to know why that happened, I understand that answering the question was beyond the scope of this study. The researchers wanted to see if they could design a viable organism with a reduced number of codons, and the answer is yes. Only future research can explain why the re-coded syn61 was slightly different. In the meantime, we might see more extensive attempts to reduce the genetic vocabulary even further.