A Real Human Genome is 6.4 Billion Letters (base pairs) Long — Not 3.2 Billion
“The universe is true for us all and dissimilar to each of us.”
― Marcel Proust, À la recherche du temps perdu (In Search of Lost Time)
At 9,609,000 characters, Marcel Proust’s À la recherche du temps perdu holds the record for the longest novel ever written. That’s a lot of information. But even if you multiplied the number of characters in that novel by 500, you’d still fall short of the number of letters in a single person’s genome. That’s a lot of A’s, T’s, C’s and G’s.
We are at an inflection point in the genomic revolution. As with any consciousness-raising moment, it’s going to take some time for people to catch up. In the meantime, a lot of numbers get tossed around. But if someone says 3.2 billion letters is the size of a person’s genome, they’re not telling the full story.
A real human genome is 6.4 billion letters (base pairs) long. Not 3.2 billion.
So, how did this misunderstanding become so commonplace?
It goes back to the Human Genome Project (HGP). The goal of the HGP was to create a reference genome that would serve as a standard for genomic research and testing. To establish that standard, the project set out to produce a single, contiguous DNA sequence for each of the 22 non-sex chromosomes (autosomes), the X and Y sex chromosomes, and the mitochondrial DNA.
The approximate total number of letters in this reference genome, which exists as computer code, is 3.2 billion. On a technical level, the reference human genome is a computational abstraction. It is not a whole human genome in the real world, that is, the genome that is necessary to make a human being and that is found inside the cells of the body.
That human genome, the real, physical one, is diploid; in other words, it has two copies of each chromosome, one from each parent. Everyone has a chromosome 1 from mom and another from dad; a chromosome 2 from mom and another from dad; and so forth. The total is 46 chromosomes, or 23 pairs. Thus, the real genome is 6.4 billion letters (or bases) long: two sets of about 3.2 billion each. Note: the two paired bases in the double helix provide redundant information and count as one unit.
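For readers who want to check the arithmetic, here is a minimal Python sketch using only the approximate figures quoted in this article. The constant and function names are made up for illustration, and the simple doubling assumes both inherited chromosome sets are the same length, which is a simplification.

```python
# Approximate figures quoted in the article.
REFERENCE_GENOME_LETTERS = 3_200_000_000  # reference genome: one copy of each chromosome
PROUST_CHARACTERS = 9_609_000             # À la recherche du temps perdu
COPIES_PER_CELL = 2                       # diploid: one set from each parent


def diploid_letters(haploid_letters: int, copies: int = COPIES_PER_CELL) -> int:
    """Total letters across both inherited chromosome sets (hypothetical helper)."""
    return haploid_letters * copies


real_genome = diploid_letters(REFERENCE_GENOME_LETTERS)
five_hundred_prousts = 500 * PROUST_CHARACTERS

print(f"Diploid genome: {real_genome:,} letters")                 # 6,400,000,000
print(f"500 x Proust:   {five_hundred_prousts:,} characters")     # 4,804,500,000
print(f"Novels needed:  {real_genome / PROUST_CHARACTERS:.0f}")   # 666
```

By these numbers, 500 copies of Proust's novel come to roughly 4.8 billion characters, still well short of 6.4 billion. It would take about 666 copies to match a single diploid genome.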
Our genome, like our universe, still has uncharted regions (although the human genome sequence is over 90% complete). And, as Proust might say, our genome, like the universe, is “true for us all and dissimilar to each of us.” The first step for both universal truths and individual discovery is understanding the difference between the abstract and the real.
Learn more about whole genome sequencing from our Cinema Veritas series, or by reading about our myGenome product.