Who is this post for
This post is an almost direct transcription of the first paragraphs of Chapter 1 of my PhD thesis. When I was writing it, I assumed the reader would have at least a basic knowledge of biology, and some familiarity with genetics and developmental biology. Some terms may be hard to digest if you’re not familiar with the science and history behind them. I hope this does not discourage you, but if it does, I have provided a Glossary at the end of this page to help guide you through it.
DNA methylation
DNA methylation is a covalent modification in which a methyl group (one carbon atom bonded to three hydrogen atoms) is added to the 5th carbon position of cytosine.
Hold on: 5th carbon position of cytosine?…
What does that mean?
You’re right, let’s go back to basics, for a bit.
DNA is made of two linked strands that wind around each other, forming a double helix. Each strand has a backbone in which carbohydrates (molecules composed of carbon, hydrogen and oxygen atoms), more commonly known as sugars, alternate with phosphate groups (more on those below). I bet you know a few sugars. You may even be addicted to some of them! In the case of DNA, the sugar is deoxyribose. The 5’-end (pronounced “five prime end”) designates the end of the DNA strand that has the fifth carbon in the sugar-ring of the deoxyribose at its terminus.
Figure 1: A furanose (sugar-ring) molecule with carbon atoms labeled using standard notation. The 5’ is upstream; the 3’ is downstream. DNA and RNA are synthesized in the 5’-to-3’ direction. From Wikipedia.
Also part of the DNA backbone are phosphate groups. The phosphate residue is attached to the hydroxyl group of the 5’ carbon of one sugar and the hydroxyl group of the 3’ carbon of the sugar of the next nucleotide, which forms a 5’–3’ phosphodiester linkage. These are very important in providing stability to DNA.
Attached to each sugar is also one of four bases: adenine (A), cytosine (C), guanine (G) or thymine (T). They are bases due to the basic nature of their nitrogen functional groups.
If you look at the figure below on the left, you can see the enlarged structures of A, C, G & T. Do you notice how T & C (or A & G) resemble each other much more than A & T (or C & G)? Take a close look at their atomic composition. T and C are pyrimidines. A and G are purines. A purine always pairs with a pyrimidine (A with T, C with G), which is known as complementary base pairing. Try to make it a game and figure out the differences for yourself!
Figure 2: DNA molecular structure (on the left) and structure of a methyl group (on the right). Relative methyl group dimensions are amplified here!
Going back to DNA methylation itself, in 1975, two key independent studies suggested that methylation of cytosine residues in the context of cytosine-guanine (CpG) dinucleotides could act as an epigenetic regulator of gene expression in vertebrates (Holliday and Pugh 1975; Jones 2012; Riggs 2008). CpG sites are regions of DNA where a cytosine nucleotide is immediately followed by a guanine nucleotide in the linear sequence of bases, read in the 5’ → 3’ direction.
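If it helps to see the definition in action, here is a minimal sketch that scans a DNA string (written 5’ → 3’) and reports where the CpG sites are. The function name and the example sequence are made up for illustration; they are not from the thesis.

```python
def find_cpg_sites(sequence: str) -> list[int]:
    """Return the 0-based position of every C that is immediately followed by a G.

    The sequence is assumed to be written in the 5' -> 3' direction.
    """
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Hypothetical toy sequence, chosen only for illustration.
example = "ATCGGCGTACG"
print(find_cpg_sites(example))  # [2, 5, 9]
```

Note that the "GC" at positions 4-5 is not reported: order matters, and only a C followed by a G counts as a CpG site.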
In the vertebrate world (and eukaryotes in general), the most common methylation modification occurs at the fifth carbon of the pyrimidine ring (5mC) at Cytosine-Guanine (CpG) sites. The symmetrical presence of CpG methylation marks on both DNA strands allows the maintenance of DNA methylation patterns after mitosis and is therefore a key feature of epigenetic regulation (Holliday and Pugh 1975; Riggs 2008).
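Why is CpG methylation "symmetrical"? Because the complement of C is G, a CpG read 5’ → 3’ on one strand always sits opposite a CpG read 5’ → 3’ on the other strand. The sketch below checks this on a toy sequence; the helper names and the sequence are hypothetical and only meant as an illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence: str) -> str:
    """Return the opposite strand, also written in the 5' -> 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def find_cpg_sites(sequence: str) -> list[int]:
    """0-based positions of every C immediately followed by a G."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i : i + 2] == "CG"]


def paired_cpg_sites(sequence: str) -> list[tuple[int, int]]:
    """Pair each CpG on the top strand with the CpG facing it on the bottom strand.

    A CpG starting at top-strand index i faces a CpG starting at index
    len(sequence) - 2 - i on the reverse-complement strand.
    """
    n = len(sequence)
    bottom_sites = set(find_cpg_sites(reverse_complement(sequence)))
    return [(i, n - 2 - i) for i in find_cpg_sites(sequence) if n - 2 - i in bottom_sites]


# Hypothetical toy sequence, chosen only for illustration.
top = "ATCGGCGTACG"
print(reverse_complement(top))  # CGTACGCCGAT
print(paired_cpg_sites(top))    # [(2, 7), (5, 4), (9, 0)]
```

Every CpG on the top strand finds a partner on the bottom strand, which is what makes it possible to copy the methylation mark from the old strand onto the newly made one after replication.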
Most of the genome is depleted of CpGs due to spontaneous deamination. The exceptions are certain regions, known as CpG islands, which retain the expected CpG levels (Smith and Meissner 2013). These occur mainly at the transcription start sites (sites at which genes start being transcribed into RNA, a single-stranded molecule that can be translated into a protein) of housekeeping and developmental regulator genes (Deaton and Bird 2011), and are generally unmethylated.
Figure 3: How methylation of CpG sites followed by spontaneous deamination leads to a lack of CpG sites in methylated DNA. As a result, residual CpG islands are left in areas where methylation is rare and CpG sites persist (or where a C to T mutation is highly detrimental). From Wikipedia.
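What does "expected CpG levels" mean in practice? One commonly used measure, which is my addition here and not something taken from the thesis, is the observed/expected CpG ratio: the number of CpGs actually found, divided by the number you would expect if Cs and Gs were scattered at random, i.e. (number of CpG × sequence length) / (number of C × number of G). A minimal sketch, with made-up toy sequences:

```python
def cpg_observed_expected(sequence: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    Returns 0.0 when the sequence has no C or no G, since no CpG is possible.
    """
    seq = sequence.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is safe
    return cpg_count * len(seq) / (c_count * g_count)


# Hypothetical toy sequences, chosen only for illustration.
print(cpg_observed_expected("CCGG"))            # 1.0, as many CpGs as expected by chance
print(cpg_observed_expected("CATGCATGTTGACA"))  # 0.0, Cs and Gs present but no CpG
```

A ratio close to 1 means CpGs are about as frequent as chance would predict, as in CpG islands, while a ratio well below 1 is the signature of CpG depletion seen across most of the genome.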
However, of the roughly 28 million CpGs in the human genome, 60-80% are methylated (Smith and Meissner 2013). Furthermore, most bulk genomic methylation patterns are stable across cell-types and throughout life, changing only in localized contexts – for example, due to disease-associated processes.
Two exceptions to this occur during mammalian embryonic development. A first demethylation wave occurs during pre-implantation, enabling totipotency of the zygote. During this process, global demethylation of the paternal genome at fertilization is followed by a loss of methylation in both parental genomes, beginning in the zygote and continuing through the first few embryonic replication cycles (Cedar and Bergman 2012), with the exception of differentially methylated regions associated with imprinted genes, retrotransposons and centromeric heterochromatin. A wave of methylation then occurs from implantation onwards, throughout embryogenesis, allowing tissue-specific formation. Finally, a second demethylation wave takes place during genesis of primordial germ cells – including imprinting erasure – which is once again vital to maintain totipotency (Hajkova et al. 2008; Surani, Hayashi, and Hajkova 2007).
Figure 4: Dynamics of DNA methylation during mouse embryonic development. E3.5-E6, etc., refer to days after fertilization. PGC: primordial germ cells. From Wikipedia.
In addition to methyl groups, hydroxymethyl groups (CH₂OH) have also been observed bound to cytosine nucleotides, forming 5-hydroxymethylcytosine (5hmC), which is produced when TET enzymes oxidise 5mC. Interestingly, 5hmC was found to be 10-fold more abundant in neurons than in other tissues or embryonic stem cells, suggesting that it might have a significant role in the brain (Sun et al. 2014).
Additionally, 5hmC is enriched in gene bodies, promoters, and transcription factor binding sites, and evidence from the literature suggests roles for 5hmC in regulating gene expression and controlling cell identity. DNA methylation is also found at non-CpG sites (mCHG and mCHH, where H = A, C or T) (Lister et al. 2009), a phenomenon first described in the plant genome (Gruenbaum et al. 1981; Lindroth et al. 2001), where it has a well-established functional role (Chan, Henderson, and Jacobsen 2005).
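The CG / CHG / CHH notation can look cryptic at first, so here is a minimal sketch that labels each cytosine in a sequence by looking at the one or two bases that follow it (H meaning A, C or T). The function names and the toy sequence are hypothetical and only meant as an illustration.

```python
def cytosine_context(sequence: str, position: int) -> str:
    """Classify the cytosine at `position` (0-based, 5' -> 3') as CG, CHG or CHH.

    Returns "unknown" when the sequence ends too early or contains a
    non-ACGT base, so the context cannot be decided.
    """
    seq = sequence.upper()
    if seq[position] != "C":
        raise ValueError(f"base at position {position} is not a cytosine")
    following = seq[position + 1 : position + 3]
    if following[:1] == "G":
        return "CG"
    if len(following) < 2 or following[0] not in "ACT" or following[1] not in "ACGT":
        return "unknown"
    return "CHG" if following[1] == "G" else "CHH"


def all_cytosine_contexts(sequence: str) -> dict[int, str]:
    """Map the position of every cytosine in the sequence to its context."""
    return {
        i: cytosine_context(sequence, i)
        for i, base in enumerate(sequence.upper())
        if base == "C"
    }


# Hypothetical toy sequence, chosen only for illustration.
print(all_cytosine_contexts("CCGCAGCTT"))
# {0: 'CHG', 1: 'CG', 3: 'CHG', 6: 'CHH'}
```

This only looks at one strand; cytosines on the opposite strand would be classified the same way after taking the reverse complement.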
Animal studies show that non-CpG methylation is more frequent in cultured pluripotent stem cells – including human embryonic stem cells, induced pluripotent stem cells (Guo et al. 2014; Laurent et al. 2010; Lister et al. 2009, 2011; Ramsahoye et al. 2000; Stadler et al. 2011; Ziller et al. 2011), and cells in the mouse germline (Guo et al. 2014; Ichiyanagi et al. 2013; Smith et al. 2012; Tomizawa et al. 2011) – than in most somatic tissues (Guo et al. 2014; Ramsahoye et al. 2000; Ziller et al. 2011). However, several recent profiling studies have shown the presence of non-CpG methylation in the adult mouse dentate gyrus (Guo et al. 2014) and cortex (Guo et al. 2014; Lister et al. 2013; Xie et al. 2012), and in the human brain (Guo et al. 2014; Lister et al. 2013; Varley et al. 2013), with evidence suggesting clear distinctions between mCHs in the brain and those in pluripotent stem cells (Guo et al. 2014).
Most notably, non-CpG methylation accumulates significantly in neurons through early childhood and adolescence, becoming the dominant form of DNA methylation in mature human neurons (Lister et al. 2013). Indeed, several studies have suggested an independent epigenetic function of non-CpG methylation, particularly during neuronal maturation (Lister et al. 2013). For example, methylation at sites other than CpG dinucleotides can recruit methyl-CpG-binding protein 2 (MECP2), which is an important transcriptional repressor, particularly for long genes with neuronal functions (Chen et al. 2015; Gabel et al. 2015). Therefore, it is important to keep in mind that other chemical modifications to DNA (e.g., 5hmC) and methylation at non-CpG sites may also have an important (if not more important) functional role in brain tissue.
Glossary
- 🧬centromeric heterochromatin - The centromere is the primary constriction observed in condensed chromosomes during mitosis (the process by which a cell replicates its chromosomes and then segregates them, producing two identical nuclei in preparation for cell division) (Bloom 2014) and provides the site of assembly for the kinetochore (large protein assemblies that connect chromosomes to microtubules of the mitotic and meiotic spindles in order to distribute the replicated genome from a mother cell to its daughters).
- 🔗covalent bond - A covalent bond is a chemical bond that involves the sharing of electrons to form electron pairs between atoms. It is a stable, but reversible bond. Very useful in biology!
- 🪼eukaryotes - any cell or organism that possesses a clearly defined nucleus. Eukaryotic cells have a nuclear membrane that surrounds the nucleus.
- deamination - the removal of an amino group (a chemical group that contains a basic nitrogen with a lone pair) from an amino acid or other compound.
- retrotransposons - evolutionarily widespread genetic elements that replicate through reverse transcription of an RNA copy and integrate the product DNA into new sites in the host genome.
- 🪺zygote - fertilized egg cell that results from the union of a female gamete (egg, or ovum) with a male gamete (sperm).