6 Discussion

The epigenomics of ageing has enormous potential for growth in the coming years. DNA methylation based ageing biomarkers have a bright future as reliable and convenient broad indicators of biological ageing, with the potential to be used as proxies for longevity in clinical trials of interventions in ageing. For instance, there is growing interest and investment in drugs with potential anti-ageing activities such as senolytics [462], and in re-purposing existing drugs like metformin for pro-longevity and increased healthspan interventions [205]. Advances in epigenetic editing [463] promise the possibility of experimentally establishing causal roles of age related epigenetic changes, and the ability to dissect the mechanisms by which epigenetic changes are involved in the processes of ageing. Improvements are taking place in DNA methylation assay technologies, such as NEBNext Enzymatic Methyl-seq ‘EM-seq’ [135], better DNA methylation calling in nanopore sequencing [464], advances in single-cell DNA methylation assays [465], and Illumina methylation arrays for sites conserved across mammals [203]. With these tools there are many additional opportunities to characterise DNA methylomes and capture changes in DNA methylation which were not previously accessible, as well as to begin more mechanistic studies.

6.1 Epigenomic analysis of the developmental origins of long-term bone health

The developmental origins of health and disease (DOHaD) hypothesis holds that early life environmental influences have long term consequences for the risk of developing various pathologies in adulthood and later life. It is through this lens that the EWAS for relationships between umbilical cord DNA methylation and bone health outcomes in Chapter 3 were framed. DNA methylation, being an epigenetic mark, is influenced by environmental factors and is heritable by subsequent generations of cells; thus it could be a medium through which environmental factors act on long term health in accordance with DOHaD. In this chapter two CpGs were identified as having genome wide significant associations with the outcome of interest in their respective EWAS. The first of these was CpG cg26559250, which is located at Chr6:157,653,445-157,653,447 at the ZDHHC14 (zinc finger DHHC-type palmitoyltransferase 14) gene. This CpG was identified in the EWAS for total bone mineral content minus head at 6 years adjusted for age and sex, with a p-value of \(2.52\times 10^{-8}\) for an increase of 1.46% per kg in an uncorrected model and a corrected model. The second was CpG cg22570676, located at Chr19:2,527,492-2,527,494 at the GNG7 (G protein subunit gamma 7) gene. This CpG was identified in the EWAS for periosteal circumference at 38% from the distal end of the tibia at 6 years (mm) adjusted for age and sex, with a p-value of \(4.24\times 10^{-8}\) for an increase of 0.370% per mm in an uncorrected model and a corrected model. The corrected models included covariates for: blood cell-type composition, maternal age at time of birth (years), sex, maternal BMI at 11 weeks gestation, parity, whether or not the mother smoked during pregnancy, and gestational age.

EWAS were performed for nine different outcomes across three groups of samples, and in each case four different models were fitted. Within a given EWAS the quite stringent Bonferroni standard for multiple testing correction was applied; however, conducting multiple EWAS across different groups creates a secondary multiple testing problem. This means that these findings could still be false positives despite the aspiration of family wise correction to minimise type 1 errors. Adjusting for the number of tests performed in a given EWAS should in theory minimise false positives, but if several EWAS are performed the effective number of tests increases and is not adjusted for. This raises the probability that a result is a false positive above the near zero level that family wise correction aims to achieve. Confirmation of these associations in another cohort would be necessary in order for the biological reality of these associations to be asserted with confidence. These results are correlational, and experimental follow up would be needed to establish any mechanistic or causal relationships between the DNA methylation state at these sites and bone properties in early life. One could, for example, attempt epigenetic editing of orthologous sites in a model organism and look for an effect on bone measurements [463].
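The scale of this secondary multiple testing problem can be illustrated with a minimal sketch, which is not the analysis performed in this study. It assumes, purely for illustration, that every outcome was tested in every sample group with every model (9 × 3 × 4 = 108 EWAS), that each EWAS controls its own family wise error rate at a hypothetical α of 0.05, and that the EWAS are independent of one another. Correlated outcomes, nested models and overlapping samples violate the independence assumption, so the figure probably overstates the true inflation.

```python
def combined_fwer(alpha_per_ewas: float, n_ewas: int) -> float:
    """Probability of at least one false positive EWAS, assuming independence."""
    return 1.0 - (1.0 - alpha_per_ewas) ** n_ewas


def per_ewas_alpha(target_fwer: float, n_ewas: int) -> float:
    """Bonferroni adjustment applied across EWAS rather than within one."""
    return target_fwer / n_ewas


# Counts taken from the text; a full crossing of all three is an assumption.
N_OUTCOMES, N_GROUPS, N_MODELS = 9, 3, 4
n_ewas = N_OUTCOMES * N_GROUPS * N_MODELS

ALPHA = 0.05  # hypothetical family wise error rate within each EWAS

overall = combined_fwer(ALPHA, n_ewas)
adjusted = per_ewas_alpha(ALPHA, n_ewas)

print(f"EWAS performed (assumed full crossing): {n_ewas}")
print(f"Chance of >=1 false positive EWAS if independent: {overall:.3f}")
print(f"Per-EWAS alpha needed to hold an overall 0.05: {adjusted:.2e}")
```

Under these assumptions the second function gives the stricter per-EWAS threshold that would be needed to hold the overall error rate at the nominal level, which is one way to judge how far the two reported p-values sit from a fully corrected threshold.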

This study did not find significant correlations between the examined bone measurements and DNA methylation at CDKN2A, where previously an inverse relationship between DNA methylation and bone size, mineral content and mineral density at 4 years had been documented [264]. There was ample opportunity to see changes at this locus in the data, with 95 probe sites in the vicinity of this gene. Nor did this study see significantly reduced DNA methylation at RXRA in umbilical cord with maternal vitamin D supplementation or increased circulating vitamin D at the 75 probe sites near RXRA [263]. Whilst this study did not replicate these specific findings, it has highlighted two new loci with possible relationships to bone health outcomes.

Epigenome wide association studies are here being used as a discovery platform for processes which may be implicated in the interaction of the in utero environment and bone health outcomes as mediated by the epigenome, all of which are complex and multifactorial. There is not strong prior knowledge of the relationships between the systems under investigation with which to make precise predictions; the aim is rather to provide a starting point for generating more specific models with which to generate more precise hypotheses. This presents a challenge, as there are many sources of noise which could obscure any relationships which do exist between these properties, or produce the spurious appearance of a relationship when none exists. Striking the balance between sensitivity and specificity is particularly challenging in this context. Relaxing the significance threshold, and so admitting some type 1 errors, might generate sufficient additional hits with which to attempt methods such as gene set enrichment analysis, and related approaches, to identify the biologically relevant systems and processes which may mediate the observed association. However, an excessive number of false positive inputs to such analyses can lead to spuriously identifying associated terms. Simply increasing the sample size of studies to reach the level of power necessary to detect small effect size changes is expensive and impractical, and does not help when it comes to analysing existing datasets underpowered for the analysis as initially conceived. The hypothesis free approach has some advantages when attempting to bring as yet unknown aspects of biology relevant to the association being tested to the attention of investigators for further exploration. However, searching for associations between outcomes and particular genomic locations may be of limited effectiveness, even when sufficient statistical power is available to uncover very small effects, because when individual loci have only very minor contributions to a given effect there are many of them, often spread across many systems [466]. Greater temporal and tissue specificity may reveal larger effect sizes in particular tissues at particular times. Time and tissue specific signals may currently be flattened out in the aggregate signal. The combinatorial complexity of possible times and tissues renders a brute force search impractical, so some prior reason based in biological understanding is likely to be needed to go looking in a particular time and/or tissue for an association.

If the primary interest is in identifying pathways or other functional biological units, then making use of dimensionality reduction methods such as weighted correlation network analysis (WGCNA) [467] could potentially help to address some of the power issues faced by these studies. This approach also has the limitation, though, that biologically relevant effects may be realised through small perturbations across many systems [466], meaning that no single network may stand out. Grouping the outcomes into effects on correlated gene networks rather than individual genes dramatically reduces the number of statistical tests performed. This approach could be used to narrow the set of tests to perform when looking for gene level associations in other datasets. If changes in gene networks are associated with an outcome of interest in one cohort, it is reasonable to take this prior information to a second cohort and test only genes in this network for an association with the outcome of interest, dramatically reducing the number of CpG level tests. One could also perform the reciprocal analysis (dimensionality reduction in the second cohort and CpG level tests on a reduced set in the first cohort) as a means of validation. An ongoing collaboration with colleagues at MRC-IEU, University of Bristol is including these data in a meta-analysis and is replicating several of these EWAS in other cohorts. This provides an opportunity to attempt to replicate the sites identified here and ascertain whether they are sufficiently robust to warrant functional follow-up.
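The effect of reducing the number of tests on the significance threshold can be sketched as follows. This is an illustration only: the three test counts are hypothetical round numbers chosen to show the scaling, not values from this study or from any WGCNA output, and a Bonferroni correction at α = 0.05 is assumed at each level.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Hypothetical numbers of tests at each level of the analysis.
hypothetical_test_counts = {
    "genome_wide_cpgs": 450_000,  # every CpG tested individually
    "network_modules": 40,        # one test per correlated network
    "cpgs_in_one_module": 1_500,  # CpG level follow-up within one network
}

thresholds = {
    name: bonferroni_threshold(n) for name, n in hypothetical_test_counts.items()
}

for name, threshold in thresholds.items():
    print(f"{name}: p < {threshold:.2e}")
```

With counts of this order the follow-up threshold within a single network is a few hundred times more permissive than the genome wide one, which is the sense in which prior information from one cohort could recover power in a second.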

6.2 The implications of ageing-related changes in the epigenetic state of tRNA genes

tRNAs are central to the core cellular process of translation, but relatively little is known about the epigenetic state of the genes which encode this essential cellular component. Almost half of all tRNA genes are silent [305], and there is evidence that they are expressed in a tissue specific fashion [325,326]; indeed, this study found differences in tRNA gene DNA methylation between blood cell types. There are many indicators that a great deal of biological insight could be gained from a deeper understanding of the epigenetic regulation of tRNA genes.

The finding, in Chapter 4, that the tRNAome is enriched for age related hypermethylation, and that there are two loci in which this effect is distinct and replicable, is challenging to interpret given the number of different regulatory paradigms in which tRNA genes are involved. There are several ways in which the effects of changes in tRNA gene DNA methylation could manifest. The first is through the canonical function of tRNAs: changes would alter the expression of the tRNA gene and so the amount of mature tRNA produced, with potential consequences for translation. The second is through altering the amount or type of tRNA derived small RNA molecules produced. The third is through altering the chromatin dynamics of the tRNA gene loci, with potential knock-on effects for other systems given their insulator activity [353] and tendency to cluster in three dimensional space with other tRNA genes [354]. These routes are not mutually exclusive.

Despite the relatively localised effects on DNA methylation that were observed (Figure 4.13), these changes could still have an effect on larger scale chromatin dynamics. The binding of zinc finger CXXC domain proteins at unmethylated CpGs and of methyl-CpG-binding domain proteins at methylated CpGs can alter histone modifications and affect alternate histone usage respectively [468,469]. In addition, changes in the nucleosome organisation at tRNA genes may affect genome architecture [470]. It is therefore hard to rule out effects on chromatin architecture even from localised DNA methylation changes. Circulating tRNA derived small RNAs have been documented as exhibiting age related changes and thus represent an additional signalling vector through which changes in tRNA expression could act [346,471].

The effect of increased DNAm on tRNA genes was to repress their expression in a plasmid based experimental system [320]; however, this does not necessarily imply that increased DNA methylation at a tRNA gene locus would result in reduced expression of tRNA derived small RNAs. Changes in transcriptional dynamics may alter the fate of the transcription product in terms of fragmentation pattern or post transcriptional modifications, just as changes in Pol II transcriptional dynamics impact alternative splicing in protein coding genes [472]. It is plausible that increased DNA methylation at a tRNA gene could cause the tRNA gene to favour production of tRNA derived small RNA transcripts over canonical mature tRNAs rather than silencing it entirely.

tRNAs are a key component of the translation system, itself a core component of the cell, so any age related alterations in this system have the potential to impact on essentially all other systems in the cell if they affect translation. The tight coupling of translation to metabolic regulation, and the effect of modulating it on ageing [344], mean that any effect of tRNA gene DNAm on translation could be relevant for the regulation of ageing processes.

Unpicking the potential mechanism of action of changes in tRNA gene DNA methylation, if indeed these changes have physiologically relevant impacts, will require greater understanding of a number of aspects of tRNA biology. Our collective understanding of the post-transcriptional modifications of tRNAs and the regulatory functions of tRNA derived small RNAs is still in its infancy [340]. In conclusion, with possible effects on chromatin structure, on tRNA derived small RNA signalling and on translation, there are many candidate mechanisms of action through which changes in the epigenetic state of tRNA genes could impact on ageing related systems, and at present there is insufficient evidence to favour any one of them. In addition, these possibilities are not mutually exclusive, so any effects need not be limited to a single mechanism.

A number of age related tRNA DNA hypermethylation signals persisted after correction for cell-type composition, and there were some differences in tRNA gene DNA methylation between cell-types. This suggests that there are at least some cell-type independent changes in tRNA DNA methylation and leaves open the possibility that there are some changes which are cell-type specific. More detailed characterisation of the cell-type specific activities of tRNA genes may aid in the elucidation of the regulatory functions of tRNAs by permitting their association with the pathways active in particular cell-types. The tRNA-iMet-CAT-1-4 locus, which showed the most consistent result, has an overall increase of 3.7% from age 4 years to age 78 years in targeted bisulfite sequencing data. The 450k array and targeted bisulfite sequencing results put the rate of increase in DNA methylation at tRNA-iMet-CAT-1-4 somewhere in the approximate range of 0.05-0.22 percentage points of DNA methylation per year, starting from a baseline methylation near zero at birth. CpGs differentially methylated with age have generally shown changes on the order of 0.1-0.125% per year [120,382], placing the magnitude of the effect seen here in a comparable range to previously observed age related changes in DNA methylation.
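The lower end of the quoted range follows from simple arithmetic on the targeted bisulfite sequencing figures. The sketch below assumes that the 3.7% overall increase is expressed in percentage points of methylation and that the change is linear between the two ages; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def linear_rate_per_year(total_change_pp: float, age_start: float, age_end: float) -> float:
    """Average change in percentage points per year, assuming a linear trend."""
    if age_end <= age_start:
        raise ValueError("age_end must be greater than age_start")
    return total_change_pp / (age_end - age_start)


# Figures quoted in the text for tRNA-iMet-CAT-1-4 (targeted bisulfite sequencing).
rate = linear_rate_per_year(total_change_pp=3.7, age_start=4, age_end=78)

print(f"{rate:.3f} percentage points per year")
```

This gives 3.7 / 74 = 0.05 percentage points per year, matching the lower bound of the 0.05-0.22 range.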

The targeted bisulfite sequencing method employed here is quite efficient at capturing DNA methylation within tRNA genes, as a single amplicon is generally sufficient to span a given tRNA gene. This panel could be expanded to most of the human tRNAome, though some loci have CpGs flanking the tRNA genes which prevent the design of primers spanning the tRNA gene locus. It would be interesting to attempt this assay with enzymatic (NEBNext Enzymatic Methyl-seq ‘EM-seq’) instead of bisulfite conversion, as this might permit some inaccessible sites to be targeted with slightly longer amplicons and improve yields by avoiding the DNA fragmentation that arises from bisulfite treatment [135]. It would be particularly interesting to employ this in mice at tRNA loci not well covered by RRBS, to see whether tRNA genes are also enriched for age related hypermethylation in mice. The RRBS mouse data analysed in this study showed age-related hypermethylation in three of 51 tRNAs. This limited coverage of the 401 high confidence tRNAs in the mouse genome [304] limits the ability to generalise about the enrichment for tRNA gene DNA hypermethylation in mice.

An effective follow up experiment would be to use the Pacific Biosciences amplification-free, CRISPR based targeted sequencing method to examine tRNA gene clusters [473]. A dataset of this type has the potential to elucidate several features of the tRNA genes. Firstly, the long read sequencing technology would permit a more detailed characterisation of tRNA gene copy number variation, at least for copies within known clusters. Secondly, it would provide the ability to detect DNA methylation, not simply as average levels but as patterns of methylation distribution within single molecules, which might reveal additional information about the nature of any age related changes [474]. The long sequencing reads would also permit unambiguous mapping to particular tRNA gene copies, which would provide further evidence, in addition to that provided by the targeted bisulfite sequencing analysis, that the observed increase in DNA methylation with age is not an artefact of mapping issues. Coupled with this, it would also be illuminating to employ the Hydro-tRNAseq and PAR-CLIP methods used by Gogakos et al. [352] to perform detailed characterisation of pre-tRNA and mature tRNA transcripts from the same system, so that the relationship between tRNA gene DNA methylation and tRNA transcription levels could be examined simultaneously.

Little was previously known about the epigenetic state of the human tRNA genes [347] and much remains to be characterised. This work represents the first detailed characterisation of the DNA methylation state of the human tRNAome, and reveals a novel pattern of age related hypermethylation in these genes.

6.3 Assessing biological ageing by DNA methylation changes within Alu repeat elements

Repetitive elements make up some 45% of the human genome [410] and the global hypomethylation observed with age is driven by these repeats [411], but to date limited coverage of these regions [213] has meant that their potential to contain information relevant to biological ageing has gone underexplored.

The best Alu DNA methylation age predictor constructed in Chapter 5, trained on a set of n=774, was able to predict chronological age in an unrelated replication set of n=664 with an R of 0.65 and a median absolute error of 8.1 years. Whilst this is less accurate than many of the other DNA methylation based age predictors [434], the primary goal of this predictor was not to generate the most accurate age predictions but rather to be sufficiently accurate to capture a signature of age acceleration specific to the Alu repeat elements. The difference between the predicted and chronological age, the age acceleration, was strongly correlated with chronological age, such that the age of older individuals is prone to be overestimated and that of younger individuals underestimated. One explanation for this may be the limitations of DNA methylation quantitation by MeDIP-seq. The elastic net regression may have identified loci which have a consistent direction of change with age but are of variable magnitude. If sites were selected which had a relatively consistent magnitude of change in the training set, but which vary in a manner that skews higher in the larger population, this could lead to overestimates of the age of older samples and underestimates of the age of younger ones. This is speculative, and it would be interesting to examine it further by looking at how the data at the predictor sites are distributed in the training and prediction groups. In addition, it may be possible to study this bias with a simulation approach to see what data properties can produce this pattern of error. This effect, seen in the quantile normalised data, was not mitigated by binarising the data by locus, by using absolute methylation estimates, or by using the raw reads per million base pairs data. The difficulty with absolute DNA methylation quantitation by MeDIP-seq is not simple to resolve with common data transformations.
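A simulation of the kind suggested above could start from something as simple as the following sketch. It is not an analysis of the study data: it assumes a toy predictor whose output is a linear function of true age plus Gaussian noise, and asks how the slope of that function affects the correlation between age acceleration (predicted minus chronological age) and chronological age. The sample size of 664 and mean age of 60 echo figures quoted in this chapter, while the age spread, noise level and slopes are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def acceleration_age_correlation(slope, n=664, mean_age=60.0, age_sd=10.0,
                                 noise_sd=8.0, seed=1):
    """Correlation between age acceleration and age for a toy predictor.

    The toy predictor is mean_age + slope * (age - mean_age) + noise, so a
    slope of 1 is perfectly calibrated and a slope above 1 exaggerates
    departures from the mean age.
    """
    rng = random.Random(seed)
    ages = [rng.gauss(mean_age, age_sd) for _ in range(n)]
    predicted = [mean_age + slope * (age - mean_age) + rng.gauss(0.0, noise_sd)
                 for age in ages]
    acceleration = [p - a for p, a in zip(predicted, ages)]
    return pearson_r(ages, acceleration)


results = {slope: acceleration_age_correlation(slope) for slope in (0.6, 1.0, 1.4)}

for slope, r in results.items():
    print(f"slope {slope}: correlation of acceleration with age = {r:+.2f}")
```

In this toy setting a slope above one is sufficient to reproduce the observed pattern of overestimating older and underestimating younger individuals, while a slope below one produces the opposite bias. This shows only that a miscalibrated scale is one data property capable of generating the error pattern; whether MeDIP-seq quantitation produces such a miscalibration here would need to be tested on the predictor sites themselves.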

The strong association between Alu age acceleration and chronological age limited the interpretability of the GWAS for age acceleration. This is because associations found here could easily be driven by differences in allele frequencies between different age strata within the GWAS population rather than by signal from Alu age acceleration that is independent of chronological age [475].

The samples used to train and test the models have a median age of approximately 60 years. A concentration of samples in a particular part of the age distribution is also a potential source of bias, as poor performance in the smaller numbers of older and younger individuals will not be penalised as much as poor performance at the ages with more samples. This also negatively impacts the ability to determine the quality of the predictor, as a good age range is required to reliably estimate the R of a predictor [293]. Mitigating this by equalising the numbers in certain age groups is a possibility, but a substantial number of samples must be left out to achieve this, excessively shrinking the training set. In addition, this is unlikely to be the sole factor at play in the poor prediction performance of these clocks, as similarly imperfect age distributions have been used in the training of more performant clocks.

The Twins UK dataset provides a powerful tool for assessing the impact of genetics on age predictors. The age predictors generally performed only marginally better when predicting the ages of the twins of the individuals in the training set than on unrelated individuals, suggesting a minimal impact of genetic factors on the Alu DNA methylation age predictor.

Despite the issues with the correlation of age acceleration with chronological age, this work and Wang et al.’s rDNA clock [189] demonstrate that DNA methylation based age predictors can be trained in repetitive sequence elements. This suggests that constructing DNA methylation based age predictors targeted to particular subsets of the genome is possible, though the aim of capturing signatures of biological ageing specific to those subsets is yet to be adequately explored. It may be possible to revisit this approach once large whole genome bisulfite, or the enzymatically converted equivalent, datasets are generated, as it seems unlikely that it would be economical to examine the ~1.1 million Alu elements with any of the available targeted methods. Alternatively, if the cost of long-read sequencing drops and the quality of methylated base calls using these methods increases [464], they may also be a viable source of such data. Other possible features on which to attempt to construct age predictors could include: MIR repeat elements, some of which have been co-opted as enhancers and which are associated with tissue specific gene expression [476]; Long Terminal Repeat (LTR) elements, which are enriched for chromatin marks which characterise active cis regulatory elements [477]; and histone genes, because of their core function and the widespread genomic implications of alterations in their availability. Histone genes are also located in early-replicating domains [398], so should generally have high fidelity DNA methylation copying during mitosis [123], meaning any changes observed here are less likely to be the product of epigenetic drift. This would also include alternative histones, as they have functions in genome stability and DNA repair [478]. However, histone genes present a relatively limited set of possible sites with which to predict.

6.4 Conclusion

The unifying theme of this work is the relationship between DNA methylation and healthy ageing, from its possible function as a mediator for the effects of early life environmental influences on long term bone health, through age-related hypermethylation of genes encoding core components of the translational machinery, to signatures of biological ageing in the repetitive regions of the genome. The epigenome sits upon the genome, encoding the annotations necessary for cells with diverse and dynamic functions to arise from a single set of genetic information. The ability to construct epigenetic clocks reveals that this layer of information storage and processing contains much that is important for understanding the molecular and cellular processes of ageing. The environmental malleability of the epigenome is its core strength: it is this plasticity to adopt multiple roles that permits multicellularity [479]. This malleability both leaves the epigenome open to disruption and presents the possibility of correcting any errant changes. An integrative understanding of epigenomics has the potential to contribute many novel scientific insights into the fundamental mechanisms of ageing in the years to come, with profound impacts on our ability to ameliorate chronic and ageing related conditions by intervening in their underlying causes to increase longevity and healthspan.