4 The Genomic Loci of Specific Human tRNA Genes Exhibit Ageing-Related DNA Hypermethylation

4.1 Abstract

Understanding how the epigenome deteriorates with age and subsequently impacts on biological function may bring unique insights to ageing-related disease mechanisms. As a central cellular apparatus, tRNAs are fundamental to the information flow from DNA to proteins. Whilst only being transcribed from ~46kb (<0.002%) of the human genome, their transcripts are the second most abundant in the cell. Furthermore, it is now increasingly recognised that tRNAs and their fragments also have complex regulatory functions. In both their core translational and additional regulatory roles, tRNAs are intimately involved in the control of metabolic processes known to affect ageing. Experimentally DNA methylation can alter tRNA expression, but little is known about the genomic DNA methylation state of tRNAs.

Here, we find that the human genomic tRNA loci are enriched for ageing-related DNA hypermethylation. We initially identified DNA hypermethylation of 44 and 21 specific tRNA genes, at study-wide (p < \(8.36\times10^{-5}\)) and genome-wide (\(p < 4.34\times10^{-9}\)) significance, respectively, in 4,350 MeDIP-seq peripheral blood DNA methylomes (16 - 82 years). This starkly contrasted with 0 hypomethylated at both these significance levels. Further analysing the 21 genome-wide results, we found 3 of these tRNAs to be independent of major changes in cell-type composition (tRNA-iMet-CAT-1-4, tRNA-Ser-AGA-2-6, tRNA-Ile-AAT-4-1). We also excluded the ageing-related changes being due to the inherent CpG density of the tRNAome by permutation analysis (1,000x, Empirical p-value < \(1\times10^{-3}\)). We additionally explored 79 tRNA loci in an independent cohort using Fluidigm deep targeted bisulfite-sequencing of pooled DNA (n=190) across a range of 4 timepoints (aged ~4, ~28, ~63, ~78 years). This revealed these ageing changes to be specific to particular isodecoder copies of these tRNAs (tRNAs coding for the same amino acid but with sequence body differences) and included replication of 2 of the 3 genome-wide tRNAs (tRNA-iMet-CAT-1-4, tRNA-Ser-AGA-2-6). Additionally, this isodecoder-specificity may indicate the potential for regulatory fragment changes with age.

In this study we provide the first comprehensive evaluation at the genomic DNA methylation state of the human tRNAome, revealing a discreet and strongly directional hypermethylation with advancing age.

4.2 Introduction

Ageing is implicated as a risk factor in multiple chronic diseases [288]. Understanding how the ageing process leads to deteriorating biological function is now a major research focus. This field has hopes of increasing human longevity and ‘healthspan’ whilst ameliorating the extensive physical, social and economic costs of these ageing-related disorders [289]. Epigenetic processes, which influence or inform cell-type specific gene expression, are altered with age and are a fundamental hallmark of this progression, indeed they are arguably a hub mediating other hallmarks including stem cell exhaustion, cellular senescence, and mitochondrial dysfunction [11,30].

DNA methylation (DNAm) is the most common epigenetic modification of DNA and age-associated changes in this mark were recognised in mammalian tissues as early as 1983 [138]. Changes in DNAm with age are extensive with thousands of loci affected. Many of these changes represent ‘drift’ [119] arising from the imperfect maintenance of methylation state. Specific genomic regions show distinct directional changes, with loss of DNA methylation in repetitive or transposable elements [290], and gains in certain promoters, including the targets of polycomb repressor complex [155] as well as bivalent domains [153]. The advent of high-throughput DNAm arrays [291,155,167] has elucidated more detailed patterns of ageing related changes in DNAm. The identification of precise individual CpG sites that exhibit consistent changes with age enabled the construction of predictors of chronological age known as epigenetic or DNAm ‘clocks’ [167,168,193,292,194]. Furthermore, it was observed that ‘acceleration’ of this DNAm-derived measure is a biomarker of ‘biological’ ageing due to associations with morbidity and mortality [293,294]. In a previous investigation of ageing-related DNAm changes within common disease-associated GWAS regions, Bell et al. identified hypermethylation of the specific transfer RNA gene, tRNA-iMet-CAT-1-4 [209]. The initiator methionine tRNA possesses certain unique properties [295297], including its capacity to be rate limiting for translation [298], association with the translation initiation factor eIF2 [299], and ability to impact the expression of other tRNA genes [300].

tRNAs are fundamental in the translation process for all domains of life and are thus evolutionarily ancient [301]. This translation machinery and the regulation of protein synthesis are controlled by conserved signalling pathways shown to be modifiable in longevity and ageing interventions [302]. The human tRNAome, comprising all of the genomic locations of tRNA genes, represents an extremely small portion of the genome [303]. There are 416 high confidence tRNA genes when additionally considering tRNA pseudo genes, nuclear encoded mitochondrial tRNA genes and possibly some closely related repetitive sequences the number of sequences closely resembling tRNAs is 610 (gtRNAdb [304]) this extended set covers <46 kb (including introns) and represents <0.002% of the human genome. Despite their small genomic footprint, and the observation that approximately half of all tRNA genes are transcribed at negligible levels if at all [305], these genes produce the second most abundant RNA species next to ribosomal RNA [306] and are required for the production of all proteins. Mature tRNAs have an L-shaped three dimensional structure arising from a ‘clover-leaf’ shaped two dimensional structure comprised of three hairpin stem-loop structures (Figure 4.1).

Structure of a mature tRNA Two and three dimensional representations of tRNA structure with matching colour coding. Adapted from the wikimedia foundation structure based on PDBID: 1ehz. tRNAs are ‘charged’ when an amino acid is attached at the CCA site at the 3’ end.

Figure 4.1: Structure of a mature tRNA Two and three dimensional representations of tRNA structure with matching colour coding. Adapted from the wikimedia foundation structure based on PDBID: 1ehz. tRNAs are ‘charged’ when an amino acid is attached at the CCA site at the 3’ end.

tRNA genes are transcribed by RNA polymerase III (polIII) [307] and have type II polIII promoters which contain A and B-box internal promoter elements bound by the complex TFIIIC, followed by TFIIIB, and polIII [308] (Figure 4.2). Transcription is terminated by a simple run of Ts and proceeds in rounds of fast re-initiation where the same polIII molecule is preferentially re-used [309]. tRNA gene expression is modulated by the polIII specific transcription factor Maf1 a highly conserved factor which represses tRNA transcription [310312]. The activity of Maf1 is modulated by the Target of Rapamycin Kinase Complex 1 (TORC1) [313], a highly conserved hub for signals that modulate ageing [314]. Several general transcription factors also influence tRNA gene expression; the tumour suppressors p53 [315] and Rb [316] both negatively regulate tRNA expression and c-Myc upregulates tRNA gene expression, all act via TFIIIB [317]. tRNAs are dysregulated in cancer, and have potential utility as prognostic markers [318]. tRNA regulation may play an important role in cancer [319]. DNAm is able to repress the expression of tRNA genes experimentally [320], in plasmid expression systems, but may also represent co-ordination with the local repressive chromatin state [321]. In addition, recent results from Gerber et al. [322] show a mechanism by which polII can function in the regulation of certain polIII transcribed loci including several tRNA genes.

tRNA Transcription Cartoon representation of the RNA polIII transcription initiation complex and structure of the type II RNA polymerase III promoter. Colour coding here corresponds to that in figure 4.1 illustrating that promoter is internal as the A box corresponds approximately to the D-loop and the B box to the T-loop in the tRNA structure.

Figure 4.2: tRNA Transcription Cartoon representation of the RNA polIII transcription initiation complex and structure of the type II RNA polymerase III promoter. Colour coding here corresponds to that in figure 4.1 illustrating that promoter is internal as the A box corresponds approximately to the D-loop and the B box to the T-loop in the tRNA structure.

Assuming similar rates of transcription one would expect that the more frequently an amino acid is used in the exome, the more copies of that tRNA gene there would be in the genome [323]. Indeed, tRNA gene dosage is quite closely matched to amino acid usage frequency in the human exome, though the correlation is less strong for codon usage (Figure 4.3). The imperfect nature of this correlation suggests that there may be regulation of tRNA expression beyond simply having copy numbers proportionate to usage frequency.

tRNA gene copy number is imperfectly correlated with amino acid and codon usage frequency A) Amino acid usage frequency in the human exome vs tRNA gene copy number B) Codon usage frequency in the human exome vs tRNA gene copy number Usage Frequency Data [324], tRNA gene count data from GtRNAdb [304].

Figure 4.3: tRNA gene copy number is imperfectly correlated with amino acid and codon usage frequency A) Amino acid usage frequency in the human exome vs tRNA gene copy number B) Codon usage frequency in the human exome vs tRNA gene copy number Usage Frequency Data [324], tRNA gene count data from GtRNAdb [304].

A variety of observations have indicated that many tRNA genes are expressed in a tissue specific fashion in diverse organisms [325,326]. These include the tissue specific manifestations of diseases caused by mutations involving tRNAs and their usage [327],[328]. Though detailed characterisation of tissue specific tRNA expression patterns are still lacking. Considering the broad similarity of transcriptome codon usage across time and tissues and the purported functional equivalence in translation of tRNAs with the same anticodon, there is an open question as to the reason for tissue specificity of tRNA expression. The tissue specific expression of tRNAs whilst substantially maintaining codon ratios would require highly coordinated tRNA gene regulation to obtain similar expression levels of putatively functionally equivalent genes. This situation is strongly suggestive of as yet undiscovered function.

Additionally, beyond their core role in the information flow from DNA to protein sequence, tRNAs can fragment into numerous tRNA-derived small RNAs (tsRNAs). There are a number of types of tRNA derived small RNAs (tsRNAs) [329] which serve as signalling molecules [330] (Figure 4.4). Four main types of tsRNAs have been identified, these are 5’ tRNA halves, 3’ tRNA halves, and the two main types of tRNA derived fragments (tRFs), the ~18bp and the ~22bp sizes [331333]. tRNAs and pre-tRNAs also give rise to piRNAs [334336] and internal tRNA fragments or itRFs [337]. The terminology and ontology of tsRNAs is still stabilising as our understanding of these RNA species evolves. Some of these tsRNAs feedback on protein synthesis by regulating ribosome biogenesis [338], others have diverse regulatory functions such as targeting transposable element transcripts [339]. The extent of the functional significance of tsRNAs is also an open question [340].

The Types and Functions of tRNA derived small RNAs Reproduced from Cristodero et al. [341].

Figure 4.4: The Types and Functions of tRNA derived small RNAs Reproduced from Cristodero et al. [341].

Ageing is linked to core aspects of metabolic regulation, with nutrient sensing and stress response acting as major modulators of ageing. tRNAs as well as tsRNAs are integral to the regulation of protein synthesis and stress response. Protein synthesis represents a substantial proportion of total cellular energy expenditure and this fraction can vary considerably with nutrient availability [342]. Metabolic processes are also recognised to modulate the age estimates of DNAm clocks [343]. Partial inhibition of translation increases lifespan in multiple model organisms [344] and polIII inhibition increases longevity acting downstream of TORC1 [345]. Furthermore, 5’ tRNA halves circulating in serum are modulated by ageing and caloric restriction [346].

DNAm is able to repress the expression of tRNA genes on methylated expression vectors [320]. Whilst the broader chromatin milieu affects tRNA transcription, tRNA genes are unusual in that they are sufficiently short to fit within a single nucleosome. tRNA genes are generally ‘nucleosome free’ and precise placement of the nucleosome immediately upstream of the transcriptional start site is of importance for their expression [347]. A recent review of the epigenetic regulation of the polIII transcriptome [348] noted the very limited CpG methylation data available at polIII loci. Transcription by RNA polymerase III at SINE loci is suppressed by histone methylation but not by DNA methylation [321], indicating that DNA methylation may not directly influence the expression of tRNA genes but may do so by influencing the surrounding chromatin state. Standard RNA-seq is of limited utility in examining tRNA expression due to the issues of mappability, in addition much RNA-seq data is generated with size and polyA selection methods which would exclude tRNA derived transcripts. Also, tRNAs are a major target for ‘epitranscriptomic’ modification, with an average of 11-13 modifications per tRNA [349,340], some of which stop polymerases from elongating or alter base pairing, creating further mapping challenges. Thus, variants on standard RNA-seq procedures have been developed [350352]. polIII ChIP-seq has been used as a proxy for tRNA gene expression and, unlike RNA-seq based methods, generates reads from uniquely identifiable flanking regions which map to a known locus of origin [308]. This same advantage exists for the MeDIP-seq data used in this study.

tRNA gene loci may also play a role in large scale genome organisation. tRNA gene clusters act as insulators [353], and have extensive long-range chromatin interactions with other tRNA gene loci [354]. The coordinated transcription of tRNAs at subnuclear foci may represent an organising principle for 3D-chromatin by providing spatial constraints. In both budding and fission yeast tRNA genes localise to the nucleolus. In fission yeast, a subset of B-box sequence elements are bound by TFIIIC and not polIII serving as chromatin anchors to the nuclear periphery and acting as boundaries between euchromatic and heterochromatic regions [355]. It is unclear to what extent tRNA genes may play similar roles in large scale chromatin organisation in other organisms.

In this study ageing-related changes in the epigenetic DNA methylation state of the entire tRNAome were directly investigated, facilitated by the availability of a large-scale MeDIP-seq dataset. Arrays poorly cover this portion of genome. The 450k and EPIC arrays have 110 robust probes covering 84 tRNAs and 129 robust probes covering 89 tRNAs respectively, thus even the latest EPIC arrays cover <15% of the tRNA genes, with robust probes, and in total only ~4.7% of all the tRNA gene CpGs [223].

tRNA genes sit at the heart not only of the core biological process of translation but at a nexus of signalling networks operating in several different paradigms, from small RNA signalling to large scale chromatin organisation [354]. In summation tRNA biology, protein synthesis, nutrient sensing, stress response and ageing are intimately interlinked.

4.3 Primary questions

Are there additional loci which show the age-related DNA methylation changes observed in tRNA-iMet-CAT-1-4?

Is there an overall pattern of age-related hypermethylation in the tRNAome?

Can age-related DNA methylation changes at tRNA loci be validated with different technology to MeDIP-seq in which was initially identified?

Can age-related DNA methylation changes at tRNA loci be replicated in a cohort independent of that in which it was originally identified?

Can age-related DNA methylation changes at tRNA loci be observed in other organisms?

Can tissue specific differences in age-related DNA methylation changes at tRNA loci be observed?

4.4 Methods

Code for analyses performed in this chapter can be found here: https://github.com/RichardJActon/tRNA_paper_code

4.4.1 Participants

4.4.1.1 MeDIP-seq DNA methylomes

Participants in the ‘EpiTwins’ study are adult volunteers from the TwinsUK Register. The participants were aged between 16 and 82 years, with a median of age at sampling of ~55 years for 4350 methylomes taken from 3652 unique participants some of whom where sampled more than once. (cohort profile [232]). The majority of samples 4054 are female and 270 are male. 3001 have full blood counts. These individuals originate from 1933 unique families. There are 1234 monozygotic (MZ) twin pairs (2468 individuals), and 458 dizygotic (DZ) twin pairs (916 individuals). See also the participants section in general Methods 2.2.1. Ethics for the collection of these data were approved by Guy’s & St Thomas’ NHS Foundation Trust Ethics Committee (EC04/015—15-Mar-04) and written informed consent was obtained from all participants.

4.4.1.2 Targeted Bisulfite sequencing

Participants for our targeted bisulfite sequencing of select tRNA loci were drawn from two studies. Samples from participants aged 4 and 28 years are from the MAVIDOS [207] study, a randomised controlled trial of the effect of maternal vitamin D supplementation of the bone health of their offspring. Participants aged 63 and 78 years are from the Hertfordshire cohort study, which consists of individuals originally recruited in the 1980’s for a study of the impact of birth weight on heart attack risk because high quality records of their birth weights and other useful data had been recorded in the 1920s & 30s [356]. Due to a limited number of available samples, the two 4 year old pools contained DNA from 20 individuals each, with all other pools having 25 contributing individuals. Pool 1, the first 4 year old pool used DNA from all male samples, with all other pools using all female samples. Thus, the total number of participants was 190 (see Table 4.1). Samples from the 28 year old time point are all from pregnant women at ~11 weeks gestation.

Richard Acton selected the loci to target, designed the primers and selected the samples to go in each pool. Selected DNA samples were pooled by Nikki Graham at the institute of developmental sciences at the university of Southampton. With Fluidigm amplification and sequencing performed at Bart’s and the London genome centre, the genomics core facility at Queen Mary, University of London. The targeted BiS-sequencing data is available at: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA635108.

4.4.2 tRNA Gene coordinates and other annotation information

Genomic coordinates of all tRNA genes including some likely pseudogenes were downloaded from GtRNAdb [304]. The 2 tRNAs located in chr1_gl000192_random are tRNA-Gly-CCC-8-1 & tRNA-Asn-ATT-1-2 (The tRNA annotation used is provided in appendix 6.6). The 213 probes overlapping tRNA genes were derived from intersecting the tRNA gene annotation data from gtRNAdb with the Illumina 450k array manifest annotation for the hg19 genome build using bedtools v2.17.0 [231]. 107 tRNA genes were excluded from blacklisted regions of hg19 [357], that is to say regions of the genome which have “anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.”

4.4.2.1 tRNA Gene Clustering

To explore the genomic spatial distribution of the tRNA genes, the tRNA loci were clustered by grouping together all tRNAs within 5Mb of one another using the bedtools merge tool v2.17.0 [231]. A command of the form: bedtools merge -c 4 -o collapse -d <N> -i hg19-tRNAs.bed was used, where <N> is the binsize. The binsize was varied and 5Mb was selected as it is at approximately this size that number of clusters with more than one tRNAs exceeds the number of singleton tRNAs (Figure ). The further requirements that these groupings contain at least 5 tRNA genes with a density of at least 5 tRNA genes per Mb were added for these groups to be considered clusters.

tRNA gene cluster numbers at different bin sizes total: total number of tRNA clusters. singletons: number of tRNAs in clusters alone. moreThan1: number of tRNAs in clusters with more than one tRNA. maxSize: the number of tRNAs in the largest cluster.

Figure 4.5: tRNA gene cluster numbers at different bin sizes total: total number of tRNA clusters. singletons: number of tRNAs in clusters alone. moreThan1: number of tRNAs in clusters with more than one tRNA. maxSize: the number of tRNAs in the largest cluster.

4.4.2.2 tRNA gene mappability assessment

To assess the mappability of tRNA gene regions a value for mappability score density was computed to facilitate comparisons of regions of the genome. Mappability score density is computed as the area under the encode mappability tracks [358] over the length of the region.

4.4.3 DNA methylome data

4.4.3.1 TwinsUK MeDIP-seq methylomes

The Methylated DNA Immunoprecipitation sequencing (MeDIP-seq) data was processed as previously described [209,107] and detailed in Methods 2.2. These processed data are available from the European Genome-phenome Archive (EGA) (https://www.ebi.ac.uk/ega) under study number EGAS00001001910 and dataset EGAD00010000983 and were generated by BGI Shenzen for TwinsUK. The dataset used in this work consists of 4350 whole blood methylomes with age data. This data consists of a matrix of RPM (reads per million base pairs) values for overlapping 500 base pair windows of the genome with a 250bp slide. This is approximately 12 million windows for 4350 samples, or ~52 billion data points.

4.4.3.2 Analysis of DNA methylome data for Significant Ageing-related changes

All analysis was performed in R/3.5.2. Linear models were fitted to age using the MeDIP-seq DNA methylome data, as quantile normalised RPM scores at each 500bp window. Quantile normalisation was performed with the qqnorm R function with the theoretical quantiles of the RPM values at each window used in subsequent analysis.

Models were fitted with:

  1. Uncorrected, simply modelling age from DNA methylation
  2. Batch information as a fixed effect
  3. Blood cell-type counts for neutrophils, monocytes, eosinophils, and lymphocytes as fixed effects
  4. Batch and Blood Cell counts as fixed effects
  • Models 1 & 2 were fitted on the full set of 4350 as batch information was available for all samples but blood cell count data was only available for a subset of 3001 methylomes.
  • Models 1 & 2 fitted in the n=3001 subset were similar to those fitted in the complete set of 4350.
  • Models 3 & 4 were fitted in the n=3001 subset with full covariate information and sets of significant tRNAs identified at study-wide and genome wide levels in model 4 were used in subsequent analyses.

Models were also fitted for two unrelated subsets created by selecting one twin from each pair (Monozygotic or Dizygotic), yielding sets with n = 1198 & 1206 DNA methylomes, which timepoint was used when more than one sample was available for an individual was selected arbitrarily. One additional model was fitted for longitudinal analysis, samples were selected by identifying individuals with a DNA methylome at more than one time point and filtering for only those with a minimum of 5 years between samples. This yielded 658 methylomes from 329 individuals with age differences of 5-16.1 years, median 7.6 years. Models for this set included participant identifier as a fixed effect in addition to blood cell counts and batch information.

4.4.3.3 Permutation Analysis for Enrichment with Age-related Changes

Permutation analysis was performed to determine whether the CpG distribution of sets of the tRNAome was the principle driver of the ageing-related changes observed. Windows overlapping tRNAs have a higher proportion of windows with a greater CpG density than their surrounding sequences (see Figure 4.6). CpGs residing within moderate CpG density loci are the most dynamic in the genome [86] and CpG dense CpG island regions include specific ageing-related changes [155,153,209]. For comparison the permutation was also performed in the CGI regions from the Polycomb group protein target promoters in Teschendorff et al. [155] and bivalent loci from ENCODE ChromHMM ‘Poised Promoter’ classification in the GM12878 cell-line [359]. A random set of 500bp windows representing an equivalent CpG density distribution of the feature set in question were selected from the genome-wide data. Above a certain CpG density there are insufficient windows to sample without replacement within a permutation. Furthermore, above ~\(\ge18\%\) CpG density CpG Islands become increasingly likely to hypomethylated [360]. Therefore, all windows with a CpG density of \(\ge18\%\) (45 CpGs per 500bp) were grouped and sampled from the same pool. i.e. a window overlapping a tRNA gene which had a 20% density could be represented in permutation by one with any density \(\ge18\%\). This permutation was performed 1,000 times to determine an Empirical p value by calculating the number of times the permutation result exceeded the observed number of significant windows in the feature set. \(Empirical~p-value = \frac{r+1}{N+1}\), where r is the sum of significantly hypermethylating windows in all permutations and N is number of permutations [361].

CpG Density is higher in windows directly overlapping tRNA genes compared to that of non-tRNA overlapping windows in arbitrary flanking sequences (+/-5kb). This difference in CpG density between tRNA loci and other regions of the genome is a potential source of bias if age related DNA methylation changes vary with CpG density which they may as baseline DNA methylation levels also vary with CpG density.

Figure 4.6: CpG Density is higher in windows directly overlapping tRNA genes compared to that of non-tRNA overlapping windows in arbitrary flanking sequences (+/-5kb). This difference in CpG density between tRNA loci and other regions of the genome is a potential source of bias if age related DNA methylation changes vary with CpG density which they may as baseline DNA methylation levels also vary with CpG density.

4.4.3.4 Neonate and Centenarian Whole Genome Bisulfite Sequencing

DNA methylation calls were downloaded from GEO: GSE31263 and intersected with tRNA genes using bedtools v2.17.0 [231].

4.4.3.5 Sample pooling and EPIC array

An Illumina Infinium DNA methylation EPIC array ((C) Illumina) and targeted bisulfite sequencing of select tRNA gene loci were performed. Here DNA was extracted from whole blood and combined into 8 pools from unrelated individuals at 4 time-points with 2 pools at each time-point. The individual were age matched at approximately 4, 28, 63, and 78 year timepoints (Table 4.1) Using the EPIC array DNAm age was estimated using the Horvath clock [168] and blood cell-type-composition with the Houseman method [272].

Table 4.1: Summary information on participants in each pool.
Pool Mean Age Sex Min Age Max Age n
Pool 1 4.07 Male 3.99 4.38 20
Pool 2 4.09 Female 3.99 4.36 20
Pool 3 28.07 Female 25.87 29.80 25
Pool 4 28.23 Female 26.05 30.01 25
Pool 5 63.40 Female 62.80 63.80 25
Pool 6 63.26 Female 62.70 63.70 25
Pool 7 77.96 Female 75.50 80.50 25
Pool 8 77.22 Female 74.40 80.10 25

4.4.3.6 Targeted Bisulfite Sequencing

tRNA loci were selected for targeted sequencing which exhibited changes and DNAm with age along with closely related tRNAs in which changes were not observed. Primer design was performed using ‘methPrimer’ [237] (Primer sequences are provided in appendix 6.7). A total of 84 tRNA loci were targeted in 2 rounds of sequencing, 79 subsequently generated reliable results post-QC. The targeted tRNAs covered a total of 723 CpGs with a median of 8 CpGs per tRNA (range 1-13), data passing QC was generated for 458 CpGs, median 6 (range 1-9) per tRNA.

Initial QC for readcount, base call quality and base composition was caried out with FastQC [229] and multiqc [362] for combined visualisation of QC outputs. Quality based trimming was performed with Trim Galore [363] and target specific primer trimming with cutadapt [364] and custom perl 5 scripts. Alignment and methylation calling was performed with Bismark (v0.20.0) [236] making use of bowtie2 [365].

The alignment was performed against both the whole hg19 genome and just the tRNAome +/- 100bp to assess the possible impact of off-target mapping. Mapping to the whole genome did produce purported methylation calls at a larger number of loci than mapping just to the tRNAome (683,783 vs 45,861 respectively). Introducing a minimum coverage threshold of 25 reads dramatically reduced this and brought the number of sites into line with that in the tRNAome set (36,065 vs 33,664 respectively) suggesting a small number of ambiguously mapping reads. All subsequent analysis was performed using the alignment to just the tRNAome with a minimum coverage of 25 reads.

Pairwise differential methylation analysis of the tRNA genes at the different time points was performed using RnBeads [366] with limma [367] and a minimum coverage of 25 reads. Linear regression was used to predict age from DNA methylation at the targeted tRNA sites, permitting us to compare rates of increase with age. For the linear regression, only CpG sites with more than 25 reads mapped to the regions of the genome targeted for amplification were used.

4.4.3.7 TwinsUK Illumina 450k array methylomes

Illumina Infinium DNA methylation 450k arrays ((C) Illumina) were also performed on TwinsUK participants, in 770 Blood-derived DNA samples which had matched MeDIP-seq data (Methods 2.1). These data were available for this analysis in a pre-processed form, Methylation ‘beta’ values subject to beta-mixture quantile normalisation (BMIQ) as previously described [209,107]. Cell-type correction was performed using cell-count data and the following model: lm(age ~ beta + eosinophils + lymphocytes + monocytes + neutrophils).

4.4.4 Chromatin Segmentation Data

Epilogos chromatin segmentation data [368,359] was downloaded for the tRNA gene regions +/- 200bp from https://explore.altius.org/tabix/epilogos/hg19.15.Blood_T-cell.KL.gz using the tabix utility. The data used was the ‘Blood & T-cell’ 15 State model based on segmentation of 14 cell-types. This data was manipulated and visualised with R and ggplot2. tRNAs were assigned a predominant chromatin state base on the state with the highest score over that tRNA gene.

4.4.5 Isolated Blood Cell Type Specific Data

Data from 7 cell-type fractions from 6 Male individuals was downloaded from GSE35069 [369] using GEOquery [370]. Five of the 6 top age hypermethylating tRNAs are covered by this array dataset.

4.4.6 Cancer and Tissue Specific Methylation Data

Data was downloaded from the TCGA (The Cancer Genome Atlas) via the GDC (genomic data commons) data portal [371] using the GenomicDataCommons R package. Data from foetal tissue [372,373] was downloaded from GEO (GSE72867, GSE30654). From the TCGA, samples were selected for which DNAm data was available from both the primary site and normal solid tissue, and for which an approximate age could be inferred (within one year). Selecting those probes overlapping tRNA genes yielded 73,403 data points across 19 tissues with an age range of 15-90yrs (median 63.4) (A complete list of TCGA sample used is provided in appendix 6.11)

4.4.7 Assaying tRNA expression in blood with MINTmap

Small RNA-seq data from sorted blood cell fractions [374] was used. (GSE100467) and the MINTmap [337] tRNA fragment alignment tool. This dataset covered 42 individuals aged 21-63. A customised MINTmap reference designed to include only fragments which unambiguously map to a single tRNA gene locus and which overlap the 5’ or 3’ end of the genomic tRNA sequence by at least one base with no mismatches was produced. This reference is intended to capture pre-tRNAs prior to processing and CCA addition operating under the assumption that the levels of pre-tRNAs will be informative about the amount of transcription taking place at the tRNA loci. This approach provides at most a many to one mapping of tRNA fragments to a tRNA genes.

Assaying the expression of tRNA genes presents numerous difficulties [331], and usually requires variants on standard RNA-seq protocols. This custom MINTmap reference build yielded 383 fragments mapping to 92 distinct tRNA loci in this data. To control quality, only fragments with more than 20 total instances in the dataset, and present in more than 20 individuals were considered.

The maximum length of a fragment was limited to 50nt, due to the read length of the small RNA-seq data.

4.4.8 Mouse RRBS Analysis

Methylation calls and coverage information resulting from RRBS performed by Petkovich et al. [375] were downloaded from GEO using GEOquery [370] GSE80672. These data from 152 mice covered 68 tRNA and 436 CpGs after QC requiring >50 reads per CpG and >10 data points per tRNA. 5 tRNAs were excluded for being located within blacklisted regions of mm10 [357]. After QC there were 58 tRNA genes and 385 CpGs. Simple linear modelling to predict age (in months) from methylation level at each tRNA and each CpG were performed in R.

4.5 Results

4.5.1 DNA Methylation of Specific tRNA Gene Loci Changes with Age

Due to tRNAs critical role in translation and evidence of their modulation in ageing and longevity-related pathways, these genes were interrogated for evidence of ageing-related DNA methylation changes. The discovery set was a large-scale peripheral blood-derived DNA methylome dataset comprising of 4,350 samples (see Figure 4.7).

Study Structure tRNAs differentially methylated with age initially identified in MeDIP-seq, validated (where covered in 450k array) and replicated in targeted bisulfite sequencing of pooled samples. Tissue specificity of these effects was explored in TCGA and foetal tissue data.

Figure 4.7: Study Structure tRNAs differentially methylated with age initially identified in MeDIP-seq, validated (where covered in 450k array) and replicated in targeted bisulfite sequencing of pooled samples. Tissue specificity of these effects was explored in TCGA and foetal tissue data.

This sequencing-based dataset had been generated by Methylated DNA Immunoprecipitation (MeDIP-seq) [127], which relies on the enrichment of methylated fragments of 200-500 bp to give a regional DNAm assessment (500 bp semi-overlapping windows, see Methods 4.4.3.1). In total the extended human tRNAome is comprised of 610 tRNAs and closely related sequences (gtRNAdb)(see Figure 4.8), though only 492 are autosomal and do not reside in blacklisted regions of the genome [357].

The genetic code as instantiated in the human tRNAome. The triplet genetic code leads to the incorporation of specific amino acids into an elongating protein via corresponding tRNAs. n is the number of tRNA genes which encode a given amino acid, the number in parentheses is how many of those may be pseudogenes based on their tRNAscan score [376], and the number in square brackets is the number in blacklisted regions [357]. There are a total of 610 tRNAs and closely related sequences in GtRNAdb [304], 416 of which are high confidence tRNAs, 116 of which are potential pseudogenes, and 107 are in blacklisted regions [357]. Notably 7 of the 61 non-STOP codons are missing from the human tRNAome therefore these codons are handled by wobble base matching (e.g. GCG Arg, ACC Gly). Also of note are the suppressor and selenocysteine tRNAs. The 20 methionine tRNAs are split equally between initiator methionine and internally incorporating methionine tRNAs, which are structurally distinct [299]. There are also 23 nuclear encoded mitochondrial tRNAs.

Figure 4.8: The genetic code as instantiated in the human tRNAome. The triplet genetic code leads to the incorporation of specific amino acids into an elongating protein via corresponding tRNAs. n is the number of tRNA genes which encode a given amino acid, the number in parentheses is how many of those may be pseudogenes based on their tRNAscan score [376], and the number in square brackets is the number in blacklisted regions [357]. There are a total of 610 tRNAs and closely related sequences in GtRNAdb [304], 416 of which are high confidence tRNAs, 116 of which are potential pseudogenes, and 107 are in blacklisted regions [357]. Notably 7 of the 61 non-STOP codons are missing from the human tRNAome therefore these codons are handled by wobble base matching (e.g. GCG Arg, ACC Gly). Also of note are the suppressor and selenocysteine tRNAs. The 20 methionine tRNAs are split equally between initiator methionine and internally incorporating methionine tRNAs, which are structurally distinct [299]. There are also 23 nuclear encoded mitochondrial tRNAs.

Due to the small size of these tRNAs (60-86bp, median 73bp, excluding introns which are present in ~30 tRNAs with sizes from 10-99bp, median 19bp), this fragment-based method enabled a robust examination of the epigenetic state of these highly similar sequences. This was supported by a mappability assessment. The median mappability score density for the tRNAome was 0.90 for 50mers when considering tRNA genes \(\pm500bp\) reflecting the regional nature of the MeDIP-seq assay Methods 4.4.2.2. In contrast the 50mer mappability density is 0.68 for the tRNA gene sequences alone without flanking sequences. Excluding the flanking region is representative of the mappability of reads generated using a technique such as whole-genome bisulfite sequencing. This is because there is no IP fragment so reads mapping to adjacent more mappable sequences do not convey information about the methylation state of sites in the same fragment but to which it is harder to map (see Figures 4.9 & 4.10).

Example of mappability data from the encode mappability tracks [358] for the initiator methionine tRNA genes.

Figure 4.9: Example of mappability data from the encode mappability tracks [358] for the initiator methionine tRNA genes.

Mappability score density of the tRNAome increases with read length and is greater when flanking regions (\(\pm500bp\)) are included. Mappability score density is computed as the area under the encode mappability tracks [358] over the length of the region.

Figure 4.10: Mappability score density of the tRNAome increases with read length and is greater when flanking regions (\(\pm500bp\)) are included. Mappability score density is computed as the area under the encode mappability tracks [358] over the length of the region.

21 genome-wide significant and 44 study-wide significant results were identified via linear regression, (p < \(4.34\times10^{-9}\) and \(8.36\times10^{-5}\), respectively; see Methods 4.4.3.2, batch corrected n=4350). Study-wide significance was calculated conservatively using the Bonferroni method for all 598 autosomal tRNAs. Two distinct significance thresholds are used here, the study-wide correcting for the number of tRNA loci and the genome-wide correcting for the total number of loci in the genome. This is intended to permit the framing of statistical questions about the tRNA genes relative to other tRNAs genes (study-wide) or relative to the rest of the genome (genome-wide). There was a strong directional trend with all results at both significance levels being due to increases in DNA methylation. Age-related changes in cell type proportion are strong in heterogeneous peripheral blood, and include a myeloid skew, loss of naive T cells and increases in senescent cells [377]. A subset of 3 genome-wide and 16 study-wide significant hypermethylation results remained significant even after correcting for potential cell-type changes by including lymphocytes, monocytes, neutrophils and eosinophil cell count data (n=3001, Listed in Table 4.2, Red in Figure 4.11). The 3 genome wide significant tRNAs are: tRNA-iMet-CAT-1-4, tRNA-Ile-AAT-4-1, and tRNA-Ser-AGA-2-6. tRNA-iMet-CAT-1-4 is located on chromosome 6. tRNA-Ile-AAT-4-1 and tRNA-Ser-AGA-2-6 are neighbours and are located on chromosome 17 within the 3’ UTR of CTC1 (CST Telomere Replication Complex Component 1). Going forward these most robustly corrected sets of 3 and 16 tRNA genes are referred to as the genome-wide and study-wide significant tRNA genes respectively.

Table 4.2: Study-wide significantly hypermethylating tRNAs in blood cell-type and batch corrected model MeDIP-seq. With Corresponding results in Twins UK 450k array (blood cell-type corrected) and targeted bisulfite sequencing results. Age models for array and targeted BiS-seq were calculated using all probes / CpGs overlapping the indicated tRNA gene. ‘Slope’ corresponds to the beta value for methylation in the linear model, orange colouring indicates hypermethylation and blue hypomethylation. p-values coloured such that low values are dark blue and high values are yellow. blank grey cells indicate missing data.
MeDIP
450k Array
Targeted BiS-seq
tRNA Window Slope p-value Slope p-value Slope p-value
tRNA-Gln-CTG-7-1 Chr1:147,800,750-147,801,250 0.84 2.60e-05 NA NA NA NA
tRNA-Glu-TTC-1-1 Chr2:131,094,500-131,095,000 1.11 4.64e-09 NA NA NA NA
Chr2:131,094,250-131,094,750 1.00 1.12e-07 NA NA NA NA
Chr2:131,094,750-131,095,250 1.00 3.28e-07 NA NA NA NA
tRNA-His-GTG-1-2 Chr1:146,544,500-146,545,000 0.92 1.38e-06 NA NA NA NA
tRNA-His-GTG-2-1 Chr1:149,155,750-149,156,250 1.05 2.98e-08 NA NA NA NA
Chr1:149,155,500-149,156,000 0.83 1.37e-05 NA NA NA NA
tRNA-Ile-AAT-10-1 Chr6: 27,251,500- 27,252,000 1.07 1.45e-08 NA NA 1.30 1.22e-03
Chr6: 27,251,750- 27,252,250 0.90 1.86e-06 NA NA 1.30 1.22e-03
tRNA-Ile-AAT-4-1 Chr17: 8,130,000- 8,130,500 1.19 2.98e-10 19.63 8.92e-06 -0.74 6.88e-04
Chr17: 8,130,250- 8,130,750 0.77 3.99e-05 19.63 8.92e-06 -0.74 6.88e-04
tRNA-Ile-TAT-2-2 Chr6: 26,987,750- 26,988,250 0.97 7.25e-07 4.16 1.17e-02 -0.60 3.84e-01
tRNA-iMet-CAT-1-4 Chr6: 26,330,500- 26,331,000 1.28 2.83e-11 13.01 6.07e-06 4.54 9.35e-04
Chr6: 26,330,250- 26,330,750 1.13 2.89e-09 13.01 6.07e-06 4.54 9.35e-04
tRNA-Leu-TAG-2-1 Chr14: 21,093,250- 21,093,750 1.04 9.38e-08 NA NA 2.49 8.77e-03
Chr14: 21,093,500- 21,094,000 0.94 8.50e-07 NA NA 2.49 8.77e-03
tRNA-Pro-AGG-2-2 Chr6: 26,555,500- 26,556,000 1.04 3.97e-08 NA NA NA NA
Chr6: 26,555,250- 26,555,750 1.01 9.58e-08 NA NA NA NA
tRNA-Ser-ACT-1-1 Chr6: 27,261,250- 27,261,750 0.97 3.53e-07 NA NA 0.66 1.45e-01
tRNA-Ser-AGA-2-6 Chr17: 8,129,750- 8,130,250 1.21 1.16e-10 20.87 6.72e-05 0.62 4.28e-02
Chr17: 8,130,000- 8,130,500 1.19 3.03e-10 20.87 6.72e-05 0.62 4.28e-02
tRNA-Ser-TGA-2-1 Chr6: 27,513,000- 27,513,500 0.90 3.58e-06 87.21 1.38e-04 -0.25 5.74e-01
tRNA-Val-AAC-1-2 Chr5:180,590,750-180,591,250 0.91 3.28e-06 NA NA NA NA
tRNA-Val-AAC-4-1 Chr6: 27,648,500- 27,649,000 1.07 1.25e-08 40.06 9.90e-03 NA NA
Chr6: 27,648,750- 27,649,250 0.95 4.31e-07 40.06 9.90e-03 NA NA
tRNA-Val-CAC-2-1 Chr6: 27,247,750- 27,248,250 0.85 2.33e-05 59.16 5.05e-06 NA NA
Human tRNAome overview. From the outside in: Chromosome ideograms scaled by the number of tRNA genes (total = 598), as excludes chromosome X (10), Y (0) and contig chr1_gl000192_random (2; see Methods 4.4.2). tRNA genes within 20kbp of one another are grouped with breaks inserted between these clusters. Radial grey lines represent the location of tRNA genes in the genome. \(-log_{10}(p-value)\) for the blood cell-type and batch corrected age model are shown for each window overlapping a tRNA gene in green. Mean methylation across all samples (n=3001) in RPM (reads per million base pairs) is shown in blue. Genome-wide significant cell-type & batch corrected (\(p < 4.34\times10^{-9}\)) tRNAs show in red. The 158 Loci covered by 213 probes on the 450k array which directly overlap a tRNA gene are shown with green triangles. The 84 loci targeted for bisulfite sequencing in this study are indicated in magenta. Mappability score density is computed as the area under the encode mappability tracks [358] over the length of the region.

Figure 4.11: Human tRNAome overview. From the outside in: Chromosome ideograms scaled by the number of tRNA genes (total = 598), as excludes chromosome X (10), Y (0) and contig chr1_gl000192_random (2; see Methods 4.4.2). tRNA genes within 20kbp of one another are grouped with breaks inserted between these clusters. Radial grey lines represent the location of tRNA genes in the genome. \(-log_{10}(p-value)\) for the blood cell-type and batch corrected age model are shown for each window overlapping a tRNA gene in green. Mean methylation across all samples (n=3001) in RPM (reads per million base pairs) is shown in blue. Genome-wide significant cell-type & batch corrected (\(p < 4.34\times10^{-9}\)) tRNAs show in red. The 158 Loci covered by 213 probes on the 450k array which directly overlap a tRNA gene are shown with green triangles. The 84 loci targeted for bisulfite sequencing in this study are indicated in magenta. Mappability score density is computed as the area under the encode mappability tracks [358] over the length of the region.

Due to the related nature of these twin samples, these data were also analysed in two subsets of n=1198 & 1206 by selecting one twin from each pair into the separate sets. This analysis also included correction for Batch and Blood Cell counts. Whilst in these smaller datasets no tRNAs were genome-wide significant, 5 and 7 tRNA genes, respectively, reached study-wide significance, with consistent hypermethylation. In these sets 5/5 and 6/7 of these were present in 16 study-wide significant tRNA genes.

Furthermore, a subset of samples with longitudinal data were examined (n=658 methylomes from 329 individuals, median age difference 7.6 yrs). At the nominal significance threshold (p < 0.05) this yielded a split of 41 hypermethylating tRNA genes and 22 hypomethylating tRNA genes. Of these hypermethylated tRNAs, 2 are in the previously identified genome-wide significant set of 3 (with tRNA-iMet-CAT-1-4 ranked 3rd by p-value) and 9 are in the study-wide significant set of 16. A full table with model results for all fitted models and all tRNA genes is provided in appendix 6.8.

4.5.3 tRNA Gene DNA Methylation in Other Tissues

Some tRNA gene expression has been shown to be highly tissue specific [378,325,326]. It follows that our observations of changes in DNAm with age in blood might be specific to that tissue. A mix of 450k and 27k array data from ‘solid tissue normal’ samples made available by TCGA (The Cancer Genome Atlas) and data from foetal tissue [372,373] were downloaded from GEO for use in this analysis (see Methods 4.4.6). The samples from TCGA range in age from 15-90 (n = 733). Only 43 tRNA genes had adequate data to compare across tissues in this dataset and 115 in the foetal tissue data.

4.5.3.1 tRNA Genes also Hypermethylate with Age in Solid Tissue

Only 2 of the 3 tRNA genes identified as genome-wide significant and a further 1 of the study-wide significant tRNA genes are present in the set of 45 tRNA genes in the TCGA data, thus limiting our ability to draw conclusions about the tissue specificity of these results. Solid tissue samples have a strong preponderance for low levels of methylation consistent with the active transcription of many tRNA genes and show slight increasing methylation with age but age accounts for very little of the variance (linear regression slope estimate = \(1.52\); \(R^2= 0.0002\); p-value \(1.34\times10^{-3}\) (Figure 4.24d). In a pan-tissue analysis it was found that 10 tRNA genes showed changes in DNAm with age, 9 of which were hypermethylation (p-value \(<1.1\times10^{-3}\)). One of these tRNA genes, tRNA-Ser-TGA-2-1 was also present in study-wide significant set of tRNA genes. Figures 4.22 & 4.23 illustrate minimal tissue specific differences. Interestingly, however, tRNA-iMet-CAT-1-4 and tRNA-Ser-AGA-2-6 appeared more variable in methylation state than many other tRNAs in the TCGA normal tissue samples (Figure 4.22) and indeed have the highest variance in DNA methylation across tissues (Figure 4.24c).

Mean Methylation of 43 tRNAs in 19 tissues. Possible pseudogene (tRNA-Asn-ATT-1-1) is shown in a separate cluster beneath the main heatmap [379].

Figure 4.22: Mean Methylation of 43 tRNAs in 19 tissues. Possible pseudogene (tRNA-Asn-ATT-1-1) is shown in a separate cluster beneath the main heatmap [379].

Mean Methylation of 115 tRNAs in 11 tissues. Possible pseudogenes are shown in a separate cluster beneath the main heatmap [379].

Figure 4.23: Mean Methylation of 115 tRNAs in 11 tissues. Possible pseudogenes are shown in a separate cluster beneath the main heatmap [379].

4.5.3.2 tRNA Gene Methylation in Cancer

A coarse grained examination of the tRNA DNA methylome in tumour samples with matched normal tissue data was performed, considering the 45 tRNA genes and 19 tissues for which data were available. Across multiple tissues it was found that both tumour and ‘normal’ samples have a strong preponderance for low levels of methylation consistent with the active transcription of many tRNA genes.

Tumour samples have a higher mean methylation, \(0.27\%\) \((95\% ci~ 2.28\times10^{-03}-\infty)\) greater than that of normal samples (Wilcoxon signed-rank test p-value = \(9.1\times10^{-20}\)), and a greater variance (Levene test p-value \(< 1\times10^{-3}\)) (Figure b). Interestingly visual inspection of methylation values in normal tissues (Figure b & d) suggests a slight bimodality peaking around 50% methylation that is not present in the tumour samples. These differences are small in absolute terms and have very wide error margins but are consistent with a picture of dysregulation in cancer cells [319].

Both Primary Tumour and Solid Tissue Normal samples show slight increasing methylation with age but age accounts for very little of the variance (linear regression slope estimates = \(1.65\), \(1.52\); \(R^2= 0.0006\), \(0.0002\); p-values \(1.20\times10^{-6}\), \(1.34\times10^{-3}\) respectively)(Figure d).

Consistent with Figure the tRNAs tRNA-iMet-CAT-1-4 and tRNA-Ser-AGA-2-6 which have the highest ranked variance in normal tissue and also rank highly in tumour samples (Figure c).

Global properties of tRNA methylation data for 45 tRNA genes across 19 tissues with matched normal and tumour samples from 733 cases in TCGA [372,373].

Figure 4.24: Global properties of tRNA methylation data for 45 tRNA genes across 19 tissues with matched normal and tumour samples from 733 cases in TCGA [372,373].

4.5.4 Expression of tRNAs in Blood with Age

Having observed specific tRNA gene isodecoders hypermethylating with age it followed that the expression of tsRNA in blood cell-types would be worth exploring to ascertain if the expression levels of such transcripts declined as DNA methylation levels of the loci which encode them increase. A bioinformatic approach was devised to attempt to assay tRNA transcription in order to use standard publicly available small RNA-seq datasets. A customised MINTmap [337] reference was designed to include only fragments which unambiguously map to a single tRNA gene locus and which overlap the 5’ or 3’ end of the genomic tRNA sequence by at least one base with no mismatches. This reference is intended to capture pre-tRNAs prior to processing and CCA addition operating under the assumption that the levels of pre-tRNAs will be informative about the amount of transcription taking place at the tRNA loci (see Methods 4.4.7). This custom MINTmap reference build yielded 383 fragments mapping to 92 distinct tRNA loci in this data. The lack of coverage of age hypermethylating tRNAs by uniquely attributable RNA-seq reads prevented us from drawing any strong conclusions about the relationship between DNAm changes and changes in tRNA transcription.

Using the original MINTmap reference optimised to detect tRNA fragments derived from mature tRNAs there were 5384 unique fragments derived from as many as 417 tRNA loci. However, the mapping between fragments and loci in this reference is many to many, with each tRNA gene able to give rise to many fragments and each fragment attributable to at least 1 and usually many tRNA genes. The examination of these fragments was limited to those with a length of greater than or equal to 40nt to capture reads more likely to be derived from mature tRNAs rather than tRFs or tRNA halves (Figure 4.26). 48 tsRNAs with nominally significant expression changes (p < 0.05) were identified, 8 increased and 40 decreased in abundance with age. For example 5 of 6 fragments showing significant age-related expression changes derived from the Gln-CTG family of tRNAs are decreasing with age (Figure 4.25). This is suggestive that expression of some tRNA genes may decline with age but this possibility is in need of additional tRNA expression data before it can be asserted with confidence.

tRNA fragments derived from the Gln-CTG family of tRNAs, selected as tRNA-Gln-CTG-7-1 is one of the 16 study-wide significant age-hypermethylating tRNA genes. Pane titles contain the MINTbase Plates, unique identifiers of the tRNA fragments [329].

Figure 4.25: tRNA fragments derived from the Gln-CTG family of tRNAs, selected as tRNA-Gln-CTG-7-1 is one of the 16 study-wide significant age-hypermethylating tRNA genes. Pane titles contain the MINTbase Plates, unique identifiers of the tRNA fragments [329].

4.5.4.1 MINTmap reference Fragment distribution

In the original MINTmap reference (Figure 4.26b) there are peaks at around 18nt, 22nt and 32nt. This is consistent with the expected tRNA fragment size distributions with ‘tRNA halves’ at 30-33nt and other tRFs at 18nt and 22nt. In this custom reference (Figure 4.26a) whilst there is still a peak at ~18nt, with suggestions of peaks near 22nt and 32nt the tRNA fragment length distribution is somewhat different from that of the standard MINTmap reference. There are larger peaks at ~28 and ~40nt consistent with the longer fragments expected given that this reference aimed to target fragments derived from pre-tRNAs not tRFs derived from mature tRNAs.

Comparison of the fragment size distributions between our custom reference A) and the original the MINTmap reference B).

Figure 4.26: Comparison of the fragment size distributions between our custom reference A) and the original the MINTmap reference B).

4.6 Discussion

This work has identified a previously unknown enrichment for age-related epigenetic changes within the tRNA genes of the human genome. This observation was strongly directional with increasing DNA methylation with age [381]. The MeDIP-seq dataset employed brought advantages in exploring this undefined terrain of the tRNAome. Firstly, being genome-wide it provides much increased access, as these regions are poorly covered by current arrays. Secondly, being a fragment-based regional assessment of DNA methylation, the individual but highly similar small tRNA genes can be surrounded by unique sequence.

It was determined by genome-wide permutation that this strong hypermethylation signal was specific to the tRNAome, and not merely driven by the underlying CpG density of these loci. A targeted BiS-seq experiment validated the defined nature of the tRNA change in an independent dataset, with a pooled sample approach, which may also be useful for other ageing-related targeted DNA methylome evaluations. Additionally, support was gained for these results from limited DNA methylation array data.

Whilst the changes in DNA methylation observed here are relatively small (i.e. 3.7% between 4-year to 78-year pools in tRNA-iMet-CAT-1-4), this is consistent with the effect size seen in the majority of positions differentially methylated with age in many other studies [120,382], except for the noted extreme outliers, such as EVOLV2 [383]. Furthermore, effect size cut-offs have distinct limitations, they do not generalise well between studies and they have been observed to remove a large fraction of true age-DMPs [384].

Additional exploration of what was driving this age-related phenomenon and its possible biological implications was undertaken. This result was observed in DNA from peripheral blood, a source with a heterogeneous cell-type population [225]. Moreover, there are well known age-related proportional changes in peripheral blood cell composition [377]. The TwinsUK MeDIP-seq and 450k array DNA methylation data included measured haematological values. Therefore, major cell type effects such as a myeloid skew were adjusted for, and distinct tRNAs were still significant. Although, a caveat to this is that one cannot exclude changes in minor specific sub-cell fraction types. However, that these age-related effects were strong enough to be observed in both a regional MeDIP-seq assessment, DNA methylation arrays with minimal coverage of the tRNA genes, and a pooled sequencing approach, implies that they not extremely subtle. Age-related tRNA gene DNA methylation changes were observed in the limited subset of mouse tRNA genes covered in publicly available RRBS data (~13%) and were able to identify tRNAs exhibiting DNA hypermethylation with age in this set. This suggests that age-related tRNA gene hypermethylation may not be unique to humans, and may extend at least across mammals.

The high number of hypermethylating tRNAs prior to cell-type correction pointed to the possibility that the epigenetic state of this small tRNAome fraction of genome could capture cell-type. tRNA gene DNA methylation was found to be capable of separating myeloid from lymphoid lineages. There also was some suggestion of more fine-grained blood cell-type signatures in tRNA DNAm, such as the separation of CD19+ B cells from CD4/8+ T cells. Ageing is also known to lead to an increase in senescent cells (e.g. CD8+ CD28- cells). Whether these epigenetic changes in the tRNAome uniquely represent these cell-types will require technical advances to enable future single cell DNA methylome analysis to accurately assess these regions. If further supported, the epigenetic state of these loci may aid the taxonomy of cell-type definition.

This signal within the tRNA families was observed to occur at specific Isodecoders. After correcting for major cell types, two tRNA genes, tRNA-iMet-CAT-1-4 and tRNA-Ser-AGA-2-6, were identified which had the most consistent hypermethylation across 3 different assays. Isodecoders expand in number with organismal complexity and the high prevalence in mammals has been suggested to be due to their additional regulatory functionality [385,334]. They also have distinct translational efficiency [386], which can also have consequences in human disease [327]. Furthermore, there is great complexity to the fragmentation of tRNA [340], with physiological processes such as stress shown to induce fragment production [387]. These resultant tsRNAs can feedback on protein synthesis by regulating ribosome biogenesis [338] and others have diverse regulatory functions such as targeting transposable element transcripts [339]. They are also observed to circulate in the blood in a cell-free fashion, and fragment levels can be modulated by ageing and calorie restriction [346]. The isodecoder specific nature of our findings frame a possible hypothesis for regulatory change with age and future work will be required to unravel this potential.

There was an inconsistent result for tRNA-Ile-AAT-4-1, it is covered by a MeDIP-seq 500bp window which exhibited genome-wide significant hypermethylation, but also partially overlaps tRNA-Ser-AGA-2-6 (Figure 3B). Whilst the 450k array probe overlapping tRNA-Ile-AAT-4-1 (cg06382303) appears to replicate this hypermethylation, it is a borderline case for exclusion flagged by Zhou et al. [223] due to non-unique 30bp 3’ subsequence. In the targeted Bisulfite sequencing data, tRNA-Ile-AAT-4-1 exhibited a loss of methylation. Therefore, this may suggest that the hypermethylation signal observed at this locus in the MeDIP-seq data could have been ‘pulled up’ by the neighbouring tRNA-Ser-AGA-2-6 hypermethylation signal. Of the 16 study-wide significant tRNA genes, only these two have a shared significant window, furthermore, in the expanded set of 44 only tRNA-Thr-AGT-1-2 could be similarly affected.

Codon usage frequencies have been claimed to be mostly invariant in the transcriptomes of a wide range of tissues, as well as across developmental time [378], Although, others have found the majority of isodecoders are transcribed in different cell types [303].

However, differences in the codon usage of genes with highly tissue-specific expression patterns have been observed [388]. Transcriptome codon usage frequency and corresponding tRNA gene expression levels have been claimed to vary with several other factors; notably, the replicative state of cells, as well as across developmental time [389]. Experimental stress-related states have revealed changes with an over-representation of codons that are translated by rare tRNAs [323]. Differences in the codon usage of genes with highly tissue-specifically expressed genes have been observed [390]. It has however been argued that these differences are substantially explained by variation in GC content [391], and the extent to which codon usage plays a role in regulation of translation remains contested. Changes in the epigenetic state of specific tRNAs could be modulating transcription efficiency or even codon availability in the ageing cell. tRNA gene dosage is quite closely matched to amino acid usage frequency in the human exome.

The location of tRNA-Ser-AGA-2-6 and tRNA-Ile-AAT-4-1 immediately downstream of CTC1 and of tRNA-Ile-AAT-4-1 within the 3’UTR of an alternate isoform of CTC1, which undergoes nonsense mediated decay raises the possibility that the gene body epigenetic regulation of CTC1 may impact on the state of these tRNA genes. CTC1’s function in telomere maintenance [392], DNA replication licensing [393], and it’s role in a rare progeroid condition [394] indicate that it has ageing-relevant biology. The possible relationship between the regulation of CTC1 and that of the tRNA genes downstream of it warrants further study.

Several transcription factors acting via TFIIIB [317] have a negative (the tumour suppressors p53 [315] and Rb [316]) or positive (the proto-oncogene c-Myc) influence [317]. Regulatory sequence in the flanking or the internal regions of tRNA genes do not explain tRNA expression variation [389]. Whilst DNAm is able to repress the expression of tRNA genes [320] in a plasmid expression system, the broader chromatin environment also affects tRNA transcription. Due to the co-ordinated nature of epigenomic modifications, it may also be revealing to evaluate ageing-related histone modification in these tRNA loci.

tRNA sequences themselves are under strong structural (both secondary and tertiary) [385] as well as functional constraint, which leads to an order of magnitude reduction in variation compared the background genomic mutation rate [303]. However, polymorphic tRNA could be another potential caveat to this work. Although, there is no significant population variation in, for example, tRNA iMet sequences in 1,000 Genomes data. Indeed, there are only 11 new isodecoder sequences with high confidence (tRNAscan scores \(\ge50\)) at >1% population frequency [303]. Despite strong purifying selection maintaining very low variation in tRNA gene bodies, tRNA genes are subject to high levels of transcription-associated mutagenesis (TAM) leading to elevated mutation rates over evolutionary time in their immediate flanking sequences [395]. There is also some evidence for tRNA copy number variation at specific loci, although this remains under-characterised [396,397]. Another potential cause considered was whether age-related somatic copy number increases could be occurring in these loci. Population or somatic copy number expansions could lead to increased methylated reads in MeDIP-seq without any epigenetic state change. However, this would not be consistent with the targeted and array BiS conversion methodologies, where the proportion of methylated to unmethylated reads would still be constant.

It is worth noting the parallels with known cancer and ageing epigenetic changes, and that tRNAs are also dysregulated in cancer [319], with proposed utility as prognostic markers [318]. Furthermore, the early replicating state of tRNA loci, potentially associated with high expression [398], may make them prone to hypermethylate, as is observed in early replicating loci in both cancer [399] and senescent cells [400]. Interestingly, tRNA gene loci may also play a role in local as well as large scale genome organisation [Hamdani2019; [354]]. tRNA gene clusters act as insulators [353] and have extensive long-range chromatin interactions with other tRNA gene loci [354]. The coordinated transcription of tRNAs at subnuclear foci and the B-box sequence elements bound by TFIIIC and not polIII may represent an organising principle for 3D-chromatin by providing spatial constraints [355]. Therefore, these tRNA epigenetic changes could contribute to the structural changes that are also observed in ageing [401].

A predominantly unmethylated state across fetal (Figure 4.23) and adult tissues (Figure 4.22) was observed at tRNA gene loci, consistent with the high rate of transcription at many tRNA gene loci. We suspect that the tRNA genes largely remain unmethylated through development and that the moderate increases in DNAm that are observed with age at these loci are being driven by changes arising primarily in older individuals. Distinct biological changes have been observed recently in aged individuals [402,403]. This would also be consistent with the lack of significant differences in the tRNA loci detected between the neonate and the 26-year-old adult in the Heyn et al. [161] data. This low baseline DNA methylation of the tRNA genes may also explain why the observed age related changes are predominantly hypermethylation. Whether this is driven by mechanisms, such as aberrant DNA methylation targeting of the tRNA loci or specific sub-celltype effects with age, will require further experimental investigation.

The attempt to estimate tRNA transcription by identifying fragments which may be derived from pre-tRNA sequences has serious limitations. It assumes that pre-tRNA levels reflect the amount of tRNA transcription, however, this may not be the case as pre-tRNAs are rapidly processed to mature tRNAs. The requirement for a read length of at least 40nt only has only a limited ability to distinguish full length tRNAs from tRNA derived small-RNAs, and the short read length of the data prevented the use of a higher threshold. Furthermore the limited number of reads unambiguously mapping to a specific tRNA locus limits the utility of this approach for inferring the effects of epigenetic changes at a specific locus on the expression at that locus. tsRNA abundance has been associated with locus specific tRNA gene expression, in some cases independent of mature tRNA levels [331]. This has important implications for the interpretation of these results given the multi-copy nature of genes like tRNA-iMet-CAT-1-4, as even if expression levels of mature iMet tRNAs are unaffected by changes in one copy’s DNA methylation, these changes could still influence the levels of particular tsRNAs derived from specific tRNA loci. The implications of these changes in DNAm levels at tRNA genes for biological ageing warrant further exploration.

In conclusion, due to the unique challenges that make the tRNAome difficult to examine it has remained epigenetically under-characterised despite its critical importance for cell function. This work directly interrogates the epigenetic DNA methylation state of the functionally important tRNAome, across the age spectrum in a range of datasets as well as methodologies and identified an enrichment for age-related DNA hypermethylation in the human tRNA genes.