3 Epigenome Wide Association Studies for Bone Health Outcomes in Umbilical Cord Blood and Tissue

3.1 Abstract

Long term bone health and fracture risk is strongly influenced by the peak bone mass that is attained in early adulthood, this has its origins in development from late pregnancy through early childhood. The availability of vitamin D to facilitate the uptake of calcium and phosphate is essential to healthy bone development, in addition to the role of vitamin D as a hormone mediating this process. Greater mechanistic understanding of this process is desirable to permit the design of more effective interventions to reduce the substantial health burden of osteoporotic fractures in the elderly. The risk of osteoporosis increases with age, and may be mitigated by improvements in bone health. Maternal vitamin D levels during pregnancy have previously been shown to impact on the methylation status of specific genes associated with vitamin D signalling and metabolism, such as RXRA. Long-term bone health outcomes have also been associated with both maternal vitamin D during pregnancy and methylation of the CDKN2A gene. To attempt to identify other loci, the epigenetic regulation of which is pertinent to the impact of maternal vitamin D levels on long-term bone health, Epigenome-Wide Association Studies (EWAS) were performed with DNA methylation data from the Illumina EPIC & 450k methylation bead chip arrays. We provisionally identify two CpGs whose DNA methylation state is associated with bone mineral content at 6 years of age (\(p < 2.52\times 10^{-8}\), n = 402) and periosteal circumference at 6 years of age (\(p < 4.24\times 10^{-8}\), n = 141) respectively. These associations are in need of replication in an independent cohort but present a starting point for further investigations.

3.2 Introduction

In England and Wales a survey of general practice data indicates that 53% of women and 21% of men will experience a bone fracture in their remaining lifetime from the age of 50 years [238] (Figure 3.1). Hip fractures in particular are associated with a reduction in survival of an estimated 10-20% with most deaths occurring in the first 6 months, vertebral fractures though approximately an order of magnitude less common carry a similar mortality risk [238,239]. The national osteoporosis foundation estimated the societal cost of osteoporosis at 58 Billion US$ for 2018 in the USA alone [240] with serious negative implications for the quality of life of those with this condition. Thus interventions to increase bone health and reduce the rate of osteoporosis as well as developing a mechanistic understanding of the development of this condition are of considerable medical interest.

Overall Fracture Incidence Increases With Age Data from 5 million adults in the General Practice Research Database 1988-1998. A Incidence of all fractures at any site by sex and age. B Fractures by sex and age at select sites, pelvis, Femur/Hip & Vertebra confer the greatest increased mortality rates and Radius/Ulna has the greatest sexual dimorphism. (Adapted from van Staa et al. [238].)

Figure 3.1: Overall Fracture Incidence Increases With Age Data from 5 million adults in the General Practice Research Database 1988-1998. A Incidence of all fractures at any site by sex and age. B Fractures by sex and age at select sites, pelvis, Femur/Hip & Vertebra confer the greatest increased mortality rates and Radius/Ulna has the greatest sexual dimorphism. (Adapted from van Staa et al. [238].)

Modelling work indicates that the largest determinant of osteoporosis risk is peak bone mineral density (BMD) with a 10% change in peak BMD leading to a predicted 13 year delay in the development of osteoporosis [241]. Though the baseline BMD is a better predictor of fracture risk, the rate of loss also independently predicts risk [242244]. Thus factors altering the peak bone mass will be important for modulating osteoporosis risk. Bone mass increases from intrauterine development through to a peak in early adulthood (Figure 3.2).

Bone Mass Over The Lifecourse Bone mass increases from intrauterine development through to a peak in early adulthood. Intervention in early life to modulate growth trajectory and increase peak mass may provide a higher starting point and delay bone loss in later life. (Reproduced from [245].)

Figure 3.2: Bone Mass Over The Lifecourse Bone mass increases from intrauterine development through to a peak in early adulthood. Intervention in early life to modulate growth trajectory and increase peak mass may provide a higher starting point and delay bone loss in later life. (Reproduced from [245].)

Calcium, phosphate and vitamin D are needed for bone development [246] and maintenance. Vitamin D is needed for sufficient intestinal absorption of calcium [247] and phosphate [248] in addition to the hormonal role it plays in the regulation of bone development. Calcium and vitamin D intake are modifiable factors and thus constitute a reasonable starting point for interventions to improve bone health. This raises the question of how and when they can best be deployed to maximise their impact on bone health.

Birth weight predicts bone mass in adulthood [249]. A longitudinal study with long term follow up at Helsinki University Central Hospital linked low childhood growth rate with greater fracture risk in later life [250]. Data from the Southampton Women’s Survey (SWS)[251] suggest that intervention in the interuterine period and infancy may afford the greatest opportunity to impact on long term bone health. Rate of growth in late pregnancy (19-34 weeks) and early post natal growth (up to ~2 years) predict bone mass at 4 years of age [252], and estimates of hip strength at 6 years [253]. The proportion of individuals crossing between tertiles of the distribution of growth in length decreases with age, indicating that the variability and thus potential malleability of growth is greatest during the period from 11 weeks gestation to ~2 years.

Observational data indicated a relationship between maternal vitamin D status and bone outcomes in children [254,255]. Prospective studies showed the effects of maternal vitamin D status on bone growth in newborns [256] and at 20 years of age [257]. MAVIDOS (Maternal Vitamin D Osteoporosis Study) is a randomised, double-blind, placebo-controlled trial of oral vitamin D supplementation (1000 IU cholecalciferol/day or placebo) from 14 weeks gestation in women with initial circulating 25(OH)-vitamin D levels of 25-100 nmol/l [207,208]. The primary outcome of the study was whole body bone mineral content, assessed by Dual Energy X-ray absorptiometry (DXA) within 14 days of birth, and follow up DXA at 4 years of age. MAVIDOS did not find significant differences in neonatal bone outcomes overall but there were significant differences between intervention and placebo for births taking place during the winter months. Winter births saw a mean difference in BMC between treatment groups of 5.5 g [95% CI 1.8-9.1]; p=0.004) and for BMD a mean difference 0.01 gcm-2 [0.00-0.02]; p=0.04 [208].

\(1,25(OH)_2-Vitamin~D\) has been implicated in the control of Plasma membrane \(Ca^{2+}\)-ATPases (PMCAs). The expression of the PMCA3 gene in the placenta has been associated with umbilical cord calcium concentration and intrauterine accrual of bone mineral content [258]. This provides part of a candidate mechanism linking intrauterine vitamin D availability to bone outcomes, but much remains to be elucidated about how maternal vitamin D status may impact bone development. Work in rat models has demonstrated the ability of maternal nutritional exposure during pregnancy to influence the epigenetic state and expression of genes in their offspring, specifically a change in DNA methylation was identified [259261]. Consequently, the DNA methylation status of target genes of interest for vitamin D and bone development related pathways have been examined [262].

Retinoid-X-receptor-alpha (RXRA), which forms a heterodimer with the vitamin D receptor is essential for the nuclear action of \(1,25(OH)_2-Vitamin~D\). Harvey et al. found that DNA methylation at four of ten CpG sites in the RXRA gene were significantly (\(p\le0.05\)) lower in the umbilical cord of offspring from cholecalciferol supplemented mothers compared to placebo with a mean difference in methylation of -1.98% (n=447, 95% CI -3.65 to -0.32, p=0.01). One CpG site in the gene is related to estimated free 25(OH)D levels [263].

Methylation of cyclin-dependent kinase inhibitor 2A (CDKN2A) in umbilical cord tissue has been implicated in bone cell activity mediating skeletal development and homeostasis. The CDKN2A locus has quite complex biology encoding two cell cycle inhibitors \(p14^{ARF}\) and \(p16^{INK4a}\) as well as the long non-coding RNA ANRIL which inhibits \(p16^{INK4a}\). Targeted analysis of nine CpGs in a 300bp stretch of the ANRIL promoter region of CDKN2A was carried out by Curtis et al.. Methylation at several of these sites showed an inverse correlation with bone size, mineral content, and mineral density at age four years [264].

Epigenetic states are a product of both genetics and environment. They exist on a continuum from complete or obligatory genetic determination through facilitative variation influenced by genetics to the agnostic to genetic state. When epigenetic state is obligatory it is effectively determined by the genetic state. When epigenetics is facilitative it has a genetic bias for a particular state but which is subject to environmental influences including the intracellular environment [265]. At the opposite extreme to a genetically obligatory epigentic state, the epigenetic state may be essentially agnostic to the genetic state and be determined entirely by environmental factors. This picture is complicated by the observation that genetics can not merely influence epigenetic state but also the extent to which that state will vary depending on environmental factors [266]. Epigenome-wide association studies have been performed on DNA methylation data for many traits including age (Section 1.5.1), smoking status [267], exposure to atmospheric particulate matter [268], and obesity [269]. Some robust and reproducible DNA methylation changes have been identified with EWAS, particularly for traits such as age and smoking. Unlike genome-wide association studies (GWAS) associations found by EWAS can be due to reverse causation [224]. That is to say, a genetic variant associated with a disease is very unlikely to have been caused by the disease state whereas an epigenetic association can be the consequence of a disease state. Whilst this can complicate the dissection of disease aetiology it can also be a boon in unpicking the biology that follows on from a particular disease or environmental exposure by revealing what biological networks are affected.

The previous work identifying DNA methylation changes at the RXRA [263] and CDKN2A/ANRIL [264] loci associated with childhood bone outcomes lead to an interest in establishing if DNA methylation changes were occuring at other loci. EWAS using the Illumina 450k and EPIC array technologies were performed for maternal Vitamin D status and childhood bone outcomes in data from the MAVIDOS and SWS cohorts.

3.3 Primary Questions

Are there DNA methylation changes associated with maternal vitamin D levels?

Are there DNA methylation changes associated childhood bone health markers?

What the DNA methylation changes with vitamin D and Bone outcomes at RXRA and CDKN2A can be observed?

3.4 Methods

3.4.1 Outline of EWAS

Three sets of EWAS analyses were performed: Phase I of the MAVIDOS EPIC arrays and comparison with 450k array results. Phase II of the MAVIDOS EPIC analysis with additional samples. Analysis of the SWS cord blood EPIC array data.

  1. MAVIDOS phase I, cord tissue, results (EPIC n = 140, 450k n = 60), EWAS outcomes:
  • Intervention Vs. Placebo
  • Bone Mineral Content at Birth (g), measured by DXA
  • Maternal circulating 25(OH)-vitamin D levels (nmol/l) at 34 weeks
  • Change in Maternal Vitamin D form 11 to 34 weeks
  1. MAVIDOS phase II, cord tissue, results (EPIC n = 237), EWAS outcomes:
  • 4 year Total BMC (g), without heads* , measured by DXA
  1. SWS cord blood, EWAS outcomes:
  • 8 year Total BMC (kg), without heads*, adjusted for sex and age, measured by DXA (n = 408)
  • 6 year Total BMC (kg), without heads*, adjusted for sex and age, measured by DXA (n = 402)
  • 6 year Periosteal circumference of the Tibia at 38% from the distal end (mm), measured by Peripheral Quantitative Computed Tomography (PQCT) (n = 141)
  • 6 year Cortical density of the Tibia at 38% from the distal end (mg/cm3), measured by Peripheral PQCT (n = 141)

*‘without heads’ indicates that bone mass contributed by the head is not included in the total as the head is not scanned during the DXA imaging.

DNA was extracted from cord tissue for the MAVIDOS studies by Dr Nevena Krstic at the Institute of Developmental Sciences, University of Southampton. Phenotypic data and batch information was provided in spreadsheets by Dr Millie Parsons at the MRC Lifecourse Epidemiology Unit, University of Southampton. All array data was provided as raw IDAT files. Analysis of the Illumina EPIC and 450k methylation arrays was carried out in the R statistical programming language (v3.5.2) using the meffil package [270], which was chosen as it is capable of performing functional normalisation in a more memory efficient fashion than alternatives such as minfi.

3.4.2 Functional Normalisation

Functional normalisation is an approach to removing unwanted variation associated with ‘batch’ effects such as the date on which a sample was analysed or which slide a sample is on which was developed by Fortin et al. [219]. This noise in the data masks the signal associated with the underlying biological effect of interest.

Functional normalisation makes use of control probes on the arrays which are designed to capture only technical variation as surrogates for the sources of unwanted variation. The control probes are processed into 42 summary measures, and principal components analysis is performed on the control probe summary matrices for all samples. The top m principal components (PCs) are used as the surrogates for technical variation going forward. The number, m, of PCs from the control probe summary matrices used is informed by the amount of residual variation remaining after normalisation. Picking the number of PCs which correspond to the last steep drop in residual variation is the approach recommended by the implementers of meffil, and Fortin et al. [219] recommend an m of 2 as performing well across a variety of analyses. (See supplementary material from [219] for details of the control probe summary process.)

The process used by functional normalisation is a variant on quantile normalisation. Instead of forcing the empirical marginal distributions of the samples to be identical at each site across arrays. To produce the normalised values it constructs a quantile function which only removes variation arising from surrogates for batch variation. This approach is effective even when comparing samples with large global differences in methylation levels such as between normal and cancer samples, but cannot overcome high degrees of confounding [219].

3.4.3 Genetically Confounded and Multi-mapping Probes

Probes from different locations in the genome with similar sequences, especially following the reduction in sequence complexity associated with bisulfite conversion, can be cross-reactive on the arrays leading to erroneous signals and are thus commonly excluded from analysis. DNAm is often strongly influenced by genetic factors, the effect is especially pronounced when variants alter the sequence at CpG sites themselves as the site can then no longer be methylated. Thus, sites that have common genetic variants at the probe site are also excluded from the analysis, as well as some sites whose methylation is known to be under strong genetic influence by common genetic variation.

43,254 probes on the EPIC array have been identified as multi-mapping, DNA binding to the probes may be derived from other locations in the genome invalidating these probes as a measure of methylation at their intended loci [222]. 12,510 probes were identified as having genetic variants at the CpG locus they are intending to assay, this can produce misleading results as mutant bases can resemble the products of bisulfite conversion [222]. 1,812 probes where found t overlap regions exhibiting haplotype-specific methylation associated with common non-SNP genetic variants (CNVs, Indels, STRs) and regional in-phase clustering of CpG-SNPs [107]. Zhou et al. [223] provided a list of probes which they recommend ‘masking’ from the 450k array due to multi-mapping issues, genetic variants overlapping the CpG sites, and other factors which may render results from these probes problematic.

In order to identify any potential additional sources of genetic confounding in the phase I MAVIDOS analysis, Probes with methylation values which cluster into distinct groups were identified using the ‘gap hunting’ method developed by Andrews et al. [226]. Such distinct clusters of methylation can arise from genetic variants which influence methylation levels being present in homozygous and heterozygous forms in the study population, see Figure 3.15. As the sensitivity and specificity of ‘gap hunting’ is limited, it is the advice of the authors not to exclude probes flagged by gaphunter() prior to performing EWAS. It is instead advised to check if any of the results appear in this list after the fact and examine the possibility of a genetic effect if they are. In phase II of the MAVIDOS analysis and in the SWS analyses only probes with specific technical QC issues were excluded prior to normalisation and EWAS. Flagging of problematic probes occurred after EWAS on any significant results.

3.4.4 EWAS Models

3.4.4.1 Cell-Type Correction

Cell-type composition is a known confounder in epigenetic studies [271,272]. Observed variation DNA methylation can be cell-type intrinsic, where changes in DNA methylation are not driven by changes in the cell-type composition of the tissue; or cell-type extrinsic, where changes in DNA methylation are due to changes in the cell-type composition of the population sampled (Figure 3.3). An established approach for addressing this potential source of confounding is to add terms to the regression model which reflect the cell-type composition of each sample. Cell-type composition can be ascertained through three main approaches: Direct cell count data; Estimating cell counts using the experimental data and models fitted on DNA methylation data from reference panels of known cell-type composition [272]; or reference free approaches which make use of mathematical methods to identify sources of confounding variation such as cell-type heterogeneity. Whilst there are several reference samples available for cord blood [273276], none were available for cord tissue at the time of the initial analysis, and consequently a reference free technique was used in phase I of the MAVIDOS analysis. In the analysis of the SWS cord blood data the “andrews and bakulski cord blood” reference panel [275] was used in meffil for cell-type composition estimation. Models were fitted using Surrogate Variable Analysis (SVA) [277] and independent Surrogate Variable Analysis (iSVA) [278]. The focus of these results is on SVA as minimal differences between the two methods were observed and SVA was recommended based on comparisons of the performance of various cell-type heterogeneity correction methods [279,280].

Diagrammatic representation of DNAm change arising from extrinsic or intrinsic changes in DNA methylation. Extrinsic changes are due to shifts in cell-type composition. Intrinsic changes in DNA methylation occur without changes in the proportions of cell-types. These two modes of change are of course not mutually exclusive and both can be occurring.

Figure 3.3: Diagrammatic representation of DNAm change arising from extrinsic or intrinsic changes in DNA methylation. Extrinsic changes are due to shifts in cell-type composition. Intrinsic changes in DNA methylation occur without changes in the proportions of cell-types. These two modes of change are of course not mutually exclusive and both can be occurring.

A reference panel for cord tissue samples was recently published [281]. This cell-type reference had not yet been integrated into the meffil R package used to perform the EWAS analyses so I created a fork of meffil including this reference panel in the required format, this can be installed directly from github. The code I used to add this reference to the package data before building the R package can be found in this gist. Cell-type correction based on this reference was used in phase II of the MAVIDOS analysis.

3.4.4.2 Structure of models fitted for each EWAS

By default EWAS in meffil are run with four different models:

  1. No covariates, attempting to predict methylation with the variable of interest.
  2. All covariates*, attempting to predict methylation with the variable of interest plus a user-supplied list of covariates.
  3. Surrogate Variables + all covariates, attempting to predict methylation with user-supplied covariates and surrogate variables generated from SVA.
  4. Independent Surrogate Variables + all covariates, attempting to predict methylation with user-supplied covariates and independent surrogate variables generated from iSVA.

*All covariates varies depending on the specific model and is detailed in the individual sections detailing each EWAS. Running EWAS with multiple models permits the effects of adding the various covariates on the results of the analysis to be seen.

3.4.5 Concordance of EPIC and 450k EWAS Results

In order to ascertain if the results from the EWAS in the 450k and EPIC arrays were producing similar sets of probes in the top ranking positions when ordered by p-value, the concordance (% overlap) between the top k probes, where \(k = 1 .. 100,000\) was calculated. Only probes in common between the two arrays were used and k was incremented in steps of 50.

3.5 Results

3.5.1 Combined MAVIDOS Summary data

Summary statistics for the variables used in the MAVIDOS EWAS models are provided in tables 3.1 & 3.2.

Table 3.1: Summary statistics for variables used in the SWS EWAS. Blood cell count information is estimated based on the methylation data as described in the methods.
Description mean median min max
Woman’s Age at Time of Birth 30.67 31.40 -58.30 41.95
EP: Woman’s BMI 26.28 25.04 17.54 41.68
True if Yes in Late (dcrsmok) or early (acrsmok) pregnancy , NA if NA in both, FALSE if both No or No &amp; NA 0.08 0.00 0.00 1.00
Obst: Gestational age (weeks) 40.16 40.29 34.57 42.57
B cells 0.03 0.03 0.01 0.11
CD4+ T-cells 0.01 0.00 0.00 0.07
CD8+ T-cells 0.01 0.01 0.00 0.08
Granulocytes 0.06 0.04 0.00 0.51
Monocytes 0.01 0.00 0.00 0.17
Natural Killer Cells 0.02 0.02 0.00 0.11
Endothelial 0.13 0.13 0.05 0.20
Epithelial 0.17 0.16 0.02 0.42
Stromal 0.58 0.62 0.14 0.86
cam: 34 week VitD 54.54 55.03 10.70 119.00
cam: 11 week VitD 44.83 44.95 14.40 90.20
4yr DXA: Subtotal BMC (g) 356.26 352.52 237.30 455.94
4yr DXA: Subtotal BMD (g/cm sq) 0.47 0.47 0.38 0.57
Table 3.2: Summary statistics for variables used in the MAVIDOS EWAS divided by the sex of the child. Blood cell count information is estimated based on the methylation data as described in the methods. n=375, Female n=170, Male n=205.
Sex Description mean median min max
F Woman’s Age at Time of Birth 31.14 31.36 19.25 41.78
EP: Woman’s BMI 26.41 24.85 18.80 41.16
True if Yes in Late (dcrsmok) or early (acrsmok) pregnancy , NA if NA in both, FALSE if both No or No &amp; NA 0.11 0.00 0.00 1.00
Obst: Gestational age (weeks) 40.08 40.29 34.57 42.14
B cells 0.03 0.03 0.01 0.07
CD4+ T-cells 0.01 0.00 0.00 0.07
CD8+ T-cells 0.01 0.01 0.00 0.07
Granulocytes 0.07 0.04 0.00 0.51
Monocytes 0.02 0.00 0.00 0.17
Natural Killer Cells 0.02 0.02 0.00 0.06
Endothelial 0.13 0.13 0.07 0.19
Epithelial 0.17 0.17 0.03 0.37
Stromal 0.56 0.61 0.14 0.79
cam: 34 week VitD 53.87 53.58 11.50 114.50
cam: 11 week VitD 45.35 45.53 14.55 79.60
4yr DXA: Subtotal BMC (g) 355.30 353.05 247.53 439.21
4yr DXA: Subtotal BMD (g/cm sq) 0.46 0.46 0.38 0.54
M Woman’s Age at Time of Birth 30.28 31.41 -58.30 41.95
EP: Woman’s BMI 26.18 25.14 17.54 41.68
True if Yes in Late (dcrsmok) or early (acrsmok) pregnancy , NA if NA in both, FALSE if both No or No &amp; NA 0.05 0.00 0.00 1.00
Obst: Gestational age (weeks) 40.22 40.43 35.86 42.57
B cells 0.03 0.03 0.01 0.11
CD4+ T-cells 0.01 0.00 0.00 0.07
CD8+ T-cells 0.01 0.01 0.00 0.08
Granulocytes 0.05 0.04 0.00 0.39
Monocytes 0.01 0.00 0.00 0.16
Natural Killer Cells 0.02 0.02 0.00 0.11
Endothelial 0.13 0.13 0.05 0.20
Epithelial 0.16 0.16 0.02 0.42
Stromal 0.60 0.64 0.14 0.86
cam: 34 week VitD 55.07 56.88 10.70 119.00
cam: 11 week VitD 44.39 44.10 14.40 90.20
4yr DXA: Subtotal BMC (g) 357.01 352.52 237.30 455.94
4yr DXA: Subtotal BMD (g/cm sq) 0.48 0.48 0.39 0.57

3.5.2 MAVIDOS phase I

DNA methylation at none of the CpGs were significantly associated with any of the four variables of interest for each EWAS performed in either the EPIC or 450k datasets. The concordance between the probes with the top-ranked p-values in common between the EPIC and 450k data was at the level of chance.

3.5.2.1 Whole array QC

3.5.2.1.1 EPIC arrays

The predicted sex (performed with meffil) of the samples generated using sex chromosome probe intensities was checked against that in the sample annotation and two mismatches were found. These were MAVIDOS IDs 206 and 63 and the associated arrays were excluded from further analysis, (Figure 3.4).

Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the EPIC array data. Mismatches between the predicted sex and that asserted in the sample annotation metadata are shown in red. Two predicted sex values differ from their annotations. Plot generated by meffil QC report.

Figure 3.4: Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the EPIC array data. Mismatches between the predicted sex and that asserted in the sample annotation metadata are shown in red. Two predicted sex values differ from their annotations. Plot generated by meffil QC report.

The dataset also contained four samples for which there were two technical replicates, only the first replicate from each was used (144 arrays run for 140 individuals). Array 201516310023 (MAVIDOS ID 95) was excluded as its median methylated signal was more than \(3\sigma\) from the expected value, (Figure 3.5). No samples were excluded for having a higher than expected proportion of undetected probes (proportion of probes with detection p-value > 0.01 is > 0.1) (Figure 3.6). No samples were excluded for having a high proportion of probes with low bead counts (proportion of probes with bead number < 3 is > 0.1), (Figure 3.7). In total 3 of the EPIC arrays were excluded from the analysis for failing quality control leaving an n=137.

Median methylated signal vs unmethylated signal per sample for the EPIC array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range are indicated in the legend. Plot generated by meffil QC report.

Figure 3.5: Median methylated signal vs unmethylated signal per sample for the EPIC array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range are indicated in the legend. Plot generated by meffil QC report.

Proportion of probes with detection p-values >0.01 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.6: Proportion of probes with detection p-values >0.01 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Proportion of probes with a bead count of < 3 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.7: Proportion of probes with a bead count of < 3 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

3.5.2.1.2 450k Arrays

There were no mismatches between predicted and annotated sex (Figure 3.8). There were not any samples with outliers in their methylated / unmethylated probe proportions (Figure 3.9). No samples were excluded for having a higher than expected proportion of undetected probes (proportion of probes with detection p-value > 0.01 is > 0.1), (Figure 3.10). No samples were excluded for having a high proportion of probes with low bead counts (proportion of probes with bead number < 3 is > 0.1), (Figure 3.11).

Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the 450k array data. No predicted sex values differ from their annotations. Plot generated by meffil QC report.

Figure 3.8: Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the 450k array data. No predicted sex values differ from their annotations. Plot generated by meffil QC report.

Median methylated signal vs unmethylated signal per sample for the 450k array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range would be indicated in the legend. Plot generated by meffil QC report.

Figure 3.9: Median methylated signal vs unmethylated signal per sample for the 450k array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range would be indicated in the legend. Plot generated by meffil QC report.

Proportion of probes with detection p-values >0.01 by sample for the 450k array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.10: Proportion of probes with detection p-values >0.01 by sample for the 450k array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Proportion of probes with a bead count of < 3 by sample for the 450k array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.11: Proportion of probes with a bead count of < 3 by sample for the 450k array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

3.5.2.2 Probe QC

3.5.2.2.1 Probe QC - EPIC Arrays

There were no outliers within the control probes Figure 3.12. 1,626 probes were excluded for having high background signal in a high proportion of samples (proportion of samples with detection p-value > 0.01 is > 0.1), (Figure 3.13). 162 probes were excluded for having low bead count in a high proportion of samples (proportion of samples with bead number < 3 is > 0.1), (Figure 3.14). Probes with poor technical quality were excluded from the analysis prior to functional normalisation.

Control probe signal by sample for each summary group for the EPIC data. Outliers would be circled in black. Plot generated by meffil QC report.

Figure 3.12: Control probe signal by sample for each summary group for the EPIC data. Outliers would be circled in black. Plot generated by meffil QC report.

Undetectable probes across samples for EPIC data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.13: Undetectable probes across samples for EPIC data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Low bead count probes across samples for EPIC data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.14: Low bead count probes across samples for EPIC data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Problematic probes identified by Pidsley et al. [222] and those overlapping the regions identified by Bell et al. [107] were excluded from subsequent analysis after functional normalisation. This, including the poor quality probes, is a total of 57,396 unique probes excluded from the analysis (~6.62% of the total number of probes). Gap hunter identified a further 77,398 probes (8.9% of all probes) which might be subject to genetic confounding (Figure 3.15).

An example of the DNAm distribution for a result from the gaphunter() function. This is an example chosen to best exemplify the sort of result which is strongly suggestive of a genetic variant with an impact on methylation status acting on this site. It is unrepresentative of typical results from gaphunter() in that the groups have a relatively even membership, many results have a small number of individuals in one or more groups making it hard to distinguish methylation outliers caused by rarer genetic variants from those with other causes.

Figure 3.15: An example of the DNAm distribution for a result from the gaphunter() function. This is an example chosen to best exemplify the sort of result which is strongly suggestive of a genetic variant with an impact on methylation status acting on this site. It is unrepresentative of typical results from gaphunter() in that the groups have a relatively even membership, many results have a small number of individuals in one or more groups making it hard to distinguish methylation outliers caused by rarer genetic variants from those with other causes.

3.5.2.2.2 Probe QC - 450k Arrays

509 probes were excluded for having high background signal in a high proportion of samples (proportion of samples with detection p-value > 0.01 is > 0.1), (Figure 3.17). 1037 probes were excluded for having low bead count in a high proportion of samples (proportion of samples with bead number < 3 is > 0.1) (Figure 3.18). There was one sample (MAVIDOS ID 2183) with an outlier within the control probes, a dinitrophenyl labelled staining control probe, thus it was not excluded as only outliers in dye bias and bisulfite conversion control probes were deemed sufficient grounds for excluding a sample (Figure 3.16, Detailed contol probe desciption [221] p222).

Control probe signal by sample for each summary group for the 450k data. Outliers would be circled in black. Plot generated by meffil QC report.

Figure 3.16: Control probe signal by sample for each summary group for the 450k data. Outliers would be circled in black. Plot generated by meffil QC report.

Undetectable probes across samples for 450k data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.17: Undetectable probes across samples for 450k data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Low bead count probes across samples for 450k data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.18: Low bead count probes across samples for 450k data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

All probes on the ‘general mask’ list from Zhou et al. [223] were excluded from the analysis following functional normalisation, leaving a total of 418,632 probes for subsequent analysis.

3.5.2.3 Functional Normalisation

An m of 6 was chosen for the EPIC arrays as this value produced the last steep drop in residual variation, see Figure 3.19. An m of 6 was chosen for the 450k arrays as this value produced the last steep drop in residual variation, see Figure 3.20.

Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the EPIC array samples (n=137), for M = methylated and U = unmethylated probes.

Figure 3.19: Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the EPIC array samples (n=137), for M = methylated and U = unmethylated probes.

Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the 450k array samples (n=60), for M = methylated and U = unmethylated probes.

Figure 3.20: Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the 450k array samples (n=60), for M = methylated and U = unmethylated probes.

3.5.2.4 EWASs

All EWAS performed below were also performed in exactly the same fashion for the 60 450k arrays, none of these results were significant and they are not included here.

3.5.2.4.1 Neonatal Bone Mineral Content

Figure 3.21 illustrates the distribution of Neonatal Bone Mineral Content, the outcome on which this EWAS was performed.

Distribution of Neonatal Bone Mineral Content (g) for individuals in the EWAS.

Figure 3.21: Distribution of Neonatal Bone Mineral Content (g) for individuals in the EWAS.

No CpGs fell below the Bonferroni corrected significance threshold for an association between DNA methylation at that locus and neonatal bone mineral content, Figure 3.22. Sex and sample age at DXA were included as covariates in the ‘all’ model. SVA generated 5 significant surrogate variables which were additionally used in the SVA model.

Results of EWAS for neonatal bone mineral content with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

Figure 3.22: Results of EWAS for neonatal bone mineral content with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

3.5.2.4.2 Intervention / Placebo

No CpGs fell below the Bonferroni corrected significance threshold for an association between DNA methylation at that locus and Intervention/placebo group status, Figure 3.23. Sex and sample age at DXA were included as covariates in the ‘all’ model. SVA generated 5 significant surrogate variables which were additionally used in the SVA model.

Results of EWAS for intervention/placebo group status with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

Figure 3.23: Results of EWAS for intervention/placebo group status with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

3.5.2.4.3 Maternal Vitamin D (34wks)

Maternal Vitamin D levels remain substantially overlapping between intervention and placebo groups at 34wks, see Figure 3.24. Thus maternal vitamin D at 34wks may prove a more useful variable to model than intervention/placebo status.

Maternal circulating 25(OH)-vitamin D levels (nmol/l) at 11 and 34wks gestation, supplementation with 1000 IU/day cholecalciferol began at week 14. Each participant is shown at both time points linked by a line to indicate the direction of change. The violin plots indicate the density of the distribution of vitamin D values at each time point with the \(25^{th}\), \(50^{th}\), \(75^{th}\) quantiles indicated with horizontal black lines. The colour indicates Intervention (Red) / Placebo (Blue) group

Figure 3.24: Maternal circulating 25(OH)-vitamin D levels (nmol/l) at 11 and 34wks gestation, supplementation with 1000 IU/day cholecalciferol began at week 14. Each participant is shown at both time points linked by a line to indicate the direction of change. The violin plots indicate the density of the distribution of vitamin D values at each time point with the \(25^{th}\), \(50^{th}\), \(75^{th}\) quantiles indicated with horizontal black lines. The colour indicates Intervention (Red) / Placebo (Blue) group

No CpGs fell below the Bonferroni corrected significance threshold for an association between DNAm at that locus and maternal vitamin D levels at 34wks gestation, (Figure 3.25). Sex and sample age at DXA were included as covariates in the ‘all’ model. SVA generated 5 significant surrogate variables which were additionally used in the SVA model.

Results of EWAS for maternal circulating 25(OH)-vitamin D levels (nmol/l) levels at 34wks gestation with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGss are represented by the largest and least transleucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

Figure 3.25: Results of EWAS for maternal circulating 25(OH)-vitamin D levels (nmol/l) levels at 34wks gestation with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGss are represented by the largest and least transleucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

3.5.2.4.4 Change in Maternal Vitamin D

Figure 3.26 illustrates the change in maternal vitamin D from 11 to 34 weeks gestation. No CpGs fell below the Bonferroni corrected significance threshold for an association between DNAm at that locus and change in maternal vitamin D from 11 to 34wks gestation, Figure 3.27. SVA generated 5 significant surrogate variables which were additionally used in the SVA model.

Distribution of the changes in maternal circulating 25(OH)-vitamin D levels (nmol/l) levels from 11 to 34 weeks gestation.

Figure 3.26: Distribution of the changes in maternal circulating 25(OH)-vitamin D levels (nmol/l) levels from 11 to 34 weeks gestation.

Results of EWAS for change in maternal circulating 25(OH)-vitamin D levels (nmol/l) levels from 11 to 34wks gestation with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

Figure 3.27: Results of EWAS for change in maternal circulating 25(OH)-vitamin D levels (nmol/l) levels from 11 to 34wks gestation with SVA model. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

3.5.2.5 Concordance of EPIC and 450k EWAS results

Concordance between the rankings of CpGss would suggest that the EWAS may be capturing a ‘real’ signal that is simply below the significance threshold with the statistical sensitivity/power that is available in this dataset. The concordance between the EPIC and 450k datasets (Figures 3.28 & 3.29) appears to be at roughly the level expected by chance. This does not lend support to the possibility that there are associations between the variables of interest and DNAm that are beneath the current sensitivity of the study, it does not, however, rule this out. In the absence of any CpGs above the significance threshold and with poor concordance between the 450k and EPIC array p-value rankings no further analyses such as gene set enrichment and differentially methylated region (DMR) calling have been carried out at this time. Use of the two similar cell-free cell-type correction methods SVA and iSVA for the same dataset produce highly concordant results so are used as an example of how a highly concordant result would appear in the plots of these results.

Concordance between the top 100,000 probes in common between the EWASs run on the EPIC (n=137) and 450k (n=60) data sets. bmc = bone mineral content, cc = Intervention / Placebo, difVD = Change in Vitamin D from 11 to 34wks, ExHighConc = Example of High Concordance generated using SVA vs iSVA results for the 450k intervention/placebo EWAS. Dotted line denotes concordance expected by chance (intersects 50% at 387,511, the number of shared probes).

Figure 3.28: Concordance between the top 100,000 probes in common between the EWASs run on the EPIC (n=137) and 450k (n=60) data sets. bmc = bone mineral content, cc = Intervention / Placebo, difVD = Change in Vitamin D from 11 to 34wks, ExHighConc = Example of High Concordance generated using SVA vs iSVA results for the 450k intervention/placebo EWAS. Dotted line denotes concordance expected by chance (intersects 50% at 387,511, the number of shared probes).

Concordance between the top 10,000 probes in common between the EWASs run on the EPIC (n=137) and 405k (n=60) data sets. bmc = bone mineral content, cc = Intervention / Placebo, difVD = Chance in Vitamin D from 11 to 34wks, ExHighConc = Example of High Concordance generated using SVA vs iSVA results for the 450k intervention/placebo EWAS. Dotted line denotes concordance expected by chance (intersects 50% at 387,511, the number of shared probes).

Figure 3.29: Concordance between the top 10,000 probes in common between the EWASs run on the EPIC (n=137) and 405k (n=60) data sets. bmc = bone mineral content, cc = Intervention / Placebo, difVD = Chance in Vitamin D from 11 to 34wks, ExHighConc = Example of High Concordance generated using SVA vs iSVA results for the 450k intervention/placebo EWAS. Dotted line denotes concordance expected by chance (intersects 50% at 387,511, the number of shared probes).

3.5.3 MAVIDOS phase II

DNA methylation at none of the CpGs was significantly associated with bone mineral content at 4 years.

3.5.3.1 Whole Array QC

The predicted sex of the samples generated using sex chromosome probe intensities was checked against that in the sample annotation and 5 mismatches were found (Figure 3.30). Two samples did not have sex information, 2 had predicted sex discordant with their annotated sex and 1 was ambiguous these were excluded from further analysis.

Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the EPIC array data. Mismatches between the predicted sex and that asserted in the sample annotation metadata are shown in red. Two predicted sex values differ from their annotations. Plot generated by meffil QC report.

Figure 3.30: Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the EPIC array data. Mismatches between the predicted sex and that asserted in the sample annotation metadata are shown in red. Two predicted sex values differ from their annotations. Plot generated by meffil QC report.

Arrays: 201516310023 (mavid: 1490), 201516320022 (mavid: 1672), 201530430013 (mavvid: 4090), 201530430015 (mavid: 1903), & 202410280028 (mavid: 2078) were excluded as their median methylated signal was more than \(3\sigma\) from the expected value, (Figure 3.31). No samples were excluded for having a higher than expected proportion of undetected probes (proportion of probes with detection p-value > 0.01 is > 0.1) (Figure 3.32). No samples were excluded for having a high proportion of probes with low bead counts (proportion of probes with bead number < 3 is > 0.1), (Figure 3.7).

Median methylated signal vs unmethylated signal per sample for the EPIC array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range are indicated in the legend. Plot generated by meffil QC report.

Figure 3.31: Median methylated signal vs unmethylated signal per sample for the EPIC array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range are indicated in the legend. Plot generated by meffil QC report.

Proportion of probes with detection p-values >0.01 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.32: Proportion of probes with detection p-values >0.01 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Proportion of probes with a bead count of < 3 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.33: Proportion of probes with a bead count of < 3 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

3.5.3.2 Probe QC

There was one outlier within the control probes, in a non-critical specificity control probe for detecting non-specific methylation detection over an unmethylated background (Figure 3.12). 1,317 probes were excluded for having high background signal in a high proportion of samples (proportion of samples with detection p-value > 0.01 is > 0.1), (Figure 3.13). 220 probes were excluded for having low bead count in a high proportion of samples (proportion of samples with bead number < 3 is > 0.1), (Figure 3.14). Probes with poor technical quality were excluded from the analysis prior to functional normalisation.

Control probe signal by sample for each summary group for the EPIC data. Outliers would be circled in black. Plot generated by meffil QC report.

Figure 3.34: Control probe signal by sample for each summary group for the EPIC data. Outliers would be circled in black. Plot generated by meffil QC report.

Undetectable probes across samples for EPIC data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.35: Undetectable probes across samples for EPIC data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Low bead count probes across samples for EPIC data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.36: Low bead count probes across samples for EPIC data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

3.5.3.3 Functional Normalisation

An m of 6 was chosen as this value produced the last steep drop in residual variation, see Figure 3.37.

Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the EPIC array samples (n=237), for M = methylated and U = unmethylated probes.

Figure 3.37: Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the EPIC array samples (n=237), for M = methylated and U = unmethylated probes.

3.5.3.4 EWAS

3.5.3.4.1 Bone Mineral Content at 4 years

The model for this EWAS attempted to predict whole body (minus head) bone mineral content (g) (Figure 3.38) correcting for: Cell type composition (B-cells, CD4+T, CD8+T, Granulocytes, Monocytes, Natural Killer cells, Endothelial cells, Epithelial cells, Stromal cells), Mother’s age at birth, Sex, Mother’s BMI at 11 weeks gestation, whether the mother smoked during pregnancy, & Gestational Age. The EWAS was performed on 237 individuals. surrogate variable analysis identified 19 significant surrogate variables which were included in the model. No CpGs fell below the Bonferroni corrected significance threshold for an association between DNAm at that locus and bone mineral content, in any of the models including SVA (Figure 3.39).

Distribution of whole body (minus head) bone mineral content in grams at 4 years of age.

Figure 3.38: Distribution of whole body (minus head) bone mineral content in grams at 4 years of age.

Results of EWAS for whole body minus head bone mineral content (g) with SVA model (n = 237). Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

Figure 3.39: Results of EWAS for whole body minus head bone mineral content (g) with SVA model (n = 237). Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. Size and transparency of points increases with \(-log_{10}(p-value)\) such that the most significant CpGs are represented by the largest and least translucent points. x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(6.18\times10^{-8}~(0.05\div808,585)\).

3.5.3.5 RXRA & CDKN2A

None of the probes associated with the RXRA and CDKN2A loci exhibited genome wide significant changes in DNA methylation however nominally significant (p < 0.05) changes are observed at some of the probes within these genes in the MAVIDOS phase II analysis dataset. Table 3.3 lists these model results for maternal vitamin D levels and table 3.4 for bone mineral content and density. Appendix 6.13 provides the complete list of these model results not just the nominally significant results listed here.

Table 3.3: Results for modelling Maternal Vitamin D levels with umbilical cord tissue DNA methylation levels and covariates at probes associated with the RXRA and CDKN2A genes. vit D: maternal circulating vitamin D nmol/l
Timepoint Model type Gene position Probe Slope p-value
11 weeks Corrected CDKN2A Chr9: 21,968,233- 21,968,233 cg12840719 230.3170 0.0066
Chr9: 21,986,062- 21,986,062 cg00550721 -63.8120 0.0291
Chr9: 21,991,666- 21,991,666 cg13109095 -122.1244 0.0358
Chr9: 21,975,898- 21,975,898 cg10458809 27.5771 0.0365
Chr9: 21,971,256- 21,971,256 cg08686553 96.1148 0.0378
Uncorrected Chr9: 21,968,233- 21,968,233 cg12840719 180.7349 0.0147
Chr9: 21,986,062- 21,986,062 cg00550721 -50.0384 0.0411
Corrected RXRA Chr9:137,329,237-137,329,237 cg14344989 -92.4988 0.0005
Chr9:137,250,935-137,250,935 cg02059519 -106.0882 0.0007
Chr9:137,271,457-137,271,457 cg13312109 -74.2411 0.0007
Chr9:137,263,644-137,263,644 cg05829961 -106.9248 0.0019
Chr9:137,283,224-137,283,224 cg00622905 -108.5121 0.0021
Chr9:137,260,147-137,260,147 cg04900958 -99.1450 0.0033
Chr9:137,268,524-137,268,524 cg10122776 -141.6744 0.0035
Chr9:137,266,722-137,266,722 cg06831947 -109.8622 0.0036
Chr9:137,279,180-137,279,180 cg14051662 -82.8152 0.0040
Chr9:137,262,485-137,262,485 cg25619159 -90.9658 0.0043
Chr9:137,329,341-137,329,341 cg24164254 -88.6921 0.0068
Chr9:137,282,509-137,282,509 cg00735574 -95.1502 0.0105
Chr9:137,301,743-137,301,743 cg08868540 -130.8756 0.0105
Chr9:137,302,231-137,302,231 cg13413384 -92.9525 0.0107
Chr9:137,252,129-137,252,129 cg14236758 -89.7349 0.0114
Chr9:137,326,382-137,326,382 cg25300535 27.8782 0.0142
Chr9:137,271,104-137,271,104 cg01894436 -96.1754 0.0225
Chr9:137,272,766-137,272,766 cg13847322 -73.9350 0.0230
Chr9:137,299,685-137,299,685 cg00545196 -63.1364 0.0282
Chr9:137,268,074-137,268,074 cg14121282 -65.2194 0.0323
Chr9:137,257,228-137,257,228 cg10271868 -62.4176 0.0361
Chr9:137,298,350-137,298,350 cg01063003 -71.8198 0.0376
Chr9:137,258,920-137,258,920 cg13786567 -63.6464 0.0397
Chr9:137,265,865-137,265,865 cg14484045 -77.6335 0.0440
Chr9:137,270,186-137,270,186 cg13941235 -44.6552 0.0466
Chr9:137,225,311-137,225,311 cg13689699 -49.7538 0.0487
Uncorrected Chr9:137,266,722-137,266,722 cg06831947 -93.4263 0.0020
Chr9:137,250,935-137,250,935 cg02059519 -79.0829 0.0022
Chr9:137,271,457-137,271,457 cg13312109 -50.1030 0.0066
Chr9:137,268,524-137,268,524 cg10122776 -108.6295 0.0122
Chr9:137,262,485-137,262,485 cg25619159 -71.3247 0.0135
Chr9:137,279,180-137,279,180 cg14051662 -62.6430 0.0146
Chr9:137,260,147-137,260,147 cg04900958 -63.4042 0.0187
Chr9:137,326,382-137,326,382 cg25300535 24.7823 0.0230
Chr9:137,282,509-137,282,509 cg00735574 -56.3782 0.0273
Chr9:137,268,074-137,268,074 cg14121282 -43.7701 0.0299
Chr9:137,221,918-137,221,918 cg00639224 32.4576 0.0406
Chr9:137,272,766-137,272,766 cg13847322 -53.5243 0.0433
34 weeks Corrected CDKN2A Chr9: 21,975,622- 21,975,622 cg25339269 -472.4791 0.0081
RXRA Chr9:137,217,473-137,217,473 cg14204281 -258.5922 0.0259
Chr9:137,301,635-137,301,635 cg18566083 114.9198 0.0483
Chr9:137,215,364-137,215,364 cg04329455 99.9363 0.0489
Uncorrected Chr9:137,299,285-137,299,285 cg13865488 59.8832 0.0427
a Corrected: vit D ~ beta + wageb + Sex + awbmi + smokpreg + egest + Bcell + CD4T + CD8T + Gran + Mono + NK + Endothelial + Epithelial + Stromal, Uncorrected: vit D ~ beta, beta: methylation value
Table 3.4: Results for modelling bone mineral content and density at 4 years of age with umbilical cord tissue DNA methylation levels and covariates at probes associated with the RXRA and CDKN2A genes. Corrected: BMC/D ~ beta + wageb + Sex + awbmi + smokpreg + egest + Bcell + CD4T + CD8T + Gran + Mono + NK + Endothelial + Epithelial + Stromal, Uncorrected: BMC/D ~ beta, beta: methylation value, BMC/D Bone Mineral Content (g)/Density(gcm-2) at 4 years (DXA)`
Metric Model type Gene position Probe Slope p-value
BMC Corrected CDKN2A Chr9: 21,991,666- 21,991,666 cg13109095 -353.4675 0.0270
Chr9: 21,975,243- 21,975,243 cg16468520 1968.4218 0.0347
Chr9: 21,975,023- 21,975,023 cg26269171 906.1528 0.0468
Uncorrected Chr9: 21,975,243- 21,975,243 cg16468520 2174.9837 0.0117
Chr9: 21,975,495- 21,975,495 cg06955353 1041.1349 0.0239
Chr9: 21,975,023- 21,975,023 cg26269171 979.8917 0.0268
Chr9: 21,975,898- 21,975,898 cg10458809 72.1257 0.0339
Chr9: 21,975,305- 21,975,305 cg02008397 939.3506 0.0416
Chr9: 21,974,704- 21,974,704 cg13601799 388.9265 0.0462
Corrected RXRA Chr9:137,232,372-137,232,372 cg14462321 -232.0554 0.0130
Chr9:137,210,392-137,210,392 cg03416552 202.2096 0.0197
Chr9:137,299,436-137,299,436 cg14654324 136.6815 0.0383
Uncorrected Chr9:137,232,372-137,232,372 cg14462321 -206.8839 0.0050
Chr9:137,299,436-137,299,436 cg14654324 137.8739 0.0319
Chr9:137,329,237-137,329,237 cg14344989 -88.5216 0.0379
Chr9:137,291,076-137,291,076 cg02766323 -360.6465 0.0439
BMD Corrected CDKN2A Chr9: 21,975,243- 21,975,243 cg16468520 1.5214 0.0355
Uncorrected Chr9: 21,975,243- 21,975,243 cg16468520 1.5416 0.0259
Chr9: 21,974,704- 21,974,704 cg13601799 0.3137 0.0447
Corrected RXRA Chr9:137,232,372-137,232,372 cg14462321 -0.1705 0.0188
Chr9:137,268,524-137,268,524 cg10122776 -0.2328 0.0252
Chr9:137,299,436-137,299,436 cg14654324 0.1098 0.0319
Chr9:137,302,231-137,302,231 cg13413384 -0.1657 0.0335
Chr9:137,264,876-137,264,876 cg18483374 0.1447 0.0351
Chr9:137,210,289-137,210,289 cg01557614 0.1434 0.0457
Uncorrected Chr9:137,232,372-137,232,372 cg14462321 -0.1758 0.0029
Chr9:137,264,876-137,264,876 cg18483374 0.1344 0.0085
Chr9:137,268,524-137,268,524 cg10122776 -0.2471 0.0097
Chr9:137,299,436-137,299,436 cg14654324 0.1126 0.0287
Chr9:137,326,227-137,326,227 cg14265066 -0.2243 0.0327
Chr9:137,228,542-137,228,542 cg04875697 -0.2058 0.0438

The result that 25 of 27 probes with nominally significant changes in DNA methylation are for decreasing methylation with increased vitamin D is consistent with the lower methylation in the offspring of the supplemented mothers in the original publication on the RXRA changes [263], Figure 3.40. However at 34 weeks only 4 probes showed nominally significant changes and 3 of the 4 were increases in methylation with vitamin D, Figure 3.41.

RXRA gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with maternal vitamin D at 11 weeks gestation

Figure 3.40: RXRA gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with maternal vitamin D at 11 weeks gestation

RXRA gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with maternal vitamin D at 34 weeks gestation

Figure 3.41: RXRA gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with maternal vitamin D at 34 weeks gestation

The inverse relationship between DNA methylation and bone mineral content / density in the ANRIL promoter region of CDKN2A is not seen in the majority of CDKN2A probes which show nominally significant changes with bone mineral content / Density in this analysis. Of the 7 probes associated with BMC 1 shows a decrease (Figure 3.42) and both probes associated with BMD show an increase (Figure 3.43).

CDKN2A gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with bone mineral density at 4 years

Figure 3.42: CDKN2A gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with bone mineral density at 4 years

CDKN2A gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with bone mineral density at 4 years

Figure 3.43: CDKN2A gene associated probes showing nominally significant (p<0.05) changes in DNA methylation with bone mineral density at 4 years

These results do not correspond to the same CpGs examined in the previous studies, either in the case of RXRA [263] or CDKN2A [264]. Thus these results cannot directly support or contradict these findings for the same specific CpG sites, making their interpretation somewhat challenging. In the case of RXRA these results appear somewhat supportive of the previous observation that there is lower methylation at higher vitamin D levels at least when considering only the 11 weeks gestation data. Whereas the CDKN2A results do not support the previous finding of an inverse relationship between DNA methylation and bone mineral content / density at the loci which they target.

3.5.4 Southampton Women’s Survey (SWS)

Summary statistics for the variables used in the EWAS models are provided in tables 3.5 & 3.6.

Table 3.5: Summary statistics for variables used in the SWS EWAS. Blood cell count information is estimated based on the methylation data as described in the methods. (Ob.) obstetric exam, (init.) initial survey.
Description mean median min max
Ob: Woman’s age at child’s birth 31.67 32.08 23.99 38.12
Init: Woman’s body mass index 24.82 24.29 17.40 40.17
Smoking in pregnancy 0.14 0.00 0.00 1.00
Ob: Gestational age - from LMP data, or U/S scan or Elaine 39.84 40.19 33.14 42.73
B cells 0.09 0.09 0.01 0.21
CD4+ T-cells 0.17 0.16 0.05 0.34
CD8+ T-cells 0.13 0.13 0.05 0.31
Gran 0.45 0.43 0.19 0.66
Monocytes 0.04 0.04 0.00 0.11
Natural Killer Cells 0.00 0.00 0.00 0.07
Red Blood Cells 0.14 0.11 0.04 0.39
8 yr DXA: Total BMC (kg), without heads,adjusted for sex and age 0.72 0.70 0.46 1.08
6 yr DXA: Total BMC (kg), without heads,adjusted for sex and age 0.55 0.55 0.39 0.73
6yr PQCT: 38% - Periosteal circumference (circular ring model) (mm) 52.35 51.86 44.10 64.35
6yr PQCT: 38% - Cortical density (mg/cm3) 1036.39 1040.60 949.90 1111.10
Table 3.6: Summary statistics for variables used in the SWS EWAS divided by the sex of the child. Blood cell count information is estimated based on the methylation data as described in the methods. n = 464, Female n=226, Male n=238. (Ob.) obstetric exam, (init.) initial survey.
Sex Description mean median min max
Female Ob: Woman’s age at child’s birth 31.97 32.23 23.99 38.12
Init: Woman’s body mass index 25.47 24.67 18.60 40.17
Smoking in pregnancy 0.14 0.00 0.00 1.00
Ob: Gestational age - from LMP data, or U/S scan or Elaine 39.87 40.21 33.14 42.73
B cells 0.10 0.09 0.01 0.21
CD4+ T-cells 0.17 0.17 0.05 0.34
CD8+ T-cells 0.12 0.12 0.05 0.31
Granulocytes 0.47 0.48 0.19 0.66
Monocytes 0.04 0.04 0.00 0.11
Natural Killer Cells 0.00 0.00 0.00 0.04
Red Blood Cells 0.12 0.10 0.04 0.32
8 yr DXA: Total BMC (kg), without heads,adjusted for sex and age 0.72 0.72 0.46 1.08
6 yr DXA: Total BMC (kg), without heads,adjusted for sex and age 0.56 0.56 0.39 0.73
6yr PQCT: 38% - Periosteal circumference (circular ring model) (mm) 52.69 51.99 44.74 64.35
6yr PQCT: 38% - Cortical density (mg/cm3) 1039.99 1041.40 949.90 1110.90
Male Ob: Woman’s age at child’s birth 31.27 31.80 24.17 37.73
Init: Woman’s body mass index 23.98 23.12 17.40 32.42
Smoking in pregnancy 0.14 0.00 0.00 1.00
Ob: Gestational age - from LMP data, or U/S scan or Elaine 39.80 39.79 35.77 42.14
B cells 0.09 0.09 0.05 0.18
CD4+ T-cells 0.17 0.15 0.06 0.33
CD8+ T-cells 0.14 0.13 0.07 0.26
Granulocytes 0.42 0.40 0.24 0.64
Monocytes 0.04 0.04 0.00 0.08
Natural Killer Cells 0.01 0.00 0.00 0.07
Red Blood Cells 0.16 0.13 0.04 0.39
8 yr DXA: Total BMC (kg), without heads,adjusted for sex and age 0.70 0.69 0.50 1.00
6 yr DXA: Total BMC (kg), without heads,adjusted for sex and age 0.53 0.52 0.42 0.67
6yr PQCT: 38% - Periosteal circumference (circular ring model) (mm) 51.89 51.83 44.10 62.47
6yr PQCT: 38% - Cortical density (mg/cm3) 1031.69 1030.10 961.00 1111.10

DNA methylation at two CpGs were significantly associated with total bone mineral content at 6 years and periosteal circumference at 6 years respectively. CpG cg26559250 located at Chr6:157,653,445-157,653,447 at the ZDHHC14 (zinc finger DHHC-type palmitoyltransferase 14) gene showed an increase of DNA methylation with increasing bone mineral content with a significance of \(2.52\times 10^{-8}\) CpG cg22570676 located at Chr19:2,527,492-2,527,494 at the GNG7 (G protein subunit gamma 7) gene showed an increase of DNA methylation with increasing periosteal circumference with a significance of \(4.24\times 10^{-8}\). Neither probe is flagged for known technical issues or genetic confounding [223].

3.5.4.1 Whole Array QC

The predicted sex of the samples generated using sex chromosome probe intensities was checked against that in the sample annotation and no mismatches were found (Figure 3.44). No samples were excluded for having mismatched Sex.

Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the EPIC array data. Mismatches between the predicted sex and that asserted in the sample annotation metadata are shown in red. Plot generated by meffil QC report.

Figure 3.44: Predicted sex of each sample based on the sex chromosome copy numbers inferred from probe intensities for the EPIC array data. Mismatches between the predicted sex and that asserted in the sample annotation metadata are shown in red. Plot generated by meffil QC report.

No samples were excluded for having a median methylated signal that was more than \(3\sigma\) from the expected value, (Figure 3.45). No samples were excluded for having a higher than expected proportion of undetected probes (proportion of probes with detection p-value > 0.01 is > 0.1) (Figure 3.46). No samples were excluded for having a high proportion of probes with low bead counts (proportion of probes with bead number < 3 is > 0.1), (Figure 3.47).

Median methylated signal vs unmethylated signal per sample for the EPIC array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range are indicated in the legend. Plot generated by meffil QC report.

Figure 3.45: Median methylated signal vs unmethylated signal per sample for the EPIC array data, solid red line indicates linear regression of median methylated signal vs median unmethylated signal with dotted red lines representing \(3\sigma\) from the expected mean. Samples outside the expected range are indicated in the legend. Plot generated by meffil QC report.

Proportion of probes with detection p-values >0.01 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.46: Proportion of probes with detection p-values >0.01 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Proportion of probes with a bead count of < 3 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.47: Proportion of probes with a bead count of < 3 by sample for the EPIC array data. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

3.5.4.2 Probe QC

There were no outliers within the control probes (Figure 3.48). 833 probes were excluded for having high background signal in a high proportion of samples (proportion of samples with detection p-value > 0.01 is > 0.1), (Figure 3.49). 127 probes were excluded for having low bead count in a high proportion of samples (proportion of samples with bead number < 3 is > 0.1), (Figure 3.50). Probes with poor technical quality were excluded from the analysis prior to functional normalisation.

Control probe signal by sample for each summary group for the EPIC data. Outliers would be circled in black. Plot generated by meffil QC report.

Figure 3.48: Control probe signal by sample for each summary group for the EPIC data. Outliers would be circled in black. Plot generated by meffil QC report.

Undetectable probes across samples for EPIC data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.49: Undetectable probes across samples for EPIC data. Manhattan plot showing proportion of samples (y) in which a given probe (x) is not distinguishable from background noise, i.e. a detection p-value of > 0.01. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Low bead count probes across samples for EPIC data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

Figure 3.50: Low bead count probes across samples for EPIC data. Manhattan plot showing the proportion of samples (y) in which a given probe (x) has a bead count of <3. Black line indicates the exclusion threshold of 0.1. Plot generated by meffil QC report.

3.5.4.3 Functional Normalisation

An m of 12 was chosen as this value produced the last steep drop in residual variation, see Figure 3.51.

Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the EPIC array samples (n=464), for M = methylated and U = unmethylated probes.

Figure 3.51: Residual variation remaining after functional normalisation of the top 20,000 most variable probes with m PCs from the control probe summary matrices for the EPIC array samples (n=464), for M = methylated and U = unmethylated probes.

3.5.4.4 EWAS

For all the EWAS, blood cell-type counts were estimated using the Houseman method [272] and the cord blood cell-type reference panel from Bakulski et al. [275]. The Cell-types estimated were: B cells, CD4+ T cells, CD8+ T Cells, Granulocytes, Monocytes, Natural Killer cells, & Erythrocytes. In addition to the estimated blood cell counts all models included as covariates: Maternal Age at time of birth (years), Sex, maternal BMI at 11 weeks gestation, parity, whether or not the mother smoked during pregnancy, and gestational age.

EWAS for bone mineral content at 6 and 8 years would be expected to produce similar results given that BMC at these ages are highly correlated (Figure 3.52).

Bone Mineral content at 6 and 8 years of are correlated with an \(R^2 = 0.88\).

Figure 3.52: Bone Mineral content at 6 and 8 years of are correlated with an \(R^2 = 0.88\).

3.5.4.4.1 Total Bone Mineral Conent at 8 years

Figure 3.53, illustrates the distribution of bone mineral content at 8 years in the 408 individuals in this EWAS. Surrogate variable analysis identified 95 significant surrogate variables, this is likely an overestimate stemming from small amounts of variation remaining unaccounted for by the manual model thus the Manhattan plots based on the manual model were included. No CpGs fell below the Bonferroni corrected significance threshold (\(5.92\times10^{-8}\)) for an association between DNA methylation at that locus and total bone mineral content minus head at 8 years adjusted for age and sex, Figure 3.54.

Distribution of whole body (minus head) bone mineral content in kg (ttotbmcwhasa) at 8 years of age (n = 408), adjusted for sex and age, as measured by DXA.

Figure 3.53: Distribution of whole body (minus head) bone mineral content in kg (ttotbmcwhasa) at 8 years of age (n = 408), adjusted for sex and age, as measured by DXA.

Results of EWAS for whole body (minus head) bone mineral content in kg at 8 years of age (n = 408), adjusted for sex and age. The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\).

Figure 3.54: Results of EWAS for whole body (minus head) bone mineral content in kg at 8 years of age (n = 408), adjusted for sex and age. The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\).

3.5.4.4.2 Total Bone Mineral Conent at 6 years

Figure 3.55, illustrates the distribution of bone mineral content at 6 years in the 402 individuals in this EWAS. Surrogate variable analysis identified 97 significant surrogate variables, this is likely an overestimate stemming from small amounts of variation remaining unaccounted for by the manual model thus Manhattan plots based on the manual model have been included.

Distribution of whole body (minus head) bone mineral content in kg (stotbmcwhasa) at 6 years of age (n = 402), adjusted for sex and age, as measured by DXA.

Figure 3.55: Distribution of whole body (minus head) bone mineral content in kg (stotbmcwhasa) at 6 years of age (n = 402), adjusted for sex and age, as measured by DXA.

One CpG fell below the Bonferroni corrected significance threshold (\(5.92\times10^{-8}\)) for an association between DNA methylation at that locus and total bone mineral content minus head at 6 years adjusted for age and sex, Figure 3.56. This CpG was cg26559250 which is located at Chr6:157,653,445-157,653,447 adjacent to the ZDHHC14 (zinc finger DHHC-type palmitoyltransferase 14) gene. cg26559250 was significant (p = \(2.52\times 10^{-8}\), increase of 1.46% per kg) in the ‘all’ model and was also Bonferroni significant in the uncorrected model. However, cg26559250 was not significant in the SVA or iSVA models suggesting that it may be attributable to batch or cell-type effects.

Results of EWAS for whole body (minus head) bone mineral content in kg at 6 years of age (n = 402), adjusted for sex and age. The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\).

Figure 3.56: Results of EWAS for whole body (minus head) bone mineral content in kg at 6 years of age (n = 402), adjusted for sex and age. The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\).

3.5.4.4.3 Periosteal Circumference at 6 years

Figure 3.57, illustrates the distribution of periosteal circumference at 38% from the distal end of the tibia at 6 years (mm) in the 141 individuals in this EWAS. Surrogate variable analysis identified 37 significant surrogate variables, this is likely an overestimate stemming from small amounts of variation remaining unaccounted for by the manual model thus Manhattan plots based on the manual model have been included.

Distribution of periosteal circumference at 38% from the distal end of the tibia (mm) at 6 years of age (s3peric), adjusted for sex and age, as measured by PQCT (n = 141).

Figure 3.57: Distribution of periosteal circumference at 38% from the distal end of the tibia (mm) at 6 years of age (s3peric), adjusted for sex and age, as measured by PQCT (n = 141).

One CpG fell below the Bonferroni corrected significance threshold for an association between DNA methylation at that locus and periosteal circumference at 38% from the distal end of the tibia at 6 years (mm) adjusted for age and sex, Figure 3.58. This CpG was cg22570676 which is located at Chr19:2,527,492-2,527,494 at the GNG7 (G protein subunit gamma 7) gene. cg22570676 was significant (p = \(4.24\times 10^{-8}\), increase of 0.370% per mm) in the ‘all’ model and was also Bonferroni significant in the uncorrected model. However, cg22570676 was not significant in the SVA or iSVA models suggesting that it may be attributable to batch or cell-type effects.

Results of EWAS for periosteal circumference at 38% from the distal end of the tibia (mm) at 6 years of age (n = 141), adjusted for sex and age. The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\)

Figure 3.58: Results of EWAS for periosteal circumference at 38% from the distal end of the tibia (mm) at 6 years of age (n = 141), adjusted for sex and age. The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\)

3.5.4.4.4 Cortical Denisty at 6 years

Figure 3.59, illustrates the distribution of cortical density at 38% from the distal end of the tibia at 6 years (\(mg~cm^{-3}\)) in the 141 individuals in this EWAS. No CpGs fell below the Bonferroni corrected significance threshold (\(5.92\times10^{-8}\)) for an association between DNA methylation at that locus and cortical density at 6 years, Figure 3.60. Surrogate variable analysis identified 37 significant surrogate variables, this is likely an overestimate stemming from small amounts of variation remaining unaccounted for by the manual model thus Manhattan plots based on the manual model.

Distribution of cortical density at 38% from the distal end of the tibia (\(mg~cm^{-3}\)) at 6 years of age (s3crtden), as measured by PQCT (n = 141).

Figure 3.59: Distribution of cortical density at 38% from the distal end of the tibia (\(mg~cm^{-3}\)) at 6 years of age (s3crtden), as measured by PQCT (n = 141).

Results of EWAS for cortical density at 38% from the distal end of the tibia (\(mg~cm^{-3}\)) at 6 years of age (n = 141). The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\)

Figure 3.60: Results of EWAS for cortical density at 38% from the distal end of the tibia (\(mg~cm^{-3}\)) at 6 years of age (n = 141). The ‘all’ model results are shown here. Bidirectional Manhattan plot on which \(-log_{10}(p-value)\) is plotted on the y axis and the sign of this value represents the direction of change. The x axis represents chromosomes and position thereupon. Red line indicates the significance threshold of \(5.92\times10^{-8}\)

3.6 Discussion

EWAS for 9 outcomes were carried out across 3 sets of samples from MAVIDOS and SWS using the EPIC and 450K array platforms. No significant results were found in either the first or second phase of the MAVIDOS analysis. Two possible results for bone outcomes at 6 years were identified in the SWS data but did not remain in models including surrogate variables for possible confounding effects. The effect observed at CpG cg26559250, adjacent to the ZDHHC14 gene, in EWAS for bone mineral content at 6 years was not seen at the 8 year time point. Furthermore both this result and the finding for CpG cg22570676 and periosteal circumference were of small effect sizes 1.46% per kg, 0.370% per mm respectively. Consequently these results should be treated with considerable caution, they are in need of replication before they can be considered reliable. No Bonferroni significant DNA methylation changes at the CDKN2A and RXRA loci which have been previously associated with maternal vitamin D and bone phenotypes were identified, despite the presence of 95 and 75 probes annotated as being in the vicinity of these genes respectively.

Whilst the calculation of power for EWAS is a complex and rather understudied problem [282284] it is possible to achieve some approximations using Cohen’s methods [285]. To achieve the modest goal of 80% power for a small effect size (\(r^2 = 0.02\)) in a linear regression analysis (F-test) with 7 covariates at a significance level suitable for the EPIC array of p = \(5.92\times10^{-8}\) an n of 2607 is needed. Seven was the smallest number of covariates used in an SVA model in these analyses, models in the SWS analyses had 13 manually specified variables. When considering what is for EWAS a very large effect size (\(r^2 = 0.15\)) with the 13 covariates used in the SWS models it is possible to achieve 80% power for an n of 374. This is an effect size in line with the effects of smoking on DNA methylation at some loci [286,287]. This would make the two largest EWAS performed here for BMC at 8 and 6 years (n = 408, n = 402 respectively) powered only to find large effect sizes with just over 80% probability. The most generous set of parameters (80% power, \(r^2 = 0.15\), p = \(5.92\times10^{-8}\), & 2 covariates) yield an n of 259, more realistic numbers (90% power, \(r^2 = 0.02\), p = \(5.92\times10^{-8}\), & 13 covariates) yield an n of 3370. EWAS have identified biologically relevant changes associated with environmental exposures in DNA methylation with magnitudes of less than a single percentage point, and percentage changes in the low single digits are not uncommon in EWAS [287]. This makes all of the EWAS performed here underpowered to identify small DNA methylation changes which might be expected to occur. A collaboration with colleagues at the MRC-IEU, University of Bristol is underway to perform a meta-analysis to include these data with similar results from other cohorts in order to increase the power of these analyses. The covariates included in the models for the second phase of MAVIDOS analysis and the SWS analysis are matched to those being used by our collaborators to maximise the comparability of our results.

Given that the effect of maternal vitamin D on neonatal bone mass appears to be seasonal, with only babies born in the winter months showing statistically significant benefits of supplementation [208], it would be interesting to perform seasonally stratified EWAS were sufficient numbers available to do so with reasonable power. An initial step might be to ascertain in birth season has a significant interaction with DNA methylation state when predicting maternal vitamin D levels or bone outcomes in EWAS models.

Attempting to identify small changes in the overall DNA methylation state of complex populations of cells like blood and umbilical cord tissue which are associated with phenotypes such as circulating maternal vitamin D is a technically challenging undertaking. This work provides two candidates for further analysis for associations between DNA methylation and bone health outcomes at 6 years of age. Furthermore, these results have contributed to a larger meta analysis of other studies based on the DNA methylation array data in additional populations with greater power to detect associations with metrics of bone health.