A fresh look at the data
With other Brown researchers, including Crawford and Samuel Pattillo Smith, Ramachandran began developing statistical techniques that moved beyond individual mutations to include genes and pathways.
It’s not as if this information did not exist; over the past two decades, funding agencies and biobanks around the world have made enormous investments to generate large-scale datasets of genotypes, exomes and whole-genome sequences from diverse human ancestries, which are then merged with medical records and quantitative trait measurements. However, the researchers explained, analyses of such datasets are usually limited to GWA analyses that assume a direct correlation between mutations and traits.
The researchers studied 25 traits in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan, as well as 44,348 individuals from the PAGE Consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian / Alaska Native individuals. They performed statistical tests of association at the mutation, gene and pathway level for the 25 quantitative traits.
They identified 1,000 gene-level associations that are genome-wide significant in at least two ancestry groups across these 25 traits, as well as pathway associations in European, East Asian and Native Hawaiian groups. A majority of these would not have been identified using GWA alone, the researchers said.
“Instead of focusing on the single-mutation statistical tests – GWA – we are basically opening up a larger retinue of tests that can look for patterns at the gene level or the biologically annotated pathway level,” said Pattillo Smith, a computational biology Ph.D. candidate in Ramachandran’s lab. “For a long time, scientists have been so focused on the effect of individual mutations that a lot of valuable information is being ignored in GWA studies or going unreported in resulting publications – especially in ancestry populations that have smaller cohorts, because the test at the mutation level is incredibly sensitive to a number of confounding factors. One of the benefits of aggregating across mutations to the level of a region or gene is that you can kind of smooth those things over and be more robust in your detection of the genome-to-trait relationship.”
The researchers were aiming for what’s called “biological interpretability,” Ramachandran said, “which is how we could deploy these methods in a way to analyze biobanks to their full extent, and take advantage of all the information they have to offer.”
Applying unbiased methodology to biased data sets
In the paper, the researchers discuss how biobanks are heavily skewed toward people who self-identify as having European ancestry, noted Crawford, an assistant professor of biostatistics at Brown affiliated with the Center for Computational Molecular Biology. A hidden gem of this new research, Crawford said, is that it shows how developing sophisticated statistical methods can help overcome limitations such as the underrepresentation of non-European ancestry groups in these samples.
“You do not have to wait until the number of people from other ancestry groups is equal to the number of people self-identifying as European,” Crawford said. “In fact, even if more data is generated, the same imbalance is likely to be perpetuated. In the meantime, statistical methods at higher scales of genes and pathways can still help us gain insight into genetic architecture that can be applied in a beneficial way to these underrepresented ancestry groups. This methodology can help us use data more equitably, right now.”
In a field like genomics, the stakes are high, Ramachandran said.
“It’s really important to us that we understand trait architecture better so that we can make steps towards providing effective therapies for everyone, from every ancestry group.”
The following institutions also contributed to this research: University of North Carolina, Chapel Hill; University of Southern California, Los Angeles; Rutgers University; Fred Hutchinson Cancer Research Center; the Icahn School of Medicine at Mount Sinai, NY; the University of Colorado, Denver; Johns Hopkins University; and Microsoft Research New England.
This research was supported in part by the US National Institutes of Health (R01 GM118652, R35 GM139628, NIH T32 GM128596).