Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants

Fore, Ruby and Boehme, Jaden and Li, Kevin and Westra, Jason and Tintle, Nathan (2020) Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants. Frontiers in Genetics, 11. ISSN 1664-8021

Official URL: https://doi.org/10.3389/fgene.2020.591606

Abstract

Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested also continues to grow. Pathway-based methods have been used to first aggregate statistical evidence within each gene and then aggregate that evidence across the pathway. This “multi-set” approach (a gene-based test followed by a pathway-based test) has not been thoroughly explored for evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we ask whether there are statistical and biological characteristics that make the multi-set approach preferable to simply running all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application demonstrates how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains when causal variants are aggregated in subsets with fewer variants overall (a high proportion of causal variants in the subset). However, we find little advantage when the sets are non-informative (a similar proportion of causal variants in each subset). Our application to real data further supports this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease.
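
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which gene-level test or which combination rule the paper uses, so this sketch assumes a simple burden score test (large-sample normal approximation) within each gene and Fisher's method to combine the gene-level p-values across the pathway. The function names `burden_pvalue`, `fisher_combine`, and `multi_set_pvalue` are hypothetical, and Fisher's method additionally assumes the gene-level p-values are independent.

```python
import math


def burden_pvalue(genotypes, phenotype):
    """Assumed gene-level test: collapse rare-allele counts per person into a
    burden score and test its correlation with a quantitative phenotype.
    Uses the score statistic n * r^2 (chi-square, 1 df) via a normal tail."""
    n = len(phenotype)
    burden = [sum(row) for row in genotypes]  # one burden score per person
    mean_b = sum(burden) / n
    mean_p = sum(phenotype) / n
    sbb = sum((b - mean_b) ** 2 for b in burden)
    spp = sum((p - mean_p) ** 2 for p in phenotype)
    if sbb == 0 or spp == 0:
        return 1.0  # no variation, so no evidence of association
    sbp = sum((b - mean_b) * (p - mean_p) for b, p in zip(burden, phenotype))
    r = sbp / math.sqrt(sbb * spp)
    z = r * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under the
    null. For even df the survival function has a closed form."""
    k = len(pvalues)
    half = -sum(math.log(max(p, 1e-300)) for p in pvalues)  # statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def multi_set_pvalue(genes, phenotype):
    """Stage 1: test each gene (subset). Stage 2: combine across the pathway."""
    per_gene = {name: burden_pvalue(g, phenotype) for name, g in genes.items()}
    return fisher_combine(list(per_gene.values())), per_gene


# Toy pathway: rows are people, columns are rare variants within a gene.
phenotype = [1.0, 0.0, 1.0, 0.0]
genes = {
    "geneA": [[1], [0], [1], [0]],  # burden tracks the phenotype
    "geneB": [[0], [0], [0], [0]],  # no carriers, uninformative
}
pathway_p, per_gene = multi_set_pvalue(genes, phenotype)
```

The single-set alternative would pass all variants from every gene to `burden_pvalue` at once; comparing the two on simulated data is one way to explore when informative subsets help, in the spirit of the framework the abstract describes.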

Item Type:	Article
Subjects:	Open Article Repository > Medical Science
Depositing User:	Unnamed user with email support@openarticledepository.com
Date Deposited:	23 Jan 2023 07:25
Last Modified:	27 Apr 2024 13:21
URI:	http://journal.251news.co.in/id/eprint/326