Pinto, Brendan J, O'Connor, Brian, Schatz, Michael C, Zarate, Samantha, Wilson, Melissa A (February 2023) Concerning the eXclusion in human genomics: The choice of sex chromosome representation in the human genome drastically affects number of identified variants. (Public Dataset) (Submitted)
2023_Pinto_Concerning_the_eXclusion_in_human_genomics.pdf (Submitted Version, PDF, 241 kB). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Over the past 30 years, a community of scientists has pieced together every base pair of the human reference genome from telomere-to-telomere. Interestingly, most human genomics studies omit more than 5% of the genome from their analyses. Under ‘normal’ circumstances, omitting any chromosome(s) from analysis of the human genome would be reason for concern; the sex chromosomes are the exception. Sex chromosomes in eutherians share an evolutionary origin as an ancestral pair of autosomes. In humans, they share three regions of high sequence identity (~98-100%), which, along with the unique transmission patterns of the sex chromosomes, introduce technical artifacts into genomic analyses. However, the human X chromosome bears numerous important genes, including more “immune response” genes than any other chromosome, which makes its exclusion irresponsible when sex differences across human diseases are widespread. To better characterize the effect that including or excluding the X chromosome may have on variants called, we conducted a pilot study on the Terra cloud platform to replicate a subset of standard genomic practices using both the CHM13 reference genome and a sex chromosome complement-aware (SCC-aware) reference genome. We compared quality of variant calling, expression quantification, and allele-specific expression using these two reference genome versions across 50 human samples from the Genotype-Tissue Expression (GTEx) consortium annotated as female. We found that after correction, the whole X chromosome (100%) can generate reliable variant calls, allowing for the inclusion of the whole genome in human genomics analyses as a departure from the status quo of omitting the sex chromosomes from empirical and clinical genomics studies.
Item Type: | Paper |
---|---|
Subjects: | bioinformatics; bioinformatics > genomics and proteomics > genetics & nucleic acid processing; bioinformatics > genomics and proteomics; organism description > animal; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes; organism description > animal > mammal > primates > hominids; organism description > animal > mammal > primates > hominids > human; organism description > animal > mammal; organism description > animal > mammal > primates |
Communities: | CSHL labs > Schatz lab |
SWORD Depositor: | CSHL Elements |
Depositing User: | CSHL Elements |
Date: | 22 February 2023 |
Date Deposited: | 29 Sep 2023 17:30 |
Last Modified: | 10 Jan 2024 20:54 |
PMCID: | PMC9980147 |
URI: | https://repository.cshl.edu/id/eprint/41067 |