Alignment and clustering of phylogenetic markers--implications for microbial diversity studies

White, J. R., Navlakha, S., Nagarajan, N., Ghodsi, M. R., Kingsford, C., Pop, M. (March 2010) Alignment and clustering of phylogenetic markers--implications for microbial diversity studies. BMC Bioinformatics, 11 (3). p. 152. ISSN 1471-2105

Navlakha_2010_BMCbio.pdf - Published Version

DOI: 10.1186/1471-2105-11-152


BACKGROUND: Molecular studies of microbial diversity have provided many insights into the bacterial communities inhabiting the human body and the environment. A common first step in such studies is a survey of conserved marker genes (primarily 16S rRNA) to characterize the taxonomic composition and diversity of these communities. To date, however, there exists significant variability in analysis methods employed in these studies.

RESULTS: Here we provide a critical assessment of current analysis methodologies that cluster sequences into operational taxonomic units (OTUs) and demonstrate that small changes in algorithm parameters can lead to significantly varying results. Our analysis provides strong evidence that the species-level diversity estimates produced using common OTU methodologies are inflated due to overly stringent parameter choices. We further describe an example of how semi-supervised clustering can produce OTUs that are more robust to changes in algorithm parameters.

CONCLUSIONS: Our results highlight the need for systematic and open evaluation of data analysis methodologies, especially as targeted 16S rRNA diversity studies are increasingly relying on high-throughput sequencing technologies. All data and results from our study are available through the JGI FAMeS website.
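
The parameter sensitivity described in the abstract can be illustrated with a toy example. The sketch below is not the method evaluated in the paper: it is a minimal greedy, seed-based clustering over made-up, pre-aligned reads, and the names `count_matches`, `greedy_otus`, and `TOY_READS` are hypothetical. It only shows how the number of OTUs can change as the identity threshold is tightened.

```python
def count_matches(a, b):
    """Number of identical positions in two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y)


def greedy_otus(seqs, threshold_pct):
    """Greedy seed-based clustering (illustrative only).

    Each sequence joins the first OTU whose seed it matches at
    >= threshold_pct percent identity; otherwise it seeds a new OTU.
    Integer arithmetic avoids floating-point edge cases at the cutoff.
    """
    otus = []  # each OTU is a list whose first element is its seed
    for s in seqs:
        for otu in otus:
            seed = otu[0]
            if count_matches(s, seed) * 100 >= threshold_pct * len(seed):
                otu.append(s)
                break
        else:
            otus.append([s])
    return otus


# Made-up reads (assumption): 20 bases each, differing only near the 3' end.
TOY_READS = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",  # 1 mismatch to the first read
    "ACGTACGTACGTACGTACAA",  # 2 mismatches to the first read
    "ACGTACGTACGTACGTTTAA",  # 4 mismatches to the first read
]

if __name__ == "__main__":
    for pct in (99, 97, 95, 90):
        print(pct, len(greedy_otus(TOY_READS, pct)))
```

On these four toy reads, the OTU count drops from four at a 99% threshold to two at 90%, so a stricter cutoff splits near-identical reads into separate units. Real pipelines differ in alignment, distance calculation, and linkage strategy, all of which this sketch ignores.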

Item Type: Paper
Additional Information: BMC bioinformatics
Uncontrolled Keywords: Cluster Analysis; DNA, Bacterial/genetics; Genetic Markers/*genetics; *Genetic Variation; *Phylogeny; RNA, Bacterial/genetics; RNA, Ribosomal, 16S/genetics; Sequence Alignment
Subjects: bioinformatics > genomics and proteomics > analysis and processing > alignment processing
bioinformatics > genomics and proteomics > annotation > phylogenetic tree annotation
bioinformatics > genomics and proteomics > alignment > sequence alignment
bioinformatics > computational biology
Communities: CSHL labs > Navlakha lab
Depositing User: Matthew Dunn
Date: 24 March 2010
Date Deposited: 08 Nov 2019 14:44
Last Modified: 08 Nov 2019 14:44
PMCID: PMC2859756