Analytical guidelines for co-fractionation mass spectrometry obtained through global profiling of gold standard Saccharomyces cerevisiae protein complexes

Pang, C. N. I., Ballouz, S., Weissberger, D., Thibaut, L. M., Hamey, J. J., Gillis, J., Wilkins, M. R., Hart-Smith, G. (August 2020) Analytical guidelines for co-fractionation mass spectrometry obtained through global profiling of gold standard Saccharomyces cerevisiae protein complexes. Mol Cell Proteomics, 19 (11). pp. 1876-1895. ISSN 1535-9476

URL: https://pubmed.ncbi.nlm.nih.gov/32817346/
DOI: 10.1074/mcp.RA120.002154

Abstract

Co-fractionation mass spectrometry (CF-MS) is a technique with potential to characterise endogenous and unmanipulated protein complexes on an unprecedented scale. However, this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To obtain such guidelines, this study thoroughly evaluates novel and published Saccharomyces cerevisiae CF-MS datasets using very high proteome coverage libraries of yeast gold standard complexes. A new method for identifying gold standard complexes in CF-MS data, Reference Complex Profiling, and the Extending 'Guilt-by-Association' by Degree (EGAD) R package are used for these evaluations, which are verified with concurrent analyses of published human data. By evaluating data collection designs, which involve fractionation of cell lysates, it is found that near-maximum recall of complexes can be achieved with fewer samples than were used in published studies. Distributing sample collection across orthogonal fractionation methods, rather than collecting a single high-resolution dataset, leads to particularly efficient recall. By evaluating 17 different similarity scoring metrics, which are central to CF-MS data analysis, it is found that two metrics rarely used in past CF-MS studies - Spearman and Kendall correlations - and the recently introduced Co-apex metric frequently maximise recall, while a popular metric - Euclidean distance - delivers poor recall. The common practice of integrating external genomic data into CF-MS data analysis is also evaluated, revealing that this practice may improve the precision and recall of known complexes but is generally unsuitable for predicting novel complexes in model organisms. If studying non-model organisms using orthologous genomic data, it is found that particular subsets of fractionation profiles (e.g. the lowest abundance quartile) should be excluded to minimise false discovery. These assessments are summarised in a series of universally applicable guidelines for precise, sensitive and efficient CF-MS studies of known complexes, and effective predictions of novel complexes for orthogonal experimental validation.
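
The similarity scoring metrics named in the abstract can be illustrated on toy data. The following is a minimal Python sketch, not the study's own code and not the EGAD R package: the three fractionation profiles are invented, the Co-apex score is assumed to be a simple binary variant (1 if two profiles peak in the same fraction, otherwise 0), Kendall correlation is computed as tau-a, and the sum-to-one scaling applied before the Euclidean distance is likewise an assumption.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs; ties add 0."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def co_apex(x, y):
    """Assumed binary variant: 1 if both profiles peak in the same fraction."""
    return 1 if x.index(max(x)) == y.index(max(y)) else 0


def euclidean(x, y):
    """Euclidean distance after scaling each profile to sum to 1 (assumed)."""
    sx, sy = sum(x), sum(y)
    return math.sqrt(sum((a / sx - b / sy) ** 2 for a, b in zip(x, y)))


# Hypothetical protein intensities across eight fractions.
profile_a = [0, 1, 5, 20, 8, 2, 0, 0]   # peaks in fraction 4
profile_b = [0, 2, 9, 45, 15, 3, 1, 0]  # co-elutes with profile_a
profile_c = [12, 6, 2, 1, 0, 0, 3, 9]   # elutes elsewhere

for name, other in (("a vs b", profile_b), ("a vs c", profile_c)):
    print(
        name,
        round(spearman(profile_a, other), 3),
        round(kendall_tau_a(profile_a, other), 3),
        co_apex(profile_a, other),
        round(euclidean(profile_a, other), 3),
    )
```

In this sketch the correlations and the Co-apex score are similarity measures (higher means more alike), whereas Euclidean distance is a dissimilarity (lower means more alike), so the two kinds of score would need to be oriented consistently before pairs are ranked.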

Item Type: Paper
Additional Information: 1535-9484; Pang, Chi Nam Ignatius; Ballouz, Sara; Weissberger, Daniel; Thibaut, Loïc M; Hamey, Joshua J; Gillis, Jesse; Wilkins, Marc R; Hart-Smith, Gene; Orcid: 0000-0003-3907-0367; Journal Article; United States; Mol Cell Proteomics. 2020 Aug 18:mcp.RA120.002154. doi: 10.1074/mcp.RA120.002154.
Uncontrolled Keywords: Bioinformatics; Chromatography; Protein complex analysis; Protein-Protein Interactions*; Saccharomyces cerevisiae; Yeast*; co-fractionation mass spectrometry; protein correlation profiling
Communities: CSHL labs > Gillis Lab
Depositing User: Matthew Dunn
Date: 18 August 2020
Date Deposited: 30 Nov 2020 16:37
Last Modified: 30 Nov 2020 16:37
PMCID: PMC7664123
URI: https://repository.cshl.edu/id/eprint/39762
