 Tim Wehr
The previous installment of this column (1) surveyed the challenges in obtaining high-quality results in bottom-up proteomics,
the sources of variability in proteomics experiments, and the difficulty in comparing results obtained from different laboratories
using different sample preparation procedures, different instrument platforms, and different bioinformatic software. Five
organizations were identified that have programs in place for standardizing proteomics workflows. These are the Association
of Biomolecular Research Facilities (ABRF), the Biological Reference Material Initiative (BRMI), Clinical Proteomic Technology
Assessment for Cancer (CPTAC), the Fixing Proteomics Campaign, and the Human Proteome Organization (HUPO). At the time of
writing, the HUPO Test Sample Working Group had completed a collaborative study on protein identification but the results
were not published until after the column had gone to press (2). This installment of "Directions in Discovery" will review
the results of the study, as they clearly reveal the sources of variability in bottom-up proteomics and point to the road
ahead in standardizing proteomics workflows.
The HUPO Test Sample
The HUPO sample consisted of 20 human proteins in the mass range of 32–110 kDa. To create the sample, candidate sequences
were selected from the open reading frame collection and the mammalian gene collection, expressed in E. coli, and purified using preparative sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) or 2D high performance
liquid chromatography (HPLC) (anion-exchange and reversed-phase chromatography). Purity of the proteins was determined to
be 95% or greater by 1D SDS-PAGE. Quality and stability of the test sample were confirmed by mass spectrometry (MS) analysis.
All of the 20 proteins were selected to contain at least one unique tryptic peptide of 1250 ± 5 Da, each with a different amino
acid sequence. This feature was designed to test for peptide undersampling derived from the data-dependent acquisition methods
used by most bottom-up LC–MS protocols.
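The selection criterion described above can be illustrated with a minimal in silico digestion sketch. It is not the procedure used by the Test Sample Working Group, which the study summary does not detail. The sketch assumes the common rule that trypsin cleaves C-terminal to lysine or arginine except before proline, ignores missed cleavages and modifications, and uses standard monoisotopic residue masses. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P.

    Assumption: complete digestion, no missed cleavages.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def peptides_in_window(sequence, target=1250.0, tol=5.0):
    """Return (peptide, mass) pairs whose mass lies within target ± tol Da."""
    hits = []
    for peptide in tryptic_digest(sequence):
        mass = monoisotopic_mass(peptide)
        if abs(mass - target) <= tol:
            hits.append((peptide, mass))
    return hits
```

Calling `peptides_in_window` on each candidate sequence would list its tryptic peptides near 1250 Da. Because all such peptides fall in the same narrow mass window, a data-dependent acquisition method that selects a limited number of precursors per cycle may fail to sample some of them, which is the undersampling effect the sample was designed to expose.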
Sample Distribution to Collaborators
The 20-component test sample was distributed to 27 laboratories selected for their expertise in proteomics techniques. Of
these, 24 were academic or industrial research laboratories or core facilities, while three were instrument vendors. Sample
recipients were instructed to identify all 20 proteins and all 22 unique peptides with mass 1250 ± 5 Da and to report results
to the lead investigator of the Test Sample Working Group. Participants were allowed to use procedures and instrumentation
they routinely employed in their laboratories so that effectiveness of different workflows could be assessed. To minimize
variability in data matching and reporting, participants were requested to use the same version of the NCBI nonredundant human
protein database.
Initial Study Results
In the initial reports returned to the Test Sample Working Group, only seven of the 27 participating laboratories identified
all 20 proteins. The remaining 20 laboratories experienced a variety of problems. The first group (seven laboratories) reported
naming errors in the protein identifications. The second group (six laboratories) reported naming errors, false positives,
and redundant identifications. The remaining seven laboratories encountered a mix of other problems: trypsinization
problems, undersampling, incomplete matching of MS spectra due to acrylamide alkylation, database search errors, and use of
overly stringent search criteria.
Results for the peptide sequences were even more problematic; only one of the 27 laboratories reported detection of all
22 peptides. Six of the 22 peptides contained cysteine residues, which are modified in the reduction and alkylation steps
performed before trypsin digestion. Only three additional laboratories reported detection of any of the cysteine-containing
peptides. Several laboratories incorrectly reported 1250-Da peptides arising from contaminating proteins or missed trypsin
cleavage.
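One way to see why cysteine-containing peptides are easy to miss is to compare the mass shifts that different alkylation chemistries add to cysteine. The sketch below is illustrative only. The study summary does not name the alkylating reagent, so carbamidomethylation (the product of iodoacetamide, +57.02146 Da) is an assumption, as are the 0.5-Da tolerance, the example peptide, and the function names. Propionamide (+71.03711 Da) is the adduct formed when acrylamide alkylates cysteine.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Mass added to each cysteine by the modification (Da).
# "carbamidomethyl" is an assumed reagent choice, not stated in the study.
CYS_SHIFT = {
    "none": 0.0,
    "carbamidomethyl": 57.02146,  # iodoacetamide alkylation
    "propionamide": 71.03711,     # acrylamide adduct
}


def peptide_mass(peptide, cys_mod="none"):
    """Monoisotopic mass with every cysteine carrying the given modification."""
    base = sum(RESIDUE_MASS[residue] for residue in peptide) + WATER
    return base + peptide.count("C") * CYS_SHIFT[cys_mod]


def search_matches(observed_mass, peptide, assumed_mod, tol=0.5):
    """True if the observed mass fits the peptide under the assumed Cys modification."""
    return abs(observed_mass - peptide_mass(peptide, assumed_mod)) <= tol


# Hypothetical peptide whose cysteine was alkylated by acrylamide.
observed = peptide_mass("ACDK", "propionamide")
found_as_carbamidomethyl = search_matches(observed, "ACDK", "carbamidomethyl")
found_as_propionamide = search_matches(observed, "ACDK", "propionamide")
```

In this sketch the two modifications differ by about 14 Da per cysteine, so a search configured for only one of them would not match a peptide carrying the other. This is consistent with the incomplete spectral matching that some laboratories attributed to acrylamide alkylation.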