As the sequencing of the human genome neared completion, it was recognized that the next task for the biological
science community would be to characterize the working products of the genome, which are largely proteins. The emerging field
of proteomics set as its goal the identification and measurement of all the proteins in a cell or tissue with the hope that,
in so doing, candidate proteins for disease biomarkers or drug targets would be found. This is proving to be a daunting task,
given that each of the approximately 25,000 genes can give rise to multiple protein products through splicing and introduction
of posttranslational modifications. An added complication is the huge range of protein concentrations — usually many orders
of magnitude — and the possibility that the most interesting ones are in low abundance.
Tim Wehr
A host of technologies and experimental approaches has been applied to address the proteomics problem. The most widely used
is the "bottom-up" approach. Proteins in an extract or lysate are digested with a proteolytic enzyme (typically trypsin),
separated by reversed-phase liquid chromatography (LC), and introduced online into the electrospray ionization source of a
mass spectrometer. Peptide ions are resolved in a precursor ion scan, and several of the most intense ions in the scan are
subjected to fragmentation, a process referred to as LC–tandem mass spectrometry (MS) or LC–MS-MS. The precursor
ion masses and the corresponding fragment ion masses are submitted to a search engine, which matches them
against entries in a protein database to produce a protein identification. Quantitative information on one, several, or all proteins
can be obtained by label-free methods such as spectral counting or peak intensity measurements, by introduction of stable,
isotopically labeled tags, or by inclusion of a heavy-labeled version of one or several peptides found in the protein ("proteotypic
peptides") (1). In the case of complex samples such as body fluids or cell lysates, the number of potential analytes can be
enormous. A cell typically expresses several thousand proteins in any given state, each protein can generate up to dozens
of peptides, and each peptide can appear in several charge states in the mass spectrum. Therefore, a single proteomic sample
could conceivably contain 500,000 or more species. To reduce the complexity of the analytical problem, samples
often are subjected to prefractionation before LC–MS using techniques such as 1D or 2D gel electrophoresis (2), in-solution
isoelectric focusing (3), or multidimensional high performance liquid chromatography (HPLC) (4).
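The order-of-magnitude estimate above can be reproduced with simple arithmetic. The sketch below is only illustrative: the function name and the specific input values (5,000 proteins, 35 peptides per protein, 3 charge states per peptide) are assumptions chosen to be consistent with "several thousand," "up to dozens," and "several," not measured figures.

```python
def estimate_ion_species(n_proteins, peptides_per_protein, charge_states_per_peptide):
    """Rough upper-bound count of distinct peptide ion species in a digest.

    Assumes every protein yields the same number of peptides and every
    peptide appears in the same number of charge states (a simplification).
    """
    return n_proteins * peptides_per_protein * charge_states_per_peptide


# Illustrative (assumed) values, not experimental data.
estimate = estimate_ion_species(
    n_proteins=5_000,
    peptides_per_protein=35,
    charge_states_per_peptide=3,
)
print(f"{estimate:,} potential ion species")
```

With these assumed inputs the product is 525,000, which is in line with the figure quoted in the text. Real digests deviate from this simple product because peptide yield and charge-state distribution vary from protein to protein.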
Table I: Sources of variability in proteomic experiments
There are several limitations to the bottom-up approach when applied to complex samples. First, because of the wide dynamic
range in protein concentrations, and the inability of a mass spectrometer to sample all peptide ions, the approach is inherently
biased toward higher abundance proteins. Second, the large number of peptides in a complex digest combined with a limited
number of MS duty cycles in an analysis reduces sampling reproducibility for low-abundance peptides. Third, information about
peptide isoforms (for example, posttranslational modifications) can be lost. Finally, the large number of variables in the
bottom-up experiment can compromise reproducibility in both intralaboratory and interlaboratory analyses. These variables are summarized
in Table I.
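The bias toward higher abundance proteins can be illustrated with a toy model of precursor selection, in which only the N most intense ions in a precursor scan are chosen for fragmentation. This is a minimal sketch under stated assumptions: the function name, the data layout, the value of N, and the intensity values are all hypothetical, and instrument features such as dynamic exclusion or charge-state filtering are not modeled.

```python
def select_top_n(precursor_scan, n):
    """Return the m/z values of the n most intense ions in one precursor scan.

    precursor_scan: list of (mz, intensity) tuples (hypothetical data layout).
    """
    ranked = sorted(precursor_scan, key=lambda ion: ion[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]


# Hypothetical scan: three abundant ions and two low-abundance ions.
scan = [
    (445.12, 9.0e6),
    (512.30, 4.5e6),
    (623.84, 2.1e6),
    (701.40, 3.0e4),
    (815.95, 8.0e3),
]

selected = select_top_n(scan, n=3)
missed = [mz for mz, _ in scan if mz not in selected]

print("fragmented:", selected)
print("not sampled:", missed)
```

In this toy scan the two weakest ions are never selected, regardless of how biologically interesting they might be. In a real analysis, small run-to-run changes in intensity near the selection cutoff would also change which ions are picked, which is one way sampling of low-abundance peptides becomes irreproducible.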
Table II: Organizations for standardizing proteomics
Because of the large number of variables and lack of expertise in many laboratories, the data in early proteomics
studies were of poor quality, and the field's reputation was tarnished by a lack of reproducibility. Over the last several years,
efforts to standardize proteomics protocols and to provide reference samples have been mounted by several organizations. Five
of these organizations are listed in Table II with their URLs. This installment of "Directions in Discovery" will review each
of these organizations and the programs they have initiated.