AJP - Lung Journal of Applied Physiology


     


Am J Physiol Lung Cell Mol Physiol 287: L1-L23, 2004; doi:10.1152/ajplung.00301.2003
1040-0605/04 $5.00

Invited Review

Proteomics: current techniques and potential applications to lung disease

Jan Hirsch,1 Kirk C. Hansen,2 Alma L. Burlingame,2 and Michael A. Matthay1

1Cardiovascular Research Institute and 2Mass Spectrometry Facility, Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143


    ABSTRACT
 TOP
 ABSTRACT
 IMPLICATIONS OF THE HUMAN...
 WHAT CAN BE MEASURED...
 CURRENT CHALLENGES
 ANALYSIS METHODS
 MASS SPECTROMETRY
 DATA ANALYSIS AND INTERPRETATION
 INVESTIGATING THE LUNG PROTEOME
 EXPERIMENTAL DESIGNS
 THE PROTEOME OF BAL...
 ALTERNATIVES TO BAL
 FUTURE DIRECTIONS
 REFERENCES
 
Proteomics aims to study the whole protein content of a biological sample in one set of experiments. Such an approach has the potential to provide an understanding of the complex responses of an organism to a stimulus. The large vascular and air space surface area of the lung exposes it to a multitude of stimuli that can trigger a variety of responses by many different cell types. This complexity makes the lung a promising, but also challenging, target for proteomics. Important advances in the last decade have increased the value of proteomics results for the clinical scientist. Improvements in protein separation and staining techniques have extended protein identification toward the least abundant proteins. Advances in mass spectrometry now allow many of the proteins of interest to be identified, rather than merely describing changes in patterns of protein spots. Protein profiling techniques allow the rapid comparison of complex samples and the direct investigation of tissue specimens. In addition, proteomics has been complemented by the analysis of posttranslational modifications and by techniques for the quantitative comparison of different proteomes. These methodologies have made it possible to apply proteomics to the study of specific diseases or biological processes under clinically relevant conditions. The quantity of data acquired with these new techniques poses new challenges for data processing and analysis. This article provides a brief review of the most promising proteomics methods and some of their applications to pulmonary research.

mass spectrometry; proteome; lung


PROTEOMICS IS THE INVESTIGATION of the protein content or the protein complement of the genome of a biological system, also termed the proteome (237, 255). The objective of proteome research is to identify and describe the complex responses of a biological system to different stimuli. A vast amount of information can be obtained from one set of experiments compared with the classic approach of observing concentration changes or modifications on the single protein level. The Nobel Prize for Chemistry in 2002 was shared among three scientists for the development of analytical methods for the study of biomolecules: Kurt Wüthrich for the nuclear magnetic resonance technique and John Fenn and Koichi Tanaka for development of the two ionization techniques that initiated the rapid evolution of biological mass spectrometry in the past decade, namely electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI).

In the first part of this review, the status of current proteomic techniques is outlined. The rapid evolution in mass spectrometry, which was initiated by the development of the ionization techniques MALDI and ESI, has led to significant improvements in the central step of a proteomics experiment, protein identification.

The subsequent application of these techniques to a large number of previously inaccessible categories of samples has in turn triggered progress in other crucial steps, namely protein separation techniques and the analysis of the resulting data. The classic proteomics approach of describing and comparing the protein content of a given sample has consequently been refined by the description of "posttranslational modifications" of the protein and widened by tools that allow quantitative comparison of two or more samples ("quantitative proteomics"). This article provides an outline of current techniques in both of these fields that will be used to investigate the lung proteome in the coming years.

The third part of this review focuses on the present status of the investigation of the lung proteome with specific examples from pulmonary studies that have evaluated bronchoalveolar lavage as well as other biological samples in a variety of acute and chronic lung diseases. Some future possibilities for lung research that may arise from the rapid progress occurring in proteome method development are also considered (3, 69, 103, 240).


    IMPLICATIONS OF THE HUMAN GENOME PROJECT
 
After the completion of the Human Genome Project, it has become evident that the complexity of organisms results only in small part from direct gene expression from the genome (2, 76, 203), and the simple concept of one gene-one protein is clearly incorrect. One important reason is that a single gene can give rise to a whole family of gene products (106, 118, 214); for example, one gene can produce multiple mature mRNAs via alternative splicing and other mechanisms (195). Furthermore, the correlation between mRNA and protein concentrations has been shown to be insufficient to predict protein expression levels from quantitative mRNA data, since protein levels are also regulated by degradation (40, 90). In one study, the correlation between mRNA levels and protein abundance was very good for a limited number of highly abundant proteins but poor for proteins with lower expression levels (90); in this group, 30-fold differences in protein abundance were found among proteins with the same mRNA levels. Proteins from genes with very low expression levels could not be detected at all with the two-dimensional (2D) electrophoresis-mass spectrometry approach used in that study. Posttranslational modifications such as glycosylation, phosphorylation, and ubiquitination produce further variation by increasing the number of building blocks from the 20 standard amino acids to more than 140 possible amino acid forms (125). These modifications can change rapidly and are usually not mutually exclusive (152).

Therefore, the study of the genome or even of mRNA levels (the transcriptome) will reveal only a small part of the response to a particular stimulus. Even among diseases known to be caused by specific genetic defects, very few are likely to be truly monogenic, since cellular systems involve complex interactions with a high level of redundancy (234). Moreover, the function of a large number of the proteins encoded by these genes is still unclear (25).

Direct investigation of the proteome provides a more complete representation of changes in the status of an organism. However, there exist several impediments to such an approach, the sheer complexity of the proteome being the most important one (Fig. 1). Diversity is another issue, since there are at least 250 different types of human cells, each of which contains at least 2,000–6,000 different primary proteins (33, 59), and posttranslational modifications will multiply this number (152, 165, 257, 258). It has been estimated that the different types of human cells may differ from each other in ~400 unique proteins (32). Another important factor is the dynamic range of concentrations of proteins, since one cell can contain between one and more than 100,000 copies of a single protein (32). Finally, the proteome of organisms is dynamic and changes with environment and with time (106).



 
Fig. 1. Graphical representation of the genome (left) and the proteome (right). Whereas there are ~26,000–31,000 protein-encoding genes (14), the total number of human proteins, including splice variants and essential posttranslational modifications, has been estimated to be close to one million (76, 254). The area of the circle that is within the reach of Leonardo da Vinci's Vitruvian Man corresponds to these images.

 
Many of the detection and recognition methods currently used in protein chemistry, such as antibody assays or enzyme activity measurements, can detect only one protein at a time. Consequently, many investigations measure the response of a single gene, protein, or pathway in normal physiology or in a pathological condition. For example, we have studied how aquaporin deletions influence fluid transport in the lung by examining aquaporin knockout mice under normal physiological conditions as well as during clinically relevant stresses, such as at the time of birth or during experimentally induced lung injury (225). A large part of our current understanding of biological function is based on this type of investigation, and this approach will continue to be useful for a detailed understanding of living organisms. However, studying the interactions among proteins with such single-protein methods requires a large number of consecutive experiments. This is not only time consuming; interpretation of the results may also be hampered by variability in experimental parameters, differences in cell material, and the time of measurement. Emerging proteomics methods have the potential to overcome many of these limitations.


    WHAT CAN BE MEASURED USING A PROTEOMICS APPROACH?
 
Proteomic investigation of a given cell or other biological system should ideally detect all proteins and their functional responses to a stimulus. Given the scale of this goal, it is not surprising that no current approach comes close to achieving it (see below). Despite these limitations, however, a proteomics-based approach has the unique advantage of identifying changes in protein patterns (clusters) between different states of the organism. Consequently, screening for disease markers has been one of the principal objectives of many proteomics studies (18, 82, 83, 135, 138, 139, 173, 176, 228, 251, 252). This application uses only a limited part of the potential of proteomics, which should be able to evaluate coherently the complex changes in the proteome (or a significant part of it) in multifactorial diseases. Such an approach not only greatly increases the amount of data acquired from one set of experiments at one time point but also provides information beyond that of conventional approaches by yielding insights into the complex interactions among different proteins and pathways (190). This type of discovery is difficult to accomplish with reductionist methods and should improve our understanding of complex pathologies, such as sepsis or acute lung injury, which involve multiple, constantly interacting components of the immune system and signaling pathways (175).

Proteomics has expanded into "functional proteomics," the correlation of changes in the proteome with different states of the organism. This field is currently expanding in several directions. "Protein profiling techniques" take a global view of complex protein samples, such as plasma. Given the complexity of these samples, these techniques need to be streamlined to achieve high throughput. The resulting protein patterns have diagnostic value as biomarkers in their own right and point to directions for more specific investigations. Applied to tissue samples, protein profiling combines spatial information with protein profiles, and current results clearly indicate that these techniques are a valuable complement to histology (38, 265). Continuing improvement in protein identification will provide further insights into pathological processes and will most likely be especially valuable in cancer research. The application of mass spectrometry to the evaluation of "protein modifications" further extends the depth of proteomic analysis. Changes in protein concentrations represent only a small part of the physiological responses of an organism; in particular, rapid responses to stimuli are transmitted by the modification of existing proteins. Despite its complexity, this emerging field therefore has a large potential for clinically relevant research. The development of quantitative proteomics has extended the applicability of these techniques beyond a purely descriptive study design. Novel techniques in this field, namely differential gel electrophoresis (DIGE) and isotope-coded affinity tagging (ICAT), allow the direct comparison of samples, e.g., from different disease states.


    CURRENT CHALLENGES
 
The main general issues that have impeded proteomics research in recent years include 1) difficulties in the detection of low-abundance proteins due to limitations in dynamic range, 2) identification of individual proteins within a complex biological sample, and 3) problems associated with the evaluation of all potentially useful information from the raw data. Typical samples in medical science are body fluids, such as plasma, urine, pleural fluid, bronchoalveolar lavage (BAL), pulmonary edema fluid, and cell lysates. These types of samples are complex mixtures of proteins with a dynamic range of protein concentrations of up to 10 orders of magnitude (2, 7, 214). The expression and modification changes in less abundant proteins ["low copy number proteins," 10–1,000 copies per cell (25)] may be the most interesting ones. Their visualization is frequently obscured by highly expressed proteins [housekeeping proteins, >10,000 copies per cell (25)].

For example, plasma and pulmonary edema fluid contain large amounts of albumin (30–50 mg/ml in plasma and 20–25 mg/ml in pulmonary edema fluid) but comparatively small quantities of cytokines such as TNF-α or IL-1β (ng/ml to pg/ml range). Protein separation and purification techniques are, therefore, key elements of proteome research and represent one of its major challenges (7, 15, 33).
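As a rough worked example, the gap between albumin and a cytokine can be expressed in orders of magnitude as the base-10 logarithm of the concentration ratio. The sketch below is only an illustration: the mid-range albumin value (40 mg/ml) and the 1 ng/ml and 1 pg/ml cytokine bounds are picked from the ranges quoted above, and the helper name is hypothetical.

```python
import math

# Unit conversions to g/ml so that concentrations can be compared directly.
MG_PER_ML = 1e-3
NG_PER_ML = 1e-9
PG_PER_ML = 1e-12


def orders_of_magnitude(high: float, low: float) -> float:
    """Decades separating two concentrations: log10 of their ratio."""
    return math.log10(high / low)


albumin = 40 * MG_PER_ML      # mid-range of the 30-50 mg/ml quoted for plasma
cytokine_ng = 1 * NG_PER_ML   # upper end of the cytokine range
cytokine_pg = 1 * PG_PER_ML   # lower end of the cytokine range

print(f"albumin vs 1 ng/ml: {orders_of_magnitude(albumin, cytokine_ng):.1f} decades")  # 7.6
print(f"albumin vs 1 pg/ml: {orders_of_magnitude(albumin, cytokine_pg):.1f} decades")  # 10.6
```

The lower bound gives roughly 10.6 decades, which is of the same order as the dynamic range of about 10 orders of magnitude cited for such samples.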

Although the size of the proteome is unknown, the number of expressed proteins can be estimated from the open reading frames in a sequenced genome. It has been reported that 20% (1,484 proteins) of the predicted proteome of Saccharomyces cerevisiae (249) and >61% of that of Deinococcus radiodurans (145) could be identified by a multidimensional chromatography-tandem mass spectrometric approach. These results indicate that identification of a significant part of the proteome of a cell is feasible.

Other common obstacles to proteomics depend more on the individual sample and the specific techniques. The validity of the results of a proteomic experiment depends on the initial sample, the purity of cell and protein isolation, and the subsequent sample fractionation steps. Salts, mucus, and other contaminants may require purification procedures that lead to loss of proteins of interest. Proteases in the sample can cause additional cleavage of the investigated proteins, complicating protein identification and quantitation. Ongoing cellular protein synthesis and posttranslational processing, for example by phosphatases and kinases, can influence the results as well.


    ANALYSIS METHODS
 
Initial approaches to investigate the proteome of cell lysates and body fluids were performed using 2D polyacrylamide gel electrophoresis (2D-PAGE) (34, 46, 61, 120). 2D-PAGE has been used in many studies to identify protein patterns in body fluids such as serum (7) or BAL fluid (55, 135, 138, 176). This technique is steadily improving and remains an essential part of many approaches to proteome analysis (201).

The rapid progress in mass spectrometry in the last decade has made it a key technique for the investigation of the proteome (2, 3, 29, 69, 85, 103, 150). Mass spectrometry measures the mass-to-charge ratio (m/z) of the molecular species in a sample, and this information can be used to identify proteins. Because the method is highly sensitive and accurate, under some circumstances detecting peptides in the femtomole to attomole range and measuring their masses with an accuracy of <10 parts per million (ppm) (45, 70), it is now possible to identify proteins by using search algorithms that interrogate public "protein databases," such as the nonredundant National Center for Biotechnology Information (NCBI) database, which can be accessed over the Internet.
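Mass accuracy in ppm is simply the relative error of a measured m/z value scaled by 10^6. The minimal sketch below (function names and the example values are hypothetical) shows the calculation and the matching window that a given ppm tolerance opens around a calculated peptide mass; the narrower that window, the fewer database peptides fall into it by chance.

```python
from typing import Tuple


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of a measurement in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(theoretical_mz: float, tol_ppm: float) -> Tuple[float, float]:
    """m/z interval within which a peak would be accepted at a given ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


# Hypothetical peptide: calculated m/z 1000.000, observed 1000.005.
print(f"{ppm_error(1000.005, 1000.000):.1f} ppm")  # 5.0 ppm
low, high = tolerance_window(1000.000, 10)         # 10 ppm -> +/- 0.01 at m/z 1000
print(f"{low:.3f} - {high:.3f}")                   # 999.990 - 1000.010
```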

Because the human genome has been almost completely sequenced (243), protein sequences can be predicted from it and included in these databases. Mass spectrometry is most often used as the identification step after 2D-PAGE (46) or after other separation techniques such as liquid chromatography (LC) (92, 93, 249).

Sample Preparation

Because many components of biological samples interfere with analysis, they must be removed beforehand. Insoluble substances can be removed by centrifugation. For 2D-PAGE and mass spectrometry, salts must be removed before analysis; this can be achieved by dialysis, size-exclusion filtering, protein precipitation, or reverse-phase chromatography (12, 54, 108). Frequently, abundant proteins such as albumin or immunoglobulins need to be removed first (7, 214). Where possible, complex samples should be fractionated before analysis to obtain simpler subfractions and to decrease the dynamic range of their components. For example, the dynamic range of concentrations in a plasma sample exceeds 10 orders of magnitude, whereas a typical one-dimensional chromatography-mass spectrometry approach can detect proteins only over a dynamic range of approximately 4 orders of magnitude (7). Affinity purification is a powerful approach to reduce the complexity of a sample by specifically isolating individual proteins or "protein complexes" (15). These preparation steps are often more time consuming than the subsequent analysis and influence the sensitivity and discriminative power of mass spectrometry-based protein identification (108, 191).

Electrophoresis

Gel electrophoresis, especially 2D-PAGE (121, 178), has long been the major method for the investigation of the proteome. An overview of the most frequently used electrophoresis techniques is provided in Table 1. For visualization, proteins in the gel are stained by one of a variety of methods; the most widely employed staining methods are summarized in Table 2. With this method, gel maps of body fluids such as human plasma (6, 149, 194) or BAL fluid (176, 252) have been published (see Fig. 2). The large number of spots in a 2D gel is partly due to posttranslational and proteolytic modifications of proteins; one protein may, therefore, appear at several locations in the gel (25). Although this phenomenon is potentially useful for the further analysis of these modifications, it increases the number of spots to be analyzed, since >25% of the spots on one gel may be modified forms of proteins (34) found elsewhere on the gel. The number of protein spots in complex samples makes computer-assisted image analysis necessary; digital image analysis is also needed to obtain quantitative information. Several commercially available software suites serve this purpose.
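The text does not specify how image analysis software quantifies spots. A common approach, assumed here, is to express each integrated spot volume as a percentage of the total spot volume on its gel, which compensates for differences in the total amount of protein loaded, and then to compare matched spots between gels. In this minimal sketch, spot detection and matching are assumed to be done already, and the spot IDs and volumes are hypothetical.

```python
from typing import Dict


def percent_volumes(spots: Dict[str, float]) -> Dict[str, float]:
    """Express each spot volume as a percentage of the total spot volume on its gel."""
    total = sum(spots.values())
    return {spot: 100.0 * vol / total for spot, vol in spots.items()}


def fold_changes(control: Dict[str, float], case: Dict[str, float]) -> Dict[str, float]:
    """Ratio of normalized volumes (case / control) for spots matched on both gels."""
    c, d = percent_volumes(control), percent_volumes(case)
    return {spot: d[spot] / c[spot] for spot in sorted(c.keys() & d.keys())}


# Hypothetical integrated spot volumes (arbitrary units) after spot matching.
control_gel = {"A": 600.0, "B": 300.0, "C": 100.0}
patient_gel = {"A": 1200.0, "B": 300.0, "C": 500.0}
print(fold_changes(control_gel, patient_gel))  # {'A': 1.0, 'B': 0.5, 'C': 2.5}
```

Spot A doubles in raw volume, but so does the total signal on the patient gel, so its normalized abundance is unchanged; only B and C show relative changes.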


 
Table 1. Electrophoresis methods in proteomics

 

 
Table 2. Different staining methods for gel electrophoresis

 


 
Fig. 2. Silver-stained two-dimensional (2D)-PAGE image of human bronchoalveolar lavage (BAL) fluid samples. Isoelectric focusing was performed using pH 3–10 nonlinear immobilized pH gradient (IPG) strips; after equilibration, the second dimension was run on an Excel Gel XL 12–14%. A: gel pattern of a healthy subject. B: gel pattern of a patient with idiopathic pulmonary fibrosis (IPF). C: gel pattern of a patient with sarcoidosis. Gel spots were identified by matching with the human plasma reference maps and other published gel maps, by NH2-terminal sequencing, or by mass spectrometry. Bars and arrows indicate plasma proteins increased in the BAL fluid of patients with IPF (B) and sarcoidosis (C). Surfactant protein A (SP-A) is not present in the gel of the patient with IPF (B). Several small acidic proteins are upregulated in IPF, and the matching spots in healthy controls are labeled by circles (108: cathepsin D, heavy chain; 172: epidermal fatty acid-binding protein (FABP-E); 174: cathepsin D, light chain; 179: intestinal trefoil factor; 183: FABP-E; 194: cathepsin D, light chain; 201, 202, and 206: calgranulin A; 207: saposin, D chain; 210: ubiquitin-like protein; 212: calcyclin; 216: calvasculin). MW, molecular weight. [From Noel-Georis et al. (176). Reprinted with permission from Elsevier.]

 
In modern proteomics, 2D-PAGE is most often used as a step before other protein detection techniques, especially mass spectrometry. However, although it has been shown that mass spectrometry can detect serially diluted, gel-embedded proteins down to the very low femtomole range (49, 70), generally 5–50 ng (corresponding to 100–1,000 fmol for a 50-kDa protein, an amount visible by silver staining) are considered necessary for successful mass spectrometry identification of proteins. Important reasons for this problem are the dynamic range of the current staining procedures (Table 2) and poor recovery of the peptides from the gels. It has also been shown that in 2D-PAGE, several classes of proteins are systematically underrepresented (Table 1); this limitation is relevant for many of the potential proteins of interest in pulmonary research (208). These shortcomings are constantly motivating efforts to improve 2D-PAGE and to find alternative methods to supplement or replace it (69).
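The conversion between the mass and molar amounts quoted above (5–50 ng corresponding to 100–1,000 fmol for a 50-kDa protein) follows from dividing the mass by the molecular mass. A minimal sketch, with a hypothetical helper name:

```python
def ng_to_fmol(amount_ng: float, mw_kda: float) -> float:
    """Convert a protein amount in ng to fmol, given its molecular mass in kDa.

    mol = g / (g/mol); with ng = 1e-9 g, kDa = 1e3 g/mol and fmol = 1e-15 mol,
    the unit factors reduce to fmol = ng * 1000 / kDa.
    """
    return amount_ng * 1000.0 / mw_kda


for ng in (5, 50):
    print(f"{ng} ng of a 50-kDa protein = {ng_to_fmol(ng, 50):.0f} fmol")
# 5 ng -> 100 fmol; 50 ng -> 1000 fmol
```

Because the molar amount scales inversely with molecular mass, the same 5 ng of a 10-kDa protein would correspond to 500 fmol.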

Chromatography

Chromatography, especially LC, can be carried out as a purification step before or after 2D-PAGE (12, 163, 194). The progress in separation science has made this method a competitive alternative to electrophoresis. LC-LC-MS-MS (tandem mass spectrometry)-based techniques such as multidimensional protein identification technology (MudPIT) may have advantages over gel-based techniques in speed, sensitivity, reproducibility, and applicability to different samples and conditions (84, 144, 248, 249, 259, 260). The purification process of all LC techniques can be automated to a large extent (107, 137). The main shortcoming of this technique is the lack of quantitative information. The development of protein labeling techniques such as ICAT can overcome this disadvantage (see below).

Another interesting set of approaches to visualize changes in the proteome content of a sample are the protein profiling techniques (37, 164) (see below).


    MASS SPECTROMETRY
 
Traditionally, protein patterns in 2D-PAGE were identified by matching with a master 2D-PAGE pattern (e.g., SWISS-2DPAGE), with reference proteins (139, 141, 176, 252), or with Western immunoblots (83). Important progress in the identification of gel spots was made by the development of automated NH2-terminal (Edman) sequencing, which has been used in a large number of studies (4, 141, 251, 256). Mass spectrometry has rapidly replaced Edman sequencing for protein identification because of its shorter analysis times and much higher sensitivity (45, 46, 150). Mass spectrometry provides highly accurate measurements of the m/z ratios of the proteins or peptides in a sample. With enzymatic digestion and peptide mass fingerprinting (see below), proteins can be identified even if they are truncated or posttranslationally modified. By adding a second mass analyzer (tandem mass spectrometry or MS-MS), the amino acid sequence of peptides can also be determined directly, because peptides fragment in a predictable fashion (22). After acquisition, the data are searched against protein sequence databases in an automated fashion (64, 217, 267) or interpreted manually (21, 46, 161).
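The predictable fragmentation mentioned above can be illustrated with the two most common fragment ion series: b ions, which contain the NH2 terminus, and y ions, which contain the COOH terminus. In the minimal sketch below, the ion-type conventions and monoisotopic residue masses are standard values, while the function names and the toy peptide "PEPTIDE" are illustrative only. Consecutive ions in either series differ by one residue mass, which is how a sequence is read from an MS-MS spectrum.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mh(peptide: str) -> float:
    """Singly protonated precursor mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fragment_ladder(peptide: str) -> Tuple[List[float], List[float]]:
    """Singly charged b ions (NH2-terminal) and y ions (COOH-terminal).

    b_i = sum of the first i residues + proton
    y_i = sum of the last i residues + water + proton
    """
    n = len(peptide)
    b = [sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON for i in range(1, n)]
    y = [sum(RESIDUE_MASS[aa] for aa in peptide[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = fragment_ladder("PEPTIDE")
print(f"[M+H]+ = {precursor_mh('PEPTIDE'):.4f}")  # 800.3672
print("b:", [f"{m:.4f}" for m in b_ions])
print("y:", [f"{m:.4f}" for m in y_ions])
```

Leucine and isoleucine have identical residue masses, so mass alone cannot distinguish them in such a ladder.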

Types of Mass Spectrometers

An overview of mass spectrometers currently being used for protein identification is provided in Table 3. The relatively soft ionization techniques of MALDI (117) and ESI (63) have made it possible to generate ions from large, nonvolatile analytes such as proteins without significant fragmentation. Both methods can be used to analyze proteins ≥100 kDa (2, 29). Their introduction in the late 1980s revolutionized the applicability of mass spectrometry to biomolecules and initiated an era of rapid progress that persists today (3).


 
Table 3. Overview of biological ionization techniques in mass spectrometry

 
MALDI mass spectrometers (Fig. 3) have become popular since their introduction in 1988 (117) for several reasons, which are summarized in Table 3. Recently introduced MALDI instruments include the MALDI-Qq-TOF (Q stands for quadrupole, TOF for time-of-flight mass analyzer) (218) and the MALDI-TOF-TOF (162). Both instruments can determine peptide sequences by using two mass analyzers separated by a collision cell; in the MALDI-TOF-TOF, the collision cell lies between the two TOF analyzers. Peptide ions selected by the first mass analyzer collide with gas molecules, which causes vibrational excitation and induces dissociation. The second mass analyzer measures the m/z ratio of the resulting fragment ions.



 
Fig. 3. Schematic representation of a matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) mass spectrometer. The sample is cocrystallized with the matrix on a metal slide (target) that is positioned in front of the ion source. A laser pulse irradiates each spot, causing rapid excitation of the matrix and ejection of matrix and analyte ions into the gas phase. The ions are accelerated by an electric field that directs them toward a mass detection unit. Reflection in an ion mirror corrects for initial differences in ion energy. The detector consists of an electron multiplier. Because ions of the same charge acquire the same kinetic energy in the accelerating field, their travel time through the TOF analyzer is a measure of their mass-to-charge (m/z) ratio (i.e., lighter peptides/proteins reach the detector earlier than heavier ones). By addition of a second TOF region and fragmentation of the peptide ions with collision gas (collision-induced dissociation), the amino acid sequence of the peptide ions can be delineated (tandem mass spectrometry or MS-MS spectrum). Uacc, acceleration voltage; Uvar, voltage applied to delayed extraction grid; URef, ion mirror and reflection voltage. [From Mann et al. (150). With permission, from the Annual Review of Biochemistry, Volume 70 © 2001 by Annual Reviews www.annualreviews.org.]
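The relationship between flight time and m/z can be made explicit with an idealized model: an ion of charge z accelerated through a potential U gains kinetic energy zeU = mv²/2, so its flight time over a field-free path of length L is t = L·sqrt(m/(2zeU)). The sketch below assumes a 20-kV acceleration voltage and a 1-m flight path purely for illustration, and it ignores delayed extraction and reflectron geometry; the function name is hypothetical.

```python
import math

DALTON_KG = 1.66053906660e-27        # unified atomic mass unit in kg
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def flight_time_us(mass_da: float, charge: int,
                   accel_voltage: float = 20_000.0, drift_length_m: float = 1.0) -> float:
    """Idealized flight time (microseconds) through a field-free drift region.

    z*e*U = m*v**2/2 gives v = sqrt(2*z*e*U/m), so t = L/v = L*sqrt(m/(2*z*e*U)).
    The default voltage and drift length are illustrative assumptions.
    """
    m = mass_da * DALTON_KG
    v = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage / m)
    return drift_length_m / v * 1e6


for mass in (1_000, 4_000):
    print(f"{mass} Da, 1+: {flight_time_us(mass, 1):.1f} us")  # 16.1 us, 32.2 us
```

Because t depends on m/z rather than on mass alone, a fourfold heavier singly charged ion takes twice as long, and a 2,000-Da ion carrying two charges arrives together with a 1,000-Da ion carrying one.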

 
After its introduction (63), ESI (Fig. 4) soon established itself as an alternative to MALDI. To improve accuracy and to avoid scanning in the second mass analysis step, a TOF analyzer has recently been used in place of the third quadrupole (Qq-TOF, Fig. 4) (150, 266).

Other promising techniques are the protein profiling methods. Protein profiling is the rapid screening of samples by mass spectrometry with limited or no sample preparation. The resulting m/z peak profiles of different samples (which can be body fluids, cell lysates, or even tissue samples) can then be compared, and differences in the relative abundance of proteins can be detected. Proteins of interest are then further purified by chromatography and identified by techniques such as peptide mass fingerprinting or MS-MS. These techniques complement 2D-PAGE for protein visualization. For protein profiling, surface-enhanced laser desorption/ionization (SELDI) and imaging mass spectrometry (IMS) (30, 37–39) are currently being evaluated (Table 3).
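The comparison step of protein profiling can be sketched as pairing peaks between two profiles within an m/z tolerance and comparing their normalized intensities. This is a simplified illustration, not the algorithm of any particular profiling platform: the normalization to total intensity, the 2,000-ppm default tolerance, the function names, and the peak lists are all assumptions.

```python
from typing import List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def normalize(profile: List[Peak]) -> List[Peak]:
    """Scale intensities to fractions of the total so that profiles are comparable."""
    total = sum(intensity for _, intensity in profile)
    return [(mz, intensity / total) for mz, intensity in profile]


def compare_profiles(reference: List[Peak], sample: List[Peak],
                     tol_ppm: float = 2000.0) -> List[Tuple[float, Optional[float]]]:
    """For each reference peak, return (m/z, sample/reference intensity ratio).

    The closest sample peak within tol_ppm is used; the ratio is None when none is found.
    """
    ref, smp = normalize(reference), normalize(sample)
    result = []
    for mz_r, int_r in ref:
        matches = [(abs(mz_s - mz_r), int_s) for mz_s, int_s in smp
                   if abs(mz_s - mz_r) / mz_r * 1e6 <= tol_ppm]
        result.append((mz_r, min(matches)[1] / int_r if matches else None))
    return result


# Hypothetical profiles (m/z, intensity) from a control and a patient sample.
control = [(5000.0, 100.0), (8000.0, 50.0), (11000.0, 50.0)]
patient = [(5003.0, 100.0), (8001.0, 150.0)]
for mz, ratio in compare_profiles(control, patient):
    print(mz, "absent" if ratio is None else round(ratio, 2))
# 5000.0 0.8 / 8000.0 2.4 / 11000.0 absent
```

A complete comparison would also scan the sample for peaks that are missing from the reference profile.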



 
Fig. 4. Mechanism of an electrospray mass spectrometer with adjacent quadrupole mass analyzer and TOF unit for protein sequencing (ESI-Qq-TOF). In ESI, proteins in solution are ionized at atmospheric pressure by pumping the solution through a capillary held at high voltage. As the micrometer-sized droplets enter the mass spectrometer, solvent is removed by heat or by energetic collisions with a gas, and charge is imparted onto the analyte molecules. After passing through the curtain gas and the focusing ring into the vacuum chamber, the ions are focused by the first quadrupole (q0), which operates at radio frequency. The ions are then analyzed according to their m/z ratio in the electric field of quadrupole Q1 (MS). For sequence analysis (MS-MS), the ions are dissociated in the third quadrupole (q2). The ions then enter the TOF analyzer through a focusing grid and are corrected for initial energy differences in a reflector (ion mirror). MS-MS detection is performed by an electron multiplier at the end of the field-free drift region of the TOF analyzer. [From Mann et al. (150). With permission, from the Annual Review of Biochemistry, Volume 70 © 2001 by Annual Reviews www.annualreviews.org.]
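A characteristic of ESI, implied by the charging step in the caption, is that large molecules usually carry several added protons, so one protein appears as a series of peaks at different m/z values. For a neutral mass M and z protons, m/z = (M + z·H⁺)/z, and two adjacent peaks of such a series are enough to recover both z and M. The minimal sketch below uses a hypothetical 20,000-Da protein and hypothetical function names.

```python
from typing import Tuple

PROTON = 1.007276  # mass of a proton in Da


def mz_for_charge(neutral_mass: float, z: int) -> float:
    """m/z of a molecule of neutral mass M carrying z added protons: (M + z*H+) / z."""
    return (neutral_mass + z * PROTON) / z


def deconvolve_pair(mz_z: float, mz_z_plus_1: float) -> Tuple[int, float]:
    """Charge z and neutral mass M from two adjacent peaks of one charge-state series.

    mz_z carries z protons (higher m/z); mz_z_plus_1 carries z + 1 protons (lower m/z).
    Combining both expressions gives z = (mz_z_plus_1 - H+) / (mz_z - mz_z_plus_1),
    and then M = z * (mz_z - H+).
    """
    z = round((mz_z_plus_1 - PROTON) / (mz_z - mz_z_plus_1))
    return z, z * (mz_z - PROTON)


# Hypothetical 20,000-Da protein observed with 10 and 11 protons.
p10, p11 = mz_for_charge(20_000.0, 10), mz_for_charge(20_000.0, 11)
print(f"{p10:.2f} {p11:.2f}")  # 2001.01 1819.19
z, mass = deconvolve_pair(p10, p11)
print(z, f"{mass:.2f}")        # 10 20000.00
```

This multiple charging is one reason ESI can bring large proteins into the m/z range of quadrupole and TOF analyzers.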

 
Protein Identification By Mass Spectrometry

Mass measurements of intact proteins provide a rapid mass balance and valuable information on the protein profile of a sample. However, it is not practical to identify a protein solely on the basis of its m/z ratio. The main reasons are splice and sequence variants that differ from the database entries and a heterogeneous set of posttranslational modifications, which together cause the measured molecular weight of a protein to deviate, by variable amounts, from the theoretical mass derived from the database. Therefore, additional strategies have been developed for protein identification; these can be used separately or in combination.

"Peptide mass fingerprinting" is based on mass measurements of peptide fragments derived from a single protein. Before mass spectrometry, proteins are cleaved into peptides at specific, reproducible points in their amino acid sequence using chemical agents or proteases. A covalent modification of the protein will be reflected in only one or a few of the peptide mass values, whereas the rest will remain unchanged. Because it cleaves highly reproducibly on the COOH-terminal side of arginine and lysine residues, trypsin is the proteolytic enzyme used most often. On the basis of this specificity, the expected mass values of all peptides in virtual digests of all proteins in the database are calculated, and the protein identity is determined by comparing the measured peptide mass values with those calculated (45, 98, 110, 151, 208, 268). The reliability of peptide mass fingerprinting depends on: 1) the mass accuracy of the peptide measurements (45); 2) the number of matched vs. unmatched peaks in the spectrum; 3) the number of peptides that can be matched to a single protein; and 4) the number of proteins present in the digested sample, since in complex mixtures random matches can occur at a level of confidence similar to that of real matches. This loss of reliability with complex protein mixtures has been exacerbated by the massive increase in database size. Other potentially critical factors are an increased rate of false-positive matches and a bias toward high-molecular-weight proteins, which yield more peptides and are therefore more likely than smaller proteins to be matched by this technique. The scoring systems included in the analysis software packages (see below) aim to compensate for these potential problems.

With two sequential mass analyzers (tandem mass spectrometry or MS-MS), the primary structure (amino acid sequence) of one or more peptides can be determined by fragmenting them (3, 22, 150, 161) (Fig. 5). Peptide fragmentation is achieved by preferential cleavage of the peptide backbone bonds upon collisional activation with a gas [collision-induced dissociation (CID)] (21, 161). Tandem mass spectrometry can be carried out with both ESI (e.g., ESI-triple-quadrupole or ESI-Qq-TOF) (42) and MALDI ionization (MALDI-TOF-TOF) (102, 162). Often, fragmentation spectra of only a few peptides are sufficient for unambiguous protein identification (45, 150).
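Both identification strategies described above rest on the same mass arithmetic. The following Python sketch is illustrative only and is not the algorithm of any particular software package: it builds a virtual tryptic digest and counts how many measured peptide masses match each protein (peptide mass fingerprinting), and it computes the singly charged b and y ion series that are read from an MS-MS spectrum such as Fig. 5B. It assumes monoisotopic masses, singly protonated ions, a 50-ppm tolerance, and the common convention that trypsin does not cleave before proline (a rule not stated in the text); the protein names and sequences are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added once per peptide for the free termini
PROTON = 1.00728   # one proton for a singly charged ion


def tryptic_digest(sequence):
    """Cut after K or R unless the next residue is P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fingerprint_matches(observed, database, tol_ppm=50.0):
    """Rank proteins by how many observed [M+H]+ values match their virtual digest."""
    ranking = []
    for name, sequence in database.items():
        theoretical = [peptide_mh(p) for p in tryptic_digest(sequence)]
        hits = sum(1 for mz in observed
                   if any(abs(mz - t) / t * 1e6 <= tol_ppm for t in theoretical))
        ranking.append((name, hits))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


def fragment_ladder(peptide):
    """Singly charged b ions (b1..b(n-1)) and y ions (y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # charge on NH2-terminal piece
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # charge on COOH-terminal piece
    return b_ions, y_ions


# Hypothetical two-entry database and two measured peptide masses.
database = {"protA": "PEPTIDEKAAARPGGK", "protB": "GGGGRWWWK"}
print(fingerprint_matches([928.462, 727.421], database))  # [('protA', 2), ('protB', 0)]

b_ions, y_ions = fragment_ladder("PEPTIDEK")
print([round(b, 3) for b in b_ions])
print([round(y, 3) for y in y_ions])
```

Counting matches is the crudest possible criterion and favors large proteins, which is why the scoring systems discussed below are needed. In the fragment ladder, consecutive b (or y) ions differ by exactly one residue mass, so the gaps between peaks spell out the sequence; leucine and isoleucine have identical masses and cannot be distinguished this way.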

Although sequence information can also be obtained with relatively inexpensive instruments using the metastable decay of some ions after desorption by MALDI (postsource decay), this time-consuming technique is rapidly being replaced by the faster and more sensitive tandem time-of-flight mass spectrometry (102, 150, 162, 266).

Protein Profiling Techniques

Protein profiling is the rapid screening of samples by mass spectrometry with limited or no sample preparation. The resulting profiles of m/z ratio peaks from different samples (which can be body fluids, cell lysates, or even tissue samples) can then be compared, and differences in the relative abundance of proteins can be identified. The samples can then be further purified by chromatography and the proteins identified by techniques such as peptide fingerprinting or MS-MS. These techniques provide a complementary method to 2D-PAGE for protein visualization.
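At its simplest, comparing two profiles means pairing peaks by m/z and comparing their normalized intensities. The sketch below is a hypothetical illustration, not the procedure of any SELDI or IMS software: it assumes peak lists of (m/z, intensity) pairs, normalization of each profile to its total intensity, a 0.2% relative m/z tolerance, and a twofold change threshold, all chosen for illustration.

```python
import math


def compare_profiles(sample_a, sample_b, rel_tol=0.002, min_fold=2.0):
    """Report peaks of sample A whose normalized intensity differs from sample B.

    Each sample is a list of (m/z, intensity) peaks. Intensities are normalized
    to the total intensity of their own profile before comparison.
    """
    total_a = sum(intensity for _, intensity in sample_a)
    total_b = sum(intensity for _, intensity in sample_b)
    changes = []
    for mz_a, int_a in sample_a:
        # Candidate partners in B within the relative m/z tolerance.
        nearby = [(abs(mz_b - mz_a), int_b) for mz_b, int_b in sample_b
                  if abs(mz_b - mz_a) <= rel_tol * mz_a]
        frac_a = int_a / total_a
        frac_b = min(nearby)[1] / total_b if nearby else 0.0  # closest partner
        fold = frac_a / frac_b if frac_b > 0 else math.inf
        if fold >= min_fold or fold <= 1.0 / min_fold:
            changes.append((mz_a, fold))
    return changes


# Hypothetical profiles (m/z, intensity) from two samples.
control = [(5000.0, 100.0), (8000.0, 100.0)]
disease = [(5001.0, 100.0), (8002.0, 25.0)]
print(compare_profiles(control, disease))
```

This one-directional sketch does not report peaks present only in the second sample; running it with the samples swapped covers them. Peaks flagged this way still have to be purified and identified, as described above.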

In SELDI (Table 3), proteins are retained on a protein chip array composed of various chromatographic, immunologic, or enzymatic surfaces and subsequently detected directly by time-of-flight mass spectrometry. In contrast to the metal sample target used in MALDI mass spectrometry, the probe surfaces in SELDI play an active role in the extraction, structural modification, and presentation of the protein of interest from the sample. Because several different probe surfaces are available, SELDI can be adapted to proteins with different properties (164). Of the SELDI applications in development today, surface-enhanced affinity capture is considered the most promising, with a reported 100-fold dynamic range (164). A particular advantage of this technique is its capacity for high-throughput analysis. Protein chips may be useful in the discovery of new drug targets (271) and biomarkers (109, 164, 189, 193).

IMS uses MALDI-MS for the direct analysis of tissue samples (37) (Table 3). The tissue is prepared either by coating a slice of frozen tissue with crystallization matrix or by blotting the tissue onto a target coated with C18 beads (30, 37–39). Mass spectrometry then generates ion images of the sample, mapping specific molecules to 2D coordinates on the original section and thus giving spatial information on peptide and protein distributions (Fig. 6). This technique has been successfully applied to brain tumors (233) and non-small cell lung cancer (265); the latter study is described in more detail later in this article. The methodology is likely to be used increasingly.
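Building an ion density map such as those in Fig. 6 amounts to extracting the intensity of one m/z value at every raster position. A minimal sketch, assuming spectra stored as (m/z, intensity) peak lists keyed by (x, y) grid position and a hypothetical ±2 m/z extraction window; the toy data are invented.

```python
def ion_image(spectra, width, height, target_mz, half_window=2.0):
    """Ion density map for one m/z value from a raster of spectra.

    spectra maps (x, y) raster positions to lists of (m/z, intensity) peaks.
    Returns a height-by-width grid (list of rows) of summed intensities.
    """
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in spectra.items():
        # Sum every peak that falls inside the extraction window.
        image[y][x] = sum(
            (intensity for mz, intensity in peaks if abs(mz - target_mz) <= half_window),
            0.0,
        )
    return image


# Toy 2 x 2 raster with invented intensities.
spectra = {
    (0, 0): [(5632.0, 10.0), (18393.0, 1.0)],
    (1, 0): [(5633.0, 4.0)],
    (0, 1): [(18393.5, 7.0)],
    (1, 1): [],
}
for row in ion_image(spectra, width=2, height=2, target_mz=5632.0):
    print(row)
```

Computing one such grid per m/z value of interest gives the stack of maps that are then compared across tissue regions.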



Fig. 5. A: MALDI-TOF mass spectrum of the unseparated tryptic digest of a spot from a 2D-PAGE gel. The spot was excised manually, proteolytically digested with trypsin, extracted, and loaded onto a MALDI 96-well target plate. The typical trypsin autolysis peak at m/z 842.51 can be seen clearly. The target was loaded into a MALDI-TOF-TOF mass spectrometer, and the spectra were acquired automatically. Only a few components are detected in this MS spectrum. B: fragmentation (MS-MS) spectrum of the peak at m/z 1913.06. The resulting ions can be separated into y and b ions according to whether the charge is retained on the COOH-terminal or the NH2-terminal fragment (111, 206). The data were automatically searched against the NCBI database using MS-TAG from the ProteinProspector suite of programs (45, 102). The sequence obtained, which could be attributed to a peptide fragment of serum amyloid A, is shown. Amino acids are labeled according to the standard one-letter nomenclature.

 


Fig. 6. Protein profile obtained directly from a transverse rat brain section mounted on a MALDI target plate (B) and coated with matrix. The general outline of the section and several features visible in it have been delineated. The section was scanned by acquiring spectra at 74 × 75 points with a resolution of 180 µm. The spectra produced by 15 laser shots were averaged using an automated computer algorithm. An initial survey scan with data acquisition randomly across the section generated an average protein profile (A). More than 200 individual mass peaks were detected over an m/z range up to 40,000. C–G show ion density maps obtained for different protein signals. Some protein peaks were very specific for a given brain region. For example, the density maps of the proteins detected at m/z 5,632 and m/z 18,393 were reported to be almost "negatives" of each other. [From Chaurand et al. (38). Reprinted with permission from Elsevier.]

 
Analysis of Protein Modifications

Posttranslational modifications play a crucial role in cell signaling and protein function (77, 152, 190). More than 200 different protein modifications have been described (125, 257, 258). Important posttranslational modifications include phosphorylation, acetylation, glycosylation, ubiquitination, and nitration (125, 152, 242). The analysis of posttranslational modifications on a proteome scale is still considered an analytical challenge (66, 69, 152, 159, 177, 229, 274). Reasons include the fragility of the chemical bonds of many protein modifications during sequencing by CID, signal suppression of negatively charged (phosphate- or sulfate-containing) molecules in the commonly used positive detection mode, and the difficulty of obtaining full sequence coverage (123). Moreover, most modifications are substoichiometric; modified peptides are therefore frequently present at much lower levels than unmodified peptides (124, 269).
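Several of the modifications listed above add a characteristic mass to the modified residue, so a first-pass test is whether the difference between a measured and a calculated peptide mass matches one or a few known shifts. The monoisotopic shifts below are standard values; the 0.01-Da tolerance, the limit of two modifications, and the function name are assumptions of this sketch, and the peptide masses in the example are hypothetical.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic mass shifts (Da) for some of the modifications named above.
MOD_DELTA = {
    "phospho": 79.96633,    # +HPO3 on Ser, Thr, or Tyr
    "acetyl": 42.01057,     # +C2H2O
    "nitro": 44.98508,      # +NO2 replacing H, e.g., on Tyr
    "oxidation": 15.99491,  # +O, e.g., on Met
    "GlyGly": 114.04293,    # Gly-Gly remnant left on Lys by trypsin after ubiquitination
}


def explain_mass_shift(observed, unmodified, max_mods=2, tol=0.01):
    """List combinations of up to max_mods modifications that match the mass shift."""
    delta = observed - unmodified
    matches = []
    for n in range(1, max_mods + 1):
        for combo in combinations_with_replacement(sorted(MOD_DELTA), n):
            if abs(sum(MOD_DELTA[m] for m in combo) - delta) <= tol:
                matches.append(combo)
    return matches


# Hypothetical peptide: calculated 1234.5678 Da, measured 1314.5341 Da.
print(explain_mass_shift(1314.5341, 1234.5678))
```

Tolerance matters: acetylation (+42.0106 Da) and trimethylation (+42.0470 Da) differ by only 0.036 Da, so a mass shift alone cannot always assign a modification, and fragmentation data are needed to confirm it and locate the modified residue.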

Phosphorylation is an important mechanism regulating protein activity and signaling networks. It is crucial in protein kinase activation, cell-cycle progression, cellular differentiation and transformation, and the response and adaptation to peptide hormones (47, 77, 154, 165). Approximately 30% of all mammalian proteins are phosphorylated at any given time (153). The more than 500 protein kinases and ~100 phosphatases have relatively wide substrate specificities and act in different combinations to achieve a variety of biological responses, which can make the analysis of these complex networks challenging (47, 153, 154). Phosphopeptides are generally difficult to analyze by mass spectrometry. One reason is their negative charge, which reduces ion intensity (electrospray is generally performed in the positive mode). Other impediments include their presence at substoichiometric levels and their hydrophilicity, which interferes with reverse-phase chromatography (8, 124, 153, 221, 269). Currently, phosphorylation is most often evaluated by labeling a previously defined protein with ³²P-labeled inorganic phosphate followed by 2D-PAGE and/or reverse-phase chromatography, a relatively complex and time-consuming procedure (124, 152, 153, 269). For example, a recent comprehensive study (182) evaluated the regulatory mechanisms controlling the activity of 3-phosphoinositide-dependent protein kinase-1 (PDK1), which plays a central role in signal transduction downstream of phosphoinositide 3-kinase. With the use of site-directed mutants, phosphorylation on Tyr373/Tyr376 was shown to be important for PDK1 activity, whereas phosphorylation on Tyr9 had no effect.
Other novel approaches to investigate phosphorylation include the 14N:15N labeling of immunoprecipitated phosphorylated peptides (79, 177), the phosphoprotein-isotope-coded affinity tag method (79, 80), the use of immobilized metal ion affinity chromatography to affinity capture phosphopeptides (95, 196, 238, 261), and the chemical transformation of phosphoserine and phosphothreonine residues into lysine analogs that are then cleaved with a lysine-specific protease to map sites of phosphorylation (123).
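The fragility of the phosphoester bond during CID can itself serve as a diagnostic: peptides containing phosphoserine or phosphothreonine commonly lose phosphoric acid (H3PO4, 97.9769 Da) on collisional activation, so a fragment at the precursor m/z minus 97.9769/z hints at phosphorylation. The sketch below shows that check as a screening heuristic only; it is not a method described in the studies cited above, and the 0.02 m/z tolerance and example spectrum are hypothetical.

```python
H3PO4 = 97.97690  # monoisotopic mass of phosphoric acid (Da)


def has_phosphate_neutral_loss(precursor_mz, charge, fragments, tol=0.02):
    """True if a fragment sits where loss of H3PO4 from the precursor would put it.

    fragments is a list of (m/z, intensity) peaks from the MS-MS spectrum.
    A neutral loss shifts the m/z by the lost mass divided by the charge.
    """
    expected = precursor_mz - H3PO4 / charge
    return any(abs(mz - expected) <= tol for mz, _ in fragments)


# Hypothetical doubly charged precursor at m/z 800.00.
spectrum = [(751.012, 55.0), (612.300, 12.0)]
print(has_phosphate_neutral_loss(800.0, 2, spectrum))
```

Phosphotyrosine rarely shows this loss, so a negative result does not exclude phosphorylation.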

In response to various inflammatory stimuli, lung endothelial cells, alveolar and airway epithelial cells, and activated alveolar macrophages produce nitric oxide and superoxide, which may react to form peroxynitrite. Peroxynitrite can nitrate and oxidize amino acids in various lung proteins, such as surfactant protein A (SP-A), and inhibit their function. Nitration and oxidation of a variety of alveolar proteins have been shown to be associated with diminished function in vitro; in addition, both modifications have been identified by immunoassays in proteins sampled from patients with acute lung injury (132, 275). The selective nitration of tyrosine residues in various high-molecular-weight cytoplasmic proteins and in histone proteins of murine tumor cells by neutrophils has been demonstrated in vivo and in vitro by Western blotting and mass spectrometry (94). The authors found that histone nitration was relatively stable, making it a potentially useful marker of extended exposure of cells or tissues to nitric oxide-derived reactive species.

Novel methodologies for the evaluation of other protein modifications are available as well. N- and O-linked glycosylation occurs throughout the entire phylogenetic spectrum and plays key roles in reactions in the endoplasmic reticulum, Golgi apparatus, cytosol, and nucleus (53, 227). Glycosylation is particularly common on proteins destined for the extracellular environment (207); consequently, many therapeutic targets and clinical biomarkers are glycoproteins. For example, CFTR is an integral membrane glycoprotein that normally functions as a chloride channel in epithelial cells (210). The most common mutation in cystic fibrosis, ΔF508, results in mislocalization and altered glycosylation of CFTR. Moreover, altered fucosylation and sialylation of both membrane and secreted glycoproteins occur in cystic fibrosis, and the two major bacterial pathogens causing chronic infection in the cystic fibrosis lung, Pseudomonas aeruginosa and Haemophilus influenzae, have binding proteins that recognize these altered sites. In recent years, mass spectrometry has been widely used to investigate protein glycosylation (28, 53, 129), especially with the Qq-TOF instrument (Fig. 4) (35, 53, 232). In a recent study (270), glycoproteins were conjugated to a solid support by hydrazide chemistry, and the glycopeptides were labeled with stable isotopes. Subsequently, the formerly N-linked glycosylated peptides were specifically released using peptide-N-glycosidase F and were identified and quantified by MS-MS. The methodology has been used to investigate plasma membrane and serum proteins.
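N-linked glycans are attached to asparagine within the consensus sequence N-X-S/T, where X is any amino acid except proline. Listing these sequons is a common first step when interpreting formerly glycosylated peptides; the short sketch below does only that and assumes a one-letter protein sequence as input (the example sequence is invented).

```python
import re

# N-X-S/T with X not proline; the lookahead lets overlapping sequons all match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


print(find_sequons("MNASNPTNGT"))  # [2, 8]; the N-P-T at position 5 is excluded
```

Release by peptide-N-glycosidase F, as in the study described above, converts the attachment-site asparagine to aspartate (+0.984 Da), so a sequon-containing peptide carrying that shift points to a former glycosylation site.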

A rapidly evolving area of functional proteomics is the investigation of specific protein complexes (67, 68, 264). Protein complexes can be isolated from complex mixtures by affinity extraction techniques, such as direct antibody coprecipitation (5) or indirect tagging of the bait protein, e.g., with an epitope recognized by an antibody or with tandem affinity purification tags (72, 205). Chemical cross-linking can be used to prevent the loss of components from the protein complex during precipitation (213). Affinity purification techniques for the analysis of protein complexes have been reviewed (15, 264). The isolated complexes are subsequently analyzed by mass spectrometry. A more general approach is the comprehensive identification of proteins in macromolecular complexes after separation by liquid chromatography (144).

Quantitative Proteomics

With tandem mass spectrometry, the sequence of a single peptide can be sufficient to identify an entire protein. This simplification of protein identification has triggered the development of methods that aim to increase throughput by performing protein separation and identification in one suite of experiments (87). Because cutting individual spots out of a 2D gel is very time-consuming, many recently introduced approaches use chromatography for sample separation. These techniques either couple LC directly to ESI-MS-MS or robotically spot the chromatographically separated fractions onto a MALDI target. However, 2D-PAGE provides quantitative information that mass spectrometry-based methods have provided only to a very limited extent. This lack of quantitative results is a serious shortcoming that would limit an LC-mass spectrometry approach to a purely descriptive study design. Isotope ratio mass spectrometry (IRMS) is one method being used to close this gap.

Currently, the most advanced IRMS technique is the ICAT technology (89). In an ICAT experiment, the reduced cysteine residues of proteins from two samples are labeled differentially. Each of the two tags consists of an iodoacetamide group that reacts with the free cysteine, a biotin tag that can be used for affinity purification of labeled peptides, and a linker region carrying the isotopic label. In the heavy version, eight hydrogen atoms within the linker region are replaced with deuterium atoms, so the two samples can be discriminated by mass spectrometry according to the resulting mass difference of ~8 Da (89). After labeling, the two samples are pooled and digested with trypsin. The tagged peptides are then extracted with an avidin-containing column. Because only cysteine-containing peptides are evaluated, the complexity of the sample is reduced by more than one order of magnitude (89). The frequency of cysteine residues in proteins varies slightly from species to species and averages ~1% (27). In yeast, ~9% of all theoretically possible tryptic peptides contain cysteine (89).
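The complexity reduction that ICAT achieves can be estimated by digesting sequences in silico and counting the tryptic peptides that contain cysteine, since only those carry the tag. A minimal sketch, assuming cleavage after K or R except before P; the sequences are invented and far shorter than real proteins.

```python
import re


def tryptic_peptides(sequence):
    """Cut after K or R unless followed by P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def cysteine_fraction(proteins):
    """Return the Cys-containing tryptic peptides and their fraction of all peptides."""
    peptides = [p for seq in proteins for p in tryptic_peptides(seq)]
    with_cys = [p for p in peptides if "C" in p]
    return with_cys, len(with_cys) / len(peptides)


# Hypothetical sequences; only Cys-containing peptides would be captured on avidin.
captured, fraction = cysteine_fraction(["ACDKLLRGGCK", "MMMKPPR"])
print(captured, fraction)
```

Applied to a whole proteome, the same count gives the kind of figure quoted above for yeast.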

A disadvantage of ICAT is that it does not measure absolute protein concentrations, so the expression levels of two different proteins cannot be compared. Another shortcoming is the low sequence coverage, since only cysteine-containing peptides are labeled; the applicability of ICAT to the analysis of posttranslational modifications or protein isoforms is therefore limited (186). This restriction to cysteine-containing peptides can be partially overcome by separate analysis of the unlabeled peptides that are not captured in the affinity chromatography step. However, quantitative information will not be available in this case unless a corresponding ICAT-labeled peptide is identified for the same protein (144). Another potential problem is that the differentially labeled peptides can separate from each other during chromatography because deuterium affects the retention time in reverse-phase chromatography. Consequently, they may be ionized at different time points, possibly even in different fractions, which can distort the quantitation (272). In addition, the ICAT tag is relatively large, which may interfere with the detection of large peptides (186). Furthermore, the dynamic range for quantifying different expression levels of one protein is relatively small (~10-fold) (9, 89), inferior to that of fluorescent dyes (186). Some of these limitations can be overcome with a newly introduced cleavable ICAT reagent, which is labeled with ¹³C and gives a mass difference of 9 Da between the heavy and the light marker.
The advantages are a smaller tag (227 Da compared with the 442 Da of the original ICAT), which interferes less with the analysis of larger peptides; a mass difference that easily discriminates a peptide carrying two ICAT labels (2 cysteine residues) from the common oxidation of methionine (two labels of the original reagent add ~16 Da, almost the same as the 16-Da oxidation shift, whereas two cleavable labels add ~18 Da); and fewer CID fragmentation byproducts, which improves the quality of the resulting mass spectra (93).
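The spacing between the light and heavy forms of an ICAT-labeled peptide follows from the isotope mass difference per label, the number of labeled cysteines, and the charge state. The sketch assumes eight deuterium atoms for the original reagent, as described above, and nine ¹³C atoms for the cleavable reagent (the text gives only the resulting 9-Da difference); the dictionary keys are illustrative names.

```python
# Isotope mass differences (Da).
DEUTERIUM_SHIFT = 2.014102 - 1.007825   # 2H replacing 1H
CARBON13_SHIFT = 13.003355 - 12.0       # 13C replacing 12C

# Heavy-minus-light mass per label for each reagent.
LABEL_SHIFT = {
    "original": 8 * DEUTERIUM_SHIFT,   # eight deuterium atoms
    "cleavable": 9 * CARBON13_SHIFT,   # nine 13C atoms (assumed)
}


def pair_spacing(reagent, n_labels, charge):
    """m/z distance between the light and heavy forms of a labeled peptide ion."""
    return LABEL_SHIFT[reagent] * n_labels / charge


print(round(pair_spacing("cleavable", 1, 2), 3))
print(round(pair_spacing("original", 2, 1), 3), "vs. methionine oxidation 15.995")
print(round(pair_spacing("cleavable", 2, 1), 3))
```

For a singly labeled, doubly charged peptide, the cleavable reagent gives a spacing of about 4.5 m/z units, as in Fig. 7. With two labels on a singly charged peptide, the original reagent gives about 16.10 Da, close to the 15.995 Da of methionine oxidation, whereas the cleavable reagent gives about 18.06 Da.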

The ICAT technique has been used successfully to label membrane protein extracts from prostate and breast tumor cell lines (10). Another recent study (220) compared protein expression patterns between rat cells that did or did not contain the myc oncogene. The authors reported expression differences among functionally related proteins in myc-positive cells, such as induction of protein synthesis pathways, upregulation of anabolic enzymes, downregulation of proteases, and changes in the levels of adhesion molecules, actin network proteins, and Rho pathway proteins, which correlated with the known properties of myc-positive cells. Another interesting application of ICAT was a comparison of the microsomal fraction of cells from the human myeloid cell line HL-60 with and without induction of differentiation by phorbol 12-myristate 13-acetate; the authors identified and quantified 491 proteins. An example from our research, a quantitative analysis of alveolar type II cells using the cleavable ICAT technique, is given in Fig. 7 (93). The method is an active area of research and development (86, 92, 223, 224).



Fig. 7. Comparison of 2 different samples with the isotope-coded affinity tagging (ICAT) method. The peptide giving these mass spectra, from the pooled membrane fractions of pulmonary alveolar type II cells from rats with and without ventilator-induced lung injury (low- and high-molecular-weight isotope envelopes at m/z 755.34 and 759.85, respectively), was identified as lysozyme 1 precursor by sequence analysis using tandem mass spectrometry as shown in Fig. 5B. The mass difference between the 2 differentially labeled forms of the peptide results from labeling with the cleavable ICAT reagent (see text). Because the ion is doubly charged, the observed difference between the low and the high peaks (circled numbers) is only 4.5 m/z units. The 2 monoisotopic peaks have different intensities, which reflect the relative expression of the corresponding protein in the 2 differently ventilated animals. The difference can be quantified by calculating the ratio of the areas under the 2 peaks.
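The peak-area ratio described in the caption can be approximated by summing the intensities sampled within a narrow window around each monoisotopic peak. A minimal sketch, assuming spectrum points given as (m/z, intensity) pairs and a hypothetical ±0.05 m/z window; only the two m/z values come from Fig. 7, and the intensities are invented.

```python
def peak_area(peaks, center, half_width):
    """Sum the intensities of all points within +/- half_width of center."""
    return sum((i for mz, i in peaks if abs(mz - center) <= half_width), 0.0)


def light_heavy_ratio(peaks, light_mz, heavy_mz, half_width=0.05):
    """Ratio of the light to the heavy monoisotopic peak area."""
    heavy = peak_area(peaks, heavy_mz, half_width)
    if heavy == 0:
        raise ValueError("no signal at the heavy m/z")
    return peak_area(peaks, light_mz, half_width) / heavy


# m/z values from Fig. 7; the intensities are invented for illustration.
spectrum = [(755.33, 40.0), (755.34, 80.0), (755.35, 40.0),
            (759.84, 20.0), (759.85, 40.0), (759.86, 20.0)]
print(light_heavy_ratio(spectrum, 755.34, 759.85))
```

Which sample carries the light label is a design choice of the experiment; the ratio is only meaningful once that assignment is fixed.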

 
A comparison of the coverage of the 80 known ribosomal proteins of the mammalian 80S ribosome by ICAT and 2D-PAGE showed that 35 could be found by ICAT (92, 186), whereas a highly elaborate 2D-PAGE system specifically tailored to the detection of ribosomal proteins detected 55. A standard 2D-PAGE approach found only two ribosomal proteins (71, 186). ICAT and 2D-PAGE are different methodologies with different biases, and they frequently detect different segments of the proteome of the same sample (186). The only study to combine the two methods exploited the observation that proteins labeled with the light and heavy forms of the ICAT reagent comigrate during 2D gel electrophoresis. Therefore, two or more labeled samples can be analyzed concurrently in the same gel (223), which may be useful for the quantitative and qualitative analysis of differentially expressed or posttranslationally modified proteins. For protein quantification, more gels might be necessary than in a 2D-PAGE approach with DIGE (75). Because protein modifications can cause one protein to appear in several different spots on the gel, comparing samples requires either running three gels (1 with each sample separately and 1 with the samples combined) or quantifying all spots on the gel containing the combined samples using ICAT (223).


    DATA ANALYSIS AND INTERPRETATION
Proteomics and Bioinformatics

Given the complexity of the proteome, an adequate proteomics approach requires the identification of thousands of proteins at a time rather than a few (76). Therefore, bioinformatics plays a key role in proteomic studies and is often the rate-limiting step (183, 246). The data obtained from mass spectrometry must be interpreted by searching them against protein databases, the quality of which is crucial for protein identification. Both peptide masses and peptide sequence information can be used for protein identification. Several protein databases are readily available over the Internet; they differ in how often they are updated and in their degree of redundancy. Currently, the most complete and most frequently updated database is provided by NCBI and combines several databases, including Swiss-Prot and Owl. Consequently, this database also contains the most redundant protein entries.
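Redundant entries, such as the same sequence deposited under several accession numbers, lengthen searches and clutter result lists with identical hits. A minimal sketch, assuming FASTA-formatted input, that collapses entries with identical sequences while keeping every header; the headers and sequences are hypothetical, and real nonredundant databases apply their own merging rules.

```python
def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA-formatted text."""
    entries, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def collapse_redundant(entries):
    """Map each distinct sequence to all headers under which it appears."""
    merged = {}
    for header, sequence in entries:
        merged.setdefault(sequence, []).append(header)
    return merged


# Hypothetical entries: the first two carry the same sequence.
fasta = """>sp|A|EXAMPLE1
MKV
LL
>gi|B|EXAMPLE2
MKVLL
>sp|C|EXAMPLE3
GGG
"""
for sequence, headers in collapse_redundant(parse_fasta(fasta)).items():
    print(sequence, headers)
```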

Several software packages are available for the analysis of mass spectrometry data. They interrogate the obtained peptide or sequence data against the protein databases and rank the results according to a scoring system [often-used scoring algorithm, Molecular Weight Search (MOWSE) (181)]. Software packages include Mascot from Matrix Science (London, UK; http://www.matrixscience.com) (192), ProFound from Rockefeller University (http://prowl.rockefeller.edu) (273), ProteinProspector, a software suite developed at the University of California, San Francisco (http://prospector.ucsf.edu) (45), the SEQUEST algorithm developed at the University of Washington (http://thompson.mbt.Washington.edu/sequest) (60), and others (2). Each of these programs provides additional utilities; for example, ProteinProspector includes additional tools for the interpretation of mass spectrometry, MS-MS, and ICAT data (at present not included in the public Internet version) as well as a batch mode for repetitive tasks and other analysis tools.
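Probability-based scores express how unlikely a match is to have arisen by chance. The sketch below uses a deliberately simple binomial model, in which each observed peak matches a random database entry with a fixed probability, and reports -10 log10(P), the form in which Mascot reports its scores. It is neither MOWSE nor Mascot's actual probability calculation, and the per-peak random-match probability is an assumed input.

```python
import math


def chance_probability(n_peaks, n_matched, p_random):
    """P(at least n_matched of n_peaks match by chance) under a binomial model."""
    return sum(math.comb(n_peaks, k) * p_random ** k * (1 - p_random) ** (n_peaks - k)
               for k in range(n_matched, n_peaks + 1))


def probability_score(n_peaks, n_matched, p_random):
    """Score = -10 log10(P); higher means less likely to be a random match."""
    return -10 * math.log10(chance_probability(n_peaks, n_matched, p_random))


# Hypothetical spectrum: 12 peaks, per-peak chance match probability 0.05.
for matched in (2, 5, 8):
    print(matched, round(probability_score(12, matched, 0.05), 1))
```

The per-peak random-match probability grows with database size and mass tolerance, which illustrates why larger databases reduce the reliability of peptide mass fingerprinting.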

Another bioinformatics challenge is integrating the large amount of information into a comprehensive model. This includes the development of methods for comparing data between research groups (183) and the integration of gene ontologies (10).


    INVESTIGATING THE LUNG PROTEOME
Proteomics research often focuses on the investigation of either body fluids or specific cell types. Because the lung is the site of several different biological processes, the interpretation of proteome experimental results must take into account potential contamination from pathogens as well as the contributions of the different cell types in the lung.

During the development of proteomics over the last two decades, there have been numerous attempts to apply proteomic methodologies to pulmonary medicine. These are briefly reviewed in this section.


    EXPERIMENTAL DESIGNS
Classic, reductionist studies, e.g., an ELISA or a Western blot analysis, will most often provide relatively simple answers to the initial scientific question, e.g., the presence of a specific protein or a change in its concentration. Moreover, because the researcher has to decide beforehand which antibodies to use, a specific hypothesis about the potential reactions is needed. As a result, the investigator will most likely only find answers to questions conceived beforehand. This is not the case in proteomics experiments, which are likely to provide answers even to questions that were not part of the initial scientific question or hypothesis. Thus the results of proteomics experiments are less biased by the theories or beliefs of the investigator and are limited only by the sensitivity of the method. For this reason, proteomics results have a high potential to give rise to new discoveries and generate new hypotheses. Moreover, proteomics experiments are less likely than reductionist methods to mask possible weaknesses in the initial experimental design under the cover of an apparently simple, clear-cut result. To avoid bias, the number of variables should therefore be reduced as far as possible, and the experimental approach should address a well-defined scientific question. Ideally, the controls should differ from the study group in only one parameter. Another issue in the design of a proteomics study is the choice of the proper sample. The sensitivity of all current proteomics methods is one to two orders of magnitude lower than that of a Western blot analysis, and, because of the global approach, there is much less opportunity to tailor the experimental setting to a specific protein.
To avoid masking of the proteins of interest by other proteins of higher abundance, a sample with as little complexity as possible should be chosen, special care should be taken to avoid contamination during sample preparation, and appropriate protein removal and extraction methods should be considered. Samples for proteomics experiments should be easy to standardize, and the concentration of salts and other contaminants should be as low as possible, since concentration and purification steps further downstream will always result in protein loss. In the following paragraphs, we will review several different approaches to the lung proteome.


    THE PROTEOME OF BAL FLUID
Evaluation of BAL fluid has been useful in the diagnosis and research of several inflammatory lung diseases, including emphysema, pulmonary fibrosis, cystic fibrosis, and acute lung injury, as well as in lung transplantation. Early proteome investigations of BAL fluid in alveolar proteinosis resulted in a 2D-PAGE database of normal BAL fluid published in 1979 (18). In this study and in a subsequent study of BAL fluid from smokers and nonsmokers (17, 18), most of the proteins found in BAL fluid were identified as serum proteins by pattern matching. The authors found 23 serum-derived proteins, which accounted for 97% of the protein content of normal BAL fluid. The study identified significant differences in the BAL proteome of smokers, who had increased levels of IgG, C4, and C3 and decreased levels of α2-thioglycoprotein, α1-acid glycoprotein, and Gc-globulin. In 1990, Lenz and colleagues (136) published a method for 2D-PAGE of BAL fluid from dogs and then compared BAL fluid protein patterns from patients with idiopathic pulmonary fibrosis, sarcoidosis, and asbestosis with those of normal controls (135). In idiopathic pulmonary fibrosis, the spot intensity of one surfactant-associated protein, SP-A, was decreased, whereas in sarcoidosis, the immunoglobulins (IgG, IgA) were increased. Another group of protein spots with a molecular weight of 55 kDa and one spot with a molecular weight of 12 kDa were identified. Compared with normal samples, the number and intensity of low-molecular-weight proteins were significantly increased in patients with asbestosis and, in some cases, in patients with idiopathic pulmonary fibrosis and with sarcoidosis.

At the time of this early proteomics research, many of the characterized spots could not be identified. Although the results of these studies provided the first information for a basic understanding of the protein composition of BAL fluid, the value of these results for clinical medicine was limited. Since then, gradual progress in staining and imaging techniques and improvements in standardization have made it possible to identify the most abundant proteins and refine the information on proteomic changes in different disease states. In 1995, Lindahl and coworkers (142) evaluated the BAL fluid proteome in patients after occupational exposure to irritating chemicals. They defined >1,000 protein spots. Plasma proteins were identified by pattern matching. After occupational exposure, 14 protein spots were increased, and one spot decreased by a factor of more than 3 compared with the levels before exposure and in healthy individuals. Subsequently, the same group found higher levels of basic proteins in smokers than in nonsmokers, whereas subjects exposed to asbestos had increased amounts of several high-molecular-weight and basic proteins (138). The results of protein identification showed lower levels of albumin and higher levels of immunoglobulins in smokers than in nonsmokers, whereas the levels of transferrin were higher in asbestos-exposed subjects. Further progress in the proteomic analysis of BAL fluid was boosted by the development of the SWISS-2D-PAGE database containing compiled maps of human BAL fluid (139, 251, 252). The current master gel of BAL proteins encompasses >1,200 spots visualized by silver staining (Fig. 2) (176). 
Information is available on changes in 2D-PAGE protein patterns of BAL for smoking (17, 135, 138, 139, 141, 143, 176, 252), sarcoidosis (135, 138, 139, 176, 251, 252), idiopathic pulmonary fibrosis (135, 138, 139, 176, 251, 252), lupus erythematosus (251), Wegener's granulomatosis (251), hypersensitivity pneumonitis (135, 138, 139, 176, 252), lipoid pneumonia (251), chronic eosinophilic pneumonia (251), alveolar proteinosis (18), bacterial pneumonia (251), other infections, malignancies, and immunosuppression (82, 173), cystic fibrosis before and after α1-antiprotease treatment (83), and asbestosis (251).

The application of narrow-range immobilized pH gradient (IPG) strips can further increase the resolution of 2D-PAGE (208). Interestingly, the improvement in protein spot detection has been shown to be more significant for the protein spots present exclusively in BAL (55%) than for the spots present in both BAL and serum. This finding suggests that many of the BAL fluid-specific proteins, which are likely to be of pulmonary origin, are low-abundance proteins.

Improvements in protein identification increased the clinical relevance of 2D-PAGE studies. Three years after their initial stu