
Nuclear Membrane Proteins with Potential Disease Links Found
by Subtractive Proteomics
Eric C. Schirmer, Laurence Florens,*
Tinglu Guan, John R. Yates, III,
Larry Gerace
To comprehensively identify integral membrane proteins of the
nuclear envelope (NE), we separately prepared NEs and the organelles
known to cofractionate with them from liver. Proteins detected
by multidimensional protein identification technology in the
cofractionating organelles were subtracted from the NE data
set. In addition to all 13 known NE integral proteins, 67
uncharacterized open reading frames with predicted membrane-spanning
regions were identified. All eight proteins tested
targeted to the NE, indicating that there are substantially
more integral proteins of the NE than previously thought.
Furthermore, 23 of these mapped within chromosome regions
linked to a variety of dystrophies.
Department of Cell Biology, Scripps Research Institute, La Jolla,
CA 92037, USA.
* Present address: Stowers Institute
for Medical Research, Kansas City, MO 64110, USA.

To whom correspondence should be addressed. E-mail: lgerace@scripps.edu
(L.G.); jyates@scripps.edu
(J.R.Y.)
Many diseases have been linked to the nuclear envelope (NE),
the membrane structure that forms the boundary of the nuclear
compartment (1, 2).
The NE contains three distinct functional domains: the
outer membrane, a specialized region of the endoplasmic reticulum
(ER) that shares properties with rough and smooth ER; the
inner membrane, which is lined by the nuclear lamina, a
polymer of intermediate filament-type lamin proteins associated
with a number of integral membrane proteins; and the nuclear
pore complexes (NPCs), which regulate nucleo-cytoplasmic transport
of proteins and RNAs. Two integral membrane proteins are localized
to the NPC in mammals (3), but the
number specific to the inner nuclear membrane is unknown:
It includes at least 11 proteins and their splice variants
(1). No proteins specific to the outer
membrane have yet been described.
To identify integral proteins of the NE, we took advantage of
recent advances in high-throughput shotgun proteomics using
multidimensional protein identification technology (MudPIT)
(4, 5), by which
the coupling of tandem mass spectrometry with multiple
liquid chromatography steps allows analysis of the enormous
number of peptides generated by direct digestion of a
complex biochemical fraction. Eluting peptides are first measured
in the ion trap mass spectrometer, then ions are isolated
and fragmented by collision-induced dissociation (CID)
with the helium bath gas, and the resulting product ions
are measured. The fragmentation pattern often yields
amino acid sequence information, allowing protein identification
from a single unique peptide, thus increasing sensitivity.
Avoiding prior separation by polyacrylamide gel electrophoresis
removes its chemical and physical biases and the need
to solubilize membrane proteins for the analysis (6).
To enrich for NE-specific proteins, we employed a "subtractive
proteomics" approach (fig. S1). A microsomal membrane (MM)
fraction can be prepared devoid of NEs because intact
nuclei sediment readily, yet it contains the membranes
that contaminate isolated NEs (e.g., mitochondrial membranes)
and that are shared between peripheral ER and the NE.
Thus, NE-specific proteins were determined by subtracting
the proteins present in MM fractions from those of the
NE fractions after proteomic analysis.
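The subtraction step can be pictured as a set difference over protein identifiers. The following is a minimal sketch, not the authors' software: the function name and the toy identifiers are hypothetical, and it assumes that identifications from each fraction have already been reduced to comparable accession strings.

```python
# Minimal sketch of the subtraction step as a set difference.
# The identifiers below are invented placeholders, not data from this study.

def subtract_fractions(ne_fractions, mm_fraction):
    """Return proteins seen in any NE fraction but absent from the MM fraction."""
    ne_union = set().union(*ne_fractions)
    return ne_union - set(mm_fraction)

# Toy identification lists for the three analyzed pellets (hypothetical).
ne_naoh = {"prot_A", "prot_B", "prot_C", "prot_D"}
ne_salt_detergent = {"prot_A", "prot_E", "prot_F"}
mm_naoh = {"prot_C", "prot_F", "prot_G"}

ne_specific = subtract_fractions([ne_naoh, ne_salt_detergent], mm_naoh)
print(sorted(ne_specific))
```

In this toy case, prot_C and prot_F are discarded because they also occur in the MM list, which mirrors how proteins shared with the peripheral ER or with contaminating organelles are removed.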
NEs and MMs isolated from rodent liver (Fig. 1A)
(7, 8) were extracted
with 0.1 M NaOH to enrich for transmembrane proteins in
the pellet (fig. S2). Four times more MMs than NEs were analyzed
to increase representation of minor ER proteins. Separately,
NEs were extracted with salt and detergent to identify integral
proteins more closely associated with the lamin polymer. Although
this fraction is expected to contain more intranuclear contaminants,
computational sequence analysis should separate those with
predicted transmembrane regions. Proteins in all three
pellets were proteolytically cleaved, and the complex
peptide mixtures were separated by sequential salt steps
followed by acetonitrile gradients to slowly release
peptides into the mass spectrometer. Over 30,000 peptides
were analyzed, yielding 2391 separate protein identifications
among the three fractions (table S1).
Fig. 1. (A) Schematic of method. Mouse NEs and
MMs were both extracted with 0.1 M NaOH to enrich for transmembrane
proteins. Separately, rat NEs were extracted with salt and
detergent (25 mM Hepes, pH 7.5; 400 mM NaCl; and 1% β-octylglucoside)
to enrich for proteins that have tight associations with the
lamin polymer. These three fractions were analyzed by MudPIT
(22). (B) Proteins identified
in the various fractions. (Left) A primary-color-scheme Venn
diagram indicates separately identified proteins in each fraction
and overlap between fractions. Circle areas are proportional to
the protein counts of each set. Protein identifications were generated by searching
spectra from tandem mass spectrometry against a database of
106,360 human, rat, and mouse sequences. The recent addition
of 25,000 sequences to the rat database suggests that rat proteomic
analyses are now viable. (Right) A paradigm for focus on novel
transmembrane proteins: blue, total protein hits; green, proteins
remaining after subtraction of the MM fraction; yellow, previously
uncharacterized proteins (i.e., hypothetical ORFs); red, hypothetical
transmembrane proteins. Transmembrane sequences were predicted
with the use of TMPred (12, 13).
However, we used a higher stringency, restricting the data
set to proteins with scores greater than 1000 in one direction
and 1900 cumulative, on the basis of scores for previously
characterized integral NE proteins. The final two protein
sets yielded a total of 67 previously unknown putative integral
NE proteins.
The logic of the subtractive approach was supported by the presence
of all previously identified integral NE proteins in the NaOH-extracted
NEs (table S2), and their absence from the NaOH-extracted
MMs. Furthermore, no lamins (the most abundant NE-specific
proteins) were recovered in the MMs. All but two of the
known integral NE proteins also appeared in the salt-
and detergent-extracted NEs. The absence of LUMA and
nurim may indicate a less stringent association with
the lamin polymer. All 31 known core NPC proteins (9)
(table S3) were identified in the salt- and detergent-extracted
fraction: Thus, we conclude that our identification approach
is essentially comprehensive.
The dynamic range of MudPIT enabled identification of 1830 separate
proteins in the salt- and detergent-extracted NEs, 566 proteins
in the NaOH-extracted NEs, and 652 proteins in the NaOH-extracted
MMs (Fig. 1B, left, and tables S4 to S6).
Forty-one percent of the proteins in the NaOH-extracted
NE fraction also appeared in the MMs (Fig. 1B) and were readily
eliminated by the subtractive approach.
Many proteins remaining in the two NE fractions were known
chromatin proteins and transcription factors, some of which
(histones, HP1, and barrier to autointegration factor) have
been shown to bind NE proteins (10). Some
proteins were eliminated because they were known components
of contaminating organelles such as mitochondria. However,
certain ER proteins also have specific functions in the
NE (11): Thus, some of those that
we have dismissed may subsequently prove to be NE proteins.
Nonetheless, we restricted our focus to the 337 uncharacterized
open reading frames (ORFs) unique to the two NE fractions.
The transmembrane prediction algorithm, TMPred (12),
predicted that 34% of those in the NaOH-extracted NEs
are integral membrane proteins (13).
Predicted integral proteins from the salt- and detergent-extracted
NEs were also considered, because some would be expected
to have very strong interactions with lamins. Together, the two
fractions contained 67 previously unknown potential integral
NE proteins (Fig. 1B, right, and tables
S7 and S8).
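The stringency filter described in the legend to Fig. 1 (scores greater than 1000 in one direction and 1900 cumulative) can be sketched as follows. This is an illustration under stated assumptions: the record layout and the example scores are hypothetical, and "cumulative" is interpreted here as the sum of the two orientation scores, which the text does not spell out.

```python
from dataclasses import dataclass

# Thresholds taken from the Fig. 1 legend.
SINGLE_DIRECTION_MIN = 1000
CUMULATIVE_MIN = 1900

@dataclass(frozen=True)
class TmScore:
    """Hypothetical record of transmembrane prediction scores for one ORF."""
    protein_id: str
    inside_to_outside: int
    outside_to_inside: int

def passes_stringency(score: TmScore) -> bool:
    """Apply both cutoffs; 'cumulative' is assumed to be the sum of both orientations."""
    best = max(score.inside_to_outside, score.outside_to_inside)
    total = score.inside_to_outside + score.outside_to_inside
    return best > SINGLE_DIRECTION_MIN and total > CUMULATIVE_MIN

# Invented example scores, for illustration only.
candidates = [
    TmScore("orf_1", 1500, 900),  # passes both cutoffs
    TmScore("orf_2", 950, 980),   # fails: neither orientation exceeds 1000
    TmScore("orf_3", 1200, 600),  # fails: cumulative score is only 1800
]
selected = [c.protein_id for c in candidates if passes_stringency(c)]
print(selected)
```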
To test whether this number is a realistic estimate of previously
unknown nuclear integral membrane proteins, we selected a
representative sample to characterize their ability to
target to the NE in transiently transfected cells. Eight
cDNAs were recovered, representing a range of sizes (112
to 674 residues), numbers of predicted transmembrane
segments (one to five), and numbers of peptide hits (a
crude estimate of abundance).
All eight proteins tested were targeted to the NE (Fig.
2 and fig. S3). Varying amounts of protein also accumulated
in the ER and/or in cytoplasmic aggregates, but this
is commonly observed for known integral NE proteins when
exogenously overexpressed (fig. S3), presumably because
binding sites at the NE become saturated. Because all
were fused to an N-terminal epitope tag, the retention
of the tag suggests that they are type II membrane proteins
or polytopic with a cytoplasmic N-terminus, as seen for
all previously identified integral NE proteins. The eight proteins
whose NE-targeting has been confirmed have been assigned the
prefix "NET" for nuclear envelope transmembrane protein.
Fig. 2. Localization of five previously unknown putative
nuclear transmembrane proteins. cDNAs recovered from a human
liver library were inserted behind a cytomegalovirus promoter
and an N-terminal-encoded hemagglutinin-epitope tag and transiently
transfected into HeLa or COS7 cells. Cells were first preextracted
by three washes with 1% Triton X-100, followed by formaldehyde
fixation (22). Asterisks indicate
proteins that map to chromosome regions linked to dystrophies.
During the course of this study, NET56 was separately identified
and named Dullard (23); however,
its subcellular localization was not determined. For galleries
of micrographs and cells not preextracted, see fig. S3.
Preextraction with detergent before fixation removes most NE
proteins that are not tightly associated with the insoluble
lamin polymer. After this treatment, five of the eight proteins
remained at the nuclear rim, arguing that they are normally
concentrated at the NE (Fig. 2). Nonetheless,
it remains possible that some have functions in both
the NE and ER yet failed to appear in the MM fraction.
The three putative transmembrane proteins that were not
retained after detergent preextraction may normally be
concentrated in the outer nuclear membrane or, alternatively,
may be weakly associated with the lamina.
The NE targeting of all eight proteins tested argues that most
of the remaining 59 are also integral NE proteins. Thus, the
13 integral proteins identified before this study likely represent
only a minor fraction of the total. We postulate three reasons
why we identified such a large number of proteins as compared
to an earlier comparative proteomic analysis (14)
that identified LUMA and Unc-84A: avoidance of losses
from gel extractions, the sensitivity of tandem mass
spectrometry, and the use of whole tissue instead of
a cell line. The latter would enhance identification
of cell-type specific proteins, because liver contains
hepatocytes, Kupffer cells, sinusoidal epithelia, perisinusoidal
lipocytes, an endothelial vasculature, and muscle cells.
Indeed, we identified two muscle-cell integral NE proteins,
Syne-1 and Syne-2 (15, 16),
that were absent from the earlier study. Among the proteins
we identified are two (numbers 25 and 66) that contain
the LEM domain, named for its occurrence in the NE proteins
LAP2, emerin, and MAN1 (17), and one (number
9) that appears to be related to LAP1 through a gene duplication.
Twelve of the 67 proteins contained functional domains associated
with enzymatic activities such as phosphatases, acetyltransferases,
and glycosyltransferases (table S7). Thus, the subtractive
method is effective in identifying components of cellular
substructures and can be applied to any well-characterized
subcellular fractionation system where contaminating
fractions can be prepared free of the fraction of interest.
Thirteen human diseases, mostly dystrophies, have been associated
with mutations in NE proteins, including both lamins and lamin-binding
integral proteins (1, 2).
Yet 300 dystrophies remain for which a responsible gene has not
been identified, some of which have been partially mapped
to large chromosome territories. Five of the 67 proteins
in our rodent data set did not have apparent human homologs.
Of those remaining, 37% (23 genes) mapped within chromosome
regions linked to 14 of these dystrophies (Fig. 3).
Although any of these linked regions may contain hundreds
of genes, there are several compelling arguments that
our proteins make good candidates for disease links:
(i) NE proteins have been linked to eight dystrophies;
(ii) the genes we identified have an increased frequency
in loci linked to disease (37%) as compared to random
distributions (25%) (twice the frequency if dystrophies
mapped only to very large territories are excluded); and
(iii) there is a precedent for multiple interacting NE proteins
causing variants of the same disease [lamin A and emerin in
Emery-Dreifuss muscular dystrophy (18,
19)]. In this light, nine putative
integral NE proteins from our data set are located within
chromosome regions linked to three Charcot-Marie-Tooth disease
variants and two limb-girdle muscular dystrophy variants; both
diseases have variants caused by lamin A mutations (20,
21). Four of the proteins that map
to dystrophy-linked chromosome regions target to the
NE, and two of these are resistant to preextraction with
detergent, suggesting an association with the lamina.
Thus, it seems highly probable that some of these 62
human putative NE proteins will be linked to disease. We postulate
that so many dystrophies have already been mapped to the NE
because of the complex set of functions carried out by the NE
and the intricate network of interacting proteins on which
NE organization depends.
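The arithmetic behind the 37% versus 25% comparison can be reproduced in a few lines. The counts are taken from the text; the one-sided binomial tail is an illustrative addition that the text does not report, and it assumes that each gene falls within a disease-linked region independently with the stated random probability.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Counts stated in the text.
total_orfs = 67
without_human_homolog = 5
mapped_to_disease_loci = 23
random_fraction = 0.25  # fraction expected for a random distribution

with_human_homolog = total_orfs - without_human_homolog
observed_fraction = mapped_to_disease_loci / with_human_homolog

# Illustrative only: assumes genes map to linked regions independently.
p_value = binomial_upper_tail(mapped_to_disease_loci, with_human_homolog, random_fraction)

print(f"{mapped_to_disease_loci}/{with_human_homolog} = {observed_fraction:.0%} "
      f"observed vs {random_fraction:.0%} expected; one-sided p = {p_value:.3f}")
```

Because genes are clustered on chromosomes and the linked regions differ greatly in size, the independence assumption is crude, so the tail probability should be read as a rough guide rather than a formal test.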
Fig. 3. Possible association of previously unknown
integral nuclear membrane proteins with genetic diseases.
Chromosome locations of the genes encoding potential integral
NE proteins were determined with the use of GenBank's human
genome resources. Dystrophies mapped to large chromosome regions
were obtained from (24). Putative
integral NE proteins encoded within these regions are indicated
by connecting lines. The percentage of the total genes in
the genome within the disease-linked regions was calculated
with the use of GenBank resources to determine the random
probability of genes occurring within them. Proteins designated
NETs have confirmed NE localization.
References and Notes
1. B. Burke, C. L. Stewart, Nature Rev. Mol. Cell. Biol. 3, 575 (2002).
2. H. J. Worman, J. C. Courvalin, Trends Cell Biol. 12, 591 (2002).
3. S. K. Vasu, D. J. Forbes, Curr. Opin. Cell Biol. 13, 363 (2001).
4. M. P. Washburn, D. Wolters, J. R. Yates 3rd, Nature Biotechnol. 19, 242 (2001).
5. D. A. Wolters, M. P. Washburn, J. R. Yates 3rd, Anal. Chem. 73, 5683 (2001).
6. V. Santoni, M. Molloy, T. Rabilloud, Electrophoresis 21, 1054 (2000).
7. N. Dwyer, G. Blobel, J. Cell Biol. 70, 581 (1976).
8. L. Gerace, Y. Ottaviano, C. Kondor-Koch, J. Cell Biol. 95, 826 (1982).
9. J. M. Cronshaw, A. N. Krutchinsky, W. Zhang, B. T. Chait, M. J. Matunis, J. Cell Biol. 158, 915 (2002).
10. R. D. Goldman, Y. Gruenbaum, R. D. Moir, D. K. Shumaker, T. P. Spann, Genes Dev. 16, 533 (2002).
11. S. Siniossoglou et al., Cell 84, 265 (1996).
12. TMPred is available online at www.ch.embnet.org/software/TMPRED_form.html.
13. K. Hofmann, W. Stoffel, Biol. Chem. Hoppe-Seyler 374, 166 (1993).
14. M. Dreger, L. Bengtsson, T. Schoneberg, H. Otto, F. Hucho, Proc. Natl. Acad. Sci. U.S.A. 98, 11943 (2001).
15. E. D. Apel, R. M. Lewis, R. M. Grady, J. R. Sanes, J. Biol. Chem. 275, 31986 (2000).
16. Q. Zhang et al., J. Cell Sci. 114, 4485 (2001).
17. F. Lin et al., J. Biol. Chem. 275, 4840 (2000).
18. G. Bonne et al., Nature Genet. 21, 285 (1999).
19. S. Bione et al., Nature Genet. 8, 323 (1994).
20. A. De Sandre-Giovannoli et al., Am. J. Hum. Genet. 70, 726 (2002).
21. T. Kitaguchi et al., Neuromuscular Disord. 11, 542 (2001).
22. Detailed materials and methods are available as supporting material on Science Online.
23. R. Satow, T. Chan, M. Asashima, Biochem. Biophys. Res. Comm. 295, 85 (2002).
24. Dystrophies were obtained from www.neuro.wustl.edu/neuromuscular/ and www.periodicparalysis.org/.
25. Thanks to J. Bednenko for critical reading of the manuscript and R. Sadygov, D. Tabb, and J. Johnson for expert computer programming. This work was supported by the NIH with F32 GM19085 to E.C.S., GM28521 to L.G., and RR11823 to J.R.Y.
Supporting Online Material
www.sciencemag.org/cgi/content/full/301/5638/1380/DC1
Materials and Methods
Figs. S1 to S3
Tables S1 to S8
18 June 2003; accepted 23 July 2003
10.1126/science.1088176
Volume 301, Number 5638, Issue of 5 Sep 2003, pp. 1380-1382.
Copyright
© 2003 by The American Association for the Advancement of Science.
All rights reserved.