REVIEWS

A TOUR OF STRUCTURAL
GENOMICS
Steven E. Brenner
Structural genomics projects aim to provide an experimental or computational three-
dimensional model structure for all of the tractable macromolecules that are encoded by
complete genomes. To this end, pilot centres worldwide are now exploring the feasibility of
large-scale structure determination. Their experimental structures and computational models
are expected to yield insight into the molecular function and mechanism of thousands of
proteins. The pervasiveness of this information is likely to change the use of structure in
molecular biology and biochemistry.

Department of Plant and Microbial Biology, University of California, 461A Koshland Hall, Berkeley, California 94720-3102, USA. e-mail: brenner@compbio.berkeley.edu

The explosive growth of genetic sequence information has offered us comprehensive collections of the protein sequences found in many living organisms. Most of these are not experimentally characterized. Although half of the proteins that are encoded in sequenced eukaryotic genomes have computationally recognized homology to at least one well-characterized domain1,2, functional interpretation of these matches is fraught with difficulty. Functional changes over evolutionary time3,4 and database errors5 confound reliable computational prediction of the precise roles of newly discovered genes. Even proteins with recognized domains are often scattered with regions of unmatched sequence. So, most of the residues in putative gene products lack any computational annotation, and there exists no general experimental approach to directly ascertain their molecular role.

The challenge of understanding these gene products has led to the development of functional genomics methods, which collectively aim to imbue the raw sequence with biological understanding. Structural genomics is one such approach, with unique promise to reveal the molecular function6 of protein domains.

Protein structure represents a powerful means of discovering function, because structure is well conserved over evolutionary time, and it therefore provides the opportunity to recognize homology that is undetectable by sequence comparison. This became apparent with the first two protein structures that were determined, because their common ancestry was clear from the three-dimensional fold7 (FIG. 1), although their sequences did not contain recognizable similarity8. (Modern sequence analysis, however, would now detect their similarity.) Today, the literature is rich with celebrated cases of homology inferred from structure, including the unexpected similarity between actin and the 70-kDa heat-shock cognate protein9, the TopRim domain shared between some topoisomerases, primases and nucleases10,11, and the highly similar constant and variable domains of immunoglobulins. Indeed, most evolutionary relationships cannot be detected from sequence12.

In addition, the three-dimensional structure of a protein can yield direct insight into its molecular mechanism. For example, the structure of the TATA-box-binding protein (TBP) when it is bound to DNA provides not only a sense of how these molecules interact in general, but also some fascinating clues about DNA-binding specificity. Furthermore, structural understanding of recognition mechanisms in major histocompatibility complex molecules and T-cell receptors helped to make immunology comprehensible at a molecular level13,14. Structural genomics efforts plan to extend structural insight to a broad repertoire of proteins, using large-scale high-throughput techniques15–26.

While the term ‘structural genomics’ is sometimes loosely used to encompass disparate large-scale efforts to determine protein structure, by international agreement it has come to have a relatively specific meaning (see link to the Airlie Agreement for ‘Agreed Principles

NATURE REVIEWS | GENETICS VOLUME 2 | OCTOBER 2001 | 801


© 2001 Macmillan Magazines Ltd

[Figure 1, panels a–d: images not reproduced.]

e
4hhba VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLS.....HGSAQVKGHGKKVA
1mbd_ VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVL

4hhba DALTNAVAHVD..DMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR......
1mbd_ TALGAILKK.K.GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG

Figure 1 | Structure similarity without sequence similarity. The first two protein structures that were solved — sperm-
whale myoglobin and horse haemoglobin — were recognizable as homologues even at low resolution, even though their
sequences were more different than similar. a | Papier mâché model of sperm-whale myoglobin. b | Baked and painted foam
model of horse haemoglobin. Modern representations of these structures clearly show the areas of structural similarity
(highlighted in red in c and d). c | Myoglobin (Protein Data Bank (PDB) code 1mbd)117. d | Human haemoglobin (PDB code
4hhb)118. e | Alignment of horse myoglobin and human α-haemoglobin sequences119 shows little sequence similarity. Photos
taken of the structures at the MRC Laboratory of Molecular Biology by S.E.B. Computer images were generated using
Rasmol120, Molscript121 and Raster3D122.
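
The claim that the two globin sequences in panel e share little sequence similarity can be checked with a rough calculation. The following is a minimal sketch, not the method used to produce the figure: it assumes that '.' marks an alignment gap, counts only columns in which both sequences have a residue, and reports the percentage of those columns that are identical. The function name percent_identity and the variable names are illustrative.

```python
# Aligned sequences copied from Figure 1e; '.' is assumed to mark a gap.
HBA = (  # row labelled 4hhba in the figure
    "VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLS.....HGSAQVKGHGKKVA"
    "DALTNAVAHVD..DMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR......"
)
MB = (  # row labelled 1mbd_ in the figure
    "VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVL"
    "TALGAILKK.K.GHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG"
)


def percent_identity(a: str, b: str, gap: str = ".") -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    print(f"Identity over aligned columns: {percent_identity(HBA, MB):.1f}%")
```

Other definitions of identity (for example, dividing by the length of the shorter sequence) give somewhat different values, so the number printed here should be read as one convention among several.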

and Procedures’). In this more purist sense, structural genomics is an effort to create a representative set of experimental macromolecular structures, which will be augmented by computational methods to provide model structures for most tractable macromolecules. Although this reflects a primary focus on surveying the structures of different families, agreed goals of structural genomics include the study of biologically interesting molecules, such as those from model organisms and those with medical importance. In addition, structural genomics specifically aims to derive function from the structures.

Because structural genomics is in its infancy, its course might change over the next several years; indeed, the experiences of the current pilot centres will inform future directions. However, the relatively precise definition of structural genomics includes several hints about the limitations and scope of the field. For example, structural genomics efforts often study individual protein domains, rather than whole proteins or complexes, because domains are the fundamental units of protein structure and evolution. For the time being, proteins and other macromolecules that are not tractable for high-throughput characterization will largely be left unconsidered by structural genomics efforts. Over time, however, the gamut of molecules suitable for large-scale studies is likely to increase; one can already imagine what structural genomics of RNA might involve27, although no such projects are underway at present. Moreover, rather than solving the structures of all domains, the general intent at present is to solve experimentally the structure of one representative domain from each family, and use computational comparative modelling to provide the COORDINATES for related proteins. In this way, current structural genomics is a conjoined experimental and computational effort, which expects to provide a comprehensive repertoire of models of soluble globular-protein domains. This review outlines how proteins are selected for structural genomics and how they are experimentally characterized in a typical pilot centre, discusses some early results, and suggests what they might mean for the future of the field.

The process
The principles of experimental structural genomics are largely the same as those for traditional structural biology, but differ in motivation, automation and scale. The key to the success of this scientific venture is the ability to optimize the structure-determination process, so as to reap economies of scale as centres increase their throughput.

COORDINATES
A set of numbers that specify the X, Y and Z positions for each atom in a protein. Together, they describe the molecular structure.
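
To make the glossary definition of coordinates concrete, the following minimal sketch represents each atom as a name plus X, Y and Z values and derives two simple geometric quantities from them. The Atom type, the function names and the numerical values are all hypothetical and chosen only for illustration; they are not taken from any deposited structure or file format.

```python
import math
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    """One atom of a model: a label plus its X, Y and Z position (in angstroms)."""
    name: str
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


def centroid(atoms: List[Atom]) -> Tuple[float, float, float]:
    """Geometric centre of a set of atoms (unweighted by mass)."""
    n = len(atoms)
    if n == 0:
        raise ValueError("need at least one atom")
    return (
        sum(a.x for a in atoms) / n,
        sum(a.y for a in atoms) / n,
        sum(a.z for a in atoms) / n,
    )


if __name__ == "__main__":
    # Hypothetical positions, for illustration only.
    atoms = [Atom("N", 0.0, 0.0, 0.0), Atom("CA", 3.0, 4.0, 0.0)]
    print(distance(atoms[0], atoms[1]))
    print(centroid(atoms))
```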

802 | OCTOBER 2001 | VOLUME 2 www.nature.com/reviews/genetics



[Figure 2: flowchart, not reproduced. Boxes, in approximate order: Choose targets; Cloning coding sequences (Disseminate clones); Expression; Other expression systems?; Soluble; Solubilize (refolding, detergents, metals, cofactors, etc.); Purify (Disseminate proteins); Quality assurance/biophysical analysis; Likely to crystallize?; Identify and correct problem (further purification, subclone, add metal or cofactor, other?); Crystallization trials; NMR; Microcrystals; Diffraction-quality crystals; Contains methionine?; Obtain SeMet crystals; MAD data collection; MIR search; MIR data collection; Phasing, model building, refinement; Deposit structure in PDB, create homology models, annotate structure. Failure branches lead to 'Choose another family member?' or 'Abandon'.]

Figure 2 | Processes involved in high-throughput structural genomics using X-ray crystallography. N indicates that a process has failed and Y that it has succeeded. (MIR, multiple isomorphous replacement, an alternative to multiple anomalous dispersion (MAD) phasing for structure determination; NMR, nuclear magnetic resonance; SeMet, selenomethionine.) (Modified with permission from REF. 16.)

HIS-TAG
A series of histidine residues fused to a protein that aids protein purification because of its strong binding to nickel columns.

MESOPHILE
An organism that grows at moderate temperature.

DYNAMIC LIGHT SCATTERING
A technique for determining apparent molecular size, in which laser light is shone on a solution. Its scatter corresponds to the diffusion rate and, therefore, the size of the molecules in solution.

Experimental structural genomics faces no single bottleneck to overcome: nearly every stage of the process needs to be refined and optimized. Moreover, many individual proteins are expected to be intractable without specialized extensive effort. Therefore, parallel studies on related proteins are being relied on to increase the likelihood of readily solving a structure for a family of proteins. The progress of individual protein targets through the experimental process will be like a funnel, with many targets starting at the same time, and a fraction failing at each stage of the process. The slope of the funnel is dependent on the effort devoted at each step, which is, in turn, a consequence of the specific motivations of the particular structural genomics centre.

Although the detailed processes of scaling up the procedures involved in structure determination are unique to each centre for structural genomics, several characteristics are shared among most centres (FIG. 2). The experimental process begins with the cloning of selected target sequences, frequently with recombination-based vectors that allow the creation of many different constructs. These vectors incorporate different affinity tags, such as HIS-TAGS and glutathione-S-transferase (GST), to aid purification, as well as promoters that allow trials in different expression systems28. Expressing high levels of soluble protein is a particular challenge, so there is considerable interest in fusions between the target protein and green fluorescent protein that fluoresce only when soluble and folded, therefore indicating folded proteins in solution29. Cell-free expression systems hold great promise for improving yields and allowing the production of toxic proteins30. Another optimization is the use of hyperthermophilic proteins, which are easier to purify when expressed in MESOPHILIC hosts, as they are resistant to heat that will denature most of the proteins of the host.

The expressed proteins might have their domain boundaries identified by proteolysis and mass spectrometry, and several groups subject samples to DYNAMIC LIGHT SCATTERING to detect when proteins have formed heterogeneously sized oligomers that are unlikely to crystallize. In some centres, the proteins are studied by a heteronuclear single-quantum coherence nuclear magnetic resonance (HSQC NMR) experiment, because this technique gives insight into the ‘foldedness’ of a protein31,32. Any promising purified soluble proteins are then subjected to crystallization trials or NMR experiments.
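
The funnel described above, and the rationale for studying related proteins in parallel, can be illustrated with a small calculation. This is a sketch under stated assumptions, not a model used by any centre: the stage names and per-stage success rates are hypothetical, each stage is treated as an independent filter, and homologous targets are assumed to succeed or fail independently of one another.

```python
from typing import List, Tuple


def funnel(n_targets: int, stages: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Expected number of targets remaining after each stage of the pipeline."""
    remaining = float(n_targets)
    survivors = []
    for name, success_rate in stages:
        if not 0.0 <= success_rate <= 1.0:
            raise ValueError(f"success rate for {name!r} must lie in [0, 1]")
        remaining *= success_rate  # a fraction of targets fails at every stage
        survivors.append((name, remaining))
    return survivors


def p_at_least_one(p_single: float, n_homologues: int) -> float:
    """Chance that at least one of n independent homologues reaches a structure."""
    return 1.0 - (1.0 - p_single) ** n_homologues


# Hypothetical per-stage success rates, chosen only to show the shape of the funnel.
HYPOTHETICAL_STAGES = [
    ("cloned", 0.9),
    ("expressed", 0.7),
    ("soluble", 0.5),
    ("purified", 0.8),
    ("crystallized", 0.4),
    ("diffraction-quality crystals", 0.5),
    ("structure solved", 0.8),
]

if __name__ == "__main__":
    for stage, count in funnel(1000, HYPOTHETICAL_STAGES):
        print(f"{stage:30s} {count:8.1f}")
    print(p_at_least_one(0.5, 2))
```

Changing a single rate in HYPOTHETICAL_STAGES shows how the effort devoted to one step alters the slope of the whole funnel, and p_at_least_one shows why several homologues per family can be worth the replicated early effort.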


Table 1 | Centres that are undertaking structural genomics projects

Centre | Leader | Key ideas | Website | Reference
Berkeley Structural Genomics Center | Sung-Hou Kim | Complete structural genomics of M. genitalium and M. pneumoniae | http://www.strgen.org/ | –
Joint Center for Structural Genomics | Ian Wilson | Large-scale automation; proteins from T. maritima and C. elegans | http://www.jcsg.org/ | –
Midwest Center for Structural Genomics | Andrzej Joachimiak | Novel protein folds and technology development | http://www.mcsg.anl.gov/ | –
New York Structural Genomics Research Consortium | Stephen Burley | Yeast proteins with novel folds; technology development | http://www.nysgrc.org/ | –
Northeast Structural Genomics Consortium | Gaetano Montelione | Complementarity of NMR and crystallography; coverage of structure space | http://www.nesg.org/ | –
Southeast Collaboratory for Structural Genomics | Bi-Cheng Wang | Development of SAD technology; P. furiosus, H. sapiens and C. elegans proteins | http://www.secsg.org/ | –
TB Structural Genomics Consortium | Thomas Terwilliger | M. tuberculosis proteins; new folds; large-scale collaboration | http://www.doe-mbi.ucla.edu/TB/ | –
Structure to Function | Roberto J. Poljak | Functional characterization of H. influenzae proteins | http://s2f.carb.nist.gov/ | 123
Ontario Structural Proteomics Group | Aled Edwards | High-throughput; experimental target selection | http://www.uhnres.utoronto.ca/proteomics/ | 31
Protein Folds Project | Shigeyuki Yokoyama | NMR of proteins from mouse full-length cDNAs | http://www.rsgi.riken.go.jp/ | 30
Structurome Project | Seiki Kuramitsu | Complete structural genomics of T. thermophilus HB8 | http://www.rsgi.riken.go.jp/ | 39
Protein Structure Factory | Udo Heinemann | Technology development; human proteins | http://userpage.chemie.fu-berlin.de/~psf/ | 124
StructuralGenomiX | Tim Harris | Company: structures relevant to medicine | http://www.stromix.com/ | 125
Syrrx | Wendell Wierenga | Company: structures relevant to medicine | http://www.syrrx.com/ | –

C. elegans, Caenorhabditis elegans; H. influenzae, Haemophilus influenzae; H. sapiens, Homo sapiens; M. genitalium, Mycoplasma genitalium; M. pneumoniae, Mycoplasma pneumoniae; M. tuberculosis, Mycobacterium tuberculosis; P. furiosus, Pyrococcus furiosus; T. maritima, Thermotoga maritima; T. thermophilus, Thermus thermophilus; NMR, nuclear magnetic resonance; TB, tuberculosis; SAD, single wavelength anomalous diffraction.

SYNCHROTRON
A device that accelerates particles of atomic size through an electric field; it is used to produce synchronous packets of particles.

BEAMLINE AUTOMATION
Technologies to reduce human intervention on synchrotron beamlines, such as robots for mounting and centring crystals in the X-ray beam.

MAD PHASING
(Multiple anomalous dispersion). An approach to determining the phases of a crystal structure by relying on the anomalous scattering of X-rays near the absorption edge of the atom (such as selenium). It allows determination of phase from several sets of data collected from a single crystal.

Several centres are investing in considerable automation to allow parallel large-scale expression trials and parallel crystallization trials (TABLE 1); for example, the Joint Center for Structural Genomics hopes to be able to analyse up to 130,000 crystallization experiments per day33. To ensure optimal use of precious SYNCHROTRON time, BEAMLINE AUTOMATION is crucial34. In addition, careful tracking of laboratory results and analyses can be used to predict better which proteins will be most successful35; this information might then be fed into the target-selection process to improve future results.

Crystallography has benefited from many technologies, including the brilliance of synchrotron radiation and its tunability for multiple anomalous dispersion (MAD) PHASING36. Other improvements include charge-coupled device detectors, as well as the enhanced stability provided by cryocrystallography. NMR has seen similar advances, including cryogenic probes and higher-field magnets, as well as new techniques such as transverse relaxation-optimized spectroscopy (TROSY)32,37. Consequently, although early plans for structural genomics focused primarily on crystallography, NMR has already proved to have great value for the field32,38. At this time, most centres in the United States have NMR spectroscopists, and roughly half of the structural genomics effort in Japan uses NMR39,30.

The refinement of crystallographic structures has been reported to be the slowest step in structure determination (S.-H. Kim, personal communication), and the advent of highly automated structure-determination software for both crystallography40,41 and NMR42,43 is therefore likely to have a marked effect on increasing the speed of solution of structures.

Target selection: which proteins and how many?
It would be desirable to have an experimental molecular structure for every known protein, such as the ~600,000 in the protein sequence databases SWISS-PROT and TrEMBL44. However, practicalities dictate a compromise, whereby a more modest number of structures are solved, and these are used as templates for the comparative modelling of most soluble protein domains. A rough consensus indicates that it could be feasible for 10,000 structures to be experimentally solved over the next decade45.

Dennis Vitkup and colleagues have shown that this number of experimental structures is insufficient to provide templates for high-quality models of all protein domains46. To determine how many structure


determinations are necessary to provide good three-dimensional models for all of the 1,626 non-membrane families in the Pfam database47 (a collection of well-characterized protein-domain sequences), Vitkup clustered the sequences into groups with more than 30% identity. This produced 13,000 clusters, each requiring a structure determination. Even under the optimistic assumption that sequences outside Pfam belong to similarly large families, extrapolation shows that 64,000 structure determinations would be needed to provide structures for all soluble domains. However, if the goal is relaxed to provide models for 90% of all protein domains, then ~16,000 structure determinations might suffice. Vitkup points out that this reduction only holds if structural genomics efforts are optimally coordinated to solve structures from the largest families. In practice, smaller families will often be targeted because of their identified biological or medical importance, considerably increasing the number of structures required. The number of requisite structures can be reduced greatly by relaxing the requirements for the quality of the model; for example, only one structure would be needed for each Pfam family if it were sufficient to know the fold-type of each protein without building a detailed coordinate model. The importance of cooperation between structural genomics centres is also evident, as in two instances already, independent groups have inadvertently solved the structures of homologous proteins48–51. Although determining the structures of highly homologous proteins often has great value to structural biology, it runs counter to the goals of structural genomics. To avoid future duplicated efforts, the structural genomics community has agreed to a set of principles and procedures for coordination52 (see link to the Airlie Conference on Structural Genomics), which includes the sharing of lists of target proteins.

At present, each structural genomics centre (BOX 1) chooses protein targets using its own distinct criteria. The Ontario Structural Proteomics project, for example, aims to pursue those proteins that are experimentally most tractable, whereas the Berkeley Structural Genomics Center is pursuing a nearly complete repertoire of proteins from two Mycoplasma spp. Several centres focus on finding new topological protein folds. Nonetheless, general features of the target selection are common to all centres53 (FIG. 3). First, proteins of interest are defined and, to the extent possible, these proteins are typically divided into their constituent domains, as individual structural modules are more conducive to high-throughput studies54. At this point, domains are identified primarily on the basis of visual inspection of multiple-sequence alignments and comparison with well-described domains47,55–58, but many automated approaches are being developed to incorporate alignment and other information. It has also been suggested that pairs of domains will be good targets26. This is because, out of the huge number of possible domain combinations, only a limited number are found to exist in individual proteins59,60. As these pairs often adopt regular conformations61, solving the structures of domain pairs should provide an understanding of typical domain interaction, and give clues about overall protein structures.

All of the prospective target domains are put through a battery of computational tools; those proteins predicted to be membranous, unstructured or otherwise unsuitable are immediately removed from the pool as being intractable. Next, database searches are used, and proteins that can be computationally modelled by homology to known structures are also set aside. The remaining candidates are all valid ‘structural genomics proteins,’ as they are thought to be tractable, and their experimental characterization will provide structural information that could not have been predicted. Priority is assigned to families of structural genomics proteins according to their desirable characteristics62, such as phylogenetic distribution63,64, family size46, likelihood of producing a new fold65,66 and functional relevance67.

The selected families contain the original candidate target, but often that protein will not be among those chosen for experimental characterization. Instead, in the selected families, individual proteins are chosen for study on the basis of their suitability for experimental characterization, including features such as length, thermostability, codon usage, ISOELECTRIC POINT (pI), ability to model other structures42 and suitability for MAD phasing. This is deliberate, with the goal of reducing experimental effort. Indeed, following the ‘class-directed’ approach, in most cases, several homologous targets will be studied experimentally in parallel68,69. This is motivated by the expectation that one protein will fortuitously prove far more tractable than the others, therefore justifying the replicated effort at the early stages of the pipeline.

Function from structure
Elucidation of function from molecular structure is perhaps the most exciting, but also probably the least understood aspect of structural genomics70–73. Until recently, only proteins with well-characterized functions were candidates for structure determination. Structural genomics turns that logic on its head by using the structure to infer function. Although some basic principles for this process have been shown to be successful, the extent to which different approaches will prove valuable remains to be seen.

Box 1 | Who is doing structural genomics?
There are seven comprehensive pilot centres and one programme project that are funded by the National Institute of General Medical Sciences (NIGMS), a component of the National Institutes of Health (NIH)33 (TABLE 1), and three more centres might be funded from pending applications. Large centres for structural genomics have also been established in Japan, Germany and Canada. In addition to these main funded centres, there are smaller programmes underway in the above countries, as well as France, Sweden, Australia, Israel and China. Funding has also been approved for programmes in Switzerland and Italy107. In the United Kingdom, the Wellcome Trust has proposed the formation of an industry-funded organization comparable with the SNP Consortium (Single Nucleotide Polymorphism Consortium), which would promptly release structures to the public108,109. Several companies are also involved in structural genomics, and two in particular, StructuralGenomiX and Syrrx, aim to solve numerous structures.

TROSY
(Transverse relaxation-optimized spectroscopy). A nuclear magnetic resonance technique that reduces the deterioration of signal from large proteins. It allows large proteins to be studied in high-field magnets.

ISOELECTRIC POINT
The pH at which a protein has zero net charge.
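
The target-selection steps outlined in this section and in FIG. 3 (group domains into families, set aside families that already have a member of known structure, remove intractable members, then rank the remaining families) can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the procedure of any particular centre: the Domain fields, the function name rank_target_families and the priority score (number of distinct organisms, then family size) are hypothetical stand-ins for the richer criteria cited in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Domain:
    """A candidate protein domain; all fields are illustrative assumptions."""
    name: str
    family: str
    organism: str
    membrane: bool = False         # predicted to be membranous
    unstructured: bool = False     # predicted to be unstructured or low complexity
    known_structure: bool = False  # an experimental structure already exists


def rank_target_families(domains: List[Domain]) -> List[str]:
    """Return the names of candidate families, highest priority first."""
    families: Dict[str, List[Domain]] = defaultdict(list)
    for d in domains:
        families[d.family].append(d)

    candidates: Dict[str, List[Domain]] = {}
    for family, members in families.items():
        # Families with a member of known structure can be modelled by homology.
        if any(m.known_structure for m in members):
            continue
        # Members predicted to be intractable are removed from the pool.
        tractable = [m for m in members if not (m.membrane or m.unstructured)]
        if tractable:
            candidates[family] = tractable

    def priority(family: str):
        members = candidates[family]
        # Hypothetical score: taxonomic spread first, then family size.
        return (len({m.organism for m in members}), len(members))

    return sorted(candidates, key=priority, reverse=True)


if __name__ == "__main__":
    pool = [
        Domain("A1", "famA", "orgX"),
        Domain("A2", "famA", "orgY"),
        Domain("B1", "famB", "orgX", known_structure=True),
        Domain("B2", "famB", "orgY"),
        Domain("C1", "famC", "orgX", membrane=True),
        Domain("D1", "famD", "orgX"),
    ]
    print(rank_target_families(pool))
```

In a real pipeline, the choice of which individual family members to express (length, thermostability, codon usage and so on) would be a further step applied within each ranked family.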


[Figure 3, panels a–f: images not reproduced.]

Figure 3 | Target selection for structural genomics. a | Proteins in the realm of interest (in this case a genome) are plotted as blue shapes in an arbitrary sequence space. Proteins of known structure are shown as stars, others as circles. b | Transmembrane proteins and those with low complexity are excluded, as indicated by a red cross. c | Homologues from other organisms (different colours) are identified and family relationships are determined (ovals). Families with a member of known structure are excluded, as indicated by a red cross. d | Priority is assigned to families. In this case, a pervasive taxonomically diverse family is ranked highest. e | Two proteins in the highest-priority family are chosen (arrows); note that they are not one of the original proteins of interest (blue), but they are homologous to such a protein. f | The solved structure is similar to, and homologous to, another structure that was previously known (arrows). This means that all of the proteins in the two families are homologous (indicated by the blue enclosure), and it might therefore be possible to make useful functional inferences.

The key idea behind deducing function from structure is that protein structure is better conserved than sequence, and structure therefore provides a way of homology database searching that is more sensitive than sequence comparison. Hence, the logical first step in analysing a newly solved structural genomics protein is a structure comparison with the Protein Data Bank (PDB)74, a database of known structures, using any of various popular tools55,75–78. However, none of these methods is guaranteed to find true matches in the database, and any of them can report high scores for evolutionarily unrelated proteins. Moreover, structural similarity alone is insufficient to determine whether two proteins are homologous, because they could have evolved by convergence to have the same structure. As a consequence, it is also necessary to inspect the structures visually, and to provide expert judgement on whether there is similarity indicative of common ancestry. The primary aids for this task are databases such as SCOP56 (structural classification of proteins) and CATH57 (class, architecture, topology and homologous superfamily). These provide comprehensive hierarchical classifications of all known protein domains, with considerable manual review and annotation. The SCOP classification in particular is founded on using structure, along with functional and mechanistic information, to organize proteins according to their distant evolutionary relationships.

It remains to be seen to what extent new experimental work from structural genomics reveals recognizably homologous proteins. Extrapolations from historical structure determinations of proteins that could have been candidates for structural genomics indicate that ~45% of structural genomics proteins would be homologous to known proteins79, and that 25–28% would have a new fold79,80. This trend seems to be roughly followed: in a small sample of 32 such domains that were recently solved, Teichmann and colleagues report that 34% are homologous and that 37% adopt a new fold, whereas the remainder are structurally similar to those seen before, but are not evolutionarily related26.

In many cases, the homology that is inferred from structure has allowed interesting functional assignments to be made. For example, a hypothetical Saccharomyces cerevisiae protein was found to be a triosephosphate isomerase (TIM) barrel, the active site of which looks like alanine racemase, and preliminary studies indicate that it does have that biochemical activity16. However, homology has not proved to be definitive; indeed, of the ten structures solved by Christendat and co-workers, in no case did structurally inferred homology alone provide a robust functional prediction31. In several cases, common ancestry inferred from structure has not reflected common function; for example, Methanobacterium thermoautotrophicum MTH538 closely resembles the Escherichia coli response regulator CheY, but could not be shown to have any related aspartate-kinase activity81. Furthermore, two close homologues of unknown molecular function, YjgF from E. coli and YabJ from Bacillus subtilis, were both found to be similar in structure to chorismate mutase. However, the completely different active sites precluded the possibility of these proteins sharing chorismate-mutase function with their structurally similar homologue50,51. So, although structural analysis failed to show the role of YjgF and YabJ, it was key in allowing the researchers to realize that their homology did not reflect similar activity. Structure determination of M. thermoautotrophicum MTH1175 likewise showed structural similarity to E. coli RNaseH, but did not support a shared function between the two82. Because active sites can occur in different contexts and can change in homologous proteins, several automated methods have been developed to seek similarity in active sites to predict function83–85 or specificity86,87; however, the application of these methods has not yet been described for the handful of published structural genomics proteins.

One of the more startling findings of structural genomics is that structures can often be functionally interpreted even when their folds are novel. For example, the discovery of a long, positively charged groove on the surface of the mouse tubby protein allowed Boggon and colleagues to postulate that it is a DNA-binding protein88. The structure also showed that all but one of the tubby mutations responsible for retinitis pigmentosa


type 14 are found in a small region of the groove, even though they are dispersed within the sequence. Some of these replace positive amino acids with neutral ones, strengthening the hypothesis that surface charge is important. So, not only did three-dimensional structure provide insight into the molecular function of tubby, but it also helped to explain disease-causing mutations.

The E. coli YrdC protein was similarly found to have a concave surface with positive electrostatic potential, which led to experiments showing preferential RNA binding89. In another instance of structure directly indicating function, the Methanococcus jannaschii MJ0226 and B. subtilis Maf proteins established new structural superfamilies, although their structure was reminiscent of nucleotide-binding folds. Further tests showed that MJ0226 hydrolyses non-standard nucleotides90,91. In these cases, the functional inference would have been missed by all tools available at present; only the expertise of the structural biologists allowed these functional interpretations.

With surprising frequency, unexpected ligands identified in the crystal structure have also indicated the function of structural genomics proteins. Clues about the molecular mechanism of the proteins MTH150 and MTH152 (from M. thermoautotrophicum), HI0319 (from Haemophilus influenzae), and MJ0577 (from M. jannaschii) were shown by their co-crystallization with NAD+, FMN (flavin mononucleotide), a selenium version of S-adenosyl-L-homocysteine, and ATP, respectively31,92,93. In each case, binding was sufficiently strong that the protein apparently scavenged the cofactor from the original expression system. In further tests, MJ0577 was found to hydrolyse ATP to ADP only in the presence of extract from its source organism, M. jannaschii, indicating that it might be a molecular switch93.

Comparative modelling allows each experimentally determined fold to provide structure information for a family of related proteins94. The quality of the model can range from extremely good to virtually worthless, depending on the intended use and the evolutionary distance between the template (solved structure) and the query, in large part because of problems with alignment95,96. Because the structural information of most proteins will be available only as a homology model, understanding the strengths and limitations of the comparative modelling methods will be crucial for making informed use of structural genomics data.

A fundamental limitation of structural genomics is that it typically only provides clues about molecular function6, such as what a protein binds to or reacts with. Understanding this molecular function gives only limited insight into the cellular role. This limitation is endemic to homology-based methods and is therefore shared with sequence comparison. Fortunately, many other functional genomics approaches, especially expression profiling, yield precisely complementary data: although they cannot indicate the molecular action of a protein, they provide clues about its role in a wider context, such as in a signalling pathway or a cellular state.

Beyond structural genomics

Structural genomics will revolutionize biochemistry and molecular biology, making pervasive the use of three-dimensional structure information. Just as one can expect to find sequences for most genes of interest in public databases, structural genomics promises to offer a comparably comprehensive library of experimental and computational models (BOX 2). These will reveal new functions, indicate molecular mechanisms and explicate mutations.

Despite its promise, current structural genomics will not provide a perfect resource. Most membrane proteins and RNA structures27 will probably be left unsolved for the time being, as will proteins without a defined structure97–99. Moreover, although most important families will have representative structures, rare, unusual families with no known functional import are unlikely to be characterized soon. Finally, although structural genomics focuses on a complete repertoire of static individual domains of proteins, it fails to capture their interactions, complexes and dynamics at present.

Even as structural genomics provides a solid foundation for the future of structural biology research, its limitations leave much exciting work to be done. Improvements in sequence analysis100 and comparative modelling will yield disproportionate enhancements in the number and quality of modelled structures. Likewise, building from the repertoire of known structures, computational methods using limited experimental data101–103 and ab initio approaches104 should help to fill in knowledge of domains beyond the resources of fully experimental approaches. The technology developed for structural genomics is also expected to provide a watershed for studies of those macromolecules not suited for high-throughput studies, by providing the means to rapidly explore several expression constructs and screen through many purification and crystallization protocols. It will also allow for parallel studies of homologues, such as all human kinases, to understand their specificity. In addition, structural genomics will provide a platform for detailed studies on molecular dynamics and interactions105, and for the elucidation of large macromolecular complexes by X-ray crystallography and electron microscopy106. In this way, even as structural genomics brings our knowledge of protein-domain structures near to completion, it is a prelude to a still richer knowledge of molecular structure and function.

Box 2 | Where are the structural genomics data?
The Airlie Agreement on structural genomics52 specifies that, within 6 months of its completion, each protein structure will be deposited in the Protein Data Bank110, a repository of all publicly solved structures. Structural and evolutionary relationships between these proteins can be found in the SCOP56 and CATH111 databases, whereas Dali provides automated structure comparison55.
Lists of targets can be found on the websites of individual centres, as can information about protein production. Compendiums of targets and searching facilities can also be found in the PRESAGE database112 and on the structuralgenomics.org website. The PRESAGE database also provides information about structure predictions, such as those in ModBase113 and other fold predictions114,115,113,116.
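
As a rough illustration of the "evolutionary distance between the template and the query" that governs comparative-model quality, the following Python sketch computes percent sequence identity over the ungapped columns of a pairwise alignment. The function name, the gap character and the example sequences are assumptions made for illustration only; they are not taken from any of the modelling tools cited in this review, and percent identity is just one crude proxy for evolutionary distance.

```python
def percent_identity(aligned_query: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both sequences have a residue.

    Hypothetical helper for illustration: both inputs must already be aligned
    (equal length, gaps marked with `gap`).
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == gap or t == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if q == t:
            identical += 1

    # With no comparable columns there is nothing to score.
    return 100.0 * identical / compared if compared else 0.0


# Made-up toy alignment: 8 comparable columns, 7 of them identical.
print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))  # 87.5
```

A low value from such a calculation would suggest that the alignment, and therefore any model built on it, deserves extra scrutiny; the sketch deliberately sets no cut-off, because the review does not give one.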


1. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
2. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
3. Devos, D. & Valencia, A. Practical limits of function prediction. Proteins 41, 98–107 (2000).
4. Todd, A. E., Orengo, C. A. & Thornton, J. M. Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 307, 1113–1143 (2001).
5. Brenner, S. E. Errors in genome annotation. Trends Genet. 15, 132–133 (1999).
6. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 25, 25–29 (2000).
7. Perutz, M. F. et al. Structure of hæmoglobin. A three-dimensional Fourier synthesis at 5.5 Å resolution, obtained by X-ray analysis. Nature 185, 416–422 (1960).
8. Kendrew, J. C. & Watson, H. C. Comparison between amino-acid sequences of sperm whale myoglobin and of human haemoglobin. Nature 190, 670 (1961).
9. Flaherty, K. M., McKay, D. B., Kabsch, W. & Holmes, K. C. Similarity of the three-dimensional structures of actin and the ATPase fragment of a 70-kDa heat shock cognate protein. Proc. Natl Acad. Sci. USA 88, 5041–5045 (1991).
10. Aravind, L., Leipe, D. D. & Koonin, E. V. Toprim - a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res. 26, 4205–4213 (1998).
11. Berger, J. M., Fass, D., Wang, J. C. & Harrison, S. C. Structural similarities between topoisomerases that cleave one or both DNA strands. Proc. Natl Acad. Sci. USA 95, 7876–7881 (1998).
12. Brenner, S. E., Chothia, C. & Hubbard, T. J. P. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl Acad. Sci. USA 95, 6073–6078 (1998).
13. Bjorkman, P. J. et al. Structure of the human class I histocompatibility antigen, HLA-A2. Nature 329, 506–512 (1987).
14. Wilson, I. A. & Garcia, K. C. T-cell receptor structure and TCR complexes. Curr. Opin. Struct. Biol. 7, 839–848 (1997).
15. Blundell, T. L. & Mizuguchi, K. Structural genomics: an overview. Prog. Biophys. Mol. Biol. 73, 289–295 (2000).
16. Burley, S. K. et al. Structural genomics: beyond the human genome project. Nature Genet. 23, 151–157 (1999).
17. Domingues, F. S., Koppensteiner, W. A. & Sippl, M. J. The role of protein structure in genomics. FEBS Lett. 476, 98–102 (2000).
18. Gaasterland, T. Structural genomics: bioinformatics in the driver’s seat. Nature Biotechnol. 16, 625–627 (1998).
19. Kim, S. H. Shining a light on structural genomics. Nature Struct. Biol. 5, 643–645 (1998).
20. Mittl, P. R. & Grutter, M. G. Structural genomics: opportunities and challenges. Curr. Opin. Chem. Biol. 5, 402–408 (2001).
21. Montelione, G. T. & Anderson, S. Structural genomics: keystone for a Human Proteome Project. Nature Struct. Biol. 6, 11–12 (1999).
22. Sali, A. 100,000 protein structures for the biologist. Nature Struct. Biol. 5, 1029–1032 (1998).
23. Shapiro, L. & Lima, C. D. The Argonne Structural Genomics Workshop: Lamaze class for the birth of a new science. Structure 6, 265–267 (1998).
24. Smith, T. A new era. Nature Struct. Biol. 7, 927 (2000).
    The introduction to a supplement to Nature Structural Biology devoted to structural genomics, which contains 20 articles that address different aspects of the field.
25. Teichmann, S. A., Chothia, C. & Gerstein, M. Advances in structural genomics. Curr. Opin. Struct. Biol. 9, 390–399 (1999).
26. Teichmann, S. A., Murzin, A. G. & Chothia, C. Determination of protein function, evolution and interactions by structural genomics. Curr. Opin. Struct. Biol. 11, 354–363 (2001).
    This review includes an analysis of 32 structural genomics proteins and presents lessons learned in each case.
27. Doudna, J. A. Structural genomics of RNA. Nature Struct. Biol. 7, 954–956 (2000).
28. Edwards, A. M. et al. Protein production: feeding the crystallographers and NMR spectroscopists. Nature Struct. Biol. 7, 970–972 (2000).
29. Waldo, G. S., Standish, B. M., Berendzen, J. & Terwilliger, T. C. Rapid protein-folding assay using green fluorescent protein. Nature Biotechnol. 17, 691–695 (1999).
30. Yokoyama, S. et al. Structural genomics projects in Japan. Prog. Biophys. Mol. Biol. 73, 363–376 (2000).
31. Christendat, D. et al. Structural proteomics of an archaeon. Nature Struct. Biol. 7, 903–909 (2000).
    Describes the determination of ten protein structures from M. thermoautotrophicum, using the principle of finding proteins that are most amenable to structural characterization.
32. Montelione, G. T., Zheng, D., Huang, Y. J., Gunsalus, K. C. & Szyperski, T. Protein NMR spectroscopy in structural genomics. Nature Struct. Biol. 7, 982–985 (2000).
33. Terwilliger, T. C. Structural genomics in North America. Nature Struct. Biol. 7, 935–939 (2000).
34. Abola, E., Kuhn, P., Earnest, T. & Stevens, R. C. Automation of X-ray crystallography. Nature Struct. Biol. 7, 973–977 (2000).
35. Bertone, P. et al. SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics. Nucleic Acids Res. 29, 2884–2898 (2001).
36. Hendrickson, W. A. Synchrotron crystallography. Trends Biochem. Sci. 25, 637–643 (2000).
37. Wider, G. & Wuthrich, K. NMR spectroscopy of large molecules and multimolecular assemblies in solution. Curr. Opin. Struct. Biol. 9, 594–601 (1999).
38. Prestegard, J. H., Valafar, H., Glushka, J. & Tian, F. Nuclear magnetic resonance in the era of structural genomics. Biochemistry 40, 8677–8685 (2001).
39. Yokoyama, S. et al. Structural genomics projects in Japan. Nature Struct. Biol. 7, 943–945 (2000).
40. Adams, P. D. & Grosse-Kunstleve, R. W. Recent developments in software for the automation of crystallographic macromolecular structure determination. Curr. Opin. Struct. Biol. 10, 564–568 (2000).
41. Lamzin, V. S. & Perrakis, A. Current state of automated crystallographic data analysis. Nature Struct. Biol. 7, 978–981 (2000).
42. Helgstrand, M., Kraulis, P., Allard, P. & Hard, T. Ansig for Windows: an interactive computer program for semiautomatic assignment of protein NMR spectra. J. Biomol. NMR 18, 329–336 (2000).
43. Zimmerman, D. E. et al. Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol. 269, 592–610 (1997).
44. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
45. Norvell, J. C. & Machalek, A. Z. Structural genomics programs at the US National Institute of General Medical Sciences. Nature Struct. Biol. 7, 931 (2000).
46. Vitkup, D., Melamud, E., Moult, J. & Sander, C. Completeness in structural genomics. Nature Struct. Biol. 8, 559–566 (2001).
    This paper predicts the number of structure determinations necessary to provide three-dimensional models of all (or most) families of proteins.
47. Bateman, A. et al. The Pfam protein families database. Nucleic Acids Res. 28, 263–266 (2000).
48. Kim, K. K., Hung, L. W., Yokota, H., Kim, R. & Kim, S. H. Crystal structures of eukaryotic translation initiation factor 5A from Methanococcus jannaschii at 1.8 Å resolution. Proc. Natl Acad. Sci. USA 95, 10419–10424 (1998).
    A report of one of the first structural genomics proteins solved; it represented inadvertent duplication of effort, as the same structure was independently solved in the next reference.
49. Peat, T. S., Newman, J., Waldo, G. S., Berendzen, J. & Terwilliger, T. C. Structure of translation initiation factor 5A from Pyrobaculum aerophilum at 1.75 Å resolution. Structure 6, 1207–1214 (1998).
50. Sinha, S. et al. Crystal structure of Bacillus subtilis YabJ, a purine regulatory protein and member of the highly conserved YjgF family. Proc. Natl Acad. Sci. USA 96, 13074–13079 (1999).
51. Volz, K. A test case for structure-based functional assignment: the 1.2 Å crystal structure of the YjgF gene product from Escherichia coli. Protein Sci. 8, 2428–2437 (1999).
52. Smaglik, P. Protein structure groups seek to draft common ground rules. Nature 403, 691 (2000).
53. Brenner, S. E. Target selection for structural genomics. Nature Struct. Biol. 7, 967–969 (2000).
54. Kuroda, Y., Tani, K., Matsuo, Y. & Yokoyama, S. Automated search of natively folded protein fragments for high-throughput structure determination in structural genomics. Protein Sci. 9, 2313–2321 (2000).
55. Dietmann, S. et al. A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucleic Acids Res. 29, 55–57 (2001).
    An introduction to one of the most popular systems for automatically comparing proteins of known structure.
56. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540 (1995).
    The SCOP database is a comprehensive expert-curated hierarchical evolutionary classification of protein domains using structural information.
57. Pearl, F. M. et al. A rapid classification protocol for the CATH Domain Database to support structural genomics. Nucleic Acids Res. 29, 223–227 (2001).
    An introduction to CATH, a largely automated hierarchical classification of protein domain structures.
58. Siddiqui, A. S., Dengler, U. & Barton, G. J. 3Dee: a database of protein structural domains. Bioinformatics 17, 200–201 (2001).
59. Apic, G., Gough, J. & Teichmann, S. A. Domain combinations in archaeal, eubacterial and eukaryotic proteomes. J. Mol. Biol. 310, 311–325 (2001).
60. Apic, G., Gough, J. & Teichmann, S. A. An insight into domain combinations. Bioinformatics 17 (Suppl. 1), S83–S89 (2001).
61. Saha, S. et al. Solution structure of the LDL receptor EGF-AB pair. A paradigm for the assembly of tandem calcium binding EGF domains. Structure 9, 451–456 (2001).
62. Gerstein, M. Integrative database analysis in structural genomics. Nature Struct. Biol. 7, 960–963 (2000).
63. Fischer, D. Rational structural genomics: affirmative action for ORFans and the growth in our structural knowledge. Protein Eng. 12, 1029–1030 (1999).
    This paper describes interesting features of genes without homologues and the ability of structural genomics to elucidate their provenance.
64. Galperin, M. Y. Conserved ‘hypothetical’ proteins: new hints and new puzzles. Comp. Funct. Genomics 2, 14–18 (2001).
65. Linial, M. & Yona, G. Methodologies for target selection in structural genomics. Prog. Biophys. Mol. Biol. 73, 297–320 (2000).
66. Mallick, P., Goodwill, K. E., Fitz-Gibbon, S., Miller, J. H. & Eisenberg, D. Selecting protein targets for structural genomics of Pyrobaculum aerophilum: validating automated fold assignment methods by using binary hypothesis testing. Proc. Natl Acad. Sci. USA 97, 2450–2455 (2000).
67. Erlandsen, H., Abola, E. E. & Stevens, R. C. Combining structural genomics and enzymology: completing the picture in metabolic pathways and enzyme active sites. Curr. Opin. Struct. Biol. 10, 719–730 (2000).
68. Lewis, H. A. et al. A structural genomics approach to the study of quorum sensing. Crystal structures of three LuxS orthologs. Structure 9, 527–537 (2001).
69. Terwilliger, T. C. et al. Class-directed structure determination: foundation for a protein structure initiative. Protein Sci. 7, 1851–1856 (1998).
70. Shapiro, L. & Harris, T. Finding function through structural genomics. Curr. Opin. Biotechnol. 11, 31–35 (2000).
71. Skolnick, J., Fetrow, J. S. & Kolinski, A. Structural genomics and its importance for gene function analysis. Nature Biotechnol. 18, 283–287 (2000).
72. Thornton, J. M. From genome to function. Science 292, 2095–2097 (2001).
73. Thornton, J. M., Todd, A. E., Milburn, D., Borkakoti, N. & Orengo, C. A. From structure to function: approaches and limitations. Nature Struct. Biol. 7, 991–994 (2000).
74. Berman, H. M. et al. The Protein Data Bank and the challenge of structural genomics. Nature Struct. Biol. 7, 957–959 (2000).
75. Gibrat, J. F., Madej, T. & Bryant, S. H. Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6, 377–385 (1996).
76. Orengo, C. A. & Taylor, W. R. SSAP: sequential structure alignment program for protein structure comparison. Methods Enzymol. 266, 617–635 (1996).
77. Shindyalov, I. N. & Bourne, P. E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11, 739–747 (1998).
78. Subbiah, S., Laurents, D. V. & Levitt, M. Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core. Curr. Biol. 3, 141–149 (1993).
79. Brenner, S. E. & Levitt, M. Expectations from structural genomics. Protein Sci. 9, 197–200 (2000).
    Uses historical data to predict the fraction of new folds and new superfamilies to be discovered by structural genomics.
80. Koppensteiner, W. A., Lackner, P., Wiederstein, M. & Sippl, M. J. Characterization of novel proteins based on known protein structures. J. Mol. Biol. 296, 1139–1152 (2000).
81. Cort, J. R., Yee, A., Edwards, A. M., Arrowsmith, C. H. & Kennedy, M. A. Structure-based functional classification of hypothetical protein MTH538 from Methanobacterium thermoautotrophicum. J. Mol. Biol. 302, 189–203 (2000).
82. Cort, J. R., Yee, A., Edwards, A. M., Arrowsmith, C. H. & Kennedy, M. A. NMR structure determination and structure-based functional characterization of conserved hypothetical protein MTH1175 from Methanobacterium thermoautotrophicum. J. Struct. Funct. Genomics 1, 15–25 (2001).
83. Fetrow, J. S., Godzik, A. & Skolnick, J. Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J. Mol. Biol. 282, 703–711 (1998).
84. Wallace, A. C., Borkakoti, N. & Thornton, J. M. TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites. Protein Sci. 6, 2308–2323 (1997).
85. Wei, L. & Altman, R. B. Recognizing protein binding sites using statistical descriptions of their 3D environments. Pac. Symp. Biocomput. 4, 497–508 (1998).
86. Lichtarge, O., Bourne, H. R. & Cohen, F. E. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257, 342–358 (1996).
87. Sowa, M. E. et al. Prediction and confirmation of a site critical for effector regulation of RGS domain activity. Nature Struct. Biol. 8, 234–237 (2001).
88. Boggon, T. J., Shan, W. S., Santagata, S., Myers, S. C. & Shapiro, L. Implication of tubby proteins as transcription factors by structure-based functional analysis. Science 286, 2119–2125 (1999).
    This paper predicts the DNA-binding function of tubby proteins on the basis of examination of the surface electrostatics of the structure.
89. Teplova, M. et al. The structure of the YrdC gene product from Escherichia coli reveals a new fold and suggests a role in RNA binding. Protein Sci. 9, 2557–2566 (2000).
90. Hwang, K. Y., Chung, J. H., Kim, S. H., Han, Y. S. & Cho, Y. Structure-based identification of a novel NTPase from Methanococcus jannaschii. Nature Struct. Biol. 6, 691–696 (1999).
91. Minasov, G. et al. Functional implications from crystal structures of the conserved Bacillus subtilis protein Maf with and without dUTP. Proc. Natl Acad. Sci. USA 97, 6328–6333 (2000).
92. Lim, K. et al. Crystal structure of YecO from Haemophilus influenzae (HI0319) reveals a methyltransferase fold and a bound S-adenosylhomocysteine. Proteins (in the press).
93. Zarembinski, T. I. et al. Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. Proc. Natl Acad. Sci. USA 95, 15189–15193 (1998).
    This paper reports that a bound ATP that was found in the solved structure indicated that this hypothetical protein is a molecular switch.
94. Sanchez, R. et al. Protein structure modeling for structural genomics. Nature Struct. Biol. 7, 986–990 (2000).
95. Friedberg, I., Kaplan, T. & Margalit, H. Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments. Protein Sci. 9, 2278–2284 (2000).
96. Sauder, J. M., Arthur, J. W. & Dunbrack, R. L. Jr Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins 40, 6–22 (2000).
97. Dunker, A. K. et al. Protein disorder and the evolution of molecular recognition: theory, predictions and observations. Pac. Symp. Biocomput. 473–484 (1998).
98. Wootton, J. C. & Federhen, S. Analysis of compositionally biased regions in sequence databases. Methods Enzymol. 266, 554–571 (1996).
99. Wright, P. E. & Dyson, H. J. Intrinsically unstructured proteins: re-assessing the protein structure–function paradigm. J. Mol. Biol. 293, 321–331 (1999).
100. Schaffer, A. A. et al. Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res. 29, 2994–3005 (2001).
101. Fowler, C. A., Tian, F., Al-Hashimi, H. M. & Prestegard, J. H. Rapid determination of protein folds using residual dipolar couplings. J. Mol. Biol. 304, 447–460 (2000).
102. Potts, B. C. & Chazin, W. J. Chemical shift homology in proteins. J. Biomol. NMR 11, 45–57 (1998).
103. Young, M. M. et al. High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Proc. Natl Acad. Sci. USA 97, 5802–5806 (2000).
    In this work, cross-linking and mass spectrometry were used to glean limited structural information, sufficient to predict a protein fold.
104. Simons, K. T., Strauss, C. & Baker, D. Prospects for ab initio protein structural genomics. J. Mol. Biol. 306, 1191–1199 (2001).
105. Wuthrich, K. Protein recognition by NMR. Nature Struct. Biol. 7, 188–189 (2000).
106. Baumeister, W. & Steven, A. C. Macromolecular electron microscopy in the era of structural genomics. Trends Biochem. Sci. 25, 624–631 (2000).
107. Heinemann, U. Structural genomics in Europe: slow start, strong finish? Nature Struct. Biol. 7, 940–942 (2000).
108. Butler, D. Wellcome discusses structural genomics effort with industry... but data release remains an open question. Nature 406, 923–924 (2000).
109. Williamson, A. R. Creating a structural genomics consortium. Nature Struct. Biol. 7, 953 (2000).
110. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
111. Orengo, C. A. et al. The CATH database provides insights into protein structure/function relationships. Nucleic Acids Res. 27, 275–279 (1999).
112. Brenner, S. E., Barken, D. & Levitt, M. The PRESAGE database for structural genomics. Nucleic Acids Res. 27, 251–253 (1999).
113. Sanchez, R. & Sali, A. ModBase: a database of comparative protein structure models. Bioinformatics 15, 1060–1061 (1999).
114. Huynen, M. et al. Homology-based fold predictions for Mycoplasma genitalium proteins. J. Mol. Biol. 280, 323–326 (1998).
115. Rychlewski, L., Zhang, B. & Godzik, A. Functional insights from structural predictions: analysis of the Escherichia coli genome. Protein Sci. 8, 614–624 (1999).
116. Teichmann, S. A., Park, J. & Chothia, C. Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc. Natl Acad. Sci. USA 95, 14658–14663 (1998).
117. Phillips, S. E. & Schoenborn, B. P. Neutron diffraction reveals oxygen–histidine hydrogen bond in oxymyoglobin. Nature 292, 81–82 (1981).
118. Fermi, G., Perutz, M. F., Shaanan, B. & Fourme, R. The crystal structure of human deoxyhaemoglobin at 1.74 Å resolution. J. Mol. Biol. 175, 159–174 (1984).
119. Bashford, D., Chothia, C. & Lesk, A. M. Determinants of a protein fold. Unique features of the globin amino acid sequences. J. Mol. Biol. 196, 199–216 (1987).
120. Sayle, R. A. & Milner-White, E. J. RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20, 374 (1995).
121. Kraulis, P. J. Molscript: a program to produce both detailed and schematic plots of protein structure. J. Appl. Crystallography 24, 946–950 (1991).
122. Merritt, E. A. & Bacon, D. J. Raster3d: photorealistic molecular graphics. Methods Enzymol. 277, 505–524 (1997).
123. Eisenstein, E. et al. Biological function made crystal clear - annotation of hypothetical proteins via structural genomics. Curr. Opin. Biotechnol. 11, 25–30 (2000).
124. Heinemann, U. et al. An integrated approach to structural genomics. Prog. Biophys. Mol. Biol. 73, 347–362 (2000).
125. Dry, S., McCarthy, S. & Harris, T. Structural genomics in the biotechnology sector. Nature Struct. Biol. 7, 946–949 (2000).

Acknowledgements
This work is supported by NIH grants and a Searle Scholarship. S.E.B. is grateful to J.-M. Chandonia, L. Lo Conte and R. Peters for critical review of the manuscript.

Online Links

DATABASES
The following terms in this article are linked online to:
InterPro: http://www.ebi.ac.uk/interpro/
TIM | TopRim
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/
TBP | tubby
OMIM: http://www.ncbi.nlm.nih.gov/Omim/
retinitis pigmentosa type 14

FURTHER INFORMATION
Airlie Agreement: http://www.nigms.nih.gov/news/meetings/airlie.html#agree
Airlie Conference: http://www.nigms.nih.gov/news/meetings/airlie.html
CATH: http://www.biochem.ucl.ac.uk/bsm/cath_new/
Dali: http://www.ebi.ac.uk/dali/
ModBase: http://pipe.rockefeller.edu/modbase/
National Institute of General Medical Sciences (NIGMS): http://www.nigms.nih.gov
Pfam: http://www.sanger.ac.uk/Software/Pfam/
PRESAGE: http://presage.berkeley.edu
Protein Data Bank: http://www.rcsb.org/pdb/
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
SNP Consortium: http://snp.cshl.org
Structuralgenomics.org: http://www.structuralgenomics.org
SWISS-PROT and TrEMBL: http://www.expasy.ch/sprot/
