Now scientists at the U.S. Department of Energy's
Brookhaven National Laboratory have written a computer program "to
sort the informational 'wheat' from the 'chaff,'" said Brookhaven
biochemist John Shanklin, who leads the research team. The program,
which is described in the open access journal BMC Bioinformatics*,
makes comparisons of groups of related proteins and flags individual
amino acid positions that are likely to control function.
Biochemists are interested in identifying "active
sites" -- regions of proteins that determine their functions -- and
learning how these sites differ between paralogs, proteins that arose
from a common ancestor but have different functions. The new program,
called CPDL for "conserved property difference locator," identifies
positions where two related groups of proteins differ either in amino
acid identity or in a property such as charge or polarity.
"Experience tells us that such positions are likely
to be biologically important for defining the specific functions of
the two protein classes," Shanklin said.
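The comparison described above can be illustrated with a minimal Python sketch. It is not the CPDL code itself. It assumes the two groups are already aligned to the same length, it uses a deliberately simplified charge table, and the function names and toy sequences are all hypothetical. A position is flagged when each group is internally consistent (in identity, or failing that, in charge) but the two groups disagree.

```python
# Simplified, hypothetical charge classes; the article does not list the
# property groupings that CPDL actually uses.
CHARGE = {"D": "negative", "E": "negative", "K": "positive", "R": "positive"}


def charge_of(residue):
    """Map a one-letter amino acid code to a coarse charge class."""
    return CHARGE.get(residue, "neutral")


def conserved_value(column, key=lambda residue: residue):
    """Return the value shared by every residue in the column, or None."""
    values = {key(residue) for residue in column}
    return values.pop() if len(values) == 1 else None


def flag_positions(group_a, group_b):
    """List (position, kind) pairs where both groups are conserved but differ.

    Assumes every sequence in both groups is pre-aligned to the same length.
    """
    flagged = []
    columns = zip(zip(*group_a), zip(*group_b))
    for pos, (col_a, col_b) in enumerate(columns):
        # First check for a conserved identity difference.
        id_a, id_b = conserved_value(col_a), conserved_value(col_b)
        if id_a is not None and id_b is not None and id_a != id_b:
            flagged.append((pos, "identity"))
            continue
        # Otherwise check for a conserved property (charge) difference.
        ch_a = conserved_value(col_a, charge_of)
        ch_b = conserved_value(col_b, charge_of)
        if ch_a is not None and ch_b is not None and ch_a != ch_b:
            flagged.append((pos, "charge"))
    return flagged


# Toy, made-up aligned sequences for two functionally different groups.
group_a = ["MKDLA", "MKELA", "MKDLG"]
group_b = ["MRKLA", "MRRLS", "MRKLT"]
print(flag_positions(group_a, group_b))
```

In this toy input, position 1 is K in every sequence of the first group and R in every sequence of the second, so it is flagged as an identity difference. Position 2 varies within each group but is always negatively charged in one and positively charged in the other, so it is flagged as a property difference.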
When the Brookhaven team used the program to scan
three test cases, each consisting of two groups of related but
functionally different enzymes, the program consistently identified
positions near enzyme active sites that had been previously predicted
from structural and/or biochemical studies to be important for the
enzymes' specificity and/or function. "This suggests that CPDL will
have broad utility for identifying amino acid residues likely to play
a role in distinguishing protein classes," Shanklin said.
Scientists have already used such comparative
sequence analysis to identify protein active sites, and have also used
this knowledge to alter enzyme functions by swapping particular amino
acid residues in one class of enzyme to convert it into the related but
functionally different class. But comparing sequences "manually" is
labor intensive and error prone, and has become impractical for those who
wish to take advantage of the increasing number of sequences in
protein databases, Shanklin said.
"Yet this growing data resource contains a wealth
of information for structure-function studies and for protein
engineering," Shanklin said. "We developed CPDL as a general tool for
extracting and displaying relevant functional information from such
data sets."
Also, since CPDL does not require that a protein's
structure be known -- just its amino acid sequence -- it can be
applied to studies of proteins that reside in the cell membrane, for
which it is notoriously difficult to determine a molecular structure.