There’s nothing like a new pair of eyeglasses to bring fine details into sharp relief. For scientists who study the large molecules of life, from proteins to DNA, the equivalent of new lenses has come in the form of an advanced method for analyzing data from X-ray crystallography experiments.
Reported in this week’s issue of the journal Science, the findings could lead to new understandings about the molecules that drive processes in biology, medical diagnostics, nanotechnology and other fields.
Like dentists who use X-rays to find tooth decay, scientists use X-rays to reveal the shape and structure of DNA, proteins, minerals and other molecules. As X-rays pass through the lattice of atoms in a crystal, they are scattered into distinctive patterns, and scientists use those patterns to determine which atoms are present and how they are bonded to each other. However, some data are typically discarded because of concerns over quality. In particular, data from the edge regions of the pattern, although very important for understanding the details of structure, are often overwhelmed by the random errors associated with measuring a weak signal amid a lot of background noise.
Oregon State University biophysicist Andy Karplus and his colleague Kay Diederichs at the University of Konstanz in Germany have now shown that useful information can be gleaned from data with up to about five times the noise level previously considered acceptable. “The criteria that have been used in the past are way too conservative,” said Karplus. “These data that people have been throwing out are actually good.”
The bottom line for crystallographers is the accuracy of their molecular models, those physical representations of the arrangement of atoms. The better the model, the better it will predict the pattern created by X-rays passing through a molecule, and the better it will be for guiding the development of new drugs and nanotechnologies that operate at the molecular scale. Although the first X-ray diffraction pattern was recorded 100 years ago and the first protein structures were determined 50 years ago, scientists have struggled to find statistical methods to connect data quality and the accuracy of their models.
The new method may be the most important conceptual advance in the past 20 years in how these data are used in modeling, the scientists said. In 1992, statistics were developed to ensure that models were not biased by randomness or “noise.” The new method carries that further by showing how data from parts of the measurement where noise becomes stronger can still provide information that makes the model more accurate. It also allows scientists to see directly where the model is limited by noise in the data and where the model is a better estimate of molecular structure than the experimental data.
“The question is, ‘Where do we cut it off?’” said Karplus, whose research focuses on protein structure and stability. By adding data at incremental steps and showing how the model improved, Karplus and Diederichs showed that scientists had been cutting off their analyses too soon and discarding data that could sharpen their view of molecular structure.
“The big impact on the field will be that every structure determined from here on out will be a little more accurate because people won’t throw away data that are OK. If you have a crummy image of the protein, it will get a little sharper. If you have a good image of the protein, it will also get a little sharper,” added Karplus.
For example, he noted, some enzymes work in concert with water molecules embedded within their structure. However, it takes data at a certain level of detail (about 2.6 angstroms) to discern exactly where water molecules are suspended between the atoms of an enzyme. If X-ray data at that scale are discarded, scientists may be unable to demonstrate conclusively that water is present and thus cannot properly understand how the enzyme works.
While the method will be an important step for X-ray crystallographers, Karplus and Diederichs think that other physical sciences may also find ways to benefit from this type of data quality analysis. They also discovered that one branch of science has been using this type of statistical analysis for many years. The field of psychometrics, the analysis of data from psychological tests, has used a similar technique called the “Spearman-Brown prophecy formula” to determine the minimum length of such tests.
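For readers curious about the Spearman-Brown prophecy formula, here is a minimal sketch of its standard psychometric form in Python. It predicts how the reliability of a test changes when the test is lengthened by a factor k, and inverts that relation to estimate the minimum length needed to reach a target reliability. The function names are hypothetical and chosen only for illustration; the sketch shows the psychometric formula alone and does not reproduce the statistics Karplus and Diederichs developed for crystallographic data.

```python
def predicted_reliability(r: float, k: float) -> float:
    """Spearman-Brown prophecy formula.

    r: reliability of the current test (between 0 and 1)
    k: factor by which the test length is multiplied
    Returns the predicted reliability of the lengthened test.
    """
    return k * r / (1.0 + (k - 1.0) * r)


def required_length_factor(r: float, r_target: float) -> float:
    """Inverse of the formula above.

    Returns the factor k by which a test with reliability r must be
    lengthened to reach the target reliability r_target.
    """
    if not (0.0 < r < 1.0 and 0.0 < r_target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return r_target * (1.0 - r) / (r * (1.0 - r_target))


if __name__ == "__main__":
    # Doubling a test whose reliability is 0.5
    print(predicted_reliability(0.5, 2))
    # How much longer must that test be to reach a reliability of 0.8?
    print(required_length_factor(0.5, 0.8))
```

In this example, doubling a test with reliability 0.5 raises the predicted reliability to about 0.67, and reaching 0.8 would require a test roughly four times as long.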
Karplus and Diederichs have worked together off and on since 1985, when Karplus was an Alexander von Humboldt post-doctoral fellow in Germany. In 1997, they published a paper demonstrating that certain statistics used in analyzing X-ray crystallography data were misleading, but few crystallographers have adjusted their practices since that time. In 2011, during a sabbatical leave, Karplus visited Diederichs in Germany to develop the new method. “Now that we know that very noisy data are useful, this will presumably enable still further improvements as it stimulates new software development to do a better job of handling such weak data,” said Karplus.
The paper is also the subject of a Perspectives piece in the same issue of Science by Phil Evans of the MRC Laboratory of Molecular Biology in Cambridge, England. The research was supported by grants from the National Institutes of Health and the Alexander von Humboldt Foundation.