Looking for Trouble

The reliability of a weather station is subject to the wanderings of wildlife. Frogs crawl into rainfall collection buckets. Insects build nests in air tubes. Rodents chew through wires. And that’s on top of damage from dust, high winds, ice and hail — or simple equipment failure.

Maintaining weather station networks is a labor-intensive enterprise, says Tom Dietterich, professor in the Oregon State School of Electrical Engineering and Computer Science and an expert in artificial intelligence. With a $1 million grant from the National Science Foundation, he aims to automate part of the process by detecting bad data through software.

[Photo: Tadesse Zemicheal]

Dietterich and Ph.D. student Tadesse Zemicheal, a graduate of the Eritrea Institute of Technology, are testing computer algorithms — instructions that treat data like the ingredients in a recipe — to identify faulty sensors in a network. Rapidly pinpointing stations that are performing badly would let technicians prioritize repairs. Currently, every station in a network must be visited on a predetermined schedule to make sure things are running smoothly.

Teasing out faulty data automatically would also improve the reliability of weather reports. Normally, it takes teams of meteorologists to review the raw data generated by weather stations and to confirm that the information reflects actual conditions. “Our goal is to reduce the number of meteorologists you need to run a network,” says Dietterich. The challenge, he adds, is to understand how much of that expertise can be automated and how that knowledge can be used to make the system smarter.

Software versus Humans

So far, computers are no match for the professional judgments of trained meteorologists. Software generally takes one of two routes, says Dietterich. One is called “complex quality control.” It assumes that data will follow a predictable pattern and flags readings that fall outside an expected range. For example, solar radiation sensors should not indicate that the sun is shining at night. Likewise, a wind vane, which shows wind direction, should not report values when the air is still. A thermometer should not show an unrealistic rise or fall in temperature.
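As a rough illustration of this rule-based approach, the sketch below encodes a few such checks in Python. The field names and thresholds are invented for the example, not taken from any operational network.

```python
# Minimal sketch of rule-based ("complex") quality control.
# The record fields and thresholds are illustrative assumptions,
# not the actual rules used in any real weather network.

def flag_reading(reading):
    """Return the reasons a reading looks suspect (empty list if it passes)."""
    problems = []

    # Solar radiation should be near zero at night.
    if reading["is_night"] and reading["solar_wm2"] > 5:
        problems.append("solar radiation reported at night")

    # A wind vane should not report a direction when there is no wind.
    if reading["wind_speed_ms"] == 0 and reading["wind_dir_deg"] is not None:
        problems.append("wind direction reported in still air")

    # Temperature should not jump implausibly between consecutive readings.
    if abs(reading["temp_c"] - reading["prev_temp_c"]) > 10:
        problems.append("unrealistic temperature change")

    return problems


if __name__ == "__main__":
    sample = {
        "is_night": True,
        "solar_wm2": 320.0,      # sunshine at midnight: suspicious
        "wind_speed_ms": 0.0,
        "wind_dir_deg": 270.0,   # a direction with no wind: suspicious
        "temp_c": 14.0,
        "prev_temp_c": 13.5,
    }
    print(flag_reading(sample))
```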

Software can also, in effect, play doctor. Such methods are known as “probabilistic quality control.”

“The idea is to treat bad sensor data like a medical diagnosis,” says Dietterich. “If you have a disease, what’s the probability that you observe a fever? Plus you may have a prior probability, which is the likelihood of having that disease, given your age, sex and so forth.”
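To make the analogy concrete, the toy calculation below applies Bayes’ rule to a single made-up “symptom.” Every number in it is invented for illustration, not measured from real sensors.

```python
# Toy illustration of the "medical diagnosis" framing for sensor faults.
# All probabilities here are invented for the example.

p_fault = 0.02             # prior: how often this kind of sensor is broken
p_symptom_if_fault = 0.90  # chance a broken sensor shows this odd signal
p_symptom_if_ok = 0.05     # chance a healthy sensor shows it anyway (e.g., a real storm)

# Bayes' rule: P(fault | symptom)
p_symptom = p_symptom_if_fault * p_fault + p_symptom_if_ok * (1 - p_fault)
p_fault_given_symptom = p_symptom_if_fault * p_fault / p_symptom

print(f"Probability the sensor is faulty given the symptom: {p_fault_given_symptom:.2f}")
```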

[Photo: Tom Dietterich]

The problem, he adds, is that there is no list of sensor “diseases.” Moreover, new sensors fail in new, innovative ways.

Rain gauges, he says, are the worst. “They have to deal with a moving physical substance that has to get measured. They often have moving parts, and insects like to build houses in there or get inside. Leaves, dust or bird droppings can fall into the collecting cone. Each one of those manifests in some weird signal. Rain gauges measure the most important variable but are the most trouble-prone sensor. They are the Mount Everest of sensor detection.”

Apps for Sustainability

Zemicheal grew up in Asmara, Eritrea’s capital, where he studied computer engineering as an undergraduate. He came to Oregon State after searching for a program that applies artificial intelligence to sustainability.

When Dietterich received an inquiry from Zemicheal, the OSU professor had already begun collaborating with colleague and fellow OSU engineer John Selker, co-founder of the Trans-African Hydro-Meteorological Observatory. Selker’s goal is to install 20,000 weather stations across sub-Saharan Africa. Dietterich was looking for a graduate student to bring expertise — both technical and cultural — to that endeavor.

When he was a student in Eritrea, Zemicheal was impressed by computer software with a practical benefit for farmers. It was a cellphone app that detects disease on cassava plants. “Farmers can take a picture of the leaves of a plant and upload it to the main server at the university. The server processes the image and can tell if there is a disease,” he says.

In his research at OSU, Zemicheal is using data from Earth Networks, a company that maintains weather stations around the world, and from Oklahoma’s Mesonet, which Dietterich calls “the world’s best weather network.” Zemicheal’s goal is to determine how well different computer algorithms (one from Monash University in Australia is called Isolation Forest) can distinguish good data from bad. The implications go far beyond knowing the weather, he says, and include the performance of any sensor network.
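For readers curious what that looks like in practice, here is a minimal sketch that runs the scikit-learn implementation of Isolation Forest on synthetic readings. The features and the contamination setting are assumptions for the example, not details of Zemicheal’s actual experiments.

```python
# Minimal sketch of using Isolation Forest to separate plausible from
# anomalous readings. The data are synthetic; this is not the project's
# actual pipeline, just the scikit-learn implementation of the algorithm.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Pretend each row is (temperature in C, relative humidity in %) from one station.
normal = np.column_stack([
    rng.normal(15, 3, size=500),    # typical temperatures
    rng.normal(60, 10, size=500),   # typical humidity
])
faulty = np.array([[15.0, 250.0],   # impossible humidity
                   [-80.0, 60.0]])  # implausible temperature
readings = np.vstack([normal, faulty])

# Assume roughly 1% of readings are bad (an illustrative guess).
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(readings)   # +1 = looks normal, -1 = flagged

print("Flagged rows:", np.where(labels == -1)[0])
```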

More Machines

In fact, adds Dietterich, separating reliable and faulty data is crucial in a world increasingly interwoven with machines. The Internet already connects more sensors and devices than it does people (what experts call the “Internet of Things”). Sensors monitor building security, show when fruit is ripe and warn of potential airplane collisions. As the amount of sensor-based information grows, being able to rapidly evaluate the quality of data will be critical to the reliability of such systems.

“As we move toward more and more sensors out there, our existing ways of doing data quality will not scale. Our ambition is that we can develop methods that can scale better in terms of requiring a lot fewer experts,” says Dietterich.

There will always be a need for human intervention, he adds. “When you’re dealing with life and death circumstances, you’re going to want to have humans in the loop, always. But sensors break. That’s where we come in.”

In addition to Zemicheal, Dietterich’s team includes Selker and Michael Piasecki, a specialist in informatics at the City College of New York.