By Rachel Robertson
Although data science and engineering aim to improve our lives — and have done so in many ways — the threat of misuse of our private data and of intrusion into our personal affairs is very real.
Vast amounts of information travel nearly instantaneously between giant server farms and our home computers and handheld devices. We rely on these data for banking, shopping, buying airline tickets, making dates and countless other daily activities. Institutions and businesses collect, analyze and store these data to better serve customers, improve health, perform political functions and much more.
Unencrypted data are vulnerable to attacks that can have significant consequences. In 2014, a hack of Sony’s computer system exposed company and personal secrets and destroyed information. An attack on an individual may be as minor as a leaked email address, but it can also lead to identity theft and major financial headaches. If medical or genomic databases are breached, sensitive personal information such as health status and predisposition to disease could be revealed.
“We need techniques that protect these data, and cryptography traditionally serves as the backbone of security systems to protect the information,” says Attila Yavuz, assistant professor of computer science.
Data in secure systems are encrypted using permutation and substitution operations (methods that scramble and distort data), applied again and again until the information becomes nearly indistinguishable from random bits. Decryption reverses these steps, using the same secret key, to recover the original data.
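The repeated rounds of substitution and permutation can be sketched in a few lines of Python. This is a toy cipher on 16-bit blocks, offered only to show the structure; the S-box, bit permutation and round keys below are illustrative assumptions, not taken from any system mentioned here, and the result is far too small to be secure.

```python
# Toy substitution-permutation network on 16-bit blocks (illustration only).
# SBOX, PERM and ROUND_KEYS are assumed values chosen for the example.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
INV_SBOX = [SBOX.index(i) for i in range(16)]

# Bit i of the block moves to position PERM[i].
PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
INV_PERM = [PERM.index(i) for i in range(16)]

ROUND_KEYS = [0x3A94, 0xD63F, 0x1B2C, 0x77E0, 0xA5A5]


def substitute(block, box):
    """Replace each 4-bit nibble using the lookup table `box`."""
    out = 0
    for nibble in range(4):
        value = (block >> (4 * nibble)) & 0xF
        out |= box[value] << (4 * nibble)
    return out


def permute(block, table):
    """Move bit i of the block to position table[i]."""
    out = 0
    for i in range(16):
        if (block >> i) & 1:
            out |= 1 << table[i]
    return out


def encrypt(block, round_keys=ROUND_KEYS):
    """Mix in a key, substitute, permute; repeat; finish with a last key."""
    for key in round_keys[:-1]:
        block ^= key
        block = substitute(block, SBOX)
        block = permute(block, PERM)
    return block ^ round_keys[-1]


def decrypt(block, round_keys=ROUND_KEYS):
    """Undo every step of encrypt() in reverse order."""
    block ^= round_keys[-1]
    for key in reversed(round_keys[:-1]):
        block = permute(block, INV_PERM)
        block = substitute(block, INV_SBOX)
        block ^= key
    return block
```

Calling decrypt(encrypt(x)) returns x for every 16-bit value, because each round is undone step by step in reverse order. Production ciphers follow the same pattern with much larger blocks and keys and carefully analyzed tables.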
“We are only in the first stages of a society centered around massive data,” says Mike Rosulek, assistant professor of computer science. “I don’t think we yet understand all the implications of generating and storing it all. I hope it doesn’t take a catastrophic breach to make people realize that cryptographic precautions are necessary.”
Cryptography has come a long way since Alan Turing helped crack the Enigma codes during World War II. Advances have made it possible to perform operations on encrypted data without decrypting them first and without leaking critical information.
“We call it the privacy-versus-data-utilization dilemma,” Yavuz says. “When we use strong encryption, accessing and analyzing these data become very difficult. Unless we can break this trade-off, it is really difficult for us to achieve both secure and usable information. So our objective is to fill this gap and create a system where we can search and analyze without compromising the data-analytics functionalities.”
Yavuz is currently working with Robert Bosch LLC to provide more security for data collected from the company’s medical devices. The research has demonstrated that his dynamic searchable encryption algorithms can run encrypted queries against an encrypted dataset in 2 to 10 milliseconds per search, without decrypting the dataset. The ultimate goal of his research is to integrate the algorithms into Bosch’s telemedicine database, so that practitioners can remotely access patient data while keeping them secure.
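The general idea of searching encrypted records without decrypting them can be illustrated with a minimal sketch. This is not Yavuz’s algorithm, which the article does not describe; it is a simplified, static keyword index in which the server sees only keyed-hash tokens. The function names, record identifiers and keywords are hypothetical, and a real scheme would also encrypt the records and limit what repeated searches reveal.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def search_token(key: bytes, keyword: str) -> bytes:
    """Client side: derive an opaque token for a keyword with HMAC-SHA256."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, documents: dict) -> dict:
    """Client side: map each keyword token to the record IDs that contain it.

    `documents` maps a record ID to a list of plaintext keywords. Only the
    resulting index, which holds no plaintext keywords, goes to the server.
    """
    index = defaultdict(list)
    for record_id, keywords in documents.items():
        for word in set(keywords):
            index[search_token(key, word)].append(record_id)
    return dict(index)


def server_search(index: dict, token: bytes) -> list:
    """Server side: look up a token without learning the keyword behind it."""
    return sorted(index.get(token, []))


# Hypothetical example data.
client_key = secrets.token_bytes(32)
records = {
    "rec-1": ["diabetes", "insulin"],
    "rec-2": ["asthma"],
    "rec-3": ["diabetes", "hypertension"],
}
encrypted_index = build_index(client_key, records)
matches = server_search(encrypted_index, search_token(client_key, "diabetes"))
```

Here matches holds the identifiers of the two records tagged with "diabetes", while a token derived from a different key matches nothing. The sketch still reveals which records match each query, one of the leaks that research-grade schemes work to reduce.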
In contrast to Yavuz’s work, which advances applied cryptographic techniques for a specific purpose, Rosulek’s research focuses on theoretical cryptography — finding solutions that can be used to support any kind of computation on encrypted data.
Over the last couple of years, Rosulek has been working on what are known as “garbled circuits.” These are not physical circuits but cryptographic constructions in which computations on encrypted data are performed.
“You can think of a garbled circuit as a sealed box,” he says. “This box is like an isolation or containment chamber with gloves attached to it, so the scientist can reach in and manipulate what is inside. But the garbled circuit is a black box so the operations being performed are not visible.”
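A single garbled gate gives a feel for the sealed box. The sketch below garbles one AND gate in the textbook style: each wire value is replaced by a random label, and the gate becomes a shuffled table of four ciphertexts. It is a simplified illustration under assumed choices (SHA-256 as the encryption pad, a block of zero bytes to recognize the correct row), not Rosulek’s construction.

```python
import hashlib
import random
import secrets

LABEL_BYTES = 16  # assumed label length for this sketch


def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """Derive a 32-byte one-time pad from a pair of input labels."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def garble_and_gate():
    """Garbler: pick random labels and build the encrypted truth table."""
    wires = {
        name: (secrets.token_bytes(LABEL_BYTES), secrets.token_bytes(LABEL_BYTES))
        for name in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            # Output label for (a AND b), followed by zero bytes as a check.
            plaintext = wires["out"][bit_a & bit_b] + bytes(LABEL_BYTES)
            table.append(_xor(plaintext, _pad(wires["a"][bit_a], wires["b"][bit_b])))
    random.SystemRandom().shuffle(table)  # hide which row is which
    return wires, table


def evaluate(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator: holding one label per input wire, recover one output label."""
    pad = _pad(label_a, label_b)
    for row in table:
        candidate = _xor(row, pad)
        if candidate[LABEL_BYTES:] == bytes(LABEL_BYTES):
            return candidate[:LABEL_BYTES]
    raise ValueError("no table row decrypted correctly")
```

The evaluator holds only random-looking labels, so it can compute the gate without seeing the underlying bits. In this naive form, the garbler must send four ciphertexts for every gate.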
The idea for garbled circuits was first introduced in the 1980s, but it wasn’t until the early 2000s that people began implementing the techniques with real data. In the last decade, researchers have been working on finding ways to make these operations more efficient.
“Most modern processors have specialized hardware for cryptographic computations, so the computational cost is under control,” Rosulek says. “The bottleneck is the amount of information that has to be exchanged between the parties that are doing the computation. The improvements we make are new, clever ways to encode these encrypted data with all the guarantees of a garbled circuit.”
In two recent publications, Rosulek and colleagues demonstrate that, compared to other approaches, their new algorithms are 33 percent faster and have 33 percent less overhead in the amount of communication required for the computations.
“We also proved that by looking at all the known techniques, you can’t do better than our most recent work,” he adds. “So we have shown that our techniques are optimal until someone invents something totally new. It’s been fun, because it’s spurring people to think of different ideas.”
The challenge of protecting private data is not likely to diminish. The amount of data in the world is doubling every two years and, according to the International Data Corporation, will reach 44 zettabytes (44 trillion gigabytes) by 2020. Much of that will be personal information collected by the millions of smart devices in our homes and cars, or even attached to our bodies.
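The figures above can be checked with simple arithmetic. The short sketch below converts zettabytes to gigabytes using decimal units and works out what doubling every two years implies; the function name and its default parameters are illustrative assumptions, and any projection from it holds only if the doubling rate stays constant.

```python
ZETTABYTE = 10**21  # bytes, decimal (SI) definition
GIGABYTE = 10**9    # bytes, decimal (SI) definition

projected_2020_zb = 44
# 44 zettabytes expressed in gigabytes: 44 * 10**12, i.e. 44 trillion.
gigabytes = projected_2020_zb * ZETTABYTE // GIGABYTE

# Doubling every two years is a yearly growth factor of the square root of 2,
# or roughly 41 percent growth per year.
annual_growth = 2 ** (1 / 2)


def projected_zettabytes(year, base_year=2020, base_zb=44.0, doubling_years=2.0):
    """Data volume in zettabytes, assuming a constant doubling period."""
    return base_zb * 2 ** ((year - base_year) / doubling_years)
```

Under that assumption, the same rule implies about 22 zettabytes two years before the 2020 mark.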
“One should never forget that it’s vitally important for us to be able to secure our information, because in the future, that will be the single most valuable thing that mankind will possess,” Yavuz says. “Researchers and every individual should realize the importance and value of the information that they have at their hands and try their best to keep it secure.”