Designing Novel Storage System for Better Management of Data from Scientific Instruments
ECE Associate Professor Devesh Tiwari received a DOE award for “End‐to‐end Object‐focused Software‐defined Data Management for Science.”
Northeastern computer scientists to design state-of-the-art ‘data management’ framework, meeting critical need in research community
When it comes to the world of scientific research, Sherlock Holmes once said it best. It’s all about: “Data! Data! Data!”
Indeed, the beloved Arthur Conan Doyle character understood the value of following the data—the basic building block of science, human advancement and, in effect, truth.
Today, scientific applications are producing an unprecedented amount of data, so much so that it’s prompting a demand for better data management techniques. As a result, practitioners and researchers in computer science are working on more sophisticated methods to store, process, secure, and ultimately leverage data in business and research settings.
That’s what Devesh Tiwari, associate professor of electrical and computer engineering, hopes to do with new funding from the U.S. Department of Energy. Tiwari is part of a multi-university and national lab team that was recently awarded $2.7 million to design a software framework that would make it easier for domain scientists using data storage systems to glean insights from “massive scientific datasets,” he says.
“We will design novel storage interfaces and tools for autonomous and efficient data movement and metadata management,” Tiwari says.
Ultimately, that means designing techniques to facilitate and better manage the process of moving, or extracting, data from scientific instruments to storage systems, including cloud-based storage systems. As scientific instruments become ever more complex and sophisticated, the need for efficient data storage systems becomes greater.
One example Tiwari gives to illustrate the project’s aims is the so-called “lattice light-sheet” microscope, which can produce exquisite images of biological specimens at “subcellular resolutions.” These cutting-edge devices produce far more data than they can store, which is a significant problem for the scientists using them.
As with any task on a computer, when data “takes flight” (moves electronically from source to storage destination) processing power is needed, which translates into energy usage and, therefore, energy costs. The mere movement or transfer of data can actually use a significant amount of energy—up to 62% of overall energy consumption in computing, according to some estimates.
One question researchers like Tiwari are asking is: can you also analyze the data while it’s traveling? That is, can computation be combined with data processing in an energy-efficient way—and in a way that aids computational scientists as they look to obtain insights from their own experiments?
“This is pretty complex to do,” Tiwari says. “But there are smart storage devices emerging where you can have computational power in the storage device itself.”
It’s something Tiwari has been thinking about for a while now: how to bring high-performance computing to traditional storage devices.
“Almost a decade back, my collaborators and I were among the first ones to experiment with an interesting idea of embedding computation inside the flash-based storage devices for [high-performance computing] data analytics tasks,” he says. “Such computational storage devices are now being prototyped and productized by companies such as Samsung.”
The question then becomes, he says, “can we enable computational scientists to take advantage of such unique storage devices with computational capability, and if so, how?” Tiwari and his colleagues have demonstrated that they can develop those computational storage capabilities as part of the Department of Energy-funded project.
That money is just one slice of a broader disbursement—nearly $12 million, all told—from the federal department for the purposes of improving data management.
“The new capabilities in data management and visualization these projects develop will help make the most of the deluge of data generated by modern scientific experiments and simulations,” Barbara Helland, associate director of science for advanced scientific computing research at the Department of Energy, said in a statement.
Upgrading and refining data management methods would directly impact a range of scientific fields, from “materials science and chemistry … climate modeling and the development of new clean energy sources, to new approaches to increasing energy efficiency and reducing energy consumption,” Department of Energy officials said.
Collaborating institutions include Lawrence Berkeley National Laboratory and the University of Illinois Urbana-Champaign. Tiwari credits his students for their work in advancing these grant-worthy ideas.
“We are thankful for the support from [the Department of Energy],” he says. “I feel very fortunate that I have wonderful collaborators at Lawrence Berkeley National Laboratory and University of Illinois Urbana-Champaign, and remarkably talented and creative students in my lab at Northeastern to make progress on these ideas.”
by Tanner Stening, News @ Northeastern