NEWPORT, Ore. - Hatfield Marine Science Center researchers studying the marine food web have tens of millions of photographic images of small marine organisms called plankton to identify - a task that would take two lifetimes to finish manually.
Their hope is that the data science community can develop a computer algorithm that can do it automatically.
This week, Booz Allen Hamilton, a management and technology consulting firm, and Kaggle, the leading online data science competition community, announced the launch of the inaugural National Data Science Bowl to seek a solution to this "big data" challenge.
They are offering prize money totaling $175,000 to the creators of the top three algorithms - the largest such purse designated for a Kaggle competition benefitting social good. More information on the National Data Science Bowl is available at: http://www.datasciencebowl.com/
The 90-day competition will not only give the data science community a chance to flex its creativity and brain power, but may also solve a challenge facing marine science researchers who need to process massive amounts of data in hours, not decades. The winning algorithms will be donated to Oregon State University's Hatfield Marine Science Center in Newport, Ore., for use by the scientific community.
"The National Data Science Bowl was born from the realization that, in order for the data science community to grow and thrive, it must be given opportunities to use its talents to benefit both business and society," said Josh Sullivan, vice president of Booz Allen Hamilton's Strategic Innovation Group. "We are extremely honored to partner with leaders such as Kaggle and the Hatfield Marine Science Center for this initiative."
Robert Cowen, director of OSU's Hatfield Marine Science Center, admits the task is daunting. In the summer of 2014, center researchers embarked on an 18-day expedition funded by the National Science Foundation to study interactions between larval fishes, their planktonic prey, and their predators in the Straits of Florida. With their specially designed imaging system, the In Situ Ichthyoplankton Imaging System (ISIIS), they collected 32 terabytes of images of plankton, fish and jellyfish.
That is an amount of data equivalent to 9 million MP3 songs, or enough music to listen to nonstop for 52 years.
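As a rough check on that comparison, the short Python sketch below works backward from the three figures quoted above (32 terabytes, 9 million songs, 52 years) to the song size and length they imply. The decimal definition of a terabyte, the 365.25-day year and the variable names are assumptions made for illustration; they do not come from the researchers.

```python
# Back-of-envelope check of the MP3 comparison.
# Assumptions (not from the article): 1 TB = 10**12 bytes,
# 1 year = 365.25 days, and the songs play back to back.

TOTAL_BYTES = 32 * 10**12   # 32 terabytes of images
SONG_COUNT = 9_000_000      # 9 million MP3 songs
LISTENING_YEARS = 52        # nonstop listening time quoted

# Average file size per song, in megabytes.
mb_per_song = TOTAL_BYTES / SONG_COUNT / 10**6

# Average song length, in minutes, implied by 52 years of playback.
minutes_per_song = LISTENING_YEARS * 365.25 * 24 * 60 / SONG_COUNT

# Implied audio bitrate, in kilobits per second.
kbps = (mb_per_song * 10**6 * 8) / (minutes_per_song * 60) / 1000

print(f"{mb_per_song:.2f} MB per song")
print(f"{minutes_per_song:.2f} minutes per song")
print(f"{kbps:.0f} kbps implied bitrate")
```

Under those assumptions the figures hang together: each song comes out at about 3.5 megabytes and about three minutes long, which is a plausible size for a typical MP3 file.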
Plankton are the fundamental biological building blocks of ocean ecosystems, yet scientists don't know as much about them as they would like, including their diversity, interactions with other marine organisms, what triggers their blooms, and how they respond to climate change.
Advancing scientific knowledge about these tiny organisms begins with identifying and cataloguing them, Cowen pointed out.
"Many economically important animals - including fishes, crabs and other shellfish - are part of the plankton in their early life stages," he said. "Much of what we study relates to understanding the relationship between larval fishes and their planktonic prey and predators."
Ultimately, what scientists are interested in "is what drives variation in year-to-year population abundances of key fish species," said Su Sponaugle, co-principal investigator on the project and a professor in the Department of Integrative Biology at OSU.
Jessica Luo, a doctoral student from the University of Miami's Rosenstiel School of Marine and Atmospheric Sciences working with Cowen and Sponaugle at the Hatfield Center, said what the researchers need from the data science community is akin to "facial recognition" software for planktonic species.
"At a minimum, we're aiming for an automatic classification system that can identify organisms to the class or order level, in general groups like fish or shrimps," she said. "But with distinctly shaped or transparent organisms, we think it might be possible to get down to the genus or even species level. It will be difficult, because plankton are of all different sizes, shapes and orientations, and are moving in all different directions."
Kelly Robinson, a post-doctoral researcher at the Hatfield Marine Science Center, said scientists would benefit greatly from an automated system that could provide near real-time data on plankton abundance and diversity while aboard ships.
"From a resource management perspective, it is less effective to analyze plankton abundance and diversity from four years earlier if the resource that depends on plankton responds rapidly to environmental change," she said. "The current process of manually identifying organisms is time-consuming and laborious. The ocean is changing rapidly and there is an urgency to learn as much as we can about plankton interrelationships to help ensure the health of our marine environments."
For the competition, participants will be given access to nearly 100,000 underwater images and tasked with developing an algorithm that will identify and monitor the organisms in them at a scale never before attempted. If successful, it will open new doors for researchers and vastly improve the ability of resource managers to apply science to decision-making.
"The algorithms resulting from this competition will be applied to millions of images taken in a variety of marine environments, allowing cross-comparison and analysis at an unprecedented scale," Cowen said.