A recent article on Atlas Obscura pointed out that a massive number of fossils in museum collections around the world are simply languishing in drawers, unused for science or exhibition. While it’s not unusual for museums to hold far more than they put on public view, the unanalyzed data in these fossil collections could provide valuable insights into our planet’s incomplete historical record. Until now, combing through these collections was infeasible, but innovations in image acquisition, content analysis, and machine learning have begun to make the once-impossible task possible.
Peter Roopnarine, curator of geology at the California Academy of Sciences, published a paper on this topic in Biology Letters which examined the collections of nine different institutions in California, Washington and Oregon. The team calculated that of all the specimens housed in these collections, more than 95 percent are from locations that have never been the subject of any published literature. From that, they predicted that “perhaps only 3–4% of recorded fossil localities are currently accounted for.” In an attempt to reverse this trend, Roopnarine and his associates established the Eastern Pacific Invertebrate Communities of the Cenozoic, or EPICC, a partnership between these nine natural history museums. The community’s goal? To close the gap by digitizing 1.6 million marine invertebrate fossil specimens from the region.
Making use of these unexplored collections, referred to by many as “dark data,” is no small feat. As the Roopnarine paper points out, “Prior to computers, these synoptic datasets were compiled by hand, a laborious undertaking that took years of effort and forced paleontologists to make difficult choices about what types of data to tabulate.” Even so, some of the biggest breakthroughs in paleontology have come from the analysis of large data sets by determined scientists using little more than pen, paper, and a tireless work ethic. For example, the first detailed account of Earth’s five mass extinction events came about in 1982 after two geologists combed through nearly 400 papers and databases of marine fossils and published their exhaustive findings.
In a similar effort, the Smithsonian Natural History Museum recently launched its Mass Digitization Project. With an estimated 40 million fossil specimens, the collection is the largest in the world, so workflow efficiency was a critical consideration in developing an action plan. It was also important to ensure that the imagery (and therefore the data) captured wouldn’t become outdated or obsolete in the future, so the Smithsonian team insisted on using the most cutting-edge digitization equipment available. Their long-term vision and unwillingness to compromise led them to choose equipment from DT CulturalHeritage.
DT CulturalHeritage was founded to address the unique digitization needs of museums, libraries, and archives. With a broad portfolio of digitization solutions built specifically for streamlining workflows and capturing FADGI-4 compliant images, DT CulturalHeritage systems are a perfect match for digitizing large fossil collections. Cutting-edge innovations like TruePPI and DT AutoColumn for fast and consistent calibration, SlipStream software to simplify the capture process, and the highest resolution image sensors on the market (up to 150 megapixels) made the DT Element the ideal tool for the job.
Doug Peterson, DT’s Head of R+D, outlined a hypothetical scenario of digitizing a collection of 1 million fossil specimens to explain the benefits in real-world terms. “Your best set-up would be 4 DT Atoms with SlipStream software on touchscreen interfaces, each with one camera and lighting setup calibrated for different-sized material (say, in 6-inch increments) to avoid unnecessary downtime due to set-up changes. One supervisor and their team could digitize something like 3,000 to 4,000 items a day. You’d knock out the entire collection in under a year.”
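To put Peterson's estimate in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses only the figures from the quote (1 million specimens, 3,000 to 4,000 items a day) and assumes the daily rate is the combined output of the whole team and stays constant. The function name `days_to_digitize` is purely illustrative and is not part of any DT software.

```python
import math


def days_to_digitize(total_items: int, items_per_day: int) -> int:
    """Whole working days needed to capture every item at a steady daily rate."""
    if items_per_day <= 0:
        raise ValueError("items_per_day must be positive")
    return math.ceil(total_items / items_per_day)


# Figures taken from the quoted scenario; the constant daily rate is an assumption.
COLLECTION_SIZE = 1_000_000   # specimens in the hypothetical collection
DAILY_RATES = (3_000, 4_000)  # low and high ends of the quoted range

for rate in DAILY_RATES:
    days = days_to_digitize(COLLECTION_SIZE, rate)
    print(f"{rate:,} items/day -> {days} working days")
```

At the high end of the range the work takes 250 working days, and at the low end 334. How that maps onto a calendar year depends on how many days a week the team operates, which the quote does not specify.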
The potential of this untapped body of “dark data,” once digitized, is enormous. From uncovering patterns in the history of evolution and environmental change to improving our understanding of how we classify living and extinct species, the ability to use Big Data tools to aggregate and process fossil metadata that has historically been inaccessible could open up entirely new avenues of research. In anticipation of this burgeoning revolution, the Integrated Digitized Biocollections database (iDigBio) is promoting, sharing, and coordinating best practices so that all museums can participate in this effort and ensure maximum fidelity of the data.
While there’s still plenty left to discover under the surface of our planet, it’s increasingly clear that data, not pickaxes, may lead to the next leap in our understanding of evolution, climate change, mass extinction, or some not-yet-known field of discovery. In the brave new world of Big Data, Cloud Computing, and “always on” internet access, the digitization of these extensive fossil assets has never been more important.