Designing Storage Architectures for Digital Collections

Storing and preserving digital content continues to be a significant expectation of libraries and cultural centers around the country. To better understand these needs, as well as to see what digital archivists around the country are doing to meet this challenge, the Library of Congress holds an annual meeting called “Designing Storage Architectures for Digital Collections.” The DSA meeting brings together technical and industry experts, IT and subject matter experts, government specialists with an interest in preservation, decision-makers from a wide range of organizations with digital preservation requirements, and recognized authorities and practitioners of digital preservation. The meeting is by invitation only, and for the past two years Rutgers has been invited to take part in the conversation. The most recent meeting was held on September 17 and 18.

The first thing that became glaringly clear from our discussions was the increasing need for digital storage across all of our peers. From the few terabytes of data that Rutgers Libraries store in our repository, to the dozen or so petabytes stored by the Library of Congress, our digital collections continue to grow, and the demands for storage increase. This is driven by an increasing appetite for digital data from our patrons, but it is also the effect of researchers and artists having greater access to digital authoring tools. We are now in an age where the smartphones and tablets already in the hands of our user base can capture images, documents, and video in stunning quality, but at a cost in terms of larger file sizes.

To meet this challenge, storage makers continue in the short term to refine the technologies we are already familiar with. Reasonably priced tablets and laptops are now shipping with solid state drives reaching a terabyte in capacity. Traditional 14TB hard drives are now hitting the market. And for long-term backups, tape continues to rule, with 30TB tape cartridges costing about $200 each. At the institutional level, libraries are beginning to cooperate and pool resources to distribute their storage needs across multiple datacenters, for redundancy and additional capacity.

The not-too-distant future holds some different approaches as well. In particular, research is ongoing to move beyond hard drives and tapes, and to begin storing data at the molecular level, using polymer chains. Even DNA is showing significant promise as a long-term medium for archiving and preserving data.

Matt Badessa