In addition to the ever-increasing interest in digitizing materials for preservation, access and sustainability, interest in creating new digital collections is also on the rise. Digital collections are natural extensions of the idea of the library, an idea that has itself expanded rapidly in recent history, from physical collections to the broader concept of a collection. As with building physical collections, creating digital collections is both arduous and richly rewarding.
For those beginning to create digital collections, the technicalities of digitization are only a small part of a larger process, one that requires planning every aspect of the project, especially accessibility and sustainability. Luckily, we can learn from others who have already documented their digitization work. However, the requirements of individual projects emerge from specific collections and institutions and vary accordingly.1 Those variations demand extensive planning time even when existing models are used.
In The Mythical Man-Month, Frederick P. Brooks offers a general recipe for scheduling software development, consisting of:
- 1/3 planning
- 1/6 coding
- 1/4 component test and early system test
- 1/4 system test, all components in hand (p. 20)
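Because the four fractions sum to exactly one (4/12 + 2/12 + 3/12 + 3/12), the recipe scales to a schedule of any length. The Python sketch below applies Brooks' proportions to a project schedule; the `allocate_weeks` function and the 48-week example are illustrative assumptions, not part of Brooks' text.

```python
from fractions import Fraction

# Brooks' rule-of-thumb split of a schedule (The Mythical Man-Month, p. 20).
BROOKS_SPLIT = {
    "planning": Fraction(1, 3),
    "coding": Fraction(1, 6),
    "component test and early system test": Fraction(1, 4),
    "system test, all components in hand": Fraction(1, 4),
}

# The four shares cover the whole schedule: 4/12 + 2/12 + 3/12 + 3/12 = 1.
assert sum(BROOKS_SPLIT.values()) == 1


def allocate_weeks(total_weeks: int) -> dict[str, Fraction]:
    """Split a total schedule (in weeks) according to Brooks' proportions."""
    return {phase: share * total_weeks for phase, share in BROOKS_SPLIT.items()}


if __name__ == "__main__":
    # Hypothetical 48-week project.
    for phase, weeks in allocate_weeks(48).items():
        print(f"{phase}: {weeks} weeks")
```

For a 48-week project, this allots 16 weeks to planning, 8 to coding, and 12 to each of the two testing phases. Exact fractions are used so that schedules not divisible by 12 are not silently rounded.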
While Brooks focuses on software engineering, the planning required to develop a digital library project is similar. In the first iteration of a project, planning should take the largest single share of time. Planning is essential to project success because poorly planned projects are difficult to salvage: Brooks' law, "Adding manpower to a late software project makes it later," applies equally to many digital library projects.2 Proper planning maps out the necessary infrastructure in advance, so that time-consuming and costly conversions do not become necessary later.
When planning a digitization project, the main points to remember fall under three interdependent categories: sustainability, accessibility and interoperability. These may seem like vague requirements, but digital library and web standards provide benchmarks for each. Sustainability, the most important of the three, encompasses digitizing to sustainable standards, creating digital archives, using proper display standards to ensure interoperability and accessibility, and building a scalable system that is extensible enough to hold and connect more materials.3
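One way to see how a standard serves as a benchmark for interoperability is to express item metadata in a shared format. The Python sketch below assumes the collection exposes records as unqualified Dublin Core (`oai_dc`), the metadata format that the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) requires repositories to support. The `dublin_core_record` function is hypothetical, and a production record would also carry schema-location attributes omitted here for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by unqualified Dublin Core records in OAI-PMH (oai_dc).
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Use the conventional prefixes when serializing.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize item metadata as an oai_dc XML string.

    `fields` maps Dublin Core element names to one or more values;
    Dublin Core allows any element to repeat.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in fields.items():
        # Reject names outside the standard so records stay interoperable.
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element!r}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical item description.
    print(dublin_core_record({
        "title": ["Hypothetical Letterbook, 1890-1910"],
        "creator": ["Unknown"],
        "format": ["image/tiff"],
    }))
```

Validating element names against the standard set is the design choice that matters here: a record that uses only agreed-upon elements can be harvested and searched alongside records from other collections.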
With these points in mind, the seven key steps to remember when creating a recipe for digital collection development are:
- Identifying the materials to be digitized.
Do these materials have anything in common, and if so, what? Are they significant for their form, their content, or both? Are they delicate or rare, or important only for their content?
- Identifying the best equipment for digitization.
Scanning equipment that combines cameras, lights and book cradles is ideal for capturing images of rare and delicate materials. Fast, automatic-feed scanners are optimal for scanning unbound materials. If the materials need to be unbound, what equipment is required? Some equipment requires more labor, while other equipment is more expensive but requires less labor. Different equipment also takes up different amounts of space, so how much space is available?
- Defining the workflow for creating archival and online files.
This includes creating metadata for the materials, scanning the images, correcting the images (cropping and color correction), creating full archives, producing smaller image versions for online use, and implementing further optimization, such as optical character recognition (OCR) for full-text searching. Will the images require major corrections, or does the materials' importance as artifacts require that their flaws be captured? Planning and structure are essential to keeping materials moving through this resource-heavy process.
- Choosing a means of maintaining the full-version archives so they can be accessed when needed.
Which types of storage are best for the institution? Full-version archive files, such as 300 dpi TIFF files, are extremely large. Keeping these large files on servers quickly becomes expensive, especially given the need for redundancy. Is the best option to keep them on servers, burn them to DVD, or write them to backup tape? Whichever method is chosen, the files must remain relatively easy to access in case a full version is needed, whether for a research project, to meet demand for a print copy, or for many other reasons.
- Choosing a means for displaying the materials online.
To be scalable and extensible, the database should include not only the items and their metadata but also a means of searching across the items and any future collections. Will the library or institution alone be able to maintain the digital collection, given the additional infrastructure requirements for servers and programmers? If not, is digitizing materials and then contributing them to an existing repository an option? Are there area or subject-specialty collections that would host additional materials?
- Promoting the digital collections to help them grow and to link them with existing projects.
Will the new materials add to a particular collection elsewhere? How will the new materials be promoted: through scholarly organizations, email lists, Wikipedia?
- Sharing information and resources about the digitization process and about the collections.
In addition to publicizing digitized collections, how will development processes be shared with the digital library community? Opportunities to contribute to the underlying infrastructure of digital library creation include listing processes and specifications on a section of the digital collection's web site, presenting at conferences, publishing articles about the processes, sharing improvements at meetings of digital library organizations and exchanging information over email lists.
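The workflow in the third step, defining the workflow for archival and online files, is essentially an ordered pipeline. As a hypothetical sketch (the `Item` class, the stage labels, and the `queue_sizes` helper are illustrative, not drawn from any particular digitization system), the Python below enforces that each item moves through the stages in order and counts how many items are waiting at each stage, one way planning can reveal where materials pile up.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stage labels for the digitization workflow,
# in the order each item must pass through them.
STAGES = [
    "metadata",    # describe the item
    "scan",        # capture the master image
    "correct",     # crop and color-correct
    "archive",     # store the full-resolution archival file
    "derivative",  # produce a smaller version for online display
    "ocr",         # optical character recognition for full-text search
]


@dataclass
class Item:
    identifier: str
    completed: list[str] = field(default_factory=list)

    def next_stage(self) -> Optional[str]:
        """Return the next stage this item needs, or None when finished."""
        if len(self.completed) < len(STAGES):
            return STAGES[len(self.completed)]
        return None

    def complete(self, stage: str) -> None:
        """Record a finished stage, refusing to let an item skip ahead."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(
                f"{self.identifier}: expected {expected!r}, got {stage!r}")
        self.completed.append(stage)


def queue_sizes(items: list[Item]) -> dict[str, int]:
    """Count how many items are waiting at each stage."""
    counts = {stage: 0 for stage in STAGES}
    for item in items:
        stage = item.next_stage()
        if stage is not None:
            counts[stage] += 1
    return counts
```

For example, if one item has finished metadata and scanning while a second has not started, `queue_sizes` reports one item waiting at `"metadata"` and one at `"correct"`. Treating OCR as a required final stage is a simplifying assumption; a real project might make it optional.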
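To make the fourth step's point about file size concrete, the Python sketch below estimates the size of uncompressed archival images and the storage needed to keep redundant copies. Every input is an illustrative assumption: letter-size pages (8.5 x 11 inches), 300 dpi, 24-bit color (3 bytes per pixel), no compression, a 10,000-page collection, and three copies.

```python
# Rough storage estimate for uncompressed archival master images.
# All inputs below are illustrative assumptions, not measured values.


def image_bytes(width_in: float, height_in: float, dpi: int,
                bytes_per_pixel: int) -> int:
    """Approximate size of an uncompressed raster image, ignoring headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def collection_gb(pages: int, bytes_per_page: int, copies: int) -> float:
    """Total storage in gigabytes (10**9 bytes) across redundant copies."""
    return pages * bytes_per_page * copies / 10**9


if __name__ == "__main__":
    # Letter-size page at 300 dpi in 24-bit color (3 bytes per pixel).
    per_page = image_bytes(8.5, 11, dpi=300, bytes_per_pixel=3)
    print(f"per page: {per_page / 10**6:.1f} MB")
    # Hypothetical 10,000-page collection kept in three copies.
    print(f"collection: {collection_gb(10_000, per_page, copies=3):.0f} GB")
```

Under these assumptions, each page is 2550 x 3300 pixels, or roughly 25 MB, and the collection needs about 757 GB across three copies. Compression, bit depth, and embedded metadata all change real TIFF sizes, so the figures indicate scale rather than a budget.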
All of these steps are part of a successful recipe for building a digital library collection. Each existing digital collection follows its own model, and new projects can pick and choose components from those models when creating new digital collections.
The most important points to remember in building a digital collection are to plan and to share. Librarians looking to develop and refine a strategy for creating new digital collections can gather project development information from existing resources.4 Following the creation of new digital collections, librarians should then share information about the processes followed and the collections themselves. Brooks' thoughts on software development and project documentation are again relevant: "The whole profession can only profit from sharing such data" (p. 21). Sharing information about digital library collections benefits everyone by adding to the digital library cookbook. Sharing such information also raises awareness about the collections, increasing the overall use of materials and libraries as more people learn about newly available collections and the many others waiting in the wings.5
Brooks, Frederick P. The Mythical Man-Month: Essays on Software Engineering. Reading, MA: Addison-Wesley, 1982.
1 Digitizing materials is especially rewarding when the materials meet particular needs. Those needs might include digitizing rare and brittle materials so that they are not damaged through overuse, creating an institutional repository of internal documents, or digitizing materials so that they can be used and connected in innovative projects.
2 Brooks' work rests on the idea that some tasks cannot be partitioned easily, if at all, as is currently the case in digital library development. Some of Brooks' ideas have since been eclipsed by modern software development methods, but they remain useful for certain types of projects.
3 Sustainable formats should follow existing standards, which can be found in or derived from existing digital collections, such as those at the University of Virginia, and from groups like the Open Archives Initiative.
4 Existing resources include other digital libraries, various archives and repositories, and organizations like the Digital Library Federation. The University of Virginia offers a useful terminology page for help in navigating the many resources.
5 Sharing information includes learning from existing sources and contributing new information on digitization, as well as contributing materials to various collections and groups like the Internet Archive, Project Gutenberg, and the International Children's Digital Library.