The Power of Metadata: Readex and the Territorial Papers of the United States

Earlier this year, Readex published Territorial Papers of the United States, 1764-1953, which until now was the most important body of early American content not yet digitized.

[Image: the Territorial Papers of the United States interface]

More than half of America’s states began as territories. From the 1760s to the 1950s the United States of America expanded southward and westward, acquiring territories that spanned from Florida to California to Alaska. Before they evolved into twenty-seven American states, these territories were managed by the U.S. State and Interior departments. The official history of their formative territorial years is recorded in Territorial Papers of the United States—a collection of Native American negotiations and treaties, official correspondence with the federal government, military records, judicial proceedings, population data, financial statistics, land records, and more.

About two-thirds of these documents are in manuscript form, which means they cannot practically be made full-text searchable with Optical Character Recognition (“OCR”) technologies. Yes, some technologies today can do a fairly decent job of applying OCR to certain types of manuscripts. However, they work only when the handwriting is very clear and extremely uniform, and even then the results fall short of what can be achieved with printed (as opposed to manuscript) documents.

The documents in Territorial Papers of the United States span many time periods and were written in many different hands, which makes them poor candidates for OCR.

So how can a crucial resource like this be made accessible to students and scholars who live in a world of “full-text searchability”?

The answer: Create great metadata.

As most readers know, metadata is descriptive information that is applied to a document to help users find the document in a database and to understand the document in context. More succinctly, metadata is a set of data that describes and gives information about other data.

When full-text searchability is impossible, metadata is the key to findability. Users of Territorial Papers of the United States search the content in the usual way—either from the homepage single search box or from the more sophisticated Advanced Search—but the query is applied to the descriptive metadata, not to the “full text.”

That’s why the metadata needs to be “great.” Users rely on it to build queries that allow them to find documents based on crucial criteria like the following:

  • Territory name
  • Author of the document
  • Date of the document (often right down to the day of the month)
  • Type of document (report, list, letter, diary, field notes, and more)
  • Subject matter (for example, Frontier and Pioneer Life; Civil War; Presidential Appointments)
  • and more.

Let’s look at an example.

Here is a document from December 9, 1839, concerning a border dispute between the state of Missouri and Iowa Territory. You are seeing the top of the first page of the document:

[Image: top of the first page of the December 9, 1839, document]

Here is the metadata associated with the document. Notice that many facets of the content are captured and described in considerable depth:

[Images: descriptive metadata record for the December 9, 1839, document]

Researchers can find this document by searching on any one (or more) of the elements shown in the descriptive metadata.

A researcher can start a search by Date or by Year (December 9, 1839); by Territory (Iowa or Missouri); by Place (Van Buren County, Iowa); by Notable Person (Uriah S. Gregory); by Document Author (Henry Heffleman or Robert Lucas); by Event (Honey War); by Subject (Tax Collection or Militia and National Guard); and more.

Equally important, a user can search using standard, well-known references like the National Archives and Records Administration (“NARA”) inventory number or NARA Record Group number. These numbers reflect NARA’s standard document-management protocols and are often cited in scholarly articles that draw on Territorial Papers of the United States.

Where does Readex get all of this metadata?

The answer: Many places.

First and foremost, we use descriptive data created by NARA. This tends to be highly accurate, but it is often limited in scope.

So we look elsewhere: to publicly available scholarly apparatus compiled over the years in many forms, and to standard finding aids, which tend to be especially valuable.

One such finding aid is known as the Parker Calendar (formally titled “Calendar of Papers in Washington Archives relating to the Territories of the United States to 1873”). It was published in 1911 by the Carnegie Institution of Washington. Its author, David W. Parker, assigned descriptive data to more than 9,000 documents in Territorial Papers.

[Image: the Parker Calendar]

The Parker data often highlights the most crucial items in a larger document, making it easy for users to “zoom” into the most important material instantly. This is important because many of the larger documents are actually compilations of many smaller documents—for example, a bundle of letters. In such cases, the large document is fully findable, but thanks to Parker, so too are the smaller items that live within it. The result: More users find what they need and want, and they find it quickly.

I should add one more point: much of the Territorial Papers metadata is created by Readex itself, by a team of highly skilled editors and subject-matter experts who assign subject terminology (and more) to the records. The taxonomy for this terminology is derived from the subject guide in Readex’s own digital edition of the U.S. Congressional Serial Set, which scholars and researchers regard as definitive.

It’s a lot of work to build products this way. It takes four or five times the effort required to create a typical “printed page” product, but at Readex we think it’s worth it. Indeed, such work is at the very center of our mission to provide the deepest, richest set of resources anywhere covering American history.

As of this writing, a large set of content is already available in each of the four series of Territorial Papers of the United States. Over the next year or so, the project will be fully completed. During this time we will continue to add metadata (including Parker metadata) at a regular pace.

Finally, I have to admit that manuscripts often take a bit more effort to interpret (after all, a handwritten document is often harder to read than a typewritten one), but such is the record of history. Not everything was printed, and many of our most important documents have come down to us exactly as they were created, with all the quirks and mysteries you would expect.
