Using AI: Tips, Techniques, and Pitfalls
- Combining high-speed capture with AI-driven text extraction can transform static physical records into dynamic, searchable datasets.
- Success with Large Language Models (LLMs) relies on precise prompt engineering, including domain-specific glossaries and iterative testing.
- AI is an efficiency tool, not a replacement for human oversight; robust quality control workflows are essential to catch “hallucinations.”
When the Utah Historical Society (UHS) began its transition to the new, purpose-built Museum of Utah, the team faced a modern archival dilemma: how to maintain access to critical reference materials in a shrinking physical footprint.
With the new engagement room offering less space for their extensive card catalogs, Digital Asset Specialist Michelle Gollehon devised a forward-thinking solution. By pairing the rapid capture capabilities of the DT Versa with the data parsing power of Artificial Intelligence, UHS is bridging the physical-digital divide, ensuring these vital historical records remain discoverable for future generations.
The Challenge: A Physical Move Sparks Digital Innovation
The impetus for this project was logistical, but the opportunity was transformational. The upcoming Museum of Utah boasts a state-of-the-art design with large windows and open spaces, choices that enhance the visitor experience but significantly reduce the wall space available for traditional reference collections. The card catalogs, a primary resource for answering research questions, could no longer physically reside in the reading room.
Rather than simply relegating the cards to off-site storage and accepting reduced access, Gollehon saw an opportunity to expand it. The goal was not just to scan the cards as static images but to convert the information they contained (bibliographic data, subject citations, and building permits) into structured, searchable metadata.
This would allow the physical cards to be permanently stored in compact shelving while making their content more accessible than ever before.
The Solution: A Hybrid Workflow for Mass Digitization
To achieve this, the team implemented a hybrid workflow that leverages the strengths of both hardware and software. The first step was high-speed digitization using the DT Versa. Gollehon’s team can capture between 1,500 and 2,000 cards per day, creating high-resolution preservation-grade images. These images are then compiled into OCR-processed PDFs, which serve as the raw material for the AI.
Gollehon emphasizes a critical lesson for any institution exploring AI: the principle of “garbage in, garbage out.” The accuracy of the AI’s text extraction is directly dependent on the quality of the initial image.
By using the DT Versa to capture sharp, evenly illuminated images, UHS minimizes OCR errors from the start, giving the AI the best possible chance to accurately parse the text.
Mastering the Prompts: Speaking the Language of AI
A significant portion of the project’s success lies in “prompt engineering,” the art of giving the AI precise instructions. Gollehon discovered that generic requests yielded generic, often unusable results. Through iterative testing with Google Gemini, she developed a robust set of protocols.
- First, the AI must be taught the specific “language” of the collection. Gollehon provides the model with a glossary of domain-specific abbreviations, ensuring it understands that “UHQ” stands for “Utah Historical Quarterly” and that “SLC” means “Salt Lake City.”
- Second, the output format must be strictly defined. Rather than requesting a general transcription, she instructs the AI to generate a Markdown table with specific columns for names, citations, and notes.
This structured approach turns unstructured text into clean data ready for ingestion into their Digital Asset Management (DAM) system.
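A prompt built on these two protocols can be sketched in Python. The glossary entries, column names, and wording below are illustrative assumptions, not UHS's actual prompt:

```python
# Sketch of a prompt-engineering helper for card-catalog extraction.
# The glossary entries and table columns are illustrative examples,
# not the actual UHS protocol.

GLOSSARY = {
    "UHQ": "Utah Historical Quarterly",
    "SLC": "Salt Lake City",
}

COLUMNS = ["Name", "Citation", "Notes", "Transcription Notes"]

def build_prompt(ocr_text: str) -> str:
    """Assemble an extraction prompt with a domain glossary and a
    strictly defined Markdown-table output format."""
    glossary_lines = "\n".join(f"- {abbr}: {full}" for abbr, full in GLOSSARY.items())
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "|" + "|".join("---" for _ in COLUMNS) + "|"
    return (
        "You are transcribing a historical index card.\n"
        "Expand abbreviations using this glossary:\n"
        f"{glossary_lines}\n\n"
        "Return ONLY a Markdown table with exactly these columns:\n"
        f"{header}\n{divider}\n\n"
        "If any text is faded, illegible, or ambiguous, do not guess; "
        "describe the problem in the Transcription Notes column.\n\n"
        f"Card text:\n{ocr_text}"
    )

print(build_prompt("Smith, John. UHQ vol. 12 (1944). SLC bldg. permit #482."))
```

Because the output format is pinned down in the prompt itself, the model's response can be parsed mechanically rather than read by a person.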
The Human Element: Why Oversight is Non-Negotiable
Despite the power of these tools, Gollehon is candid about their limitations. AI models are prone to “hallucinations” (confidently inventing data to fill in blanks). To mitigate this risk, UHS implemented a “human-in-the-loop” workflow.
The innovative twist was asking the AI to police itself. Gollehon included a specific “transcription notes” column in her prompt, instructing the model to flag any text that was faded, illegible, or ambiguous.
Instead of guessing, the AI flags these entries for human review. This allows the team to focus their quality control efforts on the 10% of cards that need expert attention, while the AI handles the routine data entry for the other 90%.
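That triage step can be sketched as follows: parse the returned Markdown table and route any row with a non-empty transcription-notes column to a human review queue. The sample table and column names here are fabricated for illustration:

```python
# Sketch of the human-in-the-loop triage: rows the model flagged in its
# "Transcription Notes" column go to human review; clean rows pass through.
# The sample table below is fabricated for illustration.

SAMPLE_TABLE = """\
| Name | Citation | Transcription Notes |
|---|---|---|
| Smith, John | Utah Historical Quarterly vol. 12 | |
| [illegible] | Salt Lake City permit #482 | surname faded, unreadable |
"""

def parse_rows(markdown_table: str) -> list[dict]:
    """Turn a simple Markdown table into a list of header->cell dicts."""
    lines = markdown_table.strip().splitlines()
    headers = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and divider rows
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append(dict(zip(headers, cells)))
    return rows

def triage(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rows into (auto-accepted, needs-human-review)."""
    auto, review = [], []
    for row in rows:
        (review if row.get("Transcription Notes") else auto).append(row)
    return auto, review

auto, review = triage(parse_rows(SAMPLE_TABLE))
print(f"{len(auto)} auto-accepted, {len(review)} flagged for review")
```

The design choice worth noting is that the flagging logic lives in the prompt, not the code: the model is told to describe problems rather than guess, so the downstream script only has to check whether the notes cell is empty.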
Digitize Diverse Collections with the DT Versa
The Utah Historical Society relies on the DT Versa to rapidly digitize thousands of index cards with preservation-grade quality. Designed for versatility, the DT Versa handles everything from bound books and manuscripts to loose cards and flat art, making it the ideal solution for institutions with diverse collections.
Ready to modernize your institution’s digitization strategy?