Fitchburg State University: 55 hours of oral history captioned in a day

Center for Italian Culture, Fitchburg State University

The collection

The Center for Italian Culture at Fitchburg State University has been running an oral history project since 2001. Over more than two decades, the center has built a 35-interview collection capturing the experience of first-, second-, and third-generation Italian immigrants in America. Some interviews are audio only, some are video, and they range from 30 minutes to two and a half hours. In total, the collection holds roughly 55 hours of material.

Ross Caputi, Assistant Archivist for the collection, is the person responsible for getting all of it online and accessible.

The compliance problem

A new rule under Title II of the Americans with Disabilities Act requires public institutions to make all public-facing material accessible. For Fitchburg State, that meant a year of scrambling. Faculty had to make course PDFs machine-readable. The library had to remediate its public catalog. And Ross had to figure out how to put captions on every minute of the oral history collection.

His complication was JSTOR. Fitchburg State publishes its community archives through JSTOR's community archives platform, and JSTOR does not accept PDF transcripts attached to recordings. It only accepts caption files in SRT or VTT format, attached directly to the audio or video.

So the work was specific: turn 55 hours of recorded oral history into clean, edited SRT or VTT files.

"This year we've had a pretty stiff workflow trying to convert all of our transcripts into caption files. We've tried a number of different tools, but Alice app got through all of it in a day. It was very, very fast, and as accurate or more accurate than any."

What didn't work

YouTube. Ross uploaded the video interviews to YouTube and used the automatic caption generator. Accuracy was reasonable, and the in-place editor let him adjust both text and timestamps, which mattered because oral history work involves cleaning up false starts, ums, and incomplete sentences out of respect for the narrator. But YouTube's caption export came out in a different file format from the SRT or VTT that JSTOR needed, which meant every file then had to be hand-reformatted.

ScreenPal. The university provided ScreenPal as an option, but it only worked with video. The audio-only interviews had to be converted into video files first, with a blank image and the audio running underneath, before they could be uploaded. The editor was hard to work with, especially for timestamps, and it slowed the editing down considerably. The one upside was that ScreenPal could export VTT directly.
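
The source does not say which tool was used for the audio-to-video conversion. As one possible approach, the sketch below builds an ffmpeg command that loops a single still image over an audio track. The file names and function names are hypothetical, and it assumes an ffmpeg build with the libx264 encoder is installed.

```python
import subprocess
from pathlib import Path


def still_image_video_command(audio: Path, image: Path, output: Path) -> list[str]:
    """Build an ffmpeg command that pairs one still image with an audio file."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the single image as the video stream
        "-i", str(image),
        "-i", str(audio),
        "-c:v", "libx264",       # assumes ffmpeg was built with libx264
        "-tune", "stillimage",   # x264 tuning for static content
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",             # stop encoding when the audio ends
        str(output),
    ]


def wrap_audio_as_video(audio: Path, image: Path, output: Path) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(still_image_video_command(audio, image, output), check=True)


if __name__ == "__main__":
    # Hypothetical file names; prints the command without running it.
    cmd = still_image_video_command(
        Path("interview.mp3"), Path("blank.png"), Path("interview.mp4")
    )
    print(" ".join(cmd))
```

Repeating this step for every audio-only interview is the kind of overhead a video-only tool adds before any captioning can start.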

Generic voice-to-text tools. Ross tried playing interviews next to a recorder running speech-to-text. The transcription part worked, but none of the tools could export in the formats JSTOR required. He was back to hand-formatting caption files.

"It seems like there are tools and services which provide you bits and pieces of the solution, but not everything."

What Alice did differently

Ross uploaded the full collection to Alice and had caption files for all 55 hours within a single day.

Three things stood out to him.

Accuracy was as good as or better than anything else he had tried. The errors were the ones he expected: Massachusetts town names that are famously hard for any model, or uncommon personal names. No transcription tool fully solves those problems. On everything else, Alice held up.

Caption breaks landed in natural places. The other tools he tried broke captions mid-phrase or at awkward points in a sentence, which meant another pass of manual cleanup.

"Alice did a particularly good job at breaking the caption either at a natural shift in the intonation of the sentence or a natural pause between phrases. Others that I've tried just broke the captions in odd places. It appeared kind of odd on the screen, and it just ended up being one more thing that I needed to edit manually."
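
How Alice chooses its caption breaks is not described in the source. Purely to illustrate the idea of breaking at natural pauses, here is a minimal Python sketch that starts a new caption whenever the silence between two words exceeds a threshold, or when a caption would grow too long. The `Word` class, the `min_pause` and `max_chars` values, and the sample words are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A transcribed word with start and end times in seconds."""
    text: str
    start: float
    end: float


def group_by_pauses(words, min_pause=0.4, max_chars=42):
    """Split words into captions at pauses, or when a caption gets too long.

    Returns a list of (start, end, text) tuples.
    """
    captions = []
    current = []
    for word in words:
        if current:
            gap = word.start - current[-1].end
            length = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            if gap >= min_pause or length > max_chars:
                captions.append(current)
                current = []
        current.append(word)
    if current:
        captions.append(current)
    return [(c[0].start, c[-1].end, " ".join(w.text for w in c)) for c in captions]


# Made-up word timings for illustration: a 0.7 second pause after "spring".
words = [
    Word("We", 0.0, 0.2),
    Word("arrived", 0.25, 0.7),
    Word("in", 0.75, 0.85),
    Word("spring", 0.9, 1.4),
    Word("and", 2.1, 2.3),
    Word("stayed", 2.35, 2.9),
]

if __name__ == "__main__":
    for start, end, text in group_by_pauses(words):
        print(f"{start:.2f} to {end:.2f}: {text}")
```

A tool that breaks only on character count would ignore the pause and could split a phrase in the middle, which is the kind of break that then needs manual cleanup.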

Editing and export happened in the same app. This was the biggest one. Every other workflow he tried split the job across multiple tools: generate transcripts in one place, edit in another, convert formats in a third. Alice handled ingest, transcription, editing, and export to SRT and VTT in a single environment.

"Alice can be your single tool from start to finish. Going from data, to transcript, to caption file, however you need it. Then the editing process, and being able to export in the particular file format that you need. Whereas we needed multiple tools to do all those things in the past."

The numbers

Caption files for all 55 hours of material were generated in one day. Ross is still working through the light editing pass, but estimates Alice will save 40 to 80 hours of editing time across the project. That is one to two full work weeks recovered.

Advice to other archivists

Ross's advice to other archivists and accessibility leads facing the Title II deadline is direct. Think ahead about two things before committing to a tool: how much editing the transcripts will need, and what file format the destination platform actually accepts. Many tools handle one piece of the workflow well and break on the others. The tools that try to do everything tend not to excel at any of it.

"The AIs that try to be like a Swiss army knife, jack of all trades, tend to not excel at any of them."

For oral history work specifically, where transcripts need real editing before they go live, he recommends Alice because it can take a project from raw audio or video all the way to a publishable, correctly formatted caption file without leaving the app.

About the Center for Italian Culture

The Center for Italian Culture at Fitchburg State University runs a 35-interview, 55-hour oral history collection documenting the immigrant experience of Italians in America. The collection is published through JSTOR's community archives. You can visit it here: https://www.jstor.org/site/fitchburg-state/sogni-d-oro-collection/

To learn more about Ross Caputi, visit: https://rosscaputi.com/

About Alice

Alice is the fastest, most accurate way to turn audio and video into editable transcripts and caption files. From uploading to publishable VTT or SRT in one app. Try it free at aliceapp.ai.