Using Voice Typing to Create a Transcript Caption File

If you're looking for another workflow to generate transcripts that can be uploaded as captions for Google Drive videos, as described in a previous post, Google Docs voice typing provides a possible starting point.

While testing Caption Creator for Google Drive, created by Jordan Rhea of plnnr.net, I made 13 copies of the same 47-second .mp4 file, trying different caption file formats and recording the results. Once I figured out that you could upload transcripts as .srt files without any timings and the captions would auto-sync, I tried Google Docs voice typing as a way to capture the transcript from the video.


At this point I was tired of using the same video, and I wanted to try the process with a longer one. Not having any longer videos in my Google Drive, I downloaded a 2-minute video from my YouTube channel and then uploaded it to Google Drive. Using the Tab Resize Chrome extension, I created a side-by-side display with the video on the left and an open Google Doc on the right. I began playing the video and enabled Voice Typing within Google Docs (Tools > Voice Typing) shortly thereafter. The video above captures some of this process. For that video excerpt, these are the transcripts captured via voice typing and created manually.


This is a 2-column table that contains transcript excerpts created two different ways. The voice typing version lacks capitalization and punctuation and has a few errors compared to the transcript created manually.

Voice Typing did a fair job. In terms of content, "different conference time slots" was interpreted as "different poppers time slot", and the whole transcript lacks capitalization and punctuation. In this case, Voice Typing provides a starting point for capturing a transcript, but the result needs editing and verification before it is uploaded as an .srt file. Uploading a voice-typed transcript as an .srt file without further editing would detract from accessibility and would be similar to the terrible auto captions found on YouTube. But it is a tool and a great starting place, both for creating the transcript and for having a conversation about how quality, accurate captions provide learning opportunities for a much more diverse and inclusive audience.
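If you want to script the last step, here is a minimal Python sketch of one way to turn an edited transcript into an untimed .srt file. It only tidies whitespace and wraps the text into short lines; it does not fix capitalization, punctuation, or misheard words, so the manual editing still comes first. The function names, the 42-character line width, and the file name in the usage note are my own assumptions for illustration, not anything required by Google Drive or Caption Creator.

```python
import textwrap
from pathlib import Path


def transcript_to_caption_lines(text: str, width: int = 42) -> list[str]:
    """Collapse stray whitespace and wrap the transcript into short lines."""
    # Voice typing output pasted from a Doc may contain odd spacing or line breaks.
    normalized = " ".join(text.split())
    # width is an assumed caption line length, not a Google Drive requirement.
    return textwrap.wrap(normalized, width=width)


def write_untimed_srt(text: str, out_path: str, width: int = 42) -> Path:
    """Write the wrapped transcript to a file with no cue numbers or timings."""
    path = Path(out_path)
    lines = transcript_to_caption_lines(text, width)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

For example, write_untimed_srt(edited_text, "my_video.srt") would save the edited transcript next to the script, ready to upload and let the captions auto-sync as described above.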

To be fair, this method has only been tested at my kitchen table in a very controlled acoustic environment that may be difficult to replicate in a classroom setting. I will be interested to see how this possible workflow might be applied with students to improve accessibility.

