How not to use technology to help with (machine) transcription

I’ve transcribed several videos here on Wibble. The most recent example was a short while ago: four-and-a-half minutes of Richard Dawkins talking about Douglas Adams. And almost immediately, Goldie blew my mind with a brilliant comment, asking whether I’d used voice-to-text technology to create the transcript. That idea had simply never occurred to me!

Transcription is a laborious, tedious, time-consuming task. Why do I bother? Well, there are several reasons:

  1. It’s an attempt to engage my audience: I believe that while some people may be reluctant to take the time to press ‘play’ on a video presented on a blog, they may take a couple of minutes to at least scan through a transcript.
  2. It’s an accessibility thing: Some people are deaf.
  3. It’s an SEO (search engine optimisation) thing: Search engines can index text; they can’t index audio (at least, not currently).
  4. It’s a ‘content-padding’ thing: A post that just contains a video, well, in my book that isn’t much of a post.
  5. It’s an attempt to retain my audience: The uploader of that Richard Dawkins interview, for instance, had disallowed that video’s embedding. (I put this one last, because, on the whole, I respect my audience’s ability to click away whenever the hell they choose to do so, and I consider the tricks some sites use to try to keep eyeballs as irritating — and ultimately futile — endeavours.)

Machine transcription is still in its infancy

Running with Goldie’s idea, I began investigating. It didn’t take me long to discover that there’s a variety of offerings that might at least help in the transcription process. No doubt some provide better results than others. One opinion I came across was this:

Since speech to text reliability is so low there are no professional applications which do this as it is more efficacious to use a human for this task. In studies, even with relatively clean audio such as newscasts, it takes longer to have a human proof and correct the output than to just have a human listen and type in the text.

Former Chief Science Officer James Francher on Quora (2019)

Having done just that myself — transcribe text from audio — on several occasions, I’ve always found it necessary to play the audio for a few seconds, transcribe what I’d heard, then continue, and repeat, often back-tracking. I guess my typing speed just isn’t good enough. I frequently have to replay the same section to get it right — and even then, whenever I finish the job, when I go back to check it I invariably find that I’ve made mistakes. I figure that as my proofreading ability is pretty good, correcting a machine transcription should save me some time.

Here’s the solution I came up with…

I thought I’d share my experiences with the solution I’ve tried in the hope that it might enlighten others. On the face of it, it appears to be quite a straightforward process:

  1. Select a YouTube video you want to transcribe.
  2. Convert it to an audio file (I used Ontiva: free, no registration required).
  3. Convert the audio file to text (I used Sonix: trial account includes 30 free minutes).
  4. Proofread the result.

1. Select a YouTube video

Monty Python’s ‘Lumberjack Song’ seems a good choice: it’s only two and a half minutes long.

2. Convert to an audio file

Ontiva.com gives a choice of formats. I actually chose 64k MP3 — I didn’t notice the ‘audio’ tab! But it didn’t seem to matter — not at first…

Audio conversion of ‘The Lumberjack Song’

3. Convert to text

Sonix, I see now, offers the choice of selecting a YouTube video directly, which would cut out step 2. But I didn’t go that route. I did get a warning that the quality of the file wasn’t the best.

Better audio = better transcript

Please upload the highest quality audio that you can. Poor audio quality, background noise, speakers talking over each other, and slurred words don’t jive with our automated transcription algorithms.
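As a rough way of spotting a low-quality file before uploading it, the average bitrate can be estimated from the file's size and running time. The sketch below is only that: the function names are made up for illustration, it assumes the '64k' in the MP3 option means 64 kbit/s, and it says nothing about how Sonix itself judges quality.

```python
def expected_size_bytes(bitrate_kbps: float, duration_seconds: float) -> int:
    """Approximate size of an audio file encoded at a constant bitrate."""
    return int(bitrate_kbps * 1000 * duration_seconds / 8)


def average_bitrate_kbps(size_bytes: int, duration_seconds: float) -> float:
    """Average bitrate in kbit/s, worked back from file size and duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds / 1000


# Assumption: a two-and-a-half-minute song (150 s) encoded at 64 kbit/s.
size = expected_size_bytes(64, 150)
print(size, "bytes at", average_bitrate_kbps(size, 150), "kbit/s")
```

A low figure is only a hint, of course: a high-bitrate recording of people talking over each other will still be hard going for any transcription system.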

Here’s the raw output from the Sonix transcription:

Yes, a lumberjack clipping from three to three years on the mighty rivers of British Columbia, the giant with launch for the mighty sky high. What about the smell of fresh cut timber? The crush of mighty trees? With my best girl by my side, we sing, sing, sing. I'm a lumberjack and I'm OK. I sleep all night and work all day. And he's a lumberjack and he's OK. Is it so nice that he works all day? I cut down trees, I eat my lunch, I go to the lavatory on Wednesday. I go shopping and have buttered scones for tea. You got his lunch. He goes to the lover's dream on Wednesday. He goes shopping and he is a lumberjack. He sold his soul mate and he works all day. I cut down trees. I skip and jump. I like to press wild flowers. I'll put on women's clothing and hang around in bars and. He got some trees him sometimes he likes to dress wildflowers, he puts on women's clothing and hang around in the Japanese, OK? You it so nice. And so I cut down trees. I wear heels, suspenders and a bra. Oh, wish I'd been a girly. Just like my dear papa. I cut down trees and high heels and glass, which should be just like my dear papa. And I thought you were some, but are you proud of your. And that was something completely different, bitterly. But I know that nothing I can say can alter the fact that in this restaurant you have been given a dirty, filthy, smelly piece of cutlery, wasn't smelly, smelly and obscene and disgusting. But I hated I hated getting nasty.

… not exactly wonderful! But, I guess that it is at least a starting point.

4. Proofread the result

(Key: [-text-] marks Sonix output that I deleted; {+text+} marks what I added.)

[-Yes, a lumberjack clipping-]{+Yes! A lumberjack! Leaping+} from [-three-]{+tree+} to [-three years on-]{+tree as they float down+} the mighty rivers of British [-Columbia, the giant-]{+Columbia. The Giant Redwood, Larch, the Fir,+} [-with launch for-] the mighty [-sky high.-]{+Scots Pine.+}

What about {+my bloody parrot?+}

[-the-]{+The+} smell of [-fresh cut timber?-]{+freshcut timber!+} The [-crush-]{+crash+} of mighty [-trees?-]{+trees!+} With my best girl by my side, we sing, sing, [-sing.-]{+sing!+}

I’m a lumberjack and I’m OK. I sleep all night and {+I+} work all day.

[-And he’s-]{+He’s+} a lumberjack and he’s OK. [-Is it so nice that-]{+He sleeps all night and+} he works all [-day?-]{+day.+}

I cut down trees, I eat my lunch, I go to the [-lavatory on Wednesday.-]{+lavatory. On Wednesday+} I go shopping and have buttered scones for tea.

[-You got-]{+He cuts down trees, he eats+} his [-lunch. He-]{+lunch, he+} goes to the [-lover’s dream on Wednesday. He-]{+lavatory. On Wednesday he+} goes shopping and [-he-]{+has buttered scones for tea. He+} is a [-lumberjack.-]{+lumberjack and he’s OK.+} He [-sold his soul mate-]{+sleeps all night+} and he works all day.

I cut down trees. I skip and jump. I like to press wild flowers. I’ll put on women’s clothing and hang around in [-bars and.-]{+bars.+}

He [-got some trees him sometimes-]{+cuts down trees, he skips and jumps,+} he likes to [-dress wildflowers, he-]{+press wild flowers. He+} puts on women’s clothing and [-hang-]{+hangs+} around in [-the Japanese, OK? You it so nice. And so-]{+bars? He’s a lumberjack and he’s OK. He sleeps all night and he works all day.+}

I cut down trees. I wear {+high+} heels, suspenders and a bra. [-Oh,-]{+I+} wish I’d been a [-girly. Just-]{+girly, just+} like my dear papa.

[To be fair to Sonix, it’s not really surprising that it gets this next bit so badly wrong!]
[-I cut down trees and high heels and glass, which should be just like my dear papa. And I thought you were some, but are you proud of your.-]
{+/ I cuts down trees. I wears high heels, suspenders and a bra.+}
{+/ He cuts down trees, he wears… high heels? Suspenders? And a bra?+}

{+I wish I’d been a girly, just like my dear papa.+}

{+Oh, Bevin; and I thought you were so butch!+}

And [-that was-]{+now for+} something completely [-different,-]{+different.+}

[I’ve cut this part entirely, as it’s not part of the song.]
[-bitterly. But I know that nothing I can say can alter the fact that in this restaurant you have been given a dirty, filthy, smelly piece of cutlery, wasn’t smelly, smelly and obscene and disgusting. But I hated I hated getting nasty.-]

Call me a completionist, but here’s the final result. I’ve left the blue text in, as I think it reveals just how poor a job Sonix actually did, in this case:

Yes! A lumberjack! Leaping from tree to tree as they float down the mighty rivers of British Columbia. The Giant Redwood, Larch, the Fir, the mighty Scots Pine.

What about my bloody parrot?

The smell of freshcut timber! The crash of mighty trees! With my best girl by my side, we sing, sing, sing!

I’m a lumberjack and I’m OK. I sleep all night and I work all day.

He’s a lumberjack and he’s OK. He sleeps all night and he works all day.

I cut down trees, I eat my lunch, I go to the lavatory. On Wednesday I go shopping and have buttered scones for tea.

He cuts down trees, he eats his lunch, he goes to the lavatory. On Wednesday he goes shopping and has buttered scones for tea. He is a lumberjack and he’s OK. He sleeps all night and he works all day.

I cut down trees. I skip and jump. I like to press wild flowers. I’ll put on women’s clothing and hang around in bars.

He cuts down trees, he skips and jumps, he likes to press wild flowers. He puts on women’s clothing and hangs around in bars? He’s a lumberjack and he’s OK. He sleeps all night and he works all day.

I cut down trees. I wear high heels, suspenders and a bra. I wish I’d been a girly, just like my dear papa.

/ I cuts down trees. I wears high heels, suspenders and a bra.
/ He cuts down trees, he wears… high heels? Suspenders? And a bra?

I wish I’d been a girly, just like my dear papa.

Oh, Bevin; and I thought you were so butch!

And now for something completely different.
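With both the raw output and the corrected text to hand, a marked-up comparison like the one in the proofreading step could be generated rather than built by hand. Here's a minimal sketch using Python's standard difflib module. The word_diff name and the [-removed-] / {+added+} markers are choices made purely for this illustration, and it compares whole words (punctuation included), so it will be coarser than a careful human edit.

```python
import difflib


def word_diff(raw: str, corrected: str) -> str:
    """Word-level diff: [-...-] wraps removed words, {+...+} wraps added ones."""
    raw_words, fixed_words = raw.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=raw_words, b=fixed_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run of words: copy it through as-is.
            pieces.extend(raw_words[i1:i2])
            continue
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(raw_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(fixed_words[j1:j2]) + "+}")
    return " ".join(pieces)


machine = "He goes to the lover's dream on Wednesday."
human = "He goes to the lavatory on Wednesday."
print(word_diff(machine, human))
# He goes to the [-lover's dream-] {+lavatory+} on Wednesday.
```

It won't catch anything the proofreader missed, naturally; it only shows what was changed.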

Conclusion

Some of the phrases that Sonix came up with were hilarious! ‘Lover’s dream’ instead of ‘lavatory’? ‘Sold his soul mate’ instead of ‘sleeps all night’? The ‘Japanese’ bit really had me wondering, and missing out the ‘bloody parrot’ entirely, well, that’s just blatantly parroticist.

On the whole, the machine transcription did save me a little typing. But it clearly lacks the ability to recognise proper sentence structure, inflection and punctuation. Importantly, in a couple of places, I very nearly missed that the machine had omitted words entirely (such as the ‘high’ in ‘high heels’), so I would have been better off not using it at all. And I still had to listen to the audio, in some places several times, to check what had actually been said. So, not much time saved there.
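One way to put a number on how much fixing was needed is the word error rate: the count of word substitutions, insertions and deletions needed to turn the machine's output into the corrected text, divided by the number of words in the corrected text. The sketch below is an illustration under stated assumptions rather than any standard tool: the function names are invented, and it lower-cases everything and ignores punctuation before comparing.

```python
import re


def words(text: str) -> list[str]:
    """Lower-case the text; keep runs of letters and straight apostrophes only."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance from hypothesis to reference, per reference word."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference transcript contains no words")
    # Edit-distance table, kept one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


corrected = "He sleeps all night and he works all day."
machine = "He sold his soul mate and he works all day."
print(f"{word_error_rate(corrected, machine):.0%}")
```

For that one line the figure comes out at 44%: four edits against nine words in the corrected version. A single score like this says nothing about the time spent replaying the audio, which was the real cost here.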

To be fair to Sonix, their system did warn me up front that the file I’d uploaded was not of sufficiently good quality, and that their algorithm couldn’t cope well with speakers talking over each other (which became particularly apparent in the last part of the song). Perhaps I’d have got better results from a different source file§, or with a different speech-to-text system.

But still, for the moment at least, I’m forced to concur with Former Chief Science Officer James Francher: it takes longer to have a human proof and correct the output than to just have a human listen and type in the text…

Postscript

… especially when you also have to contend with the ‘wonderful’ new WordPress block editor. As I edited the ‘Proofread the result’ section above, this cruddy editor became progressively more and more sluggish. This post took about twice as long as it should have done because of that. Ah, the wonders of modern technology!


§ The answer to that is “Yes”: I’ve experimented some more since creating this post, and now realise that I should have paid more attention to the advice Sonix gave me after I’d uploaded ‘The Lumberjack Song’ file; the warning was very clear that its quality was poor. I’ve done another test since, on a different file, and the results have been much better; so much so that you can almost completely disregard my ‘conclusion’ above — but I’ve spent so much time on this post that I’m bloody well going to publish it as scheduled anyway; it’s, it’s, call it an object lesson in how not to trust first impressions :) I’ll follow up soon!

About pendantry

Phlyarologist (part-time) and pendant. Campaigner for action against anthropogenic global warming (AGW) and injustice in all its forms. Humanist, atheist, notoftenpist. Wannabe poet, writer and astronaut.
This entry was posted in art, Communication, Education, Just for laughs, Ludditis, Phlyarology, Strategy.

15 Responses to How not to use technology to help with (machine) transcription

  1. Herb says:

    This is very interesting. I’ll be watching for the follow-up.


  2. davidatqcm says:

    Tres interesant


  3. Lover’s dream and lavatory sound like pretty much the same thing to me.


  4. Hmmm… I thought you’d basically play the video while having an app opened that takes the voice and transcribes it. I guess your idea is more direct.


  5. Fascinating. Was that you wearing glass high heels, Cinderella? Or was that Papa? I don’t know; I liked the way Sonix wrote it.
    ;;
    ;;
    ;;
    Laugh You’ve earned it


