Choose your battles wisely (especially when transcribing)

When I wibbled about machine transcription a few days ago, I concluded that it was perhaps still too much in its infancy to be of any use. However, I thought I would give it a second chance, as I’ve no doubt that I will want to transcribe content from videos in the future, as I have done in the past. And I’m glad that I did, because my second test revealed that my decision to ignore the advice given me by Sonix (the transcription software I used) was, uh, shall we say, “unwise.”

Why I take the time to transcribe content

In my first attempt at machine transcription, I made several mistakes, and they led me to a bad result and a worse conclusion. Allow me to begin again, from the top (or, if you wish, you can skip to giving Sonix another chance).

Transcription is a laborious, tedious, time-consuming task. Why do I bother? Well, there are several reasons:

  1. It’s an attempt to engage my audience: I believe that while some people may be reluctant to take the time to press ‘play’ on a video presented on a blog, they may take a couple of minutes to at least scan through a transcript.
  2. It’s an accessibility thing: Some people are deaf.
  3. It’s an SEO (search engine optimisation) thing: Search engines can index text; they can’t index audio (at least, not currently).
  4. It’s a ‘content-padding’ thing: A post that just contains a video, well, in my book that isn’t much of a post.
  5. It’s an attempt to retain my audience: The uploader of that Richard Dawkins interview, for instance, had disallowed that video’s embedding. (I put this one last, because, on the whole, I respect my audience’s ability to click away whenever the hell they choose to do so, and I consider the tricks some sites use to try to keep eyeballs as irritating — and ultimately futile — endeavours.)

Giving Sonix another chance

Here’s how it went: I was spending some time doing ‘the other side of blogging’ (visiting others’ blogsites), and I chanced upon a post by Debbie of ‘An Off Grid Life’, entitled ‘Climate Change: Fossil Fuels 101’ (dated 02Apr2018… three years ago§). The post was just a video presentation, by Debbie. The video looked a bit lonely, sitting there all on its own; I thought it could use a transcription to accompany it. And I thought it would make a good subject for a second test.

As it turns out, it was!

As a reminder, this is the advice that Sonix gives as standard on its site when you upload a file:

Better audio = better transcript

Please upload the highest quality audio that you can. Poor audio quality, background noise, speakers talking over each other, and slurred words don’t jive with our automated transcription algorithms.

In my earlier attempt (‘The Lumberjack Song’), I’d converted the YouTube video to audio first; I should perhaps have realised that might be a problem, since any conversion that re-encodes to a lossy format will almost inevitably degrade the audio. As I noted in my earlier post, I’d also missed the point that Sonix allows you to use YouTube videos directly. Now, Debbie’s video isn’t on YouTube, but I took the hint; I downloaded the video file from her page, and uploaded it to Sonix.

When the upload had completed (it took about half an hour), Sonix’s system didn’t repeat the ‘poor quality’ warning it had given me about the ‘Lumberjack’ file (the one I’d ignored); this time it confirmed that the file was likely to produce good results. So, as Debbie is a USAn, not a Brit like me, I selected ‘American’ as opposed to ‘English’ (I figured that maybe accents would play a part), and told Sonix to do its thing. It did, and in about ten minutes it announced that it had finished.

Another thing I did differently on this pass was that, instead of copy-pasting the raw transcript from Sonix into the clunky WordPress block editor, I used the Sonix editor. And I’m so very glad that I did! The Sonix system presents the video and transcript together in the page, and synchronises both; you click on a word in the transcript and the video tracks to that point. It’s brilliant! None of the changes I made to the text caused any problems whatsoever. And, unlike the <spit> WordPress block editor, the Sonix one behaved as any well-behaved editor should: no lagginess; immediate response.

It took me about an hour in all to finish proofreading the text. At a guess, I’d say that my old method (play a short section of the video on one monitor, type in the text, scan back to ensure I’d got it right and to double-check on uncertain bits, rinse and repeat until the end and then run through it all once again) would have taken at least five times longer.

A satisfying result

Here’s the transcript that Sonix came up with. I’ve edited it a little bit for clarity, and have added some links where appropriate. But the important point to note is that the original transcript, in this case, didn’t require very much editing at all. Sonix had correctly omitted all of Debbie’s “ers” and “ums,” and although it had included some repetition that clearly needed to be cut (for which it could be forgiven), it had, this time, done an admirable job.

Welcome to ‘An Off Grid Life’. My name is Debbie, and this is the second in a series of talks on the environment. The first one was ‘Climate change and how we’re insulated’. It’s easy to forget that climate change is happening and it’s also easy not to know how it happens.

This is ‘Fossil Fuels 101’, and this is basically how and what we’re doing to get all that pollution in the air. Basically, we’re burning fossil fuels every minute of the day. And you could be in your living room reading a book; you’re burning fossil fuels. And those fossil fuels are coal, natural gas and oil. And up front, let’s talk about 50% of all of the CO2 in the atmosphere is coming from buildings. And each of these fossil fuels provide the energy for not only our home, but ‘Big Box stores,’ hospitals, businesses and warehouses. So, 50% of that is coming from buildings. We’re talking about coal, oil and natural gas. Let’s take coal: we used to get 50% of our electricity from coal, but now we get about 30% because natural gas is providing some of that — which is not a good thing, and we’ll go into ‘why’ in a minute — but we have approximately 572 coal plants in the United States and about 2,300 worldwide; and when they’re burning coal, mercury, arsenic and lead (all toxic things) are spewing into the atmosphere. You don’t want to live near a coal plant; you don’t want to be near coal extraction, like going into a mine or mountaintop removal. So, coal provides our electricity.

And next: natural gas. Sixty percent of our heat in the United States comes from burning natural gas, and the people that are getting heat from natural gas are getting it from propane, still a fossil fuel. So, it was about maybe 10 years ago, that they discovered or designed a new process, and it’s called ‘fracking’. And I’d like to recommend a documentary that is awesome: it’s called ‘Gasland’ [trailer on YouTube]; it’s by Josh Fox. I’ve seen this thing three times; I’ve written down notes; it’s awesome. It takes step by step how he discovered this because it was happening in his town in Pennsylvania and he took a trip around the United States and found out all about fracking. ‘Gasland 2’ is also another documentary he did and how it got to be legal and why it’s not going away.

So, fracking: they used to just go straight down into the ground, thousands of feet to break up shale. And so now they go down and they go horizontal, which means millions of gallons of water are going down into this well, with hundreds of toxic chemicals (that they won’t disclose) and sand, silica sand. So, the toxic chemicals are not safe — and neither is the silica sand; you can get silicosis from either in a sand mine or if it’s in your town floating through the air. So, this water is going down into the well. Half of it comes back up and it’s called ‘produced water’, which may go into a holding pond — which may or may not have a liner — and wildlife and birds and insects are drinking this. They’re getting sick. They’re dying. This water is being trucked off to get treated. It has to be treated in a toxic facility, I mean, a facility that handles toxic stuff. And it’s not that they’re not having a good result with that. So, the rest of the water that’s in the well, the casing eventually cracks and the water goes into aquifers and other groundwater, people’s wells. Yes.

In the movie they [showed] that set on fire, water, water faucets because of the natural gas. Methane is the main ingredient in natural gas, which is a highly concentrated greenhouse gas, worse than just CO2. And also, they’re finding there’s earthquakes happening in areas that are fracking, fracking is done in a large portion of the United States. And also the holding tanks for the methane, for the natural gas, is leaking out. Methane: not good. So fracking, it’s legal. It should not be; it should be stopped. New York State banned it; everybody should ban it.

Oil is the last. And I save this for the last because oil is the big kahuna of net fossil fuels, because without oil, we don’t get anything. We don’t get coal out of the ground. We don’t get natural gas out of the ground. We don’t get food. We don’t get clothes. We don’t get anything out of the Big Box store. We don’t provide everything in all our heat and energy. Some years ago, a couple of years, it was at 84 million barrels a day worldwide. That’s how much we’re going through. But the International Energy Agency — I looked it up — we’re at 96 million barrels: and each barrel is 42 gallons. How we have any left is beyond me.

So, we have two big economic players that are emerging: China and India. And I mean, they just didn’t happen overnight. It’s been happening because China does almost all of the manufacturing for us. And also India is doing most of the computing like call centers and imaging issues processing. So those two countries: 1.4 billion people in China, 1.2 [billion] almost in India: we only have 320 million people in the United States. The United States consumes the most resources per person and also pollutes the most per person. But now we have these other two players [whose] standard of living is increasing and they’re buying cars and living a lifestyle close to what we’re doing. So, that’s what the increase is.

There’s also another thing that needs to be watched and that is a number; a measurement of the CO2 in the atmosphere, which let’s say ‘350’, 350 parts per million. And I don’t know exactly when we got to 350, but it was a while ago; that’s considered safe. 450 parts per million is unsafe. We’re — a couple of years ago we were at 402, so I looked it up; NASA has us, in January of 2018, at 407.98: let’s call it 408; four hundred and eight parts per million. And NASA had it at 402.5 in January of 2016 — two years ago. So it went up 5.5 points, and it was going up two points every year, so it’s increased.

So, if we do the math, we have about 14 years until we get to 450. You know, 14 years is nothing. And that’s exactly what the United States is doing: nothing. Countries like Norway, Sweden, Germany or Costa Rica trying to get off of fossil fuels are putting solar panels up. No, we’re all about just burning fossil fuels.

The other thing is the Center for Climate and Energy Solutions, a great website that details all of the climate talks and what was achieved — or not achieved — during those talks. They started them in 1992 and Business Insider in June 1st, 2017, cited that or stated that 195 countries are on board with the Paris agreements that happen in 2015 and the only two countries that weren’t on board: Syria, which is in a civil war and not worrying about climate when you’re in a civil war; and Nicaragua said, no, it doesn’t go high enough.

So, Trump gets into office and pulls us out of the Paris agreements, which won’t go into effect until 2019. But the greatest country in the world (supposedly) pulls us out of the Paris agreements. The rest of the world is all on board and we’re pulling out? What is it, 99.9% of the scientists are on board? Everybody is freaking out and we need to do something. And also, the Union of Concerned Scientists is also a great site to go to for information on fossil fuels and things like that. A great site. Google that.

So, we need to do something about this. The more we know about it, the more we can do something about it. I hope you will talk with your friends and family about this. Do some, you know, little researching or investigating on this. I know this is a condensed version of fossil fuels, but that’s what we’re doing. That’s what we’re burning. That’s what we’re up against. So I hope you’ll join me for future talks on what we’re up against with climate. And thank you for listening.

(Initial transcription by Sonix, tidied up by yours truly)

Lessons learned

  1. Choose your sources wisely (Monty Python songs may not be the best choice!).
  2. Use the original file where possible (assuming it’s compatible).
  3. Pay attention when Sonix advises that an upload may not be of sufficiently high quality!
  4. Use the Sonix editor, not the WordPress block editor, when proofreading.

This can lead to a substantial saving in transcription time, as, given a low-noise input, the output Sonix provides avoids an awful lot of typing!


In recent years, I’ve come to detest technology for its obsession with ‘upgrades-that-fix-things-that-aren’t-broken’ (and quite often breaking them in the process) and, in particular, its failure to live up to its promise of removing drudgery from our lives. Sonix has proven that this can still be done; their tool is going to save me absolutely hours in future. Its developers deserve much kudos for developing a system that does such a fine job, and in such a user-friendly way. Even if you’ve never transcribed any audio before, I highly recommend that you take Sonix for a spin!

I want to again offer a hat tip to Goldie, who set me on the path to finding this helpful tool. Take his advice: Do Not Be Afraid To Try New Things. :)

P.P.S. As at March 2021 — three years after Debbie’s presentation — we’re now at 417.64 ppm CO2; another ten point increase :( The graph below (to which Funda at kindly pointed me) answers Debbie’s uncertainty about when we passed the ‘safe’ 350 ppm barrier — the answer is: 1988. I don’t know about you, but I find the trend of this graph even more scary than the Indiana Jones clip above.

Global CO2 atmospheric concentration, 1833-2018
Source: Our World in Data
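
For anyone who fancies checking the arithmetic in Debbie’s talk, here’s a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in the transcript (402.5 ppm in January 2016, 407.98 ppm in January 2018, and the 450 ppm ‘unsafe’ level). The function names are my own invention, and the whole thing assumes that CO2 keeps rising in a straight line at a constant rate, which is a simplification on my part rather than anything NASA or Debbie claims.

```python
# Back-of-the-envelope check of the CO2 figures quoted in the transcript.
# Assumption (mine): CO2 keeps rising in a straight line at a constant rate.

UNSAFE_PPM = 450.0  # the 'unsafe' level mentioned in Debbie's talk


def annual_rate(start_ppm: float, end_ppm: float, years: float) -> float:
    """Average rise in ppm per year between two readings."""
    return (end_ppm - start_ppm) / years


def years_until(current_ppm: float, target_ppm: float, rate_ppm_per_year: float) -> float:
    """Years left before target_ppm is reached at a constant yearly rise."""
    if rate_ppm_per_year <= 0:
        raise ValueError("rate must be positive")
    return (target_ppm - current_ppm) / rate_ppm_per_year


if __name__ == "__main__":
    # NASA readings quoted in the talk: January 2016 and January 2018.
    rate = annual_rate(402.5, 407.98, 2)
    print(f"Average rise, 2016 to 2018: {rate:.2f} ppm per year")
    print(f"Years from 408 ppm to 450 at that rate: {years_until(408, UNSAFE_PPM, rate):.1f}")
    # A rounder 3 ppm per year is the rate that gives 'about 14 years'.
    print(f"Years from 408 ppm to 450 at 3 ppm per year: {years_until(408, UNSAFE_PPM, 3):.1f}")
```

On the quoted figures, the average rise works out at about 2.74 ppm per year, which would put 450 ppm a little over 15 years away from January 2018; Debbie’s ‘about 14 years’ is what you get if you round the rate up to 3 ppm per year. Either way, it’s not long.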

§ An Off Grid Life does have current content; I was ‘?Random Raiding!’ it, you see :)

