Transcribing Audio is Much Harder than You Think

“The three great essentials to achieve anything worthwhile are, first, hard work; second, stick-to-itiveness; third, common sense.” ― Thomas Edison

For someone who speaks English, understands English and can type, transcribing audio should be a breeze, right? Wrong.  Transcribing audio to text is harder and more involved than most people assume.  When employers make this mistake, it can cost them valuable time and resources.

According to the National Center for Voice and Speech, the average rate of speech for English speakers is around 150 words per minute (WPM).  By high estimates, a professional typist transcribing audio averages between 50 and 80 WPM, while two-finger typists measure about 27 WPM.  The true average is closer to 40 WPM.  Finally, consider the error rate.  The average error rate when transcribing audio is 6%, or about 1 out of every 17 words.  So, even using the highest typing speed, each minute of audio takes nearly twice as long to transcribe as it does to speak (150 words at 80 WPM).  One hour of audio contains about 9,000 words, will require roughly two hours to transcribe and will contain about 540 errors.  And that is transcribing audio under ideal conditions with a highly trained professional.
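
To check the arithmetic yourself, here is a minimal Python sketch that uses only the figures quoted above.  The function and variable names are made up for illustration, and the estimate assumes nonstop typing with no pausing, rewinding or proofreading, so treat the result as a floor rather than a forecast.

```python
# Figures quoted in the paragraph above.
SPEECH_WPM = 150    # average English speaking rate
TYPING_WPM = 80     # top of the quoted 50-80 WPM range
ERROR_RATE = 0.06   # about 1 word in 17


def transcription_estimate(audio_minutes, typing_wpm=TYPING_WPM):
    """Return (words spoken, minutes of typing, expected errors).

    Hypothetical helper: assumes continuous typing at a constant speed.
    """
    words = audio_minutes * SPEECH_WPM
    typing_minutes = words / typing_wpm
    errors = round(words * ERROR_RATE)
    return words, typing_minutes, errors


words, minutes, errors = transcription_estimate(60)
print(f"{words} words, {minutes / 60:.2f} hours of typing, {errors} errors")
```

Passing `typing_wpm=40`, the "true average" mentioned above, raises the typing time for the same hour of audio to 3.75 hours.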


Add multiple speakers, heavy regional accents or specialized terminology, like that required for medical practices, law offices or law enforcement agencies, and the time required to produce a high-quality, properly formatted transcript could easily double or triple.  When employers tack on transcription duties as an extra task, they pull that employee away from their primary role, and the employee's overall productivity suffers as a result.

Instead, employers would be wise to heed the above quote from Mr. Edison.  Work hard at your business, be persistent and use common sense.  When it comes to transcribing audio, the common-sense approach is outsourcing.

