August 16, 2016
Written by Lori Mankin
We’ve all been victims of voice text. Whether Siri pulls up the wrong location or you send your friend a slightly awkward text message, voice recognition certainly isn’t perfect. And when you think about it, that shouldn’t be too surprising.
Overall, speech is complicated. Though other animals communicate with sound to a certain extent, it’s nowhere near as complex as human language. Each word we speak is built from phonemes, the basic sound units of a language, and the sounds we actually produce when we say them are called phones. Computers process these phones, which combine to form the sound of a word, to work out what we’re saying. Think about all the different phones in the word “training.” Now think about all the different phones in the phrase “knowledge management.” See where I’m going with this?
Keeping up with all those sounds is only one of the obstacles computer software faces when processing speech. Everyone’s voice is different, with its own tone and accent. Furthermore, the English language is full of homophones, words like “their” and “there” that sound alike but are spelled differently, which makes it difficult to determine the correct spelling when words are processed without context. Now imagine the challenges programmers face when creating software that can keep up with a salesperson who speaks a mile a minute. It takes you back to learning a foreign language and getting overwhelmed the first time you hear it at a fluent pace. Slooooow doooown.
We should probably cut computer software some slack. Voice recognition has come a long way, and many approaches have been developed over the years. The most basic is simple pattern matching, which is often used by automated call centers. So when you have to give your cable company your account number before talking to a real person, the company is likely using simple pattern matching: when you say “six,” its software compares the sounds to patterns stored in memory. OK, we’ve all screamed “no!” at these things at one point or another, so maybe they still need work.
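To make the idea concrete, here is a minimal sketch of pattern matching in Python. It is an illustration, not how any particular call center system works: the feature numbers are invented stand-ins for real acoustic measurements, and the names `TEMPLATES`, `distance` and `match_word` are hypothetical. The program simply picks the stored pattern that is closest to what it “heard.”

```python
import math

# Hypothetical stored patterns: each word is reduced to a short list of
# numbers. These values are invented for illustration, not real audio data.
TEMPLATES = {
    "five": [0.9, 0.2, 0.4],
    "six": [0.1, 0.8, 0.7],
    "seven": [0.3, 0.6, 0.1],
}

def distance(a, b):
    """Euclidean distance between two equal-length feature lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_word(features, templates=TEMPLATES):
    """Return the stored word whose pattern is closest to the input."""
    return min(templates, key=lambda word: distance(features, templates[word]))

# A slightly "off" pronunciation still lands nearest to the pattern for "six".
heard = [0.2, 0.7, 0.6]
print(match_word(heard))  # six
```

Notice that the matcher always returns its closest guess, even when nothing is a good fit. That goes some way toward explaining the screaming.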
A more capable form of speech recognition is language modeling and statistical analysis. Here the software uses grammar, speech patterns and the relationships between words and sounds to determine likely word pairings. You see a form of this when you text on many smartphones: based on grammar and your typing history, the phone guesses which word you want to use next.
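A toy version of that guessing game can be sketched in a few lines of Python. This is a simplified illustration that only counts which word most often follows another (a “bigram” count), not the method any specific phone uses; the sample messages and the function names `train_bigrams` and `most_likely_next` are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical message history used as training data.
history = [
    "see you at the meeting",
    "see you at the office",
    "running late to the meeting",
]
model = train_bigrams(history)
print(most_likely_next(model, "the"))  # meeting
```

The more history the model sees, the better its guesses get, which is why suggestions tend to improve the longer you use a phone.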
Finally, the heavyweight champ of voice recognition is the artificial neural network (ANN). This is what Bloomfire uses to transcribe videos in our knowledge base. Simply put, scientists taught computers to recognize patterns the way the brain does. Super creepy, yes. But over the last 30 years the technology has become incredibly fast and accurate. ANNs are used in many different ways, not just for voice recognition. They help fight crime: large financial institutions use them to analyze credit card transactions to detect fraud. They even analyze the air for chemicals to detect bombs at airports.
The advancements in the technology behind voice recognition have made video transcription possible, and video transcription makes knowledge sharing that much easier. Storing videos isn’t enough. Transcription makes videos in Bloomfire searchable, and automated time alignment lets you find the exact part of a video you need.
It’s no surprise that video capture and upload features are staples of knowledge management solutions. Videos are a critical tool in training and onboarding, and for good reason. Watching a one-minute video has the impact of reading 1.8 million words, according to Forrester Research. However, businesses that want to use video for knowledge transfer often miss one essential piece of the puzzle: video transcription. There’s a reason most knowledge management solutions don’t have it. Video transcription relies on voice recognition, and voice recognition is incredibly difficult.