Sunday, July 20, 2008

A Few More Thoughts on Google's Speech-to-Text Technology

A few days ago the Berman Post ran the article "Berman Post: Google Makes Politicians Speeches Searchable" about, well, Google's Speech-to-Text technology. I have a few more thoughts to discuss on the matter of Speech-to-Text technology.

In order to pool the world's knowledge effectively, you need to make everything searchable on the micro scale. Think about this: what if you could only search blogs by title? What if the only way to figure out what a blog contained was to read it in its entirety? Worse yet, imagine that the words appeared at a predetermined speed and disappeared just as quickly. You would not be able to jump ahead or review what you just read without starting over from scratch, or jumping around and guessing at random to find the right spot. Such is, or used to be, the case with knowledge recorded verbally. Searching for knowledge recorded in an audio format used to mean searching through the titles and then listening to whichever seemed the most promising.

The result: most information recorded in an audio format goes unused and ignored unless someone painstakingly goes through and transcribes it. Even then, there is the persistent problem of inaccuracy. A single added or missed word could change the entire meaning. In order to figure out if there is an error, you would have to find the place in the recording that the words were transcribed from and listen yourself. Finding that specific point is usually time consuming, and this process combined with the possible errors means that a lot of transcribed work goes unused as well (or is used with no regard for the possible errors it contains).

New technology from Google (and from some others who are working on the same thing) fixes this problem. Google has automated the transcription process, taking away the painstaking work that used to have to be done by humans. To combat the problem of transcription errors, the text points to the exact moment in the video where it was said. That way, if you want to make extra certain the transcription is correct, you can jump right to the point where it was said instead of trying to guess your way to the right spot.
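To make the idea concrete, here is a minimal sketch of what a searchable, time-stamped transcript could look like. This is not Google's implementation; the Segment class, the search and format_timestamp functions, and the sample transcript are all made up for illustration. The only assumption is that each chunk of transcribed text is stored along with the offset in the recording where it was spoken.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One chunk of a transcript (hypothetical structure, for illustration)."""
    start_seconds: float  # offset into the recording where the words begin
    text: str             # automatically transcribed words (may contain errors)


def search(segments, query):
    """Return (start_seconds, text) for every segment that mentions the query."""
    needle = query.lower()
    return [(s.start_seconds, s.text) for s in segments if needle in s.text.lower()]


def format_timestamp(seconds):
    """Render an offset in seconds as m:ss for display next to a hit."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Made-up transcript data, for illustration only.
transcript = [
    Segment(0.0, "Thank you all for coming today"),
    Segment(12.5, "We need to talk about energy policy"),
    Segment(95.0, "Our energy plan creates new jobs"),
]

hits = search(transcript, "energy")
for start, text in hits:
    # Each hit carries the moment to jump to in order to check the transcription.
    print(format_timestamp(start), text)
```

Because every hit comes back with its offset, a reader who doubts the transcription can jump straight to that moment in the recording rather than guessing at the right spot.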

Currently, Google only has this technology available for political videos on YouTube. With the rise of "vlogs" (video blogs), this technology could not come too soon. Combine the vlogs with the 24/7 newscasts and podcasting, and you can see how this technology will allow a significant chunk of human knowledge and experience to finally become searchable on the micro scale.
