Speech recognition begins to make itself heard


NEWS

Decades of research and development into speech recognition technology are finally beginning to show promising commercial results, according to research firm Gartner.

After declining from a peak in 2000, the worldwide market for speech-recognition products is on pace to reach $130m in 2003, up from $128m in 2002, Gartner said on Wednesday. The figures suggest that buyers are once again taking an interest in a technology that many believe will ultimately transform the way people interact with computers.

The industry suffered declines in 2001 and 2002, after a peak of $140m in 2000, but now companies such as market leaders Nuance and ScanSoft are putting forward a good business case for speech recognition, according to Gartner. "Many implementations provide proof that solutions that use speech recognition can deliver business value, as cost savings or improved customer service," said analyst Steve Cramoysan in a statement. Efforts by Microsoft and IBM are adding momentum to the industry, Cramoysan said.

The technology has improved to the point where vendors can't compete solely on the basis of speech recognition success rates, Gartner said, while Internet applications and standards such as VoiceXML are helping to broaden the technology's appeal. Most products are used in call centres and business portals.

North America is the biggest speech recognition market, generating 61 percent of 2003 revenues, but Gartner predicted that this share will decline as markets such as EMEA (Europe, the Middle East and Africa) develop. EMEA currently represents 26 percent of the market.

Giants of the high-tech industry such as IBM, Microsoft and Intel are continuing to invest heavily in improving the ability of PCs and servers to interpret spoken language.

In July, Microsoft released the first public beta of its Speech Server, which lets servers better handle spoken commands. Speech Server, formerly known as the .Net Speech Platform, is an attempt to reduce the cost of building automated phone response systems.

IBM, meanwhile, is using its research labs and services divisions to create showcase applications for large corporations. Financial services firm T Rowe Price has installed an account management system from Big Blue that lets its customers conduct transactions through common spoken requests.

In April, Intel released software that lets computers read lips, a step forward that could lead to better speech recognition applications. The Audio Visual Speech Recognition (AVSR) software tracks a speaker's face and mouth movements. By matching these movements with speech, the application can provide a computer with enough data to respond to spoken commands, even when these are given in noisy environments.

Most in the industry agree that the benefits of speech recognition will take a long time to fully develop: closer to 50 years than 10, according to Intel co-founder and chairman emeritus Gordon Moore.

By 2010, through its "Super human speech recognition project", IBM hopes to develop commercially viable systems that can transcribe speech into written text more accurately than humans. At the moment, machines have an error rate that is five to 10 times higher than that of humans, according to various estimates. Automated translation will also be greatly improved.

Researchers at Microsoft and elsewhere are building systems that interpret speech in terms of probability rather than by trying to parse its syntax. For example, Yoda, a speech-to-text engine under development at Microsoft, can turn spoken words into coherent email messages by studying a user's habits.

CNET News.com's Michael Kanellos contributed to this report.

Talkback

Dictation speech recognition has been almost perfect for about five years. The claim of five to ten times more mistakes results from the following steps not being taken:

1. The computer is prepared from the original CD-ROMs, with specifications significantly higher than the manufacturer states.

2. The vocabulary is prepopulated with words and phrases from the user's existing electronic data.

3. The pace of the user's voice is determined at the start. We created a Speaking Clearly Baseline to help with this process. It also helps the user to speak clearly.

Mark L. Pearson
M-CBS
2511 Driftcreek Rd SE
Sublimity, OR 97385
Toll Free Phone 1-877-873-5568 Ext 613
Fax: 1-877-873-5568
Web: https://www.m-cbs.com
E-Mail: mark@m-cbs.com

5 November, 2003 13:33
