... the Hidden Markov Model. These techniques all attempt to find the most likely word sequence, given that the acoustic signal will also contain a lot of background noise. The task is easier if the system can be trained to recognise one person's voice pattern rather than those of many people, and easier again if isolated words are to be recognised rather than continuous speech. Similarly, the task is easier if the vocabulary is small, the grammar constrained and the context well defined.
Grammar and context are particularly important elements in speech recognition, especially in a highly complex language like English, and this has taken speech recognition system developers into areas such as natural language analysis and comprehension.
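The idea of searching for the most likely sequence under a Hidden Markov Model can be illustrated with a small sketch. The example below uses Viterbi decoding, a standard way to find the most probable hidden-state path in an HMM; the text above does not say which search method any particular recogniser uses, so this is an illustration only. The two-word vocabulary, the acoustic labels ("hiss", "hum", "static") and every probability are invented for the example, and real systems work with far richer acoustic features.

```python
import math

# Hypothetical toy model: hidden states are words, observations are
# coarse, noisy acoustic labels. All numbers are invented for illustration
# and are non-zero so that math.log never receives 0.
STATES = ("yes", "no")
START_P = {"yes": 0.5, "no": 0.5}
TRANS_P = {
    "yes": {"yes": 0.7, "no": 0.3},
    "no": {"yes": 0.3, "no": 0.7},
}
EMIT_P = {
    "yes": {"hiss": 0.6, "hum": 0.3, "static": 0.1},
    "no": {"hiss": 0.1, "hum": 0.6, "static": 0.3},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_probability, best_state_path) for a discrete HMM."""
    # Each column maps state -> (best log score ending in that state,
    # predecessor state on that best path).
    first = observations[0]
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        previous = columns[-1]
        column = {}
        for s in states:
            # Choose the predecessor that maximises the path score into s.
            score, prev = max(
                (previous[p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)

    # Backtrack from the best final state to recover the whole path.
    last = max(states, key=lambda s: columns[-1][s][0])
    log_prob = columns[-1][last][0]
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return log_prob, path


log_prob, path = viterbi(["hiss", "static", "hum"], STATES, START_P, TRANS_P, EMIT_P)
print(path, math.exp(log_prob))
```

The ambiguous "static" observation is resolved by its neighbours and by the transition probabilities. This is a small-scale version of how context helps a recogniser cope with noise.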
The complexity of these problems has meant that most of the voice recognition systems developed to date are either small-vocabulary isolated-word recognition systems or large-vocabulary single-speaker recognition systems. Researchers are still a few years away from being able to produce a general purpose automatic speech recognition system that can recognise continuous speech from a wide variety of people and with a wide vocabulary as successfully as any human listener.
Although speaker-dependent large-vocabulary dictation systems now work quite well on a PC, they have not proved as popular as many predicted, because in most situations it is quicker and easier to edit a document using a conventional keyboard and mouse. Furthermore, the high background noise levels found in the average office make recognition hard: recognition rates can fall as low as 50 percent, compared with up to 98 percent in a normal quiet office.
Speech recognition has been more successful in telephony, particularly for tasks that cannot be automated with conventional push-button interactive voice response systems, such as directory assistance. The technology is today widely used in automated phone-based information systems, such as travel booking and information, financial account information, and customer service call routing.
In such applications recognition accuracy is very high, despite high noise levels, because the systems use constrained-grammar recognition: at each point in the dialogue the recogniser only has to choose among a limited set of expected responses. In addition, a highly optimised telephone application can prompt the user to repeat the previous answer whenever the system's confidence in its recognition of that input is low.
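The combination of a constrained grammar and a low-confidence reprompt can be sketched as follows. This is a minimal illustration, not a description of any real telephony product. The three-phrase grammar, the 0.75 threshold and the function names are all hypothetical, and plain string similarity (Python's `difflib.SequenceMatcher`) stands in for the acoustic confidence score a real recogniser would produce.

```python
from difflib import SequenceMatcher

# Hypothetical constrained grammar: the only responses the dialogue
# accepts at this point in the call.
GRAMMAR = ("balance", "transfer", "operator")

# Hypothetical confidence threshold below which the caller is asked again.
CONFIDENCE_THRESHOLD = 0.75


def recognise(heard, grammar=GRAMMAR):
    """Return (best_phrase, confidence) for a noisy input string.

    String similarity is used here only as a stand-in for a real
    acoustic confidence score.
    """
    scored = [
        (SequenceMatcher(None, heard.lower(), phrase).ratio(), phrase)
        for phrase in grammar
    ]
    confidence, phrase = max(scored)
    return phrase, confidence


def handle_turn(heard, threshold=CONFIDENCE_THRESHOLD):
    """Accept the best grammar match, or ask the caller to repeat."""
    phrase, confidence = recognise(heard)
    if confidence < threshold:
        return ("reprompt", None)
    return ("accept", phrase)


for heard in ("balance", "balanse", "zzz"):
    print(heard, handle_turn(heard))
```

Because the recogniser only ever chooses among the phrases in the grammar, a slightly distorted input can still be mapped to the intended phrase, while an input that matches nothing well leads to a request to repeat rather than a wrong guess.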
Speech recognition software is now increasingly used in mobile phones as a faster way to input SMS messages. Nuance Communications, one of the biggest producers of voice recognition products, claims that more than 50 million phones are now equipped with such software. Here, although the background noise levels can be very high, vocabulary size is much smaller and the grammar constrained, so once again recognition rates are high.
In such applications voice input is becoming popular because, with multiple menus, options and sub-menu paths to access each application, even a simple task on a modern mobile phone is becoming time-consuming. Just writing and sending a five-word SMS message...






