An Integrated Language Model for Automatic Speech Recognition


Abstract

With society's ever-increasing reliance on computers, speech is being considered as a possible means of man-machine communication. In particular, there is much research in the area of automatic speech recognition.

The way in which humans perceive and understand speech is not fully understood, but speech perception experiments show that people use more than just the acoustic signal when recognising speech. They also use their knowledge of language, and of the subject and context of a conversation, to make sense of what they hear.

For automatic speech recognition, in addition to acoustic modelling stages, limited models of the structure of speech and language can be incorporated to improve performance, just as humans often seem to use additional knowledge.

There are two main approaches to building formal models of the syntax of language: the statistical, data-driven approach of models such as N-grams, and the knowledge-based, rule-driven approach of models such as context-free grammars.
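As an illustration of the data-driven approach, the following is a minimal sketch of a bigram (N = 2) model, not code from the thesis. The function names, the `<s>` and `</s>` boundary markers, and the use of add-one smoothing are assumptions made for this example only.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and bigrams over tokenised sentences.

    Each sentence is a list of words; "<s>" and "</s>" are assumed
    boundary markers added here for illustration.
    """
    histories, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        histories.update(tokens[:-1])                 # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent pairs
    # Symbols that can be predicted: all words plus the end marker.
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return histories, bigrams, vocab


def bigram_prob(prev, word, histories, bigrams, vocab):
    """P(word | prev) with add-one smoothing (an assumed, simple choice)."""
    return (bigrams[(prev, word)] + 1) / (histories[prev] + len(vocab))


def sentence_prob(words, model):
    """Probability of a sentence as a product of local bigram probabilities."""
    histories, bigrams, vocab = model
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= bigram_prob(prev, word, histories, bigrams, vocab)
    return p
```

Each factor conditions only on the immediately preceding word, so the model is trained from raw text alone but sees nothing beyond adjacent pairs.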

Both N-gram-based and grammar-based approaches to language modelling have their own strengths and weaknesses. In particular, N-grams are limited by their inherently local nature and by the fact that they cannot generate any structural representation of an utterance. Grammar models, unlike N-grams, are hard to train in an unsupervised fashion, but are more powerful in that they exploit explicit linguistic rules.

Various hybrid language models have been built in an attempt to overcome these problems and have been shown to improve performance. However, the partnering of approaches in hybrid systems can be somewhat arbitrary and is not necessarily based on solid theoretical foundations. An integrated language model is therefore proposed which closely combines a (partial) linguistic model of language with a flexible statistical framework. Structure is derived from probabilistic context-free grammar rules and is linked with symbol bigrams between terminal and non-terminal symbols within a parse forest representation.
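For readers unfamiliar with probabilistic context-free grammars, the sketch below shows the standard way a single parse tree is scored as the product of its rule probabilities. It is background only: it does not implement the integrated model, its symbol bigrams, or its parse forest, and the toy grammar, vocabulary and function name are hypothetical.

```python
import math

# Hypothetical toy grammar: (left-hand side, right-hand side) -> probability.
# The probabilities for each left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("aircraft",)): 0.6,
    ("NP", ("target",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("sighted",)): 1.0,
}


def tree_log_prob(tree, rules=RULES):
    """Log probability of a parse tree under a PCFG.

    A tree is a tuple (label, child, ...), where each child is either
    a subtree tuple (non-terminal) or a plain string (terminal).
    """
    label, *children = tree
    # The rule used at this node: label -> sequence of child symbols.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    logp = math.log(rules[(label, rhs)])
    # Add the contributions of all non-terminal children recursively.
    for child in children:
        if not isinstance(child, str):
            logp += tree_log_prob(child, rules)
    return logp
```

Working in log probabilities avoids numerical underflow when trees contain many rule applications.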

The basic elements of the integrated language model are: a robust extended LR parsing algorithm for ambiguous (and possibly incomplete) grammars; an unsupervised training algorithm for symbol bigram and grammar rule probabilities; a fast scoring algorithm; and an algorithm that finds new conceptual structures, called trails, which provide interpretations of an utterance.

An implementation of the integrated language model has been applied to the modelling of airborne reconnaissance mission reports, and it obtains better results than standard N-gram and extended N-gram techniques. This is a task to which traditional linguistic models could not be applied because of their inflexible nature.

Enhancements to the integrated language model are also proposed, and it is suggested that, in addition to its application in automatic speech recognition, the integrated language model could form the basis of a grammatical inference technique.

A thesis submitted to the University of Bristol in accordance with the requirements of the degree of Doctor of Philosophy in the Faculty of Engineering, Department of Engineering Mathematics