- Eigg without Rùm, Issue 219, November 2020.
- Roads in the Landscape, Issue 179, March 2019.
- Faroese Visions with Adam Pierzchala, Richard Childs, Pete Hyde and Dave Martin, Issue 168, October 2018.
- First Light Exhibition Discussion with Joe Cornish, Tim Parkin, Julian Calverley, Baxter Bradford, Beata Moore and Matt Lethbridge, Issue 135, April 2017.
- Going it Alone on Harris with Adam Pierzchala, Issue 122, September 2016.
- The Voyage of the Malmö, Issue 108, February 2016.
- Rockpool Photography, Issue 88, March 2015.
- End Frame, Issue 78, June 2014.
- Featured Photographer, Issue 73, April 2014.
- Sun on Rùm, Issue 59, June 2013.
- M.J. Carey, G.D. Tattersall, H. Lloyd-Thomas and M.J. Russell, Inferring identity from user behaviour, in IEE Proc. Vision, Image and Signal Processing, vol. 150, no. 6, pp. 383-388, December 2003.
Biometrics using inherited characteristics are frequently used in security systems. An alternative metric is user behaviour, for example the set of web pages accessed over a number of Internet sessions, or the set of television programmes viewed by an individual over a number of days. A mathematical framework is developed which enables simple and robust identification algorithms based on this kind of user behaviour to be presented. Experimental results are based on a database of 33 users' television viewing habits. The application of these algorithms to this domain is not intended as a real application, but rather as an illustration of the power of the techniques. Practical applications would include user authentication for fraud prevention and preference prediction on the Internet. The user discrimination performance using the TV viewing database is modest: an equal error rate of about 18%. However, it is reasonable to suppose that the discrimination given by this technique would be orthogonal to that of other biometrics, and so could provide a useful improvement in performance in a system combining the two.
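The paper develops a full probabilistic framework; purely as an illustration of the idea of matching a test session against enrolled behaviour profiles, here is a minimal sketch using Jaccard set similarity (all names and data are hypothetical, not from the paper):

```python
# Toy illustration of identifying a user from behaviour sets.
# The paper's framework is probabilistic; this sketch substitutes a
# simple Jaccard similarity between sets of watched programmes.

def jaccard(a, b):
    """Similarity between two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def identify(profiles, session):
    """Return the enrolled user whose profile best matches the session."""
    return max(profiles, key=lambda u: jaccard(profiles[u], session))

# Hypothetical enrolled viewing profiles and one test session.
profiles = {
    "user_a": {"news", "drama", "quiz"},
    "user_b": {"sport", "film", "news"},
}
session = {"quiz", "drama", "soap"}
print(identify(profiles, session))  # prints: user_a
```

A real system would accumulate evidence over many sessions and model programme popularity, since rare shared programmes are far more discriminative than popular ones.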
- J.H. Jin, M.J. Russell, M.J. Carey, J. Chapman, H. Lloyd-Thomas and G.D. Tattersall, A spoken language interface to an electronic programme guide, in Proc. European Conference on Speech Communication and Technology (EUROSPEECH), (Geneva, Switzerland), September 2003.
This paper describes research into the development of personalised spoken language interfaces to an electronic programme guide. A substantial data collection exercise has been conducted, resulting in a corpus of nearly 10,000 spoken queries to an electronic programme guide by a total of 64 subjects. A substantial part of the corpus comprises recordings of many queries from a small number of 'core' subjects to facilitate research into personalisation and the construction of user profiles. This spoken query data is supported by a second corpus which contains a record of subjects' viewing habits over a two year period. Finally, the two corpora have been combined to create two information retrieval test sets. Two probabilistic information retrieval systems are described, and the results obtained on the PUMA IR test sets using these systems are presented.
- R. Auckenthaler, M.J. Carey and H. Lloyd-Thomas, Score normalization for text-independent speaker verification systems, in Digital Signal Processing, vol. 10, nos. 1-3, pp. 42-54, January/April/July 2000.
This paper discusses several aspects of score normalization for text-independent speaker verification. The theory of score normalization is explained using Bayes' theorem and detection error trade-off plots. Based on the theory, the world, cohort, and zero normalization techniques are explained. A novel normalization technique, test normalization, is introduced. Experiments showed significant improvements for this new technique compared to the standard techniques. Finally, there is a discussion of the use of additional knowledge to further improve the normalization methods. Here, the test normalization method is extended to use knowledge of the handset type.
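The shared mechanics of these normalization techniques is a shift-and-scale of the raw score by impostor statistics; they differ in where those statistics come from. A minimal sketch (scores and cohorts invented for illustration, not taken from the paper):

```python
import statistics

def znorm(raw_score, impostor_scores):
    """Zero normalisation: shift/scale a raw score using impostor-score
    statistics estimated offline for the claimed speaker's model."""
    mu = statistics.mean(impostor_scores)
    sigma = statistics.stdev(impostor_scores)
    return (raw_score - mu) / sigma

def tnorm(raw_score, cohort_scores):
    """Test normalisation: the same shift/scale, but the statistics come
    from scoring the *test* utterance against a cohort of other models,
    so handset and channel effects are shared with the raw score."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (raw_score - mu) / sigma

# A raw score two cohort standard deviations above the cohort mean
# normalises to +2.0.
print(tnorm(-1.0, [-4.0, -3.0, -2.0]))  # prints: 2.0
```

The handset-dependent extension mentioned above amounts to selecting the cohort (or the impostor statistics) to match the detected handset type.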
- E.S. Parris, M.J. Carey and H. Lloyd-Thomas, Feature fusion for music detection, in Proc. European Conference on Speech Communication and Technology (EUROSPEECH), (Budapest, Hungary), pp. 2191-2194, September 1999.
Automatic discrimination between music, speech and noise has grown in importance as a research topic over recent years. The need to classify audio into categories such as music or speech is an important part of the multimedia document retrieval problem. This paper extends work previously carried out by the authors which compared the performance of static and transitional features based on cepstra, amplitude, zero-crossings and pitch for music and speech discrimination. Two approaches are described to combine the features to improve overall performance. The first approach uses separate GMM classifiers for each feature type and fuses the outputs of the classifiers. The second approach combines different features into a single vector prior to modelling the data with a GMM. Significant improvements in performance have been observed using both approaches over the results achieved by a single type of feature. An equal error rate of 0.3% is achieved for the best system on ten-second tests using seventeen hours of test material. The performance is maintained as the length of the test file is reduced, with an equal error rate of less than 1% being achieved with only two seconds of data.
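The first approach (score-level fusion) can be sketched in a few lines. This toy version replaces each per-feature GMM with a single 1-D Gaussian per class and fuses the per-feature log-likelihood ratios with a weighted sum; all model parameters, feature names and weights below are invented for illustration:

```python
import math

def gauss_loglik(x, mu, sigma):
    """Log-likelihood of x under a 1-D Gaussian(mu, sigma)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Hypothetical per-feature class models: (mean, std dev) per feature.
MODELS = {
    "music":  {"cepstra": (0.0, 1.0), "zcr": (2.0, 1.0)},
    "speech": {"cepstra": (3.0, 1.0), "zcr": (0.0, 1.0)},
}

def score_fusion(obs, weights):
    """Separate classifier per feature type; fuse the per-feature
    music-vs-speech log-likelihood ratios with a weighted sum."""
    llr = 0.0
    for feat, w in weights.items():
        llr += w * (gauss_loglik(obs[feat], *MODELS["music"][feat])
                    - gauss_loglik(obs[feat], *MODELS["speech"][feat]))
    return llr  # > 0 favours music

obs = {"cepstra": 0.5, "zcr": 1.5}
print("music" if score_fusion(obs, {"cepstra": 0.6, "zcr": 0.4}) > 0 else "speech")
# prints: music
```

With full GMMs the second approach (feature concatenation before modelling) is genuinely different from score fusion, because a joint model can capture correlations between feature types that per-feature classifiers cannot.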
- M.J. Carey, E.S. Parris and H. Lloyd-Thomas, A comparison of features for speech, music discrimination, in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), (Phoenix, AZ), pp. I-149-I-152, March 1999.
Several approaches have previously been taken to the problem of discriminating between speech and music signals. These have used different features as the input to the classifier and have tested and trained on different material. In this paper we examine the discrimination achieved by several different features using common training and test sets and the same classifier. The database assembled for these tests includes speech from thirteen languages and music from all over the world. In each case the distributions in the feature space were modelled by a Gaussian mixture model. Experiments were carried out on four types of feature: amplitude, cepstra, pitch and zero-crossings. In each case the derivative of the feature was also used and found to improve performance. The best performance resulted from using the cepstra and delta cepstra, which gave an equal error rate (EER) of 1.2%. This was closely followed by normalised amplitude and delta amplitude, which, however, used a much less complex model. The pitch and delta pitch gave an EER of 4%, which was better than the zero-crossings, which produced an EER of 6%.
- H. Lloyd-Thomas, E.S. Parris and J.H. Wright, Recurrent substrings and data fusion for language recognition, in Proc. International Conference on Spoken Language Processing (ICSLP), (Sydney, Australia), pp. 169-172, November/December 1998.
Recurrent phone substrings that are characteristic of a language are a promising technique for language recognition. In previous work on language recognition, building anti-models to normalise the scores from acoustic phone models for target languages has been shown to reduce the equal error rate (EER) by a third. Recurrent substrings and anti-models have now been applied alongside three other techniques (bigrams, usefulness and frequency histograms) to the NIST 1996 Language Recognition Evaluation, using data from the CALLFRIEND and OGI databases for training. By fusing scores from the different techniques using a multi-layer perceptron, the EER on the NIST data can be reduced further.
- E.S. Parris, H. Lloyd-Thomas, M.J. Carey and J.H. Wright, Bayesian methods for language verification, in Proc. European Conference on Speech Communication and Technology (EUROSPEECH), (Rhodes, Greece), pp. 59-62, September 1997.
This paper describes a number of techniques for language verification based on acoustic processing and n-gram language modelling. A new technique is described which uses anti-models to model the general class of languages. These models are then used to normalise the acoustic score, giving a 34% reduction in the error rate of the system. An approach to automatically generate discriminative subword strings for language verification is presented. Occurrences of recurrent strings are scored using a Poisson-based significance test. It is shown that when significant sub-strings do occur in the test material they are strong indicators of the target language occurring.
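The Poisson-based significance test can be sketched as an upper-tail probability: how surprising is it to see a substring k times when a background model predicts a rate of lam occurrences? The counts below are invented for illustration; the paper's actual thresholds and background estimates are not reproduced here.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): the significance of observing a
    substring k times when the background rate predicts lam occurrences."""
    # 1 - CDF(k - 1), computed term by term.
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))

# Background model predicts the substring 0.5 times in this much speech;
# it was actually observed 4 times.
p = poisson_sf(4, 0.5)
print(f"p = {p:.2e}")  # a small p-value: evidence for the target language
```

A small p-value marks the substring as significantly over-represented, which is the sense in which significant sub-strings act as strong indicators of the target language.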
- M.J. Carey, E.S. Parris, S.J. Bennett and H. Lloyd-Thomas, A comparison of model estimation techniques for speaker verification, in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), (Munich, Germany), pp. 1083-1086, April 1997.
In this paper we address the problem of building speaker-dependent hidden Markov models for a speaker verification system. A number of model-building techniques are described and the comparative performance of a system using models built with each of these techniques is presented. Mean estimated models, models where the means of the HMMs are estimated using segmental K-means but where the variances are taken from speaker-independent models, outperformed other techniques such as Baum-Welch re-estimation for training times of 120s, 60s and 15s. Mean estimated models were also built with varying numbers of components in the state mixture distributions and a performance gain was again observed. The incorporation of transitional features into the system had degraded performance when the Baum-Welch algorithm was used for model estimation. However, the inclusion of delta and delta-delta cepstra into the system using mean estimated models now gave a significant improvement in performance. Taken together, these changes halved the equal error rate of the system from 15.7% to 7.8%.
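The core of the mean-estimated scheme is simple: re-estimate only the Gaussian means from the speaker's (state-aligned) data while keeping the speaker-independent variances, which are unreliable to estimate from a few seconds of speech. A minimal sketch, with single-Gaussian states and invented numbers (the paper uses full HMM mixture states):

```python
import statistics

def mean_estimated_model(si_model, speaker_frames):
    """Build a speaker model by re-estimating only the per-state means
    from the speaker's aligned frames, keeping the speaker-independent
    variances unchanged."""
    model = {}
    for state, frames in speaker_frames.items():
        _, si_var = si_model[state]
        model[state] = (statistics.mean(frames), si_var)  # new mean, SI variance
    return model

# Hypothetical speaker-independent model: (mean, variance) per state,
# and frames assigned to each state by a segmental K-means alignment.
si = {"s1": (0.0, 2.0), "s2": (5.0, 1.5)}
frames = {"s1": [1.0, 1.2, 0.8], "s2": [4.0, 4.4]}
print(mean_estimated_model(si, frames))
```

The design choice this illustrates: with 15-120 seconds of enrolment speech there is enough data to move the means towards the speaker, but re-estimating variances as well (as full Baum-Welch would) tends to overfit.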
- E.S. Parris, H. Lloyd-Thomas and M.J. Carey, Language verification using anti-models, in Proc. Institute of Acoustics (Speech & Hearing), (Windermere, UK), pp. 107-114, November 1996.
- M.J. Carey, E.S. Parris, H. Lloyd-Thomas and S.J. Bennett, Robust prosodic features for speaker identification, in Proc. International Conference on Spoken Language Processing (ICSLP), (Philadelphia, PA), pp. 1796-1799, October 1996.
- H. Lloyd-Thomas, An Integrated Language Model for Automatic Speech Recognition. Ph.D. Thesis, Dept. of Engineering Mathematics, University of Bristol, September 1995.
- G.J.F. Jones, H. Lloyd-Thomas and J.H. Wright, Lattice parsing and application of integrated language models for speech recognition, in Proc. European Conference on Speech Communication and Technology (EUROSPEECH), (Madrid, Spain), pp. 1789-1792, September 1995.
- H. Lloyd-Thomas, J.H. Wright and G.J.F. Jones, An integrated grammar/bigram language model using path scores, in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), (Detroit, MI), pp. 173-176, May 1995.
- J.H. Wright, G.J.F. Jones and H. Lloyd-Thomas, Language model training and robust parsing for speech recognition, in Proc. Institute of Acoustics (Speech & Hearing), (Windermere, UK), pp. 63-71, November 1994.
- J.H. Wright, G.J.F. Jones and H. Lloyd-Thomas, Training and application of integrated grammar/bigram language models, in Proc. International Colloquium on Grammatical Inference (ICGI), (Alicante, Spain), pp. 246-259, September 1994.
- J.H. Wright, G.J.F. Jones and H. Lloyd-Thomas, A robust language model incorporating a substring parser and extended N-grams, in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), (Adelaide, Australia), pp. I-361-I-364, April 1994.
- J.H. Wright, G.J.F. Jones and H. Lloyd-Thomas, A consolidated language model for speech recognition, in Proc. European Conference on Speech Communication and Technology (EUROSPEECH), (Berlin, Germany), pp. 977-980, September 1993.
- G.J.F. Jones, H. Lloyd-Thomas and J.H. Wright, Adaptive statistical and grammar models of language for application to speech recognition, in Proc. IEE Colloquium on Grammatical Inference: Theory, Applications and Alternatives, (Colchester, UK), pp. 25/1-25/8, April 1993.
- G.J.F. Jones, J.H. Wright, H. Lloyd-Thomas and A. Wrigley, A hybrid grammar-bigram language model with decoding of multiple (N-best) hypotheses for speech recognition, in Proc. Institute of Acoustics (Speech & Hearing), (Windermere, UK), pp. 329-336, November 1992.