Abstract

Voice production is generally modelled as a two-component dynamical process composed of the vocal folds and the vocal tract. Figure 1 shows a diagram of the arrangement of the vocal folds and vocal tract inside the head and neck. The vocal tract comprises the pharyngeal, oral and nasal cavities. It is usually modelled as a linear acoustic resonator, and the vocal folds as a nonlinear dynamical system comprising masses, viscoelastic damping and forcing due to lung pressure. However, systems used for speech transmission, analysis or compression generally do not use an explicit dynamical model of the vocal folds. They take several different approaches (Kleijn & Paliwal 1995), including: (a) waveform coders with no model of speech production, (b) source coders, which use a vocal tract model and a simple characterisation of the vocal fold behaviour, i.e. whether it is periodic or noise-like, and (c) hybrid methods, which use a vocal tract model and select the representation of the vocal fold behaviour that minimises the overall waveform error. In this paper we introduce a method for modelling the dynamical behaviour of the vocal folds in speech processing. The method is based on a discrete dynamical model that is suitable for direct fitting to the vocal fold signal, so that parameters representing the biomechanical behaviour of the vocal folds can be identified. These parameters, together with the initial conditions and the model residual, are an exact but smaller representation, in the information-theoretic sense, of the vocal fold dynamics. This representation could then, for example, form the basis for a low bit-rate source coder.

Keywords: Variational integration, speech, nonlinear models, vocal fold dynamics
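The claim in the abstract that parameters, initial conditions and residual together form an exact representation can be illustrated with a minimal sketch. The sketch below is not the paper's vocal fold model: it assumes a plain second-order linear recurrence as a stand-in, a synthetic noisy sinusoid in place of a measured vocal fold signal, and hypothetical function names (`fit_second_order`, `reconstruct`). It only shows that storing the fitted coefficients, the first two samples and the residual allows the signal to be rebuilt, and that the residual carries much less energy than the signal itself.

```python
import math
import random


def fit_second_order(x):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2] (stand-in model)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(x)):
        p1, p2 = x[n - 1], x[n - 2]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * x[n]
        r2 += p2 * x[n]
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    # Residual: the part of each sample the model does not predict.
    residual = [x[n] - (a1 * x[n - 1] + a2 * x[n - 2]) for n in range(2, len(x))]
    return (a1, a2), list(x[:2]), residual


def reconstruct(coeffs, init, residual):
    """Rebuild the signal from coefficients, initial conditions and residual."""
    a1, a2 = coeffs
    x = list(init)
    for e in residual:
        x.append(a1 * x[-1] + a2 * x[-2] + e)
    return x


# Synthetic test signal (assumption): a sinusoid with a little noise.
random.seed(0)
signal = [math.sin(2 * math.pi * n / 50) + 0.05 * random.gauss(0.0, 1.0)
          for n in range(400)]

coeffs, init, residual = fit_second_order(signal)
rebuilt = reconstruct(coeffs, init, residual)

max_error = max(abs(a - b) for a, b in zip(rebuilt, signal))
signal_energy = sum(v * v for v in signal)
residual_energy = sum(e * e for e in residual)
```

Here `max_error` stays at the level of floating-point rounding, while `residual_energy` is a small fraction of `signal_energy`, which is what would make the residual cheaper to encode than the raw waveform. A nonlinear model fitted to real data would replace the linear recurrence, but the bookkeeping would follow the same pattern.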