Digital Signal Processing

From Raspberry Pi Min-Grant project
Revision as of 22:43, 19 February 2015 by Cmaina (talk | contribs)

Students at DeKUT are currently taught digital signal processing (DSP) in the final year of their five-year program. The course aims to introduce the student to a number of fundamental DSP concepts, including:

  1. Discrete time signals and systems
  2. Linear time invariant (LTI) systems
  3. Frequency-domain representation of discrete time systems
  4. z-transform
  5. Sampling of continuous time signals
  6. Filter design
  7. The discrete Fourier transform
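
As a small illustration of the first two topics, the sketch below passes a short discrete-time signal through an LTI system by convolving it with the system's impulse response. It is written in Python (readily available on the Raspberry Pi) and the signal values are illustrative, not drawn from the course material:

```python
import numpy as np

# A short discrete-time input signal x[n] (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0])

# Impulse response h[n] of a simple LTI system: a two-point moving average
h = np.array([0.5, 0.5])

# The output of an LTI system is the convolution y[n] = sum_k h[k] x[n-k]
y = np.convolve(x, h)

print(y)  # [0.5 1.5 2.5 3.5 2. ]
```

Because the system is LTI, the same convolution describes its response to any input once the impulse response is known.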

Currently, the laboratory exercises in this course are Matlab-based and focus on manipulating discrete signals, plotting the frequency responses of digital LTI systems, and designing digital filters. These exercises are designed to ensure the students understand the theory of DSP.
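
A similar exercise can be reproduced outside Matlab. The sketch below uses SciPy (assuming it is installed) to compute the frequency response of a moving-average FIR filter; the filter is an illustrative choice, not one taken from the course:

```python
import numpy as np
from scipy import signal

# Coefficients of a 4-point moving-average FIR filter (illustrative choice)
b = np.ones(4) / 4.0

# Frequency response H(e^jw) sampled at 512 points on [0, pi)
w, h = signal.freqz(b, worN=512)

# At omega = 0 (DC) the moving average has unit gain
print(abs(h[0]))  # 1.0
```

Plotting `abs(h)` against `w` would reproduce the magnitude-response plots the students currently generate in Matlab.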

We propose to design a Raspberry Pi-based DSP laboratory which will further enhance the understanding of these concepts by exposing the students to the processing of the human voice. A large number of DSP applications deal with speech processing and are now found in modern-day electronics; these include speaker identification and speech recognition. We aim to introduce the students to speech processing using a simple example, the estimation of fundamental frequency in a speech segment. It is hoped that this will motivate the students to explore more advanced applications such as speech recognition.

Background

Human speech is arguably one of the most important signals encountered in engineering applications. Numerous devices record and manipulate speech signals to achieve different ends. To properly manipulate the signal, it is important to have an understanding of the speech production process. The lungs, vocal tract and vocal cords all play an important role in speech production. The speech production model consists of an input signal from the lungs and a linear filter. In the simplest form of this model, the input is a white noise process, which is spectrally flat. This input is then spectrally shaped by a filter which models the properties of the vocal tract. Since the properties of the vocal tract are constantly changing as different sounds are produced, the filter is time varying. However, the filter is often modelled as quasi-stationary, with filter parameters held constant over a period of approximately 30 ms.
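
The source-filter model described above can be sketched as follows, assuming SciPy is available. The sampling rate and the all-pole coefficients standing in for the vocal tract are illustrative assumptions, not values measured from real speech:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)

fs = 8000                    # assumed sampling rate in Hz
n = int(0.03 * fs)           # one quasi-stationary frame of ~30 ms (240 samples)

# Spectrally flat excitation from the lungs: white Gaussian noise
excitation = rng.standard_normal(n)

# A simple stable all-pole filter standing in for the vocal tract
a = [1.0, -1.3, 0.7]
frame = lfilter([1.0], a, excitation)

print(frame.shape)  # (240,)
```

In a full synthesiser a new set of filter coefficients would be used for each 30 ms frame, reflecting the time-varying vocal tract.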

When the vocal cords vibrate, as is the case when pronouncing the vowel /a/ in cat, the sound is said to be voiced, and in this case the signal is seen to exhibit some periodicity. When the vocal cords do not vibrate, the sound is unvoiced.


Estimation of Fundamental Frequency

When speech is voiced, the signal exhibits periodicity, and in many speech applications it is important to estimate its pitch. To do this, we estimate the fundamental frequency of the signal, also referred to as F0. A popular method for estimating F0 is based on the autocorrelation function (ACF).
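
As a sketch of the ACF-based approach, the Python fragment below estimates F0 for a synthetic voiced-like frame: the ACF of a periodic signal peaks at the lag equal to the period, so the peak location gives F0. The sampling rate, frame length, and pitch search range (50-400 Hz) are illustrative assumptions:

```python
import numpy as np

fs = 8000          # assumed sampling frequency in Hz
f0 = 200.0         # true fundamental frequency of the test signal
t = np.arange(int(0.03 * fs)) / fs     # one ~30 ms frame
x = np.cos(2 * np.pi * f0 * t)

# Autocorrelation for non-negative lags
acf = np.correlate(x, x, mode="full")[len(x) - 1:]

# Search for the ACF peak beyond lag 0, restricting lags to a
# plausible pitch range (50-400 Hz) to avoid spurious maxima
lo, hi = int(fs / 400), int(fs / 50)
lag = lo + np.argmax(acf[lo:hi])
f0_est = fs / lag
print(f0_est)  # 200.0
```

Real speech frames are noisier than this test signal, so in practice the frame is often windowed and the ACF peak compared against a voicing threshold before an F0 value is reported.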

Consider a periodic signal <math>\cos(2\pi f_0 t)</math> which oscillates at a frequency <math>f_0</math>. To work with this signal on a computer, we sample it at a frequency <math>f_s=\frac{1}{T_s}</math> to form the discrete time signal <math>x[n]=\cos(2\pi f_0 nT_s)</math>. We can compute the ACF of the signal <math>x[n]</math> using