Implementing H.323 Terminal: Real-Time Protocol-based Audio Engine

Michał Konrad Rój

mroj@elka.pw.edu.pl

Abstract

This work describes the Real-Time Protocol-based Audio Engine, a set of procedures and threads responsible for processing sound in real time and sending it over the network. An H.323 terminal capable of sending and receiving sound must meet several requirements. First, it must be able to process the sound signal in real time. Second, it must use a standardized set of audio codecs so that the remote terminal (the H.323-compliant terminal it communicates with) can decode the signal correctly. Finally, it must send and receive the signal in the proper Real-time Transport Protocol (RTP) format. From the user's point of view, it should offer a simple yet parameterized API, so that sound streams can be created flexibly, depending on particular needs.

This work covers two fundamental parts of the Real-Time Protocol-based Audio Engine: the Capturing and Playing Module, responsible for real-time sound processing, and the Audio Codec Module, responsible for encoding and decoding audio streams and for managing the available audio codecs. The modules are flexible (e.g. ready to be extended with new audio codecs) and include the interfaces needed to exchange information with the other modules. The Audio Codec Module includes a G.711 codec, which makes this part of the terminal fully usable.

Note that sound equipment is beyond the scope of ITU-T Rec. H.323. The Recommendation specifies neither the algorithms by which the equipment should work nor the mechanisms to be used. Consequently, large parts of this work are the author's own design.


Table of Contents
1. Introduction
1.1. The goal of this work
1.2. PC Audio Tutorial
1.2.1. Sound devices
1.2.2. PCM: Sampling and Quantization
1.2.3. Audio Codecs
1.2.4. Audio in packet-based networks
1.3. Audio in H.323 Terminal
1.4. Windows Audio Mechanisms
1.5. Microsoft Windows Waveform Audio
1.5.1. WaveForm Audio Overview
1.5.2. Capturing
1.5.3. Playing
1.5.4. Shutting Down
2. Implementation
2.1. Audio Engine Architecture
2.1.1. Audio Streams
2.1.2. Capturing Loop
2.1.3. Playing Loop
2.2. Overview of this software
2.3. Communication with RTP: Audio Queue Module
2.4. Audio Codec Controlling: Codec Management Module
2.4.1. Codec Management Overview
2.4.2. Codec Management API
2.4.3. How to add a new codec
2.5. Capturing and Playing Module
2.6. Implementation of G.711 codec
Bibliography
A. Virtual System API
B. WaveForm API
List of Figures
1-1. Inside the H.323 Terminal
1-2. The C-code of the capturing loop
1-3. The C-code of the playing loop
2-1. Input Audio Stream
2-2. Output Audio Stream
2-3. Capturing Thread
2-4. Encoding Thread
2-5. Safe Capturing Loop
2-6. Improved Capturing Loop
2-7. Playing Thread
B-1. A fragment of the capturing procedure
List of Examples
2-1. Codec Management Example
Audio Codec Module