Implementing an H.323 Terminal: Real-Time Protocol-based Audio Engine

Michał Konrad Rój

mroj@elka.pw.edu.pl

Abstract

This work describes the so-called Real-Time Protocol-based Audio Engine: a set of procedures and threads responsible for processing sound in real time and sending it over the network. An H.323 terminal capable of sending and receiving sound must meet several requirements. First, it must be able to process the audio signal in real time. Second, it must use a standardized set of audio codecs so that the remote terminal (the H.323-compliant terminal it communicates with) can decode the stream correctly. Finally, it must send and receive the signal in the Real-time Transport Protocol (RTP) format. From the user's point of view, the engine should offer a simple, though parameterized, API that lets the user create sound streams flexibly, depending on particular needs.
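The RTP format mentioned above begins with a fixed 12-byte header defined in RFC 3550. The following C sketch packs that header in network byte order. The function name `rtp_write_header` is hypothetical and not taken from the engine described in this work, and the sketch assumes no CSRC list and no header extension. Assuming G.711 at 8000 Hz with 20 ms frames, each packet carries 160 samples, so the timestamp would advance by 160 per packet while the sequence number advances by 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the fixed RTP header (RFC 3550) with no CSRC entries or extension. */
#define RTP_HEADER_SIZE 12

/* Hypothetical helper: writes the fixed RTP header into buf (at least
   RTP_HEADER_SIZE bytes) in network byte order and returns its length. */
size_t rtp_write_header(uint8_t *buf, uint8_t payload_type, int marker,
                        uint16_t seq, uint32_t timestamp, uint32_t ssrc)
{
    buf[0] = 0x80;  /* version 2; padding, extension and CSRC count all zero */
    buf[1] = (uint8_t)((marker ? 0x80 : 0x00) | (payload_type & 0x7F));
    buf[2] = (uint8_t)(seq >> 8);           /* sequence number, big-endian */
    buf[3] = (uint8_t)(seq & 0xFF);
    buf[4] = (uint8_t)(timestamp >> 24);    /* sampling-clock timestamp */
    buf[5] = (uint8_t)(timestamp >> 16);
    buf[6] = (uint8_t)(timestamp >> 8);
    buf[7] = (uint8_t)(timestamp);
    buf[8] = (uint8_t)(ssrc >> 24);         /* synchronization source id */
    buf[9] = (uint8_t)(ssrc >> 16);
    buf[10] = (uint8_t)(ssrc >> 8);
    buf[11] = (uint8_t)(ssrc);
    return RTP_HEADER_SIZE;
}
```

For a G.711 μ-law stream, the payload type would be 0 (PCMU in the RTP/AVP profile, RFC 3551), and the encoded audio bytes follow directly after the header in the same UDP datagram.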

This work covers two fundamental parts of the Real-Time Protocol-based Audio Engine: the Capturing and Playing Module, responsible for real-time sound processing, and the Audio Codec Module, responsible for encoding and decoding audio streams as well as managing the available audio codecs. The modules are designed to be extensible (e.g., new audio codecs can be added) and include the interfaces needed to exchange information with the other modules of the terminal. The Audio Codec Module includes a G.711 codec, which makes this part of the terminal fully usable.
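To illustrate the codec side, the sketch below implements the classic segment-based G.711 μ-law conversion and wraps it in a small codec table of the kind an extensible Audio Codec Module might keep. The text does not say which G.711 variant the module uses; μ-law is chosen here for illustration, and the A-law variant is analogous but not shown. The `audio_codec` structure, `find_codec`, `pcmu_encode`, and the other names are hypothetical, since the module's actual interfaces are not described here. Codecs are looked up by RTP payload type, so in this design adding a codec means appending one table entry.

```c
#include <stddef.h>
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132: bias added before the segment search */
#define ULAW_CLIP 32635  /* largest magnitude that still fits in 16 bits after biasing */

/* Encode one 16-bit linear PCM sample as an 8-bit G.711 mu-law code. */
uint8_t linear_to_ulaw(int16_t pcm)
{
    int sample = pcm;
    uint8_t sign = 0;
    if (sample < 0) {
        sign = 0x80;
        sample = -sample;
    }
    if (sample > ULAW_CLIP)
        sample = ULAW_CLIP;
    sample += ULAW_BIAS;

    /* Segment (exponent) = position of the highest set bit, minus 7. */
    int exponent = 7;
    for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    int mantissa = (sample >> (exponent + 3)) & 0x0F;
    /* mu-law codes are transmitted with all bits inverted. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Decode an 8-bit mu-law code to 16-bit linear PCM (midpoint of its interval). */
int16_t ulaw_to_linear(uint8_t code)
{
    code = (uint8_t)~code;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int sample = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (int16_t)((code & 0x80) ? -sample : sample);
}

/* Hypothetical frame-based codec interface; functions return items produced. */
typedef struct {
    const char *name;
    uint8_t payload_type;   /* RTP payload type (RFC 3551) */
    unsigned clock_rate;    /* RTP clock rate in Hz */
    size_t (*encode)(const int16_t *pcm, size_t n, uint8_t *out);
    size_t (*decode)(const uint8_t *in, size_t n, int16_t *pcm);
} audio_codec;

static size_t pcmu_encode(const int16_t *pcm, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = linear_to_ulaw(pcm[i]);
    return n;  /* G.711 yields one byte per sample */
}

static size_t pcmu_decode(const uint8_t *in, size_t n, int16_t *pcm)
{
    for (size_t i = 0; i < n; i++)
        pcm[i] = ulaw_to_linear(in[i]);
    return n;
}

/* Registered codecs; a new codec is added by appending an entry. */
static const audio_codec codec_table[] = {
    { "PCMU (G.711 mu-law)", 0, 8000, pcmu_encode, pcmu_decode },
};

/* Look up a registered codec by RTP payload type; NULL if unknown. */
const audio_codec *find_codec(uint8_t payload_type)
{
    for (size_t i = 0; i < sizeof codec_table / sizeof codec_table[0]; i++)
        if (codec_table[i].payload_type == payload_type)
            return &codec_table[i];
    return NULL;
}
```

Assuming 20 ms frames at 8000 Hz, `encode` turns 160 PCM samples into a 160-byte RTP payload. A frame-based interface is used rather than a per-sample one because many other codecs (for example, G.723.1) operate on whole frames.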

Note that audio equipment is beyond the scope of ITU-T Rec. H.323: the Recommendation specifies neither the algorithms by which the equipment should operate nor the mechanisms to be used. Consequently, most of the design presented in this work is the author's own.