This chapter describes how the Audio Modules are implemented in this project. The description covers the Capturing and Playing Module for 32-bit Windows systems, the portable Audio Codec Module, the portable Audio Queue Module used to exchange audio frames between the Audio and RTP modules, and, finally, the implementation of the G.711-based codec (also portable).
Audio Modules are strongly system-dependent. They must be designed and prepared taking into consideration the capabilities of the target OS and the audio library in use. Anyone preparing versions for other operating systems and/or other sound libraries should redesign and rewrite this module completely. This version of the Audio Engine is designed for the Windows 95, 98, NT, and 2000 operating systems (NT/2000 are recommended) and follows the rules described in Section 2.1. The audio mechanism used by the project is Waveform Audio.
The Capturing and Playing Module is the part of an application (e.g. a multimedia terminal) responsible for correct capturing and playing of real-time sound. Plenty of audio mechanisms exist for various operating systems, but only a small subset of them is suitable for real-time applications. Real-time, full-duplex audio programming requires a specific approach to the matter: the programmer must often make the most of the available device and programming libraries. This chapter explains how to design such a real-time application module.
In a multimedia terminal, two types of audio streams can be distinguished: an input stream and an output stream. They should be treated independently.
In the case of the input stream, as shown in Figure 2-1, the sound is taken from the microphone, then sampled and prepared by the Input Audio Engine. Next, it is encoded and passed to the Real-time Transport Protocol (RTP) process, which puts the encoded sound (payload) into a special frame and sends it to the network. In the case of the output stream (Figure 2-2), the sound payload is taken from the RTP process, decoded, and passed to the Output Audio Engine, which is responsible for continuous playback of the audio signal.

The RTP module is beyond the scope of this work. The Coder and Decoder Modules are very simple from the user's point of view: they are usually procedures that take a block of data and return another ("encoded" or "decoded") block. The remaining work is done by the Capturing and Playing Module. Fortunately, low-level sampling is performed by the OS and the sound device. It is the Capturing and Playing Module that supports real-time capturing and playing of sound, that examines the sound devices and their capabilities, and, finally, that fills the gaps when sound frames do not arrive from the network.
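Since a coder or decoder is just such a procedure, its whole interface can be kept very small. The following is a minimal sketch in C with hypothetical names; the actual modules of this project may declare the interface differently:

/* Hypothetical codec interface: the coder consumes a block of raw
   samples and fills a block of encoded bytes; the decoder does the
   reverse. Both return the number of output units produced. */
typedef int (*audio_encode_fn)(const short *samples, int n_samples,
                               unsigned char *out, int out_max);
typedef int (*audio_decode_fn)(const unsigned char *in, int in_len,
                               short *samples, int max_samples);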
A very popular pattern for real-time sound processing is a pair of loops: a "capturing loop" and a "playing loop". Each loop is independent of the others; they may be organized as separate threads or even separate processes. To simplify the matter, the capturing, encoding, decoding, and playing procedures are referred to as threads from here on.
while (EXIT_CONDITION == FALSE) {
    TABLE_OF_SAMPLES = get N samples from 'INPUT SOUND DEVICE';
    pass TABLE_OF_SAMPLES to 'CODER PROCEDURE';
}
Figure 2-3. Capturing Thread
while (EXIT_CONDITION == FALSE) {
    SAMPLES = get TABLE_OF_SAMPLES from 'INPUT AUDIO ENGINE';
    ENCODED_FRAME = encode SAMPLES;
    pass ENCODED_FRAME to 'RTP PROCEDURE';
}
Figure 2-4. Encoding Thread
First, the next block should be started immediately after the previous one. This feature must be supplied by the operating system (OS), and the Windows 9x/NT systems provide it. Before sampling starts, the user passes a number of buffers to the OS. Then sampling is started: the system fills each buffer in turn, beginning the next one as soon as the current one is complete. The system also signals the user that the sampling of a given block has been completed. The user can then process the block and afterwards return the buffer to the system, so that it can be filled again and again.
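As an illustration, a minimal sketch of such buffer queuing with the Waveform Audio interface might look as follows. Event-based notification is assumed here (Waveform Audio offers several notification mechanisms), error handling is omitted, and the constants are examples only:

#include <windows.h>
#include <mmsystem.h>   /* link with winmm.lib */

#define NUM_BUFFERS 4
#define FRAME_BYTES 320 /* e.g. 20 ms of 8 kHz, 16-bit mono audio */

/* Open the default input device and hand NUM_BUFFERS empty buffers
   to the OS before sampling starts. */
HWAVEIN open_and_queue(HANDLE event, WAVEHDR hdr[], char bufs[][FRAME_BYTES])
{
    WAVEFORMATEX fmt = {0};
    HWAVEIN hin;
    int i;

    fmt.wFormatTag      = WAVE_FORMAT_PCM;
    fmt.nChannels       = 1;
    fmt.nSamplesPerSec  = 8000;
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    waveInOpen(&hin, WAVE_MAPPER, &fmt,
               (DWORD_PTR)event, 0, CALLBACK_EVENT);

    for (i = 0; i < NUM_BUFFERS; i++) {
        hdr[i].lpData         = bufs[i];
        hdr[i].dwBufferLength = FRAME_BYTES;
        hdr[i].dwFlags        = 0;
        waveInPrepareHeader(hin, &hdr[i], sizeof(WAVEHDR));
        waveInAddBuffer(hin, &hdr[i], sizeof(WAVEHDR)); /* hand buffer to the OS */
    }
    waveInStart(hin);  /* sampling begins; the OS fills the buffers back-to-back */
    return hin;
}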
Moreover, there is no need to separate the encoding thread from the capturing thread: on contemporary personal computers, sampling a block takes much more time than encoding it. Merging the two saves the designer from programming inter-thread communication between the capturing and encoding threads, and it also simplifies control over the threads. The new (safe) procedure is shown in Figure 2-5.
The improved algorithm (with encoding included) is shown in Figure 2-6.

for (i = 1; i <= NUMBER_OF_SYSTEM_BUFFERS; i++) {
    put BUFFER[i] to 'OPERATING SYSTEM';
}
start capturing to buffers in 'OPERATING SYSTEM';
while (EXIT_CONDITION == FALSE) {
    wait for the fill-up of the current block;
    SAMPLE_TABLE = get the recently filled buffer from 'OPERATING SYSTEM';
    ENCODED_TABLE = encode SAMPLE_TABLE;
    pass ENCODED_TABLE to 'RTP PROCEDURE';
    put SAMPLE_TABLE to 'OPERATING SYSTEM';
}
Figure 2-6. Improved Capturing Loop
In this implementation the capturing loop is organized as a separate thread. The thread is created by the start_recorder routine, which initializes the data structures, especially the headers (using waveInPrepareHeader), adds them to the system buffers, starts recorder_thread as a new thread, and returns a handle of type snd_thread_id_ptr. With this handle the user can control the recording thread, in particular kill it when it is no longer required (using the stop_recorder procedure).
The recorder_thread procedure starts sampling and enters the main capturing loop. At the beginning of the loop the thread blocks; it is the system that unblocks the recording thread when a block of samples has been recorded. The recorder then encodes the block of samples using the specified audio coder. The encoded frame is put into the queue owned by the RTP thread, the empty header is handed back to the system (waveInAddBuffer), and the thread checks whether the stop condition is true (in which case it resets, stops, and closes the audio device and frees all allocated data). If not, the thread jumps back to the beginning of the loop, i.e. it blocks itself again, and so forth.
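A sketch of this loop, assuming the event-based setup from the previous listing; encode_frame and rtp_queue_put are hypothetical stand-ins for the coder procedure and the queue owned by the RTP thread:

extern int  encode_frame(const char *in, int in_len,
                         unsigned char *out, int out_max);
extern void rtp_queue_put(const unsigned char *frame, int len);

volatile int stop_requested = 0;

void recorder_loop(HWAVEIN hin, HANDLE event,
                   WAVEHDR hdr[], int num_buffers)
{
    unsigned char encoded[FRAME_BYTES];
    int i, n;

    while (!stop_requested) {
        WaitForSingleObject(event, INFINITE);  /* blocked until the OS fills a buffer */
        for (i = 0; i < num_buffers; i++) {
            if (hdr[i].dwFlags & WHDR_DONE) {
                n = encode_frame(hdr[i].lpData, (int)hdr[i].dwBytesRecorded,
                                 encoded, sizeof(encoded));
                rtp_queue_put(encoded, n);     /* frame goes to the RTP thread */
                hdr[i].dwFlags &= ~WHDR_DONE;
                waveInAddBuffer(hin, &hdr[i], sizeof(WAVEHDR)); /* recycle buffer */
            }
        }
    }
    /* stop condition true: reset, stop, and close the device */
    waveInReset(hin);
    waveInStop(hin);
    for (i = 0; i < num_buffers; i++)
        waveInUnprepareHeader(hin, &hdr[i], sizeof(WAVEHDR));
    waveInClose(hin);
}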
Playing must be organized differently. First, it must maintain a buffer of frames ready to be played (an anti-jitter buffer). Second, it must occasionally be ready to fill in missing frames. The pseudo-code of such a procedure is given below:
while ('ANTI-JITTER BUFFER' is not fully filled) {
    ENCODED_TABLE = get from 'RTP PROCEDURE';
    SAMPLE_TABLE = decode ENCODED_TABLE;
    put SAMPLE_TABLE into 'ANTI-JITTER BUFFER';
}
start playing from 'ANTI-JITTER BUFFER';
while (EXIT_CONDITION == FALSE) {
    wait for the current buffer to finish playing;
    ENCODED_TABLE = get from 'RTP PROCEDURE';
    if (ENCODED_TABLE == EMPTY) then   /* no buffer from RTP */
        SAMPLE_TABLE = prepare the virtual frame;
    else
        SAMPLE_TABLE = decode ENCODED_TABLE;
    put SAMPLE_TABLE into 'ANTI-JITTER BUFFER';
}
Figure 2-7. Playing Thread
The algorithm described above shows only the main idea of the procedure and is therefore very simplified. First, samples should not be played directly from the anti-jitter buffer. Second, the preparation of missing frames is sometimes a very complicated process, and it is usually not performed immediately after a single delayed frame. Finally, complex modern algorithms are being developed to compensate for the lack of Quality of Service (QoS) in certain packet-based networks; these algorithms are not taken into consideration here.
The playing loop is organized as a separate thread, called the Playing Thread, started by the start_player routine. This function prepares all the data structures for the playing thread (e.g. headers), starts the new thread, and returns an snd_thread_id_ptr. At first the playing thread waits until a sufficient number of digital audio frames is present in the queue (this number is a parameter); the waiting uses the blocking supported by the queue between the RTP thread and the playing thread. When the waiting is finished, the playing thread takes a few frames from the queue, decodes them, sends them to the system (waveOutWrite), and starts playback (waveOutRestart). Then it blocks, waiting for the playback of an audio frame to finish. For every finished frame the thread is unblocked, gets the next frame from the queue, decodes it, and sends it to the system. If no frame is available (which means something must have gone wrong: a delayed or even lost RTP packet), the playing thread prepares a frame using the previous one.
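A sketch of the steady-state part of this loop, under the same assumptions as before; rtp_queue_get, decode_frame, and conceal_frame are hypothetical stand-ins, and the output device is assumed to have been opened, held with waveOutPause, and pre-loaded while the anti-jitter buffer was being filled:

extern int  rtp_queue_get(unsigned char *out, int max);  /* 0 = no frame arrived */
extern int  decode_frame(const unsigned char *in, int n, char *pcm, int max);
extern void conceal_frame(char *pcm, int len);           /* fill-up from previous frame */

void player_loop(HWAVEOUT hout, HANDLE event,
                 WAVEHDR hdr[], int num_buffers)
{
    unsigned char enc[FRAME_BYTES];
    int i, n;

    waveOutRestart(hout);  /* release the device paused during pre-fill */
    while (!stop_requested) {
        WaitForSingleObject(event, INFINITE);  /* a queued frame finished playing */
        for (i = 0; i < num_buffers; i++) {
            if (hdr[i].dwFlags & WHDR_DONE) {
                n = rtp_queue_get(enc, sizeof(enc));
                if (n > 0)
                    decode_frame(enc, n, hdr[i].lpData, FRAME_BYTES);
                else
                    conceal_frame(hdr[i].lpData, FRAME_BYTES); /* delayed/lost packet */
                hdr[i].dwFlags &= ~WHDR_DONE;
                waveOutWrite(hout, &hdr[i], sizeof(WAVEHDR));  /* re-queue for playback */
            }
        }
    }
}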
It is clear that the capturing and playing procedures should be organized separately, as separate processes or separate threads. Another reason to do so is that these procedures should have special privileges: they are true real-time procedures, so any delay or suspension can have serious effects. The privileges are very system-dependent and will not be described in detail in this chapter.
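For illustration only, one Win32 way of granting such a thread an elevated scheduling priority is sketched below; other systems provide entirely different mechanisms:

#include <windows.h>

/* Illustrative only: create a thread and raise its scheduling priority,
   as is common for real-time audio threads on Win32. */
HANDLE start_realtime_thread(LPTHREAD_START_ROUTINE proc, void *arg)
{
    HANDLE h = CreateThread(NULL, 0, proc, arg, 0, NULL);
    if (h != NULL)
        SetThreadPriority(h, THREAD_PRIORITY_TIME_CRITICAL);
    return h;
}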