Unlike other H.323 entities, an H.323 terminal is a truly multimedia entity. It can support audio, video and data (chat-like and/or ftp-like) capabilities. Audio capabilities are the essential part of such a terminal. First, they allow a multimedia computer to become a telephone-like device, a device that people have been used to for decades. Second, such a device is extremely cheap to use: it is usually a "Voice over IP" device, so it needs only Internet access to join the global network.
According to [7], audio capabilities must be present in every H.323-compliant terminal, while video and data capabilities are optional. Thus this terminal, like any other H.323-compliant terminal, supports audio.
Figure 1-1 (taken from ITU-T Rec. H.323) shows the H.323 terminal and where each functional block is placed. The input audio stream starts at a microphone, is handled by the operating system, and is then processed by the audio capturing module. These parts are outside the terminal and are represented by the Audio I/O Equipment in the figure. The audio stream must then be encoded by a specific audio coder, after which the encoded signal can be passed to the Real-time Transport Protocol (RTP) block (H.225.0 Layer). This block prepares the frames and sends them to the network. The output audio stream is the functional inverse of the input stream.
Although the Audio I/O Equipment is a very system-dependent module, the data streams it generates must match the input of the audio coders. Coders require the Audio I/O Equipment to supply them with an audio signal continuously, packed in frames. This signal is usually 8- or 16-bit PCM sampled at 8 or 16 kHz, and it must be generated by the sound device. Typically both participants send and receive audio data, so the sound device must be full-duplex. If it is not, the multimedia terminal can work only as a transmitter or only as a receiver, never as both at once. For some codec sets the playback sampling rate may differ from the recording sampling rate, but not all sound devices support this. Consequently, before any audio capabilities are used (and especially before they are sent to the remote terminal over the control channel), the local sound system must be thoroughly examined.
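The examination of the local sound system described above can be sketched as a simple capability check. This is only an illustration: the SoundDevice record, its fields and the can_offer function are hypothetical names introduced here, and a real terminal would fill in the record by querying the operating system's audio API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundDevice:
    """Hypothetical description of what the local sound device can do."""
    full_duplex: bool            # can record and play at the same time
    record_rates_hz: frozenset   # sampling rates supported for recording
    play_rates_hz: frozenset     # sampling rates supported for playback
    independent_rates: bool      # record and play rates may differ


def can_offer(device, record_rate_hz, play_rate_hz):
    """Return True if a two-way audio capability with these rates is usable."""
    if not device.full_duplex:
        # A half-duplex device can only transmit or only receive.
        return False
    if record_rate_hz not in device.record_rates_hz:
        return False
    if play_rate_hz not in device.play_rates_hz:
        return False
    if record_rate_hz != play_rate_hz and not device.independent_rates:
        # Some codec sets need different rates in each direction.
        return False
    return True
```

Only capabilities for which such a check succeeds would then be announced to the remote terminal.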
For an H.323 terminal, [7] defines the following codecs:
G.711-based set of codecs. This set of codecs is mandatory in every H.323 terminal.
G.722
G.728
G.729
MPEG-1 audio
G.723.1
H.323 terminals must be able to encode and decode audio signals using the G.711 codec family, both A-law and µ-law. Other audio codecs are optional, although recommended for some types of networks. The minimal H.323-compliant terminal is therefore a G.711 audio terminal.
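To make the mandatory codec more concrete, the following is a minimal sketch of µ-law companding between 16-bit linear PCM and 8-bit G.711 code words. It uses the widely known segment-based software formulation (bias 0x84, clip 32635) rather than anything taken from [7]; the function names are chosen here for illustration, and A-law, which follows an analogous segment scheme with different constants, is not shown.

```python
BIAS = 0x84    # offset added before the segment search
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample):
    """Encode one signed 16-bit PCM sample as an 8-bit mu-law code word."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, 7..0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the code word with all bits inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code):
    """Decode one 8-bit mu-law code word to a signed linear PCM sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

Because each sample becomes exactly one octet, an 8 kHz input stream yields the 64 kbit/s rate associated with G.711.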
H.323 requires all audio codecs to be treated as frame-oriented, even those that are sample-oriented. For a sample-oriented codec, a frame is produced by collecting a fixed number of samples; the frame must be 8 octets long. Several frames can be sent in one packet, but a single frame cannot be split across several packets (for details refer to [5], p. 12).
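The framing rule can be illustrated with a short sketch. It assumes the encoder output is available as a byte string with one octet per sample (as for G.711); the function names and the frames_per_packet parameter are hypothetical and serve only to show that packets always carry whole frames.

```python
FRAME_OCTETS = 8  # frame length for sample-oriented codecs, as stated above


def split_into_frames(encoded, frame_octets=FRAME_OCTETS):
    """Cut encoded octets into complete frames; return (frames, leftover).

    The leftover octets do not yet form a full frame and must wait for
    more samples before they can be sent.
    """
    usable = len(encoded) - len(encoded) % frame_octets
    frames = [encoded[i:i + frame_octets] for i in range(0, usable, frame_octets)]
    return frames, encoded[usable:]


def packetize(frames, frames_per_packet):
    """Group whole frames into packet payloads; a frame is never split."""
    if frames_per_packet < 1:
        raise ValueError("frames_per_packet must be at least 1")
    return [b"".join(frames[i:i + frames_per_packet])
            for i in range(0, len(frames), frames_per_packet)]
```

For example, 20 encoded octets give two complete frames and four leftover octets, and five frames packed two per packet give payloads of 16, 16 and 8 octets.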
A receiver must be able to accept 200 ms of sound in a single packet (this condition is introduced so that an appropriate receive buffer length can be set).
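As a worked example of this condition, the sketch below computes the buffer needed for one packet. The default values assume G.711 at 8 kHz with one octet per sample and the 8-octet frame mentioned earlier; the function names are illustrative only, and other codecs would need their own figures.

```python
def receive_buffer_octets(duration_ms=200, sample_rate_hz=8000, octets_per_sample=1):
    """Octets needed to hold duration_ms of encoded audio in one packet."""
    return duration_ms * sample_rate_hz * octets_per_sample // 1000


def frames_in_buffer(buffer_octets, frame_octets=8):
    """Number of whole frames that fit into the buffer."""
    return buffer_octets // frame_octets
```

Under these assumptions, 200 ms corresponds to 1600 octets, that is, 200 frames of 8 octets each.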
If no sound is received for some time, the receiver should repeat the last received audio frame at a lower volume.
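A minimal sketch of this behaviour is given below. It works on decoded linear PCM samples (an encoded frame would have to be decoded first), and the gain of 0.5 per repetition is an assumption made here, since the text only asks for "a lower volume". The Concealer class and its method names are hypothetical.

```python
def attenuate(frame, gain):
    """Return a copy of a linear PCM frame scaled by gain."""
    return [int(sample * gain) for sample in frame]


class Concealer:
    """Repeats the last received frame, quieter each time, when audio is missing."""

    def __init__(self, gain=0.5):
        self.gain = gain    # assumed attenuation per repeated frame
        self.last = None    # last frame handed to playback

    def next_frame(self, received):
        """Return the frame to play; pass None when nothing was received."""
        if received is not None:
            self.last = list(received)
            return self.last
        if self.last is None:
            return []       # nothing received yet, so nothing to repeat
        self.last = attenuate(self.last, self.gain)
        return self.last
```

With the assumed gain, a frame [1000, -1000] followed by two gaps is played as [1000, -1000], [500, -500] and [250, -250], so a long silence fades out instead of looping at full volume.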