Signal data policy

The simple view

Telephony audio is 16 bit signed, and sampled 8000 times per second. Everyone who knows anything about telecoms knows that, and anyone who doesn't needs educating.
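
As a rough illustration of what those numbers mean in practice, the sketch below works out how many samples, and how many bytes, a block of audio occupies. The names SAMPLE_RATE, samples_in_ms and bytes_in_ms are made up for this example, and are not taken from any particular library.

```c
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_RATE 8000    /* Samples per second (illustrative name) */

/* Hypothetical helper: number of samples in a block lasting ms milliseconds */
static inline size_t samples_in_ms(unsigned int ms)
{
    return (size_t) SAMPLE_RATE*ms/1000;
}

/* Hypothetical helper: bytes needed to hold that block as 16 bit signed samples */
static inline size_t bytes_in_ms(unsigned int ms)
{
    return samples_in_ms(ms)*sizeof(int16_t);
}
```

A 20ms block, for example, works out at 160 samples, or 320 bytes.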

The fuller view

OK. The above is a gross simplification. Whilst it holds for the analogue telephone network, the ISDN network has offered the possibility of wider bandwidth since its early days (see the G.722 codec). Cellular telephone systems, especially 3G ones, are also offering wider bandwidth - usually 16000 samples per second. This has little benefit for the basic intelligibility of speech. However, you can't tell an isolated 's' from an isolated 'f' at 8000 samples per second. At 16000 samples per second these unvoiced sounds are generally easy to distinguish. Perhaps more importantly, from a practical point of view, the sound is more pleasant and less tiring to listen to.

What is very much true about the simple view is that where the terminal equipment is always traditional PSTN equipment - FAX machines, modems, analogue telephones, etc. - there is no need to cater for any other form of sampling. A G3 FAX modem or analogue line caller ID unit, for example, will never require any other sampling rate.

Coding issues

Signal processing software tends to vary a lot in how things are handled.

Integers of specific sizes are referred to by their ISO C99 names from <stdint.h> - int16_t, int32_t, etc. Terms like "short int" are less portable than many people think, as their sizes vary between platforms. Now that C99 has standardised names for specific lengths of integer, there is no excuse for creating new ones, as many packages do. Standard names are considered a good thing.
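
The following is a minimal sketch of how the C99 names read in practice. The function block_energy is invented for this illustration. Samples are int16_t, the square of a single sample is held in an int32_t, and the running total uses int64_t so that long blocks cannot overflow it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of the squared samples in a block.
   The square of any int16_t value is at most 2^30, so it fits in an int32_t.
   The total is accumulated in an int64_t, as a long block could exceed 32 bits. */
int64_t block_energy(const int16_t amp[], size_t len)
{
    int64_t energy;
    size_t i;

    energy = 0;
    for (i = 0;  i < len;  i++)
    {
        int32_t s = amp[i];

        energy += (int64_t) (s*s);
    }
    return energy;
}
```

Because every width is stated explicitly, the arithmetic behaves the same whatever sizes the platform gives to "short int" and "int".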

Making some simple functions "static inline" eliminates call overheads very well with most modern compilers. They are generally as efficient as using #define'd code. For more complex functions the overhead of having multiple copies of the function is a real disadvantage. The practice is, therefore, to allow most complex functions to be called with an arbitrary length block of samples, so the cost of each call is spread across the whole block. The functions are designed to behave properly whatever number of samples is supplied by the caller.
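
The two practices can be sketched together as follows. This is only an illustration under assumed names: saturate16 and mix_block are hypothetical, and stand in for whatever small helper and block function a real package might provide. The small helper is "static inline", while the block function is an ordinary function which accepts any number of samples, including zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small helper: clip a 32 bit intermediate result to the
   16 bit signed sample range. Simple enough to be worth inlining. */
static inline int16_t saturate16(int32_t x)
{
    if (x > INT16_MAX)
        return INT16_MAX;
    if (x < INT16_MIN)
        return INT16_MIN;
    return (int16_t) x;
}

/* Hypothetical block function: add signal b[] into signal a[], in place.
   len may be any number of samples. A len of zero does nothing. */
void mix_block(int16_t a[], const int16_t b[], size_t len)
{
    size_t i;

    for (i = 0;  i < len;  i++)
        a[i] = saturate16((int32_t) a[i] + b[i]);
}
```

A caller can pass one sample, a 160 sample frame, or a whole second of audio to mix_block. Only the number of calls changes, not the result.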

Footnote

Before audio is compressed to 8 bit pseudo-logarithmic A-law or u-law data, it starts out as 12 to 13 bit signed linear data. Generally, telephony audio has no more than 13 bits of real resolution. However, integer types of that size don't generally exist in modern computers, so we use 16 bit signed values for all telephony audio.