Dumb versus smart telephony cards

Steve Underwood

documentation in progress

This document can be freely redistributed according to the terms of the GNU General Public License.


Table of Contents

The merits of dumb and smart telephony cards in the age of fast PCs and VoIP.
It's smart to be dumb
Some people just can't stop talking
So, how well does being dumb really work out?
Defining the goals - G.168
Comparing against the goals

The merits of dumb and smart telephony cards in the age of fast PCs and VoIP.

Before the Zapata 1 T1/E1 card was developed in 1999, dumb telephony cards for PCs were almost unknown. The market was dominated by cards from companies like Dialogic, NMS, and Pika, all of which had on-board processing. This generally consisted of a low-performance DSP, capable of simple activities for voice mail and IVRs, and a general-purpose processor to handle the call signalling. These cards made sense in the early 90s, when the main processor in a PC was rather slow.

In the early days of VoIP, Dialogic and others developed cards with greater processing power and Ethernet ports. These are basically VoIP gateways on a card, leaving the PC's main CPU(s) to do little more than call management. Stand-alone VoIP gateways have generally made more sense than these cards, and they seem to have been a dead end.

It's smart to be dumb

By the time the Zapata 1 card was developed (1999), Pentium III and Athlon processors were heading towards 1GHz, and provided sufficient performance to do rather more than a traditional Dialogic card ever could with its on-board processing. This card was designed for extreme simplicity, and placed a heavy burden on the system CPU(s), which needed to cope with 8000 interrupts each second. The card also used the ISA bus, just as it was disappearing from the market. The Zapata 2 card, with 4 E1 or T1 ports and a PCI bus, was a much more usable design. However, it still demanded 1000 interrupts per second, and it did not use bus mastering. The lack of bus mastering was a design weakness, but the high interrupt rate is a direct consequence of it being a dumb card. To see why, we need to look at the nature of phone calls, and a little DSP....

Some people just can't stop talking

The data rates for voice may not seem that high, but voice is a true real-time activity. That data absolutely, positively will not stop. In a video phone, people will forgive a slightly stuttery picture. A stuttery voice is not tolerable. People will forgive a little variable lag in the video. A few hundred milliseconds of lag in the voice makes conversation very difficult. Voice telephony by computer can be a surprisingly hard problem, because of these factors.

So, how well does being dumb really work out?

Many people make grandiose claims about the processing that should and should not be on a telephony card, in the age of fast CPUs for PCs. Rational analysis to back up those claims is less common.

There is one major candidate for on-card processing which stands out, as it affects such key elements of the system design - echo cancellation. VoIP requires echo cancellation, and echo cancellation requires a lot of computation. Several other things in VoIP require a lot of computation, such as the G.729 and Speex codecs. However, echo cancellation is special. It is the only major function that operates as an adaptive control loop, which means the delay between the signal on the cable and the computation is critical to good performance. Bottom line: if you echo cancel on the main CPU(s), you can't afford to buffer much.
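To make the adaptive control loop concrete, here is a minimal sketch of the core of a line echo canceller: an NLMS adaptive FIR filter. The names, tap count and step size are illustrative choices, not taken from any particular implementation:

```c
#include <stddef.h>

#define TAPS 128  /* covers a 16ms echo tail at 8000 samples/second */

typedef struct {
    float w[TAPS];   /* adaptive filter coefficients (echo path estimate) */
    float x[TAPS];   /* history of samples sent towards the line */
    float mu;        /* adaption step size, e.g. 0.5 */
} lms_canceller_t;

/* Process one sample pair: tx is what we sent to the line, rx is what came
   back (near-end speech plus echo). Returns rx with the estimated echo
   removed. */
static float lms_cancel(lms_canceller_t *ec, float tx, float rx)
{
    /* Shift the new transmitted sample into the history. */
    for (size_t i = TAPS - 1; i > 0; i--)
        ec->x[i] = ec->x[i - 1];
    ec->x[0] = tx;

    /* Estimate the echo: a dot product of history and coefficients. */
    float est = 0.0f;
    float power = 1e-6f;  /* small floor avoids division by zero */
    for (size_t i = 0; i < TAPS; i++) {
        est += ec->w[i] * ec->x[i];
        power += ec->x[i] * ec->x[i];
    }

    float err = rx - est;  /* residual after cancellation */

    /* NLMS update: nudge each coefficient towards reducing the error,
       normalised by the signal power so the step size stays stable. */
    float step = ec->mu * err / power;
    for (size_t i = 0; i < TAPS; i++)
        ec->w[i] += step * ec->x[i];

    return err;
}
```

The control loop nature is visible in the last step: each output sample feeds straight back into the coefficient update, which is why extra buffering between the line and this computation directly degrades the canceller's behaviour.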

In the Zapata 2 design, and the other cards that followed from Digium, the compromise chosen was to process the 8000 samples per second voice data stream in blocks of 8 samples, leading to 1000 interrupts per second. In each interrupt, 8 audio samples must be echo cancelled for every active channel. This means the interrupts are both frequent and heavy in computational load. This is very demanding, especially in environments like Linux or BSD, which were not designed for true real-time activities (there are true real-time add-ons, but telephony cards are not generally used in machines with those installed). If another complex interrupt, like a disk or network card interrupt, gets in the way, a telephony interrupt might be lost. Disk interrupts are never lost. The disk drive waits for however long it takes the CPU to service its request. Network card interrupts can be lost, if the buffers on the network card overflow. However, most network activities will retry if this happens. Voice is like the proverbial fire hose: the data keeps coming whether you are ready or not.

So, we have cards with a somewhat difficult to meet interrupt flow, and it is difficult to meet mostly because of host-based software echo cancellation. Voice is packetised for VoIP applications in packets bearing at least 10ms of audio; 20ms or 30ms packets are more common. If the telephony card includes an echo canceller, it can DMA 10ms or 20ms chunks, instead of 1ms chunks, and the interrupt timing becomes far more relaxed. This is the main benefit of putting echo cancellation on a telephony card. It certainly reduces the compute load on the main CPU, but the relaxation of timing constraints is the real killer benefit.

Defining the goals - G.168

When looking at how well echo cancellation performs, it is obviously important to understand the goals. "It sounds nice" might be the ultimate goal, but it's hard to build metrics around that. The industry standard for assessing echo cancellers, for cancelling telephone line echoes, is G.168. This specification is basically a series of tests which a good canceller should pass, reflecting practical real-world issues. They start with the basic requirement that a canceller must converge quickly and reliably. They cover the need to suppress small residual echoes due to the lossy nature of the u-law and A-law codecs used on the PSTN (echo cancellation cannot reduce an echo by much more than 30dB over a u-law or A-law link). They cover the need to remain stable when DTMF, supervisory and other narrow-band signals appear on the line. G.168 is updated every few years, and usually more tests are added in response to real-world issues that have been found.

Comparing against the goals

People claim various weird and wonderful things about the merits of host-based and card-based echo cancellation. Let's look at them.

Claim: Software echo cancellers are incapable of working as well as hardware echo cancellers

As stated, this claim is actually meaningless. Practically every echo canceller is somewhere between mostly and entirely software. Where dedicated hardware is applied, it is usually in the form of a fast, compact dot-product engine to speed up the filters. That engine is controlled by software, and does precisely what the software could do itself. Most echo cancellation chips are just DSPs sold pre-programmed for one job. The issue here is not hardware versus software, but host versus on-card processing. Let's look at things in that light.
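The dot-product engine in question accelerates an inner loop that is trivial to express in software; a sketch (illustrative, not any vendor's actual interface):

```c
#include <stddef.h>

/* The FIR filter core that dedicated echo canceller hardware typically
   accelerates: one multiply-accumulate per filter tap. */
static float fir_dot(const float *coeffs, const float *history, size_t taps)
{
    float acc = 0.0f;
    for (size_t i = 0; i < taps; i++)
        acc += coeffs[i] * history[i];
    return acc;
}
```

A hardware MAC unit simply performs this loop faster and more predictably; it changes nothing about what the canceller can compute.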

When a software echo canceller is buried in a device driver interrupt routine, with only 8 audio samples separating it from the data on the wire, it is possible for it to achieve the same results as a dedicated canceller on the card. The only system difference is the issue of tight versus relaxed timing described above. If the timing constraints are met, there is no difference in the potential performance. If the timing constraints are not met, horrible things happen. If the software can respond quickly and consistently, and the canceller does not behave as well as an on-card canceller, the difference is purely in the quality of the algorithms.

Claim: Zaptel canceller X is much better than canceller Y

The current zaptel software for the Digium cards contains various echo cancellers. Some people claim one or other of these cancellers to be wonderful. Actually, all the current cancellers are rather poor. Some of the versions produced have had bugs, and those obviously have more serious problems than the others. However, little separates the ones which have been well debugged. They implement very similar algorithms, which only cater for the most basic convergence issues. None of them could get through more than a couple of the tests in G.168. Critically, they lack a way to tolerate tones, which can badly mess up the adaptation of a canceller. They lack non-linear processing, to suppress small residual echoes. These things are essential for a stable, robust, high quality canceller.
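Non-linear processing, for example, is not complicated in principle. A minimal sketch of the idea (the 18dB threshold, the smoothing factor and the use of hard muting instead of comfort noise injection are all illustrative simplifications):

```c
/* A minimal non-linear processor (NLP) sketch: once the linear canceller
   has done its work, any residual that sits well below the level of the
   far-end reference is assumed to be echo leakage, and is suppressed. */
typedef struct {
    float ref_power;   /* smoothed power of the far-end reference signal */
    float res_power;   /* smoothed power of the canceller's residual */
} nlp_state_t;

static float nlp_process(nlp_state_t *s, float reference, float residual)
{
    const float alpha = 0.01f;     /* smoothing factor for power estimates */
    const float margin = 0.0158f;  /* -18dB expressed as a linear power ratio */

    s->ref_power += alpha * (reference * reference - s->ref_power);
    s->res_power += alpha * (residual * residual - s->res_power);

    /* Residual more than 18dB below the reference: treat it as residual
       echo and mute it. A real NLP substitutes comfort noise here, so
       the line does not sound dead. */
    if (s->res_power < margin * s->ref_power)
        return 0.0f;
    return residual;
}
```

Even this crude version would suppress the small residual echoes that a bare linear canceller leaves audible; a production NLP must also coordinate with the adaption logic and handle double-talk gracefully.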

Claim: Canceller X is an implementation of the algorithms in an application note from a famous vendor. It must be commercial quality.

A little reality: Application notes are great sources for ideas, but poor sources for complete solutions. There are many application notes on the internet, typically from DSP vendors, describing the core elements of a canceller. Can you find one that describes how to get through some of the more interesting tests in G.168?

Claim: Canceller X passes G.168

It is difficult to know exactly what that means. Most cancellers do not pass all the tests in G.168, but pretty much all commercial cancellers claim compliance. In the fine print they may list which tests they pass. Most commercial cancellers pass the core tests, which check for basic convergence, tolerance of tones, etc. Some of the other tests are a bit greyer, and it's not easy to tell what a claimed pass really means.

Claim: Canceller X passes G.168, so it's clearly as good as any other

Not everyone would agree with that. It is possible to pass every test in G.168 and still have significant real world limitations. Various proprietary, and sometimes patented, methods are used to maximise the stability and reliability of the best commercial cancellers. There is still substantial variation in how well cancellers from different sources perform, and some aspects are still research topics.