Time scaling speech

What does it do?

The time scaling module allows speech files to be played back at a different speed from the speed at which they were recorded. If this were done simply by speeding up or slowing down the replay, the pitch of the voice would change and the voice would sound very odd. This module keeps the pitch of the voice at its original level.

The speed of the voice may be altered over a wide range. However, the practically useful rates lie between about half normal speed and twice normal speed.

How does it work?

The time scaling module is based on the Pointer Interval Controlled OverLap and Add (PICOLA) method, developed by Morita Naotaka. Mikio Ikeda has an excellent web page on this subject at http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html, which also includes working code. This implementation uses exactly the same algorithms, but the code is a complete rewrite. Mikio's code processes whole files in batch. This version works incrementally on streams, and allows multiple streams to be processed concurrently.
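
The core PICOLA idea, as it is generally described, can be illustrated with a simplified, batch-mode sketch of the speed-up case only. This is not this module's streaming code. The function names picola_speed_up and find_pitch_amdf are hypothetical, the pitch search uses an average magnitude difference function (AMDF), which is an assumption since implementations differ in their distance measure, and rate here means the ratio of output length to input length, restricted to 0.5 <= rate < 1.0. At each splice point the sketch finds a pitch period p, overlap-adds two adjacent periods into one with a linear crossfade, and then copies input unchanged so that each splice yields L = p*rate/(1 - rate) output samples from L + p input samples. That spacing of the splices is the "pointer interval" that gives the method its name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the lag p in [min_p, max_p] with the smallest
   average magnitude difference between x[0..p) and x[p..2p).
   x must hold at least 2*max_p samples. */
static int find_pitch_amdf(const int16_t *x, int min_p, int max_p)
{
    assert(min_p > 0 && max_p >= min_p);
    int best_p = min_p;
    double best_score = -1.0;
    for (int p = min_p; p <= max_p; p++)
    {
        long sum = 0;
        for (int i = 0; i < p; i++)
            sum += labs((long) x[i] - (long) x[i + p]);
        /* Average per sample, so longer lags are not penalised. */
        double score = (double) sum/p;
        if (best_score < 0.0 || score < best_score)
        {
            best_score = score;
            best_p = p;
        }
    }
    return best_p;
}

/* Hypothetical batch PICOLA speed-up for 0.5 <= rate < 1.0.
   out must hold at least in_len samples (speed-up never lengthens the signal).
   Returns the number of output samples, or -1 for an unsupported rate. */
static int picola_speed_up(const int16_t *in, int in_len, int16_t *out,
                           float rate, int min_p, int max_p)
{
    if (rate < 0.5f || rate >= 1.0f)
        return -1;
    int in_pos = 0;
    int out_pos = 0;
    while (in_pos + 2*max_p <= in_len)
    {
        int p = find_pitch_amdf(&in[in_pos], min_p, max_p);
        /* Overlap-add: fade the first period out while fading the second in,
           replacing 2*p input samples with p output samples. */
        for (int i = 0; i < p; i++)
            out[out_pos + i] = (int16_t) ((in[in_pos + i]*(p - i) + in[in_pos + p + i]*i)/p);
        out_pos += p;
        in_pos += 2*p;
        /* Pointer interval: L = p*rate/(1 - rate) output samples per splice.
           p of them were just produced, so copy L - p samples unchanged. */
        int copy = (int) (p*rate/(1.0f - rate)) - p;
        if (copy < 0)
            copy = 0;
        if (copy > in_len - in_pos)
            copy = in_len - in_pos;
        memcpy(&out[out_pos], &in[in_pos], sizeof(int16_t)*copy);
        out_pos += copy;
        in_pos += copy;
    }
    /* Too little input left for another pitch search: pass the tail through. */
    memcpy(&out[out_pos], &in[in_pos], sizeof(int16_t)*(in_len - in_pos));
    out_pos += in_len - in_pos;
    return out_pos;
}
```

Slowing speech down works the other way round: a crossfaded extra period is inserted at each splice instead of two periods being merged into one. A streaming implementation, like the one described above, must also hold back up to two maximum pitch periods of input between calls, since the pitch search needs that much look-ahead.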

How do I use it?

The output buffer must be big enough to hold the maximum number of samples which could result from the data in the input buffer, which is:

input_len*playout_rate + sample_rate/TIME_SCALE_MIN_PITCH + 1
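
The formula implies that playout_rate is the ratio of output length to input length, so values above 1.0 slow speech down and produce more samples than were supplied. The sample_rate/TIME_SCALE_MIN_PITCH term is the longest pitch period, in samples. The sketch below simply evaluates the formula. The helper name max_output_len is hypothetical, and the value 60 Hz for TIME_SCALE_MIN_PITCH is an assumption for illustration only; use the value defined by the module itself. With those assumptions, at 8000 samples/s the pitch term is 8000/60 = 133 samples (integer division), so a 160-sample block at playout_rate 2.0 needs room for 160*2.0 + 133 + 1 = 454 samples.

```c
#include <assert.h>

/* Assumed value for illustration: the lowest voice pitch, in Hz, that the
   module tracks. Check the real TIME_SCALE_MIN_PITCH before relying on it. */
#define TIME_SCALE_MIN_PITCH 60

/* Hypothetical helper: worst-case number of output samples produced from
   input_len input samples, evaluating the buffer-size formula. */
static int max_output_len(int input_len, float playout_rate, int sample_rate)
{
    assert(input_len >= 0 && playout_rate > 0.0f && sample_rate > 0);
    /* sample_rate/TIME_SCALE_MIN_PITCH is the longest pitch period in samples
       (integer division, as in C). */
    return (int) (input_len*playout_rate + sample_rate/TIME_SCALE_MIN_PITCH + 1);
}
```

A caller would typically size the output buffer once for its largest input block and playout rate, for example with malloc(sizeof(int16_t)*max_output_len(160, 2.0f, 8000)).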