
Time Stretch and Pitch Shift

Speeding up a tape changes pitch and duration together. Modern editing DSP separates them.

| Operation | What changes | What should stay stable |
|---|---|---|
| Time stretch | Length | Pitch |
| Pitch shift | Pitch | Length |

This page explains the machinery behind that separation and how libsonare's stretch backend implements it. For the underlying vocabulary, read Editing Basics first.

The phase vocoder

The core tool is the phase vocoder.

At a high level, it does three things:

  1. Run an STFT to split audio into short time-frequency frames.
  2. Resample those frames along the time axis, changing how quickly they advance.
  3. Rebuild phase so partials stay continuous instead of smearing.
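The first two steps come down to hop arithmetic: frames are read from the input at one spacing (the analysis hop) and written to the output at another (the synthesis hop). The sketch below is a minimal illustration, not libsonare's code. It assumes a fixed synthesis hop and derives the analysis hop from a rate parameter with this page's convention (rate > 1.0 shortens); planFrames and FramePlan are hypothetical names.

```typescript
// Hypothetical frame planner for a phase-vocoder time stretch (not libsonare's API).
// Frames are read every `analysisHop` samples and overlap-added every `synthesisHop`
// samples; the ratio of the two hops sets the new length.
interface FramePlan {
  analysisStarts: number[]; // where each frame is read from the input
  synthesisStarts: number[]; // where each frame is written into the output
  outputLength: number; // total output samples covered by the frames
}

function planFrames(
  inputLength: number,
  frameSize: number,
  synthesisHop: number,
  rate: number, // > 1.0 shortens, < 1.0 lengthens
): FramePlan {
  if (rate <= 0 || synthesisHop <= 0) {
    throw new RangeError("rate and synthesisHop must be positive");
  }
  // Reading faster than we write (rate > 1) compresses time, and vice versa.
  const analysisHop = synthesisHop * rate;
  const analysisStarts: number[] = [];
  const synthesisStarts: number[] = [];
  // Only whole frames that fit inside the input are planned.
  for (let i = 0; i * analysisHop + frameSize <= inputLength; i++) {
    analysisStarts.push(Math.round(i * analysisHop));
    synthesisStarts.push(i * synthesisHop);
  }
  const n = synthesisStarts.length;
  const outputLength = n > 0 ? synthesisStarts[n - 1] + frameSize : 0;
  return { analysisStarts, synthesisStarts, outputLength };
}

// One second at 48 kHz, sped up 2x: roughly half a second comes out.
const plan = planFrames(48000, 2048, 512, 2.0);
console.log(plan.analysisStarts.length, plan.outputLength); // 45 24576
```

The output length only approximates inputLength / rate because the partial frame at the end is dropped; a fuller implementation would pad the input so the tail is covered.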

The hard part is phase coherence. The STFT splits each frame into frequency bins, one slot per narrow band of frequencies. When frames are written out at a different spacing than the one used for analysis, each bin's phase has to be re-propagated: advanced by however much that bin's frequency accumulates over the new spacing, so the individual frequency components (the partials that make up the sound) stay continuous from frame to frame. If that goes wrong, you hear the classic "phasey" or metallic artifact.
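The re-propagation step can be written per bin. What follows is the textbook phase-vocoder update, shown as a hedged sketch with hypothetical names (princarg, propagatePhase); it is not taken from libsonare's phase_vocoder source. It measures how far the bin's phase actually moved over the analysis hop, converts that into the partial's true frequency, and advances the output phase by that frequency over the synthesis hop.

```typescript
const TWO_PI = 2 * Math.PI;

// Wrap an angle into [-pi, pi).
function princarg(phase: number): number {
  return phase - TWO_PI * Math.floor((phase + Math.PI) / TWO_PI);
}

// Textbook phase-vocoder update for one bin k (hypothetical helper, not libsonare's API).
// prevPhase / currPhase: analysis phases of bin k in two consecutive input frames.
// prevOutPhase: the phase emitted for bin k in the previous output frame.
function propagatePhase(
  k: number,
  frameSize: number,
  analysisHop: number,
  synthesisHop: number,
  prevPhase: number,
  currPhase: number,
  prevOutPhase: number,
): number {
  const binFreq = (TWO_PI * k) / frameSize; // bin centre, radians per sample
  const expected = binFreq * analysisHop; // advance of a partial sitting exactly on the bin
  const deviation = princarg(currPhase - prevPhase - expected);
  const trueFreq = binFreq + deviation / analysisHop; // the partial's actual frequency
  return prevOutPhase + trueFreq * synthesisHop; // continuous at the new frame spacing
}

// A 0.45 rad/sample partial in bin 5 of a 64-point frame, stretched 2x (hop 16 -> 32).
const curr = princarg(0.1 + 0.45 * 16);
console.log(propagatePhase(5, 64, 16, 32, 0.1, curr, 0)); // ~14.4, i.e. 0.45 * 32
```

The wrap recovers the true frequency only while the partial sits within about pi / analysisHop rad/sample of the bin centre, which is one reason hops are kept to a fraction of the frame size.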

Two operations, one backend

| Operation | Changes | Keeps | How |
|---|---|---|---|
| Time stretch | Duration | Pitch | Phase vocoder rescales the time axis |
| Pitch shift | Pitch | Duration | Time-stretch by the pitch ratio, then resample back to the original length |

This is why pitch shift and time stretch share a backend: a pitch shift is a time stretch followed by resampling. For the parameters, rate > 1.0 shortens a clip, and semitones = 12 shifts up one octave, which doubles every frequency.
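Under the same conventions, the pitch-shift recipe can be sketched end to end. Everything here is illustrative: semitonesToRatio, resampleLinear, and pitchShiftVia are hypothetical names, the stretch step is passed in as a function rather than calling libsonare, and linear interpolation stands in for the higher-quality band-limited resampler a production library would normally use.

```typescript
// Equal-tempered semitones to a frequency ratio: +12 -> 2.0, -12 -> 0.5.
function semitonesToRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

// Linear-interpolation resampler: reads `step` input samples per output sample.
// step > 1 plays faster (shorter, higher); step < 1 plays slower (longer, lower).
function resampleLinear(input: Float32Array, step: number): Float32Array {
  if (input.length === 0) return new Float32Array(0);
  const outLength = Math.floor((input.length - 1) / step) + 1;
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Any time stretcher using this page's convention: rate > 1.0 shortens.
type TimeStretch = (input: Float32Array, rate: number) => Float32Array;

// Pitch shift = stretch by the pitch ratio, then resample back to the original length.
function pitchShiftVia(stretch: TimeStretch, input: Float32Array, semitones: number): Float32Array {
  const ratio = semitonesToRatio(semitones);
  const longer = stretch(input, 1 / ratio); // ratio x longer, pitch unchanged
  return resampleLinear(longer, ratio); // play ratio x faster: pitch up, length restored
}

// Placeholder stretcher that only gets the length right (a real one is a phase vocoder).
const lengthOnly: TimeStretch = (x, rate) => new Float32Array(Math.round(x.length / rate));
const octaveUp = pitchShiftVia(lengthOnly, new Float32Array(48000), 12);
console.log(semitonesToRatio(12), octaveUp.length); // 2 48000
```

Keeping the stretcher as a parameter makes the design point visible: the pitch axis adds only a resampler on top of whatever time-stretch core is in use.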

Time stretch — changing length, not pitch

Time stretching is the mirror image of pitch shift: it changes how long the audio lasts while leaving the pitch alone. Applied to a drum loop, the hits spread out or bunch up in time, but the spectrum barely moves. A rate below 1.0 slows the clip down and lengthens it; a rate above 1.0 speeds it up and shortens it. The groove changes tempo with no chipmunk effect.

Pitch shift — moving the whole harmonic comb

Pitch shifting transposes a sound without changing its length. In the spectrum, every harmonic scales by the same factor, so the fundamental moves to the new pitch and the harmonic comb keeps its shape. Because this shift is not formant-preserving, the formant bumps move too, which produces the "chipmunk" effect.
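The moving comb is plain multiplication: a shift of s semitones multiplies every frequency in the sound by 2^(s/12), harmonics and formant peaks alike. The sketch below uses hypothetical names and made-up example numbers (a 220 Hz voice with an assumed 700 Hz formant) purely for illustration; the preserveFormants flag shows what an idealized formant-preserving shift would hold still, not how libsonare exposes that option.

```typescript
interface SpectralLandmarks {
  harmonicsHz: number[]; // f0, 2*f0, 3*f0, ...
  formantsHz: number[]; // peaks of the spectral envelope
}

// Where spectral landmarks land after shifting by `semitones` (illustrative helper).
function shiftLandmarks(
  src: SpectralLandmarks,
  semitones: number,
  preserveFormants = false,
): SpectralLandmarks {
  const ratio = Math.pow(2, semitones / 12);
  return {
    // Every harmonic scales by the same ratio, so the comb keeps its shape.
    harmonicsHz: src.harmonicsHz.map((f) => f * ratio),
    // A plain shift drags the envelope along too (the "chipmunk" effect);
    // an idealized formant-preserving shift leaves it where it was.
    formantsHz: preserveFormants ? [...src.formantsHz] : src.formantsHz.map((f) => f * ratio),
  };
}

const voice: SpectralLandmarks = { harmonicsHz: [220, 440, 660], formantsHz: [700] };
console.log(shiftLandmarks(voice, 12)); // harmonics 440, 880, 1320; formant 1400
```

At +7 semitones (a fifth) the ratio is about 1.498, so that assumed 700 Hz formant would land near 1049 Hz unless corrected.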


Why large moves create artifacts

Both operations invent or discard information.

| Edit | What can go wrong |
|---|---|
| Stretch a sound to twice its length | Much of the new audio is synthesized from phase assumptions |
| Shift a voice up a fifth | Formants move too unless corrected |
| Make a large transient edit | Attacks can soften or smear |

Small moves stay transparent because the phase vocoder's assumptions (each bin holds one steady partial that changes slowly from frame to frame) still hold. Large moves expose those assumptions as smearing, transient softening, or a "chipmunk" timbre.

Practical rule: keep edits conservative for natural results, and treat big moves as deliberate creative effects.

How libsonare implements stretching

libsonare's timeStretch and pitchShift sit on a phase-vocoder core (phase_vocoder) combined with resampling for the pitch axis, selected through a StretchBackend. A pitch shift is implemented as a time-stretch by the pitch ratio followed by a resample back to the original duration. The same core underlies noteStretch (region-bounded stretching) and the pitch path of voiceChange. All operate on decoded mono Float32Array / sample sequences; quality degrades gracefully with the size of the move, so conservative ratios stay artifact-free.

Related: Editing Basics, Pitch Correction, Voice and Formant, Editing DSP