
Editing DSP

libsonare exposes editing-oriented DSP alongside the analysis and mastering APIs. These functions work on decoded mono Float32Array or Python sample sequences and are useful for simple vocal correction, note edits, and voice design.

Use this page when you already have audio samples and want to change the sound itself. If you only want to measure BPM, key, chords, or features, start with Getting Started instead.

If the terms semitone, MIDI note number, and formant are unfamiliar, read Editing Basics first; this page assumes that vocabulary and focuses on which function to call.

DSP vs editing

DSP means Digital Signal Processing: measuring or transforming audio as a signal. The editing DSP on this page does more than measure BPM or key; it rewrites the sound itself, changing pitch, duration, or vocal character.

What You Will Learn

By the end of this page you should be able to:

  • distinguish analysis APIs from editing APIs that actually rewrite the signal;
  • choose time stretch, pitch shift, pitch correction, note stretch, spectral edit, or voice change by musical intent;
  • choose between the offline voiceChange(...) helper and the block-safe realtime voice changer;
  • convert between seconds, samples, semitones, and MIDI notes without guessing;
  • understand why large pitch/formant moves create artifacts and how to keep edits conservative.

Which Edit Do You Need?

| Goal | Use | Beginner note |
| --- | --- | --- |
| Make a clip longer or shorter without changing the note | Time stretch | rate is a playback-speed multiplier: rate=2.0 plays twice as fast, so the clip is half as long; rate=0.5 is half speed and twice as long. The pitch does not change. (Note: noteStretch's stretchRatio is a length multiplier, so it works the opposite way.) |
| Move the whole clip up or down in pitch | Pitch shift | semitones=12 means one octave up; -12 means one octave down |
| Nudge a vocal note toward a target note | Pitch correction | You must know or estimate the current pitch first |
| Hold or shorten one note region | Note stretch | Region positions are sample offsets, not seconds; stretchRatio > 1 lengthens the region |
| Attenuate or heal a time-frequency region | Spectral edit | Draw a rectangle in samples and Hz, then apply gain, attenuate, mute, or heal |
| Change vocal character | Voice change | Pitch changes the note; formant changes the perceived body or character of the voice |
| Process live voice blocks with presets | Realtime voice changer | Use this for AudioWorklet, monitoring, or chunked processing where DSP state must continue across blocks |
What is a formant?

A formant is a peak of acoustic energy in a particular frequency range. It is created by resonances of the vocal tract.

Formants shape the vowel sound and the perceived size or character of a voice, independent of the pitch (the note being sung).

That is why voiceChange separates the two controls. Lowering the formant factor makes a voice sound larger and darker. Raising it makes the voice sound smaller and brighter. The pitch stays where you set it.

Pitch shift — moving the whole harmonic comb

Pitch shifting transposes a sound without changing its length. Every harmonic is scaled by the same factor, so the fundamental moves to the new pitch. Because this shift is not formant-preserving, the formant peaks move along with the harmonics, which produces the "chipmunk" effect.
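Because every harmonic moves by the same factor, the shift can be written as one frequency ratio: in equal temperament, a shift of s semitones multiplies every frequency by 2^(s/12), so +12 doubles it and -12 halves it. The standalone sketch below only does that arithmetic; semitonesToRatio and ratioToSemitones are hypothetical helper names, not libsonare exports.

```typescript
// Hypothetical helpers (not libsonare exports): equal-temperament
// conversion between a semitone shift and a frequency ratio.
function semitonesToRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

function ratioToSemitones(ratio: number): number {
  return 12 * Math.log2(ratio);
}

// A +7 semitone shift (a perfect fifth) moves a 220 Hz fundamental
// and each of its harmonics by the same factor.
const ratio = semitonesToRatio(7);
const harmonics = [220, 440, 660].map((f) => f * ratio);
console.log(ratio.toFixed(4), harmonics.map((f) => f.toFixed(1)));
```

This is why the whole "harmonic comb" slides together: the spacing between harmonics scales with the fundamental.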

Time stretch — changing length, not pitch

Time stretching is the counterpart of pitch shift: it changes how long the audio lasts while leaving the pitch alone. Drum hits spread out or bunch up in time, but the spectrum barely changes. A rate below 1.0 slows the clip down and makes it longer; a rate above 1.0 speeds it up and makes it shorter. The groove changes tempo with no chipmunk effect.
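Since rate is a playback-speed multiplier, the output lasts roughly the input duration divided by rate. The sketch below does that arithmetic and converts a rate into a length multiplier such as noteStretch's stretchRatio (the reciprocal, 1 / rate, for the same change in duration). The helpers are hypothetical, and the exact number of samples timeStretch(...) returns may differ slightly depending on its internal framing.

```typescript
// Hypothetical helpers (not libsonare exports). rate is a playback-speed
// multiplier, so the stretched clip is roughly numSamples / rate long.
function expectedStretchedSamples(numSamples: number, rate: number): number {
  if (!(rate > 0)) throw new RangeError("rate must be positive");
  return Math.round(numSamples / rate);
}

// A length multiplier (like noteStretch's stretchRatio) for the same
// change in duration is the reciprocal of the playback rate.
function rateToLengthMultiplier(rate: number): number {
  return 1 / rate;
}

const sampleRate = 48000;
const clip = 2 * sampleRate; // a 2-second clip, in samples
console.log(expectedStretchedSamples(clip, 2.0) / sampleRate); // rate 2.0: half as long
console.log(expectedStretchedSamples(clip, 0.5) / sampleRate); // rate 0.5: twice as long
console.log(rateToLengthMultiplier(0.8)); // rate 0.8 lengthens by about 25%
```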


Functions

Parameters like f0Hz (a per-frame pitch contour), hopLength, and voiced are defined under Time-varying pitch correction below.

| Task | WASM / browser JavaScript | Python |
| --- | --- | --- |
| Shift the whole signal without changing duration | pitchShift(samples, sampleRate, semitones) | pitch_shift(samples, sample_rate, semitones) |
| Time-stretch the whole signal without changing pitch | timeStretch(samples, sampleRate, rate) | time_stretch(samples, sample_rate, rate) |
| Correct from one MIDI note to another | pitchCorrectToMidi(samples, sampleRate, currentMidi, targetMidi) | pitch_correct_to_midi(samples, sample_rate, current_midi, target_midi) |
| Follow a per-frame pitch contour toward a note | pitchCorrectToMidiTimevarying(samples, f0Hz, targetMidi, sampleRate, hopLength, voiced?, voicedProb?) | pitch_correct_to_midi_timevarying(samples, f0_hz, target_midi, sample_rate, hop_length, voiced?, voiced_prob?) |
| Stretch only a note region | noteStretch(samples, sampleRate, { onsetSample, offsetSample, stretchRatio }) | note_stretch(samples, sample_rate, onset_sample, offset_sample, stretch_ratio) |
| Edit a time-frequency region | spectralEdit(samples, sampleRate, ops, options?) | spectral_edit(samples, sample_rate, ops, ...) |
| Shift pitch and formants independently | voiceChange(samples, sampleRate, { pitchSemitones, formantFactor }) | voice_change(samples, sample_rate, pitch_semitones, formant_factor) |
| Stateful realtime voice preset chain | RealtimeVoiceChanger | RealtimeVoiceChanger |
| One-shot realtime voice preset render | Node native: voiceChangeRealtime(samples, sampleRate, preset) | voice_change_realtime(samples, sample_rate, preset) |

Node native argument order

Node native uses timeStretch(samples, rate, sampleRate?) and pitchShift(samples, semitones, sampleRate?). The WASM/browser functions above keep sampleRate before the edit amount.

Usage

typescript
import { init, noteStretch, pitchCorrectToMidi, voiceChange } from '@libraz/libsonare';

await init();

// vocal: a decoded mono Float32Array; sampleRate: its sample rate in Hz.
const tuned = pitchCorrectToMidi(vocal, sampleRate, 68.7, 69);
const heldNote = noteStretch(vocal, sampleRate, { onsetSample: 12000, offsetSample: 24000, stretchRatio: 1.25 });
const character = voiceChange(vocal, sampleRate, { pitchSemitones: 5, formantFactor: 1.1 });
python
import libsonare as sonare

# vocal: decoded mono samples; sample_rate: their sample rate in Hz.
tuned = sonare.pitch_correct_to_midi(vocal, sample_rate, current_midi=68.7, target_midi=69)
held_note = sonare.note_stretch(vocal, sample_rate, onset_sample=12000, offset_sample=24000, stretch_ratio=1.25)
character = sonare.voice_change(vocal, sample_rate, pitch_semitones=5, formant_factor=1.1)
bash
# The sonare CLI loads and writes WAV/MP3 directly
sonare pitch-correct vocal.wav --current-midi 68.7 --target-midi 69 -o tuned.wav
sonare note-stretch vocal.wav --onset 12000 --offset 24000 --ratio 1.25 -o held.wav
sonare voice-change vocal.wav --pitch-semitones 5 --formant-factor 1.1 -o character.wav

For rectangular time/frequency edits such as whistle attenuation or short artifact repair, see Spectral Editing.

How pitchCorrectToMidi(...) works

pitchCorrectToMidi(...) takes two MIDI note numbers: the pitch the audio currently has, and the pitch you want it to move toward. It does not detect the current pitch by itself. The caller provides currentMidi / current_midi.

The usual workflow is:

  1. Estimate the current pitch with pitchYin(...), pitchPyin(...), or your own detector.
  2. Pass the measured pitch as currentMidi.
  3. Pass the target note as targetMidi.
typescript
const currentMidi = 68.7; // slightly below A4
const targetMidi = 69;    // A4
const tuned = pitchCorrectToMidi(vocal, sampleRate, currentMidi, targetMidi);
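Pitch detectors such as pitchYin(...) and pitchPyin(...) report F0 in Hz, while pitchCorrectToMidi(...) expects a MIDI note number. One simple way to get a single currentMidi for a held note is to take the median F0 of the voiced frames and convert it with the standard equal-temperament formula midi = 69 + 12 · log2(f / 440). This is a sketch under stated assumptions: the median is our choice (it ignores brief outliers such as scoops), and medianVoicedMidi is a hypothetical helper that works on plain arrays shaped like the pitch.f0 and pitch.voicedFlag values used in the time-varying example on this page.

```typescript
// Hypothetical helper (not a libsonare export): summarize a per-frame
// F0 contour (Hz) into one MIDI number for pitchCorrectToMidi(...).
function hzToMidi(hz: number): number {
  return 69 + 12 * Math.log2(hz / 440);
}

function medianVoicedMidi(
  f0Hz: ArrayLike<number>,
  voiced?: ArrayLike<number | boolean>,
): number | undefined {
  const values: number[] = [];
  for (let i = 0; i < f0Hz.length; i++) {
    // Skip frames flagged unvoiced, plus zero, negative, or NaN estimates.
    if (voiced !== undefined && !voiced[i]) continue;
    const hz = f0Hz[i];
    if (!(hz > 0)) continue;
    values.push(hz);
  }
  if (values.length === 0) return undefined;
  values.sort((a, b) => a - b);
  const mid = Math.floor(values.length / 2);
  const medianHz =
    values.length % 2 === 1 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
  return hzToMidi(medianHz);
}

// Example: a slightly flat A4 with one unvoiced (breath) frame reported as 0.
const f0 = [432, 433, 0, 434, 433];
const currentMidi = medianVoicedMidi(f0);
console.log(currentMidi?.toFixed(2));
```

The result can then be passed as currentMidi, with targetMidi set to the nearest whole note (for example Math.round(currentMidi)).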

Time-varying pitch correction

F0, frames, and "voiced"

F0 is the fundamental frequency (the pitch), measured in Hz. A pitch detector reports one F0 per short time slice called a frame; consecutive frames start hopLength samples apart, so the result is an F0 contour that traces how the pitch moves. A frame is voiced when the singer is actually producing pitched sound (a sung vowel) rather than a breath or silence; only voiced frames are worth retuning.

pitchCorrectToMidi(...) applies a single constant transpose, so it cannot respond to vibrato or drift: every frame moves by the same interval, and a note that drifts sharp or flat stays off target for part of the take. When you want to keep that expression while still pulling the note toward a target, use pitchCorrectToMidiTimevarying(...). Instead of one current-pitch number, it follows a caller-supplied per-frame F0 contour and retunes every voiced frame toward targetMidi, so the natural pitch movement is tracked rather than ignored.

typescript
import { init, pitchPyin, pitchCorrectToMidiTimevarying } from '@libraz/libsonare';

await init();

const frameLength = 512;
const hopLength = 512;

// 1. Measure a per-frame F0 contour (any detector that emits one F0 per hop).
const pitch = pitchPyin(vocal, sampleRate, frameLength, hopLength, 65, 1000, 0.3);

// 2. Retune every voiced frame toward A3 (MIDI 57) while keeping the contour.
const tuned = pitchCorrectToMidiTimevarying(
  vocal,
  pitch.f0,          // Float32Array, one F0 (Hz) per analysis frame
  57,                // target MIDI note
  sampleRate,
  hopLength,         // frame i covers sample i * hopLength
  pitch.voicedFlag,  // optional: only retune voiced frames
  pitch.voicedProb,  // optional: voicing probability in [0, 1]
);

voiced (non-zero = voiced) and voicedProb are optional; omit them to treat every frame as voiced. Use the same hopLength that produced the F0 contour, so frame i lines up with sample i * hopLength.
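The frame-to-sample rule (frame i starts at sample i * hopLength) is easy to apply by hand. The sketch below converts between frame indices, sample offsets, and seconds, for example to find which F0 frame covers a note you selected in seconds. The helper names are hypothetical, and the sketch assumes frames are indexed from the start of the clip, as described above.

```typescript
// Hypothetical helpers (not libsonare exports) for lining up an F0
// contour with audio: frame i starts at sample i * hopLength.
function frameToSample(frame: number, hopLength: number): number {
  return frame * hopLength;
}

function sampleToFrame(sample: number, hopLength: number): number {
  return Math.floor(sample / hopLength);
}

function frameToSeconds(frame: number, hopLength: number, sampleRate: number): number {
  return frameToSample(frame, hopLength) / sampleRate;
}

// With hopLength = 512 at 48 kHz, frames are about 10.7 ms apart.
const hopLength = 512;
const sampleRate = 48000;
const noteStartFrame = sampleToFrame(Math.round(1.5 * sampleRate), hopLength);
console.log(noteStartFrame, frameToSeconds(noteStartFrame, hopLength, sampleRate));
```

If the contour was produced with a different hop than the one you pass to pitchCorrectToMidiTimevarying(...), these mappings drift apart over the clip, which is why the hop must match.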

Constant vs contour-following correction

Use pitchCorrectToMidi(...) for a steady held note where one transpose is enough. Reach for pitchCorrectToMidiTimevarying(...) when the take has vibrato, slides, or drift you want to preserve while nudging it onto pitch.

MIDI note numbers

A MIDI note number represents pitch as a semitone index. Each step of 1 is one semitone, and fractional values such as 68.7 are allowed for pitches between notes.

Two anchors are worth memorizing:

| Note | MIDI note number | Frequency |
| --- | --- | --- |
| C4 (middle C) | 60 | about 261.63 Hz |
| A4 | 69 | 440 Hz |

Every octave adds 12. For example, C4 is 60, so C5 is 72.
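Both anchors and the octave rule follow from the standard equal-temperament mapping f = 440 · 2^((m - 69) / 12), where m is the MIDI note number and A4 (MIDI 69) is tuned to 440 Hz. A minimal sketch, assuming that standard tuning reference; midiToHz is a hypothetical helper, not a libsonare export:

```typescript
// Hypothetical helper (not a libsonare export): equal-temperament
// MIDI-to-frequency mapping with A4 (MIDI 69) tuned to a4Hz.
function midiToHz(midi: number, a4Hz = 440): number {
  return a4Hz * Math.pow(2, (midi - 69) / 12);
}

console.log(midiToHz(69).toFixed(2)); // A4
console.log(midiToHz(60).toFixed(2)); // C4 (middle C)
console.log(midiToHz(72).toFixed(2)); // C5: 12 above C4, so double the frequency
// Fractional note numbers are valid: 68.7 is 30 cents below A4.
console.log(midiToHz(68.7).toFixed(2));
```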

For the full mapping table and the freq ↔ midi formula, see Editing Basics.

Region selection for noteStretch(...)

noteStretch(...) takes the region in sample offsets, not seconds.

If your UI works in seconds, convert like this:

typescript
const onsetSample = Math.round(onsetSeconds * sampleRate);
const offsetSample = Math.round(offsetSeconds * sampleRate);
const heldNote = noteStretch(vocal, sampleRate, { onsetSample, offsetSample, stretchRatio: 1.25 });

stretchRatio is the length multiplier for the selected region.

| stretchRatio | Result |
| --- | --- |
| 1.25 | Make the region 25% longer |
| 1.0 | Keep the same length |
| 0.8 | Make the region 20% shorter |
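When planning an edit, it helps to know how many samples the region will occupy afterwards, or which stretchRatio produces a target duration. The sketch below assumes noteStretch(...) replaces only the selected region and leaves the audio before and after it untouched, so the clip grows or shrinks by the region's change in length. The helper names are hypothetical, and the library's exact output length may differ slightly.

```typescript
// Hypothetical planning helpers (not libsonare exports).
// stretchRatio multiplies the region length: > 1 lengthens it.
interface NoteRegion {
  onsetSample: number;
  offsetSample: number;
}

function regionLength(region: NoteRegion): number {
  const length = region.offsetSample - region.onsetSample;
  if (length <= 0) throw new RangeError("offsetSample must be after onsetSample");
  return length;
}

function stretchedRegionLength(region: NoteRegion, stretchRatio: number): number {
  return Math.round(regionLength(region) * stretchRatio);
}

// Pick stretchRatio so the region lasts targetSeconds after the edit.
function ratioForTargetSeconds(region: NoteRegion, targetSeconds: number, sampleRate: number): number {
  return (targetSeconds * sampleRate) / regionLength(region);
}

const region: NoteRegion = { onsetSample: 12000, offsetSample: 24000 };
const newLength = stretchedRegionLength(region, 1.25);
// Assumption: the whole clip changes length by (newLength - original region length).
console.log(newLength, newLength - regionLength(region));
console.log(ratioForTargetSeconds(region, 0.5, 48000));
```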

Offline voiceChange(...) vs RealtimeVoiceChanger

voiceChange(...) is the simple offline helper for a decoded mono clip: pass semitone and formant values, get a processed buffer back.

RealtimeVoiceChanger is the stateful preset chain used for live or chunked voice processing. It combines retune, formant, EQ, gate, compressor, de-esser, reverb, and limiter stages. Factory preset IDs include neutral-monitor, bright-idol, soft-whisper, deep-narrator, robot-mascot, and dark-villain.

Use the realtime class when you process repeated blocks and need continuity across calls. In WASM, call prepare(...) and delete() yourself. In Python, use the context manager or close().

typescript
import { init, RealtimeVoiceChanger, realtimeVoiceChangerPresetNames } from '@libraz/libsonare';

await init();

const changer = new RealtimeVoiceChanger('bright-idol');
changer.prepare(48000, 128, 1);

try {
  // inputBlock: one mono Float32Array block (up to the 128 samples passed to prepare).
  const out = changer.processMono(inputBlock);
  changer.setConfig('soft-whisper');
  console.log(realtimeVoiceChangerPresetNames(), changer.latencySamples(), out);
} finally {
  changer.delete();
}
python
import libsonare as sonare

with sonare.RealtimeVoiceChanger(48000, preset="bright-idol", max_block_size=128) as changer:
    out = changer.process_mono(input_block)
    changer.set_config("soft-whisper")
    print(sonare.realtime_voice_changer_preset_names(), changer.latency_samples())
bash
sonare voice-presets --json
sonare voice-change vocal.wav --preset soft-whisper -o rendered.wav
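The examples above process a single block. For a longer buffer, the usual pattern is to feed consecutive slices of at most the prepared block size into the same changer instance, so its internal state carries over from one call to the next. The sketch below shows that loop with a generic callback so it runs on its own; processInBlocks and makeSmoother are hypothetical helpers, and the smoother only stands in for a stateful DSP chain.

```typescript
// Hypothetical helper (not a libsonare export): run a long mono buffer
// through a stateful per-block processor in consecutive slices.
type BlockProcessor = (block: Float32Array) => Float32Array;

function processInBlocks(
  input: Float32Array,
  blockSize: number,
  processBlock: BlockProcessor,
): Float32Array {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new RangeError("blockSize must be a positive integer");
  }
  const output = new Float32Array(input.length);
  for (let start = 0; start < input.length; start += blockSize) {
    // subarray() returns a view of the slice, so the input is not copied.
    const block = input.subarray(start, Math.min(start + blockSize, input.length));
    output.set(processBlock(block), start);
  }
  return output;
}

// Stand-in for a stateful DSP chain: a one-pole smoother whose state
// must persist across calls for block processing to match one big call.
function makeSmoother(coeff: number): BlockProcessor {
  let state = 0;
  return (block) => {
    const out = new Float32Array(block.length);
    for (let i = 0; i < block.length; i++) {
      state += coeff * (block[i] - state);
      out[i] = state;
    }
    return out;
  };
}

const input = new Float32Array(1000).fill(1);
const blocked = processInBlocks(input, 128, makeSmoother(0.1));
const whole = makeSmoother(0.1)(input);
console.log(blocked.every((v, i) => v === whole[i])); // true: state carried across blocks
```

With the realtime class, the callback would wrap one prepared instance, for example (block) => changer.processMono(block), assuming processMono returns a Float32Array of the same length as its input. Creating a new changer per block would reset its state and break that continuity.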

Using the Audio wrapper

The Audio wrapper exposes the same operations as instance methods. In file-based Python workflows, you can load the file once and then apply edits to the same Audio object.

Creative effect inserts

Beyond pitch and time transforms, two of the mixer/mastering insert processors are go-to voice and instrument color tools:

  • effects.modulation.ensemble: a string-machine ensemble modeled on BBD (bucket-brigade delay) analog choruses, which thickens a thin source into a wide, chorused pad.
  • saturation.ampSim: a guitar amp simulation that adds drive and speaker-cabinet character.

Load them as inserts on a strip (see Mixing Engine) rather than as standalone functions on a raw buffer.

Offline transforms vs arrange-time warp

The functions on this page are offline transforms: you hand them a buffer and get a new buffer back. They are different from arrange-time warp — clip repitch and tempo-sync inside a project, where a clip follows the timeline rather than being baked once. For that project-level workflow, see Project Editing.

Practical Notes

These APIs are intentionally lightweight editing tools, not a full non-destructive pitch editor. For transparent vocals, keep pitch correction intervals small and avoid extreme formant factors. For sound design, larger pitchSemitones and formantFactor moves are valid, but expect stronger artifacts.

Why big moves sound worse

These transforms work by splitting the sound into short overlapping frames and re-spacing or re-pitching them. Small shifts stay close to the original frames and sound clean; large shifts force the engine to synthesize material that was never recorded, so you hear smearing, a "watery" or robotic quality, and a less natural voice. That is why the advice is to keep correction intervals small for transparent results.