Editing DSP
libsonare exposes editing-oriented DSP alongside the analysis and mastering APIs. These functions work on decoded mono Float32Array or Python sample sequences and are useful for simple vocal correction, note edits, and voice design.
Use this page when you already have audio samples and want to change the sound itself. If you only want to measure BPM, key, chords, or features, start with Getting Started instead.
If semitone, MIDI note number, or formant are unfamiliar, read Editing Basics first — this page assumes that vocabulary and focuses on which function to call.
DSP vs editing
DSP means Digital Signal Processing: measuring or transforming audio as a signal. The editing DSP on this page does more than measure BPM or key; it rewrites the sound itself, changing pitch, duration, or vocal character.
What You Will Learn
By the end of this page you should be able to:
- distinguish analysis APIs from editing APIs that actually rewrite the signal;
- choose time stretch, pitch shift, pitch correction, note stretch, spectral edit, or voice change by musical intent;
- choose the offline voiceChange(...) helper versus the block-safe realtime voice changer;
- convert between seconds, samples, semitones, and MIDI notes without guessing;
- understand why large pitch/formant moves create artifacts and how to keep edits conservative.
Which Edit Do You Need?
| Goal | Use | Beginner note |
|---|---|---|
| Make a clip longer or shorter without changing the note | Time stretch | rate is a playback-speed multiplier: rate=2.0 plays twice as fast, so the clip is half as long; rate=0.5 is half speed and twice as long. The pitch does not change. (Note: noteStretch's stretchRatio is a length multiplier, so it works the opposite way.) |
| Move the whole clip up or down in pitch | Pitch shift | semitones=12 means one octave up; -12 means one octave down |
| Nudge a vocal note toward a target note | Pitch correction | You must know or estimate the current pitch first |
| Hold or shorten one note region | Note stretch | Region positions are sample offsets, not seconds; stretchRatio > 1 lengthens the region |
| Attenuate or heal a time-frequency region | Spectral edit | Draw a rectangle in samples and Hz, then apply gain, attenuate, mute, or heal |
| Change vocal character | Voice change | Pitch changes the note; formant changes the perceived body or character of the voice |
| Process live voice blocks with presets | Realtime voice changer | Use this for AudioWorklet, monitoring, or chunked processing where DSP state must continue across blocks |
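
The two numeric conventions in the table reduce to simple arithmetic. The sketch below is illustrative Python rather than part of libsonare: both helper names are hypothetical, and the duration estimate ignores any padding or rounding an implementation might apply.

```python
def stretched_duration(duration_seconds: float, rate: float) -> float:
    """Expected duration after a time stretch; rate is a playback-speed multiplier."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return duration_seconds / rate


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in equal temperament (12 semitones per octave)."""
    return 2.0 ** (semitones / 12.0)


print(stretched_duration(10.0, 2.0))   # twice as fast, so half as long
print(stretched_duration(10.0, 0.5))   # half speed, so twice as long
print(semitones_to_ratio(12))          # one octave up doubles the frequency
print(semitones_to_ratio(-12))         # one octave down halves it
```

A shift of a few semitones therefore changes every frequency by only a few percent per step, while an octave doubles or halves it.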
What is a formant?
A formant is a peak of acoustic energy in a particular frequency range. It is created by resonances of the vocal tract.
Formants shape the vowel sound and the perceived size or character of a voice, independent of the pitch (the note being sung).
That is why voiceChange separates the two controls. Lowering the formant factor makes a voice sound larger and darker. Raising it makes the voice sound smaller and brighter. The pitch stays where you set it.
Functions
Parameters like f0Hz (a per-frame pitch contour), hopLength, and voiced are defined under Time-varying pitch correction below.
| Task | WASM / browser JavaScript | Python |
|---|---|---|
| Shift the whole signal without changing duration | pitchShift(samples, sampleRate, semitones) | pitch_shift(samples, sample_rate, semitones) |
| Time-stretch the whole signal without changing pitch | timeStretch(samples, sampleRate, rate) | time_stretch(samples, sample_rate, rate) |
| Correct from one MIDI note to another | pitchCorrectToMidi(samples, sampleRate, currentMidi, targetMidi) | pitch_correct_to_midi(samples, sample_rate, current_midi, target_midi) |
| Follow a per-frame pitch contour toward a note | pitchCorrectToMidiTimevarying(samples, f0Hz, targetMidi, sampleRate, hopLength, voiced?, voicedProb?) | pitch_correct_to_midi_timevarying(samples, f0_hz, target_midi, sample_rate, hop_length, voiced?, voiced_prob?) |
| Stretch only a note region | noteStretch(samples, sampleRate, { onsetSample, offsetSample, stretchRatio }) | note_stretch(samples, sample_rate, onset_sample, offset_sample, stretch_ratio) |
| Edit a time-frequency region | spectralEdit(samples, sampleRate, ops, options?) | spectral_edit(samples, sample_rate, ops, ...) |
| Shift pitch and formants independently | voiceChange(samples, sampleRate, { pitchSemitones, formantFactor }) | voice_change(samples, sample_rate, pitch_semitones, formant_factor) |
| Stateful realtime voice preset chain | RealtimeVoiceChanger | RealtimeVoiceChanger |
| One-shot realtime voice preset render | Node native: voiceChangeRealtime(samples, sampleRate, preset) | voice_change_realtime(samples, sample_rate, preset) |
Node native argument order
Node native uses timeStretch(samples, rate, sampleRate?) and pitchShift(samples, semitones, sampleRate?). The WASM/browser functions above keep sampleRate before the edit amount.
Usage
import { init, noteStretch, pitchCorrectToMidi, voiceChange } from '@libraz/libsonare';
await init();
const tuned = pitchCorrectToMidi(vocal, sampleRate, 68.7, 69);
const heldNote = noteStretch(vocal, sampleRate, { onsetSample: 12000, offsetSample: 24000, stretchRatio: 1.25 });
const character = voiceChange(vocal, sampleRate, { pitchSemitones: 5, formantFactor: 1.1 });

import libsonare as sonare
tuned = sonare.pitch_correct_to_midi(vocal, sample_rate, current_midi=68.7, target_midi=69)
held_note = sonare.note_stretch(vocal, sample_rate, onset_sample=12000, offset_sample=24000, stretch_ratio=1.25)
character = sonare.voice_change(vocal, sample_rate, pitch_semitones=5, formant_factor=1.1)

# The sonare CLI loads and writes WAV/MP3 directly
sonare pitch-correct vocal.wav --current-midi 68.7 --target-midi 69 -o tuned.wav
sonare note-stretch vocal.wav --onset 12000 --offset 24000 --ratio 1.25 -o held.wav
sonare voice-change vocal.wav --pitch-semitones 5 --formant-factor 1.1 -o character.wav

For rectangular time/frequency edits such as whistle attenuation or short artifact repair, see Spectral Editing.
How pitchCorrectToMidi(...) works
pitchCorrectToMidi(...) takes two MIDI note numbers: the pitch the audio currently has, and the pitch you want it to move toward. It does not detect the current pitch by itself. The caller provides currentMidi / current_midi.
The usual workflow is:
- Estimate the current pitch with pitchYin(...), pitchPyin(...), or your own detector.
- Pass the measured pitch as currentMidi.
- Pass the target note as targetMidi.
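A detector typically reports a frequency in Hz, while currentMidi expects a fractional MIDI note number. Here is a minimal sketch of that conversion, assuming equal temperament with A4 = 440 Hz. hz_to_midi is a hypothetical helper, not a libsonare function, and the detected frequency is an illustrative value.

```python
import math


def hz_to_midi(frequency_hz: float, a4_hz: float = 440.0) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 69)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(frequency_hz / a4_hz)


detected_hz = 432.44                       # illustrative detector output, slightly below A4
current_midi = hz_to_midi(detected_hz)     # fractional MIDI note, a little under 69
target_midi = round(current_midi)          # snap to the nearest semitone
correction_cents = (target_midi - current_midi) * 100.0

print(round(current_midi, 2), target_midi, round(correction_cents))
```

Rounding to the nearest semitone is only one way to pick a target; a melody-aware editor would choose the note the singer intended.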
const currentMidi = 68.7; // slightly below A4
const targetMidi = 69; // A4
const tuned = pitchCorrectToMidi(vocal, sampleRate, currentMidi, targetMidi);

Time-varying pitch correction
F0, frames, and "voiced"
F0 is the fundamental frequency — the pitch — measured in Hz. A pitch detector reports one F0 per short time slice (a frame, here one hopLength of samples), giving an F0 contour that traces how the pitch moves. A frame is voiced when the singer is actually producing pitched sound (a sung vowel) rather than a breath or silence; only voiced frames are worth retuning.
pitchCorrectToMidi(...) applies a single constant transpose, which flattens any vibrato or drift in the take. When you want to keep that expression while still pulling the note toward a target, use pitchCorrectToMidiTimevarying(...). Instead of one current-pitch number, it follows a caller-supplied per-frame F0 contour and retunes every voiced frame toward targetMidi, so the natural pitch movement is tracked rather than ironed out.
import { init, pitchPyin, pitchCorrectToMidiTimevarying } from '@libraz/libsonare';
await init();
const frameLength = 512;
const hopLength = 512;
// 1. Measure a per-frame F0 contour (any detector that emits one F0 per hop).
const pitch = pitchPyin(vocal, sampleRate, frameLength, hopLength, 65, 1000, 0.3);
// 2. Retune every voiced frame toward A3 (MIDI 57) while keeping the contour.
const tuned = pitchCorrectToMidiTimevarying(
  vocal,
  pitch.f0,         // Float32Array, one F0 (Hz) per analysis frame
  57,               // target MIDI note
  sampleRate,
  hopLength,        // frame i covers sample i * hopLength
  pitch.voicedFlag, // optional: only retune voiced frames
  pitch.voicedProb, // optional: voicing probability in [0, 1]
);

voiced (non-zero = voiced) and voicedProb are optional; omit them to treat every frame as voiced. Use the same hopLength that produced the F0 contour, so frame i lines up with sample i * hopLength.
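
Before passing a contour in, it can help to print where each frame starts and how far it sits from the target. The sketch below is plain Python for inspection only. It does not reproduce libsonare's correction algorithm, and the contour values are made up for illustration.

```python
import math

sample_rate = 48000
hop_length = 512
target_midi = 57  # A3

# Illustrative contour: one F0 (Hz) per frame, with a matching voiced flag.
f0_hz = [0.0, 218.0, 221.5, 219.0, 0.0]
voiced = [0, 1, 1, 1, 0]

rows = []
for i, (f0, is_voiced) in enumerate(zip(f0_hz, voiced)):
    start_sample = i * hop_length                 # frame i lines up with sample i * hop_length
    start_seconds = start_sample / sample_rate
    if is_voiced and f0 > 0:
        midi = 69.0 + 12.0 * math.log2(f0 / 440.0)
        cents_off = (midi - target_midi) * 100.0  # negative = flat, positive = sharp
    else:
        cents_off = None                          # unvoiced: nothing to retune
    rows.append((start_sample, start_seconds, cents_off))

for start_sample, start_seconds, cents_off in rows:
    label = "unvoiced" if cents_off is None else f"{cents_off:+.1f} cents"
    print(f"sample {start_sample:5d}  t={start_seconds:.4f}s  {label}")
```

If the printed offsets look a frame early or late against the audio, the hop length used here probably differs from the one that produced the contour.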
Constant vs contour-following correction
Use pitchCorrectToMidi(...) for a steady held note where one transpose is enough. Reach for pitchCorrectToMidiTimevarying(...) when the take has vibrato, slides, or drift you want to preserve while nudging it onto pitch.
MIDI note numbers
A MIDI note number represents pitch as a semitone index. Each whole number is one semitone, and fractional values are allowed.
Two anchors are worth memorizing:
| Note | MIDI note number | Frequency |
|---|---|---|
| C4 (middle C) | 60 | about 261.63 Hz |
| A4 | 69 | 440 Hz |
Every octave adds 12. For example, C4 is 60, so C5 is 72.
For the full mapping table and the freq ↔ midi formula, see Editing Basics.
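The two anchors and the octave rule both follow from the standard equal-temperament relation. Here is a small sketch, assuming A4 = 440 Hz. midi_to_hz is a hypothetical helper, not a libsonare function.

```python
def midi_to_hz(midi_note: float, a4_hz: float = 440.0) -> float:
    """Convert a (possibly fractional) MIDI note number to a frequency in Hz."""
    return a4_hz * 2.0 ** ((midi_note - 69.0) / 12.0)


print(round(midi_to_hz(69), 2))   # A4
print(round(midi_to_hz(60), 2))   # C4 (middle C)
print(round(midi_to_hz(72), 2))   # C5, one octave above C4
```

Adding 12 to the note number doubles the frequency, which is why C5 comes out at twice C4.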
Region selection for noteStretch(...)
noteStretch(...) takes the region in sample offsets, not seconds.
If your UI works in seconds, convert like this:
const onsetSample = Math.round(onsetSeconds * sampleRate);
const offsetSample = Math.round(offsetSeconds * sampleRate);
const heldNote = noteStretch(vocal, sampleRate, { onsetSample, offsetSample, stretchRatio: 1.25 });

stretchRatio is the length multiplier for the selected region.
| stretchRatio | Result |
|---|---|
| 1.25 | Make the region 25% longer |
| 1.0 | Keep the same length |
| 0.8 | Make the region 20% shorter |
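
The seconds-to-samples conversion and the resulting lengths can also be sketched in Python. This is arithmetic only, with hypothetical helper names. It assumes audio outside the region is left untouched, and an implementation may round or crossfade slightly differently.

```python
def seconds_to_samples(seconds: float, sample_rate: int) -> int:
    """Convert a time in seconds to the nearest sample offset."""
    return int(round(seconds * sample_rate))


def expected_length_after_note_stretch(total_samples: int, onset_sample: int,
                                       offset_sample: int, stretch_ratio: float) -> int:
    """Estimate the output length when only [onset_sample, offset_sample) is stretched."""
    if not 0 <= onset_sample < offset_sample <= total_samples:
        raise ValueError("region must lie inside the buffer with onset < offset")
    region = offset_sample - onset_sample
    stretched_region = int(round(region * stretch_ratio))
    return total_samples - region + stretched_region


sample_rate = 48000
onset = seconds_to_samples(0.25, sample_rate)
offset = seconds_to_samples(0.5, sample_rate)
print(onset, offset)
print(expected_length_after_note_stretch(48000, onset, offset, 1.25))
print(expected_length_after_note_stretch(48000, onset, offset, 0.8))
```

Because the output buffer changes length, any markers placed after the region need to be shifted by the same difference.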
Offline voiceChange(...) vs RealtimeVoiceChanger
voiceChange(...) is the simple offline helper for a decoded mono clip: pass semitone and formant values, get a processed buffer back.
RealtimeVoiceChanger is the stateful preset chain used for live or chunked voice processing. It combines retune, formant, EQ, gate, compressor, de-esser, reverb, and limiter stages. Factory preset IDs include neutral-monitor, bright-idol, soft-whisper, deep-narrator, robot-mascot, and dark-villain.
Use the realtime class when you process repeated blocks and need continuity across calls. In WASM, call prepare(...) and delete() yourself. In Python, use the context manager or close().
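Block-wise processing means slicing a long buffer into fixed-size chunks and feeding them, in order, to the same stateful object. The sketch below shows only that slicing and the latency arithmetic, in plain Python. process_block is a hypothetical stand-in for a stateful processor, and zero-padding the final block is an assumption rather than a documented libsonare requirement.

```python
from typing import Callable, List


def process_in_blocks(samples: List[float], block_size: int,
                      process_block: Callable[[List[float]], List[float]]) -> List[float]:
    """Feed samples to process_block in order, block_size samples at a time."""
    output: List[float] = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        valid = len(block)
        if valid < block_size:
            block = block + [0.0] * (block_size - valid)  # pad the final short block
        output.extend(process_block(block)[:valid])        # drop the padding again
    return output


def latency_ms(latency_samples: int, sample_rate: int) -> float:
    """Convert a latency reported in samples to milliseconds."""
    return 1000.0 * latency_samples / sample_rate


# Identity processor, used here only to show that ordering and length are preserved.
signal = [float(i) for i in range(300)]
result = process_in_blocks(signal, 128, lambda block: list(block))
print(len(result), result == signal)
print(round(latency_ms(128, 48000), 3))
```

A one-shot offline call has no such loop, which is why it does not need to carry DSP state between blocks.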
import { init, RealtimeVoiceChanger, realtimeVoiceChangerPresetNames } from '@libraz/libsonare';
await init();
const changer = new RealtimeVoiceChanger('bright-idol');
changer.prepare(48000, 128, 1);
try {
  const out = changer.processMono(inputBlock);
  changer.setConfig('soft-whisper');
  console.log(realtimeVoiceChangerPresetNames(), changer.latencySamples(), out);
} finally {
  changer.delete();
}

import libsonare as sonare
with sonare.RealtimeVoiceChanger(48000, preset="bright-idol", max_block_size=128) as changer:
    out = changer.process_mono(input_block)
    changer.set_config("soft-whisper")
    print(sonare.realtime_voice_changer_preset_names(), changer.latency_samples())

sonare voice-presets --json
sonare voice-change vocal.wav --preset soft-whisper -o rendered.wav

Using the Audio wrapper
The Audio wrapper exposes the same operations as instance methods. In file-based Python workflows, you can load the file once and then apply edits to the same Audio object.
Creative effect inserts
Beyond pitch and time transforms, two of the mixer/mastering insert processors are go-to voice and instrument color tools:
- effects.modulation.ensemble: a BBD-style string-machine ensemble (modeled on an analog bucket-brigade delay chorus) that thickens a thin source into a wide, chorused pad.
- saturation.ampSim: a guitar amp simulation that adds drive and speaker-cabinet character.
Load them as inserts on a strip (see Mixing Engine) rather than as standalone functions on a raw buffer.
Offline transforms vs arrange-time warp
The functions on this page are offline transforms: you hand them a buffer and get a new buffer back. They are different from arrange-time warp — clip repitch and tempo-sync inside a project, where a clip follows the timeline rather than being baked once. For that project-level workflow, see Project Editing.
Practical Notes
These APIs are intentionally lightweight editing tools, not a full non-destructive pitch editor. For transparent vocals, keep pitch correction intervals small and avoid extreme formant factors. For sound design, larger pitchSemitones and formantFactor moves are valid, but expect stronger artifacts.
Why big moves sound worse
These transforms work by analyzing the sound into short overlapping frames and re-spacing or re-pitching them. Small shifts stay close to the original frames and sound clean; large shifts force the engine to invent material that was never recorded, so you hear smearing, a "watery" or robotic quality, and a less natural voice. That is why the advice is to keep correction intervals small for transparent results.
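One way to picture "re-spacing" frames is classic overlap-add, where frames are read at one hop and written at another. The sketch below is a generic illustration of that idea, not a description of libsonare's engine. The frame and hop sizes are arbitrary, and the function name is hypothetical.

```python
def synthesis_overlap(frame_length: int, analysis_hop: int, rate: float) -> float:
    """Fraction of each frame that overlaps its neighbour after re-spacing.

    rate is a playback-speed multiplier: frames read every analysis_hop samples
    are written every analysis_hop / rate samples.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    synthesis_hop = analysis_hop / rate
    return 1.0 - synthesis_hop / frame_length


for rate in (1.0, 0.8, 0.5, 0.25):
    overlap = synthesis_overlap(frame_length=2048, analysis_hop=512, rate=rate)
    print(f"rate={rate}: overlap={overlap:.2f}")
```

In this toy model the overlap shrinks as the stretch grows, and at rate 0.25 neighbouring frames no longer overlap at all, so nothing recorded is left to crossfade between them.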