
Editing Basics

Editing here means changing the sound of one part: its pitch, its length, or the character of a voice.

It is different from analysis, which only measures a track and leaves it untouched. It is also different from mixing, which balances several parts together.

This page defines the concepts and terms you need before reading the Editing DSP guide.

Where editing sits

You usually edit individual parts (fix timing, tune a note, change a voice), then mix them together, then master the finished stereo file for delivery. Editing is the per-part repair-and-shape step. Conservative edits keep the part natural; aggressive edits are a creative effect.

Pitch and time are independent

Speed up a tape and the music gets both faster and higher — the two are coupled. The whole point of modern editing DSP is to separate them, so you can change one without dragging the other along:

  • Time stretch changes the duration without changing the pitch. Here rate is a playback-speed multiplier: a rate above 1.0 plays faster, so the clip gets shorter; below 1.0 plays slower, so it gets longer.
  • Pitch shift changes the pitch without changing the duration.
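The difference between tape, time stretch and pitch shift is easiest to see as bookkeeping. The sketch below uses hypothetical helpers (tapeSpeed, timeStretch and pitchShift are illustrative names, not libsonare functions). They only predict the resulting duration and pitch change of a clip and do not process any audio. The sketch assumes rate has the playback-speed meaning described above.

```typescript
// Hypothetical bookkeeping helpers (not a libsonare API). They predict what
// each kind of edit does to a clip's duration and pitch; no audio is touched.
interface EditResult {
  durationSeconds: number;
  pitchShiftSemitones: number;
}

// Tape-style speed change: duration and pitch are coupled. Playing `rate`
// times faster divides the length by `rate` and multiplies every frequency
// by `rate`, which is a shift of 12 * log2(rate) semitones.
function tapeSpeed(durationSeconds: number, rate: number): EditResult {
  return {
    durationSeconds: durationSeconds / rate,
    pitchShiftSemitones: 12 * Math.log2(rate),
  };
}

// Time stretch: same duration change as tape, pitch left alone.
function timeStretch(durationSeconds: number, rate: number): EditResult {
  return { durationSeconds: durationSeconds / rate, pitchShiftSemitones: 0 };
}

// Pitch shift: pitch moves by `semitones`, duration left alone.
function pitchShift(durationSeconds: number, semitones: number): EditResult {
  return { durationSeconds, pitchShiftSemitones: semitones };
}

const clip = 10; // a 10-second clip
console.log(tapeSpeed(clip, 2));     // 5 s long, +12 semitones (an octave up)
console.log(timeStretch(clip, 2));   // 5 s long, pitch unchanged
console.log(timeStretch(clip, 0.5)); // 20 s long, pitch unchanged
console.log(pitchShift(clip, 7));    // still 10 s long, +7 semitones
```

Only the tape case moves both numbers at once. Separating them is the job of the editing DSP.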

Doing this cleanly is hard: the algorithm has to invent or remove time while keeping each note's frequency, so very large moves always leave some artifacts (smearing, a "phasey" or robotic tone). Small moves stay transparent.

Time stretch: changing length, not pitch

Time stretching is the complement of pitch shift: it changes how long the audio lasts while leaving the pitch alone. On a drum loop, a lower rate spreads the hits further apart and a higher rate bunches them together, yet the spectrum barely moves. Below 1.0 the clip slows down and grows longer; above 1.0 it speeds up and gets shorter. The groove changes tempo with no chipmunk effect.

Semitones and cents

Pitch in music is measured in semitones, the smallest step on a piano keyboard. Twelve semitones make one octave, and an octave doubles the frequency: A4 = 440 Hz, A5 = 880 Hz.

So semitones = 12 shifts up one octave, -12 shifts down one octave, and 7 is a perfect fifth.
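Because every semitone multiplies frequency by the same factor, 2^(1/12), a shift in semitones converts to a frequency ratio with a single exponent. Here is a minimal sketch, assuming twelve-tone equal temperament. semitonesToRatio and ratioToSemitones are illustrative names, not library functions.

```typescript
// Frequency ratio for a pitch shift of `semitones` in 12-tone equal temperament.
// Each semitone multiplies frequency by 2^(1/12), so 12 semitones doubles it.
function semitonesToRatio(semitones: number): number {
  return 2 ** (semitones / 12);
}

// Inverse: how many semitones a frequency ratio corresponds to.
function ratioToSemitones(ratio: number): number {
  return 12 * Math.log2(ratio);
}

console.log(semitonesToRatio(12));           // 2   (one octave up)
console.log(semitonesToRatio(-12));          // 0.5 (one octave down)
console.log(semitonesToRatio(7).toFixed(4)); // 1.4983 (perfect fifth)
console.log(440 * semitonesToRatio(12));     // 880 (A4 -> A5)
```

The equal-tempered fifth (about 1.4983) is very slightly narrower than the pure 3:2 ratio.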

A cent is 1/100 of a semitone. It is used for fine tuning; most listeners start to notice pitch error past roughly ±5-10 cents.
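Cents work the same way at a finer scale. An octave is 1200 cents, so the distance between two frequencies is 1200 · log2(f1 / f2). The helpers below are illustrative, not library functions.

```typescript
// One cent is 1/100 of a semitone, so one octave (ratio 2) is 1200 cents.
// Signed distance in cents from `reference` to `measured` (positive = sharp).
function centsBetween(measured: number, reference: number): number {
  return 1200 * Math.log2(measured / reference);
}

// Frequency ratio for a detune of `cents`.
function centsToRatio(cents: number): number {
  return 2 ** (cents / 1200);
}

// A 442 Hz note against a 440 Hz reference is a little under +8 cents sharp:
console.log(centsBetween(442, 440).toFixed(2)); // 7.85
// Detune 440 Hz down by 10 cents (a barely noticeable error for most listeners):
console.log(440 * centsToRatio(-10));
```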

MIDI note numbers

When a function asks for a MIDI note number (such as pitchCorrectToMidi's currentMidi and targetMidi), it wants a number on a semitone scale: each whole number is one semitone, and fractional values are allowed. Two anchors are worth memorizing:

  • A4 = 69 = 440 Hz — the tuning reference.
  • C4 = 60 — "middle C".

Every octave adds 12, so C5 = 72, C3 = 48. A few common targets:

| Note | MIDI | Frequency |
|---|---|---|
| C3 | 48 | 130.81 Hz |
| C4 (middle C) | 60 | 261.63 Hz |
| E4 | 64 | 329.63 Hz |
| G4 | 67 | 392.00 Hz |
| A4 (tuning reference) | 69 | 440.00 Hz |
| C5 | 72 | 523.25 Hz |
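The "every octave adds 12" rule means a note name can be turned into a MIDI number with plain arithmetic. The following is a hypothetical helper, not a libsonare function. It assumes the C4 = 60 convention used in the table and accepts only a single sharp (#) or flat (b).

```typescript
// Hypothetical helper: note name such as "C4", "F#3" or "Bb2" -> MIDI number.
// Semitone offset of each natural note above C within one octave.
const PITCH_CLASS: Record<string, number> = {
  C: 0, D: 2, E: 4, F: 5, G: 7, A: 9, B: 11,
};

function noteNameToMidi(name: string): number {
  const match = /^([A-G])([#b]?)(-?\d+)$/.exec(name);
  if (!match) throw new Error(`Unrecognised note name: ${name}`);
  const [, letter, accidental, octaveText] = match;
  const accidentalOffset = accidental === "#" ? 1 : accidental === "b" ? -1 : 0;
  // MIDI 0 is C-1, so octave n starts at 12 * (n + 1); hence C4 = 60.
  return 12 * (Number(octaveText) + 1) + PITCH_CLASS[letter] + accidentalOffset;
}

console.log(noteNameToMidi("C4"));  // 60 (middle C)
console.log(noteNameToMidi("A4"));  // 69 (tuning reference)
console.log(noteNameToMidi("F#3")); // 54
```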

To convert by hand: midi = 69 + 12 · log2(freq / 440) and freq = 440 · 2^((midi − 69) / 12). A fractional MIDI value like 68.7 is simply a pitch a little below A4 — useful because a singer is rarely exactly on a note, and you can pass the measured pitch as the current value and a whole number as the target.
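The two conversion formulas, plus the "measured current, whole-number target" idea, fit in a few lines. freqToMidi, midiToFreq and nearestNoteCorrection are illustrative names, not part of the library. The correction helper only shows which numbers you would supply as currentMidi and targetMidi; it does not show how pitchCorrectToMidi itself is called.

```typescript
const A4_MIDI = 69;
const A4_HZ = 440;

// midi = 69 + 12 * log2(freq / 440)
function freqToMidi(freqHz: number): number {
  return A4_MIDI + 12 * Math.log2(freqHz / A4_HZ);
}

// freq = 440 * 2^((midi - 69) / 12)
function midiToFreq(midi: number): number {
  return A4_HZ * 2 ** ((midi - A4_MIDI) / 12);
}

// Illustration only: snap a measured pitch to the nearest whole note and
// report how far (in cents) the correction has to move it.
function nearestNoteCorrection(measuredHz: number) {
  const currentMidi = freqToMidi(measuredHz); // fractional, e.g. 68.7
  const targetMidi = Math.round(currentMidi); // whole note, e.g. 69
  return { currentMidi, targetMidi, correctionCents: (targetMidi - currentMidi) * 100 };
}

console.log(midiToFreq(60).toFixed(2));     // 261.63 (middle C)
console.log(freqToMidi(523.25).toFixed(2)); // 72.00 (C5)
console.log(midiToFreq(68.7).toFixed(2));   // 432.44 (a little below A4)
console.log(nearestNoteCorrection(midiToFreq(68.7))); // target 69, about +30 cents
```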

Formant: pitch's independent partner

A formant is a peak of acoustic energy in a particular frequency region, set by the resonating body rather than by the note being sung or played. In a voice, formants come from resonances of the vocal tract. In instruments, similar resonances come from the body of the instrument.

Formants shape vowels and the perceived size or character of a voice. They are different from pitch: a singer can sing different notes while the vowel character still comes from the same vocal-tract resonances.
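A toy source-filter model shows why the two are independent. This sketch is a deliberate simplification and is not how voiceChange works internally. A voice is modelled as harmonics at whole multiples of the fundamental, and each harmonic's loudness is read off a fixed resonance curve standing in for the vocal tract. tractGain and loudestHarmonic are hypothetical names.

```typescript
// Toy "vocal tract": one resonance whose gain peaks at formantHz and falls
// off with a bell-shaped curve of width bandwidthHz. (Real voices have several
// formants, and the source's own spectral tilt is ignored here.)
function tractGain(freqHz: number, formantHz: number, bandwidthHz: number): number {
  const d = (freqHz - formantHz) / bandwidthHz;
  return Math.exp(-0.5 * d * d);
}

// Toy "source": harmonics at f0, 2*f0, 3*f0, ... up to maxHz.
// Returns the frequency of the loudest harmonic after the tract filters it.
function loudestHarmonic(f0Hz: number, formantHz: number, bandwidthHz: number, maxHz = 4000): number {
  let bestFreq = f0Hz;
  let bestGain = -Infinity;
  for (let k = 1; k * f0Hz <= maxHz; k++) {
    const freq = k * f0Hz;
    const gain = tractGain(freq, formantHz, bandwidthHz);
    if (gain > bestGain) {
      bestGain = gain;
      bestFreq = freq;
    }
  }
  return bestFreq;
}

// Same "vowel" (formant at 800 Hz), two different sung notes:
console.log(loudestHarmonic(200, 800, 150)); // 800 (4th harmonic)
console.log(loudestHarmonic(250, 800, 150)); // 750 (3rd harmonic)
```

The note changes which harmonic is loudest, but the energy peak stays near 800 Hz, the formant. Moving that peak is a formant change; moving f0 is a pitch change.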

This is why voice tools keep pitch and formant on separate controls.

| Control | What changes | What you hear |
|---|---|---|
| Pitch | The musical note | The same voice singing higher or lower |
| Formant | The vocal character | A smaller/brighter or larger/darker voice at the same note |

Moving both together too aggressively creates the familiar artificial "chipmunk" effect.

Samples vs seconds

Editing functions that act on a region (like note stretch) usually take sample offsets, not seconds, because sample positions are exact and never drift. Convert with samples = round(seconds × sampleRate) — at 48 kHz, 0.25 s is sample 12000. See Audio Basics for what a sample and sample rate are.
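Here is a minimal sketch of the seconds-to-samples conversion. regionToSamples and its { start, end } result are hypothetical; the parameters noteStretch actually takes are documented in the Editing DSP guide.

```typescript
// samples = round(seconds * sampleRate). Rounding (rather than truncating)
// guards against tiny floating-point errors landing one sample early.
function secondsToSamples(seconds: number, sampleRate: number): number {
  return Math.round(seconds * sampleRate);
}

function samplesToSeconds(samples: number, sampleRate: number): number {
  return samples / sampleRate;
}

// Hypothetical helper: turn a region given in seconds into sample offsets
// before handing it to a region-based edit.
function regionToSamples(startSec: number, endSec: number, sampleRate: number) {
  return {
    start: secondsToSamples(startSec, sampleRate),
    end: secondsToSamples(endSec, sampleRate),
  };
}

console.log(secondsToSamples(0.25, 48000));     // 12000
console.log(samplesToSeconds(12000, 48000));    // 0.25
console.log(regionToSamples(1.5, 1.75, 44100)); // start 66150, end 77175
```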

How libsonare implements editing DSP

The implementation uses the same C++ DSP core as analysis and mastering, but these functions rewrite the signal instead of measuring it.

| Function family | Implementation idea | Beginner takeaway |
|---|---|---|
| Time stretch / pitch shift | Phase vocoder plus resampling | Duration and pitch can be changed separately, but large moves create artifacts |
| voiceChange | Shift the spectral envelope separately from harmonic pitch | Voice character can move without changing the target note |
| pitchCorrectToMidi | Move from a caller-supplied current MIDI note to a target MIDI note | Estimate the current pitch first with pitchYin, pitchPyin, or your own detector |
| noteStretch | Process an exact sample-offset region | Convert seconds to samples before calling it |
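"Phase vocoder plus resampling" usually refers to a common recipe. To raise pitch by a ratio r without changing the length, first time-stretch the audio to r times its length at the same pitch, then resample it back to the original length, which multiplies every frequency by r. The sketch below assumes that recipe; libsonare's internals may differ. The phase vocoder itself is too long to show here, so the code covers only the bookkeeping and a naive resampler. resampleLinear and pitchShiftPlan are hypothetical names, and a production resampler would also low-pass filter to avoid aliasing.

```typescript
// Naive linear-interpolation resampler. `factor` is how many input samples to
// advance per output sample: factor > 1 shortens the signal and raises every
// frequency by `factor` (the tape effect); factor < 1 does the reverse.
// No anti-aliasing filter is applied, so this is for illustration only.
function resampleLinear(input: number[], factor: number): number[] {
  if (input.length === 0) return [];
  const outLength = Math.floor((input.length - 1) / factor) + 1;
  const out: number[] = new Array(outLength);
  for (let j = 0; j < outLength; j++) {
    const pos = j * factor;
    const i = Math.floor(pos);
    const frac = pos - i;
    const next = Math.min(i + 1, input.length - 1);
    out[j] = input[i] * (1 - frac) + input[next] * frac;
  }
  return out;
}

// Bookkeeping for "pitch shift = time stretch, then resample".
// Stretching at rate 1/r makes the audio r times longer at the same pitch;
// resampling by r then shortens it back and multiplies frequencies by r.
function pitchShiftPlan(semitones: number) {
  const ratio = 2 ** (semitones / 12);
  return { stretchRate: 1 / ratio, resampleFactor: ratio };
}

console.log(pitchShiftPlan(12)); // stretchRate 0.5, resampleFactor 2 (one octave up)
console.log(resampleLinear([0, 1, 2, 3, 4], 2)); // [0, 2, 4]: half as long
```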

Related: Editing DSP, Audio Basics, MIR Overview, JavaScript API