
Python API

libsonare provides Python bindings using ctypes over the native C API for high-performance audio analysis on desktop platforms. PyPI wheels are available for supported Linux and macOS targets.

What is ctypes (and the C API)?

libsonare's core is compiled C/C++. ctypes is Python's built-in way to call functions in a compiled shared library (.so/.dylib) directly, with no extra C extension to build. The Python package is a thin wrapper that forwards your calls to the same native code the C++ library runs, so you get native speed from plain Python. ("C API" just means the flat set of C functions that wrapper targets.)

Use this page when you want scripts, notebooks, batch analysis, or local tools that can read files directly. The Python package is usually the easiest route if you are not building a browser UI.

Python Mental Model

| Step | What happens |
| --- | --- |
| 1. Load audio | Audio.from_file(...) reads supported file formats into samples |
| 2. Inspect or process | Call detect_bpm, analyze, feature functions, editing DSP, mastering, or mixing APIs |
| 3. Use results | Print values, save JSON, render audio, or feed features into your own pipeline |

Most Python APIs accept raw sample arrays plus sample_rate. The Audio wrapper is a convenience for file-based workflows.

Start with Audio or call functions directly

If your workflow begins with an audio file, start with Audio.from_file(...). If you already have samples from NumPy or another loader, call module-level functions such as detect_bpm(samples, sample_rate) directly.

How To Read This Reference

Read this page in three passes:

  1. If you are loading files, start with Audio.from_file(...); if you already have samples, call the module-level functions directly.
  2. Use Pick The Smallest API That Solves The Job to choose a function family instead of scanning the full reference.
  3. Return to Types only when you need exact attribute names, row-major matrix shapes, or JS parity aliases.

A single analyze(...) call returns the complete result — chords, sections, timbre, dynamics, rhythm, melody, form, and per-beat strength — matching the other bindings. Reach for the focused functions below when you only need one field or want per-call options.

Default sample rate varies by family

Music-analysis and metering helpers default to sample_rate=22050; room-acoustic helpers (analyze_impulse_response, detect_acoustic, estimate_room) default to 48000. When you load with Audio.from_file(...), always pass audio.sample_rate so the per-family default never silently applies to audio recorded at a different rate.
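
To see why the rate matters, here is a minimal sketch in plain arithmetic. It is not libsonare code, and `bpm_from_beat_spacing` is a hypothetical helper. It shows how a tempo derived from a fixed beat spacing changes when the assumed sample rate differs from the recorded one.

```python
def bpm_from_beat_spacing(samples_per_beat: int, sample_rate: int) -> float:
    """Tempo implied by a fixed beat spacing, given the assumed sample rate."""
    seconds_per_beat = samples_per_beat / sample_rate
    return 60.0 / seconds_per_beat


# Beats spaced 22050 samples apart in audio that was recorded at 44100 Hz.
true_bpm = bpm_from_beat_spacing(22050, 44100)   # correct rate passed explicitly
wrong_bpm = bpm_from_beat_spacing(22050, 22050)  # a 22050 default applied by mistake

print(true_bpm, wrong_bpm)
```

Any time-based result (beat times, durations, frequencies) is scaled by the same ratio, which is why passing audio.sample_rate explicitly is the safer habit.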

Pick The Smallest API That Solves The Job

| You need | Start with | Why |
| --- | --- | --- |
| A script that reads files and prints metadata | Audio.from_file(...) + detect_bpm / detect_key / analyze | Python handles decoding and keeps the code short |
| Detailed music analysis | analyze_bpm, detect_chords, analyze_sections, analyze_timbre, analyze_dynamics, analyze_rhythm | These run a single facet of analysis with extra parameters; analyze(...) already returns all of these fields in one AnalysisResult |
| Feature arrays for notebooks or ML | mel_spectrogram, mfcc, chroma, cqt, vqt, nnls_chroma | Returns plain Python lists / result objects that can be converted to NumPy if desired |
| Editing a clip | time_stretch, pitch_shift, pitch_correct_to_midi, note_stretch, voice_change, RealtimeVoiceChanger | These transform the signal itself |
| Mastering a file | master_audio, mastering_chain, StreamingMasteringChain | Presets first, explicit chain config when you need control |
| Live or chunked analysis | StreamAnalyzer | Feed audio blocks, drain feature frames, and read progressive BPM/key/chord estimates |
| Stem mixing | mix_stereo or Mixer.from_scene_json(...) | One-shot arrays first; scene mixer for sends, buses, automation, and meters |
| Room decay, clarity, equivalent-room estimates, or generated room character | analyze_impulse_response, detect_acoustic, estimate_room, synthesize_rir, room_morph | These describe or apply the room, not the song |

Installation

Requires Python 3.11 or later (3.11, 3.12, 3.13).

bash
pip install libsonare

This also installs the sonare CLI command. See CLI Reference for details.

Default PyPI wheels decode WAV and MP3. Use libsonare.has_ffmpeg_support() to check the loaded build. If you need direct M4A/AAC/FLAC/OGG/Opus decoding, install from source with FFmpeg enabled:

bash
SONARE_FFMPEG=1 pip install libsonare --no-binary libsonare

FFmpeg-enabled builds require FFmpeg development libraries. On macOS, install them with brew install ffmpeg. On Debian/Ubuntu, install libavformat-dev libavcodec-dev libavutil-dev libswresample-dev.

Building from Source (alternative)

If pre-built wheels are not available for your platform, you can build from source:

bash
git clone https://github.com/libraz/libsonare.git
cd libsonare
cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED=ON
cmake --build build -j

cd bindings/python
pip install -e .

Requirements for building from source:

  • Python 3.11+
  • CMake 3.16+
  • C++17 compiler (GCC or Clang on the supported Linux/macOS targets)
  • Optional FFmpeg development libraries when building with SONARE_FFMPEG=1

Quick Start

python
from libsonare import Audio, analyze, detect_bpm, detect_key, detect_beats

# Load audio from file
audio = Audio.from_file("music.mp3")

# Individual analysis
bpm = detect_bpm(audio.data, audio.sample_rate)
key = detect_key(audio.data, audio.sample_rate)
beats = detect_beats(audio.data, audio.sample_rate)

# Full analysis
result = analyze(audio.data, audio.sample_rate)
print(f"BPM: {result.bpm} ({result.bpm_confidence:.0%})")
print(f"Key: {result.key}")
print(f"Time Signature: {result.time_signature}")
print(f"Beats: {len(result.beat_times)} detected")

Error handling

Errors raise SonareError, a RuntimeError subclass carrying the native error code in its .code attribute, so except RuntimeError: continues to work while except sonare.SonareError as e: gives you the code. The codes are the same C-ABI values the JS bindings expose as ErrorCode (see Error Handling), and the CLI maps them onto its exit codes.
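
The handler ordering described above can be sketched with a stand-in exception. `SonareErrorSketch`, `load_or_fail`, and the code value 2 are all hypothetical and exist only to show the pattern; the real class and codes come from libsonare.

```python
class SonareErrorSketch(RuntimeError):
    """Hypothetical stand-in: a RuntimeError subclass that carries a numeric code."""

    def __init__(self, message: str, code: int) -> None:
        super().__init__(message)
        self.code = code


def load_or_fail(path: str) -> None:
    # Placeholder failure; the code value 2 is made up for illustration.
    raise SonareErrorSketch(f"cannot decode {path}", code=2)


def describe_failure(path: str) -> str:
    try:
        load_or_fail(path)
    except SonareErrorSketch as exc:  # specific handler can read .code
        return f"code={exc.code}: {exc}"
    except RuntimeError as exc:       # broader handler for anything else
        return f"runtime error: {exc}"
    return "ok"


# Code written against RuntimeError alone still catches the subclass.
caught_by_base = False
try:
    load_or_fail("missing.wav")
except RuntimeError:
    caught_by_base = True

print(describe_failure("missing.wav"), caught_by_base)
```

Put the specific handler first: Python uses the first matching except clause, so a leading except RuntimeError would hide the code-aware branch.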

Audio Effects

python
from libsonare import Audio

audio = Audio.from_file("music.mp3")

# Harmonic-Percussive Source Separation
hpss_result = audio.hpss()
harmonic = audio.harmonic()
percussive = audio.percussive()

# Time stretch / pitch shift
stretched = audio.time_stretch(rate=1.5)       # 1.5x speed
shifted = audio.pitch_shift(semitones=2.0)     # Up 2 semitones

# Normalize and trim silence
normalized = audio.normalize(target_db=-3.0)
trimmed = audio.trim(threshold_db=-60.0)

# Resample
resampled = audio.resample(target_sr=44100)

For region-based time/frequency edits, use spectral_edit(samples, sample_rate, [SpectralRegionOp(...)]); see Spectral Editing.

Feature Extraction

python
from libsonare import Audio

audio = Audio.from_file("music.mp3")

# Spectrogram features
stft_result = audio.stft(n_fft=2048, hop_length=512)
mel = audio.mel_spectrogram(n_fft=2048, hop_length=512, n_mels=128)
mfcc = audio.mfcc(n_fft=2048, hop_length=512, n_mels=128, n_mfcc=20)
chroma = audio.chroma(n_fft=2048, hop_length=512)

# Spectral features
centroid = audio.spectral_centroid()
bandwidth = audio.spectral_bandwidth()
rolloff = audio.spectral_rolloff(roll_percent=0.85)
flatness = audio.spectral_flatness()
zcr = audio.zero_crossing_rate()
rms = audio.rms_energy()

# Pitch detection
pitch_yin = audio.pitch_yin(fmin=65.0, fmax=2093.0)
pitch_pyin = audio.pitch_pyin(fmin=65.0, fmax=2093.0)
print(f"Median F0: {pitch_pyin.median_f0:.1f} Hz")

[Figure: Mel spectrogram, frequency the way we hear it]

The same STFT, re-mapped onto the mel scale: fine resolution low down where the ear discriminates, coarser up high. The harmonic stack and formant bands of this vowel-like tone sit closer together than on a linear axis, which is the view our ears (and most ML front-ends) actually use.

Pitch and feature terms: YIN/pYIN, zero-crossing rate, MIDI note
  • YIN / pYIN — algorithms that estimate the fundamental frequency (the perceived pitch) of monophonic audio. YIN uses autocorrelation; pYIN adds probabilistic smoothing so the pitch line stays steadier over time. Both track one note at a time, not chords.
  • Zero-crossing rate (ZCR) — how often the waveform crosses zero per frame. High ZCR means noisy or high-frequency content (cymbals, fricatives); low ZCR means smooth, tonal sound.
  • MIDI note number — an integer naming a pitch: A4 = 69, middle C = 60, each semitone ±1. hz_to_midi / midi_to_hz (below) convert between Hz and this scale.
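
The zero-crossing idea can be illustrated with a minimal pure-Python sketch. `zero_crossing_rate_sketch` is a hypothetical helper, not the library function; the library's framing and normalization may differ.

```python
import math


def zero_crossing_rate_sketch(frame: list[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(frame) - 1)


sr = 8000
n = 1024
low_tone = [math.sin(2 * math.pi * 100 * i / sr) for i in range(n)]    # 100 Hz
high_tone = [math.sin(2 * math.pi * 3000 * i / sr) for i in range(n)]  # 3 kHz

low_zcr = zero_crossing_rate_sketch(low_tone)
high_zcr = zero_crossing_rate_sketch(high_tone)
print(low_zcr, high_zcr)
```

The 3 kHz tone changes sign far more often per sample than the 100 Hz tone, which is the behaviour the ZCR bullet describes.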

Unit Conversions

python
from libsonare import hz_to_mel, mel_to_hz, hz_to_midi, midi_to_hz
from libsonare import hz_to_note, note_to_hz, frames_to_time, time_to_frames

hz_to_mel(440.0)       # → Mel scale value
mel_to_hz(549.64)      # → Hz
hz_to_midi(440.0)      # → 69.0
midi_to_hz(69.0)       # → 440.0
hz_to_note(440.0)      # → "A4"
note_to_hz("A4")       # → 440.0

frames_to_time(100, sr=22050, hop_length=512)  # → seconds
time_to_frames(2.32, sr=22050, hop_length=512) # → frame index
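
For reference, these conversions follow standard formulas, sketched below in plain Python. The `*_sketch` names are hypothetical. The mel formula shown is the HTK variant, picked only because it reproduces the 549.64 value used above; whether libsonare's default matches it is an assumption.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi_sketch(hz: float) -> float:
    # 12 semitones per octave, anchored at A4 = 440 Hz = MIDI 69.
    return 69.0 + 12.0 * math.log2(hz / 440.0)


def midi_to_hz_sketch(midi: float) -> float:
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def hz_to_note_sketch(hz: float) -> str:
    midi = round(hz_to_midi_sketch(hz))
    # MIDI 60 is C4, so the octave number is midi // 12 - 1.
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def hz_to_mel_htk_sketch(hz: float) -> float:
    # HTK mel formula (assumed here, see the note above).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def frames_to_time_sketch(frames: int, sr: int, hop_length: int) -> float:
    # Each frame advances by hop_length samples.
    return frames * hop_length / sr


print(hz_to_note_sketch(440.0), frames_to_time_sketch(100, 22050, 512))
```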

API Reference

Audio

| Method | Description |
| --- | --- |
| Audio.from_file(path) | Load WAV/MP3 from disk; also FFmpeg-supported formats when the library is built with FFmpeg |
| Audio.from_buffer(data, sample_rate) | Create from float samples |
| Audio.from_memory(data) | Decode encoded audio bytes with the same format support as from_file |
| audio.data | Raw float samples |
| audio.sample_rate | Sample rate (Hz) |
| audio.duration | Duration (seconds) |
| audio.length | Number of samples |
| audio.close() | Free native memory |

The Python Audio object is broader than the WASM convenience wrapper.

It includes the common feature, editing, loudness, mastering, and resampling methods. It also adds focused analysis methods such as analyze_bpm(...), analyze_impulse_response(...), detect_acoustic(...), analyze_rhythm(...), analyze_dynamics(...), analyze_timbre(...), and positional detect_chords(...).

Audio also works as a context manager, which frees the native memory automatically when the block exits:

python
with Audio.from_file("music.mp3") as audio:
    result = audio.analyze()

Analysis Functions

| Function | Return Type | Description |
| --- | --- | --- |
| detect_bpm(samples, sample_rate) | float | Tempo in BPM |
| detect_key(samples, sample_rate) | Key | Root, mode, confidence |
| detect_beats(samples, sample_rate) | list[float] | Beat timestamps (seconds) |
| detect_onsets(samples, sample_rate) | list[float] | Onset timestamps (seconds) |
| detect_downbeats(samples, sample_rate) | list[float] | Downbeat timestamps (seconds) |
| detect_key_candidates(samples, sample_rate, ...) | list[KeyCandidate] | Ranked key candidates with correlation |
| detect_chords(samples, sample_rate, ...) | ChordAnalysisResult | Chord segments over time |
| analyze(samples, sample_rate) | AnalysisResult | Full analysis: BPM, key, time signature, beats, chords, sections, timbre, dynamics, rhythm, melody, form |
| analyze_with_progress(samples, sample_rate, on_progress?) | AnalysisResult | Same result as analyze, with an optional (progress, stage) callback |
| analyze_bpm(samples, sample_rate, ...) | BpmAnalysisResult | BPM with top candidates |
| chord_functional_analysis(samples, key_root, key_mode?, ...) | list[str] | Roman-numeral labels ("I", "IV", "V", "vi", ...) for detected chords, relative to a key |
| analyze_rhythm(samples, sample_rate, ...) | RhythmResult | Syncopation, groove type, regularity |
| analyze_dynamics(samples, sample_rate, ...) | DynamicsResult | Dynamic range, loudness range, crest factor |
| analyze_timbre(samples, sample_rate, ...) | TimbreResult | Brightness, warmth, density, roughness, complexity, plus per-window timbre_over_time (timbreOverTime alias) |
| analyze_sections(samples, sample_rate, ...) | SectionResult | Song-structure sections (intro/verse/chorus/...) |
| analyze_melody(samples, sample_rate, ...) | MelodyResult | Monophonic melody contour (YIN) |
| analyze_impulse_response(samples, sample_rate, ...) | AcousticResult | Room acoustics from an impulse response (RT60/EDT/C50/C80) |
| detect_acoustic(samples, sample_rate, ...) | AcousticResult | Blind room-acoustic estimation |
| estimate_room(samples, sample_rate, ...) | RoomEstimate | Equivalent-room estimate with volume, dimensions, DRR, absorption bands, RT60 bands, and confidence |
| synthesize_rir(length_m, width_m, height_m, ...) | RirResult | Mono room impulse response from shoebox geometry |
| room_morph(samples, sample_rate, length_m, width_m, height_m, ...) | list[float] | Offline creative morph toward a target room |
| version() | str | Library version |
| voice_changer_abi_version() | int | ABI version of the realtime voice-changer POD config; separate from preset JSON schemaVersion |
| voice_character_preset_id(preset) | str \| None | Canonical voice-character preset ID for an integer ordinal |
| realtime_voice_changer_preset_config(preset) | RealtimeVoiceChangerConfig | Resolved flat POD config for a built-in voice preset, without JSON parsing |
| engine_abi_version() | int | ABI version of the realtime engine interface |
| project_abi_version() | int | ABI version of the project/editing surface used by Project serialization, bounce, and realtime clip exchange |
| has_ffmpeg_support() | bool | Whether the loaded native library can decode via FFmpeg |

Most core analysis, effects, feature, loudness, and mastering helpers are also available as Audio instance methods (e.g., audio.detect_bpm()). Some focused helpers such as analyze_sections(...), analyze_melody(...), cqt(...), and vqt(...) remain standalone functions; pass audio.data and audio.sample_rate to those.

In Python, analyze(...) calls sonare_analyze_json and returns the full AnalysisResult: BPM (with confidence), key, time signature, beat times and per-beat strengths, chords, sections, timbre, dynamics, rhythm, melody, and form. The focused functions above remain useful when you want a single facet, parameterized/targeted analysis, or to avoid recomputing the whole result. (Acoustic/room metrics are separate — see estimate_room and the room helpers; they are not part of AnalysisResult.)

python
import libsonare as sonare

audio = sonare.Audio.from_file("music.mp3")

keys = sonare.detect_key_candidates(
    audio.data,
    audio.sample_rate,
    modes=["major", "minor"],
    profile="krumhansl",
)

chords = sonare.detect_chords(
    audio.data,
    audio.sample_rate,
    use_hmm=True,
    use_key_context=True,
    key_root=keys[0].key.root,
    key_mode=keys[0].key.mode,
    chroma_method="nnls",
)

sections = sonare.analyze_sections(audio.data, audio.sample_rate)

For long files, analyze_with_progress(...) returns the same AnalysisResult as analyze(...) but accepts an on_progress=(progress, stage) callback, mirroring the mastering progress callbacks below:

python
def on_step(progress: float, stage: str) -> None:
    print(f"{progress:5.1%}  {stage}")

result = sonare.analyze_with_progress(audio.data, audio.sample_rate, on_progress=on_step)

To label chords with Roman numerals relative to a key, use chord_functional_analysis(...). It detects chords with the same algorithm as detect_chords(...), then returns one label per detected chord, in chord order:

python
labels = sonare.chord_functional_analysis(
    audio.data,
    key_root=keys[0].key.root,
    key_mode=keys[0].key.mode,
    sample_rate=audio.sample_rate,
    use_key_context=True,
)
print(labels)  # e.g. ['I', 'V', 'vi', 'IV']

Room Acoustics

Use these functions for the room or playback space, not for song structure.

| Goal | Use |
| --- | --- |
| Measure a clean impulse response | analyze_impulse_response(...) |
| Estimate room decay from ordinary audio | detect_acoustic(...) |
| Fit a practical room model from audio | estimate_room(...) |
| Create a mono room impulse response from dimensions | synthesize_rir(...) |
| Add a target-room character as an effect | room_morph(...) |

Defaults and terms

analyze_impulse_response(...) and detect_acoustic(...) return AcousticResult with RT60, EDT, C50, C80, D50, per-band arrays, confidence, and is_blind. Their sample_rate default is 48000, unlike most music-analysis helpers that default to 22050. RIR means room impulse response.

python
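# ir_samples and room_recording are float sample arrays you supply; sample_rate is their rate in Hz.
ir = sonare.analyze_impulse_response(ir_samples, sample_rate, n_octave_bands=6)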
print(ir.rt60, ir.edt, ir.c50, ir.c80, ir.confidence)

blind = sonare.detect_acoustic(
    room_recording,
    sample_rate,
    n_octave_bands=6,
    n_third_octave_subbands=24,
    min_decay_db=30.0,
    noise_floor_margin_db=10.0,
)
print(blind.is_blind, blind.rt60_bands)

estimate = sonare.estimate_room(room_recording, sample_rate, n_octave_bands=6)
print(estimate.volume, estimate.length, estimate.width, estimate.height)
print(estimate.drr_db, estimate.confidence, estimate.absorption_bands)

rir = sonare.synthesize_rir(7.0, 5.0, 3.0, absorption=0.2, sample_rate=sample_rate)
print(rir.sample_rate, len(rir.rir), rir.has_error)

morphed = sonare.room_morph(room_recording, sample_rate, 12.0, 9.0, 4.0, wet=0.6)

Keep three cautions in mind:

  • estimate_room(...) returns an equivalent room, not guaranteed real geometry; inspect confidence.
  • synthesize_rir(...) reports invalid source/listener placement through has_error.
  • room_morph(...) is a creative effect, not dereverberation.

See Room Acoustics for interpretation notes and when a blind estimate is appropriate.
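
As background for the C50 and C80 values, clarity is conventionally the ratio of early to late energy in an impulse response, expressed in dB. The sketch below applies that textbook definition to a synthetic exponential decay. `clarity_db` is a hypothetical helper, and libsonare's own estimator may window, band-filter, or handle noise differently.

```python
import math


def clarity_db(ir: list[float], sample_rate: int, split_ms: float = 50.0) -> float:
    """Early-to-late energy ratio in dB (C50 when split_ms is 50, C80 when 80)."""
    split = int(round(sample_rate * split_ms / 1000.0))
    early = sum(x * x for x in ir[:split])
    late = sum(x * x for x in ir[split:])
    return 10.0 * math.log10(early / late)


# Synthetic impulse response: exponential decay with a 0.1 s amplitude time constant.
sr = 8000
tau = 0.1
ir = [math.exp(-n / (tau * sr)) for n in range(2 * sr)]

c50 = clarity_db(ir, sr)
c80 = clarity_db(ir, sr, split_ms=80.0)
print(round(c50, 2), round(c80, 2))
```

A longer early window always captures more of the energy, so C80 is at least as large as C50 for the same response.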

Effects Functions

| Function | Return Type | Description |
| --- | --- | --- |
| hpss(samples, sample_rate, kernel_harmonic?, kernel_percussive?) | HpssResult | Harmonic-Percussive Source Separation |
| harmonic(samples, sample_rate) | list[float] | Extract harmonic component |
| percussive(samples, sample_rate) | list[float] | Extract percussive component |
| time_stretch(samples, sample_rate, rate) | list[float] | Time-stretch without pitch change |
| pitch_shift(samples, sample_rate, semitones) | list[float] | Pitch-shift without tempo change |
| pitch_correct_to_midi(samples, sample_rate, current_midi?, target_midi?) | list[float] | Pitch-correct toward a target MIDI note |
| pitch_correct_to_midi_timevarying(samples, f0_hz, target_midi, sample_rate?, hop_length?, voiced?, voiced_prob?) | list[float] | Contour-following pitch correction: retunes every voiced frame toward target_midi along a per-frame f0_hz contour, preserving vibrato/drift instead of flattening it |
| note_stretch(samples, sample_rate, onset_sample?, offset_sample?, stretch_ratio?) | list[float] | Stretch a single note region in place |
| voice_change(samples, sample_rate, pitch_semitones?, formant_factor?) | list[float] | Independent pitch + formant shift |
| voice_change_realtime(samples, sample_rate?, preset?, channels?) | np.ndarray | One-shot render through the realtime voice preset chain |
| normalize(samples, sample_rate, target_db?) | list[float] | Normalize to target dB (default: 0.0) |
| trim(samples, sample_rate, threshold_db?) | list[float] | Trim silence (default: -60.0 dB) |
| resample(samples, src_sr, target_sr) | list[float] | Resample to target sample rate |

trim(...) is the simple threshold-based edit helper. The librosa-compatible trim_silence(...) helper below uses frame RMS and top_db, and returns the trimmed audio together with its original sample range.

Realtime voice changer

RealtimeVoiceChanger wraps the same preset-driven live voice chain exposed by WASM and Node native. It keeps retune, formant, EQ, gate, compressor, de-esser, reverb, and limiter state across blocks. Use it instead of voice_change(...) when processing microphone or stream blocks.

python
import json
import libsonare as sonare

print(sonare.realtime_voice_changer_preset_names())
print(sonare.voice_changer_abi_version())  # native POD-config ABI version
print(sonare.voice_character_preset_id(1))  # "bright-idol"
preset_json = sonare.realtime_voice_changer_preset_json("bright-idol")
print(sonare.validate_realtime_voice_changer_preset_json(preset_json)["ok"])
preset_config = sonare.realtime_voice_changer_preset_config("bright-idol")  # canonical RealtimeVoiceChangerConfig

with sonare.RealtimeVoiceChanger(48000, preset="bright-idol", max_block_size=128) as changer:
    out = changer.process_mono(input_block)
    changer.set_config(json.loads(preset_json))
    print(changer.latency_samples(), changer.config_json(), out.shape)

# Convenience one-shot render through the same realtime chain.
processed = sonare.voice_change_realtime(vocal, sample_rate=48000, preset="soft-whisper")

Preset IDs currently include neutral-monitor, bright-idol, soft-whisper, deep-narrator, robot-mascot, and dark-villain.

Use realtime_voice_changer_preset_config(preset) when you want the resolved POD config rather than the JSON form. It returns the canonical, normalized RealtimeVoiceChangerConfig for a built-in preset by ID or index.

realtime_voice_changer_preset_pod(preset) remains as a compatibility alias.
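
Why a long-lived, stateful object matters for block processing can be shown with a toy stand-in. `OnePoleSmoother` is hypothetical and far simpler than the voice chain; it only illustrates that state carried across blocks makes chunked output match a one-shot render.

```python
class OnePoleSmoother:
    """Hypothetical stand-in for a stateful block processor."""

    def __init__(self, coeff: float = 0.9) -> None:
        self.coeff = coeff
        self.state = 0.0  # carried from one block to the next

    def process(self, block: list[float]) -> list[float]:
        out = []
        for x in block:
            self.state = self.coeff * self.state + (1.0 - self.coeff) * x
            out.append(self.state)
        return out


signal = [1.0] * 8

# Whole buffer in one call.
one_shot = OnePoleSmoother().process(signal)

# Same processor reused across two blocks: state survives the boundary.
streaming = OnePoleSmoother()
in_blocks = streaming.process(signal[:4]) + streaming.process(signal[4:])

# A fresh processor per block loses the state and restarts the ramp.
reset_each_block = (
    OnePoleSmoother().process(signal[:4]) + OnePoleSmoother().process(signal[4:])
)

print(in_blocks == one_shot, reset_each_block == one_shot)
```

The same reasoning applies to any chain with filters, envelopes, or look-ahead: create the processor once and feed it consecutive blocks.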

Feature Extraction Functions

| Function | Return Type | Description |
| --- | --- | --- |
| stft(samples, sample_rate, n_fft?, hop_length?) | StftResult | Short-Time Fourier Transform |
| stft_db(samples, sample_rate, n_fft?, hop_length?) | tuple | STFT in decibels |
| mel_spectrogram(samples, sample_rate, n_fft?, hop_length?, n_mels?, fmin?, fmax?, htk?) | MelSpectrogramResult | Mel spectrogram; fmin/fmax bound the band edges, htk=True uses the HTK Mel formula |
| mfcc(samples, sample_rate, n_fft?, hop_length?, n_mels?, n_mfcc?, fmin?, fmax?, htk?) | MfccResult | Mel-Frequency Cepstral Coefficients |
| chroma(samples, sample_rate, n_fft?, hop_length?) | ChromaResult | Chroma features (pitch class distribution) |
| spectral_centroid(samples, sample_rate, n_fft?, hop_length?) | list[float] | Spectral centroid per frame |
| spectral_bandwidth(samples, sample_rate, n_fft?, hop_length?) | list[float] | Spectral bandwidth per frame |
| spectral_rolloff(samples, sample_rate, n_fft?, hop_length?, roll_percent?) | list[float] | Spectral rolloff per frame |
| spectral_flatness(samples, sample_rate, n_fft?, hop_length?) | list[float] | Spectral flatness per frame |
| spectral_contrast(samples, sample_rate?, n_fft?, hop_length?, n_bands?, fmin?, quantile?) | np.ndarray | Spectral contrast, shape (n_bands + 1, n_frames) |
| poly_features(samples, sample_rate?, n_fft?, hop_length?, order?) | np.ndarray | Per-frame polynomial spectral coefficients |
| zero_crossing_rate(samples, sample_rate, frame_length?, hop_length?) | list[float] | Zero-crossing rate per frame |
| zero_crossings(samples, threshold?, ref_magnitude?, pad?, zero_pos?) | np.ndarray | Sample indices where the waveform crosses zero |
| waveform_peaks(samples, channels, *, samples_per_bucket=512, validate=True) | WaveformPeaksReport | Reduce interleaved multichannel audio (length a multiple of channels) to per-channel min/max buckets for waveform drawing; min/max are channel-major (channel * bucket_count + bucket) |
| waveform_peak_pyramid(samples, channels, *, samples_per_bucket_levels=(512, 1024, 2048, 4096), validate=True) | list[WaveformPeaksReport] | One peaks report per zoom level (one entry per bucket width) |
| rms_energy(samples, sample_rate, frame_length?, hop_length?) | list[float] | RMS energy per frame |
| pitch_yin(samples, sample_rate, frame_length?, hop_length?, fmin?, fmax?, threshold?, fill_na?) | PitchResult | YIN pitch estimation; unvoiced f0 stays nan unless fill_na=True |
| pitch_pyin(samples, sample_rate, frame_length?, hop_length?, fmin?, fmax?, threshold?, fill_na?) | PitchResult | pYIN pitch estimation; unvoiced f0 stays nan unless fill_na=True |
| pitch_tuning(frequencies, resolution?, bins_per_octave?) | float | Global tuning offset from detected frequencies, in fractions of a bin |
| estimate_tuning(samples, sample_rate?, n_fft?, hop_length?, resolution?, bins_per_octave?) | float | Estimate tuning offset directly from audio |
| cqt(samples, sample_rate, hop_length?, fmin?, n_bins?, bins_per_octave?) | CqtResult | Constant-Q Transform magnitude |
| vqt(samples, sample_rate, hop_length?, fmin?, n_bins?, bins_per_octave?, gamma?) | CqtResult | Variable-Q Transform magnitude |
| hybrid_cqt(samples, sample_rate?, hop_length?, fmin?, n_bins?, bins_per_octave?) | CqtResult | Hybrid CQT magnitude (CQT/pseudo-CQT blend across bins) |
| pseudo_cqt(samples, sample_rate?, hop_length?, fmin?, n_bins?, bins_per_octave?) | CqtResult | Approximate (pseudo) CQT magnitude |
| bass_chroma(samples, sample_rate?, hop_length?, n_chroma?) | ChromaResult | Bass-focused chroma (low-register pitch-class distribution) |
| chroma_cens(samples, sample_rate?, hop_length?, n_chroma?) | ChromaResult | CENS energy-normalized/smoothed chroma |
| nnls_chroma(samples, sample_rate) | tuple[int, list[float]] | NNLS chromagram; returns (n_frames, row-major 12 x n_frames data) |
| decompose(s, n_features, n_frames, n_components, n_iter?, beta?) | tuple | NMF decomposition factors (w, h) from a row-major spectrogram |
| decompose_with_init(s, n_features, n_frames, n_components, n_iter?, beta?, init?) | tuple | NMF decomposition (w, h) with a selectable initialiser; init defaults to 'random', also accepts 'nndsvd' (SVD warm start) |
| nn_filter(s, n_features, n_frames, aggregate?, k?, width?) | np.ndarray | Nearest-neighbor filtering of a row-major spectrogram |
| onset_envelope(samples, sample_rate, n_fft?, hop_length?, n_mels?) | list[float] | Onset strength envelope (input to the tempogram family) |
| onset_strength_multi(samples, sample_rate?, n_fft?, hop_length?, n_mels?, n_bands?) | tuple[int, list[float]] | Multi-band onset strength; returns (n_frames, [n_bands x n_frames]) row-major (n_bands default 3) |
| lufs(samples, sample_rate) | LufsResult | Integrated/momentary/short-term LUFS + loudness range (EBU R128) |
| lufs_interleaved(samples, channels, sample_rate?) | LufsResult | Channel-weighted multichannel loudness from interleaved samples |
| ebur128_loudness_range(samples, sample_rate?) | float | EBU R128 loudness range (LRA) in LU |
| momentary_lufs(samples, sample_rate) | list[float] | Momentary LUFS per frame |
| short_term_lufs(samples, sample_rate) | list[float] | Short-term LUFS per frame |

Default parameters: n_fft=2048, hop_length=512, n_mels=128, n_mfcc=20, pitch fmin=65.0, fmax=2093.0, threshold=0.3, roll_percent=0.85. CQT/VQT use fmin=32.70319566 Hz (C1), n_bins=84, and bins_per_octave=12.
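
Several helpers in the table return a frame count together with a flat row-major buffer, for example (n_frames, row-major 12 x n_frames data). The sketch below shows one way to index such a buffer in plain Python. It uses made-up data rather than a real return value, assumes row-major means each row is stored contiguously, and `row_major_to_rows` is a hypothetical helper.

```python
def row_major_to_rows(flat: list[float], n_rows: int, n_cols: int) -> list[list[float]]:
    """Split a flat row-major buffer into n_rows lists of n_cols values."""
    if len(flat) != n_rows * n_cols:
        raise ValueError("buffer length does not match n_rows * n_cols")
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Made-up stand-in for an (n_frames, data) result: 12 pitch classes x 3 frames.
n_frames = 3
flat = [float(pc * 10 + frame) for pc in range(12) for frame in range(n_frames)]

rows = row_major_to_rows(flat, 12, n_frames)

# Element (row 4, frame 2) sits at flat index 4 * n_frames + 2.
value = rows[4][2]

# One 12-value chroma vector for frame 1, gathered across rows.
frame_1 = [row[1] for row in rows]
print(value, frame_1[:3])
```

With NumPy installed, the equivalent is to reshape the flat list to (12, n_frames).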

Additional effect helpers include remix(samples, intervals, sample_rate?, align_zeros?), phase_vocoder(samples, sample_rate?, rate?), and hpss_with_residual(samples, sample_rate?, kernel_harmonic?, kernel_percussive?). Use them when you need librosa-style interval remixing, direct phase-vocoder time scaling, or HPSS with the residual signal preserved.

Inverse Reconstruction Functions

Reconstruct a spectrum or audio from a mel spectrogram or MFCC matrix. Phase is estimated with Griffin-Lim, so the round-trip is lossy — see Inverse Features. Matrix inputs are row-major.

| Function | Return Type | Description |
| --- | --- | --- |
| mel_to_stft(mel, n_mels, n_frames, sample_rate?, n_fft?, fmin?, fmax?, htk?) | InverseResult | Linear STFT power from a mel spectrogram |
| mel_to_audio(mel, n_mels, n_frames, sample_rate?, n_fft?, hop_length?, fmin?, fmax?, n_iter?, htk?) | list[float] | Audio from a mel spectrogram (Griffin-Lim) |
| mfcc_to_mel(mfcc_coeffs, n_mfcc, n_frames, n_mels?) | InverseResult | Mel spectrogram (dB) from MFCC coefficients |
| mfcc_to_audio(mfcc_coeffs, n_mfcc, n_frames, n_mels?, sample_rate?, n_fft?, hop_length?, fmin?, fmax?, n_iter?, htk?) | list[float] | Audio from MFCC coefficients |

Pass 0.0 for fmin/fmax to use the full-band defaults; n_iter defaults to 32. Keep fmin/fmax/htk identical to the values used by the forward transform so the round-trip stays consistent.

Metering Functions

Standalone level, dynamics, and stereo-image meters. Each accepts a keyword-only validate flag (default True); pass validate=False to skip NaN/Inf input checks on hot paths. The stereo meters require left and right to be equal length. sample_rate defaults to 22050.

| Function | Return Type | Description |
| --- | --- | --- |
| metering_peak_db(samples, sample_rate?, *, validate?) | float | Sample peak (dBFS) |
| metering_rms_db(samples, sample_rate?, *, validate?) | float | RMS level (dBFS) |
| metering_crest_factor_db(samples, sample_rate?, *, validate?) | float | Crest factor, peak − RMS (dB) |
| metering_dc_offset(samples, sample_rate?, *, validate?) | float | Mean (DC) offset, linear amplitude |
| metering_true_peak_db(samples, sample_rate?, oversample_factor?, *, validate?) | float | Inter-sample (true) peak (dBFS); oversample_factor is a power of two in 1..16 (0 = default 4) |
| metering_detect_clipping(samples, sample_rate?, threshold?, min_region_samples?, *, validate?) | ClippingReport | Clipped-sample runs; threshold default 0.999, min_region_samples default 1 |
| metering_dynamic_range(samples, sample_rate?, window_sec?, hop_sec?, low_percentile?, high_percentile?, *, validate?) | DynamicRangeReport | Sliding-window dynamic range; pass 0.0 for window_sec/hop_sec defaults (3 s / 1 s); pass a negative value (the default -1.0) for low_percentile/high_percentile defaults (0.10 / 0.95); note that 0.0 requests the 0th percentile, not the default |
| metering_stereo_correlation(left, right, sample_rate?, *, validate?) | float | Pearson correlation, −1..1 |
| metering_stereo_width(left, right, sample_rate?, *, validate?) | float | Mid/side stereo width |
| metering_vectorscope(left, right, sample_rate?, *, validate?) | VectorscopeReport | Per-sample mid/side point series |
| metering_vectorscope_decimated(left, right, sample_rate?, max_points?, *, validate?) | VectorscopeReport | Display-sized mid/side vectorscope; max_points upper-bounds the point count (0 or a value ≥ buffer length = one point per sample, identical to metering_vectorscope); otherwise deterministically decimated, keeping the largest-radius sample per bucket |
| metering_phase_scope(left, right, sample_rate?, *, validate?) | PhaseScopeReport | Phase-scope point series plus summary stats |
| metering_phase_scope_decimated(left, right, sample_rate?, max_points?, *, validate?) | PhaseScopeReport | Display-sized phase-scope (Lissajous + summary stats); max_points upper-bounds the point cloud the same way; summary stats are always computed over the full-resolution signal |
| metering_spectrum(samples, sample_rate?, n_fft?, apply_octave_smoothing?, octave_fraction?, db_ref?, db_amin?, *, validate?) | SpectrumReport | Welch-averaged magnitude/power/dB spectrum over the whole buffer (Hann-windowed, 50%-overlapping n_fft frames; not a single-frame snapshot); pass 0 for n_fft/octave_fraction/db_ref/db_amin defaults (2048 / 3 / 1.0 / floor) |
| metering_spectrum_frame(samples, sample_rate?, frame_offset?, n_fft?, apply_octave_smoothing?, octave_fraction?, db_ref?, db_amin?, *, validate?) | SpectrumReport | True single-frame spectrum (one Hann-windowed FFT) spanning [frame_offset, frame_offset + n_fft), zero-padded past the end; pass 0 for frame_offset/n_fft/octave_fraction/db_ref/db_amin defaults |
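
The relationship "crest factor = peak − RMS (dB)" from the table can be checked by hand. This is a minimal pure-Python sketch with hypothetical helper names; it does no input validation and would fail on all-zero input, unlike the library meters.

```python
import math


def peak_db(samples: list[float]) -> float:
    """Sample peak in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(max(abs(x) for x in samples))


def rms_db(samples: list[float]) -> float:
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def crest_factor_db(samples: list[float]) -> float:
    return peak_db(samples) - rms_db(samples)


# Ten full periods of a full-scale sine, and a full-scale square wave.
sine = [math.sin(2.0 * math.pi * n / 100.0) for n in range(1000)]
square = [1.0 if n % 100 < 50 else -1.0 for n in range(1000)]

print(round(crest_factor_db(sine), 2), round(crest_factor_db(square), 2))
```

A sine has a crest factor of about 3 dB and a square wave 0 dB, so heavily limited material trends toward the lower figure.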

Scale Quantization

12-TET scale helpers for building pitch-correction targets. mode_mask is a 12-bit mask where bit i enables the i-th pitch class relative to root (PitchClass, C = 0); natural major is 0b101010110101. reference_midi is the tuning anchor (pass 0.0 for A4 = 69). Pair with pitch_correct_to_midi(...) to retune to the nearest scale degree.

| Function | Return Type | Description |
| --- | --- | --- |
| scale_quantize_midi(root, mode_mask, midi, reference_midi?) | float | Snap a (fractional) MIDI number to the nearest enabled pitch class |
| scale_correction_semitones(root, mode_mask, midi, reference_midi?) | float | Correction (quantized − input), in semitones |
| scale_pitch_class_enabled(root, mode_mask, pitch_class) | bool | Whether pitch_class (0..11) is enabled relative to root |
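
The mode_mask convention can be made concrete with a small sketch. The `*_sketch` functions are hypothetical re-implementations for illustration. They ignore reference_midi, and the tie-breaking rule (lower note wins) is an assumption, not documented library behaviour.

```python
import math

MAJOR_MASK = 0b101010110101  # natural major, as given in the text


def pitch_class_enabled_sketch(root: int, mode_mask: int, pitch_class: int) -> bool:
    """Bit i of mode_mask enables the pitch class i semitones above root."""
    degree = (pitch_class - root) % 12
    return bool((mode_mask >> degree) & 1)


def quantize_midi_sketch(root: int, mode_mask: int, midi: float) -> float:
    """Nearest MIDI note whose pitch class is enabled (ties go to the lower note)."""
    if mode_mask & 0xFFF == 0:
        raise ValueError("mode_mask enables no pitch classes")
    base = math.floor(midi)
    candidates = [
        n for n in range(base - 11, base + 13)
        if pitch_class_enabled_sketch(root, mode_mask, n % 12)
    ]
    return float(min(candidates, key=lambda n: (abs(n - midi), n)))


enabled_in_c_major = [pc for pc in range(12) if pitch_class_enabled_sketch(0, MAJOR_MASK, pc)]
snapped = quantize_midi_sketch(0, MAJOR_MASK, 61.2)  # slightly sharp C#4 in C major
correction = snapped - 61.2                          # quantized minus input
print(enabled_in_c_major, snapped, round(correction, 2))
```

Changing root rotates the same mask: with root=2 (D), pitch class 6 (F#) becomes the enabled major third.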

librosa-Compatible Helpers

These mirror the corresponding librosa functions — see librosa Compatibility for the function each helper matches.

What each helper is for

  • preemphasis / deemphasis: classic first-order filters that boost high frequencies before analysis (preemphasis) or undo that boost (deemphasis).
  • trim_silence / split_silence: trim leading/trailing silence or split on silent gaps.
  • frame_signal / pad_center / fix_length / fix_frames: framing and size-alignment utilities for fixed-frame DSP.
  • peak_pick / vector_normalize: peak detection on 1-D signals (e.g. onset envelopes) and vector-norm normalization.
  • pcen: dynamic range compression for mel spectrograms, giving features that are robust to gain and background noise.
  • tonnetz: projects chroma into a 6-D harmonic space for chord-relation and modulation analysis.
  • tempogram / plp: time-varying tempo representation from the onset envelope (autocorrelation or mode="cosine"), and the dominant local pulse on top.
  • fourier_tempogram / cyclic_tempogram / tempogram_ratio: the FFT-based tempogram, an octave-folded cyclic tempogram, and tempo-ratio features.

| Function | Return Type | Description |
| --- | --- | --- |
| preemphasis(samples, coef?, zi?) | list[float] | Pre-emphasis filter (librosa.effects.preemphasis) |
| deemphasis(samples, coef?, zi?) | list[float] | Inverse pre-emphasis (librosa.effects.deemphasis) |
| trim_silence(samples, top_db?, frame_length?, hop_length?) | tuple[list[float], int, int] | librosa.effects.trim — returns (audio, start_sample, end_sample) |
| split_silence(samples, top_db?, frame_length?, hop_length?) | list[tuple[int, int]] | librosa.effects.split — non-silent intervals as sample pairs |
| frame_signal(samples, frame_length, hop_length) | tuple[int, list[float]] | librosa.util.frame — returns (n_frames, row-major frames) |
| pad_center(values, size, pad_value?) | list[float] | librosa.util.pad_center |
| fix_length(values, size, pad_value?) | list[float] | librosa.util.fix_length |
| fix_frames(frames, x_min?, x_max?, pad?) | list[int] | librosa.util.fix_frames |
| peak_pick(values, pre_max, post_max, pre_avg, post_avg, delta, wait) | list[int] | librosa.util.peak_pick — returns peak indices |
| vector_normalize(values, norm_type?, threshold?) | list[float] | librosa.util.normalize. norm_type: 0=inf, 1=L1, 2=L2, 3=power |
| pcen(values, n_bins, n_frames, sample_rate?, hop_length?, time_constant?, gain?, bias?, power?, eps?) | list[float] | librosa.pcen — input is row-major [n_bins x n_frames] mel |
| tonnetz(chromagram, n_chroma, n_frames) | list[float] | librosa.feature.tonnetz — returns row-major [6 x n_frames] |
| tempogram(onset_envelope, sample_rate?, hop_length?, win_length?, center?, norm?, mode?) | tuple[int, list[float]] | librosa.feature.tempogram. mode: "autocorrelation" (default) or "cosine" |
| fourier_tempogram(onset_envelope, sample_rate?, hop_length?, win_length?, center?, norm?) | tuple[int, list[float]] | FFT-based tempogram — STFT of the onset envelope |
| cyclic_tempogram(onset_envelope, sample_rate?, hop_length?, win_length?, bpm_min?, n_bins?) | tuple[int, list[float]] | Octave-folded cyclic tempogram |
| tempogram_ratio(tempogram_data, win_length?, sample_rate?, hop_length?, factors?) | list[float] | Tempo-ratio features from a tempogram |
| plp(onset_envelope, sample_rate?, hop_length?, tempo_min?, tempo_max?, win_length?) | list[float] | librosa.beat.plp — predominant local pulse |
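
As an illustration of what the preemphasis / deemphasis pair computes, the following pure-Python sketch implements the textbook first-order filters with a zero initial state. The function names are hypothetical and this is not libsonare's code. The 0.97 coefficient is librosa's default and is assumed here, and librosa picks a non-zero default initial state (zi), so the first output sample can differ from this sketch.

```python
def preemphasis_sketch(samples, coef=0.97):
    """First-order FIR: y[n] = x[n] - coef * x[n-1], with x[-1] taken as 0."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coef * prev)
        prev = x
    return out


def deemphasis_sketch(samples, coef=0.97):
    """IIR inverse: x[n] = y[n] + coef * x[n-1], with x[-1] taken as 0."""
    out = []
    prev = 0.0
    for y in samples:
        prev = y + coef * prev  # feedback on the previous *output* sample
        out.append(prev)
    return out


signal = [0.0, 1.0, 0.5, -0.25, 0.0]
restored = deemphasis_sketch(preemphasis_sketch(signal))
```

With matching coefficients and initial states the two filters are exact inverses, so restored equals signal up to floating-point rounding.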

Conversion Functions

| Function | Description |
| --- | --- |
| hz_to_mel(hz) | Hertz → Mel scale |
| mel_to_hz(mel) | Mel scale → Hertz |
| hz_to_midi(hz) | Hertz → MIDI note number |
| midi_to_hz(midi) | MIDI note number → Hertz |
| hz_to_note(hz) | Hertz → note name (e.g., "A4") |
| note_to_hz(note) | Note name → Hertz |
| frames_to_time(frames, sr, hop_length) | Frame index → seconds |
| time_to_frames(time, sr, hop_length) | Seconds → frame index |
| frames_to_samples(frames, hop_length?, n_fft?) | Frame index → sample index (librosa.frames_to_samples) |
| samples_to_frames(samples, hop_length?, n_fft?) | Sample index → frame index (librosa.samples_to_frames) |
| power_to_db(values, ref?, amin?, top_db?) | Power → dB (librosa.power_to_db) |
| amplitude_to_db(values, ref?, amin?, top_db?) | Amplitude → dB (librosa.amplitude_to_db) |
| db_to_power(values, ref?) | dB → power |
| db_to_amplitude(values, ref?) | dB → amplitude |
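
For orientation, these are the standard formulas behind a few of the conversions, written as a pure-Python sketch. The function names carry a _sketch suffix because they are illustrative stand-ins. That libsonare uses the same 440 Hz reference, flooring, and defaults is an assumption based on the stated librosa compatibility, and the top_db clipping step of power_to_db is omitted.

```python
import math


def hz_to_midi_sketch(hz: float) -> float:
    """MIDI note number, with A4 = 440 Hz = MIDI 69."""
    return 69.0 + 12.0 * math.log2(hz / 440.0)


def midi_to_hz_sketch(midi: float) -> float:
    """Inverse of hz_to_midi_sketch."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def frames_to_time_sketch(frames: int, sr: int, hop_length: int) -> float:
    """Start time of a frame, in seconds."""
    return frames * hop_length / sr


def power_to_db_sketch(power: float, ref: float = 1.0, amin: float = 1e-10) -> float:
    """10 * log10(power / ref), with both terms floored at amin."""
    return 10.0 * math.log10(max(amin, power)) - 10.0 * math.log10(max(amin, ref))


middle_c_hz = midi_to_hz_sketch(60)
one_second_frames = frames_to_time_sketch(86, 44100, 512)
```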

Types

Result objects are plain classes with attribute access; many also expose camelCase property aliases (e.g. bpm_confidence / bpmConfidence) for JS parity. Shapes below show the data fields.

python
class PitchClass(IntEnum):
    C, CS, D, DS, E, F, FS, G, GS, A, AS, B

class Mode(IntEnum):
    MAJOR = 0
    MINOR = 1
    DORIAN = 2
    PHRYGIAN = 3
    LYDIAN = 4
    MIXOLYDIAN = 5
    LOCRIAN = 6

class KeyProfile(IntEnum):
    KRUMHANSL_SCHMUCKLER = 0
    TEMPERLEY = 1
    SHAATH = 2
    FARALDO_EDMT = 3
    FARALDO_EDMA = 4
    FARALDO_EDMM = 5
    BELLMAN_BUDGE = 6

class Key:
    root: PitchClass
    mode: Mode
    confidence: float
    name: str          # property -> "C major", "A minor"
    short_name: str    # property -> "C", "Am"

class TimeSignature:
    numerator: int
    denominator: int
    confidence: float

class AnalysisResult:
    bpm: float
    bpm_confidence: float
    key: Key
    time_signature: TimeSignature
    beat_times: list[float]
    beat_strengths: list[float]    # per-beat strength
    beats: list[Beat]              # property: per-beat objects with strength
    chords: list[Chord]
    sections: list[Section]
    timbre: AnalysisTimbre | None
    dynamics: AnalysisDynamics | None
    rhythm: AnalysisRhythm | None
    melody: AnalysisMelody | None
    form: str
    # The focused detect_chords() / analyze_sections() / analyze_timbre() / ...
    # functions remain useful for a single facet or per-call options.

class HpssResult:
    harmonic: list[float]
    percussive: list[float]
    length: int
    sample_rate: int

class StftResult:
    n_bins: int
    n_frames: int
    n_fft: int
    hop_length: int
    sample_rate: int
    magnitude: list[float]   # n_bins × n_frames, row-major
    power: list[float]       # n_bins × n_frames, row-major

class MelSpectrogramResult:
    n_mels: int
    n_frames: int
    sample_rate: int
    hop_length: int
    power: list[float]       # n_mels × n_frames, row-major
    db: list[float]          # n_mels × n_frames, row-major

class MfccResult:
    n_mfcc: int
    n_frames: int
    coefficients: list[float]  # n_mfcc × n_frames, row-major

class ChromaResult:
    n_chroma: int
    n_frames: int
    sample_rate: int
    hop_length: int
    features: list[float]    # n_chroma × n_frames, row-major
    mean_energy: list[float] # n_chroma values

class PitchResult:
    n_frames: int
    f0: list[float]          # Fundamental frequency per frame (Hz)
    voiced_prob: list[float] # Voicing probability per frame (0–1)
    voiced_flag: list[bool]  # Voiced/unvoiced decision per frame
    median_f0: float
    mean_f0: float

class WaveformPeaksReport:
    min: NDArray[np.float32]   # channel-major: channel * bucket_count + bucket
    max: NDArray[np.float32]   # channel-major
    channels: int
    bucket_count: int
    samples_per_bucket: int

class StreamConfig:
    sample_rate: int = 44100
    n_fft: int = 2048
    hop_length: int = 512
    n_mels: int = 128
    fmin: float = 0.0
    fmax: float = 0.0
    tuning_ref_hz: float = 440.0
    compute_magnitude: bool = False
    compute_mel: bool = True
    compute_chroma: bool = True
    compute_onset: bool = True
    compute_spectral: bool = True
    emit_every_n_frames: int = 1
    magnitude_downsample: int = 1
    key_update_interval_sec: float = 5.0
    bpm_update_interval_sec: float = 10.0
    window: int = 0          # 0=Hann, 1=Hamming, 2=Blackman, 3=Rectangular
    output_format: int = 0  # 0=Float32, 1=Int16, 2=Uint8

class StreamFrames:
    n_frames: int
    n_mels: int
    timestamps: list[float]
    mel: list[float]        # n_frames × n_mels, row-major
    chroma: list[float]     # n_frames × 12, row-major
    onset_strength: list[float]
    rms_energy: list[float]
    spectral_centroid: list[float]
    spectral_flatness: list[float]
    chord_root: list[int]
    chord_quality: list[int]
    chord_confidence: list[float]

class StreamChordChange:
    root: int
    quality: int
    start_time: float
    confidence: float

class StreamBarChord:
    bar_index: int
    root: int
    quality: int
    start_time: float
    confidence: float

class StreamPatternScore:
    name: str
    score: float

class StreamStats:
    total_frames: int
    total_samples: int
    duration_seconds: float
    bpm: float
    bpm_confidence: float
    bpm_candidate_count: int
    key: int
    key_minor: bool
    key_confidence: float
    chord_root: int
    chord_quality: int
    chord_confidence: float
    chord_start_time: float
    current_bar: int
    bar_duration: float
    chord_progression: list[StreamChordChange]
    bar_chord_progression: list[StreamBarChord]
    voted_pattern: list[StreamBarChord]
    pattern_length: int
    detected_pattern_name: str
    detected_pattern_score: float
    all_pattern_scores: list[StreamPatternScore]
    accumulated_seconds: float
    used_frames: int
    updated: bool
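
The flat row-major fields above can be indexed without NumPy. The sketch below uses small hand-written lists as stand-ins for real results, and the helper names at and row are hypothetical. Note that the offline results store bins as rows (n_mels × n_frames), while StreamFrames stores frames as rows (n_frames × n_mels).

```python
def at(flat, n_cols, row, col):
    """Read element (row, col) from a row-major flat list."""
    return flat[row * n_cols + col]


def row(flat, n_cols, index):
    """Slice one full row out of a row-major flat list."""
    return flat[index * n_cols:(index + 1) * n_cols]


# Stand-in for MelSpectrogramResult.power: n_mels x n_frames, row-major.
n_mels, n_frames = 2, 3
power = [0.1, 0.2, 0.3,   # mel band 0 across frames 0..2
         1.1, 1.2, 1.3]   # mel band 1 across frames 0..2
band_1_frame_2 = at(power, n_frames, 1, 2)

# Stand-in for StreamFrames.mel: n_frames x n_mels, row-major.
stream_mel = [0.1, 1.1,   # frame 0: bands 0..1
              0.2, 1.2,   # frame 1
              0.3, 1.3]   # frame 2
frame_2 = row(stream_mel, n_mels, 2)
```

If NumPy is available, reshaping the flat list to (rows, cols) gives the same view in one step.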

Additional Python result classes used by focused APIs:

| Area | Classes |
| --- | --- |
| Metering | ClippingRegion, StreamFramesU8, StreamFramesI16, WaveformPeaksReport |
| Mastering | MasteringResult, MasteringStereoResult |
| Mixing | MixerStereoResult |
| Projects | AssistSidecar (return type of project.get_assist_sidecar(index) / project.assist_sidecars() — see Project Editing), NotePairValidation |
| Realtime engine telemetry | MeterTelemetryRecord |

Streaming Analysis API

Use StreamAnalyzer when audio arrives in blocks: live capture, a callback loop, a long file you do not want to analyze all at once, or a visualization that needs frame-by-frame features. It keeps a small internal buffer, emits mel/chroma/onset/spectral frames, and periodically updates BPM, key, chord, bar, and pattern estimates.

python
import libsonare as sonare

stream = sonare.StreamAnalyzer(
    sonare.StreamConfig(
        sample_rate=44100,
        n_mels=64,
        emit_every_n_frames=4,
        output_format=0,  # 0=Float32, 1=Int16, 2=Uint8
    )
)

for block in audio_blocks:
    stream.process(block)

    frames = stream.read_frames(stream.available_frames())
    # frames.mel is flattened [n_frames * n_mels]
    # frames.chroma is flattened [n_frames * 12]

    stats = stream.stats()
    if stats.bpm > 0:
        print(stats.bpm, stats.bpm_confidence)

stream.close()
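
The audio_blocks iterable in the example above is left undefined. One way to produce it from an in-memory sample list is sketched below. The helper name iter_blocks is hypothetical, and whether process() accepts a final block shorter than the others is an assumption you should check against your build.

```python
from typing import Iterator, Sequence


def iter_blocks(samples: Sequence[float], block_size: int = 512) -> Iterator[Sequence[float]]:
    """Yield consecutive blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(samples), block_size):
        yield samples[start:start + block_size]


# 1300 samples split into blocks of 512, 512, and 276 samples.
audio_blocks = list(iter_blocks([0.0] * 1300, block_size=512))
block_lengths = [len(block) for block in audio_blocks]
```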

For lower-bandwidth UI transfer, use a quantized read instead of read_frames(max_frames):

| Method | What changes |
| --- | --- |
| read_frames_u8(max_frames, quantize_config?) | Feature arrays are quantized to unsigned 8-bit values. |
| read_frames_i16(max_frames, quantize_config?) | Feature arrays are quantized to signed 16-bit values. |

quantize_config is an optional QuantizeConfig (exported from libsonare) that sets the quantization ranges. Pass one when a stream is much louder or quieter than the defaults assume; omit it to use the defaults. Its fields and defaults are mel_db_min=-80.0, mel_db_max=0.0, onset_max=50.0, rms_max=1.0, centroid_max=11025.0. The quantizers clamp normalized values to [0, 1], so any value outside the configured range saturates silently at the nearest endpoint. This mirrors StreamQuantizeConfig in the JS/WASM streaming docs.

Both quantized reads return timestamps as floats. If you synchronize against an external audio clock, feed chunks with process_with_offset(samples, sample_offset) so that the returned timestamps follow that clock's timeline.
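
The clamping behaviour described for quantize_config can be sketched in a few lines. Only the clamp to [0, 1] and the default mel range come from the text above. The linear mapping, the rounding, and the helper name quantize_u8 are assumptions for illustration.

```python
def quantize_u8(value: float, lo: float, hi: float) -> int:
    """Map [lo, hi] linearly to 0..255, clamping anything outside the range."""
    norm = (value - lo) / (hi - lo)
    norm = min(1.0, max(0.0, norm))  # the clamp that causes silent saturation
    return int(round(norm * 255))


# Default mel range from the text: mel_db_min=-80.0, mel_db_max=0.0
in_range = quantize_u8(-40.0, -80.0, 0.0)
too_loud = quantize_u8(6.0, -80.0, 0.0)   # above mel_db_max: pinned to 255
widened = quantize_u8(6.0, -80.0, 12.0)   # a wider range keeps headroom
```

A value pinned at 0 or 255 carries no information about how far outside the range the signal was, which is the case a custom range is meant to avoid.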

Streaming Equalizer API

StreamingEqualizer wraps the native block-by-block EQ engine. Use it for live preview, processor UIs, or matching a source tone to a reference without assembling a mastering chain.

python
with sonare.StreamingEqualizer(sample_rate=48000, max_block_size=512) as eq:
    eq.set_band(0, {"type": "bell", "frequencyHz": 2500, "gainDb": 2.5, "q": 1.0})
    eq.set_phase_mode("natural")
    eq.set_auto_gain(True)
    eq.match(source_samples, reference_samples, max_bands=8)
    out = eq.process_mono(input_block)
    snapshot = eq.spectrum()

Bands can be Python dictionaries or JSON strings. set_phase_mode(...) accepts zero / natural / linear names or numeric values. The wrapper also exposes output gain/pan, sidechain input for dynamic bands, process_stereo(...), spectrum(), latency_samples, and last_auto_gain_db.

Mastering API

Python exposes the same named mastering processors as the browser demo. Use the name-list helpers to inspect the active build, then call mono, stereo, pair, or analysis APIs with explicit parameters.

python
import json
import libsonare as sonare

print(sonare.mastering_processor_names())
# e.g. ['dynamics.compressor', 'eq.parametric', 'spectral.airBand', 'stereo.imager', ...]

result = sonare.mastering_process(
    "spectral.airBand",
    samples,
    sample_rate=sample_rate,
    params={
        "amount": 0.4,
        "shelfFrequencyHz": 14000,
    },
)

report = sonare.mastering_stereo_analyze(
    "stereo.monoCompatCheck",
    left,
    right,
    sample_rate=sample_rate,
)
print(json.loads(report))

catalog = sonare.mastering_processor_catalog()
insert_params = sonare.mastering_insert_param_info("eq.parametric")

# Preset-driven chain (one-shot)
sonare.mastering_preset_names()
# -> ['pop', 'edm', 'acoustic', 'hipHop', 'aiMusic', 'speech', 'streaming', 'youtube', 'broadcast', 'podcast', 'audiobook', 'cinema', 'jpop', 'ambient', 'lofi', 'classical', 'drumAndBass', 'techno', 'metal', 'trap', 'rnb', 'jazz', 'kpop', 'trance', 'gameOst']
chain_result = sonare.master_audio(
    samples,
    sample_rate=sample_rate,
    preset_name="aiMusic",
    overrides={
        "loudness.targetLufs": -13,
        "maximizer.truePeakLimiter.releaseMs": 50,
        "maximizer.truePeakLimiter.applyGainAtInputRate": False,
    },
)
print(chain_result.output_lufs, chain_result.applied_gain_db)

# Block-by-block streaming variant
with sonare.StreamingMasteringChain({
    "eq.tilt.tiltDb": 0.5,
    "dynamics.compressor.thresholdDb": -20.0,
}) as chain:
    chain.prepare(sample_rate=48000, max_block_size=512, num_channels=1)
    out_block = chain.process_mono([0.0] * 512)

profile = json.loads(sonare.mastering_audio_profile(samples, sample_rate=sample_rate, params={
    "n_fft": 2048,
    "hop_length": 512,
    "true_peak_oversample": 4,
}))
suggestions = json.loads(sonare.mastering_assistant_suggest(samples, sample_rate=sample_rate, params={
    "target_lufs": -14,
    "ceiling_db": -1,
    "prefer_streaming_safe": True,
}))
preview = json.loads(sonare.mastering_streaming_preview(samples, sample_rate=sample_rate, platforms=[
    {"name": "YouTube", "targetLufs": -14, "ceilingDb": -1},
    {"name": "Podcast", "targetLufs": -16, "ceilingDb": -1},
]))

mastering_audio_profile() accepts optional profile params: n_fft, hop_length, and true_peak_oversample. mastering_assistant_suggest() accepts target_lufs, ceiling_db, enable_repair, prefer_streaming_safe, and speech_mono_amount; camelCase aliases also work through the shared native parser.

Mastering helpers also accept limiter-release and static-gain staging controls. The simple mastering() helper uses release_ms (0 keeps the 50 ms library default) and apply_gain_at_input_rate. Preset/chain overrides use the flat keys "maximizer.truePeakLimiter.releaseMs" and "maximizer.truePeakLimiter.applyGainAtInputRate"; supplied override values are applied directly.

Reference-track workflows use mastering_pair_processor_names(), mastering_pair_process(), mastering_pair_analysis_names(), and mastering_pair_analyze(). Pair inputs should use the same sample rate and comparable length.

Standalone dynamics and repair

Every named stage is also a one-shot module-level function, so you can run a single processor without assembling a chain. Parameters are keyword-only and mirror the corresponding MasteringChainConfig keys in snake_case. The dynamics processors return (processed_samples, latency_samples), where latency_samples is an int; the repair processors return processed samples (np.ndarray).

| Function | Returns | Key parameters |
| --- | --- | --- |
| mastering_dynamics_compressor(samples, sample_rate?, *, ...) | tuple[np.ndarray, int] | threshold_db=-18.0, ratio=2.0, attack_ms=10.0, release_ms=100.0, knee_db, makeup_gain_db, auto_makeup, detector='rms', sidechain_hpf_enabled, sidechain_hpf_hz, pdr_time_ms, pdr_release_scale |
| mastering_dynamics_gate(samples, sample_rate?, *, ...) | tuple[np.ndarray, int] | threshold_db=-50.0, attack_ms=2.0, release_ms=80.0, range_db=-80.0, hold_ms, close_threshold_db, key_hpf_hz |
| mastering_dynamics_transient_shaper(samples, sample_rate?, *, ...) | tuple[np.ndarray, int] | attack_gain_db=3.0, sustain_gain_db, fast_attack_ms, fast_release_ms=20.0, slow_attack_ms=15.0, slow_release_ms=200.0, sensitivity=1.0, max_gain_db=12.0, gain_smoothing_ms, lookahead_ms |
| mastering_repair_declick(samples, sample_rate?, *, ...) | np.ndarray | threshold=0.8, neighbor_ratio=4.0, max_click_samples=8, lpc_order=20, residual_ratio=8.0 |
| mastering_repair_declip(samples, sample_rate?, *, ...) | np.ndarray | clip_threshold=0.98, lpc_order=36, iterations=2, lpc_blend=0.65 |
| mastering_repair_decrackle(samples, sample_rate?, *, ...) | np.ndarray | threshold=0.4, mode='median', levels=4 |
| mastering_repair_dehum(samples, sample_rate?, *, ...) | np.ndarray | fundamental_hz=50.0, harmonics=4, q=20.0, adaptive, search_range_hz, adaptation, frame_size, pll_bandwidth |
| mastering_repair_denoise_classical(samples, sample_rate?, *, ...) | np.ndarray | mode='logMmse', noise_estimator='quantile', n_fft=1024, hop_length=256, dd_alpha=0.98, gain_floor=0.05, over_subtraction=2.0, spectral_floor=0.05, noise_estimation_quantile=0.1, speech_presence_gain, gain_smoothing |
| mastering_repair_dereverb_classical(samples, sample_rate?, *, ...) | np.ndarray | threshold=0.05, attenuation=0.5, n_fft=1024, hop_length=256, t60_sec=0.4, late_delay_ms=50.0, over_subtraction, spectral_floor, wpe_enabled, wpe_iterations, wpe_taps, wpe_strength |
| mastering_repair_trim_silence(samples, sample_rate?, *, ...) | np.ndarray | threshold=0.001, padding_samples=0, mode='peak', gate_lufs=-60.0, window_ms=400.0 |

The repair stages are offline-only and are rejected by StreamingMasteringChain — run them with these one-shot helpers or inside mastering_chain* / master_audio*. See Dynamics and Repair.
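
For the dynamics helpers, the returned latency_samples can be used to line the processed audio back up with the dry input, for example before an A/B comparison. The sketch below assumes the processed signal is delayed by exactly latency_samples relative to the input. The text above does not say whether libsonare already compensates for this, so treat that as an assumption; the helper name align_to_input is hypothetical.

```python
def align_to_input(processed, latency_samples, input_length):
    """Drop the leading latency, then trim or zero-pad to the input length."""
    aligned = list(processed[latency_samples:latency_samples + input_length])
    aligned.extend([0.0] * (input_length - len(aligned)))
    return aligned


# Stand-in for a (processed_samples, latency_samples) return value.
processed, latency = [0.0, 0.0, 0.5, 0.4, 0.3], 2
aligned = align_to_input(processed, latency, input_length=5)
```

With a latency of 0 the helper returns the processed samples unchanged, so it is safe to apply unconditionally.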

Progress callbacks

mastering_chain(), mastering_chain_stereo(), master_audio(), and master_audio_stereo() accept an optional on_progress=callable keyword.

The callback receives (progress: float, stage: str) after each stage:

| Value | Meaning |
| --- | --- |
| progress | Overall progress from 0.0 to 1.0. |
| stage | The named processor that just completed, such as eq.tilt, dynamics.compressor, or loudness.targetLufs. |

Use it to drive UI progress bars or to log per-stage timing.

python
def on_step(progress: float, stage: str) -> None:
    print(f"{progress:5.1%}  {stage}")

result = sonare.mastering_chain(
    samples,
    sample_rate=sample_rate,
    config={"loudness": {"targetLufs": -14, "ceilingDb": -1}},
    on_progress=on_step,
)
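
For per-stage timing, a small callable object can record the wall-clock time between successive callbacks. This is a sketch with simulated callback invocations, and the StageTimer class is hypothetical. In real use you would pass the instance as on_progress. The first interval also includes any setup time since the timer was created.

```python
import time


class StageTimer:
    """Collects wall-clock time between successive progress callbacks."""

    def __init__(self) -> None:
        self._last = time.perf_counter()
        self.timings = []  # list of (stage, seconds) tuples

    def __call__(self, progress: float, stage: str) -> None:
        now = time.perf_counter()
        self.timings.append((stage, now - self._last))
        self._last = now


timer = StageTimer()
# Simulated callbacks; in real use pass on_progress=timer to mastering_chain().
timer(0.5, "eq.tilt")
timer(1.0, "loudness.targetLufs")
for stage, seconds in timer.timings:
    print(f"{stage:<24} {seconds * 1000:8.3f} ms")
```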

The named mastering API families are:

| Purpose | Function |
| --- | --- |
| Apply simple loudness mastering | mastering() |
| List built-in mastering presets | mastering_preset_names() |
| Apply a preset to mono audio | master_audio() |
| Apply a preset to stereo audio | master_audio_stereo() |
| Run a full mono chain | mastering_chain() |
| Run a full stereo chain | mastering_chain_stereo() |
| Run a streaming chain (block-by-block) | StreamingMasteringChain |
| Generate an audio profile for mastering decisions | mastering_audio_profile() |
| Generate assistant suggestions from source analysis | mastering_assistant_suggest() |
| Preview delivery loudness by platform | mastering_streaming_preview() |
| List mono/stereo processors | mastering_processor_names() |
| Get machine-readable processor classifications | mastering_processor_catalog() |
| List chain insert processors | mastering_insert_names() |
| List the parameter keys an insert accepts | mastering_insert_param_names(name) |
| List realtime-automatable insert parameters | mastering_insert_param_info(name) |
| Process mono audio | mastering_process() |
| Process stereo audio | mastering_process_stereo() |
| List pair processors | mastering_pair_processor_names() |
| Process source/reference pair | mastering_pair_process() |
| List pair analyses | mastering_pair_analysis_names() |
| Analyze source/reference pair | mastering_pair_analyze() |
| List stereo analyses | mastering_stereo_analysis_names() |
| Analyze stereo channels | mastering_stereo_analyze() |

Related mastering guides: Preset selection, Delivery targets, Meter reading, Quality checklist.

Mixing API

Python also exposes the libsonare mixing engine. Use mix_stereo(...) for one-shot stem rendering, or keep a Mixer loaded from scene JSON when you need sends, buses, automation, meters, and scene serialization. List the built-in scene presets with mixing_scene_preset_names().

python
import libsonare as sonare

print(sonare.mixing_scene_preset_names())
scene_json = sonare.mixing_scene_preset_json("vocalReverbSend")

offline = sonare.mix_stereo(
    [(vocal_l, vocal_r), (music_l, music_r)],
    sample_rate=48000,
    input_trim_db=[3, 0],
    fader_db=[-3, -12],
    pan=[0, -0.2],
    width=[1, 0.9],
)

# Mixer is not a context manager — call close() when done.
mixer = sonare.Mixer.from_scene_json(scene_json, sample_rate=48000, block_size=512)
try:
    print(mixer.scene_warnings())  # non-fatal: insert params no processor reads (typos)
    block = mixer.process_stereo([vocal_block_l, music_block_l], [vocal_block_r, music_block_r])
    meter = mixer.strip_meter(0, tap="postFader")
    mixer.schedule_fader_automation(0, 48000 * 8, -6, curve="s-curve")
finally:
    mixer.close()

mixer.process_stereo(...) returns a MixerStereoResult named tuple with .left and .right (list[float]) and .sample_rate (int), mirroring the Node/WASM {left, right, sampleRate} shape.
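
Because the result is just two float lists and a sample rate, it can be written to disk with the standard library alone. The sketch below uses a hypothetical StereoResult stand-in that has the documented fields, and assumes samples are in the [-1.0, 1.0] range; values outside it are clipped before conversion to 16-bit PCM.

```python
import io
import struct
import wave
from typing import NamedTuple


class StereoResult(NamedTuple):
    """Hypothetical stand-in with the documented MixerStereoResult fields."""
    left: list
    right: list
    sample_rate: int


def write_wav16(result, target) -> None:
    """Interleave left/right and write 16-bit PCM to a path or file object."""
    frames = bytearray()
    for left_sample, right_sample in zip(result.left, result.right):
        for sample in (left_sample, right_sample):
            clipped = max(-1.0, min(1.0, sample))
            frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # bytes per sample: 16-bit
        wav.setframerate(result.sample_rate)
        wav.writeframes(bytes(frames))


buffer = io.BytesIO()
write_wav16(StereoResult([0.0, 0.5], [0.0, -0.5], 48000), buffer)
```

Passing a file path instead of the BytesIO buffer writes the same data to disk.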

See Mixing Engine for routing concepts, scene presets, and real-time notes.

Projects, Instruments & Live MIDI

The headless-DAW surface is available in Python as well: author arrangements with Project, render them through the built-in instruments, and drive the realtime engine with live MIDI. The dedicated guides carry the depth — this is the Python entry-point map.

| Task | API | Guide |
| --- | --- | --- |
| Author tracks, clips, tempo, markers, undo/redo | Project (a context manager — use with) | Project Editing |
| Render MIDI through the built-in synthesizer | Project.bounce_with_synth_instrument(...), synth_preset_names(), synth_preset_patch(name), SynthPatch | Built-in Synthesizer, Bouncing Projects |
| Render MIDI through a SoundFont | Project.load_soundfont(data), Project.bounce_with_sf2_instrument(...) | SoundFont Player |
| Host your own instrument during a bounce | Project.bounce_with_instruments(...) with the ExternalInstrument protocol — a render(channels, num_frames) callback plus optional prepare/on_event hooks and latency_samples. Python-only. | Bouncing Projects |
| Play instruments live from MIDI events | RealtimeEngine.set_synth_instrument(...), RealtimeEngine.load_soundfont(...), plus the engine's MIDI input queue | MIDI Input |
| Schedule MIDI clips into the live engine, sample-accurately | RealtimeEngine.set_midi_clips([...]) with EngineMidiClipSchedule / EngineMidiEvent, RealtimeEngine.sample_at_ppq(ppq) | Realtime and Streaming |
| Mix the engine's tracks live with lanes, buses, sends, and strips | RealtimeEngine.set_track_lanes(...), set_track_buses(...), set_track_strip_json(...), set_master_strip_json(...), set_bus_strip_json(...), set_solo_mute(...), set_track_strip_pan(...), set_track_strip_pan_law(...), set_track_strip_pan_mode(...), set_track_strip_dual_pan(...), set_track_strip_channel_delay_samples(...), set_track_strip_insert_param_by_name(...), set_master_strip_insert_param_by_name(...), drain_meter_telemetry_wide(...), configure_scope_telemetry(...), drain_scope_telemetry(...) | Realtime and Streaming |

python
import libsonare as sonare

with sonare.Project() as project:
    project.set_sample_rate(48000)
    track = project.add_track(kind="midi")
    # ... add clips and MIDI events (see the Project Editing guide) ...
    audio = project.bounce_with_synth_instrument("e-piano", num_channels=2)

Note that Project supports with for automatic cleanup, while Mixer does not (call mixer.close() explicitly).

For synth preset introspection, synth_preset_patch(name) returns a named catalog preset as a SynthPatch (it raises SonareError for unknown names and accepts a 'va:' routing prefix) so you can inspect and tweak fields before binding it. synth_enum_tables() returns the runtime enum-name tables (dict[str, tuple[str, ...]]) for validating SynthModRouting source/destination names against the loaded build.

Opaque assist sidecars

Project can carry per-project, undoable, module-owned opaque byte blobs (assist sidecars), scoped by module ID, target track, and a region. Set one with project.set_assist_sidecar(module_id, payload, *, schema_version=0, target_track_id=0, region_start_ppq=0.0, region_end_ppq=0.0); read them back with project.assist_sidecar_count(), project.get_assist_sidecar(index) -> AssistSidecar, and project.assist_sidecars(). See Project Editing for the cross-binding details.
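
Since the payload is an opaque byte blob, the owning module decides how to encode it. The sketch below uses compact JSON as one possible encoding. The choice of JSON, the field names, and the helper names are all assumptions for illustration, and the commented call only restates the signature given above with placeholder arguments.

```python
import json


def encode_payload(data: dict) -> bytes:
    """Serialize a module-defined dict to bytes for use as an opaque payload."""
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_payload(payload: bytes) -> dict:
    """Inverse of encode_payload."""
    return json.loads(payload.decode("utf-8"))


payload = encode_payload({"suggestion": "tighten-timing", "strength": 0.5})
# In real use (placeholders: module_id, track_id):
#   project.set_assist_sidecar(module_id, payload, schema_version=1,
#                              target_track_id=track_id,
#                              region_start_ppq=0.0, region_end_ppq=16.0)
restored = decode_payload(payload)
```

Bumping schema_version whenever the encoded fields change lets a reader decide whether it can still decode an older blob.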