Architecture
This document describes the internal architecture of libsonare.
Read this page once you are comfortable with the Getting Started guide and your language's runtime page. It is an internal map for people extending libsonare or wiring it into a larger system, not a tutorial — if you only need to call an API, start with Getting Started. It shows how public APIs connect to the C++ core.
How to read the layers
The outer API layers are what apps call. The core and feature layers are where reusable signal-processing work happens. Bindings should stay thin: they translate language shapes into the same C++ behavior rather than reimplementing DSP.
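The "thin binding" rule can be illustrated with a minimal sketch. Everything named here (`core_sketch::rms`, `sketch_rms`, the error-code convention) is hypothetical and is not taken from the libsonare headers. The point is only the shape: the wrapper validates arguments and delegates, while the arithmetic lives in one C++ function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

namespace core_sketch {
// Hypothetical core routine (not a libsonare symbol): RMS level of a mono buffer.
inline float rms(const float* samples, std::size_t count) {
  assert(samples != nullptr || count == 0);
  if (count == 0) return 0.0f;
  double sum_sq = 0.0;
  for (std::size_t i = 0; i < count; ++i) {
    sum_sq += static_cast<double>(samples[i]) * samples[i];
  }
  return static_cast<float>(std::sqrt(sum_sq / static_cast<double>(count)));
}
}  // namespace core_sketch

// Hypothetical C-style wrapper: checks arguments, reports failure as an
// error code (0 = success, 1 = invalid argument), and delegates to the core.
extern "C" int sketch_rms(const float* samples, std::size_t count, float* out_rms) {
  if (out_rms == nullptr) return 1;
  if (samples == nullptr && count > 0) return 1;
  *out_rms = core_sketch::rms(samples, count);
  return 0;
}
```

A wrapper for another language would follow the same pattern: convert its native array type to a pointer and length, call the same core function, and translate the result back.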
What You Will Learn
By the end of this page you should be able to:
- trace a public call from WASM, C, the quick API, or sonare.h into analysis, streaming, effects, mastering, mixing, and engine modules;
- identify which source directories own each subsystem;
- understand where shared DSP, feature extraction, realtime processing, and language bindings meet;
- decide whether a change belongs in a core module, a binding wrapper, a demo component, or documentation.
Module Overview
Page Map
| If you are looking at... | Read... |
|---|---|
| analysis/ and feature/ | JavaScript API, Python API, librosa Compatibility |
| analysis/acoustic_analyzer.*, analysis/room_estimator.*, src/acoustic/, or effects/acoustic/ | Room Acoustics, Algorithm References |
| streaming/ | Realtime and Streaming |
| mastering/ | Mastering Processors, DSP Implementation Notes, Mastering Assistant |
| mixing/ | Mixing Engine, Mixing Scene JSON |
| engine/, transport/, automation/, graph/, rt/ | Realtime and Streaming, especially RealtimeEngine |
| editing/ and effects/ | Editing DSP, DSP Implementation Notes |
| sonare_c*.h and binding folders | Binding Parity, Native Bindings, C++ API |
Directory Structure
src/
├── util/ # Level 0: Basic utilities
│ ├── types.h # MatrixView, ErrorCode, enums
│ ├── exception.h # SonareException
│ └── math_utils.h # mean, variance, argmax, etc.
│
├── core/ # Level 1-3: Core DSP
│ ├── convert.h # Hz/Mel/MIDI conversion
│ ├── window.h # Hann, Hamming, Blackman
│ ├── fft.h # KissFFT wrapper
│ ├── spectrum.h # STFT/iSTFT
│ ├── audio.h # Audio buffer
│ ├── audio_io.h # WAV/MP3 loading, optional FFmpeg-backed formats
│ └── resample.h # r8brain resampling
│
├── filters/ # Level 4: Filterbanks
│ ├── mel.h # Mel filterbank
│ ├── chroma.h # Chroma filterbank
│ ├── dct.h # DCT for MFCC
│ └── iir.h # IIR filters
│
├── feature/ # Level 4: Feature extraction
│ ├── mel_spectrogram.h
│ ├── chroma.h
│ ├── cqt.h
│ ├── vqt.h
│ ├── inverse.h
│ ├── spectral.h
│ ├── onset.h
│ └── pitch.h
│
├── effects/ # Level 5: Audio effects
│ ├── hpss.h
│ ├── phase_vocoder.h
│ ├── time_stretch.h
│ ├── pitch_shift.h
│ ├── normalize.h
│ ├── preemphasis.h
│ ├── silence.h
│ ├── decompose.h
│ ├── remix.h
│ ├── delay/ modulation/ reverb/
│ ├── acoustic/ # room_morph
│ └── common/
│
├── acoustic/ # Geometric room acoustics
│ ├── room_model.* room_types.* material.*
│ ├── image_source.* # early reflections
│ ├── late_reverb.* # deterministic late tail
│ └── rir_synthesizer.*
│
├── analysis/ # Level 6: Music analysis
│ ├── music_analyzer.h
│ ├── bpm_analyzer.h
│ ├── key_analyzer.h
│ ├── beat_analyzer.h
│ ├── downbeat_analyzer.h
│ ├── meter_analyzer.h
│ ├── chord_analyzer.h
│ ├── section_analyzer.h
│ ├── boundary_detector.h
│ ├── melody_analyzer.h
│ ├── rhythm_analyzer.h
│ ├── timbre_analyzer.h
│ ├── dynamics_analyzer.h
│ ├── acoustic_analyzer.h
│ ├── room_estimator.h
│ └── ...
│
├── streaming/ # Level 6: Real-time streaming
│ ├── stream_analyzer.h # Main streaming analyzer
│ ├── stream_config.h # Configuration options
│ └── stream_frame.h # Frame and buffer types
│
├── mastering/ # Mastering engine
│ ├── api/ # Chain, registry, 25 presets, 76 solo processors + pair/stereo registries
│ ├── eq/ dynamics/ spectral/ stereo/ final/
│ ├── maximizer/ multiband/ saturation/ repair/
│ ├── match/ assistant/ # Reference match + assistant/profile
│ └── common/ # Shared biquad/loudness helpers
│
├── mixing/ # Mixing engine
│ ├── channel_strip.* # Strip: trim/insert/pan/width/sends
│ ├── bus.* sends.* vca_group.* panner.*
│ └── api/ # Scene JSON + scene presets
│
├── engine/ # Realtime engine (transport/clips/graph)
├── automation/ # Automation lanes + curve shapes
├── metering/ # LUFS, true-peak, phase scope/goniometer
├── graph/ rt/ transport/ # DSP graph, RT-safe primitives, transport
├── editing/ # Pitch editor, voice changer, note stretch
│
├── quick.h # Simple function API
├── sonare.h # Unified include header
├── sonare_c*.h # C API aggregate and module headers
└── wasm/
    └── bindings.cpp # Embind bindings
Data Flow
Audio Analysis Pipeline
Audio Effects Pipeline
What is a phase vocoder?
A phase vocoder is a standard frequency-domain technique for time-stretching audio (or, combined with resampling, pitch-shifting it). It takes the STFT and advances the phase of each frequency bin to fit the new timeline before reconstructing, so a sound can be made longer or shorter while its pitch and spectral character stay largely intact. libsonare uses it for timeStretch / pitchShift and the editing-DSP voice tools.
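The per-bin phase update can be sketched as follows. This is the textbook phase-vocoder step written under stated assumptions, not libsonare's `phase_vocoder.h` implementation; the namespace, function names, and parameter names are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace pv_sketch {
constexpr double kPi = 3.14159265358979323846;

// Wrap a phase value into [-pi, pi).
inline double wrap_phase(double phase) {
  return phase - 2.0 * kPi * std::floor((phase + kPi) / (2.0 * kPi));
}

// Advance the accumulated synthesis phase of every bin by one frame.
// prev_phase / cur_phase: analysis phases of two consecutive STFT frames.
// analysis_hop / synthesis_hop: hop sizes in samples; their ratio is the stretch factor.
inline void advance_phases(const std::vector<double>& prev_phase,
                           const std::vector<double>& cur_phase,
                           std::size_t n_fft, double analysis_hop,
                           double synthesis_hop,
                           std::vector<double>& synth_phase) {
  assert(prev_phase.size() == cur_phase.size());
  assert(synth_phase.size() == cur_phase.size());
  assert(n_fft > 0 && analysis_hop > 0.0);
  for (std::size_t k = 0; k < cur_phase.size(); ++k) {
    // Phase a sinusoid centred exactly on bin k would gain over one analysis hop.
    const double expected =
        2.0 * kPi * static_cast<double>(k) * analysis_hop / static_cast<double>(n_fft);
    // Deviation from that expectation, wrapped so it reflects the true frequency offset.
    const double deviation = wrap_phase(cur_phase[k] - prev_phase[k] - expected);
    // Rescale the measured advance to the synthesis hop and accumulate it.
    synth_phase[k] += (expected + deviation) * (synthesis_hop / analysis_hop);
  }
}
}  // namespace pv_sketch
```

The bin magnitudes are kept as they are and recombined with `synth_phase` before the inverse STFT; a synthesis hop larger than the analysis hop lengthens the sound.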
Streaming Pipeline
The streaming pipeline processes audio in real time, maintaining overlap state between chunks.
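"Overlap state" can be pictured with a small framing helper. The class below is a hypothetical sketch, not the StreamAnalyzer API: it accepts chunks of any size, emits every complete analysis frame, and keeps only the tail samples that the next frame still needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper illustrating overlap retention between chunks.
class OverlapFramer {
 public:
  OverlapFramer(std::size_t frame_length, std::size_t hop_length)
      : frame_length_(frame_length), hop_length_(hop_length) {
    assert(hop_length_ > 0 && hop_length_ <= frame_length_);
  }

  // Append a chunk and return every complete frame that is now available.
  std::vector<std::vector<float>> push(const std::vector<float>& chunk) {
    pending_.insert(pending_.end(), chunk.begin(), chunk.end());
    std::vector<std::vector<float>> frames;
    std::size_t start = 0;
    while (start + frame_length_ <= pending_.size()) {
      const auto first = pending_.begin() + static_cast<std::ptrdiff_t>(start);
      frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(frame_length_));
      start += hop_length_;
    }
    // Drop consumed samples; what remains is the overlap carried to the next call.
    pending_.erase(pending_.begin(),
                   pending_.begin() + static_cast<std::ptrdiff_t>(start));
    return frames;
  }

  std::size_t pending_size() const { return pending_.size(); }

 private:
  std::size_t frame_length_;
  std::size_t hop_length_;
  std::vector<float> pending_;
};
```

With a frame length of 4 and a hop of 2, pushing three samples yields no frame; pushing three more yields two frames and leaves two samples pending for the next chunk.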
Progressive Estimation
As more audio streams in, the pipeline accumulates chroma and onset data, so its BPM/key estimates have more evidence to work from. Estimates are refreshed periodically (default: BPM every 10s, key every 5s) and grow more confident the longer the stream runs.
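The periodic refresh can be sketched as a sample counter per estimate. `RefreshSchedule` is a hypothetical name; only the 10 s and 5 s intervals come from the text above, and how the real pipeline schedules its updates is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scheduler: reports when an estimate is due for a refresh,
// measured in stream time (samples received), not wall-clock time.
class RefreshSchedule {
 public:
  RefreshSchedule(double sample_rate, double interval_seconds)
      : interval_samples_(static_cast<std::size_t>(sample_rate * interval_seconds)) {
    assert(interval_samples_ > 0);
  }

  // Account for newly received samples; returns true when a refresh is due.
  bool advance(std::size_t new_samples) {
    since_refresh_ += new_samples;
    if (since_refresh_ < interval_samples_) return false;
    since_refresh_ %= interval_samples_;  // carry the remainder into the next interval
    return true;
  }

 private:
  std::size_t interval_samples_;
  std::size_t since_refresh_ = 0;
};
```

A streaming loop would hold one schedule per estimate, for example `RefreshSchedule bpm(sample_rate, 10.0)` and `RefreshSchedule key(sample_rate, 5.0)`, and re-run the corresponding estimator over the accumulated onset or chroma data whenever `advance` returns true.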
Key Design Decisions
Lazy Initialization
MusicAnalyzer initializes sub-analyzers on demand. Each intermediate (STFT, chroma, onset envelope, etc.) is computed the first time it's needed and reused afterwards.
// BPM only (computes onset envelope)
float bpm = analyzer.bpm();
// Key detection triggers chroma computation
Key key = analyzer.key();
// Full analysis fills in what's left
AnalysisResult result = analyzer.analyze();
Why this matters
Asking only for the key does not force chord recognition or section detection to run. A later call to analyze() reuses any intermediates already computed, so no FFT is repeated.
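One common way to get this compute-once behavior is to cache each intermediate in a `std::optional` (C++17). The sketch below is an assumption about the general pattern, not MusicAnalyzer's actual code; the class name and the stand-in "envelope level" computation are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical analyzer that computes an intermediate on first use and caches it.
class LazyAnalyzerSketch {
 public:
  explicit LazyAnalyzerSketch(std::vector<float> samples)
      : samples_(std::move(samples)) {}

  // Stand-in intermediate: mean absolute amplitude of the buffer.
  float envelope_level() {
    if (!envelope_level_) {
      ++compute_count_;  // counts how often the expensive path actually runs
      double sum = 0.0;
      for (float s : samples_) sum += std::fabs(s);
      envelope_level_ = samples_.empty()
                            ? 0.0f
                            : static_cast<float>(sum / static_cast<double>(samples_.size()));
    }
    return *envelope_level_;
  }

  int compute_count() const { return compute_count_; }

 private:
  std::vector<float> samples_;
  std::optional<float> envelope_level_;  // empty until first requested
  int compute_count_ = 0;
};
```

Calling `envelope_level()` twice runs the loop once; any later method that needs the same intermediate reads the cached value.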
Zero-Copy Audio Slicing
Audio uses shared_ptr with offset/size for zero-copy slicing:
auto full = Audio::from_file("song.mp3");
// Both share same underlying buffer
auto intro = full.slice(0, 30); // 0-30 sec
auto chorus = full.slice(60, 90); // 60-90 sec
WASM Compatibility
"Decoded samples" means raw audio amplitude values (a Float32Array), not the bytes of a .mp3 or .wav file. Decoding is the step that turns the compressed file into those values.
The npm/WebAssembly package exposes mostly sample-based APIs, and most calls expect decoded mono Float32Array samples. For encoded bytes, Audio.fromMemory(...) decodes WAV/MP3 in memory, while Audio.fromMemoryWithBrowserFallback(...) can fall back to the Web Audio API or another browser codec path before calling the same sample-based methods.
WASM builds avoid native file I/O and FFmpeg-backed decoding. Runtime behavior is single-threaded unless a future build explicitly enables browser threading.
librosa Compatibility
Many DSP defaults intentionally mirror common librosa values, but libsonare is not a drop-in replacement. In particular, libsonare usually requires the caller to provide the sample rate; it does not implicitly resample to 22050 Hz the way librosa.load() does by default.
| Parameter | Default |
|---|---|
| sample_rate | User-provided |
| n_fft | 2048 |
| hop_length | 512 |
| n_mels | 128 |
| fmin | 0 |
| fmax | sr/2 |
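The table can be read as a parameter struct in which only the sample rate lacks a default. The struct below is a hypothetical sketch (C++17) that mirrors the table values; it is not a libsonare type, and the real configuration structs may be organized differently.

```cpp
#include <cassert>
#include <optional>

// Hypothetical parameter bundle mirroring the defaults listed above.
struct SpectrogramParamsSketch {
  int sample_rate;            // no default: the caller must supply it
  int n_fft = 2048;
  int hop_length = 512;
  int n_mels = 128;
  float fmin = 0.0f;
  std::optional<float> fmax;  // unset means "use sr/2"

  // Resolve fmax to the Nyquist frequency when the caller did not set it.
  float resolved_fmax() const {
    return fmax.value_or(static_cast<float>(sample_rate) / 2.0f);
  }
};
```

Constructing `SpectrogramParamsSketch{44100}` keeps every other default and resolves `fmax` to 22050 Hz, which makes the dependency on the caller's sample rate explicit.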
Third-Party Libraries
| Library | Purpose | License |
|---|---|---|
| KissFFT | FFT | BSD-3-Clause |
| Eigen3 | Matrix ops | MPL-2.0 |
| dr_libs | WAV decode | Public Domain |
| minimp3 | MP3 decode | CC0-1.0 |
| FFmpeg | Optional extended file decoding | LGPL/GPL depending on linked build |
| r8brain | Resampling | MIT |
WASM Compilation
Output: ~2,986 KB WASM (~1,070 KB gzipped) plus the JS glue;
~3,210 KB total (~1,121 KB gzipped) — see src/wasm/meta.json
Build: Emscripten with Embind
Flags: -sWASM=1 -sMODULARIZE=1 -sEXPORT_ES6=1
The full mastering + mixing + analysis surface accounts for the bundle size; an analysis-only build would be smaller.