
Architecture

This document describes the internal architecture of libsonare.

Read this page once you are comfortable with the Getting Started guide and your language's runtime page. It is an internal map for people extending libsonare or wiring it into a larger system, and it shows how the public APIs connect to the C++ core. It is not a tutorial: if you only need to call an API, start with Getting Started.

How to read the layers

The outer API layers are what apps call. The core and feature layers are where reusable signal-processing work happens. Bindings should stay thin: they translate language shapes into the same C++ behavior rather than reimplementing DSP.
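To make "thin binding" concrete, here is a rough sketch of a C entry point that wraps a C++ core routine. The wrapper only converts argument shapes (pointer plus length into a container) and turns C++ exceptions into status codes, so no DSP is duplicated. Every name here (`sonare_sketch::rms`, `sketch_rms`, the `SKETCH_*` codes) is hypothetical and is not part of libsonare's actual C API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sonare_sketch {
// Hypothetical core routine: the actual signal processing lives here, in C++.
float rms(const std::vector<float>& samples) {
  if (samples.empty()) throw std::invalid_argument("empty input");
  double sum = 0.0;
  for (float s : samples) sum += static_cast<double>(s) * s;
  return static_cast<float>(std::sqrt(sum / samples.size()));
}
}  // namespace sonare_sketch

// Hypothetical status codes for the C boundary.
enum SketchStatus {
  SKETCH_OK = 0,
  SKETCH_INVALID_ARGUMENT = 1,
  SKETCH_INTERNAL_ERROR = 2
};

// Thin C wrapper: it translates the C calling convention into the core's
// types and keeps exceptions from crossing the C boundary.
extern "C" int sketch_rms(const float* data, std::size_t length, float* out) {
  if (data == nullptr || out == nullptr) return SKETCH_INVALID_ARGUMENT;
  try {
    std::vector<float> samples(data, data + length);  // copy kept for simplicity
    *out = sonare_sketch::rms(samples);
    return SKETCH_OK;
  } catch (const std::invalid_argument&) {
    return SKETCH_INVALID_ARGUMENT;
  } catch (...) {
    return SKETCH_INTERNAL_ERROR;
  }
}
```

Under this pattern, a Python, JavaScript, or other binding would call the same core function, which keeps behavior identical across languages.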

What You Will Learn

By the end of this page you should be able to:

  • trace a public call from WASM, C, quick API, or sonare.h into analysis, streaming, effects, mastering, mixing, and engine modules;
  • identify which source directories own each subsystem;
  • understand where shared DSP, feature extraction, realtime processing, and language bindings meet;
  • decide whether a change belongs in a core module, a binding wrapper, a demo component, or documentation.

Module Overview

Page Map

| If you are looking at... | Read... |
|---|---|
| `analysis/` and `feature/` | JavaScript API, Python API, librosa Compatibility |
| `analysis/acoustic_analyzer.*`, `analysis/room_estimator.*`, `src/acoustic/`, or `effects/acoustic/` | Room Acoustics, Algorithm References |
| `streaming/` | Realtime and Streaming |
| `mastering/` | Mastering Processors, DSP Implementation Notes, Mastering Assistant |
| `mixing/` | Mixing Engine, Mixing Scene JSON |
| `engine/`, `transport/`, `automation/`, `graph/`, `rt/` | Realtime and Streaming, especially RealtimeEngine |
| `editing/` and `effects/` | Editing DSP, DSP Implementation Notes |
| `sonare_c*.h` and binding folders | Binding Parity, Native Bindings, C++ API |

Directory Structure

```text
src/
├── util/               # Level 0: Basic utilities
│   ├── types.h         # MatrixView, ErrorCode, enums
│   ├── exception.h     # SonareException
│   └── math_utils.h    # mean, variance, argmax, etc.
│
├── core/               # Level 1-3: Core DSP
│   ├── convert.h       # Hz/Mel/MIDI conversion
│   ├── window.h        # Hann, Hamming, Blackman
│   ├── fft.h           # KissFFT wrapper
│   ├── spectrum.h      # STFT/iSTFT
│   ├── audio.h         # Audio buffer
│   ├── audio_io.h      # WAV/MP3 loading, optional FFmpeg-backed formats
│   └── resample.h      # r8brain resampling
│
├── filters/            # Level 4: Filterbanks
│   ├── mel.h           # Mel filterbank
│   ├── chroma.h        # Chroma filterbank
│   ├── dct.h           # DCT for MFCC
│   └── iir.h           # IIR filters
│
├── feature/            # Level 4: Feature extraction
│   ├── mel_spectrogram.h
│   ├── chroma.h
│   ├── cqt.h
│   ├── vqt.h
│   ├── inverse.h
│   ├── spectral.h
│   ├── onset.h
│   └── pitch.h
│
├── effects/            # Level 5: Audio effects
│   ├── hpss.h
│   ├── phase_vocoder.h
│   ├── time_stretch.h
│   ├── pitch_shift.h
│   ├── normalize.h
│   ├── preemphasis.h
│   ├── silence.h
│   ├── decompose.h
│   ├── remix.h
│   ├── delay/ modulation/ reverb/
│   ├── acoustic/       # room_morph
│   └── common/
│
├── acoustic/           # Geometric room acoustics
│   ├── room_model.* room_types.* material.*
│   ├── image_source.*  # early reflections
│   ├── late_reverb.*   # deterministic late tail
│   └── rir_synthesizer.*
│
├── analysis/           # Level 6: Music analysis
│   ├── music_analyzer.h
│   ├── bpm_analyzer.h
│   ├── key_analyzer.h
│   ├── beat_analyzer.h
│   ├── downbeat_analyzer.h
│   ├── meter_analyzer.h
│   ├── chord_analyzer.h
│   ├── section_analyzer.h
│   ├── boundary_detector.h
│   ├── melody_analyzer.h
│   ├── rhythm_analyzer.h
│   ├── timbre_analyzer.h
│   ├── dynamics_analyzer.h
│   ├── acoustic_analyzer.h
│   ├── room_estimator.h
│   └── ...
│
├── streaming/          # Level 6: Real-time streaming
│   ├── stream_analyzer.h   # Main streaming analyzer
│   ├── stream_config.h     # Configuration options
│   └── stream_frame.h      # Frame and buffer types
│
├── mastering/          # Mastering engine
│   ├── api/            # Chain, registry, 25 presets, 76 solo processors + pair/stereo registries
│   ├── eq/ dynamics/ spectral/ stereo/ final/
│   ├── maximizer/ multiband/ saturation/ repair/
│   ├── match/ assistant/                 # Reference match + assistant/profile
│   └── common/        # Shared biquad/loudness helpers
│
├── mixing/             # Mixing engine
│   ├── channel_strip.* # Strip: trim/insert/pan/width/sends
│   ├── bus.* sends.* vca_group.* panner.*
│   └── api/            # Scene JSON + scene presets
│
├── engine/             # Realtime engine (transport/clips/graph)
├── automation/         # Automation lanes + curve shapes
├── metering/           # LUFS, true-peak, phase scope/goniometer
├── graph/  rt/  transport/   # DSP graph, RT-safe primitives, transport
├── editing/            # Pitch editor, voice changer, note stretch
│
├── quick.h             # Simple function API
├── sonare.h            # Unified include header
├── sonare_c*.h         # C API aggregate and module headers
└── wasm/
    └── bindings.cpp    # Embind bindings
```

Data Flow

Audio Analysis Pipeline

Audio Effects Pipeline

What is a phase vocoder?

A phase vocoder is a standard way to time-stretch audio (or, combined with resampling, to pitch-shift it) while keeping artifacts low. It takes the STFT and advances the phase of each frequency bin to fit the new timeline before reconstructing, so a sound can be made longer or shorter while its pitch and spectral character stay intact. libsonare uses it for timeStretch / pitchShift and the editing-DSP voice tools.
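The phase-advance step at the heart of a phase vocoder can be sketched in a few lines. This follows the textbook formulation: estimate each bin's true frequency from the phase difference between two analysis frames, then accumulate the synthesis phase at the synthesis hop. It is not libsonare's implementation, and `wrap_phase` and `advance_phases` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Wrap an angle into the interval [-pi, pi).
double wrap_phase(double x) {
  return x - 2.0 * kPi * std::floor((x + kPi) / (2.0 * kPi));
}

// One phase-vocoder step. prev_phase and cur_phase are the bin phases of two
// analysis frames taken analysis_hop samples apart. synth_phase is the running
// output phase, advanced here by synthesis_hop samples of each bin's true frequency.
void advance_phases(const std::vector<double>& prev_phase,
                    const std::vector<double>& cur_phase, std::size_t n_fft,
                    double analysis_hop, double synthesis_hop,
                    std::vector<double>& synth_phase) {
  const std::size_t n_bins = cur_phase.size();
  assert(prev_phase.size() == n_bins && synth_phase.size() == n_bins);
  for (std::size_t k = 0; k < n_bins; ++k) {
    // Bin-centre frequency in radians per sample.
    const double omega_k = 2.0 * kPi * static_cast<double>(k) / n_fft;
    // How far the measured phase advance deviates from the expected one.
    const double deviation =
        wrap_phase(cur_phase[k] - prev_phase[k] - omega_k * analysis_hop);
    // Instantaneous (true) frequency of the partial in this bin.
    const double omega_true = omega_k + deviation / analysis_hop;
    synth_phase[k] += omega_true * synthesis_hop;
  }
}
```

With synthesis_hop larger than analysis_hop, the output frames are spread further apart, so the sound lasts longer. Because each partial keeps advancing at its own true frequency, the pitch is preserved.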

Time stretch — changing length, not pitch

Time stretching is the complement of pitch shifting: it changes how long the audio lasts while leaving the pitch alone. Below a rate of 1.0 the clip slows down and grows longer; above 1.0 it speeds up and shrinks. Drum hits spread out or bunch together in time, but the spectrum barely changes, so the groove changes tempo without a chipmunk effect.
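The length arithmetic behind this, and behind the "combined with resampling" route to pitch shifting, can be sketched as follows. It uses the common stretch-then-resample recipe (the one librosa's `pitch_shift` uses): stretch by rate = 2^(-n/12), then resample by the same factor so the original length returns, n semitones higher. Whether libsonare follows exactly this recipe is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Output length of a time stretch: rate > 1 shortens, rate < 1 lengthens.
std::size_t stretched_length(std::size_t n_samples, double rate) {
  assert(rate > 0.0);
  return static_cast<std::size_t>(std::llround(n_samples / rate));
}

// Stretch rate used by the stretch-then-resample pitch-shift recipe.
double pitch_shift_rate(double n_semitones) {
  return std::pow(2.0, -n_semitones / 12.0);
}

// Length after resampling by `ratio` (target rate / source rate).
std::size_t resampled_length(std::size_t n_samples, double ratio) {
  assert(ratio > 0.0);
  return static_cast<std::size_t>(std::llround(n_samples * ratio));
}
```

For +12 semitones the rate is 0.5. One second at 44.1 kHz is first stretched to 88,200 samples, then resampled back to 44,100 samples, and the result plays an octave higher at its original length.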


Streaming Pipeline

The streaming pipeline processes audio in real time, maintaining overlap state between chunks.
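Here is a minimal sketch of what "maintaining overlap state between chunks" means. A framer keeps the unconsumed tail of previous chunks, so frames that straddle a chunk boundary are still produced, and the frame sequence is identical however the input happens to be chunked. The `ChunkFramer` class is hypothetical; libsonare's `StreamAnalyzer` has its own interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical framer: cuts a sample stream into frames of frame_size
// samples spaced hop samples apart, across chunk boundaries.
class ChunkFramer {
 public:
  ChunkFramer(std::size_t frame_size, std::size_t hop)
      : frame_size_(frame_size), hop_(hop) {
    assert(hop > 0 && hop <= frame_size);
  }

  // Appends a chunk and returns every frame that is now complete.
  std::vector<std::vector<float>> push(const std::vector<float>& chunk) {
    pending_.insert(pending_.end(), chunk.begin(), chunk.end());
    std::vector<std::vector<float>> frames;
    std::size_t start = 0;
    while (start + frame_size_ <= pending_.size()) {
      frames.emplace_back(pending_.begin() + start,
                          pending_.begin() + start + frame_size_);
      start += hop_;
    }
    // Drop consumed samples but keep the overlap needed by the next frame.
    pending_.erase(pending_.begin(), pending_.begin() + start);
    return frames;
  }

 private:
  std::size_t frame_size_;
  std::size_t hop_;
  std::vector<float> pending_;  // state carried over between push() calls
};
```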

Progressive Estimation

As more audio streams in, the pipeline accumulates chroma and onset data, so its BPM/key estimates have more evidence to work from. Estimates are refreshed periodically (default: BPM every 10s, key every 5s) and grow more confident the longer the stream runs.
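The periodic refresh can be sketched as one sample counter per estimate. The 10 s (BPM) and 5 s (key) intervals are the defaults quoted above. The `RefreshSchedule` class is hypothetical and is not libsonare's stream configuration API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical schedule that fires once per interval_s seconds of audio.
class RefreshSchedule {
 public:
  RefreshSchedule(double interval_s, int sample_rate)
      : interval_samples_(static_cast<std::uint64_t>(interval_s * sample_rate)) {
    assert(interval_samples_ > 0);
  }

  // Advances by n samples and returns how many refreshes became due.
  int advance(std::uint64_t n) {
    elapsed_ += n;
    int due = 0;
    while (elapsed_ >= next_) {
      ++due;
      next_ += interval_samples_;
    }
    return due;
  }

 private:
  std::uint64_t interval_samples_;
  std::uint64_t elapsed_ = 0;
  std::uint64_t next_ = interval_samples_;  // first refresh after one interval
};
```

Counting samples rather than wall-clock time keeps the schedule deterministic. The same input produces refreshes at the same stream positions no matter how fast chunks arrive.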

Key Design Decisions

Lazy Initialization

MusicAnalyzer initializes sub-analyzers on demand. Each intermediate (STFT, chroma, onset envelope, etc.) is computed the first time it's needed and reused afterwards.

```cpp
// Assumes `analyzer` is an existing MusicAnalyzer built from your audio.

// BPM only (computes onset envelope)
float bpm = analyzer.bpm();

// Key detection triggers chroma computation
Key key = analyzer.key();

// Full analysis fills in what's left
AnalysisResult result = analyzer.analyze();
```

Why this matters

Asking just for the key does not force chord recognition or section detection to run. Conversely, calling analyze() once reuses any intermediates already computed — no redundant FFTs.
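The caching pattern can be sketched with `std::optional`: each intermediate is computed on first access and memoized, and every later accessor reuses it. The `LazyAnalyzer` class, its counters, and its placeholder computations are hypothetical stand-ins for MusicAnalyzer's internals, not its real code.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical analyzer that memoizes each intermediate on first use.
class LazyAnalyzer {
 public:
  explicit LazyAnalyzer(std::vector<float> samples)
      : samples_(std::move(samples)) {}

  const std::vector<float>& stft() {
    if (!stft_) { ++stft_runs; stft_ = samples_; }  // placeholder for a real STFT
    return *stft_;
  }

  const std::vector<float>& onset_envelope() {  // the BPM path
    if (!onset_) { ++onset_runs; onset_ = stft(); }  // placeholder, derived from STFT
    return *onset_;
  }

  const std::vector<float>& chroma() {  // the key path
    if (!chroma_) { ++chroma_runs; chroma_ = stft(); }  // placeholder, derived from STFT
    return *chroma_;
  }

  // Instrumentation: how many times each intermediate was computed.
  int stft_runs = 0, onset_runs = 0, chroma_runs = 0;

 private:
  std::vector<float> samples_;
  std::optional<std::vector<float>> stft_, onset_, chroma_;
};
```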

Zero-Copy Audio Slicing

Audio uses shared_ptr with offset/size for zero-copy slicing:

```cpp
auto full = Audio::from_file("song.mp3");

// Both slices share the same underlying buffer
auto intro = full.slice(0, 30);     // 0-30 sec
auto chorus = full.slice(60, 90);   // 60-90 sec
```
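The shared_ptr plus offset/size idea can be sketched independently of libsonare's `Audio` class. In the hypothetical `AudioView` below, slicing by seconds adjusts only the offset and length and shares ownership of the same buffer. No samples are copied, and the buffer stays alive as long as any slice does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical view type: one shared buffer plus a [offset, offset + size) window.
class AudioView {
 public:
  AudioView(std::vector<float> samples, int sample_rate)
      : buffer_(std::make_shared<const std::vector<float>>(std::move(samples))),
        offset_(0), size_(buffer_->size()), sample_rate_(sample_rate) {}

  // Slice [start_s, end_s) in seconds, clamped to this view. No copy is made.
  AudioView slice(double start_s, double end_s) const {
    auto to_index = [&](double t) {
      const double idx = std::max(0.0, t) * sample_rate_;
      return std::min(size_, static_cast<std::size_t>(idx));
    };
    const std::size_t begin = to_index(start_s);
    const std::size_t end = std::max(begin, to_index(end_s));
    AudioView view = *this;  // copies the shared_ptr, not the samples
    view.offset_ = offset_ + begin;
    view.size_ = end - begin;
    return view;
  }

  const float* data() const { return buffer_->data() + offset_; }
  std::size_t size() const { return size_; }
  long use_count() const { return buffer_.use_count(); }

 private:
  std::shared_ptr<const std::vector<float>> buffer_;
  std::size_t offset_;
  std::size_t size_;
  int sample_rate_;
};
```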

WASM Compatibility

"Decoded samples" means raw audio amplitude values (a Float32Array), not the bytes of a .mp3 or .wav file — decoding is the step that turns the compressed file into those values. Most WASM calls expect samples that are already decoded.

The npm/WebAssembly package exposes mostly sample-based APIs, and most calls expect decoded mono Float32Array samples. For encoded bytes, Audio.fromMemory(...) decodes WAV/MP3 in memory, while Audio.fromMemoryWithBrowserFallback(...) can fall back to the Web Audio API or another browser codec path before calling the same sample-based methods.
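If a decoder returns interleaved stereo, the samples must be reduced to mono before they go to the sample-based calls. Averaging the two channels is one common choice, shown here in C++ for illustration. The function name is hypothetical, and whether libsonare's own decoders downmix exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages interleaved L/R samples into a mono buffer of half the length.
std::vector<float> downmix_stereo(const std::vector<float>& interleaved) {
  assert(interleaved.size() % 2 == 0);
  std::vector<float> mono(interleaved.size() / 2);
  for (std::size_t i = 0; i < mono.size(); ++i) {
    mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
  }
  return mono;
}
```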

WASM builds avoid native file I/O and FFmpeg-backed decoding. Runtime behavior is single-threaded unless a future build explicitly enables browser threading.

librosa Compatibility

Many DSP defaults intentionally mirror common librosa values, but libsonare is not a drop-in replacement. In particular, libsonare usually requires the caller to provide the sample rate; it does not implicitly resample to 22050 Hz the way librosa.load() does by default.

| Parameter | Default |
|---|---|
| sample_rate | User-provided |
| n_fft | 2048 |
| hop_length | 512 |
| n_mels | 128 |
| fmin | 0 |
| fmax | sr/2 |
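Because libsonare does not resample implicitly, the time resolution implied by these defaults depends on the sample rate you pass. The sketch below computes the bin count, hop duration, and frame count for n_fft = 2048 and hop_length = 512. The frame count assumes librosa's centered-STFT convention (1 + n / hop), which is an assumption about libsonare's framing. `StftShape` and `stft_shape` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct StftShape {
  std::size_t n_bins;    // frequency bins: n_fft / 2 + 1
  std::size_t n_frames;  // frame count, assuming centered (padded) framing
  double hop_seconds;    // time between consecutive frames
};

StftShape stft_shape(std::size_t n_samples, double sample_rate,
                     std::size_t n_fft = 2048, std::size_t hop_length = 512) {
  assert(hop_length > 0 && sample_rate > 0.0);
  return {n_fft / 2 + 1, 1 + n_samples / hop_length, hop_length / sample_rate};
}
```

One second of audio at 44100 Hz gives 1025 bins and 87 frames about 11.6 ms apart. The same second at 22050 Hz gives 44 frames about 23.2 ms apart, so results computed at different sample rates are not frame-aligned.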

Third-Party Libraries

| Library | Purpose | License |
|---|---|---|
| KissFFT | FFT | BSD-3-Clause |
| Eigen3 | Matrix ops | MPL-2.0 |
| dr_libs | WAV decode | Public Domain |
| minimp3 | MP3 decode | CC0-1.0 |
| FFmpeg | Optional extended file decoding | LGPL/GPL depending on linked build |
| r8brain | Resampling | MIT |

WASM Compilation

```text
Output: ~2,986 KB WASM (~1,070 KB gzipped) plus the JS glue;
        ~3,210 KB total (~1,121 KB gzipped); see src/wasm/meta.json
Build:  Emscripten with Embind
Flags:  -sWASM=1 -sMODULARIZE=1 -sEXPORT_ES6=1
```

The full mastering + mixing + analysis surface accounts for the bundle size; an analysis-only build would be smaller.