Architecture

This document describes the internal architecture of libsonare.

Read this page once you are comfortable with the Getting Started guide and your language's runtime page. It is an internal map for people extending libsonare or wiring it into a larger system, not a tutorial: it shows how the public APIs connect to the C++ core. If you only need to call an API, Getting Started is the better entry point.

How to read the layers

The outer API layers are what apps call. The core and feature layers are where reusable signal-processing work happens. Bindings should stay thin: they translate language shapes into the same C++ behavior rather than reimplementing DSP.
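
As a rough illustration of what "thin" means here, the sketch below forwards a C-style entry point to a C++ routine and does nothing except translate argument shapes and error reporting. Every name in it (`core_rms`, `demo_rms`, `DemoStatus`) is hypothetical and chosen for the example; none of them is taken from libsonare's actual C API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical core routine: the only place where signal processing happens.
float core_rms(const std::vector<float>& samples) {
  if (samples.empty()) throw std::invalid_argument("empty input");
  double sum = 0.0;
  for (float s : samples) sum += static_cast<double>(s) * s;
  return static_cast<float>(std::sqrt(sum / static_cast<double>(samples.size())));
}

// Hypothetical status codes for the C-facing layer.
enum DemoStatus {
  DEMO_OK = 0,
  DEMO_INVALID_ARGUMENT = 1,
  DEMO_INTERNAL_ERROR = 2
};

// Thin wrapper: validates pointers, converts (pointer, length) into the C++
// container the core expects, and maps exceptions to status codes.
// It contains no DSP of its own.
extern "C" int demo_rms(const float* data, std::size_t length, float* out) {
  if (data == nullptr || out == nullptr) return DEMO_INVALID_ARGUMENT;
  try {
    *out = core_rms(std::vector<float>(data, data + length));
    return DEMO_OK;
  } catch (const std::invalid_argument&) {
    return DEMO_INVALID_ARGUMENT;
  } catch (...) {
    return DEMO_INTERNAL_ERROR;
  }
}
```

Because the wrapper only reshapes inputs and outputs, a fix to the hypothetical `core_rms` would reach every binding built on top of it without touching the binding code.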

What You Will Learn

By the end of this page you should be able to:

trace a public call from WASM, C, quick API, or sonare.h into analysis, streaming, effects, mastering, mixing, and engine modules;
identify which source directories own each subsystem;
understand where shared DSP, feature extraction, realtime processing, and language bindings meet;
decide whether a change belongs in a core module, a binding wrapper, a demo component, or documentation.

Module Overview

Page Map

If you are looking at...	Read...
`analysis/` and `feature/`	JavaScript API, Python API, librosa Compatibility
`analysis/acoustic_analyzer.`, `analysis/room_estimator.`, `src/acoustic/`, or `effects/acoustic/`	Room Acoustics, Algorithm References
`streaming/`	Realtime and Streaming
`mastering/`	Mastering Processors, DSP Implementation Notes, Mastering Assistant
`mixing/`	Mixing Engine, Mixing Scene JSON
`engine/`, `transport/`, `automation/`, `graph/`, `rt/`	Realtime and Streaming, especially `RealtimeEngine`
`editing/` and `effects/`	Editing DSP, DSP Implementation Notes
`sonare_c*.h` and binding folders	Binding Parity, Native Bindings, C++ API

Directory Structure

src/
├── util/               # Level 0: Basic utilities
│   ├── types.h         # MatrixView, ErrorCode, enums
│   ├── exception.h     # SonareException
│   └── math_utils.h    # mean, variance, argmax, etc.
│
├── core/               # Level 1-3: Core DSP
│   ├── convert.h       # Hz/Mel/MIDI conversion
│   ├── window.h        # Hann, Hamming, Blackman
│   ├── fft.h           # KissFFT wrapper
│   ├── spectrum.h      # STFT/iSTFT
│   ├── audio.h         # Audio buffer
│   ├── audio_io.h      # WAV/MP3 loading, optional FFmpeg-backed formats
│   └── resample.h      # r8brain resampling
│
├── filters/            # Level 4: Filterbanks
│   ├── mel.h           # Mel filterbank
│   ├── chroma.h        # Chroma filterbank
│   ├── dct.h           # DCT for MFCC
│   └── iir.h           # IIR filters
│
├── feature/            # Level 4: Feature extraction
│   ├── mel_spectrogram.h
│   ├── chroma.h
│   ├── cqt.h
│   ├── vqt.h
│   ├── inverse.h
│   ├── spectral.h
│   ├── onset.h
│   └── pitch.h
│
├── effects/            # Level 5: Audio effects
│   ├── hpss.h
│   ├── phase_vocoder.h
│   ├── time_stretch.h
│   ├── pitch_shift.h
│   ├── normalize.h
│   ├── preemphasis.h
│   ├── silence.h
│   ├── decompose.h
│   ├── remix.h
│   ├── delay/ modulation/ reverb/
│   ├── acoustic/       # room_morph
│   └── common/
│
├── acoustic/           # Geometric room acoustics
│   ├── room_model.* room_types.* material.*
│   ├── image_source.*  # early reflections
│   ├── late_reverb.*   # deterministic late tail
│   └── rir_synthesizer.*
│
├── analysis/           # Level 6: Music analysis
│   ├── music_analyzer.h
│   ├── bpm_analyzer.h
│   ├── key_analyzer.h
│   ├── beat_analyzer.h
│   ├── downbeat_analyzer.h
│   ├── meter_analyzer.h
│   ├── chord_analyzer.h
│   ├── section_analyzer.h
│   ├── boundary_detector.h
│   ├── melody_analyzer.h
│   ├── rhythm_analyzer.h
│   ├── timbre_analyzer.h
│   ├── dynamics_analyzer.h
│   ├── acoustic_analyzer.h
│   ├── room_estimator.h
│   └── ...
│
├── streaming/          # Level 6: Real-time streaming
│   ├── stream_analyzer.h   # Main streaming analyzer
│   ├── stream_config.h     # Configuration options
│   └── stream_frame.h      # Frame and buffer types
│
├── mastering/          # Mastering engine
│   ├── api/            # Chain, registry, 25 presets, 76 solo processors + pair/stereo registries
│   ├── eq/ dynamics/ spectral/ stereo/ final/
│   ├── maximizer/ multiband/ saturation/ repair/
│   ├── match/ assistant/                 # Reference match + assistant/profile
│   └── common/        # Shared biquad/loudness helpers
│
├── mixing/             # Mixing engine
│   ├── channel_strip.* # Strip: trim/insert/pan/width/sends
│   ├── bus.* sends.* vca_group.* panner.*
│   └── api/            # Scene JSON + scene presets
│
├── engine/             # Realtime engine (transport/clips/graph)
├── automation/         # Automation lanes + curve shapes
├── metering/           # LUFS, true-peak, phase scope/goniometer
├── graph/  rt/  transport/   # DSP graph, RT-safe primitives, transport
├── editing/            # Pitch editor, voice changer, note stretch
│
├── quick.h             # Simple function API
├── sonare.h            # Unified include header
├── sonare_c*.h         # C API aggregate and module headers
└── wasm/
    └── bindings.cpp    # Embind bindings

Data Flow

Audio Analysis Pipeline

Audio Effects Pipeline

What is a phase vocoder?

A phase vocoder is the standard way to time-stretch audio (or, combined with resampling, pitch-shift it) without obvious artifacts. It takes the STFT and advances the phase of each frequency bin to fit the new timeline before reconstructing, so a sound can be made longer or shorter while its pitch and spectral character stay intact. libsonare uses it for timeStretch / pitchShift and the editing-DSP voice tools.
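
The phase step described above can be sketched for a single frequency bin as follows. This is the textbook phase-propagation formula, not code taken from libsonare; the function names and the `analysis_hop` / `synthesis_hop` parameters are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

constexpr double kTwoPi = 6.283185307179586476925286766559;

// Wraps an angle into the principal range around zero (about -pi to pi).
double wrap_phase(double phi) {
  return phi - kTwoPi * std::round(phi / kTwoPi);
}

// Returns the synthesis phase of one bin for the current frame.
// analysis_hop is the hop used when the STFT was taken; synthesis_hop is the
// hop used when frames are overlap-added back, so synthesis_hop / analysis_hop
// is the stretch factor.
double advance_phase(double prev_analysis_phase, double curr_analysis_phase,
                     double prev_synthesis_phase, int bin, int n_fft,
                     int analysis_hop, int synthesis_hop) {
  assert(n_fft > 0 && analysis_hop > 0 && synthesis_hop > 0);
  // Phase advance per sample of a sinusoid sitting exactly on the bin centre.
  const double bin_freq = kTwoPi * bin / n_fft;
  // Advance we would see over one analysis hop if the bin were exactly on centre.
  const double expected = bin_freq * analysis_hop;
  // Measured advance minus expected advance, wrapped to the principal range.
  const double deviation =
      wrap_phase(curr_analysis_phase - prev_analysis_phase - expected);
  // Estimated instantaneous frequency of the component in this bin.
  const double true_freq = bin_freq + deviation / analysis_hop;
  // Propagate the synthesis phase over the new hop length.
  return prev_synthesis_phase + true_freq * synthesis_hop;
}
```

When the two hops are equal, the result matches the measured phase advance up to a multiple of 2π, which is why an unstretched signal passes through unchanged.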

Streaming Pipeline

The streaming pipeline processes audio in real time, maintaining overlap state between chunks.
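
A minimal sketch of what "maintaining overlap state between chunks" can look like: a framer that buffers incoming chunks, emits every complete analysis frame, and keeps the unconsumed tail for the next call. `OverlapFramer` is a hypothetical class written for this example and is not the interface of `stream_analyzer.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class OverlapFramer {
 public:
  OverlapFramer(std::size_t frame_size, std::size_t hop)
      : frame_size_(frame_size), hop_(hop) {
    assert(hop_ > 0 && hop_ <= frame_size_);
  }

  // Appends a chunk of any length and returns every complete frame that is
  // now available. Samples still needed by the next frame stay buffered.
  std::vector<std::vector<float>> push(const std::vector<float>& chunk) {
    pending_.insert(pending_.end(), chunk.begin(), chunk.end());
    std::vector<std::vector<float>> frames;
    std::size_t start = 0;
    while (start + frame_size_ <= pending_.size()) {
      const auto first = pending_.begin() + static_cast<std::ptrdiff_t>(start);
      frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(frame_size_));
      start += hop_;
    }
    // Drop only the samples that no future frame will touch.
    pending_.erase(pending_.begin(),
                   pending_.begin() + static_cast<std::ptrdiff_t>(start));
    return frames;
  }

  // Number of samples carried over to the next call.
  std::size_t pending() const { return pending_.size(); }

 private:
  std::size_t frame_size_;
  std::size_t hop_;
  std::vector<float> pending_;
};
```

With this shape the caller can feed chunks of any size, and the frames produced are the same as if the whole signal had been framed at once.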

Progressive Estimation

As more audio streams in, the pipeline accumulates chroma and onset data, so its BPM/key estimates have more evidence to work from. Estimates are refreshed periodically (default: BPM every 10s, key every 5s) and grow more confident the longer the stream runs.
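
One way to picture the periodic refresh is a sample-counting clock per estimate. The sketch below is an assumption about how such a schedule could be kept, not a description of libsonare's scheduler; `RefreshClock` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Counts streamed samples and reports when a refresh interval has elapsed.
class RefreshClock {
 public:
  RefreshClock(double interval_sec, int sample_rate)
      : interval_samples_(static_cast<std::size_t>(interval_sec * sample_rate)) {
    assert(interval_samples_ > 0);
    next_due_ = interval_samples_;
  }

  // Accounts for n newly received samples. Returns true when at least one
  // interval boundary was crossed, so the caller should refresh its estimate.
  bool advance(std::size_t n) {
    seen_ += n;
    if (seen_ < next_due_) return false;
    // A large chunk may cross several boundaries; fire once and move on.
    while (next_due_ <= seen_) next_due_ += interval_samples_;
    return true;
  }

 private:
  std::size_t interval_samples_;
  std::size_t next_due_ = 0;
  std::size_t seen_ = 0;
};
```

With the defaults quoted above, a stream would hold one clock at 10 seconds for BPM and one at 5 seconds for key, each advanced by the size of every incoming chunk.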

Key Design Decisions

Lazy Initialization

MusicAnalyzer initializes sub-analyzers on demand. Each intermediate (STFT, chroma, onset envelope, etc.) is computed the first time it's needed and reused afterwards.

cpp

// BPM only (computes onset envelope)
float bpm = analyzer.bpm();

// Key detection triggers chroma computation
Key key = analyzer.key();

// Full analysis fills in what's left
AnalysisResult result = analyzer.analyze();

Why this matters

Asking just for the key does not force chord recognition or section detection to run. Conversely, calling analyze() once reuses any intermediates already computed — no redundant FFTs.
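
The compute-once-and-reuse pattern can be sketched with `std::optional` (C++17). `LazyAnalyzer` and its members are hypothetical stand-ins: the absolute-value pass plays the role of an expensive intermediate such as an STFT or onset envelope, and the run counter exists only to make the caching visible.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

class LazyAnalyzer {
 public:
  explicit LazyAnalyzer(std::vector<float> samples)
      : samples_(std::move(samples)) {}

  // Stand-in for an expensive intermediate: computed on first use, then cached.
  const std::vector<float>& magnitude() {
    if (!magnitude_) {
      ++magnitude_runs_;
      std::vector<float> out;
      out.reserve(samples_.size());
      for (float s : samples_) out.push_back(s < 0.0f ? -s : s);
      magnitude_ = std::move(out);
    }
    return *magnitude_;
  }

  // Two derived results that both reuse the cached intermediate.
  float peak() {
    float p = 0.0f;
    for (float m : magnitude()) {
      if (m > p) p = m;
    }
    return p;
  }

  float mean_level() {
    const std::vector<float>& m = magnitude();
    if (m.empty()) return 0.0f;
    double sum = 0.0;
    for (float v : m) sum += v;
    return static_cast<float>(sum / static_cast<double>(m.size()));
  }

  // How many times the intermediate was actually computed.
  int magnitude_runs() const { return magnitude_runs_; }

 private:
  std::vector<float> samples_;
  std::optional<std::vector<float>> magnitude_;
  int magnitude_runs_ = 0;
};
```

Calling `peak()` and then `mean_level()` leaves `magnitude_runs()` at 1, which mirrors the behavior described above: the second request finds the intermediate already in place.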

Zero-Copy Audio Slicing

Audio holds its samples through a shared_ptr together with an offset and a size, so a slice is a new view onto the same buffer rather than a copy:

cpp

auto full = Audio::from_file("song.mp3");

// Both slices share the same underlying buffer as `full`
auto intro = full.slice(0, 30);     // 0-30 sec
auto chorus = full.slice(60, 90);   // 60-90 sec
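
To make the mechanism concrete, here is a minimal sketch of a view type built on `std::shared_ptr` plus an offset and size. `AudioView` is a hypothetical class for illustration; the real `Audio` class may differ in layout, naming, and how it clamps out-of-range slices.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class AudioView {
 public:
  AudioView(std::vector<float> samples, int sample_rate)
      : buffer_(std::make_shared<const std::vector<float>>(std::move(samples))),
        offset_(0),
        size_(buffer_->size()),
        sample_rate_(sample_rate) {}

  // Returns a view of [start_sec, end_sec), clamped to this view's length.
  // Only the shared_ptr, offset, and size are copied; no samples move.
  AudioView slice(double start_sec, double end_sec) const {
    assert(start_sec >= 0.0 && start_sec <= end_sec);
    std::size_t begin = static_cast<std::size_t>(start_sec * sample_rate_);
    std::size_t end = static_cast<std::size_t>(end_sec * sample_rate_);
    if (begin > size_) begin = size_;
    if (end > size_) end = size_;
    AudioView view(*this);
    view.offset_ = offset_ + begin;
    view.size_ = end - begin;
    return view;
  }

  const float* data() const { return buffer_->data() + offset_; }
  std::size_t size() const { return size_; }

  // Number of views currently keeping the buffer alive.
  long owners() const { return buffer_.use_count(); }

 private:
  std::shared_ptr<const std::vector<float>> buffer_;
  std::size_t offset_;
  std::size_t size_;
  int sample_rate_;
};
```

Because every view holds a reference to the buffer, a slice stays valid even after the view it was cut from goes out of scope.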

WASM Compatibility

"Decoded samples" means raw audio amplitude values (a Float32Array), not the bytes of a .mp3 or .wav file — decoding is the step that turns the compressed file into those values. Most WASM calls expect samples that are already decoded.

The npm/WebAssembly package exposes mostly sample-based APIs that take decoded mono samples as a Float32Array. For encoded bytes, Audio.fromMemory(...) decodes WAV/MP3 in memory, while Audio.fromMemoryWithBrowserFallback(...) can fall back to the Web Audio API or another browser codec path before calling the same sample-based methods.

WASM builds avoid native file I/O and FFmpeg-backed decoding. Runtime behavior is single-threaded unless a future build explicitly enables browser threading.

librosa Compatibility

Many DSP defaults intentionally mirror common librosa values, but libsonare is not a drop-in replacement. In particular, libsonare usually requires the caller to provide the sample rate; it does not implicitly resample to 22050 Hz the way librosa.load() does by default.

Parameter	Default
sample_rate	User-provided
n_fft	2048
hop_length	512
n_mels	128
fmin	0
fmax	sr/2
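
The table can be read as a configuration whose only required input is the sample rate. The sketch below encodes the listed defaults in a hypothetical `SpectralDefaults` struct; the struct, the negative-`fmax` sentinel, and the helper functions are assumptions for illustration and are not libsonare types.

```cpp
#include <cassert>

// Defaults from the table above. The sample rate is deliberately absent:
// the caller supplies it, and nothing is resampled implicitly.
struct SpectralDefaults {
  int n_fft = 2048;
  int hop_length = 512;
  int n_mels = 128;
  float fmin = 0.0f;
  float fmax = -1.0f;  // hypothetical sentinel meaning "use sr/2"
};

// Resolves fmax against the caller's sample rate (sr/2 when left unset).
float resolve_fmax(const SpectralDefaults& cfg, int sample_rate) {
  assert(sample_rate > 0);
  return cfg.fmax > 0.0f ? cfg.fmax : sample_rate / 2.0f;
}

// Number of non-redundant frequency bins in a real-input FFT of size n_fft.
int num_frequency_bins(const SpectralDefaults& cfg) {
  return cfg.n_fft / 2 + 1;
}
```

For a 44.1 kHz file the resolved `fmax` is 22050 Hz, whereas the same file loaded through librosa's default path would first be resampled to 22050 Hz and end up with an `fmax` of 11025 Hz. That difference is the main thing to check when comparing outputs between the two libraries.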

Third-Party Libraries

Library	Purpose	License
KissFFT	FFT	BSD-3-Clause
Eigen3	Matrix ops	MPL-2.0
dr_libs	WAV decode	Public Domain
minimp3	MP3 decode	CC0-1.0
FFmpeg	Optional extended file decoding	LGPL/GPL depending on linked build
r8brain	Resampling	MIT

WASM Compilation

Output: ~2,986 KB WASM (~1,070 KB gzipped) plus the JS glue;
        ~3,210 KB total (~1,121 KB gzipped) — see src/wasm/meta.json
Build:  Emscripten with Embind
Flags:  -sWASM=1 -sMODULARIZE=1 -sEXPORT_ES6=1

The full mastering + mixing + analysis surface accounts for the bundle size; an analysis-only build would be smaller.
