Skip to content

librosa Compatibility

This document describes how libsonare functions correspond to Python's librosa library.

How to read this page

Overview

libsonare provides many of the same MIR (music information retrieval — extracting tempo, key, pitch, and other musical features from audio) building blocks as librosa, but targets C++, Python bindings, Node.js native bindings, and WebAssembly.

It is not a drop-in replacement for librosa. APIs, defaults, and numerical details can differ.

The libsonare test suite contains reference checks against librosa 0.11 for selected features.

Task-Oriented Map

Use this table before scanning the long compatibility matrix:

If your librosa code does...Use in libsonareWatch out for
Load files and run one-off analysisAudio.from_file(...), then detect_bpm, detect_key, analyzeBrowser/WASM APIs usually take decoded PCM; use Web Audio or Audio.fromMemory* to turn encoded bytes into samples
Build spectrogram features for MLstft, melSpectrogram / mel_spectrogram, mfcc, pcenPass n_fft, hop_length, n_mels, and n_mfcc explicitly when comparing
Compute chroma or harmonic featureschroma, nnlsChroma / nnls_chroma, tonnetznnlsChroma is not a strict librosa.feature.chroma_cqt clone
Track tempo, beats, onsets, or pulseonsetEnvelope, detectOnsets, detectBeats, detectBpm, tempogram, plplibsonare returns beat/onset times in seconds for high-level detectors
Edit or transform audiohpss, hpssWithResidual / hpss_with_residual, timeStretch / time_stretch, phaseVocoder / phase_vocoder, pitchShift / pitch_shift, remix, trimSilence / trim_silencePhase-vocoder, HPSS, and resampling details differ from librosa
Convert units or reshape feature arraysframesToSamples, samplesToFrames, frameSignal, padCenter, fixLength, vectorNormalizeJS names are camelCase, Python names are snake_case
Reconstruct approximate audio from featuresmelToAudio, mfccToAudioGriffin-Lim reconstruction is lossy and phase-estimated

What is PCEN?

PCEN (Per-Channel Energy Normalization) is an alternative to log-scaling a mel spectrogram. It applies automatic gain control per frequency band, which suppresses steady background noise and evens out loudness — often improving robustness for ML and sound-event detection.

Validation Coverage

Compatibility here means "checked against reference behavior", not "the APIs are identical".

The libsonare repository has librosa comparison tests for STFT, mel/MFCC, chroma, CQT, pitch, tuning, onset/beat/tempo, tempogram/PLP, PCEN, spectral contrast/poly features/zero crossings, dB conversion, framing/sequence helpers, silence trim/split, HPSS, harmonic/decompose/NN filtering/remix/phase vocoder, tonnetz, and related utilities.

Use the tolerances below as migration guidance. They are not exact numerical guarantees for every input.

Feature Comparison

Supported Features

Core / IO

librosalibsonareNotes
librosa.load()Audio::from_file()WAV/MP3 by default; FFmpeg-supported formats in FFmpeg builds
librosa.resample()resample()librosa 0.11 defaults to soxr; libsonare uses r8brain
librosa.stft()Spectrogram::compute() / stft()Compatible defaults; small numerical differences are expected
librosa.istft()Spectrogram::to_audio()OLA reconstruction
librosa.power_to_db()powerToDb() / power_to_db()ref, amin, top_db all supported
librosa.amplitude_to_db()amplitudeToDb() / amplitude_to_db()Same parameters
librosa.db_to_power()dbToPower() / db_to_power()Inverse of power_to_db
librosa.db_to_amplitude()dbToAmplitude() / db_to_amplitude()Inverse of amplitude_to_db

Effects

librosalibsonareNotes
librosa.effects.hpss()hpss()Median filtering
librosa.effects.hpss() with residual handlinghpssWithResidual() / hpss_with_residual()Returns harmonic, percussive, and residual signals
librosa.effects.harmonic()harmonic()Harmonic component only
librosa.effects.percussive()percussive()Percussive component only
librosa.effects.time_stretch()time_stretch() / timeStretch()Phase vocoder
librosa.phase_vocoder()phaseVocoder() / phase_vocoder()Direct phase-vocoder time scaling
librosa.effects.pitch_shift()pitch_shift() / pitchShift()Time stretch plus resampling
librosa.effects.remix()remix()Reorder or concatenate interval slices; interval units are samples
librosa.effects.preemphasis()preemphasis()coef, optional zi
librosa.effects.deemphasis()deemphasis()Inverse pre-emphasis
librosa.effects.trim()trimSilence() / trim_silence()WASM/Node return { audio, startSample, endSample }; Python returns (audio, start_sample, end_sample)
librosa.effects.split()splitSilence() / split_silence()WASM/Node return a flat Int32Array; Python returns list[tuple[int, int]]

Features (spectral / pitch / chroma)

librosalibsonareNotes
librosa.feature.melspectrogram()MelSpectrogram::compute() / melSpectrogram()Slaney normalization
librosa.feature.mfcc()MelSpectrogram::mfcc() / mfcc()DCT-II; specify n_mfcc explicitly when matching librosa
librosa.feature.chroma_stft()Chroma::compute() / chroma()STFT-based; each frame is scaled so its largest of the 12 pitch-class values is 1.0 (L-infinity / max normalization), matching librosa's default norm=np.inf
librosa.feature.spectral_centroid()spectralCentroid() / spectral_centroid()Per-frame
librosa.feature.spectral_bandwidth()spectralBandwidth() / spectral_bandwidth()Per-frame
librosa.feature.spectral_rolloff()spectralRolloff() / spectral_rolloff()roll_percent supported
librosa.feature.spectral_flatness()spectralFlatness() / spectral_flatness()Per-frame
librosa.feature.spectral_contrast()spectralContrast() / spectral_contrast()Matrix shape (n_bands + 1) x n_frames
librosa.feature.poly_features()polyFeatures() / poly_features()Matrix shape (order + 1) x n_frames
librosa.feature.zero_crossing_rate()zeroCrossingRate() / zero_crossing_rate()Per-frame
librosa.zero_crossings()zeroCrossings() / zero_crossings()Zero-crossing sample indices
librosa.feature.rms()rmsEnergy() / rms_energy()Per-frame
librosa.feature.tonnetz()tonnetz()Input: row-major chromagram
librosa.cqt()cqt()Constant-Q transform magnitude
librosa.vqt()vqt()Variable-Q transform; gamma controls Q
librosa.feature.chroma_cqt() (closest)nnlsChroma() / nnls_chroma()NNLS note-activation chroma; no exact librosa equivalent
librosa.feature.chroma_cens()chromaCens() / chroma_cens()CENS (Energy Normalized Statistics) chroma; smoothed/L1-normalized
(no exact librosa equivalent)bassChroma() / bass_chroma()CQT-based low-register chroma for bass/inversion estimation
librosa.hybrid_cqt()hybridCqt() / hybrid_cqt()Hybrid CQT (CQT for low bins, pseudo-CQT for high bins)
librosa.pseudo_cqt()pseudoCqt() / pseudo_cqt()Approximate (lower-fidelity) CQT
librosa.feature.tempogram()tempogram()Autocorrelation (default) or mode='cosine' window-local cosine similarity
librosa.feature.fourier_tempogram()fourierTempogram() / fourier_tempogram()Complex Fourier tempogram
(tempo-octave-invariant variant)cyclicTempogram() / cyclic_tempogram()Cyclic tempogram; no exact librosa equivalent
librosa.feature.tempogram_ratio()tempogramRatio() / tempogram_ratio()Tempogram ratio features
librosa.pcen()pcen()time_constant, gain, bias, power, eps
librosa.pyin()pitchPyin() / pitch_pyin()Probabilistic YIN; fillNa / fill_na controls whether unvoiced f0 is NaN or 0
librosa.yin()pitchYin() / pitch_yin()YIN; fillNa / fill_na controls whether unvoiced f0 is NaN or 0
librosa.pitch_tuning()pitchTuning() / pitch_tuning()Tuning offset from frequencies
librosa.estimate_tuning()estimateTuning() / estimate_tuning()Tuning offset from audio

Decomposition and loudness

librosa / standardlibsonareNotes
librosa.decompose.decompose()decompose()NMF factor matrices from a row-major spectrogram
librosa.decompose.decompose(init=...)decomposeWithInit() / decompose_with_init()NMF with selectable init: init='random' (default) or init='nndsvd' (SVD warm start)
librosa.decompose.nn_filter()nnFilter() / nn_filter()Nearest-neighbor filtering
ITU-R BS.1770 / EBU R128lufsInterleaved() / lufs_interleaved()Multichannel integrated loudness from interleaved samples
EBU Tech 3342 LRAebur128LoudnessRange() / ebur128_loudness_range()Loudness range in LU
What are NMF and NNLS here?

NMF (non-negative matrix factorization) breaks a spectrogram into a small set of recurring spectral "templates" and how strongly each is active over time — useful for pulling apart layered sounds. NNLS (non-negative least squares) is the related fit used by nnlsChroma to estimate how much each musical note is active, giving a cleaner chroma than a raw spectrogram.

Inverse reconstruction

These mirror librosa.feature.inverse.* and use Griffin-Lim for phase, so round-trips are approximate. See Inverse Features.

librosalibsonareNotes
librosa.feature.inverse.mel_to_stft()melToStft() / mel_to_stft()Mel power → linear STFT power; custom Mel ranges / HTK round-trip via fmin/fmax/htk
(mel_to_stft + librosa.griffinlim)melToAudio() / mel_to_audio()Mel power → audio (Griffin-Lim)
librosa.feature.inverse.mfcc_to_mel()mfccToMel() / mfcc_to_mel()MFCC → mel power
librosa.feature.inverse.mfcc_to_audio()mfccToAudio() / mfcc_to_audio()MFCC → audio (Griffin-Lim)

Onset / Beat / Tempo

librosalibsonareNotes
librosa.onset.onset_strength()onsetEnvelope() / onset_envelope()Spectral flux (C++ free function is compute_onset_strength())
librosa.onset.onset_strength_multi()onsetStrengthMulti() / onset_strength_multi()Multi-band onset strength (default nBands=3); WASM/Node return { nBands, nFrames, data }
librosa.onset.onset_detect()detectOnsets() / detect_onsets()Returns onset times
librosa.beat.beat_track()BeatAnalyzer / detectBeats()DP-based
librosa.beat.tempo()BpmAnalyzer / detectBpm()Tempogram
librosa.beat.plp()plp()Predominant local pulse
librosa.util.peak_pick()peakPick() / peak_pick()Returns peak indices

Utilities

librosalibsonareNotes
librosa.util.frame()frameSignal() / frame_signal()Row-major frames
librosa.util.pad_center()padCenter() / pad_center()Pad to size, centered
librosa.util.fix_length()fixLength() / fix_length()Crop or pad
librosa.util.fix_frames()fixFrames() / fix_frames()Bound frame indices
librosa.util.normalize()vectorNormalize() / vector_normalize()norm_type: 0=inf, 1=L1, 2=L2, 3=power
librosa.frames_to_samples()framesToSamples() / frames_to_samples()Frame → sample index
librosa.samples_to_frames()samplesToFrames() / samples_to_frames()Sample → frame index
librosa.frames_to_time()framesToTime() / frames_to_time()Frame → seconds
librosa.time_to_frames()timeToFrames() / time_to_frames()Seconds → frame

Higher-level analyzers not in librosa

librosa's strength is low-level DSP — higher-level music understanding is typically delegated to separate tools. libsonare bundles those higher-level tasks in one library.

libsonareDescription
KeyAnalyzerMusical key detection (Krumhansl-Schmuckler profiles)
ChordAnalyzerChord recognition (108 chord-type templates)
SectionAnalyzerSong structure analysis (intro / verse / chorus, etc.)
TimbreAnalyzerTimbre characteristics (brightness, warmth, density, …)
DynamicsAnalyzerLoudness, dynamic range, crest factor
RhythmAnalyzerTime signature, groove, syncopation, regularity

Streaming

StreamAnalyzer — progressive real-time BPM / key / chord estimates — is also libsonare-specific. librosa has no streaming API.

Function Mapping

The libsonare snippets below assume import libsonare as sonare (Python) or the named imports from @libraz/libsonare (Browser).

STFT

librosa:

python
S = librosa.stft(
    y,
    n_fft=2048,
    hop_length=512,
    win_length=None,
    window='hann',
    center=True
)

libsonare:

typescript
const result = stft(samples, sampleRate, 2048, 512);
python
result = sonare.stft(samples, sample_rate, n_fft=2048, hop_length=512)
cpp
sonare::StftConfig config;
config.n_fft = 2048;
config.hop_length = 512;
config.window = sonare::WindowType::Hann;
config.center = true;

auto spec = sonare::Spectrogram::compute(audio, config);
bash
# No CLI command dumps the full STFT matrix; use spectral summaries from the CLI.
sonare spectral song.wav --n-fft 2048 --hop-length 512 --json

Mel Spectrogram

librosa:

python
mel = librosa.feature.melspectrogram(
    y=y, sr=sr,
    n_fft=2048, hop_length=512,
    n_mels=128, fmin=0.0, fmax=None,
    htk=False, norm='slaney'
)
mel_db = librosa.power_to_db(mel, ref=np.max)

libsonare:

typescript
const mel = melSpectrogram(samples, sampleRate, 2048, 512, 128);
// mel.power - power spectrum
// mel.db - dB scale
python
mel = sonare.mel_spectrogram(samples, sample_rate, n_fft=2048, hop_length=512, n_mels=128)
# mel.power - power spectrum
# mel.db - dB scale
bash
sonare mel song.wav --n-fft 2048 --hop-length 512 --n-mels 128 --json
MEL · SPECTRALIDLE
Mel spectrogram — frequency the way we hear it

The same STFT, re-mapped onto the mel scale: fine resolution low down where the ear discriminates, coarser up high. The harmonic stack and formant bands of this vowel-like tone sit closer together than on a linear axis — the view our ears (and most ML front-ends) actually use.

MFCC

librosa:

python
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=2048, hop_length=512, n_mels=128
)

libsonare:

typescript
const result = mfcc(samples, sampleRate, 2048, 512, 128, 13);
// result.coefficients - [nMfcc x nFrames]
python
result = sonare.mfcc(samples, sample_rate, n_fft=2048, hop_length=512, n_mels=128, n_mfcc=13)
# result.coefficients - [n_mfcc x n_frames]
bash
# No CLI command dumps MFCC coefficients; use inverse MFCC preview from the source-built C++ CLI.
sonare mfcc-to-audio song.wav --n-fft 2048 --hop-length 512 --n-mels 128 --n-mfcc 13 -o mfcc-preview.wav
What is DCT-II, and why does MFCC use it?

MFCC stands for Mel-Frequency Cepstral Coefficients. After building a mel spectrogram and taking its log, libsonare applies a DCT-II (type-II Discrete Cosine Transform) along the frequency axis. The DCT is like an FFT that produces only real numbers and packs most of the energy into the first few coefficients, so keeping just n_mfcc of them gives a compact summary of the spectral shape (the "cepstrum") while discarding fine detail. That is why MFCCs are a small, ML-friendly timbre descriptor rather than a full spectrum.

HPSS

librosa:

python
y_harm, y_perc = librosa.effects.hpss(y, kernel_size=31)

libsonare:

typescript
const result = hpss(samples, sampleRate, 31, 31);
// result.harmonic
// result.percussive
python
result = sonare.hpss(samples, sample_rate, kernel_harmonic=31, kernel_percussive=31)
# result.harmonic
# result.percussive
bash
sonare hpss song.wav --json

Beat Tracking

librosa:

python
tempo, beats = librosa.beat.beat_track(y=y, sr=sr, start_bpm=120)
beat_times = librosa.frames_to_time(beats, sr=sr, hop_length=512)

libsonare:

typescript
const bpm = detectBpm(samples, sampleRate);
const beats = detectBeats(samples, sampleRate);  // Already in seconds
python
bpm = sonare.detect_bpm(samples, sample_rate)
beats = sonare.detect_beats(samples, sample_rate)  # Already in seconds
bash
sonare bpm song.wav --json
sonare beats song.wav --json

Default Parameters

Parameterlibrosalibsonare
sr2205022050 (default for all standalone analysis/feature/loudness helpers; pass the buffer's true rate for lufs/true-peak. Audio/Spectrogram class methods use the loaded file's rate.)
n_fft20482048
hop_length512512
win_lengthn_fftn_fft
window'hann'Hann
centerTruetrue
n_mels128128
fmin0.00.0 for mel/inverse helpers; pitch helpers default to 65.0 Hz; CQT/VQT default to C1 (32.70319566 Hz)
fmaxsr/2sr/2
n_mfcc2020 for the standalone mfcc() / Audio.mfcc() across all bindings; the analyzer/timbre extraction path uses 13
n_chroma1212

When matching librosa output, pass parameters explicitly instead of relying on wrapper defaults.

Mel Scale Formulas

Slaney (librosa default, libsonare default)

For f < 1000 Hz:  mel = 3 * f / 200
For f >= 1000 Hz: mel = 15 + 27 * log10(f / 1000) / log10(6.4)

HTK

mel = 2595 * log10(1 + f / 700)

libsonare provides Slaney conversion helpers publicly and supports HTK Mel filterbank generation through Mel configuration in the C++ core:

typescript
const melSlaney = hzToMel(hz);     // Slaney (default)
python
mel_slaney = sonare.hz_to_mel(hz)  # Slaney (default)
bash
# No direct Hz/Mel conversion CLI; this conversion is exposed through the APIs above.

Reference Tolerance Guidelines

These are practical comparison thresholds seen in libsonare's reference tests against librosa 0.11. They are regression-test guidance, not guaranteed accuracy bounds for arbitrary audio.

FeatureToleranceNotes
STFT magnitude~0.1% for aggregate values; individual bins can differ moreFloat32 vs float64, windowing, and near-zero bins
Mel filterbank~0.1% for sums/max values in reference testsFilterbank generation
MFCC~10-15% for mean/std reference checksDCT and log-Mel details
Chroma< 5%Pitch mapping
BPMwithin a few percent; half/double-tempo cases can occurTempo-candidate differences
Beat timestens of ms to ~80 ms in impulse-train testsPhase alignment

Known Differences

1. Resampling

  • librosa 0.11: Defaults to soxr_hq and supports multiple resamplers
  • libsonare: Uses r8brain-free (24-bit quality)

This can slightly change downstream features after resampling.

2. CQT

  • librosa: CQT/VQT APIs with librosa-specific parameterization
  • libsonare: CQT and VQT implementations with libsonare-specific APIs

3. Window Normalization

  • librosa: Normalizes window for COLA
  • libsonare: Uses raw window values

Expect small amplitude differences after an inverse STFT. If your workflow depends on the absolute output level matching librosa, rescale the reconstructed audio (for example, to the source's peak or RMS) after reconstruction.

What is COLA?

COLA stands for Constant Overlap-Add. When you reconstruct audio from an STFT, the overlapping windowed frames are added back together. If those overlapping windows sum to a constant value at every sample position, the reconstruction has even gain everywhere — that is the COLA condition. librosa normalizes the window so this holds exactly; libsonare uses the raw window, so the summed level can differ slightly. It only matters if you depend on the absolute output level after an inverse STFT.

Migration Guide

Python to TypeScript

librosa.load() defaults to mono audio at 22050 Hz. In the browser, decode the file first, downmix to mono if needed, and resample explicitly if you want to match that behavior.

Before (Python):

python
import librosa

y, sr = librosa.load('audio.mp3', sr=22050)
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
print(f"BPM: {tempo}")

After (TypeScript):

typescript
import { init, detectBpm, resample } from '@libraz/libsonare';

await init();

// Get audio from AudioContext
// Downmix stereo to mono before analysis if needed.
const samples = audioBuffer.getChannelData(0);

// Optionally resample to 22050
const resampled = resample(samples, audioBuffer.sampleRate, 22050);

const bpm = detectBpm(resampled, 22050);
console.log(`BPM: ${bpm}`);

Python to C++

Audio::from_file() keeps the decoded file sample rate. Resample explicitly if you want to match librosa.load(..., sr=22050).

Before (Python):

python
import librosa

y, sr = librosa.load('audio.mp3')
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

After (C++):

cpp
#include <sonare.h>

auto audio = sonare::Audio::from_file("audio.mp3");
auto chroma = sonare::Chroma::compute(audio);
auto energy = chroma.mean_energy();

Performance Comparison

The short version: core spectral transforms (STFT, Mel, MFCC) are roughly tied with librosa, while iterative algorithms (HPSS, pYIN) and the full analysis pipeline are dramatically faster. The per-feature numbers, hardware details, methodology, and reproduction steps live in one place — see Benchmarks — so this page doesn't carry a copy that can drift out of date.

WebAssembly is generally slower than native C++; exact performance depends on the feature, browser, input length, and build settings. See Benchmarks for measured results.