librosa Compatibility
This document describes how libsonare functions correspond to Python's librosa library.
Overview
libsonare aims to provide functionality similar to librosa while being optimized for C++ and WebAssembly environments. Most core features use the same algorithms with compatible default parameters.
Feature Comparison
Supported Features
| librosa | libsonare | Notes |
|---|---|---|
| librosa.load() | Audio::from_file() | WAV, MP3 support |
| librosa.resample() | resample() | Uses r8brain |
| librosa.stft() | Spectrogram::compute() | Full compatibility |
| librosa.istft() | Spectrogram::to_audio() | OLA reconstruction |
| librosa.feature.melspectrogram() | MelSpectrogram::compute() | Slaney normalization |
| librosa.feature.mfcc() | MelSpectrogram::mfcc() | DCT-II, liftering |
| librosa.feature.chroma_stft() | Chroma::compute() | STFT-based |
| librosa.onset.onset_strength() | compute_onset_strength() | Spectral flux |
| librosa.beat.beat_track() | BeatAnalyzer | DP-based |
| librosa.beat.tempo() | BpmAnalyzer | Tempogram |
| librosa.effects.hpss() | hpss() | Median filtering |
| librosa.effects.time_stretch() | time_stretch() | Phase vocoder |
| librosa.effects.pitch_shift() | pitch_shift() | WSOLA-like |
Features Not in librosa
| libsonare | Description |
|---|---|
| KeyAnalyzer | Musical key detection (Krumhansl-Schmuckler) |
| ChordAnalyzer | Chord recognition (template matching) |
| SectionAnalyzer | Song structure analysis |
| TimbreAnalyzer | Timbre characteristics |
| DynamicsAnalyzer | Loudness and dynamics |
| RhythmAnalyzer | Time signature, groove |
Function Mapping
STFT
librosa:
S = librosa.stft(
    y,
    n_fft=2048,
    hop_length=512,
    win_length=None,
    window='hann',
    center=True
)
libsonare (C++):
sonare::StftConfig config;
config.n_fft = 2048;
config.hop_length = 512;
config.window = sonare::WindowType::Hann;
config.center = true;
auto spec = sonare::Spectrogram::compute(audio, config);
libsonare (JS):
const result = stft(samples, sampleRate, 2048, 512);
Mel Spectrogram
librosa:
mel = librosa.feature.melspectrogram(
    y=y, sr=sr,
    n_fft=2048, hop_length=512,
    n_mels=128, fmin=0.0, fmax=None,
    htk=False, norm='slaney'
)
mel_db = librosa.power_to_db(mel, ref=np.max)
libsonare:
const mel = melSpectrogram(samples, sampleRate, 2048, 512, 128);
// mel.power - power spectrum
// mel.db - dB scale
MFCC
librosa:
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=2048, hop_length=512, n_mels=128
)
libsonare:
const result = mfcc(samples, sampleRate, 2048, 512, 128, 13);
// result.coefficients - [nMfcc x nFrames]
HPSS
librosa:
y_harm, y_perc = librosa.effects.hpss(y, kernel_size=31)
libsonare:
const result = hpss(samples, sampleRate, 31, 31);
// result.harmonic
// result.percussive
Beat Tracking
librosa:
tempo, beats = librosa.beat.beat_track(y=y, sr=sr, start_bpm=120)
beat_times = librosa.frames_to_time(beats, sr=sr, hop_length=512)
libsonare:
const bpm = detectBpm(samples, sampleRate);
const beats = detectBeats(samples, sampleRate); // Already in seconds
Default Parameters
| Parameter | librosa | libsonare |
|---|---|---|
| sr | 22050 | User-provided |
| n_fft | 2048 | 2048 |
| hop_length | 512 | 512 |
| win_length | n_fft | n_fft |
| window | 'hann' | Hann |
| center | True | true |
| n_mels | 128 | 128 |
| fmin | 0.0 | 0.0 |
| fmax | sr/2 | sr/2 |
| n_mfcc | 20 | 13 |
| n_chroma | 12 | 12 |
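To make the shared framing defaults concrete, the following Python sketch computes the spectrogram shape they imply and converts a frame index to seconds. The formulas are the ones librosa documents for its STFT; that libsonare produces the same frame count is an assumption based on the matching defaults in the table, and the helper names `stft_shape` and `frame_to_time` are illustrative, not part of either library.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=512, center=True):
    """Return (frequency bins, frames) for a real-valued STFT.

    Assumes librosa-style framing: with center=True the signal is padded
    by n_fft // 2 on both sides, so every hop starts a frame.
    """
    n_bins = 1 + n_fft // 2
    if center:
        n_frames = 1 + n_samples // hop_length
    else:
        if n_samples < n_fft:
            raise ValueError("signal is shorter than one frame")
        n_frames = 1 + (n_samples - n_fft) // hop_length
    return n_bins, n_frames


def frame_to_time(frame, sr, hop_length=512):
    """Start time in seconds of a frame index (same rule as frames_to_time)."""
    return frame * hop_length / sr


# One second of audio at librosa's default sample rate.
print(stft_shape(22050))                # (1025, 44)
print(stft_shape(22050, center=False))  # (1025, 40)
print(frame_to_time(43, 22050))
```

Because libsonare takes the sample rate from the caller rather than defaulting to 22050, the same frame index maps to a different time unless the audio is resampled first.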
Mel Scale Formulas
Slaney (librosa default, libsonare default)
For f < 1000 Hz: mel = 3 * f / 200
For f >= 1000 Hz: mel = 15 + 27 * log10(f / 1000) / log10(6.4)
HTK
mel = 2595 * log10(1 + f / 700)
libsonare provides both:
const melSlaney = hzToMel(hz); // Slaney (default)
// HTK available in C++ API
Tolerance Guidelines
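Mel-domain comparisons are only meaningful when both sides use the same mel scale, so it can help to have the two formulas from the Mel Scale Formulas section in executable form. The following is a minimal Python sketch of those formulas as written there; the function names `hz_to_mel_slaney` and `hz_to_mel_htk` are illustrative and not part of librosa or libsonare.

```python
import math


def hz_to_mel_slaney(f_hz):
    """Slaney-style mel: linear below 1000 Hz, logarithmic at and above it."""
    if f_hz < 1000.0:
        return 3.0 * f_hz / 200.0
    return 15.0 + 27.0 * math.log10(f_hz / 1000.0) / math.log10(6.4)


def hz_to_mel_htk(f_hz):
    """HTK-style mel: a single logarithmic curve."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


for f in (440.0, 1000.0, 6400.0):
    print(f, hz_to_mel_slaney(f), hz_to_mel_htk(f))
```

The two branches of the Slaney formula meet at 1000 Hz (both give 15), and the two scales use very different numeric ranges, so values from one cannot be compared against values from the other.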
| Feature | Tolerance | Notes |
|---|---|---|
| STFT magnitude | < 1e-6 | Floating point precision |
| Mel spectrogram | < 1% | Filterbank differences |
| MFCC | < 2% | DCT normalization |
| Chroma | < 5% | Pitch mapping |
| BPM | ±2 BPM | Algorithm differences |
| Beat times | ±50ms | Phase alignment |
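The table does not say how each tolerance is measured, so the sketch below makes two assumptions explicit: percentage tolerances are taken as the largest absolute difference relative to the peak of the librosa reference, and beat times are matched to the nearest detected beat. The helper names are hypothetical, and the inputs are plain Python lists exported from each library.

```python
def max_abs_diff(ref, test):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(ref) != len(test):
        raise ValueError("sequences must have the same length")
    return max((abs(r - t) for r, t in zip(ref, test)), default=0.0)


def peak_relative_diff(ref, test):
    """Largest absolute difference divided by the peak magnitude of ref."""
    peak = max((abs(r) for r in ref), default=0.0)
    if peak == 0.0:
        raise ValueError("reference is empty or all zeros")
    return max_abs_diff(ref, test) / peak


def beats_within(ref_times, test_times, tol_s=0.050):
    """True if every reference beat has a detected beat within tol_s seconds."""
    return all(
        min((abs(r - t) for t in test_times), default=float("inf")) <= tol_s
        for r in ref_times
    )


# Made-up values, only to show the calls.
print(peak_relative_diff([1.0, 0.5, 0.25], [1.004, 0.498, 0.2505]) < 0.01)  # True
print(beats_within([0.5, 1.0, 1.5], [0.52, 0.97, 1.56]))  # False
```

A per-element relative error would be stricter near silence, where small absolute differences become large ratios; normalizing by the peak avoids that.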
Known Differences
1. Resampling
- librosa: Uses resampy (Kaiser best) in older releases; soxr_hq is the default since librosa 0.10
- libsonare: Uses r8brain-free (24-bit quality)
Minimal impact on downstream features.
2. CQT
- librosa: Full CQT implementation
- libsonare: Full CQT and VQT implementation
3. Window Normalization
- librosa: Normalizes window for COLA
- libsonare: Uses raw window values
Use normalize() to correct amplitude differences in iSTFT.
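To illustrate why the amplitude differs without normalization, the sketch below sums the squared, hop-shifted windows that overlap at each output sample. It assumes a periodic Hann window applied at both analysis and synthesis, with the default n_fft=2048 and hop_length=512; whether libsonare's normalize() divides by exactly this quantity is not stated here, so treat this as a model of the effect rather than of the implementation.

```python
import math


def periodic_hann(n_fft):
    """Periodic (FFT-style) Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_fft) for n in range(n_fft)]


def overlap_gain(window, hop_length):
    """Steady-state sum of squared windows for each position within one hop.

    An output sample at a given offset is covered by the window indices
    offset, offset + hop_length, offset + 2 * hop_length, and so on.
    """
    n_fft = len(window)
    return [
        sum(window[k] ** 2 for k in range(offset, n_fft, hop_length))
        for offset in range(hop_length)
    ]


gains = overlap_gain(periodic_hann(2048), 512)
print(min(gains), max(gains))  # both close to 1.5
```

Under these assumptions the gain is constant, so an unnormalized overlap-add reconstruction comes out about 1.5 times louder than the input, and dividing by that constant restores the original level away from the signal edges.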
Migration Guide
Python to TypeScript
Before (Python):
import librosa
y, sr = librosa.load('audio.mp3', sr=22050)
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
print(f"BPM: {tempo}")
After (TypeScript):
import { init, detectBpm, resample } from '@libraz/libsonare';
await init();
// Get audio from AudioContext
const samples = audioBuffer.getChannelData(0);
// Optionally resample to 22050
const resampled = resample(samples, audioBuffer.sampleRate, 22050);
const bpm = detectBpm(resampled, 22050);
console.log(`BPM: ${bpm}`);
Python to C++
Before (Python):
import librosa
y, sr = librosa.load('audio.mp3')
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
After (C++):
#include <sonare/sonare.h>
auto audio = sonare::Audio::from_file("audio.mp3");
auto chroma = sonare::Chroma::compute(audio);
auto energy = chroma.mean_energy();
Performance Comparison
| Operation | librosa (Python) | libsonare (C++) | Speedup |
|---|---|---|---|
| STFT | 13ms | 14ms | tied |
| Mel spectrogram | 20ms | 23ms | tied |
| MFCC | 22ms | 24ms | tied |
| Beat tracking | 36ms | 24ms | 1.5x |
| Chroma | 45ms | 15ms | 2.9x |
| HPSS | 1,762ms | 89ms | 19.7x |
| pYIN | 5,825ms | 474ms | 12.3x |
| Full analysis | 36.4s | 0.67s | ~54x |
Benchmarked on Apple M5 Max (16 cores, 128GB), 73-second WAV at 44100 Hz stereo, all per-feature calls measured standalone from raw audio. See Benchmarks for full methodology and reproduction steps.
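WebAssembly is ~2-3x slower than native C++. Going by the table above, that still leaves it well ahead of Python for the heavier operations (HPSS, pYIN, full analysis), while the lighter ones (STFT, mel spectrogram, MFCC) are likely to be slower than librosa.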
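As a rough estimate only, the sketch below applies the 2-3x WebAssembly factor to some native timings from the table and compares the resulting range with the Python column. The function names are illustrative and no WebAssembly measurements are involved; the inputs are the table values taken at face value.

```python
# (python_ms, native_cpp_ms) copied from the table above.
TIMINGS_MS = {
    "STFT": (13, 14),
    "Beat tracking": (36, 24),
    "Chroma": (45, 15),
    "HPSS": (1762, 89),
    "pYIN": (5825, 474),
}


def wasm_range_ms(native_ms, low=2.0, high=3.0):
    """Estimated WebAssembly time range, assuming a 2-3x slowdown over native."""
    return native_ms * low, native_ms * high


def wasm_vs_python(python_ms, native_ms):
    """Classify the estimated WebAssembly range against the Python timing."""
    best, worst = wasm_range_ms(native_ms)
    if worst < python_ms:
        return "faster"
    if best > python_ms:
        return "slower"
    return "comparable"


for name, (py_ms, cpp_ms) in TIMINGS_MS.items():
    print(name, wasm_range_ms(cpp_ms), wasm_vs_python(py_ms, cpp_ms))
```

With these numbers HPSS and pYIN stay clearly faster than Python even at the 3x end of the range, Chroma is borderline, and STFT and beat tracking fall behind.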