Python API

libsonare provides Python bindings using cffi for high-performance audio analysis on desktop platforms. Pre-built wheels are available on PyPI for Linux (x86_64, aarch64) and macOS (Apple Silicon).

Installation

Requires Python 3.11 or later (3.11, 3.12, 3.13).

bash

pip install libsonare

This also installs the sonare CLI command. See CLI Reference for details.

Building from Source (alternative)

If pre-built wheels are not available for your platform, you can build from source:

bash

git clone https://github.com/libraz/libsonare.git
cd libsonare
cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED=ON
cmake --build build -j

cd bindings/python
pip install -e .

Requirements for building from source:

Python 3.11+
CMake 3.16+
C++17 compiler (GCC 9+, Clang 10+, MSVC 2019+)

Quick Start

python

from libsonare import Audio, analyze, detect_bpm, detect_key, detect_beats

# Load audio from file
audio = Audio.from_file("music.mp3")

# Individual analysis
bpm = detect_bpm(audio.data, audio.sample_rate)
key = detect_key(audio.data, audio.sample_rate)
beats = detect_beats(audio.data, audio.sample_rate)

# Full analysis
result = analyze(audio.data, audio.sample_rate)
print(f"BPM: {result.bpm} ({result.bpm_confidence:.0%})")
print(f"Key: {result.key}")
print(f"Time Signature: {result.time_signature}")
print(f"Beats: {len(result.beat_times)} detected")

Audio Effects

python

from libsonare import Audio

audio = Audio.from_file("music.mp3")

# Harmonic-Percussive Source Separation
hpss_result = audio.hpss()
harmonic = audio.harmonic()
percussive = audio.percussive()

# Time stretch / pitch shift
stretched = audio.time_stretch(rate=1.5)       # 1.5x speed
shifted = audio.pitch_shift(semitones=2.0)     # Up 2 semitones

# Normalize and trim silence
normalized = audio.normalize(target_db=-3.0)
trimmed = audio.trim(threshold_db=-60.0)

# Resample
resampled = audio.resample(target_sr=44100)

Feature Extraction

python

from libsonare import Audio

audio = Audio.from_file("music.mp3")

# Spectrogram features
stft_result = audio.stft(n_fft=2048, hop_length=512)
mel = audio.mel_spectrogram(n_fft=2048, hop_length=512, n_mels=128)
mfcc = audio.mfcc(n_fft=2048, hop_length=512, n_mels=128, n_mfcc=20)
chroma = audio.chroma(n_fft=2048, hop_length=512)

# Spectral features
centroid = audio.spectral_centroid()
bandwidth = audio.spectral_bandwidth()
rolloff = audio.spectral_rolloff(roll_percent=0.85)
flatness = audio.spectral_flatness()
zcr = audio.zero_crossing_rate()
rms = audio.rms_energy()

# Pitch detection
pitch_yin = audio.pitch_yin(fmin=65.0, fmax=2093.0)
pitch_pyin = audio.pitch_pyin(fmin=65.0, fmax=2093.0)
print(f"Median F0: {pitch_pyin.median_f0:.1f} Hz")

Unit Conversions

python

from libsonare import hz_to_mel, mel_to_hz, hz_to_midi, midi_to_hz
from libsonare import hz_to_note, note_to_hz, frames_to_time, time_to_frames

hz_to_mel(440.0)       # → Mel scale value
mel_to_hz(549.64)      # → Hz
hz_to_midi(440.0)      # → 69.0
midi_to_hz(69.0)       # → 440.0
hz_to_note(440.0)      # → "A4"
note_to_hz("A4")       # → 440.0

frames_to_time(100, sr=22050, hop_length=512)  # → seconds
time_to_frames(2.32, sr=22050, hop_length=512) # → frame index

API Reference

Audio

Method	Description
`Audio.from_file(path)`	Load WAV/MP3 from disk
`Audio.from_buffer(data, sample_rate)`	Create from float samples
`Audio.from_memory(data)`	Decode from binary WAV/MP3 in memory
`audio.data`	Raw float samples
`audio.sample_rate`	Sample rate (Hz)
`audio.duration`	Duration (seconds)
`audio.length`	Number of samples
`audio.close()`	Free native memory

Supports context manager for automatic cleanup:

python

with Audio.from_file("music.mp3") as audio:
    result = audio.analyze()

Analysis Functions

Function	Return Type	Description
`detect_bpm(samples, sample_rate)`	`float`	Tempo in BPM
`detect_key(samples, sample_rate)`	`Key`	Root, mode, confidence
`detect_beats(samples, sample_rate)`	`list[float]`	Beat timestamps (seconds)
`detect_onsets(samples, sample_rate)`	`list[float]`	Onset timestamps (seconds)
`analyze(samples, sample_rate)`	`AnalysisResult`	Full analysis
`version()`	`str`	Library version

All functions also available as Audio instance methods (e.g., audio.detect_bpm()).

Effects Functions

Function	Return Type	Description
`hpss(samples, sr, kernel_harmonic?, kernel_percussive?)`	`HpssResult`	Harmonic-Percussive Source Separation
`harmonic(samples, sr)`	`list[float]`	Extract harmonic component
`percussive(samples, sr)`	`list[float]`	Extract percussive component
`time_stretch(samples, sr, rate)`	`list[float]`	Time-stretch without pitch change
`pitch_shift(samples, sr, semitones)`	`list[float]`	Pitch-shift without tempo change
`normalize(samples, sr, target_db?)`	`list[float]`	Normalize to target dB (default: -3.0)
`trim(samples, sr, threshold_db?)`	`list[float]`	Trim silence (default: -60.0 dB)
`resample(samples, src_sr, target_sr)`	`list[float]`	Resample to target sample rate

Feature Extraction Functions

Function	Return Type	Description
`stft(samples, sr, n_fft?, hop_length?)`	`StftResult`	Short-Time Fourier Transform
`stft_db(samples, sr, n_fft?, hop_length?)`	`tuple`	STFT in decibels
`mel_spectrogram(samples, sr, n_fft?, hop_length?, n_mels?)`	`MelSpectrogramResult`	Mel spectrogram
`mfcc(samples, sr, n_fft?, hop_length?, n_mels?, n_mfcc?)`	`MfccResult`	Mel-Frequency Cepstral Coefficients
`chroma(samples, sr, n_fft?, hop_length?)`	`ChromaResult`	Chroma features (pitch class distribution)
`spectral_centroid(samples, sr, n_fft?, hop_length?)`	`list[float]`	Spectral centroid per frame
`spectral_bandwidth(samples, sr, n_fft?, hop_length?)`	`list[float]`	Spectral bandwidth per frame
`spectral_rolloff(samples, sr, n_fft?, hop_length?, roll_percent?)`	`list[float]`	Spectral rolloff per frame
`spectral_flatness(samples, sr, n_fft?, hop_length?)`	`list[float]`	Spectral flatness per frame
`zero_crossing_rate(samples, sr, frame_length?, hop_length?)`	`list[float]`	Zero-crossing rate per frame
`rms_energy(samples, sr, frame_length?, hop_length?)`	`list[float]`	RMS energy per frame
`pitch_yin(samples, sr, frame_length?, hop_length?, fmin?, fmax?, threshold?)`	`PitchResult`	YIN pitch estimation
`pitch_pyin(samples, sr, frame_length?, hop_length?, fmin?, fmax?, threshold?)`	`PitchResult`	pYIN pitch estimation

Default parameters: n_fft=2048, hop_length=512, n_mels=128, n_mfcc=20, fmin=65.0, fmax=2093.0, threshold=0.3, roll_percent=0.85.

Conversion Functions

Function	Description
`hz_to_mel(hz)`	Hertz → Mel scale
`mel_to_hz(mel)`	Mel scale → Hertz
`hz_to_midi(hz)`	Hertz → MIDI note number
`midi_to_hz(midi)`	MIDI note number → Hertz
`hz_to_note(hz)`	Hertz → note name (e.g., "A4")
`note_to_hz(note)`	Note name → Hertz
`frames_to_time(frames, sr, hop_length)`	Frame index → seconds
`time_to_frames(time, sr, hop_length)`	Seconds → frame index

Types

python

class PitchClass(IntEnum):
    C, Cs, D, Ds, E, F, Fs, G, Gs, A, As, B

class Mode(IntEnum):
    MAJOR = 0
    MINOR = 1

@dataclass(frozen=True)
class Key:
    root: PitchClass
    mode: Mode
    confidence: float

@dataclass(frozen=True)
class TimeSignature:
    numerator: int
    denominator: int
    confidence: float

@dataclass(frozen=True)
class AnalysisResult:
    bpm: float
    bpm_confidence: float
    key: Key
    time_signature: TimeSignature
    beat_times: list[float]

@dataclass(frozen=True)
class HpssResult:
    harmonic: list[float]
    percussive: list[float]
    length: int
    sample_rate: int

@dataclass(frozen=True)
class StftResult:
    n_bins: int
    n_frames: int
    n_fft: int
    hop_length: int
    sample_rate: int
    magnitude: list[float]   # n_bins × n_frames, row-major
    power: list[float]       # n_bins × n_frames, row-major

@dataclass(frozen=True)
class MelSpectrogramResult:
    n_mels: int
    n_frames: int
    sample_rate: int
    hop_length: int
    power: list[float]       # n_mels × n_frames, row-major
    db: list[float]          # n_mels × n_frames, row-major

@dataclass(frozen=True)
class MfccResult:
    n_mfcc: int
    n_frames: int
    coefficients: list[float]  # n_mfcc × n_frames, row-major

@dataclass(frozen=True)
class ChromaResult:
    n_chroma: int
    n_frames: int
    sample_rate: int
    hop_length: int
    features: list[float]    # n_chroma × n_frames, row-major
    mean_energy: list[float] # n_chroma values

@dataclass(frozen=True)
class PitchResult:
    n_frames: int
    f0: list[float]          # Fundamental frequency per frame (Hz)
    voiced_prob: list[float] # Voicing probability per frame (0–1)
    voiced_flag: list[bool]  # Voiced/unvoiced decision per frame
    median_f0: float
    mean_f0: float

Python API ​

Installation ​

Building from Source (alternative) ​

Quick Start ​

Audio Effects ​

Feature Extraction ​

Unit Conversions ​

API Reference ​

Audio ​

Analysis Functions ​

Effects Functions ​

Feature Extraction Functions ​

Conversion Functions ​

Types ​

Python API

Installation

Building from Source (alternative)

Quick Start

Audio Effects

Feature Extraction

Unit Conversions

API Reference

Audio

Analysis Functions

Effects Functions

Feature Extraction Functions

Conversion Functions

Types